First release of ICE Scotland
The first release contains the following text categories:
Written part:
Academic Writing Humanities, Academic Writing Natural Sciences
Press News Reports, Skills & Hobbies
Spoken part:
Broadcast Talks, Broadcast News
The release is organized into the following folders:
ELAN_files_spoken_data: contains ELAN (.eaf) files with automatically generated (forced alignment with MAUS) and manually corrected phonemic annotations of the spoken data
raw_data: contains the written raw data files; the spoken raw data files (.wav format) can be downloaded separately as additional files of the release (click on "Releases")
tagged_files: contains part-of-speech (POS) tagged files of the spoken and written data in both XML and TXT format
txt_files_plain: contains the spoken and written data as plain TXT files without any annotations or tags
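The files in ELAN_files_spoken_data use ELAN's XML-based .eaf format, so they can also be processed outside ELAN. Here is a minimal sketch using only the Python standard library. It assumes the standard EAF layout (TIME_ORDER/TIME_SLOT elements holding times in milliseconds, and TIER elements containing ALIGNABLE_ANNOTATION entries). The tier names used in ICE Scotland are not described here, so please check the corpus manual. `read_eaf` and `tiers_from_root` are hypothetical helpers for illustration and are not part of the release.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def tiers_from_root(root):
    """Collect time-aligned annotations per tier from a parsed EAF document.

    Returns {tier_id: [(start_ms, end_ms, value), ...]}; tiers without any
    alignable annotations are omitted.
    """
    # Map each TIME_SLOT_ID to its time in milliseconds; slots without a
    # TIME_VALUE (unaligned slots) map to None.
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    tiers = defaultdict(list)
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        # Only ALIGNABLE_ANNOTATION elements point to their own time slots;
        # REF_ANNOTATION elements (symbolic subdivisions) are skipped here.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="").strip()
            tiers[tier_id].append((start, end, text))
    return dict(tiers)


def read_eaf(path):
    """Parse an .eaf file from disk and return its annotations per tier."""
    return tiers_from_root(ET.parse(path).getroot())
```

For example, `read_eaf("ELAN_files_spoken_data/<file>.eaf")` (substitute an actual file name) returns a dictionary that can be iterated tier by tier to list each annotation with its start and end time. Splitting parsing from extraction keeps `tiers_from_root` usable on EAF content that is already in memory.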
For further questions regarding the file structure, part-of-speech tagging, annotation, etc., please consult the corpus manual and the annotation schema.