First release of ICE Scotland

The first release contains the following text categories:

Written part:

     Academic Writing Humanities, Academic Writing Natural Sciences

     Press News Reports, Skills & Hobbies

Spoken part:

     Broadcast Talks, Broadcast News


ELAN_files_spoken_data: contains .eaf files with automatically generated and manually corrected phonemic annotations of the spoken data (forced alignment with MAUS)
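
ELAN .eaf files are XML documents, so the time-aligned annotations can be read with a standard XML parser. The following is a minimal sketch, not part of the corpus tooling: the function name read_eaf_tiers, the tier ID "MAU", and the sample values are assumptions made for illustration, and the tier names actually used in this release should be taken from the corpus manual. Only time-aligned annotations are read; reference annotations are skipped.

```python
import io
import xml.etree.ElementTree as ET


def read_eaf_tiers(source):
    """Return {tier_id: [(start_ms, end_ms, value), ...]} for time-aligned annotations."""
    root = ET.parse(source).getroot()
    # Map time-slot IDs to millisecond values; unaligned slots carry no TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            value = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, value))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


# Tiny hand-written sample; the tier ID "MAU" and the values are hypothetical.
SAMPLE_EAF = b"""<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="120"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="310"/>
  </TIME_ORDER>
  <TIER TIER_ID="MAU" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>n</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>u:</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

sample_tiers = read_eaf_tiers(io.BytesIO(SAMPLE_EAF))
for start_ms, end_ms, label in sample_tiers["MAU"]:
    print(start_ms, end_ms, label)
```

To read a file from the release instead of the sample, pass its path to read_eaf_tiers in place of the BytesIO object.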

raw_data: contains the raw data files of the written part; the raw spoken data are in .wav format and can be downloaded as additional files attached to the release (click on "releases")

tagged_files: contains POS-tagged files of spoken and written data in both xml and txt format

txt_files_plain: contains files of spoken and written data in txt format without any annotations or tags
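
Because the plain files carry no markup, simple corpus statistics can be computed directly from them. The following is a rough sketch under stated assumptions: the function name count_tokens is hypothetical, the files are assumed to be UTF-8 encoded, and tokens are split on whitespace only, which is cruder than the tokenisation used for the tagged files.

```python
from pathlib import Path


def count_tokens(directory):
    """Map each .txt file under `directory` (relative path) to its whitespace token count."""
    root = Path(directory)
    counts = {}
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        counts[path.relative_to(root).as_posix()] = len(text.split())
    return counts


# Example call, assuming the script is run from the repository root:
# count_tokens("txt_files_plain")
```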

For questions about the file structure, part-of-speech tagging, annotation, etc., please consult the corpus manual and the annotation schema.