For researchers who want to process Cordial-Sin data with external tools, we offer the data below in plain-text format, distributed by interview excerpt, by location and by transcription/annotation level. We also offer the whole corpus in XML and PSDX formats:
Distribution | Transcription (1) | Edition (1) | Annotated version (1) | Treebank (2, 3) |
---|---|---|---|---|
By excerpt (2058 files) | transcription_excerpt.zip | edition_excerpt.zip | annotation_excerpt.zip | treebank_excerpt.zip |
By location (42 files) | transcription_location.zip | edition_location.zip | annotation_location.zip | treebank_location.zip |
Total corpus (1 file) | transcription_corpus.zip | edition_corpus.zip | annotation_corpus.zip | treebank_corpus.zip |
XML corpus | PSDX corpus |
---|---|
XML_corpus.zip | PSDX_corpus.zip |
(1) This work was funded by national funds through FCT – Fundação para a Ciência e a Tecnologia, I.P., within the project PTDC/LLT-LIN/32086/2017.
(2) This work was funded by national funds through FCT – Fundação para a Ciência e a Tecnologia, I.P., within the projects PTDC/LIN/71559/2006, PTDC/LLT-LIN/32086/2017, UID/LIN/00214/2013 and UID/00214/2019.
(3) PSD files are searchable with CorpusSearch. Storage of the equivalent PSDX files in TEITOK is currently under development. Once this is completed, the syntactic data will also be searchable online.
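
As a starting point for working with the text archives listed above in an external tool, the following is a minimal Python sketch that reads every file in a downloaded archive into memory. The local file name, the UTF-8 encoding and the helper name `read_archive_texts` are assumptions made for illustration; the internal layout of the archives is not described here, so the sketch simply reads every non-directory member.

```python
import zipfile
from pathlib import Path


def read_archive_texts(archive_path, encoding="utf-8"):
    """Return {member name: text} for every non-directory member of a zip archive.

    The encoding is an assumption; change it if the files decode incorrectly.
    """
    texts = {}
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries, keep only files
            raw = archive.read(info.filename)
            # errors="replace" avoids a crash if the assumed encoding is wrong
            texts[info.filename] = raw.decode(encoding, errors="replace")
    return texts


if __name__ == "__main__":
    # Hypothetical local path: one of the archives from the table, saved
    # in the current working directory.
    path = Path("transcription_location.zip")
    if path.exists():
        texts = read_archive_texts(path)
        print(f"{len(texts)} files read")
        for name in sorted(texts)[:3]:
            print(name, len(texts[name].split()), "whitespace-separated tokens")
```

The returned dictionary can be passed to any tokenizer or concordancer; for the treebank archives, a dedicated tool such as CorpusSearch is likely a better fit than plain-text processing.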