Powered by <TEI:TOK>
Maarten Janssen, 2014-
For researchers who want to process our data with external tools, we offer the corpus below in text format.
Table 1: Corpus distribution by language, century and format:
(1) PSD files are searchable with CorpusSearch. PSDX files are stored in TEITOK and are searchable online.
(2) The revision of the Parts Of Speech annotation of the Spanish corpus is still in progress.
Table 2: Corpus distribution by gender, language, century and format:
Table 3: Corpus distribution by social status(3), language, century and format:
Inquisition | |||||
---|---|---|---|---|---|
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7 |
Portuguese | XVI | PT1500inq_ORIG_TXT.ZIP | PT1500inq_MOD_TXT.ZIP | PT1500inq_POS.ZIP | PT1500inq_PSD.ZIP |
Portuguese | XVII | PT1600inq_ORIG_TXT.ZIP | PT1600inq_MOD_TXT.ZIP | PT1600inq_POS.ZIP | PT1600inq_PSD.ZIP |
Portuguese | XVIII | PT1700inq_ORIG_TXT.ZIP | PT1700inq_MOD_TXT.ZIP | PT1700inq_POS.ZIP | PT1700inq_PSD.ZIP |
Portuguese | XIX | Still no data | Still no data | Still no data | Still no data |
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: |
Spanish | XVI | ES1500inq_ORIG_TXT.ZIP | ES1500inq_MOD_TXT.ZIP | ES1500inq_POS.ZIP | ES1500inq_PSD.ZIP |
Spanish | XVII | ES1600inq_ORIG_TXT.ZIP | ES1600inq_MOD_TXT.ZIP | ES1600inq_POS.ZIP | ES1600inq_PSD.ZIP |
Spanish | XVIII | ES1700inq_ORIG_TXT.ZIP | ES1700inq_MOD_TXT.ZIP | ES1700inq_POS.ZIP | Still no data |
Spanish | XIX | Still no data | Still no data | Still no data | Still no data |
Knightly orders | |||||
---|---|---|---|---|---|
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7 |
Portuguese | XVI | PT1500kni_ORIG_TXT.ZIP | PT1500kni_MOD_TXT.ZIP | PT1500kni_POS.ZIP | PT1500kni_PSD.ZIP |
Portuguese | XVII | PT1600kni_ORIG_TXT.ZIP | PT1600kni_MOD_TXT.ZIP | PT1600kni_POS.ZIP | PT1600kni_PSD.ZIP |
Portuguese | XVIII | PT1700kni_ORIG_TXT.ZIP | PT1700kni_MOD_TXT.ZIP | PT1700kni_POS.ZIP | Still no data |
Portuguese | XIX | PT1800kni_ORIG_TXT.ZIP | PT1800kni_MOD_TXT.ZIP | PT1800kni_POS.ZIP | PT1800kni_PSD.ZIP |
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4 |
Spanish | XVI | Still no data | Still no data | Still no data | Still no data |
Spanish | XVII | ES1600kni_ORIG_TXT.ZIP | ES1600kni_MOD_TXT.ZIP | ES1600kni_POS.ZIP | ES1600kni_PSD.ZIP |
Spanish | XVIII | ES1700kni_ORIG_TXT.ZIP | ES1700kni_MOD_TXT.ZIP | ES1700kni_POS.ZIP | Still no data |
Spanish | XIX | Still no data | Still no data | Still no data | Still no data |
University | |||||
---|---|---|---|---|---|
Language | Century | Original text ISLRN: 375-405-009-147-2 | Standardized text ISLRN: 375-405-009-147-2 | POS annotation ISLRN: 321-583-358-829-1 | Parsed corpus ISLRN: 662-489-499-707-7 |
Portuguese | XVI | PT1500uni_ORIG_TXT.ZIP | PT1500uni_MOD_TXT.ZIP | PT1500uni_POS.ZIP | PT1500uni_PSD.ZIP |
Portuguese | XVII | PT1600uni_ORIG_TXT.ZIP | PT1600uni_MOD_TXT.ZIP | PT1600uni_POS.ZIP | Still no data |
Portuguese | XVIII | PT1700uni_ORIG_TXT.ZIP | PT1700uni_MOD_TXT.ZIP | PT1700uni_POS.ZIP | Still no data |
Portuguese | XIX | PT1800uni_ORIG_TXT.ZIP | PT1800uni_MOD_TXT.ZIP | PT1800uni_POS.ZIP | Still no data |
Language | Century | Original text ISLRN: 305-406-112-712-3 | Standardized text ISLRN: 305-406-112-712-3 | POS annotation ISLRN: 042-997-465-008-9 | Parsed corpus ISLRN: 306-113-341-591-4 |
Spanish | XVI | ES1500uni_ORIG_TXT.ZIP | ES1500uni_MOD_TXT.ZIP | ES1500uni_POS.ZIP | ES1500uni_PSD.ZIP |
Spanish | XVII | ES1600uni_ORIG_TXT.ZIP | ES1600uni_MOD_TXT.ZIP | ES1600uni_POS.ZIP | ES1600uni_PSD.ZIP |
Spanish | XVIII | ES1700uni_ORIG_TXT.ZIP | ES1700uni_MOD_TXT.ZIP | ES1700uni_POS.ZIP | ES1700uni_PSD.ZIP |
Spanish | XIX | ES1800uni_ORIG_TXT.ZIP | ES1800uni_MOD_TXT.ZIP | ES1800uni_POS.ZIP | Still no data |
(3) Apart from the seven social status types included in Table 3, there is an eighth type, slaves, for which there is only one woman author (Teresa de Jesus Faria), with only one letter (PSCR0620).
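The archive names listed in Table 3 appear to follow a regular pattern: a language code, the first year of the century, a three-letter social status code, and a format suffix. The sketch below decodes a name into those parts. The pattern is inferred from the file names shown here and is an assumption, not a documented specification; `parse_archive_name` and `ArchiveInfo` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the listed file names (an assumption):
# <language><first year of century><status code>_<format>.ZIP
ARCHIVE_NAME = re.compile(
    r"^(?P<lang>PT|ES)(?P<year>\d{4})(?P<status>[a-z]+)"
    r"_(?P<fmt>ORIG_TXT|MOD_TXT|POS|PSD)\.ZIP$"
)

LANGUAGES = {"PT": "Portuguese", "ES": "Spanish"}
CENTURIES = {"1500": "XVI", "1600": "XVII", "1700": "XVIII", "1800": "XIX"}
FORMATS = {
    "ORIG_TXT": "Original text",
    "MOD_TXT": "Standardized text",
    "POS": "POS annotation",
    "PSD": "Parsed corpus",
}


class ArchiveInfo(NamedTuple):
    language: str
    century: str
    status_code: str  # e.g. "inq", "kni", "uni", kept as the raw code
    fmt: str


def parse_archive_name(name: str) -> Optional[ArchiveInfo]:
    """Decode an archive name; return None if it does not fit the pattern."""
    match = ARCHIVE_NAME.match(name.strip())
    if match is None or match.group("year") not in CENTURIES:
        return None
    return ArchiveInfo(
        language=LANGUAGES[match.group("lang")],
        century=CENTURIES[match.group("year")],
        status_code=match.group("status"),
        fmt=FORMATS[match.group("fmt")],
    )


print(parse_archive_name("PT1500inq_ORIG_TXT.ZIP"))
```

Cells such as `Still no data` do not match the pattern and yield `None`, which makes it easy to skip missing archives when iterating over a table.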
Table 4: Balanced corpus (one letter per author) distributed by language, century and format:
Table 5: Documents in XML-TEI P5 available for download:
The table below contains the XML-TEI P5 version of the Post Scriptum digital archive, which validates against the customized version of the TEI schema built by the project. The validation schema can be generated automatically from the Post Scriptum ODD (with the Roma tool, for instance) or obtained directly in Relax NG format by choosing Post Scriptum TEI schema. We recommend saving the schema in the same folder as the other XML files. Besides the digital archive, the same schema validates the XML-TEI P5 versions of the Biographical Database (i.e. cdd.xml) and the Social-historical Keywords Database (i.e. kw.xml).
For detailed information on the TEI P5 versions of Post Scriptum, see the following document (in Spanish): P.S. Post Scriptum: Archivo digital de escritura cotidiana. Personalización del esquema TEI y documentación.
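Before validating against a schema, it can help to confirm that every XML file in a downloaded archive is at least well-formed. The following is a minimal sketch using only the Python standard library. It does not perform Relax NG validation, and it assumes that the ZIP archives contain `.xml` members; `check_archive` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_archive(zip_path):
    """Return (well_formed, malformed): names of the .xml members of a ZIP.

    This only checks XML well-formedness; it is not schema validation.
    """
    well_formed, malformed = [], []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip anything that is not an XML file
            try:
                ET.fromstring(archive.read(name))
                well_formed.append(name)
            except ET.ParseError:
                malformed.append(name)
    return well_formed, malformed
```

A call such as `check_archive("PT1500_XML-TEI_P5.ZIP")` would return two lists of member names; files in the second list should be inspected before any schema validation is attempted.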
Language | Century | XML-TEI: P5. Complete corpus | XML-TEI: P5. Balanced corpus(4) |
---|---|---|---|
Portuguese | XVI | PT1500_XML-TEI_P5.ZIP | PT1500bal_XML-TEI_P5.ZIP |
Portuguese | XVII | PT1600_XML-TEI_P5.ZIP | PT1600bal_XML-TEI_P5.ZIP |
Portuguese | XVIII | PT1700_XML-TEI_P5.ZIP | PT1700bal_XML-TEI_P5.ZIP |
Portuguese | XIX | PT1800_XML-TEI_P5.ZIP | PT1800bal_XML-TEI_P5.ZIP |
Language | Century | XML-TEI: P5. Complete corpus | XML-TEI: P5. Balanced corpus |
Spanish | XVI | ES1500_XML-TEI_P5.ZIP | ES1500bal_XML-TEI_P5.ZIP |
Spanish | XVII | ES1600_XML-TEI_P5.ZIP | ES1600bal_XML-TEI_P5.ZIP |
Spanish | XVIII | ES1700_XML-TEI_P5.ZIP | ES1700bal_XML-TEI_P5.ZIP |
Spanish | XIX | ES1800_XML-TEI_P5.ZIP | ES1800bal_XML-TEI_P5.ZIP |
(4) The balanced corpus is a subcorpus created by automatically selecting one letter per author, usually the one whose standardized edition contains the largest number of distinct word types (cf. section 1.3.1.3. in Manual de Edición y Anotación en TEITOK de los Materiales de P.S. Post Scriptum).
DICER statistics on a standardized version of 478 Portuguese Letters: Portuguese Post Scriptum by eDictor. As its name indicates, the manual standardization was carried out with the eDictor tool.
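The selection rule described in note (4) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual procedure: word types are approximated as case-folded, whitespace-separated tokens, ties keep the first letter seen, and the function names and input shape are hypothetical. The manual cited in the note remains the authoritative description.

```python
def count_word_types(text):
    """Count distinct whitespace-separated tokens, ignoring case (an assumption)."""
    return len({token.casefold() for token in text.split()})


def select_balanced(letters):
    """Pick one letter per author: the one with the most distinct word types.

    letters: iterable of (author, letter_id, standardized_text) tuples.
    Returns a dict mapping each author to the chosen letter_id.
    """
    best = {}  # author -> (score, letter_id)
    for author, letter_id, text in letters:
        score = count_word_types(text)
        # Strict comparison: on a tie, the first letter seen is kept.
        if author not in best or score > best[author][0]:
            best[author] = (score, letter_id)
    return {author: letter_id for author, (_, letter_id) in best.items()}
```

For example, given two letters by the same author with three and five distinct word types, the function returns the identifier of the second letter for that author.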