The DILeB corpus comprises 180 aligned audio transcriptions of spontaneous conversations in European Portuguese. These conversations, which follow the reading tests in each interview, do not follow a pre-established script. Each file is classified according to the demographic and linguistic profile of the interviewee, and each interviewee is identified by the number at the beginning of the file name (for instance, 3 or 142). Based on this metadata, different queries can be run by word (or word part). The alphanumeric codes identifying the interviews (for instance, BM33 or LF25) are read as follows: the first letter gives the city (B = Braga, L = Lisbon), the second letter gives the gender (M = male, F = female), the first digit gives the speaker's education cell, and the last digit gives the speaker's age cell, according to the following tables. For example, BM33 identifies a male speaker from Braga in education cell 3 and age cell 3.
Code | Education
0 | Illiterate
1 | Up to the 9th grade
2 | From the 9th to the 12th grade
3 | Graduation
Code | Age
1 | 13-19 years old
2 | 20-25 years old
3 | 26-39 years old
4 | 40-55 years old
5 | > 55 years old
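The code scheme described above can be decoded mechanically. The following Python sketch is only an illustration and is not part of the DILeB tools: the names `decode_interview_code` and `SpeakerProfile` are hypothetical, and it assumes every code has exactly the four-character shape shown in the examples (city letter, gender letter, education digit, age digit).

```python
import re
from typing import NamedTuple

# Lookup tables transcribed from the code description and the two tables above.
CITIES = {"B": "Braga", "L": "Lisbon"}
GENDERS = {"M": "Male", "F": "Female"}
EDUCATION = {
    "0": "Illiterate",
    "1": "Until the 9th grade",
    "2": "From 9th to 12th grade",
    "3": "Graduation",
}
AGE = {
    "1": "13-19 years old",
    "2": "20-25 years old",
    "3": "26-39 years old",
    "4": "40-55 years old",
    "5": "> 55 years old",
}

# Assumed shape of a code: city letter, gender letter, education digit, age digit.
CODE_PATTERN = re.compile(r"([BL])([MF])([0-3])([1-5])")


class SpeakerProfile(NamedTuple):
    """Hypothetical container for the four decoded fields."""
    city: str
    gender: str
    education: str
    age: str


def decode_interview_code(code: str) -> SpeakerProfile:
    """Decode a code such as 'BM33' into a readable speaker profile."""
    match = CODE_PATTERN.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognised interview code: {code!r}")
    city, gender, education, age = match.groups()
    return SpeakerProfile(CITIES[city], GENDERS[gender], EDUCATION[education], AGE[age])


if __name__ == "__main__":
    for example in ("BM33", "LF25"):
        print(example, decode_interview_code(example))
```

Codes that do not match the assumed shape raise a `ValueError` rather than being guessed at, so an unexpected naming variant is reported instead of being silently misread.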
How to cite the DILeB corpus:
Rodrigues, Celeste. 2022. DILeB - Discurso Informal de Lisboa e Braga. Lisboa: CLUL - 2020: UIDB/00214/.
ISLRN: 266-767-716-130-2. http://teitok.clul.ul.pt/dileb
The reference to the DILeB corpus must be included in all documents that use data from the corpus, including books, papers, oral presentations, posters, evaluation tools, or any other product.