Croatian Learner Corpus
Author: Nives Mikelić Preradović
CroLTeC (CROatian Learner TExt Corpus) contains texts collected from learners of Croatian as a second and foreign language, ranging from beginners (A1) to advanced learners (C1 and higher). The purpose of the CroLTeC corpus is to enable in-depth analysis of learner language, to describe that language, and to support the extraction of important linguistic patterns, as well as contrastive interlanguage analysis and computer-aided error analysis.
CroLTeC consists of transcribed manuscripts that preserve the corrections made by the learners themselves (deletions, insertions and changes in word order). The texts are systematically described by detailed sociolinguistic metadata (gender, age, nationality, mother tongue, bilingual and multilingual competence, and parents’ language proficiency).
In addition, language instructors were asked to submit a report on the essay topic with each weekly essay. The report data were used as essay meta tags: the level of linguistic competence required, the title of the essay, the number of the essay/week of learning, genre, scope, the conditions under which the essay was produced (time limit, size limit, etc.) and the circumstances under which it was produced (homework, part of the exam, part of the field work, etc.).
The CroLTeC corpus currently contains 7,213 documents (both transcribed manuscripts and teacher corrections): 3,527 essays were scanned, anonymized, transcribed and converted to XML, while 1,217 essays were born digital and converted to XML.
It contains 1,073,512 tokens. The essays were collected from 755 learners with 36 different first language backgrounds.
Please cite as: Mikelić Preradović, N.; Berać, M.; Boras, D. 2015. Learner Corpus of Croatian as a Second and Foreign Language. Multidisciplinary Approaches to Multilingualism. Eds. Cergol Kovačević, Kristina and Udier, Sanda Lucija. Peter Lang. Frankfurt am Main, Germany. 107-126.