

A corpus of Guinea-Bissau Kriol


Kriol (or Ginensi) is the most widely spoken language in Guinea-Bissau, serving as both the lingua franca and the language of national identity. It is also the main language of Bissau-Guinean diaspora communities in many countries around the world. Kriol is strategically important because it enables interethnic communication in a country where about 21 national languages (from the Atlantic and Mande families) are spoken. While Kriol is the main first language (L1) in the cities, in rural areas it is spoken mostly as a second language (L2), since one or more national languages are the L1s of the dominant groups there. Despite Kriol's central role in the country, Portuguese is the only official language of Guinea-Bissau.

CoKri aims to serve as a tool for documenting and further investigating Kriol, which is still an understudied language. This annotated and searchable corpus consists of about 37 hours of transcribed audio recorded during interview sessions with native Kriol speakers in Guinea-Bissau between 2018 and 2019. A further aim of CoKri is to capture intralinguistic variation in Kriol. Although this issue has never been seriously addressed in the scientific literature on the language, Kriol shows a certain degree of morpho-phonological and lexical variation. It remains to be assessed whether these variants are remnants of the historical varieties of Kriol (northern, central, and eastern), which were already considered nearly extinct in the early 1990s, or whether they result from geographical variation and from contact between Kriol and the other languages spoken in the country. Overall, CoKri offers a picture of present-day Kriol.

Please note that the corpus is currently undergoing final revision. Informative materials on the orthographic conventions used to transcribe the audio data and on the part-of-speech (POS) labels used for annotation will be made available soon.

The decision to give each lemma in both Portuguese and English, as well as the type of search interface, was adopted from the LUDViC corpus (Pratas 2020).


Chiara Truppi (PI)
Rute Gomes
Technical support
José Aires
Raïssa Gillier
FCT, Fundação para a Ciência e a Tecnologia (SFRH/BPD/118401/2016)
CLUL, Centro de Linguística da Universidade de Lisboa (UIDB/00214/2020)
Terms of use
All materials in this corpus are protected by a Creative Commons license, under the terms and conditions described here:

Appropriate credit must be given in all circumstances, citing the corpus as follows:
Truppi, Chiara. 2022. CoKri: a corpus of Guinea-Bissau Kriol / um corpus do Kriol da Guiné-Bissau.