
Corpus Search



The search system is divided into two distinct groups of information: linguistic annotation (Text Search) and metalinguistic information (Document Search).

Searches can be carried out individually at any of the information levels. For example, a search for the lexeme kaza returns the lexical forms kaza, kazinha, kazas, etc. Likewise, a search by Language Variety returns a list of all interviews in the selected language variety.

The Word search field is case sensitive. To obtain all the results for a given lexical form, searches at this level should always be carried out with both the uppercase and the lowercase spelling. In the Lexeme, Lemma version EN and Lemma version PT fields, the intended lexical forms should be typed in lowercase letters (except for proper names).
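The effect of case sensitivity can be illustrated with a minimal Python sketch. It does not reproduce the search system itself: the word list and both helper functions are invented for illustration, and the second helper simply runs the search with the lowercase spelling and with a capitalized first letter.

```python
# Invented word forms, for illustration only.
WORDS = ["Kaza", "kaza", "kazas", "Kaza"]

def word_search(words, form):
    """Case-sensitive search: only exact spellings are returned."""
    return [w for w in words if w == form]

def word_search_both_cases(words, form):
    """Search with the lowercase spelling and with a capitalized first letter."""
    variants = {form.lower(), form.capitalize()}
    return [w for w in words if w in variants]

lower_only = word_search(WORDS, "kaza")
both_cases = word_search_both_cases(WORDS, "kaza")
```

In this toy list the lowercase search alone finds one occurrence, while the two spellings together find three.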

The available search levels can be combined to refine the results. Combining the lexeme kaza with the POS tag V, for example, restricts the results to verbal forms (corresponding to 'marry') and excludes nouns (corresponding to 'house'). If the variable Level of Instruction is added to this search, the results are all the verbal forms of the lexeme kaza produced by speakers with the selected level of instruction.
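The narrowing effect of combined criteria can be sketched in Python as follows. This is only an illustration under stated assumptions: the Token class, the toy data and the values "primary" and "secondary" for the level of instruction are hypothetical and do not come from the corpus.

```python
from dataclasses import dataclass

@dataclass
class Token:
    word: str
    lexeme: str
    pos: str
    instruction: str  # speaker's level of instruction (hypothetical values)

# Toy data, invented for illustration only.
TOKENS = [
    Token("kaza", "kaza", "V", "primary"),
    Token("kaza", "kaza", "N", "primary"),
    Token("kazinha", "kaza", "N", "secondary"),
    Token("kaza", "kaza", "V", "secondary"),
]

def search(tokens, **criteria):
    """Keep the tokens whose fields equal every given criterion."""
    return [
        t for t in tokens
        if all(getattr(t, field) == value for field, value in criteria.items())
    ]

by_lexeme = search(TOKENS, lexeme="kaza")
verbs = search(TOKENS, lexeme="kaza", pos="V")
verbs_primary = search(TOKENS, lexeme="kaza", pos="V", instruction="primary")
```

Each added criterion can only shrink the result set: in the toy data, four tokens share the lexeme, two of them are verbs, and one of those comes from a speaker with the selected level of instruction.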

Each search field for the linguistic annotation has a small menu with the options matches, starts with, ends by and contains; the selected option determines the results. This functionality is particularly relevant for ltags. Because a lexical form can have more than one ltag (e.g. the word bonh has two ltags: a-o, C-fin), searches at this level should be carried out with the contains option selected, so that no result is omitted.
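The difference between the four options can be sketched in Python. The sketch assumes that multiple ltags are stored as a single comma-separated string, as in the bonh example; that storage format and the function field_matches are assumptions made for illustration, not a description of the actual implementation.

```python
def field_matches(value: str, query: str, mode: str) -> bool:
    """Compare a field value with a query under one of the four menu options."""
    if mode == "matches":
        return value == query
    if mode == "starts with":
        return value.startswith(query)
    if mode == "ends by":
        return value.endswith(query)
    if mode == "contains":
        return query in value
    raise ValueError(f"unknown option: {mode}")

# ltag value of the word bonh, assumed to be stored as one string.
LTAG_OF_BONH = "a-o, C-fin"

exact_a_o = field_matches(LTAG_OF_BONH, "a-o", "matches")           # False
contains_a_o = field_matches(LTAG_OF_BONH, "a-o", "contains")       # True
contains_c_fin = field_matches(LTAG_OF_BONH, "C-fin", "contains")   # True
starts_c_fin = field_matches(LTAG_OF_BONH, "C-fin", "starts with")  # False
```

Under this assumption, matches would miss bonh for either ltag taken alone, whereas contains finds it for both.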

You can find more information about the ltags here.

In addition to the predefined search options, you can search via the CQL Query text box located at the top of the page. CQL queries give finer-grained results, particularly by specifying contexts. The query [lemma="el"] [pos="V.*"], for example, returns all occurrences of the personal pronoun el, in all its phonological variants, followed by any verbal form.
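How such a two-token query can be read is sketched below in Python. The sketch assumes the usual convention that each bracket describes one token and that a regular expression must match the whole attribute value. The toy tokens, the tags PRO, N and Vaux, the placeholder forms and the function find_sequences are all invented for illustration; only the tag V and the query itself come from the text above.

```python
import re

# Toy tokens, invented for illustration only.
TOKENS = [
    {"word": "el", "lemma": "el", "pos": "PRO"},
    {"word": "kaza", "lemma": "kaza", "pos": "V"},
    {"word": "kaza", "lemma": "kaza", "pos": "N"},
    {"word": "el", "lemma": "el", "pos": "PRO"},
    {"word": "form-x", "lemma": "lemma-x", "pos": "Vaux"},  # placeholder verb
]

# [lemma="el"] [pos="V.*"] written as one constraint dictionary per token.
QUERY = [{"lemma": "el"}, {"pos": "V.*"}]

def find_sequences(tokens, query):
    """Return the word forms of every run of adjacent tokens matching the query."""
    hits = []
    for start in range(len(tokens) - len(query) + 1):
        window = tokens[start:start + len(query)]
        if all(
            re.fullmatch(pattern, token[attribute])
            for token, constraints in zip(window, query)
            for attribute, pattern in constraints.items()
        ):
            hits.append([t["word"] for t in window])
    return hits

HITS = find_sequences(TOKENS, QUERY)
```

In the toy data the query matches the pronoun followed by the form tagged V and the pronoun followed by the form tagged Vaux, because V.* accepts any tag beginning with V; the noun reading of kaza is never part of a match.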

The list of searchable attributes can be found here. You can find more information about CQL syntax here.
