Maarten Janssen, 2014-
Syntactic trees in TEITOK are stored in the PSDX format, which is the XML version of the Penn Treebank PSD format.
In the PSDX format, trees are represented as an XML hierarchy mimicking the syntactic tree.
Elements | Attributes | Description
forest | | the root node of a syntactic tree
eTree | | any syntactic or morphosyntactic node (non-terminal)
 | Label | syntactic or POS label
 | index | numerical index codifying a syntactic dependency (matches the index of another element within the same tree)
eLeaf | | a lexical/empty terminal node
 | Text | the lexical content of an eLeaf
 | Notext | the empty content of an eLeaf (null categories)
 | index | numerical index codifying a syntactic dependency (matches the index of another element within the same tree)
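To make the hierarchy concrete, here is a minimal Python sketch that parses a small PSDX-like fragment and prints it back in bracket notation. The fragment is invented for illustration and uses only the elements and attributes listed above; real PSDX files may carry additional attributes, and the helper name `to_brackets` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample: one forest with a null subject and a verb.
PSDX = """
<forest>
  <eTree Label="IP-MAT">
    <eTree Label="NP-SBJ">
      <eLeaf Notext="*pro*"/>
    </eTree>
    <eTree Label="VB">
      <eLeaf Text="canta"/>
    </eTree>
  </eTree>
</forest>
""".strip()

def to_brackets(node):
    """Render an eTree or eLeaf node in PSD-style bracket notation."""
    if node.tag == "eLeaf":
        # A terminal carries lexical content (Text) or a null category (Notext).
        return node.get("Text") or node.get("Notext") or ""
    children = " ".join(to_brackets(child) for child in node)
    return f"({node.get('Label')} {children})"

forest = ET.fromstring(PSDX)
for tree in forest:
    print(to_brackets(tree))  # prints "(IP-MAT (NP-SBJ *pro*) (VB canta))"
```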
To query PSDX files, TEITOK offers an XPath search function. XPath is the most common way to address nodes in an XML tree. The idea behind it is comparable to that of a file path on your computer, with slashes separating the steps in the tree much as they separate folders (query language overview). The following table presents the XPath expressions most relevant for querying syntactic trees.
Expression | Meaning
. | the current node
.. | the parent of the current node
[ ] | a predicate (a condition) on a node
@ | an attribute
" " | a string value
// | dominance (a descendant at any depth)
/ | immediate dominance (a child)
preceding-sibling | precedence among sister nodes
contains | function: tests whether an attribute value contains a given substring
count | function: counts the nodes selected by an expression, for instance the children of an element
and | conjunction of two search conditions
or | disjunction of two search conditions
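As an illustration of `contains` and `preceding-sibling`, the sketch below emulates one possible query in Python. It assumes an invented sample tree, and because the standard library's `xml.etree.ElementTree` implements only a subset of XPath (no functions, no `preceding-sibling` axis), the conditions are checked in plain Python rather than passed as an XPath string. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample: a preverbal subject in the main clause,
# a postverbal subject in the subordinate clause.
SAMPLE = """
<forest>
  <eTree Label="IP-MAT">
    <eTree Label="NP-SBJ"><eLeaf Text="Maria"/></eTree>
    <eTree Label="VB"><eLeaf Text="disse"/></eTree>
    <eTree Label="CP-THT">
      <eTree Label="IP-SUB">
        <eTree Label="VB"><eLeaf Text="chegou"/></eTree>
        <eTree Label="NP-SBJ"><eLeaf Text="Rui"/></eTree>
      </eTree>
    </eTree>
  </eTree>
</forest>
""".strip()

def verbs_after_subject(root):
    """Emulate:
    //eTree[@Label="VB" and preceding-sibling::eTree[contains(@Label, "SBJ")]]
    """
    hits = []
    for parent in root.iter():
        seen_subject = False
        # findall returns the eTree children in document order.
        for child in parent.findall("eTree"):
            label = child.get("Label", "")
            if label == "VB" and seen_subject:
                hits.append(child)
            if "SBJ" in label:  # plays the role of contains(@Label, "SBJ")
                seen_subject = True
    return hits

hits = verbs_after_subject(ET.fromstring(SAMPLE))
print([h.find("eLeaf").get("Text") for h in hits])  # prints ['disse']
```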
An example of a syntactic XPath query is the following:
//eTree/eTree[@Label="NP-SBJ" and ./eLeaf[@Notext="*pro*"]]
In this query, we look for a node labeled NP-SBJ that is the child of another node and that immediately dominates a terminal node with empty content: *pro*. Or, to say it in a different way, we look for phrasal elements with a referential null subject; the query returns the null subject node itself.
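Outside TEITOK, the same search can be approximated in Python. This is only a sketch under stated assumptions: the sample tree is invented, and since the standard library's `xml.etree.ElementTree` supports neither `and` nor nested predicates, the first condition is expressed as a path and the second is checked in Python.

```python
import xml.etree.ElementTree as ET

# Invented sample: an overt subject in the main clause,
# a null subject (*pro*) in the subordinate clause.
SAMPLE = """
<forest>
  <eTree Label="IP-MAT">
    <eTree Label="NP-SBJ"><eLeaf Text="Maria"/></eTree>
    <eTree Label="VB"><eLeaf Text="disse"/></eTree>
    <eTree Label="CP-THT">
      <eTree Label="IP-SUB">
        <eTree Label="NP-SBJ"><eLeaf Notext="*pro*"/></eTree>
        <eTree Label="VB"><eLeaf Text="vinha"/></eTree>
      </eTree>
    </eTree>
  </eTree>
</forest>
""".strip()

def null_subjects(root):
    """Emulate: //eTree/eTree[@Label="NP-SBJ" and ./eLeaf[@Notext="*pro*"]]"""
    # Step 1: NP-SBJ nodes that are children of another eTree.
    candidates = root.findall(".//eTree/eTree[@Label='NP-SBJ']")
    # Step 2: keep those that immediately dominate an empty *pro* leaf.
    return [n for n in candidates
            if n.find("eLeaf[@Notext='*pro*']") is not None]

hits = null_subjects(ET.fromstring(SAMPLE))
print(len(hits))  # prints 1
```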
In the same manner, in order to retrieve ditransitive constructions, we can write a query that looks for all NP-ACC nodes (accusative NP) that have a sister node of type NP-DAT (dative NP):
//eTree[@Label="NP-ACC" and ../eTree[@Label="NP-DAT"]]
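The sister relation in this query (going up with `..` and down again) can be sketched in Python as a check on a shared parent. The sample tree and the function name are assumptions made for the example, and the logic is written in plain Python because `xml.etree.ElementTree` does not support `and` inside predicates.

```python
import xml.etree.ElementTree as ET

# Invented sample: the main clause has both NP-ACC and NP-DAT,
# the subordinate clause has only NP-ACC.
SAMPLE = """
<forest>
  <eTree Label="IP-MAT">
    <eTree Label="VB"><eLeaf Text="deu"/></eTree>
    <eTree Label="NP-ACC"><eLeaf Text="flores"/></eTree>
    <eTree Label="NP-DAT"><eLeaf Text="lhe"/></eTree>
    <eTree Label="CP-ADV">
      <eTree Label="IP-SUB">
        <eTree Label="VB"><eLeaf Text="viu"/></eTree>
        <eTree Label="NP-ACC"><eLeaf Text="Maria"/></eTree>
      </eTree>
    </eTree>
  </eTree>
</forest>
""".strip()

def accusatives_with_dative_sister(root):
    """Emulate: //eTree[@Label="NP-ACC" and ../eTree[@Label="NP-DAT"]]"""
    hits = []
    for parent in root.iter():
        # Sisters share a parent, so test each parent for an NP-DAT child
        # and collect its NP-ACC children.
        if parent.find("eTree[@Label='NP-DAT']") is not None:
            hits.extend(parent.findall("eTree[@Label='NP-ACC']"))
    return hits

hits = accusatives_with_dative_sister(ET.fromstring(SAMPLE))
print([h.find("eLeaf").get("Text") for h in hits])  # prints ['flores']
```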
Apart from going up or down in the tree, XPath can also compare numbers and strings. For instance, we can search for all IP-SUB nodes (subordinate clauses) that immediately dominate exactly three non-terminal nodes:
//eTree[@Label="IP-SUB" and count(eTree) = 3]
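A rough Python equivalent of the counting condition is shown below. The sample tree is invented, and `count()` is replaced by `len()` over the child list because `xml.etree.ElementTree` does not implement XPath functions.

```python
import xml.etree.ElementTree as ET

# Invented sample: one IP-SUB with three eTree children, one with two.
SAMPLE = """
<forest>
  <eTree Label="IP-MAT">
    <eTree Label="VB"><eLeaf Text="disse"/></eTree>
    <eTree Label="CP-THT">
      <eTree Label="IP-SUB">
        <eTree Label="NP-SBJ"><eLeaf Notext="*pro*"/></eTree>
        <eTree Label="VB"><eLeaf Text="deu"/></eTree>
        <eTree Label="NP-ACC"><eLeaf Text="flores"/></eTree>
      </eTree>
    </eTree>
    <eTree Label="CP-ADV">
      <eTree Label="IP-SUB">
        <eTree Label="NP-SBJ"><eLeaf Notext="*pro*"/></eTree>
        <eTree Label="VB"><eLeaf Text="chegou"/></eTree>
      </eTree>
    </eTree>
  </eTree>
</forest>
""".strip()

root = ET.fromstring(SAMPLE)

# Emulate: //eTree[@Label="IP-SUB" and count(eTree) = 3]
matches = [node for node in root.findall(".//eTree[@Label='IP-SUB']")
           if len(node.findall("eTree")) == 3]

print(len(matches))  # prints 1
```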
Or we can select nodes with the same @Label as their parent:
//eTree[@Label = ../@Label]
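The attribute comparison can likewise be sketched in Python by comparing each child's label with that of its parent. The coordination structure in the sample is invented for the example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample: a coordinated NP whose conjuncts are also labeled NP.
SAMPLE = """
<forest>
  <eTree Label="IP-MAT">
    <eTree Label="NP-SBJ">
      <eTree Label="NP">
        <eTree Label="NP"><eLeaf Text="Ana"/></eTree>
        <eTree Label="CONJ"><eLeaf Text="e"/></eTree>
        <eTree Label="NP"><eLeaf Text="Rui"/></eTree>
      </eTree>
    </eTree>
    <eTree Label="VB"><eLeaf Text="cantam"/></eTree>
  </eTree>
</forest>
""".strip()

def same_label_as_parent(root):
    """Emulate: //eTree[@Label = ../@Label]"""
    hits = []
    for parent in root.iter("eTree"):
        label = parent.get("Label")
        if label is None:
            continue  # a parent without a Label can never match
        hits.extend(child for child in parent.findall("eTree")
                    if child.get("Label") == label)
    return hits

hits = same_label_as_parent(ET.fromstring(SAMPLE))
print([h.find("eLeaf").get("Text") for h in hits])  # prints ['Ana', 'Rui']
```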
Users accustomed to the query language of the CorpusSearch tool, which targets PSD files, can find a list of the main correspondences with XPath queries here: CorpusSearch in PSD vs. XPath in PSDX.