What is a treebank in linguistics

Treebank (linguistics)

Example tree for John loves Mary
Hybrid constituency / dependency tree from the Quranic Arabic Corpus

A treebank, also called a parsed corpus, is a text corpus in which every sentence has been parsed, i.e. annotated with a syntactic structure. The term treebank refers to the fact that the syntactic structure is usually represented as a tree.
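As a rough illustration of how such a tree might be stored and read back, the sketch below parses a bracketed notation of the kind commonly used by phrase-structure treebanks. The category labels (S, NP, VP, V) are simplified assumptions chosen for the example sentence "John loves Mary", and the names Node, parse_bracketed and leaves are hypothetical helpers, not part of any particular treebank toolkit.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A labelled tree node; children are sub-nodes or plain word strings."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def parse_bracketed(text: str) -> Node:
    """Parse one bracketed tree such as '(S (NP John) (VP ...))'."""
    # Pad the brackets with spaces so that split() yields them as tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        node = Node(label=tokens[pos])  # first token after '(' is the label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read_node())  # nested constituent
            else:
                node.children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip the closing ')'
        return node

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(node: Node) -> List[str]:
    """Return the words of the sentence in left-to-right order."""
    words: List[str] = []
    for child in node.children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# Simplified, assumed annotation of the example sentence.
tree = parse_bracketed("(S (NP John) (VP (V loves) (NP Mary)))")
print(tree.label)    # S
print(leaves(tree))  # ['John', 'loves', 'Mary']
```

Reading the leaves from left to right recovers the original sentence, while the nested nodes carry the annotation that distinguishes a treebank from a plain text corpus.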

Treebanks are often built on corpora that have already been annotated with part-of-speech tags. In addition, treebanks are sometimes enriched with semantic or other linguistic information.

Treebanks can be created manually, with linguists annotating each sentence with its syntactic structure, or semi-automatically, with a parser assigning the syntactic structure, which a linguist then checks and, if necessary, corrects. In practice, fully parsing and checking natural-language texts is a labor-intensive process.

Some treebanks follow a particular linguistic theory in their syntactic annotation (e.g. the BulTreeBank, which follows HPSG), but most are less theory-specific. Nevertheless, two groups can essentially be distinguished: treebanks that annotate phrase structure (e.g. the Penn Treebank or ICE-GB), and those that annotate dependency structure (e.g. the Prague Dependency Treebank or the Quranic Arabic Dependency Treebank).
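To make the contrast with phrase structure concrete, the following sketch shows one possible way to store a dependency analysis of "John loves Mary": each word points to the word that governs it, and there are no phrase nodes such as NP or VP. The Token type, the helper functions and the relation labels ("subject", "object", "root") are assumptions made for illustration and do not reproduce the annotation scheme of any of the treebanks named above.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int      # 1-based position in the sentence
    form: str       # the word itself
    head: int       # index of the governing word; 0 marks the root
    relation: str   # illustrative relation label (an assumption)


# The verb is the root and both nouns depend on it directly.
sentence = [
    Token(1, "John", 2, "subject"),
    Token(2, "loves", 0, "root"),
    Token(3, "Mary", 2, "object"),
]


def root(tokens: List[Token]) -> Token:
    """Return the single token whose head is 0."""
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]


def dependents(tokens: List[Token], head_index: int) -> List[Token]:
    """Return the tokens governed directly by the token at head_index."""
    return [t for t in tokens if t.head == head_index]


verb = root(sentence)
print(verb.form)  # loves
print([(t.form, t.relation) for t in dependents(sentence, verb.index)])
# [('John', 'subject'), ('Mary', 'object')]
```

A phrase-structure annotation of the same sentence would instead group "loves Mary" into a verb phrase; the dependency view records only word-to-word relations.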


  • Werner Kallmeyer, Gisela Zifonun (Eds.): Language corpora - amount of data and progress in knowledge. Walter de Gruyter GmbH & Co KG, Berlin 2007, ISBN 978-3-11-019273-5.
