LinGO Redwoods Treebank (copy @ INESS)
Full metadata record:
Persistent identifier for the resource:
Contact Person: Rosén, Victoria
This resource is licensed under the following terms:
General Public License (GPL)
BY SA
Please click on the link to read the license terms.
By accepting the terms of the license you will be granted access to the resource.
Attribution:
Please use the following text to cite this resource:
LinGO Redwoods Treebank (copy @ INESS). Distributed by the INESS Portal: hdl:11495/DB0A-7D96-098F-5
Size: 47805 sentences, 535006 words
Language(s): English (en)
Description:
The LinGO Redwoods Treebank is a collection of hand-annotated corpora analysed with the LinGO ERG.

For each utterance from a corpus, the treebank records (in principle) all analyses hypothesized by the grammar, together with an annotator decision as to which reading is preferred in context.
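As a rough illustration of the record structure described above, the following Python sketch models one treebank item as an utterance, the set of analyses proposed by the grammar, and the annotator's preferred reading. The class names, field names, and the sample utterance are hypothetical and chosen only for illustration; they do not reflect the actual Redwoods storage format or the INESS tools.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Analysis:
    """One reading hypothesized by the grammar (hypothetical structure)."""
    reading_id: int
    derivation: str  # stand-in for the full syntacto-semantic analysis


@dataclass
class TreebankItem:
    """One utterance with all candidate analyses and the annotator's choice."""
    utterance: str
    analyses: List[Analysis] = field(default_factory=list)
    preferred_id: Optional[int] = None  # None means no reading selected yet

    def preferred(self) -> Optional[Analysis]:
        """Return the analysis marked as preferred, or None if there is none."""
        for analysis in self.analyses:
            if analysis.reading_id == self.preferred_id:
                return analysis
        return None

    def dispreferred(self) -> List[Analysis]:
        """Return every analysis that is not the preferred one."""
        return [a for a in self.analyses if a.reading_id != self.preferred_id]


# Invented example item, not taken from the treebank.
item = TreebankItem(
    utterance="I saw her duck.",
    analyses=[Analysis(0, "duck as noun"), Analysis(1, "duck as verb")],
    preferred_id=0,
)
print(item.preferred().derivation)
print(len(item.dispreferred()))
```

Keeping the dispreferred analyses alongside the preferred one, rather than discarding them, mirrors the point that the treebank records all readings and only marks the annotator's decision.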

The key innovative aspect of the Redwoods approach to treebanking is the anchoring of all linguistic data captured in the treebank to the HPSG framework and a generally available broad-coverage grammar of English, viz. the LinGO English Resource Grammar. Unlike existing treebanks, there is no need to define a (new) form of grammatical representation specific to the treebank (and, consequently, less dissemination effort is needed to establish this representation). Instead, the treebank records complete syntacto-semantic analyses as defined by the LinGO ERG; tools are provided to extract many different types of linguistic information at varying granularity.

Other relevant aspects of the Redwoods Treebank include the integration of alternate, though dispreferred, analyses for each utterance and the dynamic nature of the annotations: as the underlying grammar evolves and improves its analyses, there is a provision for a (nearly) fully automated update of the treebank against a version of the original corpus analysed with the revised grammar. As a methodological result, parts of the Redwoods data are now regularly maintained as part of the grammar regression cycle with each new release of the ERG.