INESS :: Treebank Selection

Select a set of treebanks to work with.

Click on a treebank name below to proceed. All selected treebanks will be available for viewing and searching.

Selected	Name	Collection	Type	Sentences	Words	Indexed	Description	License	Downloads
	All treebanks			16 901 133	241 436 462
	English (eng)			85 146	1 241 704
	eng-pargram (aligned)	ParGram	lfg	101	658	yes	The ParGram collection is a collection of parallel treebanks covering a set of chosen syntactic constructions. The ParGram collection is a collaborative effort of the ParGram project, along with the ParSem project, by research groups in industrial and academic institutions around the world. The aim of ParGram is to produce wide-coverage grammars for a variety of languages. These are written collaboratively within the linguistic framework of LFG (Lexical Functional Grammar) and with a commonly-agreed-upon set of grammatical features. The XLE (Xerox Linguistic Environment) is used as a development platform. ParSem develops semantic structures based on the ParGram syntactic structures. Most of the ParSem systems use the XLE’s XFR system. Regular semiannual meetings are held to bring together the various research groups involved in ParGram and ParSem.	CC-BY	no
	eng-partma	ParTMA	lfg	45	189	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	eng-partma-rat	ParTMA	lfg	10	163	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	eng-partma-scorpion	ParTMA	lfg	10	127	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	eng-partma-tempeval3	ParTMA	lfg	273	6 176	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	eng-deepbank	DELPH-IN	hpsg	36 902	699 385	yes	DeepBank is a treebank of English, containing text from the 1989 Wall Street Journal (the same set of sentences annotated in the original Penn Treebank project) annotated with the English Resource Grammar, with rich linguistic annotation on both syntactic and semantic structures, augmented with a robust approximating PCFG for complete coverage. The treebank is searchable via the INESS interface. For downloads and details of the output formats, please see the following MetaShare site: http://metashare.dfki.de/repository/browse/deepbank/d550713c0bd211e38e2e003048d082a41c57b04b11e146f1887ceb7158e2038c/	MSCommons-BY-SA	no
	eng-redwoods	DELPH-IN	hpsg	47 805	535 006	yes	The LinGO Redwoods Treebank is a collection of hand-annotated corpora analysed with the LinGO ERG. For each utterance from a corpus, the treebank records (in principle) all analyses hypothesized by the grammar, together with an annotator decision as to which reading is preferred in context. The key innovative aspect of the Redwoods approach to treebanking is the anchoring of all linguistic data captured in the treebank to the HPSG framework and a generally-available broad-coverage grammar of English, viz. the LinGO English Resource Grammar. Unlike existing treebanks, there is no need to define a (new) form of grammatical representation specific to the treebank (and, consequently, less dissemination effort in establishing this representation). Instead, the treebank records complete syntacto-semantic analyses as defined by the LinGO ERG; tools are provided to extract many different types of linguistic information at varying granularity. Other relevant aspects of the Redwoods Treebank include the integration of alternate, though dispreferred, analyses for each utterance and the dynamic nature of the annotations: as the underlying grammar evolves and improves its analyses, there is a provision for a (nearly) fully automated update of the treebank against a version of the original corpus analysed with the revised grammar. As a methodological result, part of the Redwoods data are now regularly maintained as part of the grammar regression cycle with each new release of the ERG.	GPL	no
	German (deu)			418	4 365
	deu-pargram (aligned)	ParGram	lfg	102	644	yes	The ParGram collection is a collection of parallel treebanks covering a set of chosen syntactic constructions. The ParGram collection is a collaborative effort of the ParGram project, along with the ParSem project, by research groups in industrial and academic institutions around the world. The aim of ParGram is to produce wide-coverage grammars for a variety of languages. These are written collaboratively within the linguistic framework of LFG (Lexical Functional Grammar) and with a commonly-agreed-upon set of grammatical features. The XLE (Xerox Linguistic Environment) is used as a development platform. ParSem develops semantic structures based on the ParGram syntactic structures. Most of the ParSem systems use the XLE’s XFR system. Regular semiannual meetings are held to bring together the various research groups involved in ParGram and ParSem.	CC-BY	no
	deu-partma	ParTMA	lfg	56	262	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	deu-partma-manifesto	ParTMA	lfg	260	3 459	no	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	Hungarian (hun)			94	437
	hun-pargram (aligned)	HunGram, ParGram	lfg	49	281	yes	The ParGram collection is a collection of parallel treebanks covering a set of chosen syntactic constructions. The ParGram collection is a collaborative effort of the ParGram project, along with the ParSem project, by research groups in industrial and academic institutions around the world. The aim of ParGram is to produce wide-coverage grammars for a variety of languages. These are written collaboratively within the linguistic framework of LFG (Lexical Functional Grammar) and with a commonly-agreed-upon set of grammatical features. The XLE (Xerox Linguistic Environment) is used as a development platform. ParSem develops semantic structures based on the ParGram syntactic structures. Most of the ParSem systems use the XLE’s XFR system. Regular semiannual meetings are held to bring together the various research groups involved in ParGram and ParSem.	CC-BY	no
	hun-partma	HunGram, ParTMA	lfg	45	156	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	Italian (ita)			50	228
	ita-partma	ParTMA	lfg	50	228	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	Norwegian (nor)			1 219 910	24 085 506
	nor-stortinget	NorGram, NorGramBank	lfg	227 699	4 477 383	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words in a document) and to load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, please choose the relevant treebank, and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) Language variety: Norwegian bokmål (nob) or Norwegian nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing. There are also some transcriptions of speeches in English and Danish. (2) Speaker's name (3) Date and time (4) Political party to which the speaker belongs (5) Type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search in sentence analyses from the material (for the set of documents that have currently been preprocessed and automatically parsed).	NLOD	no
	nor-stortinget_1	NorGram, NorGramBank	lfg	257 130	5 090 162	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words in a document) and to load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, please choose the relevant treebank, and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) Language variety: Norwegian bokmål (nob) or Norwegian nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing. There are also some transcriptions of speeches in English and Danish. (2) Speaker's name (3) Date and time (4) Political party to which the speaker belongs (5) Type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search in sentence analyses from the material (for the set of documents that have currently been preprocessed and automatically parsed).	NLOD	no
	nor-stortinget_2	NorGram, NorGramBank	lfg	236 006	4 648 667	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words in a document) and to load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, please choose the relevant treebank, and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) Language variety: Norwegian bokmål (nob) or Norwegian nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing. There are also some transcriptions of speeches in English and Danish. (2) Speaker's name (3) Date and time (4) Political party to which the speaker belongs (5) Type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search in sentence analyses from the material (for the set of documents that have currently been preprocessed and automatically parsed).	NLOD	no
	nor-stortinget_3	NorGram, NorGramBank	lfg	252 845	4 924 530	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words in a document) and to load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, please choose the relevant treebank, and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) Language variety: Norwegian bokmål (nob) or Norwegian nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing. There are also some transcriptions of speeches in English and Danish. (2) Speaker's name (3) Date and time (4) Political party to which the speaker belongs (5) Type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search in sentence analyses from the material (for the set of documents that have currently been preprocessed and automatically parsed).	NLOD	no
	nor-stortinget_4	NorGram, NorGramBank	lfg	246 230	4 944 764	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words in a document) and to load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, please choose the relevant treebank, and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) Language variety: Norwegian bokmål (nob) or Norwegian nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing. There are also some transcriptions of speeches in English and Danish. (2) Speaker's name (3) Date and time (4) Political party to which the speaker belongs (5) Type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search in sentence analyses from the material (for the set of documents that have currently been preprocessed and automatically parsed).	NLOD	no
	Norwegian Bokmål (nob)			15 595 312	216 102 917
	nob-avis	NorGram, NorGramBank	lfg	246 397	3 157 550	yes	The "NorGramBank – Newspaper text (years 2012, 2013) in Norwegian Bokmål from the Norwegian Newspaper Corpus" treebank is a syntactically annotated corpus based on data taken from the years 2012 and 2013 from the Norwegian Newspaper Corpus (NCC). This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 246397 sentences, 3157558 words and 1543 documents. Note that the available treebank contains only those newspaper articles from 2012 and 2013 that have been manually preprocessed; see details elsewhere in the metadata.	CC-BY	no
	nob-child	NorGram, NorGramBank	lfg	389 557	4 110 961	yes	The "NorGramBank children’s fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 389564 sentences, 4111213 words and 155 documents. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-fn	NorGram, NorGramBank	lfg	489 341	8 321 494	yes	The "NorGram Non-fiction text in Norwegian Bokmål from Forskning.no" treebank is a syntactically annotated corpus based on data taken from the Norwegian popular science website Forskning.no. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 489341 sentences, 8321480 words and 13243 documents.	CLARIN_RES-DEP	no
	nob-jrc-acquis (aligned)	Acquis, NorGram	lfg	101	1 862	yes	The Norwegian part of the META-NORD Acquis Parallel Treebank.	CC-BY	no
	nob-lbk-av	NorGram, NorGramBank	lfg	1 336	18 971	yes	The "NorGramBank Newspaper text in Norwegian Bokmål from the LBK" treebank is a syntactically annotated corpus based on data taken from the Norwegian reference corpus for Norwegian Bokmål, Leksikografisk Bokmålskorpus (LBK). This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 173914 sentences, 2661597 words and 599 documents.	CLARIN_ACA-NC	no
	nob-lbk-sa	NorGram, NorGramBank	lfg	189 137	2 911 173	yes	The "NorGramBank non-fiction text in Norwegian Bokmål from the LBK" treebank is a syntactically annotated corpus based on data taken from the Norwegian reference corpus for Norwegian Bokmål, Leksikografisk Bokmålskorpus (LBK). This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 173914 sentences, 2661597 words and 599 documents.	CLARIN_ACA-NC	no
	nob-lbk-tv	NorGram, NorGramBank	lfg	18 043	127 844	yes	The "NorGramBank television subtitles in Norwegian Bokmål from LBK" treebank is a syntactically annotated corpus based on data taken from the Norwegian reference corpus for Norwegian Bokmål, Leksikografisk Bokmålskorpus (LBK). This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 18043 sentences, 127844 words and 16 documents.	CLARIN_ACA	no
	nob-naob	NAOB, NorGram, NorGramBank	lfg	678 773	9 885 233	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_1	NAOB, NorGram, NorGramBank	lfg	621 622	9 207 093	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_2	NAOB, NorGram, NorGramBank	lfg	999 909	15 414 738	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_3	NAOB, NorGram, NorGramBank	lfg	1 045 847	16 020 711	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_4	NAOB, NorGram, NorGramBank	lfg	1 077 587	16 643 145	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_5	NAOB, NorGram, NorGramBank	lfg	953 413	14 093 832	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_6	NAOB, NorGram, NorGramBank	lfg	942 721	12 090 392	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_7	NAOB, NorGram, NorGramBank	lfg	1 063 592	14 424 554	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_8	NAOB, NorGram, NorGramBank	lfg	438 743	6 708 000	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_9	NAOB, NorGram, NorGramBank	lfg	73 041	1 434 798	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob-dan	NAOB	lfg	941 702	14 205 518	no	The "NorGramBank fiction in older Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob-dan_1	NAOB	lfg	1 104 420	16 080 811	no	The "NorGramBank fiction in older Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob-dan_2	NAOB	lfg	312 743	4 399 173	no	The "NorGramBank fiction in older Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-ndt-lfg	nob-ndt-lfg	NDT, NorGram, NorGramBank	lfg	20 045	276 943	yes	The treebank "NorGram NDT in LFG in Norwegian Bokmål (derivate from the Norwegian Dependency Treebank)" is based on the text material in the Norwegian Dependency Treebank (NDT), available from Språkbanken at the National Library of Norway. The sentences have been parsed and disambiguated in the Norwegian LFG treebank using the NorGram LFG grammar.	CC-BY	no
	nob-novel	nob-novel	NorGram, NorGramBank	lfg	271 366	3 111 321	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_1	nob-novel_1	NorGram, NorGramBank	lfg	406 280	4 369 998	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_2	nob-novel_2	NorGram, NorGramBank	lfg	498 130	5 529 358	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_3	nob-novel_3	NorGram, NorGramBank	lfg	441 721	4 639 538	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_4	nob-novel_4	NorGram, NorGramBank	lfg	467 551	5 184 128	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_5	nob-novel_5	NorGram, NorGramBank	lfg	443 891	4 817 445	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_6	nob-novel_6	NorGram, NorGramBank	lfg	395 700	5 121 558	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_7	nob-novel_7	NorGram, NorGramBank	lfg	221 444	3 230 446	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_8	nob-novel_8	NAOB, NorGram, NorGramBank	lfg	570 543	7 201 100	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_9	nob-novel_9	NAOB, NorGram, NorGramBank	lfg	265 790	3 298 570	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-nrk	nob-nrk	NorGram, NorGramBank	lfg	3 267	45 428	yes	The «Corona texts from NRK» treebank is a syntactically annotated corpus. It is based on data transcribed from the two newscasts Dagsrevyen and Supernytt produced by the Norwegian Broadcasting Corporation (NRK).	CC-BY	no
	nob-pargram (aligned)	nob-pargram (aligned)	NorGram, ParGram	lfg	112	603	yes	The ParGram collection is a collection of parallel treebanks covering a set of chosen syntactic constructions. The ParGram collection is a collaborative effort of the ParGram project, along with the ParSem project, by research groups in industrial and academic institutions around the world. The aim of ParGram is to produce wide-coverage grammars for a variety of languages. These are written collaboratively within the linguistic framework of LFG (Lexical Functional Grammar) and with a commonly agreed-upon set of grammatical features. The XLE (Xerox Linguistic Environment) is used as a development platform. ParSem develops semantic structures based on the ParGram syntactic structures. Most of the ParSem systems use the XLE’s XFR system. Regular semiannual meetings are held to bring together the various research groups involved in ParGram and ParSem.	CC-BY	no
	nob-partma (aligned)	nob-partma (aligned)	NorGram, ParTMA	lfg	46	285	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	nob-sofie (aligned)	nob-sofie (aligned)	NorGram, NorGramBank	lfg	1 151	15 224	yes	The INESS Sofie Norwegian Treebank. The treebank is a syntactically annotated corpus based on the first chapters of the novel “Sofies verden” by Jostein Gaarder, published by Aschehoug forlag. The sentence analyses were produced by INESS for the META-NORD project, whose goal was to promote the accessibility of existing treebanks for the languages in the project. The corpus is automatically analyzed with the NorGram LFG grammar and all analyses are manually verified.	unspecified	no
	nob-sofie-lfg (aligned)	nob-sofie-lfg (aligned)	NorGram, Sofie	lfg	250	3 119	yes	The Norwegian part of the META-NORD Sofie Parallel Treebank, a syntactically annotated parallel corpus based on the first chapters of the novel “Sofies verden” (Sophie's World) by Jostein Gaarder, published by Aschehoug forlag. The treebank consists of grammatical annotations of extracts from the original and was created by the INESS project for META-NORD. For more information, see the metadata description of the META-NORD Sofie Parallel Treebank.	unspecified	no
	Portuguese (por)			50	340
	por-pargram	por-pargram	ParGram	lfg	50	340	yes	The ParGram collection is a collection of parallel treebanks covering a set of chosen syntactic constructions. The ParGram collection is a collaborative effort of the ParGram project, along with the ParSem project, by research groups in industrial and academic institutions around the world. The aim of ParGram is to produce wide-coverage grammars for a variety of languages. These are written collaboratively within the linguistic framework of LFG (Lexical Functional Grammar) and with a commonly agreed-upon set of grammatical features. The XLE (Xerox Linguistic Environment) is used as a development platform. ParSem develops semantic structures based on the ParGram syntactic structures. Most of the ParSem systems use the XLE’s XFR system. Regular semiannual meetings are held to bring together the various research groups involved in ParGram and ParSem.	CC-BY	no
	Russian (rus)			10	134
	rus-partma-scorpion	rus-partma-scorpion	ParTMA	lfg	10	134	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no
	Urdu (urd)			143	831
	urd-pargram (aligned)	urd-pargram (aligned)	ParGram	lfg	96	600	yes	The ParGram collection is a collection of parallel treebanks covering a set of chosen syntactic constructions. The ParGram collection is a collaborative effort of the ParGram project, along with the ParSem project, by research groups in industrial and academic institutions around the world. The aim of ParGram is to produce wide-coverage grammars for a variety of languages. These are written collaboratively within the linguistic framework of LFG (Lexical Functional Grammar) and with a commonly agreed-upon set of grammatical features. The XLE (Xerox Linguistic Environment) is used as a development platform. ParSem develops semantic structures based on the ParGram syntactic structures. Most of the ParSem systems use the XLE’s XFR system. Regular semiannual meetings are held to bring together the various research groups involved in ParGram and ParSem.	CC-BY	no
	urd-partma (aligned)	urd-partma (aligned)	ParTMA	lfg	47	231	yes	The ParTMA collection is a collaborative effort by research groups in academic institutions around the world. The aim of ParTMA is to produce parallel treebanks that cover constructions relevant for the semantics of Tense, Mode and Aspect. The treebank sentences are analyzed with the grammars in the ParGram project.	CC-BY	no

Design & implementation: Paul Meurer, CLARINO Bergen Centre, 2025