INESS :: Treebank Selection

Select a set of treebanks to work with.

Click on a treebank name below to proceed. All selected treebanks will be available for viewing and searching.

Selected	Name	Collection	Type	Sentences	Words	Indexed	Description	License	Downloads
all \| none				14 988 167	211 542 370
	Norwegian (nor)			1 219 910	24 085 506
	nor-stortinget	NorGram, NorGramBank	lfg	227 699	4 477 383	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words) and load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, choose the relevant treebank and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) language variety: Norwegian Bokmål (nob) or Norwegian Nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing (there are also some transcriptions of speeches in English and Danish); (2) speaker's name; (3) date and time; (4) political party to which the speaker belongs; (5) type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search the sentence analyses of the material (for the set of documents that have so far been preprocessed and automatically parsed).	NLOD	no
	nor-stortinget_1	NorGram, NorGramBank	lfg	257 130	5 090 162	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words) and load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, choose the relevant treebank and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) language variety: Norwegian Bokmål (nob) or Norwegian Nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing (there are also some transcriptions of speeches in English and Danish); (2) speaker's name; (3) date and time; (4) political party to which the speaker belongs; (5) type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search the sentence analyses of the material (for the set of documents that have so far been preprocessed and automatically parsed).	NLOD	no
	nor-stortinget_2	NorGram, NorGramBank	lfg	236 006	4 648 667	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words) and load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, choose the relevant treebank and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) language variety: Norwegian Bokmål (nob) or Norwegian Nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing (there are also some transcriptions of speeches in English and Danish); (2) speaker's name; (3) date and time; (4) political party to which the speaker belongs; (5) type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search the sentence analyses of the material (for the set of documents that have so far been preprocessed and automatically parsed).	NLOD	no
	nor-stortinget_3	NorGram, NorGramBank	lfg	252 845	4 924 530	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words) and load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, choose the relevant treebank and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) language variety: Norwegian Bokmål (nob) or Norwegian Nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing (there are also some transcriptions of speeches in English and Danish); (2) speaker's name; (3) date and time; (4) political party to which the speaker belongs; (5) type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search the sentence analyses of the material (for the set of documents that have so far been preprocessed and automatically parsed).	NLOD	no
	nor-stortinget_4	NorGram, NorGramBank	lfg	246 230	4 944 764	yes	The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates between 2008 and 2015, downloaded from https://data.stortinget.no/. Work is ongoing to preprocess the documents (e.g. to register previously unknown words) and load the preprocessed documents into INESS for automatic parsing; hence, as of June 2016, the size of the treebank is still growing. To see updated information on treebank size and which documents are included, choose the relevant treebank and then click "Treebank Details" (in the left-hand menu). Each sentence has the following metadata, which is searchable in the INESS search system: (1) language variety: Norwegian Bokmål (nob) or Norwegian Nynorsk (nno), based on automatic recognition of language variety, implemented by Paul Meurer at Uni Research Computing (there are also some transcriptions of speeches in English and Danish); (2) speaker's name; (3) date and time; (4) political party to which the speaker belongs; (5) type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness) you can search the sentence analyses of the material (for the set of documents that have so far been preprocessed and automatically parsed).	NLOD	no
	Norwegian Bokmål (nob)			13 235 938	181 411 546
	nob-avis	NorGram, NorGramBank	lfg	246 397	3 157 550	yes	The "NorGramBank – Newspaper text (years 2012, 2013) in Norwegian Bokmål from the Norwegian Newspaper Corpus" treebank is a syntactically annotated corpus based on data taken from the years 2012 and 2013 from the Norwegian Newspaper Corpus (NCC). This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 246397 sentences, 3157558 words and 1543 documents. Note that the available treebank contains only those newspaper articles from 2012 and 2013 that have been manually preprocessed; see details elsewhere in the metadata.	CC-BY	no
	nob-child	NorGram, NorGramBank	lfg	389 557	4 110 961	yes	The "NorGramBank children’s fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 389564 sentences, 4111213 words and 155 documents. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-fn	NorGram, NorGramBank	lfg	489 341	8 321 494	yes	The "NorGram Non-fiction text in Norwegian Bokmål from Forskning.no" treebank is a syntactically annotated corpus based on data taken from the Norwegian popular science website Forskning.no. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 489341 sentences, 8321480 words and 13243 documents.	CLARIN_RES-DEP	no
	nob-lbk-av	NorGram, NorGramBank	lfg	1 336	18 971	yes	The "NorGramBank Newspaper text in Norwegian Bokmål from the LBK" treebank is a syntactically annotated corpus based on data taken from the Norwegian reference corpus for Norwegian Bokmål, Leksikografisk Bokmålskorpus (LBK). This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 173914 sentences, 2661597 words and 599 documents.	CLARIN_ACA-NC	no
	nob-lbk-sa	NorGram, NorGramBank	lfg	189 137	2 911 173	yes	The "NorGramBank non-fiction text in Norwegian Bokmål from the LBK" treebank is a syntactically annotated corpus based on data taken from the Norwegian reference corpus for Norwegian Bokmål, Leksikografisk Bokmålskorpus (LBK). This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 173914 sentences, 2661597 words and 599 documents.	CLARIN_ACA-NC	no
	nob-lbk-tv	NorGram, NorGramBank	lfg	18 043	127 844	yes	The "NorGramBank television subtitles in Norwegian Bokmål from LBK" treebank is a syntactically annotated corpus based on data taken from the Norwegian reference corpus for Norwegian Bokmål, Leksikografisk Bokmålskorpus (LBK). This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 18043 sentences, 127844 words and 16 documents.	CLARIN_ACA	no
	nob-naob	NAOB, NorGram, NorGramBank	lfg	678 773	9 885 233	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_1	NAOB, NorGram, NorGramBank	lfg	621 622	9 207 093	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_2	NAOB, NorGram, NorGramBank	lfg	999 909	15 414 738	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_3	NAOB, NorGram, NorGramBank	lfg	1 045 847	16 020 711	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_4	NAOB, NorGram, NorGramBank	lfg	1 077 587	16 643 145	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_5	NAOB, NorGram, NorGramBank	lfg	953 413	14 093 832	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_6	NAOB, NorGram, NorGramBank	lfg	942 721	12 090 392	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_7	NAOB, NorGram, NorGramBank	lfg	1 063 592	14 424 554	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_8	NAOB, NorGram, NorGramBank	lfg	438 743	6 708 000	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-naob_9	NAOB, NorGram, NorGramBank	lfg	73 041	1 434 798	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-ndt-lfg	NDT, NorGram, NorGramBank	lfg	20 045	276 943	yes	The treebank "NorGram NDT in LFG in Norwegian Bokmål (derivate from the Norwegian Dependency Treebank)" is based on the text material in the Norwegian Dependency Treebank (NDT), available from Språkbanken at the National Library of Norway. The sentences have been parsed and disambiguated in the Norwegian LFG treebank using the NorGram LFG grammar.	CC-BY	no
	nob-novel	NorGram, NorGramBank	lfg	271 366	3 111 321	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_1	NorGram, NorGramBank	lfg	406 280	4 369 998	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_2	NorGram, NorGramBank	lfg	498 130	5 529 358	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_3	NorGram, NorGramBank	lfg	441 721	4 639 538	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_4	NorGram, NorGramBank	lfg	467 551	5 184 128	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_5	NorGram, NorGramBank	lfg	443 891	4 817 445	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_6	NorGram, NorGramBank	lfg	395 700	5 121 558	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_7	NorGram, NorGramBank	lfg	221 444	3 230 446	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_8	NAOB, NorGram, NorGramBank	lfg	570 543	7 201 100	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-novel_9	NAOB, NorGram, NorGramBank	lfg	265 790	3 298 570	yes	The "NorGramBank fiction in Norwegian Bokmål" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 2 469 916 sentences and 26 903 637 words. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nob-nrk	NorGram, NorGramBank	lfg	3 267	45 428	yes	The «Corona texts from NRK» treebank is a syntactically annotated corpus. It is based on data transcribed from the two newscasts Dagsrevyen and Supernytt produced by the Norwegian Broadcasting Corporation (NRK).	CC-BY	no
	nob-sofie (aligned)	NorGram, NorGramBank	lfg	1 151	15 224	yes	The INESS Sofie Norwegian Treebank. The treebank is a syntactically annotated corpus based on the first chapters of the novel “Sofies verden” by Jostein Gaarder, published by Aschehoug forlag. The sentence analyses were produced by INESS for the META-NORD project, whose goal was to promote the accessibility of existing treebanks for the languages in the project. The corpus is automatically analyzed with the NorGram LFG grammar and all analyses are manually verified.	unspecified	no
	Norwegian Nynorsk (nno)			532 319	6 045 318
	nno-child	NorGram, NorGramBank	lfg	106 447	1 043 278	yes	The treebank "NorGrambank children's fiction in Norwegian Nynorsk" is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 106434 sentences, 1043260 words and 76 documents. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no
	nno-ndt-lfg	NDT, NorGram, NorGramBank	lfg	17 579	272 023	yes	The treebank "NorGram NDT in LFG in Norwegian Nynorsk (derivate from Norwegian Dependency Treebank)" is based on the text material in the Norwegian Dependency Treebank (NDT), available from Språkbanken at the National Library of Norway. The sentences have been parsed and disambiguated in the Norwegian LFG treebank using the NorGram LFG grammar.	CC-BY	no
	nno-nnk-av	NorGram, NorGramBank	lfg	7 847	123 436	yes	The treebank "NorGramBank annotations of Newspaper text from 'Nynorskkorpuset ved Norsk Ordbok 2014'" is a syntactically annotated corpus which uses text extracts from Nynorskkorpuset ved Norsk Ordbok 2014 (no2014.uio.no). This treebank is part of the INESS NorGramBank collection (see URL in metadata).	CLARIN_ACA-DEP	no
	nno-nnk-sa	NorGram, NorGramBank	lfg	38 332	623 281	yes	The treebank "Annotations of non-fiction text from 'Nynorskkorpuset ved Norsk Ordbok 2014'" is a syntactically annotated corpus which uses text extracts from Nynorskkorpuset ved Norsk Ordbok 2014 (no2014.uio.no). This treebank is part of the INESS NorGramBank collection (see URL in metadata).	CLARIN_ACA-DEP	no
	nno-nnk-sk	NorGram, NorGramBank	lfg	94 409	969 308	yes	The treebank "Annotations of fiction text from 'Nynorskkorpuset ved Norsk Ordbok 2014'" is a syntactically annotated corpus which uses text extracts from Nynorskkorpuset ved Norsk Ordbok 2014 (no2014.uio.no). This treebank is part of the INESS NorGramBank collection (see URL in metadata).	CLARIN_ACA-DEP	no
	nno-novel	NorGram, NorGramBank	lfg	267 705	3 013 992	yes	The "NorGramBank fiction in Norwegian Nynorsk" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 260285 sentences, 2884376 words and 91 documents. The source text was OCR-read by the National Library of Norway; INESS has preprocessed the source text semi-automatically with regard to OCR errors (misinterpreted letters, etc.) before syntactic parsing.	CLARIN_ACA	no

Design & implementation: Paul Meurer, CLARINO Bergen Centre, 2024