The Lassy Small Corpus 1.1 is a 1-million-word corpus with manually verified syntactic annotations. The lemma and part-of-speech tag annotations were assigned automatically using Tadpole, and the syntactic dependency annotations using the Alpino parser. All automatically assigned lemmas, part-of-speech tags and syntactic dependency annotations were then checked and corrected. Organisations involved in the building of the Lassy Large Corpus: Alfa-informatica, University of Groningen; CCL, K.U. Leuven. ACCESS: The INESS copy can be used by all employees and students of the Department of Linguistic, Literary and Aesthetic Studies, University of Bergen. Others must first apply to the rights holders of the original corpus. The Lassy version at INESS may be used for academic purposes under the following conditions: attribution required, no derivatives, no redistribution, non-commercial use only.
The ParGram collection is a collection of parallel treebanks covering a chosen set of syntactic constructions. It is a collaborative effort of the ParGram project, together with the ParSem project, by research groups in industrial and academic institutions around the world. The aim of ParGram is to produce wide-coverage grammars for a variety of languages. These grammars are written collaboratively within the linguistic framework of LFG (Lexical Functional Grammar), using a commonly agreed-upon set of grammatical features, with the XLE (Xerox Linguistic Environment) as the development platform. ParSem develops semantic structures based on the ParGram syntactic structures; most ParSem systems use the XFR system of the XLE. Regular semiannual meetings bring together the research groups involved in ParGram and ParSem.
This is the Georgian part of the META-NORD Sofie Parallel Treebank, a syntactically annotated parallel corpus based on the first chapters of the novel “Sofies verden” (Sophie's World) by Jostein Gaarder, published by Aschehoug forlag. The treebank consists of grammatical annotations of extracts from the Georgian translation of the novel, which is published by Bakur Sulakauri Publishing.
The treebank "Proceedings of Norwegian parliamentary debates (2008-2015)" is a collection of transcriptions of Norwegian parliamentary debates held between 2008 and 2015, downloaded from https://data.stortinget.no/. Preprocessing of the documents (e.g. registering previously unknown words) and loading of the preprocessed documents into INESS for automatic parsing is ongoing; hence, as of June 2016, the treebank is still growing. For up-to-date information on the treebank size and the documents included, choose the relevant treebank and click "Treebank Details" in the left-hand menu. Each sentence has the following metadata, which is searchable in the INESS search system: (1) language variety, Norwegian bokmål (nob) or Norwegian nynorsk (nno), assigned by automatic language-variety recognition implemented by Paul Meurer at Uni Research Computing (there are also some transcriptions of speeches in English and Danish); (2) speaker's name; (3) date and time; (4) the political party to which the speaker belongs; (5) type of contribution (e.g. 'hovedinnlegg' [main contribution] or 'replikk' [reply]). AVAILABILITY: The material from 2008-2015 is searchable via the corpus tool Corpuscle. Via the treebank portal INESS (clarino.uib.no/iness), you can search the sentence analyses of the material, for those documents that have so far been preprocessed and automatically parsed.
The treebank "NorGrambank children's fiction in Norwegian Nynorsk" is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 106,434 sentences, 1,043,260 words and 76 documents. The source text was digitized with OCR by the National Library of Norway; before syntactic parsing, INESS preprocessed it semi-automatically to correct OCR errors (misinterpreted letters, etc.).
The "NorGram Non-fiction text in Norwegian Nynorsk from Forskning.no" treebank is a syntactically annotated corpus based on data taken from the Norwegian popular science website Forskning.no. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 21,723 sentences, 371,744 words and 582 documents.
The treebank "NorGram NDT in LFG in Norwegian Nynorsk (derivate from Norwegian Dependency Treebank)" is based on the text material in the Norwegian Dependency Treebank (NDT), available from Språkbanken at the National Library of Norway. The sentences have been parsed with the NorGram LFG grammar and disambiguated in the Norwegian LFG treebank.
The treebank "NorGramBank annotations of Newspaper text from 'Nynorskkorpuset ved Norsk Ordbok 2014'" is a syntactically annotated corpus that uses text extracts from Nynorskkorpuset ved Norsk Ordbok 2014 (no2014.uio.no). This treebank is part of the INESS NorGramBank collection (see URL in metadata).
The treebank "Annotations of non-fiction text from 'Nynorskkorpuset ved Norsk Ordbok 2014'" is a syntactically annotated corpus that uses text extracts from Nynorskkorpuset ved Norsk Ordbok 2014 (no2014.uio.no). This treebank is part of the INESS NorGramBank collection (see URL in metadata).
The treebank "Annotations of fiction text from 'Nynorskkorpuset ved Norsk Ordbok 2014'" is a syntactically annotated corpus that uses text extracts from Nynorskkorpuset ved Norsk Ordbok 2014 (no2014.uio.no). This treebank is part of the INESS NorGramBank collection (see URL in metadata).
The "NorGramBank fiction in Norwegian Nynorsk" treebank is a syntactically annotated corpus based on data taken from bokhylla.no at the National Library of Norway. This treebank is part of the INESS NorGramBank collection (see URL in metadata). As of October 2015, the treebank comprises 260,285 sentences, 2,884,376 words and 91 documents. The source text was digitized with OCR by the National Library of Norway; before syntactic parsing, INESS preprocessed it semi-automatically to correct OCR errors (misinterpreted letters, etc.).