ქართული ენის ეროვნული კორპუსი
The Georgian National Corpus

The Georgian National Corpus is a comprehensive corpus of the Georgian language covering all stages of its historical development.

The corpus, which is still under development, contains subcorpora of Old, Middle and Modern Georgian (GNC Old Georgian, GNC Middle Georgian, GNC Modern Georgian), plus two subcorpora of transcribed recordings of spoken language (the Georgian Dialect Corpus, GDC, and the corpus of the project on the Sociolinguistic Situation of Present-Day Georgia, SSGG). Corpora of Mingrelian and Svan texts are under construction as well.

Also included is a large Georgian Reference Corpus (GRC), which contains less thoroughly processed texts from various fictional and non-fictional domains.

The Georgian texts (within GNC and GRC) are fully grammatically annotated (lemma forms and morphosyntactic features), and all texts in the GNC subcorpora have comprehensive metadata.

Getting started

You can find a gentle introduction to working with the GNC on the Using the corpus page. More detailed information is available on the Documentation page.


Design & implementation: Paul Meurer, Uni Research Computing, 2017 | Copyright (C) GNC Project 2011 – 2017