UDLV-LVTB

Latvian UD Treebank

The corpus is annotated according to the Universal Dependencies (UD) dependency grammar. The data is converted from the manually annotated Latvian Treebank.

Publication to be cited:
L. Pretkalnina, L. Rituma, B. Saulite
Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank
Springer, 2018
Corpus size: 16,803 sentences (282,671 tokens)
Development period: 2015-2022
Developers: Institute of Mathematics and Computer Science UL
Funding: European Regional Development Fund, "Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian" (; PostDoc grant No.; State Research Programme "Digital Resources of the Humanities" (VPP-IZM-DH-2020/1-0001); State Research Programme "Research on Modern Latvian Language and Development of Language Technology" (VPP-LETONIKA-2021/1-0006)
Homepage: http://sintakse.korpuss.lv/
CLARIN: http://hdl.handle.net/11234/1-4611
Other publications:
N. Gruzitis, L. Pretkalnina, B. Saulite, L. Rituma, G. Nespore-Berzkalne, A. Znotins, P. Paikens
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
L. Pretkalnina, L. Rituma, B. Saulite
Universal Dependency treebank for Latvian: A pilot
IOS Press, 2016