
Latvian Treebank

Latvian Treebank (LVTB) is a balanced text corpus with manual syntactic annotation. It is part of the Balanced Corpus of Modern Latvian and uses a hybrid dependency-constituency model.

Publication to be cited:
L. Rituma, B. Saulite, G. Nespore-Berzkalne
Latviešu valodas sintaktiski marķētā korpusa gramatikas modelis (The grammar model of Latvian Treebank)
Language: Meaning and Form, 10, 200-216, 2019
Corpus size: 16 803 sentences (282 167 tokens)
Development period: 2010-2022
Developers: Institute of Mathematics and Computer Science, University of Latvia
Funding: European Regional Development Fund project "Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian" (1.1.1.1/16/A/219); National Research Programme "National identity"; State Research Programme "Digital Resources of the Humanities" (VPP-IZM-DH-2020/1-0001); State Research Programme "Research on Modern Latvian Language and Development of Language Technology" (VPP-LETONIKA-2021/1-0006)
Homepage: http://sintakse.korpuss.lv/
CLARIN: http://hdl.handle.net/20.500.12574/56
Other publications:
L. Pretkalnina, L. Rituma, B. Saulite
Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank
Springer, 2018
N. Gruzitis, L. Pretkalnina, B. Saulite, L. Rituma, G. Nespore-Berzkalne, A. Znotins, P. Paikens
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
2018
L. Pretkalnina, L. Rituma
Constructions in Latvian Treebank: the impact of annotation decisions on the dependency parsing performance
IOS Press, 2014
L. Pretkalnina, L. Rituma
Syntactic issues identified developing the Latvian treebank
IOS Press, 2012