Latvian Treebank

Latvian Treebank is a balanced text corpus with manually created syntactic annotation. It uses a hybrid dependency-constituency annotation model.

Citation
Publication
L. Rituma, B. Saulite, G. Nespore-Berzkalne
Latviešu valodas sintaktiski marķētā korpusa gramatikas modelis (The grammar model of Latvian Treebank)
Language: Meaning and Form, 10, 200-216, 2019
Data
L. Rituma, L. Pretkalniņa, B. Saulīte, G. Nešpore-Bērzkalne, N. Grūzītis
Latvian Treebank (LVTB)
CLARIN-LV digital library, 2023
http://hdl.handle.net/20.500.12574/91
Corpus size: 18295 sentences (300K tokens) (v2.13)
Development period: 2010–2023
Developers: Institute of Mathematics and Computer Science, University of Latvia
Funding: European Regional Development Fund, "Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian" (1.1.1.1/16/A/219); National Research Programme "National identity"; State Research Programme "Digital Resources of the Humanities" (VPP-IZM-DH-2020/1-0001); State Research Programme "Research on Modern Latvian Language and Development of Language Technology" (VPP-LETONIKA-2021/1-0006)
Homepage: http://sintakse.korpuss.lv/
CLARIN: http://hdl.handle.net/20.500.12574/91
Other publications
L. Pretkalnina, L. Rituma, B. Saulite
Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank
Springer, 2018
N. Gruzitis, L. Pretkalnina, B. Saulite, L. Rituma, G. Nespore-Berzkalne, A. Znotins, P. Paikens
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
2018
L. Pretkalnina and L. Rituma
Constructions in Latvian Treebank: the impact of annotation decisions on the dependency parsing performance
IOS Press, 2014
L. Pretkalnina and L. Rituma
Syntactic issues identified developing the Latvian treebank
IOS Press, 2012