LVTB Search
Latvian Treebank
Latvian Treebank is a balanced text corpus with manual syntactic annotation. It is part of the Balanced Corpus of Modern Latvian and employs a hybrid dependency-constituency model.
Publication to be cited:
L. Rituma,
B. Saulite,
G. Nespore-Berzkalne
Latviešu valodas sintaktiski marķētā korpusa gramatikas modelis (The grammar model of Latvian Treebank)
Language: Meaning and Form, 10, 200-216, 2019
Corpus size | 16 803 sentences (282 167 tokens) |
Development period | 2010-2022 |
Developers | Institute of Mathematics and Computer Science, University of Latvia |
Funding | European Regional Development Fund, "Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian" (1.1.1.1/16/A/219); National Research Programme "National identity"; State Research Programme "Digital Resources of the Humanities" (VPP-IZM-DH-2020/1-0001); State Research Programme "Research on Modern Latvian Language and Development of Language Technology" (VPP-LETONIKA-2021/1-0006) |
Homepage | http://sintakse.korpuss.lv/ |
CLARIN | http://hdl.handle.net/20.500.12574/56 |
Other publications |
L. Pretkalnina,
L. Rituma,
B. Saulite
Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank
Springer, 2018
N. Gruzitis,
L. Pretkalnina,
B. Saulite,
L. Rituma,
G. Nespore-Berzkalne,
A. Znotins,
P. Paikens
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
2018 |