LaVA

Latvian Language Learner Corpus

The corpus includes more than 1000 texts written by foreign learners of Latvian in their first or second semester of study at Latvian higher education institutions. The morphological annotation of the texts has been checked manually, and the learners' errors have been annotated by hand.

Publication to be cited:
R. Dargis, I. Auzina, K. Levane-Petrova, I. Kaija
Quality Focused Approach to a Learner Corpus Development
2020
Corpus size: 192k words (241k tokens)
Development period: 2018–2021
Developers: Institute of Mathematics and Computer Science, University of Latvia
Funding: Latvian Council of Science, project "Development of Learner corpus of Latvian: methods, tools and applications" (lzp-2018/1-0527)
Homepage: http://lava.korpuss.lv/lv/
CLARIN: http://hdl.handle.net/20.500.12574/49
Other publications:
I. Kaija and I. Auzina
Data collection for learner corpus of Latvian: copyright and personal data protection
Selected papers from the CLARIN Annual Conference 2019, 41–47, 2020
I. Auzina, I. Kaija, K. Levane-Petrova
Mērķhipotēžu izvirzīšana latviešu valodas apguvēju korpusā [Formulating target hypotheses in the Latvian language learner corpus]
Valoda: nozīme un forma, 11, 7–26, 2020
K. Levane-Petrova, I. Auzina, K. Pokratniece
Latviešu valodas apguvēju korpusa datu ieguves un apstrādes metodoloģijas izstrāde [Developing a data collection and processing methodology for the Latvian language learner corpus]
LiePA, 2020