text (30)
speech (9)
general (11)
specialised (28)
morphology (33)
syntax (3)
semantics (1)
error annotation (2)
manually annotated (6)
diachronic (6)
web (2)
learner (2)
literary (4)
parallel (1)
parliamentary (1)
historical (2)
newspapers (5)
representative (9)
latgalian (3)
blog (2)
Tīmeklis2020
CommonCrawl of Latvian 2020
2013–2022, 403.6M words (492.6M tokens)
Developers: IMCS UL
BalsuTalka
Balsutalka.lv Speech Corpus (Common Voice 17.0)
2023–2024, 277 hours (1.3M tokens)
Developers: IMCS UL, ILFA UL, LATA
LATE-sarunas
LATE-conversational // LATE-conversations
2012–2024, 35 hours (347,000 text units)
Developers: IMCS UL, ILFA UL
Pārspriedumi
Corpus of Students' Essays
2018, 185k words (226k tokens)
Developers: IMCS UL, LiepU, RAT
BolsuTolka
Bolsutolka.lv Speech Corpus (Common Voice 17.0)
2023–2024, 24 hours (130k tokens)
Developers: RATA, IMCS UL, ILFA UL, LATA
FullStack-LV
Full Stack of Latvian Language Resources
1991–2018, 13691 sentences
Developers: IMCS UL
B. Saulīte, R. Darģis, N. Grūzītis, I. Auziņa, K. Levāne-Petrova, L. Pretkalniņa, L. Rituma, P. Paikens, A. Znotiņš, L. Strankale, K. Pokratniece, I. Poikāns, G. Bārzdiņš, I. Skadiņa, A. Baklāne, V. Saulespurēns, J. Ziediņš.
Latvian National Corpora Collection – Korpuss.lv
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022, pp. 5123–5129