LATE-media
The corpus comprises audio recordings of media broadcasts together with their orthographic transcripts. The transcripts follow the orthography and punctuation conventions of Standard Latvian.
Corpus size | 50 hours (433 000 tokens) |
Data period | 2015–2020 |
Development period | 2021–... |
Developers | Institute of Mathematics and Computer Science UL |
Funding | State Research Programme "Letonika – Fostering a Latvian and European Society" (VPP-LETONIKA-2021/1-0006) |