
Latvian Web Corpus 2007

The Latvian Web Corpus 2007 contains 700,000 Latvian webpages published before 2005. The corpus is automatically annotated.

Citation
Publication
J. Džeriņš and K. Džonsons
Harvesting national language text corpora from the Web
2007
Data
J. Džeriņš, K. Džonsons
Latvian Web Corpus 2007 (Tīmeklis2007)
CLARIN-LV digital library, 2007
http://hdl.handle.net/20.500.12574/46
Corpus size: 99M words (123M tokens)
Development period: 2006–2007
Developers: Institute of Mathematics and Computer Science, University of Latvia
Funding: Research and Development of the Semantic Web Technologies for Latvia (SemTi-Kamols)
CLARIN: http://hdl.handle.net/20.500.12574/46