Publication
Featured research published by Kimmo Kettunen.
D-lib Magazine | 2016
Tuula Pääkkönen; Jukka Kervinen; Asko Nivala; Kimmo Kettunen; Eetu Mäkelä
The digital collections of the National Library of Finland (NLF) contain over 10 million pages of historical newspapers, journals and some technical ephemera. The material ranges from the earliest Finnish newspapers of 1771 to the present day. The material up to 1910 can be viewed in the public web service, whereas anything later is available at the six legal deposit libraries in Finland. A recent user study found that research use of different kinds is one of the key uses of the collection. The National Library of Finland has received several requests to provide the content of the digital collections as one offline bundle in which all the needed content is included. For this purpose we introduced a new format, which contains three different information sets: the full metadata of a publication page, the actual page content as ALTO XML, and the raw text content. We consider these formats the most useful to provide as raw data for researchers. In this paper we describe how the export format was created, how other parties have packaged the same data, and what the benefits of the current approach are. We also briefly discuss the word-level quality of the content and show a real research scenario for the data.
Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage | 2017
Kimmo Kettunen; Teemu Ruokolainen
Named Entity Recognition (NER), the search, classification and tagging of names and name-like frequent informational elements in texts, has become a standard information extraction procedure for textual data. NER has been applied to many types of texts and many types of entities: newspapers, fiction, historical records, persons, locations, chemical compounds, protein families, animals etc. In general, a NER system's performance is genre and domain dependent, and the entity categories used also vary [16]. The most general set of named entities is usually some version of a tripartite categorization of locations, persons and organizations. In this paper we report evaluation results of NER with data from Digi, a digitized Finnish historical newspaper collection. The experiments, results and discussion of this research serve the development of the Web collection of historical Finnish newspapers. The Digi collection contains 1,960,921 pages of newspaper material from the years 1771-1910 in both Finnish and Swedish. We use only the Finnish documents in our evaluation. The OCRed newspaper collection has many OCR errors; its estimated word-level correctness is about 70-75% [7]. Our baseline NER tagger is a rule-based tagger of Finnish, FiNER, provided by the FIN-CLARIN consortium. Three other available tools are also evaluated: a Finnish Semantic Tagger (FST), Connexor's NER tool and Polyglot's NER.
Language Resources and Evaluation | 2016
Kimmo Kettunen; Tuula Pääkkönen
Archive | 2014
Kimmo Kettunen; Timo Honkela; Krister Lindén; Pekka Kauppinen; Tuula Pääkkönen; Jukka Kervinen
Baltic HLT | 2016
Kimmo Kettunen; Tuula Pääkkönen; Mika Koistinen
CEUR Workshop Proceedings | 2016
Kimmo Kettunen; Eetu Mäkelä; Juha Kuokkala; Teemu Ruokolainen; Jyrki Niemi
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | 2017
Kimmo Kettunen; Laura Löfberg
arXiv: Computation and Language | 2016
Kimmo Kettunen; Tuula Pääkkönen
DHN | 2018
Tuula Pääkkönen; Kimmo Kettunen; Jukka Kervinen
DHN | 2018
Kimmo Kettunen; Mika Koistinen; Teemu Ruokolainen