Publication


Featured research published by Kimmo Kettunen.


D-Lib Magazine | 2016

Exporting Finnish digitized historical newspaper contents for offline use

Tuula Pääkkönen; Jukka Kervinen; Asko Nivala; Kimmo Kettunen; Eetu Mäkelä

Digital collections of the National Library of Finland (NLF) contain over 10 million pages of historical newspapers, journals and some technical ephemera. The material ranges from the early Finnish newspapers of 1771 to the present day. The material up to 1910 can be viewed in the public web service, whereas anything later is available at the six legal deposit libraries in Finland. A recent user study found that various types of research use are among the key uses of the collection. The National Library of Finland has received several requests to provide the content of the digital collections as one offline bundle that includes all the needed content. For this purpose we introduced a new format, which contains three different information sets: the full metadata of a publication page, the actual page content as ALTO XML, and the raw text content. We consider these formats the most useful to provide as raw data for researchers. In this paper we describe how the export format was created, how other parties have packaged the same data, and what the benefits of the current approach are. We also briefly discuss the word-level quality of the content and show a real research scenario for the data.


Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage | 2017

Names, Right or Wrong: Named Entities in an OCRed Historical Finnish Newspaper Collection

Kimmo Kettunen; Teemu Ruokolainen

Named Entity Recognition (NER), the search, classification and tagging of names and name-like frequent informational elements in texts, has become a standard information extraction procedure for textual data. NER has been applied to many types of texts and different types of entities: newspapers, fiction, historical records, persons, locations, chemical compounds, protein families, animals, etc. In general, a NER system's performance is genre and domain dependent, and the entity categories used also vary [16]. The most general set of named entities is usually some version of a tripartite categorization of locations, persons and organizations. In this paper we report evaluation results of NER with data from Digi, a digitized Finnish historical newspaper collection. The experiments, results and discussion of this research serve the development of the Web collection of historical Finnish newspapers. The Digi collection contains 1,960,921 pages of newspaper material from the years 1771-1910 in both Finnish and Swedish. We use only the Finnish documents in our evaluation. The OCRed newspaper collection has many OCR errors; its estimated word-level correctness is about 70-75% [7]. Our baseline NER tagger is FiNER, a rule-based tagger of Finnish provided by the FIN-CLARIN consortium. Three other available tools are also evaluated: a Finnish Semantic Tagger (FST), Connexor's NER tool and Polyglot's NER.


Language Resources and Evaluation | 2016

Measuring Lexical Quality of a Historical Finnish Newspaper Collection ― Analysis of Garbled OCR Data with Basic Language Technology Tools and Means.

Kimmo Kettunen; Tuula Pääkkönen


Archive | 2014

Analyzing and Improving the Quality of a Historical News Collection using Language Technology and Statistical Machine Learning Methods

Kimmo Kettunen; Timo Honkela; Krister Lindén; Pekka Kauppinen; Tuula Pääkkönen; Jukka Kervinen


Baltic HLT | 2016

Between Diachrony and Synchrony: Evaluation of Lexical Quality of a Digitized Historical Finnish Newspaper and Journal Collection with Morphological Analyzers.

Kimmo Kettunen; Tuula Pääkkönen; Mika Koistinen


CEUR Workshop Proceedings | 2016

Modern Tools for Old Content - in Search of Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910

Kimmo Kettunen; Eetu Mäkelä; Juha Kuokkala; Teemu Ruokolainen; Jyrki Niemi


Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | 2017

Tagging Named Entities in 19th Century and Modern Finnish Newspaper Material with a Finnish Semantic Tagger

Kimmo Kettunen; Laura Löfberg


arXiv: Computation and Language | 2016

How to do lexical quality estimation of a large OCRed historical Finnish newspaper collection with scarce resources.

Kimmo Kettunen; Tuula Pääkkönen


DHN | 2018

Digitisation and Digital Library Presentation System – A Resource-Conscientious Approach

Tuula Pääkkönen; Kimmo Kettunen; Jukka Kervinen


DHN | 2018

Research and Development Efforts on the Digitized Historical Newspaper and Journal Collection of The National Library of Finland.

Kimmo Kettunen; Mika Koistinen; Teemu Ruokolainen

Collaboration


Dive into Kimmo Kettunen's collaborations.

Top Co-Authors

Teemu Ruokolainen

Helsinki Institute for Information Technology

Jyrki Niemi

University of Helsinki
