Oskari Heinonen | Researchain

Publication

Featured research published by Oskari Heinonen.

document engineering | 2002

A dynamic user interface for document assembly

Miro Lehtonen; Renaud Petit; Oskari Heinonen; Greger Lindén

Document assembly has turned out to be a convenient approach to corporate publishing and reuse of large collections of documents. Automated assembly of a document reduces the amount of human effort when creating customized documents consisting of document fragments from a collection. However, most methods used require a number of parameters to be defined prior to the assembly process, and providing these parameters in the correct format is seen to be too demanding for an average user. We have designed and implemented a graphical user interface that provides the user with a simple way to specify the parameters of the assembly process. The interface, which is dynamically generated based on a given document configuration, lets the user create and customize documents such as technical manuals. In our example assembly case, the user can select the product, the manual type, the language of the manual as well as the optional components to be included in the manual.

meeting of the association for computational linguistics | 1998

Optimal Multi-Paragraph Text Segmentation by Dynamic Programming

Oskari Heinonen

There exist several methods of calculating a similarity curve, or a sequence of similarity values, representing the lexical cohesion of successive text constituents, e.g., paragraphs. Methods for deciding the locations of fragment boundaries are, however, scarce. We propose a fragmentation method based on dynamic programming. The method is theoretically sound and guaranteed to provide an optimal splitting on the basis of a similarity curve, a preferred fragment length, and a defined cost function. The method is especially useful when control over fragment size is important.
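The idea of choosing boundaries by dynamic programming can be illustrated with a minimal sketch. It is not the paper's formulation: the function name `segment`, the quadratic length penalty, and the choice to charge the similarity value itself for each cut are all assumptions made for illustration. Here `sims[i]` is taken to be the similarity between paragraphs `i` and `i + 1`, so cutting where cohesion is high is expensive.

```python
from typing import List, Sequence


def segment(lengths: Sequence[int], sims: Sequence[float], preferred: int) -> List[int]:
    """Return indices i such that a boundary is placed after paragraph i.

    Assumed cost model (illustrative only):
      - each fragment pays ((size - preferred) / preferred) ** 2
      - each boundary after paragraph i pays sims[i]
    """
    n = len(lengths)
    if n == 0:
        return []
    if len(sims) != n - 1:
        raise ValueError("sims must have one value per adjacent paragraph pair")

    # prefix[k] = total length of the first k paragraphs
    prefix = [0]
    for length in lengths:
        prefix.append(prefix[-1] + length)

    def length_cost(start: int, end: int) -> float:
        # Fragment covers paragraphs start .. end - 1.
        size = prefix[end] - prefix[start]
        return ((size - preferred) / preferred) ** 2

    # best[j] = minimum cost of segmenting the first j paragraphs
    best = [float("inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            cut = sims[i - 1] if i > 0 else 0.0  # no cut before the first paragraph
            cost = best[i] + cut + length_cost(i, j)
            if cost < best[j]:
                best[j] = cost
                back[j] = i

    # Walk the back-pointers to recover the chosen boundaries.
    boundaries = []
    j = n
    while j > 0:
        i = back[j]
        if i > 0:
            boundaries.append(i - 1)
        j = i
    return sorted(boundaries)
```

With six paragraphs of 100 words each, a preferred fragment length of 300, and a dip in the similarity curve between the third and fourth paragraphs, the sketch places its single boundary at that dip. Because every split point is considered for every prefix, the result is optimal under the assumed cost model, at quadratic cost in the number of paragraphs.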

european conference on principles of data mining and knowledge discovery | 1997

Mining in the Phrasal Frontier

Helena Ahonen; Oskari Heinonen; Mika Klemettinen; A. Inkeri Verkamo

Data mining methods have been applied to a wide variety of domains. Surprisingly enough, only a few examples of data mining in text are available. However, considering the amount of existing document collections, text mining would be most useful. Traditionally, texts have been analysed using various information retrieval related methods and natural language processing. In this paper, we present our first experiments in applying general methods of data mining to discovering phrases and co-occurring terms. We also describe the text mining process developed. Our results show that data mining methods — with appropriate preprocessing — can be used in text processing, and that by shifting the focus the process can be used to obtain results for various purposes.
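As a rough illustration of discovering co-occurring terms, the sketch below counts term pairs that appear together in at least `min_support` documents. This is a generic frequent-pair count, not the method used in the paper; the function name `frequent_pairs`, the whitespace tokenization, and the document-level notion of co-occurrence are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def frequent_pairs(docs: Iterable[str], min_support: int) -> Dict[Tuple[str, str], int]:
    """Count unordered term pairs that co-occur in at least min_support documents."""
    counts: Counter = Counter()
    for doc in docs:
        # Naive preprocessing (assumed): lowercase and split on whitespace.
        # Each term counts once per document; sorting fixes the pair order.
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}
```

For example, `frequent_pairs(["data mining text", "text mining methods", "data text"], 2)` keeps only the pairs `("data", "text")` and `("mining", "text")`, each supported by two documents. Real text would need the kind of preprocessing the abstract mentions before counts like these become meaningful.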

international conference on electronic publishing | 1998

Design and Implementation of a Document Assembly Workbench

Helena Ahonen; Barbara Heikkinen; Oskari Heinonen; Jani Jaakkola; Pekka Kilpeläinen; Greger Lindén

Computers support the management of large collections of text documents, but efficient reuse of document collections for producing new documents remains inherently difficult. We describe and discuss the design and implementation of a document assembly system based on a document assembly model, where the user produces new specialized documents by querying and browsing a collection of structured document fragments.

european conference on information retrieval | 2003

Question Answering System for Incomplete and Noisy Data

Lili Aunimo; Oskari Heinonen; Reeta Kuuskoski; Juha Makkonen; Renaud Petit; Otso Virtanen

We present a question answering system that can handle noisy and incomplete natural language data, and methods and measures for the evaluation of question answering systems. Our question answering system is based on the vector space model and linguistic analysis of the natural language data. In the evaluation procedure, we test eight different preprocessing schemes for the data, and come to the conclusion that lemmatization combined with breaking compound words into their constituents gives significantly better results than the baseline. The evaluation process is based on stratified random sampling and bootstrapping. To measure the correctness of an answer, we use partial credits as well as full credits.
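The vector space model mentioned above can be sketched as bag-of-words vectors compared by cosine similarity. This is a minimal illustration, not the system described in the paper: the names `cosine` and `best_answer`, the raw term-frequency weighting, and the whitespace tokenization are assumptions, and no linguistic analysis is performed.

```python
import math
from collections import Counter
from typing import Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_answer(question: str, candidates: Sequence[str]) -> int:
    """Return the index of the candidate most similar to the question.

    Ties are broken in favor of the earliest candidate.
    """
    q = Counter(question.lower().split())
    scored = [
        (cosine(q, Counter(c.lower().split())), -i) for i, c in enumerate(candidates)
    ]
    return -max(scored)[1]
```

Given the question "when does the office open" and the candidates "the office opens at nine" and "parking is free on sundays", the sketch selects the first candidate, but only through the shared terms "the" and "office". The surface forms "open" and "opens" do not match, which hints at why preprocessing such as lemmatization can affect results in a model of this kind.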

database and expert systems applications | 1997

Assembling Documents from Digital Libraries

Helena Ahonen; Barbara Heikkinen; Oskari Heinonen; Pekka Kilpeläinen

We consider assembling documents using, as a source, a digital library containing SGML documents. The assembly process contains two parts: 1) finding interesting fragments, and 2) constructing a coherent document. We present a general document assembly framework. First, we describe a system for tailoring control engineering textbooks. Its assembling facilities are rather restricted but, on the other hand, the quality of documents produced is high. Second, we address the problem of filtering and combining interesting information from a large heterogeneous document collection. The methods presented offer various ways to find the interesting document fragments. Moreover, the elements found in the fragments are mapped to generic elements, like sections, paragraph containers, paragraphs and strings, which have known semantics. Hence, even arbitrary compositions can be formatted and printed.

international conference on electronic publishing | 1998

Analysis of Document Structures for Element Type Classification

Helena Ahonen; Barbara Heikkinen; Oskari Heinonen; Jani Jaakkola; Mika Klemettinen

As more and more digital documents become available for public use from different sources, the needs of users also increase. Seamless integration of heterogeneous collections, e.g., a possibility to query and format documents in a uniform way, is one of these needs. Processing of documents is greatly enhanced if the structure of documents is explicitly represented by some standard (SGML, XML, HTML). Hence, the problem of integrating heterogeneous structures has to be taken into consideration. We address this problem by introducing a classification method that acquires knowledge from document instances and their document type definitions, and uses this knowledge to attach a generic class to each SGML element type. The classification retains the tree hierarchy of elements. Although the structure is simplified, enough distinctions remain to facilitate versatile further processing, e.g., formatting. The class of an element type can be stored in the document type definition and, using the architectural form feature of SGML, the documents can be processed as virtual documents obeying a pre-defined generic DTD. The specific usages of the classification, in addition to formatting and querying, include assembly of new documents from existing document fragments and automatic generation of style sheet templates for original document type definitions. We have implemented the classification method and experimented with several document types.

Archive | 1999