Juha Makkonen
University of Helsinki
Publication
Featured research published by Juha Makkonen.
European Conference on Information Retrieval | 2004
Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi
Topic Detection and Tracking (TDT) is a research initiative that aims at techniques for organizing news documents in terms of news events. We propose a method that incorporates simple semantics into TDT by splitting the term space into groups of terms whose meanings are of the same type. Such a group can be associated with an external ontology, which is used to determine the similarity of two terms in that group. We extract proper names, locations, temporal expressions and normal terms into distinct sub-vectors of the document representation. The similarity of two documents is measured by comparing their corresponding sub-vectors one pair at a time. We use a simple perceptron to optimize the relative emphasis of each semantic class in the tracking and detection decisions. The results suggest that the spatial and temporal similarity measures need to be improved; in particular, the vagueness of spatial and temporal terms needs to be addressed.
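The sub-vector comparison and perceptron weighting described above can be illustrated with a minimal sketch. It assumes sparse term-weight dictionaries, plain cosine similarity within each class (the ontology-based term similarity of the paper is not modeled), and hypothetical class names and helper functions; it is not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical names for the four semantic classes mentioned in the abstract.
CLASSES = ("names", "locations", "temporals", "terms")

SubVector = Dict[str, float]     # term -> weight
Document = Dict[str, SubVector]  # class name -> sub-vector


def cosine(a: SubVector, b: SubVector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def class_similarities(d1: Document, d2: Document) -> List[float]:
    """Compare corresponding sub-vectors one pair at a time."""
    return [cosine(d1.get(c, {}), d2.get(c, {})) for c in CLASSES]


def train_perceptron(examples: List[Tuple[List[float], int]],
                     epochs: int = 50,
                     lr: float = 0.1) -> Tuple[List[float], float]:
    """Learn one weight per semantic class plus a bias.

    Each example is (class-similarity vector, label), where the label is
    +1 for "same event" and -1 for "different events".
    """
    w = [0.0] * len(CLASSES)
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def same_event(d1: Document, d2: Document, w: List[float], b: float) -> bool:
    """Decide whether two documents discuss the same event."""
    x = class_similarities(d1, d2)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0
```

The learned weights play the role of the "relative emphasis" of each class: a class whose similarity rarely separates same-event from different-event pairs ends up with a small weight.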
European Conference on Information Retrieval | 2003
Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi
Topic Detection and Tracking is an event-based information organization task in which online news streams are monitored in order to spot new, unreported events and to link documents with previously detected events. Detection has proven to perform rather poorly with traditional information retrieval approaches. We present an approach that formalizes temporal expressions, augments spatial terms with ontological information, and uses this data in detection. In addition, instead of using a single term vector as the document representation, we split the terms into four semantic classes and process and weight the classes separately. The approach is motivated by experiments.
North American Chapter of the Association for Computational Linguistics | 2003
Juha Makkonen
Topic detection and tracking approaches monitor broadcast news in order to spot new, previously unreported events and to track the development of previously spotted ones. The dynamic nature of the events makes the use of state-of-the-art methods difficult. We present a new topic definition that has the potential to model evolving events. We also discuss incorporating ontologies into the similarity measures of the topics, and illustrate a dynamic hierarchy that reduces the exhaustive computation performed in the TDT process. This is mainly work in progress.
International Conference on Theory and Practice of Digital Libraries | 2003
Juha Makkonen; Helena Ahonen-Myka
Harnessing time-related information from text for information retrieval requires a leap from the surface forms of the expressions to a formalized time axis. Often the expressions are used to form chronological sequences of events. However, we want to determine the temporal similarity of two documents, i.e., the overlap of their temporal references, and use this similarity in, for example, Topic Detection and Tracking. We present a methodology for extracting temporal expressions and a scheme for comparing the temporal evidence of news documents. We also examine the behavior of the temporal expressions and run experiments on an English news corpus.
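The idea of temporal similarity as the overlap of two documents' temporal references can be sketched as follows. This is only an illustration under stated assumptions: each temporal expression is taken to be already resolved to an inclusive day interval, and overlap is scored with a Jaccard ratio over covered days, a measure chosen here for simplicity rather than the scheme used in the paper.

```python
from datetime import date
from typing import Iterable, Set, Tuple

# Assumption: temporal expressions are already normalized to inclusive
# [start, end] day intervals on the calendar axis.
Interval = Tuple[date, date]


def covered_days(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of days (as date ordinals) covered by any interval."""
    days: Set[int] = set()
    for start, end in intervals:
        days.update(range(start.toordinal(), end.toordinal() + 1))
    return days


def temporal_similarity(doc1: Iterable[Interval],
                        doc2: Iterable[Interval]) -> float:
    """Jaccard overlap of the days referenced by two documents.

    Returns 0.0 when either document has no temporal references.
    """
    a, b = covered_days(doc1), covered_days(doc2)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, a document referring to 1 to 10 March 2003 and another referring to 6 to 15 March 2003 share 5 of 15 covered days, giving a similarity of 1/3.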
Cross-Language Evaluation Forum | 2004
Lili Aunimo; Reeta Kuuskoski; Juha Makkonen
This paper presents a bilingual question answering system that has Finnish as its source language and English as its target language. The system was evaluated in the QA@CLEF 2004 evaluation campaign. It is the only officially evaluated QA system that takes Finnish as input. The system is based on question classification and analysis, translation of important query terms, document retrieval, answer pattern instantiation and answer selection. The system achieves an accuracy of 10.88%.
European Conference on Information Retrieval | 2003
Lili Aunimo; Oskari Heinonen; Reeta Kuuskoski; Juha Makkonen; Renaud Petit; Otso Virtanen
We present a question answering system that can handle noisy and incomplete natural language data, and methods and measures for the evaluation of question answering systems. Our question answering system is based on the vector space model and linguistic analysis of the natural language data. In the evaluation procedure, we test eight different preprocessing schemes for the data and conclude that lemmatization combined with breaking compound words into their constituents gives significantly better results than the baseline. The evaluation process is based on stratified random sampling and bootstrapping. To measure the correctness of an answer, we use partial credits as well as full credits.
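Bootstrapping a comparison between a preprocessing scheme and the baseline might look like the sketch below. It assumes per-question scores for both systems on the same questions, hypothetical credit values (1.0 full, 0.5 partial, 0.0 none), and a paired percentile bootstrap; the stratified sampling step of the paper is not modeled, and the function name is illustrative only.

```python
import random
from typing import List, Tuple

# Hypothetical credit values: 1.0 = full credit, 0.5 = partial, 0.0 = none.


def paired_bootstrap_ci(scores_a: List[float],
                        scores_b: List[float],
                        n_resamples: int = 10_000,
                        alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float]:
    """Percentile confidence interval for mean(scores_a) - mean(scores_b).

    Questions are resampled with replacement, keeping each question's
    pair of scores together (a paired bootstrap).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    k = int(round(alpha / 2 * n_resamples))  # samples cut from each tail
    return diffs[k], diffs[n_resamples - 1 - k]
```

If the whole interval lies above zero, the scheme's advantage over the baseline would be judged significant at the chosen level under this particular test.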
Natural Language Processing | 2002
Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi
CLEF (Working Notes) | 2004
Lili Aunimo; Reeta Kuuskoski; Juha Makkonen
WI | 2004
Lili Aunimo; Juha Makkonen; Reeta Kuuskoski
Archive | 2003
Juha Makkonen; Helena Ahonen-Myka