Helena Ahonen-Myka
University of Helsinki
Publications
Featured research published by Helena Ahonen-Myka.
European Conference on Information Retrieval | 2004
Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi
Topic Detection and Tracking (TDT) is a research initiative that aims at techniques to organize news documents in terms of news events. We propose a method that incorporates simple semantics into TDT by splitting the term space into groups of terms whose meanings are of the same type. Such a group can be associated with an external ontology, which is used to determine the similarity of two terms in the group. We extract proper names, locations, temporal expressions and normal terms into distinct sub-vectors of the document representation. The similarity of two documents is measured by comparing their corresponding sub-vectors one pair at a time. We use a simple perceptron to optimize the relative emphasis of each semantic class in the tracking and detection decisions. The results suggest that the spatial and temporal similarity measures need to be improved; in particular, the vagueness of spatial and temporal terms needs to be addressed.
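The per-class comparison and perceptron weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each document is assumed to be a dict of term-frequency `Counter`s keyed by hypothetical class names, and plain cosine similarity stands in for every class, whereas the abstract uses ontology-based similarity for some classes. The perceptron learns one weight per class from labelled same-event/different-event pairs.

```python
import math
from collections import Counter

# The four semantic classes named in the abstract; these dict keys are hypothetical.
CLASSES = ("names", "locations", "temporals", "terms")


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def class_similarities(d1: dict, d2: dict) -> list:
    """Compare two documents one pair of corresponding sub-vectors at a time."""
    return [cosine(d1.get(c, Counter()), d2.get(c, Counter())) for c in CLASSES]


def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn one weight per semantic class plus a bias term.

    examples: (similarity_vector, label) pairs, label 1 = same event, 0 = different.
    """
    w, b = [0.0] * len(CLASSES), 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            if err:  # classic perceptron: update only on mistakes
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


def same_event(w, b, x) -> bool:
    """Weighted combination of the per-class similarities, thresholded at 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0
```

Once trained, `same_event(w, b, class_similarities(d1, d2))` classifies a new document pair, and the relative size of each weight indicates how much emphasis the corresponding semantic class receives.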
European Conference on Information Retrieval | 2003
Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi
Topic Detection and Tracking is an event-based information organization task in which online news streams are monitored in order to spot new, unreported events and to link documents with previously detected events. Detection has proven to perform rather poorly with traditional information retrieval approaches. We present an approach that formalizes temporal expressions, augments spatial terms with ontological information, and uses this data in detection. In addition, instead of using a single term vector as the document representation, we split the terms into four semantic classes and process and weight the classes separately. The approach is motivated by experiments.
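To make the detection task concrete, here is a sketch of the kind of single-pass, single-vector baseline that the abstract describes as performing poorly; it is not the authors' class-based approach. The `SinglePassDetector` name, the centroid representation and the 0.3 threshold are hypothetical choices.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SinglePassDetector:
    """Baseline single-pass new-event detection over a document stream.

    A document whose best similarity to every known event is below `threshold`
    starts a new event; otherwise it is linked to the most similar event.
    """

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold  # hypothetical value; would be tuned on training data
        self.events = []  # one term-frequency centroid per detected event

    def process(self, terms):
        """Return (event_id, is_new) for the next document in the stream."""
        vec = Counter(terms)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.events):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            self.events.append(vec)
            return len(self.events) - 1, True
        self.events[best].update(vec)  # fold the document into the event's centroid
        return best, False
```

Replacing the single `Counter` with one vector per semantic class, as the abstract proposes, would change only how `process` computes similarity.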
Lecture Notes in Computer Science | 2002
Helena Ahonen-Myka
We have developed a method that extracts all maximal frequent word sequences from the documents of a collection. A sequence is said to be frequent if it appears in more than σ documents, where σ is the given frequency threshold. Furthermore, a sequence is maximal if no other frequent sequence exists that contains it. The words of a sequence do not have to appear consecutively in the text. In this paper, we briefly describe the method for finding all maximal frequent word sequences in text and then extend it to extract generalized sequences from annotated texts, in which each word has a set of additional features (e.g. morphological ones) attached to it. We aim at discovering patterns that preserve as many features as possible while the frequency of the pattern still exceeds the given frequency threshold.
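A minimal sketch of how generalized patterns can be matched against annotated text, assuming each token is a set of `key=value` feature strings (a hypothetical encoding) and that a pattern element matches a token when all of its features are present on it. The `most_specific_frequent` helper only chooses among candidates it is given; it does not implement the paper's discovery procedure.

```python
# Each token of an annotated text is a set of features, e.g.
# {"word=cats", "lemma=cat", "pos=N"}. The "key=value" encoding is hypothetical.


def matches(pattern, tokens) -> bool:
    """True if the pattern elements match tokens in order, with gaps allowed.

    An element (a set of features) matches a token if all of its features are
    present on the token. Greedy leftmost matching suffices for this test.
    """
    i = 0
    for tok in tokens:
        if i < len(pattern) and pattern[i] <= tok:
            i += 1
    return i == len(pattern)


def frequency(pattern, documents) -> int:
    """Number of documents in which the pattern occurs."""
    return sum(1 for doc in documents if matches(pattern, doc))


def is_frequent(pattern, documents, sigma) -> bool:
    """Frequent = occurs in more than sigma documents, as in the abstract."""
    return frequency(pattern, documents) > sigma


def most_specific_frequent(candidates, documents, sigma):
    """Among candidate patterns, pick a frequent one that keeps the most features."""
    frequent = [p for p in candidates if is_frequent(p, documents, sigma)]
    return max(frequent, key=lambda p: sum(len(e) for e in p), default=None)
```

For example, a pattern built from surface word forms may occur in only one document, while dropping the word form and keeping lemma and part of speech lets it match more documents; with a threshold it just exceeds, `most_specific_frequent` prefers that variant over a coarser part-of-speech-only pattern.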
International Conference on Theory and Practice of Digital Libraries | 2003
Juha Makkonen; Helena Ahonen-Myka
Harnessing time-related information from text for use in information retrieval requires a leap from the surface forms of temporal expressions to a formalized time axis. Often such expressions are used to form chronological sequences of events. However, we want to be able to determine the temporal similarity of two documents, i.e., the overlap of their temporal references, and to use this similarity in, for example, Topic Detection and Tracking. We present a methodology for extracting temporal expressions and a scheme for comparing the temporal evidence of news documents. We also examine the behavior of temporal expressions and run experiments on an English news corpus.
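Mapping surface expressions onto a time axis and measuring the overlap of two documents' temporal references can be sketched as below. Both pieces are assumptions for illustration: `normalize` handles only a few relative English expressions anchored to the publication date, and overlap is measured as a Jaccard ratio over calendar days rather than with the paper's own comparison scheme.

```python
from datetime import date, timedelta


def normalize(expr: str, published: date):
    """Map a few relative expressions to an inclusive (start, end) day interval.

    Only a hypothetical handful of English expressions is covered, anchored to
    the document's publication date.
    """
    expr = expr.lower().strip()
    if expr == "today":
        return published, published
    if expr == "yesterday":
        day = published - timedelta(days=1)
        return day, day
    if expr == "tomorrow":
        day = published + timedelta(days=1)
        return day, day
    if expr == "last week":  # the Monday-to-Sunday week before the publication week
        monday = published - timedelta(days=published.weekday())
        start = monday - timedelta(days=7)
        return start, start + timedelta(days=6)
    raise ValueError(f"unsupported expression: {expr!r}")


def covered_days(intervals):
    """Expand inclusive (start, end) intervals into the set of calendar days they cover."""
    days = set()
    for start, end in intervals:
        day = start
        while day <= end:
            days.add(day)
            day += timedelta(days=1)
    return days


def temporal_similarity(a, b) -> float:
    """Overlap of two documents' temporal references as a Jaccard ratio of days."""
    da, db = covered_days(a), covered_days(b)
    if not da or not db:
        return 0.0
    return len(da & db) / len(da | db)
```

Because expressions are resolved against each document's own publication date, "yesterday" in one article and "today" in an article published a day earlier refer to the same day and yield a similarity of 1.0.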
Conference on Information and Knowledge Management | 2005
Helena Ahonen-Myka
We present an efficient algorithm for finding all maximal frequent word sequences in a set of sentences. Given a frequency threshold σ, a word sequence s is considered frequent if all its words occur in at least σ sentences and, in each of these sentences, the words occur in the same order as in s. Hence, the words of a sequence s do not have to occur consecutively in the sentences.
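The frequency and maximality definitions can be made executable with a brute-force sketch. It is deliberately naive (a level-wise search that is exponential in the worst case) and is not the efficient algorithm the paper presents; the function names are hypothetical.

```python
def is_subsequence(seq, sentence) -> bool:
    """True if the words of seq occur in sentence in the same order, gaps allowed."""
    it = iter(sentence)
    return all(word in it for word in seq)  # `in` consumes the iterator up to the match


def support(seq, sentences) -> int:
    """Number of sentences containing seq as an ordered, possibly gapped, subsequence."""
    return sum(1 for s in sentences if is_subsequence(seq, s))


def maximal_frequent_sequences(sentences, sigma):
    """Naive level-wise search; exponential in the worst case, for illustration only."""
    if sigma < 1:
        raise ValueError("sigma must be at least 1")
    vocab = sorted({w for s in sentences for w in s})
    frequent = []
    level = [(w,) for w in vocab if support((w,), sentences) >= sigma]
    while level:
        frequent.extend(level)
        # Every subsequence of a frequent sequence is frequent, so growing only
        # frequent sequences by one word at a time finds every frequent sequence.
        level = [seq + (w,) for seq in level for w in vocab
                 if support(seq + (w,), sentences) >= sigma]
    # A frequent sequence is maximal if no longer frequent sequence contains it.
    return [s for s in frequent
            if not any(len(t) > len(s) and is_subsequence(s, t) for t in frequent)]
```

For the sentences "the cat sat on the mat", "the cat lay on a mat" and "a dog sat on the mat" with σ = 2, the maximal frequent sequences are ("the", "cat", "on", "mat"), ("sat", "on", "the", "mat") and ("a", "mat"); the first skips "sat" and "lay", which is exactly the non-consecutive case the definition allows.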
Natural Language Engineering | 2010
Gaël Dias; Rumen Moraliyski; João Cordeiro; Antoine Doucet; Helena Ahonen-Myka
Thesauri, which list the most salient semantic relations between words, have mostly been compiled manually. Therefore, the inclusion of an entry depends on the subjective decision of the lexicographer, and as a consequence, such resources are usually incomplete. In this paper, we propose an unsupervised methodology to automatically discover pairs of semantically related words by highlighting their local environment and evaluating their semantic similarity in local and global semantic spaces. This proposal differs from previous research in that it tries to combine the strengths of two different methodologies, i.e. semantic space models and information extraction models. In particular, it can be applied to extract close semantic relations, it limits the search space to a few highly probable options, and it is unsupervised.
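As a rough illustration of measuring word similarity in a global semantic space, the sketch below builds window-based co-occurrence vectors and keeps candidate word pairs whose cosine similarity passes a threshold. The window size, the threshold and the use of a given candidate list are assumptions; the paper's combination of local and global spaces and its information extraction component are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words within `window` positions of it."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[word][sent[j]] += 1
    return vecs


def cosine(a, b) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def related_pairs(vecs, candidates, threshold=0.5):
    """Keep the candidate pairs whose context vectors are similar enough."""
    result = []
    for a, b in candidates:
        sim = cosine(vecs.get(a, Counter()), vecs.get(b, Counter()))
        if sim >= threshold:
            result.append((a, b, sim))
    return result
```

Restricting the comparison to a short candidate list mirrors, in a very simple way, the abstract's point about limiting the search space to a few highly probable options.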
Language Resources and Evaluation | 2010
Antoine Doucet; Helena Ahonen-Myka
In this paper, we address the problem of exploiting text phrases in a multilingual context. We propose a technique to benefit from multi-word units in ad hoc document retrieval, whatever the language of the document collection. We present principles to optimize the performance improvement obtained through this approach. The work is validated through retrieval experiments conducted on Chinese, Japanese, Korean and English.
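One simple way to let multi-word units contribute to ad hoc retrieval is to add a bonus whenever a query phrase occurs in a document. The scoring function below is a hypothetical sketch, not the paper's weighting scheme, and `phrase_weight` and `max_gap` are invented parameters. Because it operates on abstract units, the same code can treat words in English text or, as an assumption, single characters in Chinese or Japanese text, which lacks word delimiters.

```python
def phrase_occurs(phrase, doc, max_gap=1) -> bool:
    """True if the phrase's units occur in doc in order, with at most `max_gap`
    other units between consecutive ones."""
    # ends: positions in doc where a match of the phrase prefix seen so far can end
    ends = [i for i, tok in enumerate(doc) if tok == phrase[0]]
    for unit in phrase[1:]:
        ends = [j for j, tok in enumerate(doc)
                if tok == unit and any(0 < j - i <= max_gap + 1 for i in ends)]
        if not ends:
            return False
    return bool(ends)


def score(query_units, query_phrases, doc, phrase_weight=2.0, max_gap=1) -> float:
    """Single-unit matches plus a bonus for each query phrase found in the document."""
    present = set(doc)
    s = float(sum(1 for u in query_units if u in present))
    s += phrase_weight * sum(1 for p in query_phrases if phrase_occurs(p, doc, max_gap))
    return s
```

Tracking every position where a phrase prefix can end, instead of matching greedily, avoids missing occurrences such as "a b c" in "a b b x c" when only one intervening unit is allowed.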
Hawaii International Conference on System Sciences | 2007
Seppo Nyrkkö; Lauri Carlson; Matti Keijola; Helena Ahonen-Myka; Jyrki Niemi; Jussi Piitulainen; Sirke Viitanen; Martti Meri; Lauri Seitsonen; Petri Mannonen; Jani Juvonen
This paper describes 4M, a language technology research project in which a dialogue system is applied on a mobile platform in a maintenance job scenario. The human-machine interface uses speech synthesis and recognition, assisted by a hypertext display. We describe a modular agent architecture composed of independent program components that are implemented by or communicate using ontology programming techniques. Domain content and lingware are developed and shared using standard Web ontology formats and ontology-aware offline tools. A contribution of the project is the attention paid to standardization, which helps provide the system with new content and migrate it to new domains, languages and purposes.
INEX Workshop | 2002
Antoine Doucet; Helena Ahonen-Myka
Natural Language Processing | 2002
Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi