Helena Ahonen-Myka | Researchain

Publication

Featured research published by Helena Ahonen-Myka.

european conference on information retrieval | 2004

Simple Semantics in Topic Detection and Tracking

Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi

Topic Detection and Tracking (TDT) is a research initiative that aims at techniques to organize news documents in terms of news events. We propose a method that incorporates simple semantics into TDT by splitting the term space into groups of terms whose meanings are of the same type. Such a group can be associated with an external ontology. This ontology is used to determine the similarity of two terms in the given group. We extract proper names, locations, temporal expressions and normal terms into distinct sub-vectors of the document representation. The similarity of two documents is measured by comparing one pair of corresponding sub-vectors at a time. We use a simple perceptron to optimize the relative emphasis of each semantic class in the tracking and detection decisions. The results suggest that the spatial and the temporal similarity measures need to be improved. In particular, the vagueness of spatial and temporal terms needs to be addressed.
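The class-wise comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain cosine similarity stands in for the ontology-based term similarity, the class names in `CLASSES` are hypothetical labels, and the equal weights are placeholders for values that the paper learns with a perceptron.

```python
import math
from collections import Counter

# Hypothetical labels for the four semantic classes named in the abstract.
CLASSES = ("names", "locations", "temporals", "terms")

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def class_similarities(doc_a, doc_b):
    """Compare two documents one pair of corresponding sub-vectors at a time."""
    return {c: cosine(doc_a.get(c, Counter()), doc_b.get(c, Counter()))
            for c in CLASSES}

def weighted_score(sims, weights):
    """Combine per-class similarities; the weights here are assumed, not learned."""
    return sum(weights[c] * sims[c] for c in CLASSES)

doc_a = {"names": Counter({"nokia": 2}), "locations": Counter({"helsinki": 1}),
         "temporals": Counter(), "terms": Counter({"merger": 1, "shares": 1})}
doc_b = {"names": Counter({"nokia": 1}), "locations": Counter({"espoo": 1}),
         "temporals": Counter(), "terms": Counter({"merger": 1})}

sims = class_similarities(doc_a, doc_b)
equal_weights = {c: 0.25 for c in CLASSES}  # placeholder weights
score = weighted_score(sims, equal_weights)
```

Keeping one similarity value per class, instead of one value for the whole document, is what lets a later step emphasize or discount a class such as locations independently of the others.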

european conference on information retrieval | 2003

Topic detection and tracking with spatio-temporal evidence

Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi

Topic Detection and Tracking is an event-based information organization task where online news streams are monitored in order to spot new unreported events and link documents with previously detected events. The detection has proven to perform rather poorly with traditional information retrieval approaches. We present an approach that formalizes temporal expressions and augments spatial terms with ontological information and uses this data in the detection. In addition, instead of using a single term vector as a document representation, we split the terms into four semantic classes and process and weigh the classes separately. The approach is motivated by experiments.

Lecture Notes in Computer Science | 2002

Discovery of Frequent Word Sequences in Text

Helena Ahonen-Myka

We have developed a method that extracts all maximal frequent word sequences from the documents of a collection. A sequence is said to be frequent if it appears in more than σ documents, in which σ is the frequency threshold given. Furthermore, a sequence is maximal, if no other frequent sequence exists that contains this sequence. The words of a sequence do not have to appear in text consecutively. In this paper, we describe briefly the method for finding all maximal frequent word sequences in text and then extend the method for extracting generalized sequences from annotated texts, where each word has a set of additional, e.g. morphological, features attached to it. We aim at discovering patterns which preserve as many features as possible such that the frequency of the pattern still exceeds the frequency threshold given.
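The maximality condition in this abstract can be shown with a small sketch. It assumes the frequent sequences have already been found, and it reads "contains" as order-preserving containment with gaps allowed, in line with the remark that words need not be consecutive. The function names and the sample data are hypothetical; this is a brute-force filter, not the paper's method.

```python
def is_subsequence(shorter, longer):
    """True if the words of `shorter` occur in `longer` in order, gaps allowed."""
    it = iter(longer)
    return all(word in it for word in shorter)

def maximal_only(frequent):
    """Keep only sequences not contained in any other frequent sequence."""
    return [s for s in frequent
            if not any(s != t and is_subsequence(s, t) for t in frequent)]

# Illustrative input: sequences assumed to have passed the frequency threshold.
frequent = [
    ("interest",),
    ("interest", "rates"),
    ("bank", "raised"),
    ("bank", "raised", "rates"),
]

maximal = maximal_only(frequent)
```

Here `("interest",)` and `("bank", "raised")` are dropped because each is contained in a longer frequent sequence, while the two longer sequences survive.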

international conference theory and practice digital libraries | 2003

Utilizing Temporal Information in Topic Detection and Tracking

Juha Makkonen; Helena Ahonen-Myka

The harnessing of time-related information from text for the use of information retrieval requires a leap from the surface forms of the expressions to a formalized time-axis. Often the expressions are used to form chronological sequences of events. However, we want to be able to determine the temporal similarity, i.e., the overlap of temporal references of two documents and use this similarity in Topic Detection and Tracking, for example. We present a methodology for extraction of temporal expressions and a scheme of comparing the temporal evidence of the news documents. We also examine the behavior of the temporal expressions and run experiments on an English news corpus.
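One way to picture "overlap of temporal references" is to map each expression to an interval on a day-level time axis and measure how much two intervals share. The sketch below assumes that mapping has already been done and uses a Jaccard-style ratio; the abstract does not specify the actual measure, so both the ratio and the example dates are assumptions for illustration.

```python
from datetime import date

def length_days(interval):
    """Number of days in a closed interval (start, end)."""
    return (interval[1] - interval[0]).days + 1

def overlap_days(a, b):
    """Number of days shared by two closed intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, (end - start).days + 1)

def interval_overlap(a, b):
    """Shared days divided by days covered by either interval (assumed measure)."""
    inter = overlap_days(a, b)
    union = length_days(a) + length_days(b) - inter
    return inter / union

# Hypothetical resolved expressions: a full week and a four-day span.
week = (date(2003, 5, 5), date(2003, 5, 11))
span = (date(2003, 5, 9), date(2003, 5, 12))
later = (date(2003, 6, 1), date(2003, 6, 2))

similarity = interval_overlap(week, span)
```

Two documents whose references fall on disjoint dates score 0 under this measure, and identical intervals score 1.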

conference on information and knowledge management | 2005

Mining all maximal frequent word sequences in a set of sentences

Helena Ahonen-Myka

We present an efficient algorithm for finding all maximal frequent word sequences in a set of sentences. A word sequence s is considered frequent, if all its words occur in at least σ sentences and the words occur in each of these sentences in the same order as in s, given a frequency threshold σ. Hence, the words of a sequence s do not have to occur consecutively in the sentences.
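The frequency definition above can be checked directly with a brute-force sketch: count the sentences in which the words of a sequence appear in the same order, gaps allowed, and compare the count with the threshold. This only illustrates the definition; it is not the efficient algorithm the paper presents, and the function names and sample sentences are hypothetical.

```python
def occurs_in_order(seq, sentence):
    """True if the words of `seq` occur in `sentence` in order, gaps allowed."""
    it = iter(sentence)
    return all(word in it for word in seq)

def support(seq, sentences):
    """Number of sentences that contain `seq` as an ordered subsequence."""
    return sum(occurs_in_order(seq, s) for s in sentences)

def is_frequent(seq, sentences, sigma):
    """A sequence is frequent if it occurs in at least `sigma` sentences."""
    return support(seq, sentences) >= sigma

sentences = [
    "the bank raised interest rates again".split(),
    "the central bank raised its key interest rates".split(),
    "interest in the bank has raised concerns".split(),
]

candidate = ["bank", "raised", "rates"]
count = support(candidate, sentences)
```

The candidate matches the first two sentences even though other words intervene, and it fails on the third, which lacks "rates".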

Natural Language Engineering | 2010

Automatic discovery of word semantic relations using paraphrase alignment and distributional lexical semantics analysis

Gaël Dias; Rumen Moraliyski; João Cordeiro; Antoine Doucet; Helena Ahonen-Myka

Thesauri, which list the most salient semantic relations between words, have mostly been compiled manually. Therefore, the inclusion of an entry depends on the subjective decision of the lexicographer. As a consequence, those resources are usually incomplete. In this paper, we propose an unsupervised methodology to automatically discover pairs of semantically related words by highlighting their local environment and evaluating their semantic similarity in local and global semantic spaces. This proposal differs from all other research presented so far as it tries to take the best of two different methodologies, i.e. semantic space models and information extraction models. In particular, it can be applied to extract close semantic relations, it limits the search space to few, highly probable options and it is unsupervised.

language resources and evaluation | 2010

An efficient any language approach for the integration of phrases in document retrieval

Antoine Doucet; Helena Ahonen-Myka

In this paper, we address the problem of the exploitation of text phrases in a multilingual context. We propose a technique to benefit from multi-word units in ad hoc document retrieval, whatever the language of the document collection. We present principles to optimize the performance improvement obtained through this approach. The work is validated through retrieval experiments conducted on Chinese, Japanese, Korean and English.

hawaii international conference on system sciences | 2007

Ontology-based Knowledge in Interactive Maintenance Guide

Seppo Nyrkkö; Lauri Carlson; Matti Keijola; Helena Ahonen-Myka; Jyrki Niemi; Jussi Piitulainen; Sirke Viitanen; Martti Meri; Lauri Seitsonen; Petri Mannonen; Jani Juvonen

This paper describes 4M, a language technology research project where a dialogue system is applied on a mobile platform in a maintenance job scenario. The human-machine interface uses speech synthesis and recognition, assisted with a hypertext display. We describe a modular agent architecture, composed of independent program components which are implemented by or communicate using ontology programming techniques. Domain content and lingware are developed and shared using standard Web ontology formats and ontology-aware offline tools. A contribution of the project is the attention paid to standardization to help provide the system with new content and to migrate it to new domains, languages and purposes.

INEX Workshop | 2002