Publication


Featured research published by Marko Salmenkivi.


European Conference on Information Retrieval | 2004

Simple Semantics in Topic Detection and Tracking

Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi

Topic Detection and Tracking (TDT) is a research initiative that aims at techniques to organize news documents in terms of news events. We propose a method that incorporates simple semantics into TDT by splitting the term space into groups of terms whose meanings are of the same type. Such a group can be associated with an external ontology. This ontology is used to determine the similarity of two terms in the given group. We extract proper names, locations, temporal expressions and normal terms into distinct sub-vectors of the document representation. The similarity of two documents is measured by comparing their corresponding sub-vectors one pair at a time. We use a simple perceptron to optimize the relative emphasis of each semantic class in the tracking and detection decisions. The results suggest that the spatial and the temporal similarity measures need to be improved. In particular, the vagueness of spatial and temporal terms needs to be addressed.
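The class-wise comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain cosine similarity stands in for the ontology-based measures, the class names in CLASSES and the functions class_similarities and weighted_similarity are hypothetical, and the weights are passed in by hand instead of being learned by a perceptron.

```python
import math
from collections import Counter

# Hypothetical labels for the four semantic classes named in the abstract.
CLASSES = ("names", "locations", "temporals", "terms")


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_similarities(doc_a, doc_b):
    """Compare two documents one semantic class (sub-vector) at a time."""
    return [
        cosine(doc_a.get(c, Counter()), doc_b.get(c, Counter()))
        for c in CLASSES
    ]


def weighted_similarity(doc_a, doc_b, weights):
    """Combine the per-class similarities with per-class weights.

    Assumption: the weights are given; the abstract describes learning the
    relative emphasis with a perceptron, which is not shown here.
    """
    sims = class_similarities(doc_a, doc_b)
    return sum(w * s for w, s in zip(weights, sims))


if __name__ == "__main__":
    story_a = {
        "names": Counter({"nokia": 2}),
        "terms": Counter({"phone": 1, "market": 1}),
    }
    story_b = {
        "names": Counter({"nokia": 1}),
        "terms": Counter({"phone": 1}),
    }
    print(class_similarities(story_a, story_b))
    print(weighted_similarity(story_a, story_b, [0.5, 0.0, 0.0, 0.5]))
```

Keeping one sub-vector per class means each class can later be given its own similarity function (for example, an ontology-based one for locations) without changing the combination step.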


European Conference on Information Retrieval | 2003

Topic detection and tracking with spatio-temporal evidence

Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi

Topic Detection and Tracking is an event-based information organization task where online news streams are monitored in order to spot new unreported events and link documents with previously detected events. The detection has proven to perform rather poorly with traditional information retrieval approaches. We present an approach that formalizes temporal expressions and augments spatial terms with ontological information and uses this data in the detection. In addition, instead of using a single term vector as a document representation, we split the terms into four semantic classes and process and weigh the classes separately. The approach is motivated by experiments.


Knowledge Discovery and Data Mining | 2001

Finding simple intensity descriptions from event sequence data

Heikki Mannila; Marko Salmenkivi

Sequences of events are an important type of data arising in various applications, including telecommunications, bio-statistics, web access analysis, etc. A basic approach to modeling such sequences is to find the underlying intensity functions describing the expected number of events per time unit. Typically, the intensity functions are assumed to be piecewise constant. We therefore consider different ways of fitting intensity models to event sequence data. We start by considering a Bayesian approach using Markov chain Monte Carlo (MCMC) methods with a varying number of pieces. These methods can be used to produce posterior distributions on the intensity functions and they can also accommodate covariates. The drawback is that they are computationally intensive and thus are not very suitable for data mining applications in which large numbers of intensity functions have to be estimated. We consider dynamic programming approaches to finding the change points in the intensity functions. These methods can find the maximum likelihood intensity function in O(n^2 k) time for a sequence of n events and k different pieces of intensity. We show that simple heuristics can be used to prune the number of potential change points, yielding speedups of several orders of magnitude. The results of the improved dynamic programming method correspond very closely with the posterior averages produced by the MCMC methods.
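The dynamic programming idea can be sketched as follows. This is a simplified illustration and not the paper's algorithm: it assumes the events have already been binned into equal-width time bins, so n here is the number of bins, the counts are modeled as Poisson, and none of the pruning heuristics are included. The function names segment_loglik and best_segmentation are hypothetical.

```python
import math


def segment_loglik(prefix, i, j):
    """Poisson log-likelihood (up to a constant) of bins i..j-1 at their ML rate."""
    count = prefix[j] - prefix[i]
    length = j - i
    if count == 0:
        return 0.0
    # At the ML rate count/length the log-likelihood is count*log(rate) - count.
    return count * math.log(count / length) - count


def best_segmentation(counts, k):
    """Split binned counts into k constant-rate pieces by dynamic programming.

    Runs in O(n^2 k) time for n bins. Returns (boundaries, rates, loglik),
    where boundaries are bin indices including 0 and n.
    """
    n = len(counts)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of bins")

    # prefix[j] holds the total count in bins 0..j-1.
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    neg_inf = float("-inf")
    # best[p][j]: best log-likelihood for the first j bins using p pieces.
    best = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for pieces in range(1, k + 1):
        for j in range(pieces, n + 1):
            for i in range(pieces - 1, j):
                if best[pieces - 1][i] == neg_inf:
                    continue
                score = best[pieces - 1][i] + segment_loglik(prefix, i, j)
                if score > best[pieces][j]:
                    best[pieces][j] = score
                    back[pieces][j] = i

    # Follow the back-pointers from the last bin to recover the change points.
    boundaries = [n]
    j = n
    for pieces in range(k, 0, -1):
        j = back[pieces][j]
        boundaries.append(j)
    boundaries.reverse()

    rates = [
        (prefix[b] - prefix[a]) / (b - a)
        for a, b in zip(boundaries, boundaries[1:])
    ]
    return boundaries, rates, best[k][n]


if __name__ == "__main__":
    counts = [1, 1, 1, 1, 9, 9, 9, 9]
    print(best_segmentation(counts, 2))
```

The prefix sums make each segment score an O(1) lookup, which is what keeps the triple loop at O(n^2 k) overall.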


Neuroscience Letters | 2011

The preattentive processing of major vs. minor chords in the human brain: An event-related potential study.

Paula Virtala; Venla Berg; Maari Kivioja; Juha Purhonen; Marko Salmenkivi; Petri Paavilainen; Mari Tervaniemi

Western music has two classifications that are highly familiar to all Western listeners: the dichotomy between the major and minor modalities and consonance vs. dissonance. We aimed at determining whether these classifications already take place at the level of the elicitation of the change-related mismatch negativity (MMN) component of the event-related potential (ERP). To this end, we constructed an oddball paradigm with root minor, dissonant and inverted major chords in a context of root major chords. These stimuli were composed so that the standard and deviant chords did not include a physically deviant frequency which could cause the MMN. The standard chords were transposed into 12 different keys (=pitch levels) and delivered to the participants while they were watching a silent movie (ignore condition) or detecting softer target sounds (detection condition). In the ignore condition, the MMN was significant for all but inverted major chords. In the detection condition, the MMN was significant for dissonant chords and soft target chords. Our results indicate that the processes underlying MMN are able to make discriminations which are qualitative by nature. Whether the classifications between major and minor modalities and consonance vs. dissonance are innate or based on implicit learning remains a question for the future.


Literary and Linguistic Computing | 2007

Multivariate Analysis of Finnish Dialect Data—An Overview of Lexical Variation

Saara Hyvönen; Antti Leino; Marko Salmenkivi

During the process of writing a comprehensive dictionary of Finnish dialects, a large set of maps describing the regional distribution of the dialect words has been compiled in electronic form. In this article, we set out to analyse this corpus of data in order to gain new insight into the variation of Finnish dialects. We use a wide range of multivariate data analysis methods, including principal components analysis, independent components analysis, clustering, and multidimensional scaling. We explain how to preprocess the data to overcome the problem of uneven sampling caused by the way the data has been collected. We discuss the results obtained by these methods and compare them to the traditional view of Finnish dialect groups.


Archive | 1996

BASS: Bayesian Analyzer of Event Sequences

Elja Arjas; Heikki Mannila; Marko Salmenkivi; R. Suramo; Hannu Toivonen

We describe the BASS system, a Bayesian analyzer of event sequences. BASS uses Markov chain Monte Carlo methods, especially the Metropolis-Hastings algorithm, for exploring posterior distributions. The system allows the user to specify an intensity model in a high-level definition language, and then runs the Metropolis-Hastings algorithm on it.
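For readers unfamiliar with the Metropolis-Hastings algorithm, the sketch below samples the posterior of a single constant intensity. It does not reproduce BASS or its definition language: the Gamma(a, b) prior, the Gaussian random-walk proposal, the step size, and the function names log_posterior and metropolis_hastings are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(lam, n_events, duration, a=1.0, b=1.0):
    """Unnormalized log posterior of a constant Poisson intensity lam.

    Assumes a Gamma(a, b) prior (shape a, rate b) and n_events observed
    over a time window of the given duration.
    """
    if lam <= 0:
        return float("-inf")
    return (a + n_events - 1) * math.log(lam) - (b + duration) * lam


def metropolis_hastings(n_events, duration, n_samples=20000,
                        burn_in=2000, step=0.7, seed=0):
    """Random-walk Metropolis-Hastings sampler for the intensity."""
    rng = random.Random(seed)
    lam = max(n_events / duration, 1e-3)  # start near the ML estimate
    current = log_posterior(lam, n_events, duration)
    samples = []
    for iteration in range(n_samples + burn_in):
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        proposal = lam + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, n_events, duration)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            lam, current = proposal, candidate
        if iteration >= burn_in:
            samples.append(lam)
    return samples


if __name__ == "__main__":
    draws = metropolis_hastings(n_events=50, duration=10.0)
    print(sum(draws) / len(draws))
```

With this conjugate prior the exact posterior is Gamma(a + n_events, b + duration), so the sample mean can be checked against (a + n_events) / (b + duration); the sampler is useful precisely when no such closed form exists, as with piecewise constant intensities.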


Data Mining in Bioinformatics | 2005

Piecewise Constant Modeling of Sequential Data Using Reversible Jump Markov Chain Monte Carlo

Marko Salmenkivi; Heikki Mannila

We describe the use of reversible jump Markov chain Monte Carlo (RJMCMC) methods for finding piecewise constant descriptions of sequential data. The method provides posterior distributions on the number of segments in the data and thus gives a much broader view on the potential data than do methods (such as dynamic programming) that aim only at finding a single optimal solution. On the other hand, MCMC methods can be more difficult to implement than discrete optimization techniques, and monitoring convergence of the simulations is not trivial. We illustrate the methods by modeling the GC content and distribution of occurrences of ORFs and SNPs along the human genomes. We show how the simple models can be extended by modeling the influence of GC content on the intensity of ORF occurrence.


Archive | 1998

Frailty Factors and Time-dependent Hazards in Modelling Ear Infections in Children Using BASSIST

Mervi Eerola; Heikki Mannila; Marko Salmenkivi

The BASSIST system is a general-purpose tool for MCMC sampling for intensity models. The system allows the user to specify an intensity model in a high-level language. The model is used to generate a simulation program that uses the Metropolis-Hastings algorithm to obtain the desired samples. In contrast to BUGS (Spiegelhalter et al., 1996), BASSIST contains several primitives that are suited for modelling event data, such as piecewise constant functions.


Natural Language Processing | 2002

Applying Semantic Classes in Event Detection and Tracking

Juha Makkonen; Helena Ahonen-Myka; Marko Salmenkivi


European Conference on Computational Biology | 2002

Genome segmentation using piecewise constant intensity models and reversible jump MCMC.

Marko Salmenkivi; Juha Kere; Heikki Mannila

Collaboration


Dive into Marko Salmenkivi's collaborations.

Top Co-Authors

Antti Leino

University of Helsinki

Saara Hyvönen

Helsinki Institute for Information Technology

Elja Arjas

University of Helsinki
