Taneli Mielikäinen
Nokia
Publications
Featured research published by Taneli Mielikäinen.
IEEE Transactions on Knowledge and Data Engineering | 2008
Pauli Miettinen; Taneli Mielikäinen; Aristides Gionis; Gautam Das; Heikki Mannila
Matrix decomposition methods represent a data matrix as a product of two factor matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the observed data can be expressed as combinations of the basis vectors. Decomposition methods have been studied extensively, but many methods return real-valued matrices. Interpreting real-valued factor matrices is hard if the original data is Boolean. In this paper, we describe a matrix decomposition formulation for Boolean data, the Discrete Basis Problem. The problem seeks a Boolean decomposition of a binary matrix, thus allowing the user to easily interpret the basis vectors. We also describe a variation of the problem, the Discrete Basis Partitioning Problem. We show that both problems are NP-hard. For the Discrete Basis Problem we give a simple greedy algorithm; for the Discrete Basis Partitioning Problem we show how it can be solved using existing methods. We present experimental results for the greedy algorithm and compare it against other well-known methods. Our algorithm gives intuitive basis vectors, but its reconstruction error is usually larger than with the real-valued methods. We discuss the reasons for this behavior.
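As a rough illustration of the greedy flavor of such algorithms, the sketch below factorizes a binary matrix by repeatedly choosing a column of the data as a basis vector and greedily deciding in which columns to use it. This is a minimal cover-style sketch, not the authors' exact algorithm; all names are illustrative.

```python
import numpy as np

def greedy_boolean_factorization(D, k):
    """Cover-style greedy sketch for Boolean matrix factorization:
    D ~ B o X under the Boolean product, D[i,j] = OR_l (B[i,l] AND X[l,j])."""
    D = D.astype(bool)
    n, m = D.shape
    B = np.zeros((n, k), dtype=bool)
    X = np.zeros((k, m), dtype=bool)
    covered = np.zeros((n, m), dtype=bool)     # cells already set to 1 by B, X
    for l in range(k):
        best_gain, best_b, best_x = 0, None, None
        for c in range(m):                     # candidate basis vector: column c of D
            b = D[:, c]
            new = b[:, None] & ~covered        # cells this candidate would turn on
            gain_j = (new & D).sum(0) - (new & ~D).sum(0)
            x = gain_j > 0                     # use b only in columns where it helps
            gain = int(gain_j[x].sum())
            if gain > best_gain:
                best_gain, best_b, best_x = gain, b, x
        if best_b is None:                     # no candidate reduces the error: stop
            break
        B[:, l], X[l] = best_b, best_x
        covered |= best_b[:, None] & best_x[None, :]
    return B, X

# Boolean reconstruction: R = (B.astype(int) @ X.astype(int)) > 0
```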
ACM Transactions on Knowledge Discovery From Data | 2007
Aristides Gionis; Heikki Mannila; Taneli Mielikäinen; Panayiotis Tsaparas
The problem of assessing the significance of data mining results on high-dimensional 0--1 datasets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done by standard statistical tests such as the chi-square test. However, the results of such tests depend only on the specific attributes and not on the dataset as a whole. Moreover, the tests are difficult to apply to sets of patterns or other complex results of data mining algorithms. In this article, we consider a simple randomization technique that deals with this shortcoming. The approach consists of producing random datasets that have the same row and column margins as the given dataset, computing the results of interest on the randomized instances, and comparing them to the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, and spectral analysis. To generate random datasets with given margins, we use variations of a Markov chain approach based on a simple swap operation. We give theoretical results on the efficiency of different randomization methods, and apply the swap randomization method to several well-known datasets. Our results indicate that for some datasets the structure discovered by the data mining algorithms is expected, given the row and column margins of the datasets, while for other datasets the discovered structure conveys information that is not captured by the margin counts.
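A minimal sketch of the swap operation at the heart of this randomization (illustrative code, not the authors' implementation; the number of attempted swaps is a tuning choice):

```python
import numpy as np

def swap_randomize(D, attempts, rng=None):
    """Swap chain sketch: repeatedly pick two rows and two columns and, when
    they form a checkerboard (1 0 / 0 1), flip it to (0 1 / 1 0).
    Each swap preserves all row and column sums of the 0-1 matrix."""
    rng = rng or np.random.default_rng()
    M = D.copy()
    n, m = M.shape
    for _ in range(attempts):
        r1, r2 = rng.integers(n, size=2)
        c1, c2 = rng.integers(m, size=2)
        if M[r1, c1] == 1 and M[r2, c2] == 1 and M[r1, c2] == 0 and M[r2, c1] == 0:
            M[r1, c1] = M[r2, c2] = 0
            M[r1, c2] = M[r2, c1] = 1
    return M

# Empirical significance sketch: compare a statistic on D to its distribution
# over randomized copies (statistic() is a placeholder for, e.g., the number
# of frequent sets found by a mining algorithm):
# samples = [statistic(swap_randomize(D, 10 * int(D.sum()))) for _ in range(100)]
```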
Pervasive Technologies Related to Assistive Environments | 2010
Adam Ji Dou; Vana Kalogeraki; Dimitrios Gunopulos; Taneli Mielikäinen; Ville H. Tuulos
The proliferation of increasingly powerful, ubiquitous mobile devices has created a new and powerful sensing and computational environment. Software development and application deployment in such distributed mobile settings are especially challenging due to device failures, concurrency, and the lack of easy programming models. We present a framework that provides a powerful software abstraction hiding many of these complexities from the application developer. We design and implement a mobile MapReduce framework targeted at any device that supports Python and network connectivity. We have implemented our system on a testbed of Nokia N95 8GB smartphones and demonstrated the feasibility and performance of our approach.
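For intuition, the single-node sketch below shows the MapReduce flow that such a framework distributes across phones; function names are illustrative and all distribution and failure handling is omitted.

```python
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    """Minimal single-node sketch of the MapReduce flow; the real framework
    spreads map and reduce tasks over devices and recovers from failures."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):   # map phase: emit (key, value) pairs
            groups[key].append(value)     # shuffle: group values by key
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Example: word counting over text snippets collected on each device.
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    return sum(counts)

print(map_reduce(["a rose is a rose"], mapper, reducer))
# {'a': 2, 'rose': 2, 'is': 1}
```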
Rapid Communications in Mass Spectrometry | 2008
Markus Heinonen; Ari Rantanen; Taneli Mielikäinen; Juha Kokkonen; Jari Kiuru; Raimo A. Ketola; Juho Rousu
We present FiD (Fragment iDentificator), a software tool for the structural identification of product ions produced in tandem mass spectrometric measurements of low-molecular-weight organic compounds. Tandem mass spectrometry (MS/MS) has proven to be an indispensable tool in modern, cell-wide metabolomics and fluxomics studies. In such studies, the structural information of the MS(n) product ions is usually needed in the downstream analysis of the measurement data. The manual identification of the structures of MS(n) product ions is, however, a nontrivial task requiring expertise, and calls for computer assistance. Commercial software tools, such as Mass Frontier and ACD/MS Fragmenter, rely on fragmentation rule databases for the identification of MS(n) product ions. FiD, on the other hand, conducts a combinatorial search over all possible fragmentation paths and outputs a ranked list of alternative structures. This gives the user an advantage in situations where MS/MS data of compounds with less well-known fragmentation mechanisms are processed. FiD implements two fragmentation models: a single-step model that ignores intermediate fragmentation states, and a multi-step model that allows for complex fragmentation pathways. The software works for MS/MS data produced in both positive- and negative-ion modes. It has an easy-to-use graphical interface with built-in visualization capabilities for product-ion structures and fragmentation pathways. In our experiments involving amino acids and sugar-phosphates, often found, e.g., in the central carbon metabolism of yeasts, FiD correctly predicted the structures of product ions in 85% of the cases on average. FiD is free for academic use and is available for download from www.cs.helsinki.fi/group/sysfys/software/fragid.
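The sketch below conveys the single-step idea in miniature: represent a molecule as atoms and bonds, enumerate fragments obtained by breaking a few bonds of the intact precursor at once, and rank them by mass error against an observed product ion. It is a toy (atom masses only; no hydrogen rearrangements, charges, or real fragmentation chemistry), not FiD's actual model.

```python
from itertools import combinations

def components(atoms, bonds):
    """Connected components of an atom graph: atoms is {atom_id: mass},
    bonds is a list of (atom_id, atom_id) pairs."""
    adj = {a: set() for a in atoms}
    for u, v in bonds:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for a in atoms:
        if a in seen:
            continue
        stack, comp = [a], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def single_step_fragments(atoms, bonds, max_broken=2):
    """Single-step model sketch: break up to max_broken bonds of the intact
    precursor at once and collect the masses of the resulting fragments."""
    fragments = {}
    for k in range(1, max_broken + 1):
        for broken in combinations(bonds, k):
            remaining = [b for b in bonds if b not in broken]
            for comp in components(atoms, remaining):
                mass = sum(atoms[a] for a in comp)
                fragments.setdefault(round(mass, 4), frozenset(comp))
    return fragments

def rank_candidates(fragments, observed_mz, tol=0.01):
    """Rank candidate fragment structures by mass error to an observed ion."""
    hits = [(abs(m - observed_mz), m, comp) for m, comp in fragments.items()
            if abs(m - observed_mz) <= tol]
    return sorted(hits, key=lambda h: h[0])
```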
International Conference on Data Mining | 2006
Nikolaj Tatti; Taneli Mielikäinen; Aristides Gionis; Heikki Mannila
Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal dimension can be adapted for binary data. However, the fractal dimension as such is difficult to interpret. Hence we introduce the concept of normalized fractal dimension. For a dataset D, its normalized fractal dimension is the number of independent columns needed to achieve the unnormalized fractal dimension of D. The normalized fractal dimension measures the degree of dependency structure in the data. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against PCA.
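A small sketch of the underlying quantity, assuming a correlation-dimension-style definition: count the fraction of row pairs within Hamming distance r and measure how its logarithm scales with log r. The normalized variant would compare this slope against data with independent columns of the same margins; the details here are illustrative, not the paper's exact definitions.

```python
import numpy as np

def correlation_integral(D, r):
    """Fraction of row pairs of a 0/1 matrix within Hamming distance r."""
    n = len(D)
    dists = (D[:, None, :] != D[None, :, :]).sum(-1)   # pairwise Hamming distances
    return (dists[np.triu_indices(n, k=1)] <= r).mean()

def fractal_dimension(D, r1, r2):
    """Slope of log C(r) versus log r between two scales: a correlation
    dimension adapted to binary data (choose r1 large enough that C(r1) > 0)."""
    c1, c2 = correlation_integral(D, r1), correlation_integral(D, r2)
    return (np.log(c2) - np.log(c1)) / (np.log(r2) - np.log(r1))
```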
Journal of Computational Biology | 2011
Markus Heinonen; Sampsa Lappalainen; Taneli Mielikäinen; Juho Rousu
The ability to trace the fate of individual atoms through metabolic pathways is needed in many applications of systems biology and drug discovery. However, this information is not immediately available from the most common metabolome studies and needs to be acquired separately. The automatic discovery of the correspondence of atoms in biochemical reactions is called the atom mapping problem. We suggest an efficient approach for solving the atom mapping problem exactly, that is, finding mappings of minimum edge edit distance. The algorithm is based on A* search equipped with sophisticated heuristics for pruning the search space. This approach has clear advantages over the commonly used heuristic approach of the iterative maximum common subgraph (MCS) algorithm: we explicitly minimize an objective function, and we produce solutions that typically require less manual curation. The two methods are similar in computational resource demands. We compare the performance of the proposed algorithm against several alternatives on data obtained from the KEGG LIGAND and RPAIR databases: greedy search, bipartite graph matching, and the MCS approach. Our experiments show that the alternative approaches often fail to find mappings with minimum edit distance.
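The A* skeleton below shows the mechanism the abstract relies on: with an admissible heuristic that never overestimates the remaining edit cost, the first goal state popped from the frontier is optimal. In the atom mapping setting a state would extend a partial reactant-to-product atom correspondence and step costs would count bond (edge) edits; the authors' pruning heuristics are more sophisticated than anything shown here.

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, heuristic):
    """Generic A* sketch: states must be hashable; heuristic must be
    admissible for the returned solution to have minimum total cost."""
    tie = count()                              # tiebreaker so the heap never compares states
    frontier = [(heuristic(start), next(tie), 0, start)]
    best_g = {start: 0}
    while frontier:
        f, _, g, state = heapq.heappop(frontier)
        if is_goal(state):
            return state, g
        for nxt, step_cost in successors(state):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):   # prune dominated paths
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + heuristic(nxt), next(tie), g2, nxt))
    return None, float("inf")
```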
Knowledge Discovery and Data Mining | 2006
Taneli Mielikäinen; Evimaria Terzi; Panayiotis Tsaparas
Partitions of sequential data arise either naturally or as the output of sequence segmentation algorithms. It is often the case that the same timeline is partitioned in many different ways. For example, different segmentation algorithms produce different partitions of the same underlying data points. In such cases, we are interested in producing an aggregate partition, i.e., a segmentation that agrees as much as possible with the input segmentations. Each partition is defined as a set of continuous non-overlapping segments of the timeline. We show that this problem can be solved optimally in polynomial time using dynamic programming. We also propose faster greedy heuristics that work well in practice. We experiment with our algorithms and demonstrate their utility in clustering the behavior of mobile-phone users and in combining the results of different segmentation algorithms on genomic sequences.
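The dynamic program has the familiar optimal-segmentation shape sketched below: opt[j] is the best cost of segmenting the first j points, minimized over the position of the last boundary. The toy cost shown penalizes an aggregate segment for every input boundary it swallows, plus a per-segment charge; the paper's disagreement measure is different, and all weights here are made up.

```python
def optimal_segmentation(n, cost):
    """O(n^2) DP sketch: opt[j] = min over i < j of opt[i] + cost(i, j),
    where cost(i, j) scores making [i, j) a single aggregate segment."""
    INF = float("inf")
    opt = [0.0] + [INF] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = opt[i] + cost(i, j)
            if c < opt[j]:
                opt[j], back[j] = c, i
    bounds, j = [], n                # recover the chosen segments
    while j > 0:
        bounds.append((back[j], j))
        j = back[j]
    return opt[n], bounds[::-1]

# Toy cost (hypothetical weights): pay 1 for each input boundary the segment
# swallows, plus 0.6 per segment to discourage over-segmentation.
inputs = [{3, 7}, {4, 7}]            # boundary positions of two input segmentations
cost = lambda i, j: sum(1 for B in inputs for b in B if i < b < j) + 0.6
print(optimal_segmentation(10, cost))
```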
Distributed Event-Based Systems | 2011
Adam Ji Dou; Vana Kalogeraki; Dimitrios Gunopulos; Taneli Mielikäinen; Ville H. Tuulos
The popularity of portable electronics such as smartphones and PDAs, together with their increasing processing capabilities, has enabled the development of several real-time mobile applications that require low latency, high-throughput response, and scalability. Supporting real-time applications in mobile settings is especially challenging due to limited resources, mobile device failures, and significant quality fluctuations of the wireless medium. In this paper we address the problem of supporting distributed real-time applications in a mobile MapReduce framework in the presence of failures. We present Real-Time Mobile MapReduce (MiscoRT), our system aimed at supporting the execution of distributed applications with real-time response requirements. We propose a two-level scheduling scheme, designed for the MapReduce programming model, that effectively predicts application execution times and dynamically schedules application tasks. We have performed extensive experiments on a testbed of Nokia N95 8GB smartphones and demonstrate that our scheduling system is efficient, has low overhead, and performs up to 32% faster than its competitors.
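As a toy illustration of deadline-aware placement, the sketch below keeps a moving-average estimate of each device's task time and assigns every task to the device with the earliest predicted finish. MiscoRT's actual two-level scheme (application level and task level) is richer; the names and the smoothing constant are invented.

```python
class TaskPlacer:
    """Toy sketch of prediction-driven task placement, not MiscoRT itself."""

    def __init__(self, devices, alpha=0.3):
        self.est = {d: 1.0 for d in devices}       # estimated seconds per task
        self.free_at = {d: 0.0 for d in devices}   # when each device goes idle
        self.alpha = alpha                         # smoothing for time estimates

    def assign(self, now):
        # place the task on the device with the earliest predicted completion
        d = min(self.est, key=lambda d: max(self.free_at[d], now) + self.est[d])
        finish = max(self.free_at[d], now) + self.est[d]
        self.free_at[d] = finish
        return d, finish                           # compare finish to the deadline

    def observe(self, device, elapsed):
        # exponential moving average keeps estimates current as radio
        # conditions and device load change
        self.est[device] = (1 - self.alpha) * self.est[device] + self.alpha * elapsed
```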
European Conference on Machine Learning | 2014
Jingu Kim; Taneli Mielikäinen
Over the last decade, mobile device usage has evolved rapidly from basic calling and texting to primarily using applications. On average, smartphone users have tens of applications installed on their devices. As the number of installed applications grows, finding the right application at a particular moment becomes more challenging. To alleviate the problem, we study the task of predicting which applications a user is most likely to use in a given situation. We formulate the prediction task with a conditional log-linear model and present an online learning scheme suitable for resource-constrained mobile devices. Using real-world mobile application usage data, we evaluate the performance and behavior of the proposed solution against other prediction methods. Based on our experimental evaluation, the proposed approach offers competitive prediction performance with moderate resource needs.
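A minimal sketch of such a model: a softmax over installed apps with sparse context features, updated with one stochastic-gradient step per observed launch. The feature names and learning rate are illustrative, and the authors' exact model and update schedule may differ.

```python
import math

class AppPredictor:
    """Conditional log-linear sketch: P(app | context) ~ exp(w[app] . f(context))."""

    def __init__(self, apps, lr=0.1):
        self.apps = apps
        self.w = {a: {} for a in apps}   # sparse weights: app -> {feature: weight}
        self.lr = lr

    def predict(self, feats):
        scores = {a: sum(self.w[a].get(f, 0.0) * v for f, v in feats.items())
                  for a in self.apps}
        z = max(scores.values())                            # numerically stable softmax
        exp = {a: math.exp(s - z) for a, s in scores.items()}
        total = sum(exp.values())
        return {a: e / total for a, e in exp.items()}

    def update(self, feats, used_app):
        # one SGD step on the log-likelihood of the observed app launch
        p = self.predict(feats)
        for a in self.apps:
            grad = (1.0 if a == used_app else 0.0) - p[a]
            for f, v in feats.items():
                self.w[a][f] = self.w[a].get(f, 0.0) + self.lr * grad * v

# Hypothetical usage with indicator context features:
# model = AppPredictor(["maps", "mail", "camera"])
# model.update({"hour=9": 1.0, "at_home": 1.0}, used_app="mail")
```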
Inductive Logic Programming | 2008
Niels Landwehr; Taneli Mielikäinen
The analysis of haplotype data of human populations has received much attention recently. For instance, problems such as haplotype reconstruction are important intermediate steps in gene association studies, which seek to uncover the genetic basis of complex diseases. In this chapter, we explore the application of probabilistic logic learning techniques to haplotype data. More specifically, a new haplotype reconstruction technique based on Logical Hidden Markov Models is presented and experimentally compared against other state-of-the-art haplotyping systems. Furthermore, we explore approaches for combining haplotype reconstructions from different sources, which can increase the accuracy and robustness of reconstruction estimates. Finally, techniques for discovering structure in haplotype data, at the level of both haplotypes and populations, are discussed.
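Logical Hidden Markov Models structure the hidden states with logical atoms, but the decoding core any HMM-based haplotyper needs is ordinary Viterbi, sketched below: the hidden states would be haplotype configurations and the observations unphased genotypes. This is a generic HMM sketch, not the chapter's LHMM machinery.

```python
import numpy as np

def viterbi(obs, start, trans, emit):
    """Most likely hidden state sequence under an HMM.
    obs: observed symbol indices; start: initial probabilities (S,);
    trans: transition matrix (S, S); emit: emission matrix (S, V)."""
    T, S = len(obs), len(start)
    logp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    logp[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        for s in range(S):
            cand = logp[t - 1] + np.log(trans[:, s])
            back[t, s] = cand.argmax()
            logp[t, s] = cand.max() + np.log(emit[s, obs[t]])
    path = [int(logp[-1].argmax())]      # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```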