Jaakko Peltonen
University of Tampere
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Jaakko Peltonen.
IEEE Transactions on Neural Networks | 2001
Samuel Kaski; Janne Sinkkonen; Jaakko Peltonen
We introduce a method for deriving a metric, locally based on the Fisher information matrix, into the data space. A self-organizing map (SOM) is computed in the new metric to explore financial statements of enterprises. The metric measures local distances in terms of changes in the distribution of an auxiliary random variable that reflects what is important in the data. In this paper the variable indicates bankruptcy within the next few years. The conditional density of the auxiliary variable is first estimated, and the change in the estimate resulting from local displacements in the primary data space is measured using the Fisher information matrix. When a self-organizing map is computed in the new metric it still visualizes the data space in a topology-preserving fashion, but represents the (local) directions in which the probability of bankruptcy changes the most.
IEEE Transactions on Neural Networks | 2005
Jaakko Peltonen; Samuel Kaski
A simple probabilistic model is introduced to generalize classical linear discriminant analysis (LDA) in finding components that are informative of or relevant for data classes. The components maximize the predictability of the class distribution which is asymptotically equivalent to 1) maximizing mutual information with the classes, and 2) finding principal components in the so-called learning or Fisher metrics. The Fisher metric measures only distances that are relevant to the classes, that is, distances that cause changes in the class distribution. The components have applications in data exploration, visualization, and dimensionality reduction. In empirical experiments, the method outperformed, in addition to more classical methods, a Renyi entropy-based alternative while having essentially equivalent computational cost.
IEEE Transactions on Visualization and Computer Graphics | 2017
Dominik Sacha; Leishi Zhang; Michael Sedlmair; John Aldo Lee; Jaakko Peltonen; Daniel Weiskopf; Stephen C. North; Daniel A. Keim
Dimensionality Reduction (DR) is a core building block in visualizing multidimensional data. For DR techniques to be useful in exploratory data analysis, they need to be adapted to human needs and domain-specific problems, ideally, interactively, and on-the-fly. Many visual analytics systems have already demonstrated the benefits of tightly integrating DR with interactive visualizations. Nevertheless, a general, structured understanding of this integration is missing. To address this, we systematically studied the visual analytics and visualization literature to investigate how analysts interact with automatic DR techniques. The results reveal seven common interaction scenarios that are amenable to interactive control such as specifying algorithmic constraints, selecting relevant features, or choosing among several DR algorithms. We investigate specific implementations of visual analysis systems integrating DR, and analyze ways that other machine learning methods have been combined with DR. Summarizing the results in a “human in the loop” process model provides a general lens for the evaluation of visual interactive DR systems. We apply the proposed model to study and classify several systems previously described in the literature, and to derive future research opportunities.
BMC Bioinformatics | 2007
Merja Oja; Jaakko Peltonen; Jonas Blomberg; Samuel Kaski
BackgroundHuman endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections and now reside within the human DNA. Recently HERV expression has been detected in both normal tissues and diseased patients. However, the activities (expression levels) of individual HERV sequences are mostly unknown.ResultsWe introduce a generative mixture model, based on Hidden Markov Models, for estimating the activities of the individual HERV sequences from EST (expressed sequence tag) databases. We use the model to estimate the relative activities of 181 HERVs. We also empirically justify a faster heuristic method for HERV activity estimation and use it to estimate the activities of 2450 HERVs. The majority of the HERV activities were previously unknown.Conclusion(i) Our methods estimate activity accurately based on experiments on simulated data. (ii) Our estimate on real data shows that 7% of the HERVs are active. The active ones are spread unevenly into HERV groups and relatively uniformly in terms of estimated age. HERVs with the retroviral env gene are more often active than HERVs without env. Few of the active HERVs have open reading frames for retroviral proteins.
Proceedings of the National Academy of Sciences of the United States of America | 2015
Antti Honkela; Jaakko Peltonen; Hande Topa; Iryna Charapitsa; Filomena Matarese; Korbinian Grote; Hendrik G. Stunnenberg; George Reid; Neil D. Lawrence; Magnus Rattray
Significance Gene transcription is a highly regulated dynamic process. Delays in transcription have important consequences on dynamics of gene expression and consequently on downstream biological function. We model temporal dynamics of transcription using genome-wide time course data measuring transcriptional activity and mRNA concentration. We find a significant number of genes exhibit a long RNA processing delay between transcription termination and mRNA production. These long processing delays are more common for short genes, which would otherwise be expected to transcribe most rapidly. The distribution of intronic reads suggests that these delays are required for splicing to be completed. Understanding such delays is essential for understanding how a rapid cellular response is regulated. Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles because of differences in transcription time, degradation rate, and RNA-processing kinetics. Recent studies have shown that a splicing-associated RNA production delay can be significant. To investigate this issue more generally, it is useful to develop methods applicable to genome-wide datasets. We introduce a joint model of transcriptional activation and mRNA accumulation that can be used for inference of transcription rate, RNA production delay, and degradation rate given data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a nonparametric statistical modeling approach allowing us to capture a broad range of activation kinetics, and we use Bayesian parameter estimation to quantify the uncertainty in estimates of the kinetic parameters. We apply the model to data from estrogen receptor α activation in the MCF-7 breast cancer cell line. We use RNA polymerase II ChIP-Seq time course data to characterize transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 min between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated production delays in many genes.
international joint conference on artificial intelligence | 2011
Joni Pajarinen; Jaakko Peltonen
Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is restricted by costs or physical limits. DEC-POMDPs are a promising solution, but optimizing policies quickly becomes computationally intractable when problem size grows. Factored DEC-POMDPs allow large problems to be described in compact form, but have the same worst case complexity as non-factored DEC-POMDPs. We propose an efficient optimization algorithm for large factored infinite-horizon DEC-POMDPs. We formulate expectation-maximization based optimization into a new form, where complexity can be kept tractable by factored approximations. Our method performs well, and it can solve problems with more agents and larger state spaces than state of the art DEC-POMDP methods. We give results for factored infinite-horizon DEC-POMDP problems with up to 10 agents.
international conference on artificial neural networks | 2002
Jaakko Peltonen; Arto Klami; Samuel Kaski
Improved methods are presented for learning metrics that measure only important distances. It is assumed that changes in primary data are relevant only to the extent that they cause changes in auxiliary data, available paired with the primary data. The metrics are here derived from estimators of the conditional density of the auxiliary data. More accurate estimators are compared, and a more accurate approximation to the distances is introduced. The new methods improved the quality of Self-Organizing Maps (SOMs) significantly for four of the five studied data sets.
PLOS ONE | 2014
Ali Faisal; Jaakko Peltonen; Elisabeth Georgii; Johan Rung; Samuel Kaski
A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researchers experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
international workshop on machine learning for signal processing | 2007
Jaakko Peltonen; Jacob Goldberger; Samuel Kaski
We introduce a method that learns a class-discriminative subspace or discriminative components of data. Such a sub- space is useful for visualization, dimensionality reduction, feature extraction, and for learning a regularized distance metric. We learn the subspace by optimizing a probabilistic semiparametric model, a mixture of Gaussians, of classes in the subspace. The semiparametric modeling leads to fast computation (O(N) for N samples) in each iteration of optimization, in contrast to recent nonparametric methods that take O(N2) time, but with equal accuracy. Moreover, we learn the subspace in a semi-supervised manner from three kinds of data: labeled and unlabeled samples, and unlabeled samples with pairwise constraints, with a unified objective.
intelligent user interfaces | 2015
Salvatore Andolina; Khalil Klouche; Jaakko Peltonen; Mohammad E. Hoque; Tuukka Ruotsalo; Diogo Cabral; Arto Klami; Dorota Glowacka; Patrik Floréen; Giulio Jacucci
The users understanding of information needs and the information available in the data collection can evolve during an exploratory search session. Search systems tailored for well-defined narrow search tasks may be suboptimal for exploratory search where the user can sequentially refine the expressions of her information needs and explore alternative search directions. A major challenge for exploratory search systems design is how to support such behavior and expose the user to relevant yet novel information that can be difficult to discover by using conventional query formulation techniques. We introduce IntentStreams, a system for exploratory search that provides interactive query refinement mechanisms and parallel visualization of search streams. The system models each search stream via an intent model allowing rapid user feedback. The user interface allows swift initiation of alternative and parallel search streams by direct manipulation that does not require typing. A study with 13 participants shows that IntentStreams provides better support for branching behavior compared to a conventional search system.