Publications


Featured research published by Paul Blomstedt.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015

A Bayesian Predictive Model for Clustering Data of Mixed Discrete and Continuous Type

Paul Blomstedt; Jing Tang; Jie Xiong; Christian Granlund; Jukka Corander

Advantages of model-based clustering methods over heuristic alternatives have been widely demonstrated in the literature. Most model-based clustering algorithms assume that the data are either discrete or continuous, possibly allowing both types to be present in separate features. In this paper, we introduce a model-based approach for clustering feature vectors of mixed type, allowing each feature to simultaneously take on both categorical and real values. Such data may be encountered, for instance, in chemical and biological analyses, in the analysis of survey data, as well as in image analysis. Our model is formulated within a Bayesian predictive framework, where clustering solutions correspond to random partitions of the data. Using conjugate analysis, the posterior probability for each possible partition can be determined analytically, enabling the use of efficient computational search strategies for finding the posterior optimal partition. The derived model is illustrated using several synthetic and real datasets.
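The abstract's key computational point is that conjugacy makes the posterior score of any candidate partition available in closed form. The sketch below illustrates only that idea under heavy simplifying assumptions. It is not the paper's mixed discrete-continuous model. Data are assumed purely categorical, features are treated as independent within a cluster, and each cluster/feature pair gets a symmetric Dirichlet(`alpha`) prior, so its marginal likelihood is a Dirichlet-multinomial term. With a uniform prior over partitions (also an assumption), the summed log score equals the log posterior up to a constant. The function names and `alpha` are hypothetical.

```python
from collections import Counter
from math import lgamma


def log_cluster_marginal(values, n_categories, alpha=1.0):
    """Log marginal likelihood of one cluster's values for one categorical
    feature, integrating out category probabilities under a symmetric
    Dirichlet(alpha) prior (Dirichlet-categorical conjugacy)."""
    counts = Counter(values)
    n = len(values)
    total_alpha = alpha * n_categories
    log_ml = lgamma(total_alpha) - lgamma(total_alpha + n)
    # Categories with zero count contribute lgamma(alpha) - lgamma(alpha) = 0.
    for c in counts.values():
        log_ml += lgamma(alpha + c) - lgamma(alpha)
    return log_ml


def log_partition_score(data, partition, n_categories, alpha=1.0):
    """Unnormalized log posterior of a partition, assuming a uniform prior
    over partitions and independent features within each cluster.

    data: list of equal-length tuples of category indices (one per item).
    partition: list of clusters, each a list of row indices into data.
    """
    n_features = len(data[0])
    score = 0.0
    for cluster in partition:
        for j in range(n_features):
            column = [data[i][j] for i in cluster]
            score += log_cluster_marginal(column, n_categories, alpha)
    return score
```

For example, with `data = [(0, 0), (0, 0), (1, 1), (1, 1)]` and two categories, the partition `[[0, 1], [2, 3]]` scores higher than both `[[0, 2], [1, 3]]` and the single cluster `[[0, 1, 2, 3]]`. Because each score is exact, a search procedure only needs to compare these numbers across candidate partitions.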


Journal of Chemometrics | 2014

Bayesian predictive modeling and comparison of oil samples

Paul Blomstedt; Romain Gauriot; Niina Viitala; Tapani Reinikainen; Jukka Corander

Statistical comparison of oil samples is an integral part of oil spill identification, which deals with the process of linking an oil spill with its source of origin. In current practice, a frequentist hypothesis test is often used to evaluate evidence in support of a match between a spill and a source sample. As frequentist tests are only able to evaluate evidence against a hypothesis but not in support of it, we argue that this leads to unsound statistical reasoning. Moreover, currently only verbal conclusions on a very coarse scale can be made about the match between two samples, whereas a finer quantitative assessment would often be preferred. To address these issues, we propose a Bayesian predictive approach for evaluating the similarity between the chemical compositions of two oil samples. We derive the underlying statistical model from some basic assumptions on modeling assays in analytical chemistry, and to further facilitate and improve numerical evaluations, we develop analytical expressions for the key elements of Bayesian inference for this model. The approach is illustrated with both simulated and real data and is shown to have appealing properties in comparison with both standard frequentist and Bayesian approaches.


Bioinformatics | 2016

Modelling-based experiment retrieval: A case study with gene expression clustering

Paul Blomstedt; Ritabrata Dutta; Sohan Seth; Alvis Brazma; Samuel Kaski

Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case versus control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well.

Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework, where each experiment is modelled separately and the retrieval is done by finding related models. For retrieval of gene expression experiments, we use a probabilistic model called the product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples. The suggested metric for retrieval using clusterings is the normalized information distance. Finally, empirical results suggest that inference for the full probabilistic model can be approximated with good performance using computationally faster heuristic clustering approaches (e.g. k-means). The method is highly scalable and straightforward to apply to construct a general-purpose gene expression experiment retrieval method.

Availability and implementation: The method can be implemented using standard clustering algorithms and the normalized information distance, available in many statistical software packages.

Supplementary information: Supplementary data are available at Bioinformatics online.
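Since the abstract states that retrieval can be built from a standard clustering algorithm plus the normalized information distance, a minimal sketch of that comparison step may help. It assumes each experiment has already been reduced to cluster labels over the same genes in the same order (for instance from k-means). It also assumes the common max-normalized variant, NID(U, V) = 1 - I(U; V) / max(H(U), H(V)); the paper may define the normalization differently. The `database` dictionary and experiment ids are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (nats) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information (nats) between two clusterings of the same items."""
    n = len(u)
    joint = Counter(zip(u, v))
    pu, pv = Counter(u), Counter(v)
    # Sum of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * log(c * n / (pu[a] * pv[b])) for (a, b), c in joint.items())


def normalized_information_distance(u, v):
    """NID(U, V) = 1 - I(U; V) / max(H(U), H(V)); 0 means identical clusterings."""
    if len(u) != len(v):
        raise ValueError("clusterings must label the same items")
    h_max = max(entropy(u), entropy(v))
    if h_max == 0.0:
        return 0.0  # both clusterings put every item in a single cluster
    return 1.0 - mutual_information(u, v) / h_max


def rank_experiments(query_labels, database):
    """Return experiment ids sorted by NID to the query, closest first.

    database: hypothetical dict mapping an experiment id to gene-cluster
    labels over the same genes, in the same order, as query_labels.
    """
    return sorted(database, key=lambda k: normalized_information_distance(query_labels, database[k]))
```

The distance ignores how clusters are numbered: `[0, 0, 1, 1]` and `[1, 1, 0, 0]` are at distance (numerically) zero, while `[0, 0, 1, 1]` and `[0, 1, 0, 1]`, which share no information, are at distance 1. Normalizing by the larger entropy keeps the distance in [0, 1], so experiments with different numbers of clusters can be ranked on one scale.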


Communications in Statistics - Theory and Methods | 2015

Posterior Predictive Comparisons for the Two-sample Problem

Paul Blomstedt; Jukka Corander

The two-sample problem of inferring whether two random samples have equal underlying distributions is formulated within the Bayesian framework as a comparison of two posterior predictive inferences rather than as a problem of model selection. The suggested approach is argued to be particularly advantageous in problems where the objective is to evaluate evidence in support of equality, along with being robust to the priors used and being capable of handling improper priors. Our approach is contrasted with the Bayes factor in a normal setting, and an additional example is considered in which the observed samples are realizations of Markov chains.


bioRxiv | 2017

Resolving outbreak dynamics using Approximate Bayesian Computation for stochastic birth-death models

Jarno Lintusaari; Paul Blomstedt; Tuomas Sivula; Michael U. Gutmann; Samuel Kaski; Jukka Corander

Earlier research has suggested that Approximate Bayesian Computation (ABC) makes it possible to fit simulator-based intractable birth-death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters such as the reproductive number R may remain poorly identifiable. Here we show that the identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case study, we consider the situation where the genotype data are generated as a mixture of three stochastic processes, each with their distinct dynamics and clear epidemiological interpretation. The ABC inference yields stable and accurate posterior inferences about outbreak dynamics from aggregated annual case data with genotype information. We also show that under the proposed model, the infectious population size can be reliably inferred from the data. The estimate is approximately two orders of magnitude smaller than the assumptions made in the earlier ABC studies, and is much better aligned with epidemiological knowledge about active TB prevalence. Similarly, the reproductive number R related to the primary underlying transmission process is estimated to be nearly three times as large as previous estimates, which has a substantial impact on the interpretation of the fitted outbreak model.


ICES Journal of Marine Science | 2015

General state-space population dynamics model for Bayesian stock assessment

Samu Mäntyniemi; Rebecca Whitlock; Tommi Perälä; Paul Blomstedt; Jarno Vanhatalo; Margarita M. Rincón; Anna Kuparinen; Henni Pulkkinen; O. Sakari Kuikka


arXiv: Computation | 2014

Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data

Andrew Gelman; Aki Vehtari; Pasi Jylänki; Tuomas Sivula; Dustin Tran; Swupnil Sahai; Paul Blomstedt; John P. Cunningham; David Schiminovich; Christian P. Robert


arXiv: Machine Learning | 2017

Distributed Bayesian Matrix Factorization with Minimal Communication

Xiangju Qin; Paul Blomstedt; Eemeli Leppäaho; Pekka Parviainen; Samuel Kaski


arXiv: Computation | 2016

Bayesian inference in hierarchical models by combining independent posteriors

Ritabrata Dutta; Paul Blomstedt; Samuel Kaski


arXiv: Applications | 2015

A Bayesian length-based population dynamics model for northern shrimp (Pandalus borealis)

Paul Blomstedt; Jarno Vanhatalo; Mats Ulmestrand; Anna Gårdmark; Samu Mäntyniemi

Collaboration


Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Aki Vehtari

Helsinki Institute for Information Technology
