Antti Honkela

Helsinki Institute for Information Technology

Publication

Featured research published by Antti Honkela.

Nature Biotechnology | 2014

A community effort to assess and improve drug sensitivity prediction algorithms

James C. Costello; Laura M. Heiser; Elisabeth Georgii; Michael P. Menden; Nicholas Wang; Mukesh Bansal; Muhammad Ammad-ud-din; Petteri Hintsanen; Suleiman A. Khan; John-Patrick Mpindi; Olli Kallioniemi; Antti Honkela; Tero Aittokallio; Krister Wennerberg; NCI DREAM Community; James J. Collins; Dan Gallahan; Dinah S. Singer; Julio Saez-Rodriguez; Samuel Kaski; Joe W. Gray; Gustavo Stolovitzky

Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.

Bioinformatics | 2012

Identifying differentially expressed transcripts from RNA-seq data with biological variation

Peter Glaus; Antti Honkela; Magnus Rattray

Motivation: High-throughput sequencing enables expression analysis at the level of individual transcripts. The analysis of transcriptome expression levels and differential expression (DE) estimation requires a probabilistic approach to properly account for ambiguity caused by shared exons and finite read sampling as well as the intrinsic biological variance of transcript expression. Results: We present Bayesian inference of transcripts from sequencing data (BitSeq), a Bayesian approach for estimation of transcript expression level from RNA-seq experiments. Inferred relative expression is represented by Markov chain Monte Carlo samples from the posterior probability distribution of a generative model of the read data. We propose a novel method for DE analysis across replicates which propagates uncertainty from the sample-level model while modelling biological variance using an expression-level-dependent prior. We demonstrate the advantages of our method using simulated data as well as an RNA-seq dataset with technical and biological replication for both studied conditions. Availability: The implementation of the transcriptome expression estimation and differential expression analysis, BitSeq, has been written in C++ and Python. The software is available online from http://code.google.com/p/bitseq/; version 0.4 was used for generating the results presented in this article. Supplementary information: Supplementary data are available at Bioinformatics online.

ICA | 2000

Bayesian Non-Linear Independent Component Analysis by Multi-Layer Perceptrons

Harri Lappalainen; Antti Honkela

Summary. In this chapter, a nonlinear extension to independent component analysis is developed. The nonlinear mapping from source signals to observations is modelled by a multi-layer perceptron network and the distributions of source signals are modelled by mixtures of Gaussians. The observations are assumed to be corrupted by Gaussian noise and therefore the method is more adequately described as nonlinear independent factor analysis. The nonlinear mapping, the source distributions and the noise level are estimated from the data. The Bayesian approach to learning avoids problems with overlearning which would otherwise be severe in unsupervised learning with flexible nonlinear models.

Proceedings of the National Academy of Sciences of the United States of America | 2010

Model-based method for transcription factor target identification with limited data

Antti Honkela; Charles Girardot; E. Hilary Gustafson; Ya Hsin Liu; Eileen E. M. Furlong; Neil D. Lawrence; Magnus Rattray

We present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist, and Mef2, controlling mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach is found to be comparable or superior to ranking based on mutant differential expression scores. Also, we show how integrating complementary wild-type spatial expression data can further improve target ranking performance.

European Conference on Computational Biology | 2008

Gaussian process modelling of latent chemical species

Pei Gao; Antti Honkela; Magnus Rattray; Neil D. Lawrence

MOTIVATION Inference of latent chemical species in biochemical interaction networks is a key problem in estimation of the structure and parameters of the genetic, metabolic and protein interaction networks that underpin all biological processes. We present a framework for Bayesian marginalization of these latent chemical species through Gaussian process priors. RESULTS We demonstrate our general approach on three different biological examples of single input motifs, including both activation and repression of transcription. We focus in particular on the problem of inferring transcription factor activity when the concentration of active protein cannot easily be measured. We show how the uncertainty in the inferred transcription factor activity can be integrated out in order to derive a likelihood function that can be used for the estimation of regulatory model parameters. An advantage of our approach is that we avoid the use of a coarse-grained discretization of continuous time functions, which would lead to a large number of additional parameters to be estimated. We develop exact (for linear regulation) and approximate (for non-linear regulation) inference schemes, which are much more efficient than competing sampling-based schemes and therefore provide us with a practical toolkit for model-based inference. AVAILABILITY The software and data for recreating all the experiments in this paper are available in MATLAB from http://www.cs.man.ac.uk/~neill/gpsim.

Nature Communications | 2016

Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes

John A. Lees; Minna Vehkala; Niko Välimäki; Simon R. Harris; Claire Chewapreecha; Nicholas J. Croucher; Pekka Marttinen; Mark R. Davies; Andrew C. Steer; Stephen Y.C. Tong; Antti Honkela; Julian Parkhill; Stephen D. Bentley; Jukka Corander

Bacterial genomes vary extensively in terms of both gene content and gene sequence. This plasticity hampers the use of traditional SNP-based methods for identifying all genetic associations with phenotypic variation. Here we introduce a computationally scalable and widely applicable statistical method (SEER) for the identification of sequence elements that are significantly enriched in a phenotype of interest. SEER is applicable to tens of thousands of genomes by counting variable-length k-mers using a distributed string-mining algorithm. Robust options are provided for association analysis that also correct for the clonal population structure of bacteria. Using large collections of genomes of the major human pathogens Streptococcus pneumoniae and Streptococcus pyogenes, SEER identifies relevant previously characterized resistance determinants for several antibiotics and discovers potential novel factors related to the invasiveness of S. pyogenes. We thus demonstrate that our method can answer important biologically and medically relevant questions.
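As a rough illustration of the k-mer enrichment idea in the abstract above, the following Python sketch counts fixed-length k-mers per genome and scores each one with a one-sided Fisher exact test. It is a deliberately simplified stand-in and not the SEER implementation: SEER uses variable-length k-mers, a distributed string-mining algorithm, and association tests that correct for population structure, none of which appear here. The function names (`kmers`, `enrichment_pvalue`, `scan`) and the toy sequences are hypothetical.

```python
from math import comb


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def enrichment_pvalue(n_both, n_with_kmer, n_cases, n_total):
    """One-sided Fisher exact test (hypergeometric upper tail).

    Probability of seeing at least n_both case genomes among the
    n_with_kmer genomes carrying the k-mer, if carriers were drawn
    at random from n_total genomes of which n_cases are cases.
    """
    upper = min(n_with_kmer, n_cases)
    denom = comb(n_total, n_with_kmer)
    tail = sum(
        comb(n_cases, x) * comb(n_total - n_cases, n_with_kmer - x)
        for x in range(n_both, upper + 1)
    )
    return tail / denom


def scan(genomes, phenotypes, k):
    """Map each k-mer to its enrichment p-value among case genomes."""
    n_total = len(genomes)
    n_cases = sum(phenotypes)
    carriers = {}
    for idx, seq in enumerate(genomes):
        for km in kmers(seq, k):
            carriers.setdefault(km, set()).add(idx)
    return {
        km: enrichment_pvalue(
            sum(1 for i in idxs if phenotypes[i]), len(idxs), n_cases, n_total
        )
        for km, idxs in carriers.items()
    }


# Toy data (made up): the first two genomes carry the phenotype.
toy_genomes = ["ACGTAC", "ACGTTT", "GGGTTT", "GGGAAA"]
toy_phenotypes = [1, 1, 0, 0]
toy_results = scan(toy_genomes, toy_phenotypes, k=3)
```

In this toy input, k-mers found only in the phenotype-positive genomes receive the smallest p-values. A real analysis would also need multiple-testing correction and a control for clonal population structure, as the abstract notes.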

International Conference on Neural Information Processing | 2008

Natural Conjugate Gradient in Variational Inference

Antti Honkela; Matti Tornio; Tapani Raiko; Juha Karhunen

Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For them, variational Bayesian expectation maximization (VB EM) algorithms are not easily available, and gradient-based methods are often used as alternatives. Traditional natural gradient methods use the Riemannian structure (or geometry) of the predictive distribution to speed up maximum likelihood estimation. We propose using the geometry of the variational approximating distribution instead to speed up a conjugate gradient method for variational learning and inference. The computational overhead is small due to the simplicity of the approximating distribution. Experiments with real-world speech data show significant speedups over alternative learning algorithms.

IEEE Transactions on Neural Networks | 2004

Variational learning and bits-back coding: an information-theoretic view to Bayesian learning

Antti Honkela; Harri Valpola

The bits-back coding first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993 provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. The bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints provides interesting insights to the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views to many parts of the problem such as model comparison and pruning and helps explain many phenomena occurring in learning.
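The two readings of the variational cost function mentioned in the abstract above can be checked numerically on a toy model. The Python sketch below assumes a single discrete latent variable with a made-up prior, likelihood and approximation `q`. It computes the cost E_q[log q(θ) − log p(x, θ)] and confirms that it equals the KL divergence from `q` to the true posterior minus the log evidence. The function name and all numbers are illustrative assumptions, not taken from the paper.

```python
import math


def free_energy_decomposition(prior, likelihood, q):
    """Return (cost, kl, log_evidence) for a discrete latent variable.

    cost         = sum_i q_i * log(q_i / p(x, theta_i))
    kl           = KL(q || p(theta | x))
    log_evidence = log p(x)
    All quantities are in nats; divide by log(2) to get bits.
    """
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)
    posterior = [j / evidence for j in joint]
    cost = sum(qi * math.log(qi / ji) for qi, ji in zip(q, joint) if qi > 0)
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, posterior) if qi > 0)
    return cost, kl, math.log(evidence)


# Toy numbers (made up for illustration).
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.4, 0.7]
q = [0.2, 0.3, 0.5]

cost, kl, log_evidence = free_energy_decomposition(prior, likelihood, q)
code_length_bits = cost / math.log(2)
```

Because the KL term is non-negative, the cost is never smaller than the negative log evidence, which is the lower-bound view. Expressed in bits, the same number is the expected description length in the code-length view.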

Proceedings of the National Academy of Sciences of the United States of America | 2015

Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays

Antti Honkela; Jaakko Peltonen; Hande Topa; Iryna Charapitsa; Filomena Matarese; Korbinian Grote; Hendrik G. Stunnenberg; George Reid; Neil D. Lawrence; Magnus Rattray

Significance: Gene transcription is a highly regulated dynamic process. Delays in transcription have important consequences on dynamics of gene expression and consequently on downstream biological function. We model temporal dynamics of transcription using genome-wide time course data measuring transcriptional activity and mRNA concentration. We find a significant number of genes exhibit a long RNA processing delay between transcription termination and mRNA production. These long processing delays are more common for short genes, which would otherwise be expected to transcribe most rapidly. The distribution of intronic reads suggests that these delays are required for splicing to be completed. Understanding such delays is essential for understanding how a rapid cellular response is regulated.

Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles because of differences in transcription time, degradation rate, and RNA-processing kinetics. Recent studies have shown that a splicing-associated RNA production delay can be significant. To investigate this issue more generally, it is useful to develop methods applicable to genome-wide datasets. We introduce a joint model of transcriptional activation and mRNA accumulation that can be used for inference of transcription rate, RNA production delay, and degradation rate given data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a nonparametric statistical modeling approach allowing us to capture a broad range of activation kinetics, and we use Bayesian parameter estimation to quantify the uncertainty in estimates of the kinetic parameters. We apply the model to data from estrogen receptor α activation in the MCF-7 breast cancer cell line. We use RNA polymerase II ChIP-Seq time course data to characterize transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 min between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated production delays in many genes.

Digital Signal Processing | 2007

Blind separation of nonlinear mixtures by variational Bayesian learning

Antti Honkela; Harri Valpola; Alexander Ilin; Juha Karhunen

Blind separation of sources from nonlinear mixtures is a challenging and often ill-posed problem. We present three methods for solving this problem: an improved nonlinear factor analysis (NFA) method using a multilayer perceptron (MLP) network to model the nonlinearity, a hierarchical NFA (HNFA) method suitable for larger problems and a post-nonlinear NFA (PNFA) method for more restricted post-nonlinear mixtures. The methods are based on variational Bayesian learning, which provides the needed regularisation and allows for easy handling of missing data. While the basic methods are incapable of recovering the correct rotation of the source space, they can discover the underlying nonlinear manifold and allow reconstruction of the original sources using standard linear independent component analysis (ICA) techniques.