Simon Rogers | Researchain

Publication

Featured research published by Simon Rogers.

Journal of Computational Biology | 2003

Estimating Dataset Size Requirements for Classifying DNA Microarray Data

Sayan Mukherjee; Pablo Tamayo; Simon Rogers; Ryan Rifkin; Anna Engle; Colin Campbell; Todd R. Golub; Jill P. Mesirov

A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced. The goal is to use existing classification results to estimate dataset size requirements for future classification experiments and to evaluate the gain in accuracy and significance of classifiers built with additional data. The method is based on fitting inverse power-law models to construct empirical learning curves. It also includes a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. This procedure is applied to several molecular classification problems representing a broad spectrum of levels of complexity.
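The inverse power-law fit described in this abstract can be illustrated with a minimal sketch. It assumes a learning curve of the form e(n) = a * n^(-alpha) + b, where n is the training set size, e(n) the error rate, and b the asymptotic error. The function names, the grid search over alpha, and the synthetic numbers are illustrative assumptions, not the authors' procedure, and the permutation test is not shown.

```python
import math


def fit_inverse_power_law(sizes, errors, alphas=None):
    """Fit e(n) = a * n**(-alpha) + b by grid search over alpha.

    For a fixed alpha the model is linear in (a, b), so both
    coefficients follow from ordinary least squares in closed form.
    """
    if alphas is None:
        alphas = [k / 100 for k in range(1, 301)]  # 0.01 .. 3.00
    m = len(sizes)
    y_mean = sum(errors) / m
    best = None
    for alpha in alphas:
        x = [n ** (-alpha) for n in sizes]
        x_mean = sum(x) / m
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, errors))
        a = sxy / sxx
        b = y_mean - a * x_mean
        sse = sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, errors))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    _, a, alpha, b = best
    return a, alpha, b


def predicted_error(n, a, alpha, b):
    """Error rate the fitted curve predicts for n training samples."""
    return a * n ** (-alpha) + b


def size_for_error(target, a, alpha, b):
    """Smallest n whose predicted error is at most target, else None."""
    if a <= 0 or target <= b:
        return None  # the fitted curve never reaches the target
    return math.ceil((a / (target - b)) ** (1 / alpha))


# Synthetic error rates (illustrative only, not data from the paper).
sizes = [10, 20, 40, 80, 160]
errors = [0.9 * n ** -0.5 + 0.05 for n in sizes]
a, alpha, b = fit_inverse_power_law(sizes, errors)
print(a, alpha, b, size_for_error(0.10, a, alpha, b))
```

Extrapolating the fitted curve gives a rough estimate of how many samples a future experiment would need to reach a target error, provided the target lies above the asymptote b.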

Neural Computation | 2006

Variational Bayesian multinomial probit regression with Gaussian process priors

Mark A. Girolami; Simon Rogers

It is well known in the statistics literature that augmenting binary and polychotomous response models with gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation strategy, dispensing with priors over regression coefficients in favor of gaussian process (GP) priors over functions, and employing variational approximations to the full posterior, we obtain efficient computational methods for GP classification in the multiclass setting. The model augmentation with additional latent variables ensures full a posteriori class coupling while retaining the simple a priori independent GP covariance structure from which sparse approximations, such as multiclass informative vector machines (IVM), emerge in a natural and straightforward manner. This is the first time that a fully variational Bayesian treatment for multiclass GP classification has been developed without having to resort to additional explicit approximations to the nongaussian likelihood term. Empirical comparisons with exact analysis using Markov Chain Monte Carlo (MCMC) and Laplace approximations illustrate the utility of the variational approximation as a computationally economic alternative to full MCMC, and it is shown to be more accurate than the Laplace approximation.

Bioinformatics | 2008

Investigating the correspondence between transcriptomic and proteomic expression profiles using coupled cluster models

Simon Rogers; Mark Girolami; Walter Kolch; Katrina M. Waters; Tao Liu; Brian D. Thrall; H. Steven Wiley

Motivation: Modern transcriptomics and proteomics enable us to survey the expression of RNAs and proteins at large scales. While these data are usually generated and analyzed separately, there is an increasing interest in comparing and co-analyzing transcriptome and proteome expression data. A major open question is whether transcriptome and proteome expression is linked and how it is coordinated. Results: Here we have developed a probabilistic clustering model that permits analysis of the links between transcriptomic and proteomic profiles in a sensible and flexible manner. Our coupled mixture model defines a prior probability distribution over the component to which a protein profile should be assigned conditioned on which component the associated mRNA profile belongs to. We apply this approach to a large dataset of quantitative transcriptomic and proteomic expression data obtained from a human breast epithelial cell line (HMEC). The results reveal a complex relationship between transcriptome and proteome with most mRNA clusters linked to at least two protein clusters, and vice versa. A more detailed analysis incorporating information on gene function from the Gene Ontology database shows that a high correlation of mRNA and protein expression is limited to the components of some molecular machines, such as the ribosome, cell adhesion complexes and the TCP-1 chaperonin involved in protein folding. Availability: Matlab code is available from the authors on request.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005

The Latent Process Decomposition of cDNA Microarray Data Sets

Simon Rogers; Mark A. Girolami; Colin Campbell; Rainer Breitling

We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called Latent Process Decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, the ability to objectively assess the optimal number of sample clusters. Second, the ability to represent samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster and, thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To illustrate its wider applicability, we also demonstrate its performance on a microarray data set for yeast.

International Conference on Machine Learning | 2005

Hierarchic Bayesian models for kernel learning

Mark A. Girolami; Simon Rogers

The integration of diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can provide enhanced performance when compared to that obtained from any single data source. We present a Bayesian hierarchical model which enables kernel learning and present effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method. Matlab code replicating results reported is available at http://www.dcs.gla.ac.uk/~srogers/kernel_comb.html.

Bioinformatics | 2009

Probabilistic assignment of formulas to mass peaks in metabolomics experiments

Simon Rogers; Richard A. Scheltema; Mark A. Girolami; Rainer Breitling

Motivation: High-accuracy mass spectrometry is a popular technology for high-throughput measurements of cellular metabolites (metabolomics). One of the major challenges is the correct identification of the observed mass peaks, including the assignment of their empirical formula, based on the measured mass. Results: We propose a novel probabilistic method for the assignment of empirical formulas to mass peaks in high-throughput metabolomics mass spectrometry measurements. The method incorporates information about possible biochemical transformations between the empirical formulas to assign higher probability to formulas that could be created from other metabolites in the sample. In a series of experiments, we show that the method performs well and provides greater insight than assignments based on mass alone. In addition, we extend the model to incorporate isotope information to achieve even more reliable formula identification. Availability: A supplementary document, Matlab code, data and further information are available from http://www.dcs.gla.ac.uk/inference/metsamp.
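For context, the "mass alone" baseline that this abstract compares against can be sketched as follows. This is not the authors' method: it leaves out the biochemical transformation and isotope information entirely. It also assumes neutral monoisotopic masses, a Gaussian error model on the parts-per-million (ppm) scale, and a hypothetical `ppm_sigma` parameter.

```python
import math
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mass_only_probabilities(observed_mass, candidates, ppm_sigma=3.0):
    """Normalised Gaussian scores over candidates, using mass error only."""
    scores = {}
    for formula in candidates:
        theoretical = formula_mass(formula)
        ppm_error = (observed_mass - theoretical) / theoretical * 1e6
        scores[formula] = math.exp(-0.5 * (ppm_error / ppm_sigma) ** 2)
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("no candidate lies within a plausible mass error")
    return {formula: s / total for formula, s in scores.items()}


# Illustrative peak near the mass of C6H12O6, with two candidate formulas.
probs = mass_only_probabilities(180.0634, ["C6H12O6", "C9H8O4"])
print(probs)
```

When several candidate formulas fall within the instrument's mass accuracy, a mass-only score like this cannot separate them, which is the gap the transformation-aware model in the abstract is meant to address.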

Genes, Chromosomes and Cancer | 2004

Prognostic classification of relapsing favorable histology Wilms tumor using cDNA microarray expression profiling and support vector machines

Richard D. Williams; Sandra Hing; Braden T. Greer; Craig C. Whiteford; Jun S. Wei; Rachael Natrajan; Anna Kelsey; Simon Rogers; Colin Campbell; Kathy Pritchard-Jones; Javed Khan

Treatment of Wilms tumor has a high success rate, with some 85% of patients achieving long‐term survival. However, late effects of treatment and management of relapse remain significant clinical problems. If accurate prognostic methods were available, effective risk‐adapted therapies could be tailored to individual patients at diagnosis. Few molecular prognostic markers for Wilms tumor are currently defined, though previous studies have linked allele loss on 1p or 16q, genomic gain of 1q, and overexpression from 1q with an increased risk of relapse. To identify specific patterns of gene expression that are predictive of relapse, we used high‐density (30 k) cDNA microarrays to analyze RNA samples from 27 favorable histology Wilms tumors taken from primary nephrectomies at the time of initial diagnosis. Thirteen of these tumors relapsed within 2 years. Genes differentially expressed between the relapsing and nonrelapsing tumor classes were identified by statistical scoring (t test). These genes encode proteins with diverse molecular functions, including transcription factors, developmental regulators, apoptotic factors, and signaling molecules. Use of a support vector machine classifier, feature selection, and test evaluation using cross‐validation led to identification of a generalizable expression signature, a small subset of genes whose expression potentially can be used to predict tumor outcome in new samples. Similar methods were used to identify genes that are differentially expressed between tumors with and without genomic 1q gain. This set of discriminators was highly enriched in genes on 1q, indicating close agreement between data obtained from expression profiling with data from genomic copy number analyses.

BMC Bioinformatics | 2007

Bayesian model-based inference of transcription factor activity

Simon Rogers; Raya Khanin; Mark A. Girolami

Background: In many approaches to the inference and modeling of regulatory interactions using microarray data, the expression of the gene coding for the transcription factor is considered to be an accurate surrogate for the true activity of the protein it produces. There are many instances where this is inaccurate due to post-translational modifications of the transcription factor protein. Inference of the activity of the transcription factor from the expression of its targets has predominantly involved linear models that do not reflect the nonlinear nature of transcription. We extend a recent approach to inferring the transcription factor activity based on nonlinear Michaelis-Menten kinetics of transcription from maximum likelihood to fully Bayesian inference and give an example of how the model can be further developed. Results: We present results on synthetic and real microarray data. Additionally, we illustrate how gene and replicate specific delays can be incorporated into the model. Conclusion: We demonstrate that full Bayesian inference is appropriate in this application and has several benefits over the maximum likelihood approach, especially when the volume of data is limited. We also show the benefits of using a non-linear model over a linear model, particularly in the case of repression.

User Interface Software and Technology | 2012

A user-specific machine learning approach for improving touch accuracy on mobile devices

Daryl Weir; Simon Rogers; Roderick Murray-Smith; Markus Löchtefeld

We present a flexible Machine Learning approach for learning user-specific touch input models to increase touch accuracy on mobile devices. The model is based on flexible, non-parametric Gaussian Process regression and is learned using recorded touch inputs. We demonstrate that significant touch accuracy improvements can be obtained when either raw sensor data is used as an input or when the device's reported touch location is used as an input, with the latter marginally outperforming the former. We show that learned offset functions are highly nonlinear and user-specific and that user-specific models outperform models trained on data pooled from several users. Crucially, significant performance improvements can be obtained with a small (≈200) number of training examples, easily obtained for a particular user through a calibration game or from keyboard entry data.

Human Factors in Computing Systems | 2011

AnglePose: robust, precise capacitive touch tracking via 3D orientation estimation

Simon Rogers; John Williamson; Craig D. Stewart; Roderick Murray-Smith

We present a finger-tracking system for touch-based interaction which can track 3D finger angle in addition to position, using low-resolution conventional capacitive sensors, therefore compensating for the inaccuracy due to pose variation in conventional touch systems. Probabilistic inference about the pose of the finger is carried out in real-time using a particle filter; this results in an efficient and robust pose estimator which also gives appropriate uncertainty estimates. We show empirically that tracking the full pose of the finger results in greater accuracy in pointing tasks with small targets than competitive techniques. Our model can detect and cope with different finger sizes and the use of either fingers or thumbs, bringing a significant potential for improvement in one-handed interaction with touch devices. In addition to the gain in accuracy we also give examples of how this technique could open up the space of novel interactions.
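To make the particle filter idea concrete, here is a minimal bootstrap particle filter in one dimension. It is only a toy sketch: the scalar state, the random-walk motion model, the Gaussian observation model, and all parameter values are assumptions made for illustration. The system in the abstract tracks full 3D finger pose from capacitive sensor readings.

```python
import math
import random


def particle_filter_step(particles, observation, rng,
                         process_std=0.1, obs_std=0.5):
    """One predict/weight/resample cycle of a bootstrap particle filter.

    Returns (resampled_particles, posterior_mean, posterior_std).
    """
    # Predict: diffuse each particle with random-walk process noise.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]

    # Weight: Gaussian likelihood of the observation under each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
        total = float(len(predicted))
    weights = [w / total for w in weights]

    # Summarise the weighted posterior: mean and standard deviation.
    mean = sum(w * p for w, p in zip(weights, predicted))
    var = sum(w * (p - mean) ** 2 for w, p in zip(weights, predicted))

    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, mean, math.sqrt(var)


def track(observations, n_particles=500, seed=0):
    """Run the filter over a sequence and return the final (mean, std)."""
    rng = random.Random(seed)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    mean, std = 0.0, 0.0
    for z in observations:
        particles, mean, std = particle_filter_step(particles, z, rng)
    return mean, std


# Noisy readings of a hidden value fixed at 2.0 (synthetic data).
obs_rng = random.Random(1)
observations = [2.0 + obs_rng.gauss(0.0, 0.5) for _ in range(50)]
print(track(observations))
```

The spread of the weighted particles gives an uncertainty estimate alongside the point estimate, which is the property the abstract highlights.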
