Christopher Holmes
University of Oxford
Publication
Featured research published by Christopher Holmes.
Nucleic Acids Research | 2007
Stefano Colella; Christopher Yau; Jennifer M. Taylor; Ghazala Mirza; Helen Butler; Penny Clouston; Anne S. Bassett; Anneke Seller; Christopher Holmes; Jiannis Ragoussis
Array-based technologies have been used to detect chromosomal copy number changes (aneuploidies) in the human genome. Recent studies identified numerous copy number variants (CNV) and some are common polymorphisms that may contribute to disease susceptibility. We developed, and experimentally validated, a novel computational framework (QuantiSNP) for detecting regions of copy number variation from BeadArray™ SNP genotyping data using an Objective Bayes Hidden-Markov Model (OB-HMM). Objective Bayes measures are used to set certain hyperparameters in the priors using a novel re-sampling framework to calibrate the model to a fixed Type I (false positive) error rate. Other parameters are set via maximum marginal likelihood to prior training data of known structure. QuantiSNP provides probabilistic quantification of state classifications and significantly improves the accuracy of segmental aneuploidy identification and mapping, relative to existing analytical tools (Beadstudio, Illumina), as demonstrated by validation of breakpoint boundaries. QuantiSNP identified both novel and validated CNVs. QuantiSNP was developed using BeadArray™ SNP data but it can be adapted to other platforms and we believe that the OB-HMM framework has widespread applicability in genomic research. In conclusion, QuantiSNP is a novel algorithm for high-resolution CNV/aneuploidy detection with application to clinical genetics, cancer and disease association studies.
Nature Genetics | 2013
Claire Palles; Jean-Baptiste Cazier; Kimberley Howarth; Enric Domingo; Angela Jones; Peter Broderick; Zoe Kemp; Sarah L. Spain; Estrella Guarino; Israel Salguero; Amy Sherborne; Daniel Chubb; Luis Carvajal-Carmona; Yusanne Ma; Kulvinder Kaur; Sara E. Dobbins; Ella Barclay; Maggie Gorman; Lynn Martin; Michal Kovac; Sean Humphray; Anneke Lucassen; Christopher Holmes; David R. Bentley; Peter Donnelly; Jenny C. Taylor; Christos Petridis; Rebecca Roylance; Elinor Sawyer; David Kerr
Many individuals with multiple or large colorectal adenomas or early-onset colorectal cancer (CRC) have no detectable germline mutations in the known cancer predisposition genes. Using whole-genome sequencing, supplemented by linkage and association analysis, we identified specific heterozygous POLE or POLD1 germline variants in several multiple-adenoma and/or CRC cases but in no controls. The variants associated with susceptibility, POLE p.Leu424Val and POLD1 p.Ser478Asn, have high penetrance, and POLD1 mutation was also associated with endometrial cancer predisposition. The mutations map to equivalent sites in the proofreading (exonuclease) domain of DNA polymerases ɛ and δ and are predicted to cause a defect in the correction of mispaired bases inserted during DNA replication. In agreement with this prediction, the tumors from mutation carriers were microsatellite stable but tended to acquire base substitution mutations, as confirmed by yeast functional assays. Further analysis of published data showed that the recently described group of hypermutant, microsatellite-stable CRCs is likely to be caused by somatic POLE mutations affecting the exonuclease domain.
Statistical Science | 2005
Ajay Jasra; Christopher Holmes; David A. Stephens
In the past ten years there has been a dramatic increase of interest in the Bayesian analysis of finite mixture models. This is primarily because of the emergence of Markov chain Monte Carlo (MCMC) methods. While MCMC provides a convenient way to draw inference from complicated statistical models, there are many, perhaps underappreciated, problems associated with the MCMC analysis of mixtures. The problems are mainly caused by the nonidentifiability of the components under symmetric priors, which leads to so-called label switching in the MCMC output. This means that ergodic averages of component-specific quantities will be identical and thus useless for inference. We review the solutions to the label switching problem, such as artificial identifiability constraints, relabelling algorithms and label invariant loss functions. We also review various MCMC sampling schemes that have been suggested for mixture models and discuss posterior sensitivity to prior specification.
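The label switching effect described in this abstract can be illustrated with a small sketch. It does not run a real MCMC sampler: the "draws" are synthetic pairs of component means whose labels are swapped at random, which is an assumption made purely for illustration. The function names and the values -2.0 and 3.0 are hypothetical. The sketch shows that the raw ergodic averages of the two components nearly coincide, and that an artificial identifiability constraint (ordering the means within each draw) separates them again.

```python
import random

def synthetic_draws(n_draws=2000, seed=0):
    """Synthetic stand-in for MCMC output: two component means per draw,
    with the labels swapped at random (this mimics label switching)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        pair = [rng.gauss(-2.0, 0.1), rng.gauss(3.0, 0.1)]
        if rng.random() < 0.5:  # random label switch
            pair.reverse()
        draws.append(tuple(pair))
    return draws

def column_means(draws):
    """Ergodic average of each labelled component across all draws."""
    n = len(draws)
    return tuple(sum(d[k] for d in draws) / n for k in range(len(draws[0])))

def relabel_by_order(draws):
    """Artificial identifiability constraint: enforce mu_1 < mu_2 in every draw."""
    return [tuple(sorted(d)) for d in draws]

draws = synthetic_draws()
raw_means = column_means(draws)                      # both close to each other
fixed_means = column_means(relabel_by_order(draws))  # close to (-2, 3)
print("raw averages:       ", raw_means)
print("after ordering rule:", fixed_means)
```

An ordering constraint works here because the two synthetic components are well separated; with overlapping components it can distort the posterior, which is one reason relabelling algorithms and label invariant loss functions are also considered.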
PLOS Medicine | 2008
Adaikalavan Ramasamy; Adrian Mondry; Christopher Holmes; Douglas G. Altman
Adaikalavan Ramasamy and colleagues outline seven key issues and suggest a stepwise approach in conducting a meta-analysis of microarray datasets.
Bayesian Analysis | 2006
Christopher Holmes; Leonhard Held
In this paper we discuss auxiliary variable approaches to Bayesian binary and multinomial regression. These approaches are ideally suited to automated Markov chain Monte Carlo simulation. In the first part we describe a simple technique using joint updating that improves the performance of the conventional probit regression algorithm. In the second part we discuss auxiliary variable methods for inference in Bayesian logistic regression, including covariate set uncertainty. Finally, we show how the logistic method is easily extended to multinomial regression models. All of the algorithms are fully automatic with no user set parameters and no necessary Metropolis-Hastings accept/reject steps.
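As a rough illustration of the auxiliary variable idea for probit regression, the sketch below implements a standard data-augmentation Gibbs sampler for a single coefficient. It is not the joint-updating scheme from the paper, only the conventional style of algorithm the abstract refers to. The simulated data, the prior variance, the iteration counts and all function names are assumptions chosen for the example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_truncated(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive) or z <= 0,
    using the inverse-CDF method."""
    cut = STD_NORMAL.cdf(-mean)  # P(z <= 0)
    lo, hi = (cut, 1.0) if positive else (0.0, cut)
    u = lo + (hi - lo) * rng.random()
    # Keep u inside the open interval (0, 1) required by inv_cdf.
    # For extreme means this clamp makes the truncation only approximate.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)

def gibbs_probit(x, y, n_iter=600, prior_var=100.0, seed=1):
    """Gibbs sampler for y_i = 1[z_i > 0], z_i ~ N(x_i * beta, 1),
    with prior beta ~ N(0, prior_var). Returns the draws of beta."""
    rng = random.Random(seed)
    precision = 1.0 / prior_var + sum(xi * xi for xi in x)
    beta, draws = 0.0, []
    for _ in range(n_iter):
        # Step 1: sample the auxiliary variables given beta and the labels.
        z = [draw_truncated(xi * beta, yi == 1, rng) for xi, yi in zip(x, y)]
        # Step 2: sample beta from its Gaussian full conditional given z.
        mean = sum(xi * zi for xi, zi in zip(x, z)) / precision
        beta = rng.gauss(mean, precision ** -0.5)
        draws.append(beta)
    return draws

def simulate(n=300, beta=1.0, seed=0):
    """Simulate probit data with one covariate and a known coefficient."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if xi * beta + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
    return x, y

x, y = simulate()
draws = gibbs_probit(x, y)
kept = draws[100:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print("posterior mean of beta:", posterior_mean)
```

Both conditional draws are available in closed form, so the sampler needs no tuning parameters and no Metropolis-Hastings accept/reject step, which matches the "fully automatic" property highlighted in the abstract.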
Archive | 2003
David D. Denison; Mark Hansen; Christopher Holmes; Bani K. Mallick; Bin Yu
Nonlinear Classification * Approximation Theory and Signal Processing * Modeling of Complex Objects * Splines Gaussian Processes and Support Vector Machines * Case Studies * Theory * Machine Learning and Optimization * Future Directions
Journal of Computational and Graphical Statistics | 2010
Anthony Lee; Christopher Yau; Michael B. Giles; Arnaud Doucet; Christopher Holmes
We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multicore processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modeling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. This article has supplementary material online.
Journal of the American Statistical Association | 2006
Nicholas A. Heard; Christopher Holmes; David A. Stephens
Malaria represents one of the major worldwide challenges to public health. A recent breakthrough in the study of the disease follows the annotation of the genome of the malaria parasite Plasmodium falciparum and the mosquito vector (an organism that spreads an infectious disease) Anopheles. Of particular interest is the molecular biology underlying the immune response system of Anopheles, which actively fights against Plasmodium infection. This article reports a statistical analysis of gene expression time profiles from mosquitoes that have been infected with a bacterial agent. Specifically, we introduce a Bayesian model-based hierarchical clustering algorithm for curve data to investigate mechanisms of regulation in the genes concerned; that is, we aim to cluster genes having similar expression profiles. Genes displaying similar, interesting profiles can then be highlighted for further investigation by the experimenter. We show how our approach reveals structure within the data not captured by other approaches. One of the most pertinent features of the data is the sample size, which records the expression levels of 2,771 genes at 6 time points. Additionally, the time points are unequally spaced, and there is expected nonstationary behavior in the gene profiles. We demonstrate our approach to be readily implementable under these conditions, and highlight some crucial computational savings that can be made in the context of a fully Bayesian analysis.
Nature Biotechnology | 2013
Quin F. Wills; Kenneth J. Livak; Alex J. Tipping; Tariq Enver; Andrew Goldson; Darren W. Sexton; Christopher Holmes
Gene expression in multiple individual cells from a tissue or culture sample varies according to cell-cycle, genetic, epigenetic and stochastic differences between the cells. However, single-cell differences have been largely neglected in the analysis of the functional consequences of genetic variation. Here we measure the expression of 92 genes affected by Wnt signaling in 1,440 single cells from 15 individuals to associate single-nucleotide polymorphisms (SNPs) with gene-expression phenotypes, while accounting for stochastic and cell-cycle differences between cells. We provide evidence that many heritable variations in gene function--such as burst size, burst frequency, cell cycle-specific expression and expression correlation/noise between cells--are masked when expression is averaged over many cells. Our results demonstrate how single-cell analyses provide insights into the mechanistic and network effects of genetic variability, with improved statistical power to model these effects on gene expression.
Genome Biology | 2010
Christopher Yau; Dmitri Mouradov; Robert N. Jorissen; Stefano Colella; Ghazala Mirza; Graham Steers; Adrian L. Harris; Jiannis Ragoussis; Oliver M. Sieber; Christopher Holmes
We describe a statistical method for the characterization of genomic aberrations in single nucleotide polymorphism microarray data acquired from cancer genomes. Our approach allows us to model the joint effect of polyploidy, normal DNA contamination and intra-tumour heterogeneity within a single unified Bayesian framework. We demonstrate the efficacy of our method on numerous datasets including laboratory generated mixtures of normal-cancer cell lines and real primary tumours.