Kieran R. Campbell
University of Oxford
Publication
Featured research published by Kieran R. Campbell.
Bioinformatics | 2017
Davis J. McCarthy; Kieran R. Campbell; Aaron T. L. Lun; Quin F. Wills
Motivation: Single‐cell RNA sequencing (scRNA‐seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre‐processing, quality control (QC) and normalization. Results: We have developed the R/Bioconductor package scater to facilitate rigorous pre‐processing, quality control, normalization and visualization of scRNA‐seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high‐quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single‐cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. Availability and Implementation: The open‐source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
PLOS Computational Biology | 2016
Kieran R. Campbell; Christopher Yau
Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a ‘pseudotime’ where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference.
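The idea of propagating pseudotime uncertainty into a downstream statistic can be illustrated with a minimal sketch. This is not the authors' model: the data are simulated, the posterior draws are faked by adding Gaussian noise to a known ordering, and a plain Pearson correlation stands in for a differential expression test. All names (`pearson`, `propagate`, `draws`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def propagate(expression, pseudotime_draws):
    """Recompute the statistic once per posterior draw and return the sorted values."""
    return sorted(pearson(draw, expression) for draw in pseudotime_draws)


# Hypothetical data: 50 cells and 200 stand-in posterior draws of their pseudotimes.
rng = random.Random(0)
truth = [rng.random() for _ in range(50)]
expression = [2.0 * t + rng.gauss(0.0, 0.3) for t in truth]
draws = [[t + rng.gauss(0.0, 0.15) for t in truth] for _ in range(200)]

# Point estimate: the posterior mean pseudotime of each cell.
point = [sum(d[i] for d in draws) / len(draws) for i in range(50)]

stats = propagate(expression, draws)
lower = stats[int(0.025 * len(stats))]
upper = stats[int(0.975 * len(stats)) - 1]
print(f"statistic at the point estimate: {pearson(point, expression):.2f}")
print(f"approximate 95% interval over draws: [{lower:.2f}, {upper:.2f}]")
```

Reporting the spread of the statistic over draws, rather than its single value at the point estimate, is what makes the uncertainty in the ordering visible downstream.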
bioRxiv | 2015
Kieran R. Campbell; Chris P. Ponting; Caleb Webber
Advances in RNA-seq technologies provide unprecedented insight into the variability and heterogeneity of gene expression at the single-cell level. However, such data offer only a snapshot of the transcriptome, whereas it is often the progression of cells through dynamic biological processes that is of interest. As a result, one outstanding challenge is to infer such progressions by ordering gene expression from single-cell data alone, known as the cell ordering problem. Here, we introduce a new method that constructs a low-dimensional non-linear embedding of the data using Laplacian eigenmaps before assigning each cell a pseudotime using principal curves. We characterise, on a theoretical level, why our method is more robust to the high levels of noise typical of single-cell RNA-seq data, before demonstrating its utility on two existing datasets of differentiating cells.
Wellcome Open Research | 2017
Kieran R. Campbell; Christopher Yau
Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.
bioRxiv | 2016
Kieran R. Campbell; Christopher Yau
Pseudotime estimation algorithms for single-cell molecular profiling allow the recovery of temporal information from otherwise static profiles of individual cells. This pseudotemporal information can be used to characterise transient events in temporally evolving biological systems. Conventional algorithms typically employ an unsupervised approach that does not model any explicit gene behaviours, making them hard to apply in the context of strong prior knowledge. Our method, Ouija, takes an alternative approach to pseudotime by explicitly focusing on switch-like marker genes that are ordinarily used to confirm the accuracy of unsupervised pseudotime algorithms. Instead of using the marker genes for confirmation, we derive pseudotimes directly from the marker genes. We show that in many instances a small panel of marker genes can recover pseudotimes that are consistent with those obtained using the whole transcriptome. Ouija therefore provides a powerful complementary approach to existing pseudotime estimation methods.
Pseudotime estimation from single-cell gene expression allows the recovery of temporal information from otherwise static profiles of individual cells. This pseudotemporal information can be used to characterise transient events in temporally evolving biological systems. Conventional algorithms typically emphasise an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. Here we introduce an orthogonal approach termed “Ouija” that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time, the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. In the following we introduce our model and demonstrate that in many instances a small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify “metastable” states - discrete cell types along the continuous trajectories - that recapitulate known cell types. Ouija therefore provides a powerful complementary approach to existing whole-transcriptome-based pseudotime estimation methods. An open source implementation is available at http://www.github.com/kieranrcampbell/ouija as an R package and at http://www.github.com/kieranrcampbell/ouijaflow as a Python/TensorFlow package.
bioRxiv | 2015
Kieran R. Campbell; Christopher Yau
Single-cell genomics has revolutionised modern biology while requiring the development of advanced computational and statistical methods. Advances have been made in uncovering gene expression heterogeneity, discovering new cell types and novel identification of genes and transcription factors involved in cellular processes. One such approach to the analysis is to construct pseudotime orderings of cells as they progress through a particular biological process, such as cell-cycle or differentiation. These methods assign a score - known as the pseudotime - to each cell as a surrogate measure of progression. However, all published methods to date are purely algorithmic and lack any way to give uncertainty to the pseudotime assigned to a cell. Here we present a method that combines Gaussian Process Latent Variable Models (GP-LVM) with a recently published electroGP prior to perform Bayesian inference on the pseudotimes. We go on to show that the posterior variability in these pseudotimes leads to nontrivial uncertainty in the pseudo-temporal ordering of the cells and that pseudotimes should not be thought of as point estimates.
Bioinformatics | 2016
Kieran R. Campbell; Christopher Yau
Motivation: Pseudotime analyses of single‐cell RNA‐seq data have become increasingly common. Typically, a latent trajectory corresponding to a biological process of interest—such as differentiation or cell cycle—is discovered. However, relatively little attention has been paid to modelling the differential expression of genes along such trajectories. Results: We present switchde, a statistical framework and accompanying R package for identifying switch‐like differential expression of genes along pseudotemporal trajectories. Our method includes fast model fitting that provides interpretable parameter estimates corresponding to how quickly a gene is up or down regulated as well as where in the trajectory such regulation occurs. It also reports a P‐value in favour of rejecting a constant‐expression model for switch‐like differential expression and optionally models the zero‐inflation prevalent in single‐cell data. Availability and Implementation: The R package switchde is available through the Bioconductor project at https://bioconductor.org/packages/switchde. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
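A switch-like test of this kind can be sketched in a few lines. The sketch below is not the switchde implementation: it assumes a sigmoidal mean of the form 2·mu0 / (1 + exp(-k (t - t0))), Gaussian noise, pseudotime scaled to [0, 1], and a coarse grid search in place of a proper optimiser, and it ignores zero-inflation. The function names are hypothetical. The P-value uses a chi-squared approximation with 2 degrees of freedom (the sigmoid adds `k` and `t0` to the constant model), which should be read as a rough guide only.

```python
import math
import random


def switch_mean(t, mu0, k, t0):
    """Sigmoidal mean expression: k sets speed and direction, t0 the switch time."""
    return 2.0 * mu0 / (1.0 + math.exp(-k * (t - t0)))


def gaussian_loglik(y, means):
    """Gaussian log-likelihood with the noise variance profiled out (RSS / n)."""
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, means))
    sigma2 = max(rss / n, 1e-12)  # guard against a perfect fit
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def fit_switch(t, y):
    """Coarse grid search over (k, t0); mu0 has a closed-form least-squares value."""
    best = None
    for k in range(-20, 21):  # k = 0 reproduces the constant-expression model
        for step in range(21):
            t0 = step / 20.0  # pseudotime assumed scaled to [0, 1]
            g = [switch_mean(ti, 1.0, k, t0) for ti in t]
            mu0 = sum(yi * gi for yi, gi in zip(y, g)) / sum(gi * gi for gi in g)
            ll = gaussian_loglik(y, [mu0 * gi for gi in g])
            if best is None or ll > best["loglik"]:
                best = {"mu0": mu0, "k": k, "t0": t0, "loglik": ll}
    return best


def switch_test(t, y):
    """Likelihood-ratio test of the sigmoid against constant expression."""
    fit = fit_switch(t, y)
    mean_y = sum(y) / len(y)
    ll_const = gaussian_loglik(y, [mean_y] * len(y))
    stat = max(2.0 * (fit["loglik"] - ll_const), 0.0)
    fit["p_value"] = math.exp(-stat / 2.0)  # chi-squared survival function, 2 d.o.f.
    return fit


# Simulated gene that switches on halfway along the trajectory.
rng = random.Random(1)
pseudotime = [i / 99.0 for i in range(100)]
expression = [switch_mean(ti, 2.0, 10.0, 0.5) + rng.gauss(0.0, 0.2) for ti in pseudotime]
result = switch_test(pseudotime, expression)
print(result["k"], result["t0"], result["p_value"])
```

In this parameterisation the sign and size of `k` describe how quickly the gene is up- or down-regulated, and `t0` gives the point on the trajectory where the regulation occurs.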
bioRxiv | 2017
Kieran R. Campbell; Christopher Yau
Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets, allowing dynamic biological processes to be studied in situations where the collection of genuine time series data is challenging or prohibitive. Computational techniques have arisen from areas such as single-cell ‘omics and in cancer modelling where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically assume homogeneous genetic and environmental backgrounds, which becomes particularly limiting as datasets grow in size and complexity. As a solution to this we describe a novel statistical framework that learns pseudotime trajectories in the presence of non-homogeneous genetic, phenotypic, or environmental backgrounds. We demonstrate that this enables us to identify interactions between such factors and the underlying genomic trajectory. By applying this model to both single-cell gene expression data and population level cancer studies we show that it uncovers known and novel interaction effects between genetic and environmental factors and the expression of genes in pathways. We provide an R implementation of our method PhenoPath at https://github.com/kieranrcampbell/phenopath
Nature Communications | 2018
Kieran R. Campbell; Christopher Yau
Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets allowing dynamic biological processes to be studied in situations where the collection of time series data is challenging or prohibitive. Computational techniques have arisen from single-cell ‘omics and cancer modelling where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically implicitly assume homogeneous genetic, phenotypic or environmental backgrounds, which becomes limiting as data sets grow in size and complexity. We describe a novel statistical framework that learns how pseudotime trajectories can be modulated through covariates that encode such factors. We apply this model to both single-cell and bulk gene expression data sets and show that the approach can recover known and novel covariate-pseudotime interaction effects. This hybrid regression-latent variable model framework extends pseudotemporal modelling from its most prevalent area of single cell genomics to wider applications.
Cross-sectional omic data often have non-homogeneous genetic, phenotypic, or environmental backgrounds. Here, the authors develop a statistical framework to infer pseudotime trajectories in the presence of such factors as well as their interactions in both single-cell and bulk gene expression analysis.
Bioinformatics | 2018
Kieran R. Campbell; Christopher Yau
Motivation: Pseudotime estimation from single‐cell gene expression data allows the recovery of temporal information from otherwise static profiles of individual cells. Conventional pseudotime inference methods emphasize an unsupervised transcriptome‐wide approach and use retrospective analysis to evaluate the behaviour of individual genes. However, the resulting trajectories can only be understood in terms of abstract geometric structures and not in terms of interpretable models of gene behaviour. Results: Here we introduce an orthogonal Bayesian approach termed ‘Ouija’ that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch‐like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. We demonstrate that this small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify ‘metastable’ states—discrete cell types along the continuous trajectories—that recapitulate known cell types. Availability and implementation: An open source implementation is available as an R package at http://www.github.com/kieranrcampbell/ouija and as a Python/TensorFlow package at http://www.github.com/kieranrcampbell/ouijaflow. Supplementary information: Supplementary data are available at Bioinformatics online.