Davis J. McCarthy
European Bioinformatics Institute
Publications
Featured research published by Davis J. McCarthy.
Genome Medicine | 2014
Davis J. McCarthy; Peter Humburg; Alexander Kanapin; Manuel A. Rivas; Kyle J. Gaulton; Jean-Baptiste Cazier; Peter Donnelly
Background: Variant annotation is a crucial step in the analysis of genome sequencing data. Functional annotation results can have a strong influence on the ultimate conclusions of disease studies. Incorrect or incomplete annotations can cause researchers both to overlook potentially disease-relevant DNA variants and to dilute interesting variants in a pool of false positives. Researchers are aware of these issues in general, but the extent of the dependency of final results on the choice of transcripts and software used for annotation has not been quantified in detail. Methods: This paper quantifies the extent of differences in annotation of 80 million variants from a whole-genome sequencing study. We compare results using the RefSeq and Ensembl transcript sets as the basis for variant annotation with the software Annovar, and also compare the results from two annotation software packages, Annovar and VEP (Ensembl’s Variant Effect Predictor), when using Ensembl transcripts. Results: We found only 44% agreement in annotations for putative loss-of-function variants when using the RefSeq and Ensembl transcript sets as the basis for annotation with Annovar. The rate of matching annotations for loss-of-function and nonsynonymous variants combined was 79% and for all exonic variants it was 83%. When comparing results from Annovar and VEP using Ensembl transcripts, matching annotations were seen for only 65% of loss-of-function variants and 87% of all exonic variants, with splicing variants revealed as the category with the greatest discrepancy. Using these comparisons, we characterised the types of apparent errors made by Annovar and VEP and discuss their impact on the analysis of DNA variants in genome sequencing studies. Conclusions: Variant annotation is not yet a solved problem. Choice of transcript set can have a large effect on the ultimate variant annotations obtained in a whole-genome sequencing study. Choice of annotation software can also have a substantial effect. The annotation step in the analysis of a genome sequencing study must therefore be considered carefully, and a conscious choice made as to which transcript set and software are used for annotation.
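The agreement rates above compare category labels assigned to the same variants by two annotations. The exact matching rule used in the study is not given here, so this Python sketch assumes one plausible definition: among variants that either annotation places in a category of interest (for example putative loss-of-function), count the fraction whose labels agree exactly. The function name `category_agreement`, the consequence labels and the toy variants are hypothetical.

```python
from typing import Dict, Set


def category_agreement(ann_a: Dict[str, str], ann_b: Dict[str, str],
                       categories: Set[str]) -> float:
    """Fraction of in-scope variants whose labels match exactly.

    Assumed definition: a variant is in scope if both sources annotate it and
    at least one source assigns it a label in `categories`.
    """
    shared = ann_a.keys() & ann_b.keys()
    in_scope = [v for v in shared
                if ann_a[v] in categories or ann_b[v] in categories]
    if not in_scope:
        return float("nan")  # no variant falls in the requested categories
    matches = sum(1 for v in in_scope if ann_a[v] == ann_b[v])
    return matches / len(in_scope)


# Hypothetical annotations of the same four variants under two transcript sets.
refseq = {"v1": "stop_gained", "v2": "missense_variant",
          "v3": "splice_site", "v4": "synonymous_variant"}
ensembl = {"v1": "stop_gained", "v2": "missense_variant",
           "v3": "intron_variant", "v4": "synonymous_variant"}
LOF = {"stop_gained", "frameshift_variant", "splice_site"}

print(category_agreement(refseq, ensembl, LOF))  # 0.5
```

Scoping the denominator to variants flagged by either source (rather than by both) is what makes a discordant call such as `v3` count against agreement; a different scoping rule would give different percentages.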
Bioinformatics | 2017
Davis J. McCarthy; Kieran R. Campbell; Aaron T. L. Lun; Quin F. Wills
Motivation: Single‐cell RNA sequencing (scRNA‐seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre‐processing, quality control (QC) and normalization. Results: We have developed the R/Bioconductor package scater to facilitate rigorous pre‐processing, quality control, normalization and visualization of scRNA‐seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high‐quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single‐cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. Availability and Implementation: The open‐source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Supplementary information: Supplementary data are available at Bioinformatics online.
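scater itself is an R/Bioconductor package, and its functions are not reproduced here. As an illustration of the kind of per-cell QC the abstract describes, this Python sketch computes three common metrics (total counts, number of detected genes, percentage of counts from mitochondrial genes) and flags cells with unusually small libraries using a median-absolute-deviation (MAD) threshold on the log scale. The function names, the `nmads=3` default and the MAD-based rule are assumptions chosen for illustration.

```python
import numpy as np


def per_cell_qc(counts: np.ndarray, is_mito: np.ndarray):
    """Per-cell QC metrics for a genes x cells count matrix.

    Assumes every cell has at least one count (otherwise pct_mito divides by zero).
    """
    total = counts.sum(axis=0)                    # library size per cell
    detected = (counts > 0).sum(axis=0)           # genes with non-zero counts per cell
    pct_mito = 100.0 * counts[is_mito].sum(axis=0) / total
    return total, detected, pct_mito


def low_outliers(x: np.ndarray, nmads: float = 3.0) -> np.ndarray:
    """Flag values more than `nmads` scaled MADs below the median of log1p(x)."""
    v = np.log1p(x)
    med = np.median(v)
    mad = 1.4826 * np.median(np.abs(v - med))     # scaled to match the SD under normality
    return v < med - nmads * mad


counts = np.array([[10, 0],
                   [5, 2],
                   [5, 2]])                       # 3 genes x 2 cells (toy data)
is_mito = np.array([False, False, True])          # hypothetical mitochondrial-gene mask
total, detected, pct_mito = per_cell_qc(counts, is_mito)
print(total, detected, pct_mito)
print(low_outliers(np.array([100, 120, 90, 110, 95, 5])))
```

A data-driven threshold of this kind adapts to each dataset's typical library size, which avoids choosing a fixed minimum count in advance.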
Nature | 2017
Helena Kilpinen; Angela Goncalves; Andreas Leha; Vackar Afzal; Kaur Alasoo; Sofie Ashford; Sendu Bala; Dalila Bensaddek; Francesco Paolo Casale; Oliver J. Culley; Petr Danecek; Adam Faulconbridge; Peter W. Harrison; Annie Kathuria; Davis J. McCarthy; Shane McCarthy; Ruta Meleckyte; Yasin Memari; Nathalie Moens; Filipa Soares; Alice L. Mann; Ian Streeter; Chukwuma A. Agu; Alex Alderton; Rachel Nelson; Sarah Harper; Minal Patel; Alistair White; Sharad R Patel; Laura Clarke
Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5–46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. In addition, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.
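The finding that 5–46% of phenotypic variation arises from differences between individuals is a variance-component estimate. The study's own models are not described here; as a simpler stand-in, this sketch uses the classical one-way random-effects ANOVA (method-of-moments) estimator for a balanced design in which each donor contributes the same number of iPS cell lines. The function name and the toy measurements are hypothetical.

```python
import numpy as np


def between_donor_fraction(x) -> float:
    """Fraction of variance attributable to donors (one-way random-effects ANOVA).

    `x` is a donors x lines array: a balanced design with k >= 2 lines per donor.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    donor_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-donor and within-donor mean squares.
    msb = k * np.sum((donor_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - donor_means[:, None]) ** 2) / (n * (k - 1))
    # Method-of-moments estimate of the donor variance, truncated at zero.
    var_donor = max((msb - msw) / k, 0.0)
    total = var_donor + msw
    return float(var_donor / total) if total > 0 else 0.0


# Hypothetical phenotype measured on two lines from each of three donors.
phenotype = [[1.0, 1.2],
             [3.0, 3.2],
             [5.0, 5.2]]
print(round(between_donor_fraction(phenotype), 3))  # 0.995
```

Mixed models generalise this estimator to unbalanced designs and to several random effects at once; the closed form here only holds when every donor has the same number of lines.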
F1000Research | 2016
Aaron T. L. Lun; Davis J. McCarthy; John C. Marioni
Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.
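Two of the steps listed above, normalization and identification of highly variable genes, can be illustrated in a few lines. The workflow itself relies on dedicated Bioconductor methods; this Python sketch deliberately substitutes the simplest versions (library-size scaling factors, a log2 transform with a pseudocount of 1, and ranking genes by the variance of log-expression) to show the shape of the computation. All function names are hypothetical.

```python
import numpy as np


def log_normalize(counts: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Scale each cell by a library-size factor, then log2-transform.

    Size factors are library sizes divided by their mean, so they average to 1.
    """
    lib_size = counts.sum(axis=0)
    size_factors = lib_size / lib_size.mean()
    return np.log2(counts / size_factors + pseudocount)  # divides each column by its factor


def top_variable_genes(logexpr: np.ndarray, n_top: int) -> list:
    """Indices of the n_top genes with the largest variance of log-expression."""
    variances = logexpr.var(axis=1, ddof=1)
    return np.argsort(variances)[::-1][:n_top].tolist()


counts = np.array([[10, 20],
                   [5, 10],
                   [0, 30]])          # 3 genes x 2 cells (toy data)
print(top_variable_genes(log_normalize(counts), 2))  # [2, 0]
```

Raw variance of log-expression ignores the dependence of technical noise on mean expression; dedicated single-cell methods model that trend instead of ranking on variance alone.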
Genome Biology | 2017
Florian Buettner; Naruemon Pratanwanich; Davis J. McCarthy; John C. Marioni; Oliver Stegle
Single-cell RNA-sequencing (scRNA-seq) allows studying heterogeneity in gene expression in large cell populations. Such heterogeneity can arise due to technical or biological factors, making decomposing sources of variation difficult. We here describe f-scLVM (factorial single-cell latent variable model), a method based on factor analysis that uses pathway annotations to guide the inference of interpretable factors underpinning the heterogeneity. Our model jointly estimates the relevance of individual factors, refines gene set annotations, and infers factors without annotation. In applications to multiple scRNA-seq datasets, we find that f-scLVM robustly decomposes scRNA-seq datasets into interpretable components, thereby facilitating the identification of novel subpopulations.
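f-scLVM is a factorial latent variable model with its own inference scheme, which is not reproduced here. To make the idea of an annotation-guided factor concrete, this sketch uses a much simpler stand-in: for one annotated gene set, it takes the first principal component of the centred expression of that set's genes as a per-cell factor score. Unlike f-scLVM, it does not estimate factor relevance, refine annotations or infer unannotated factors. The function name and toy data are hypothetical.

```python
import numpy as np


def gene_set_score(expr: np.ndarray, gene_idx) -> np.ndarray:
    """Per-cell score for one gene set: PC1 of the set's centred expression.

    `expr` is genes x cells; the sign of the score is arbitrary.
    """
    sub = expr[gene_idx]                                 # genes in the set x cells
    centred = sub - sub.mean(axis=1, keepdims=True)      # centre each gene across cells
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    return s[0] * vt[0]                                  # cell coordinates on PC1


# Toy data: three annotated genes driven by one hidden factor, plus one unrelated gene.
factor = np.array([-1.5, -0.5, 0.5, 1.5])
expr = np.vstack([2.0 * factor, -1.0 * factor, 0.5 * factor,
                  np.array([1.0, -1.0, 1.0, -1.0])])
score = gene_set_score(expr, [0, 1, 2])
print(abs(np.corrcoef(score, factor)[0, 1]))
```

Restricting the decomposition to annotated genes is what ties the score to a pathway; the gene outside the set has no influence on it.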
Nature | 2017
Helena Kilpinen; Angela Goncalves; Andreas Leha; Vackar Afzal; Kaur Alasoo; Sofie Ashford; Sendu Bala; Dalila Bensaddek; Francesco Paolo Casale; Oliver J. Culley; Petr Danecek; Adam Faulconbridge; Peter W. Harrison; Annie Kathuria; Davis J. McCarthy; Shane McCarthy; Ruta Meleckyte; Yasin Memari; Nathalie Moens; Filipa Soares; Alice L. Mann; Ian Streeter; Chukwuma A. Agu; Alex Alderton; Rachel Nelson; Sarah Harper; Minal Patel; Alistair White; Sharad R Patel; Laura Clarke
This corrects the article DOI: 10.1038/nature22403.
bioRxiv | 2018
Davis J. McCarthy; Raghd Rostom; Yuanhua Huang; Daniel J Kunz; Petr Danecek; Marc Jan Bonder; Tzachi Hagai; Wenyi Wang; Daniel J. Gaffney; B. D. Simons; Oliver Stegle; Sarah A. Teichmann
Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA sequencing, using either bulk or single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution. Key findings: (1) a novel approach for integrating DNA-seq and single-cell RNA-seq data to reconstruct clonal substructure for single-cell transcriptomes; (2) evidence for non-neutral evolution of clonal populations in human fibroblasts; (3) proliferation and cell cycle pathways are commonly distorted in mutated clonal populations.
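The core idea of assigning a cell to a clone from allele counts can be sketched under strong simplifying assumptions: the clonal genotype configuration is known and fixed, alternative-allele reads follow a binomial distribution with rate `theta1` where the clone carries the variant and a small error rate `theta0` where it does not, and all clones have equal prior probability. The parameter values and the function name are assumptions for illustration, not cardelino's implementation or defaults.

```python
import numpy as np


def clone_posterior(alt, depth, config, theta0=0.01, theta1=0.5):
    """Posterior probability of each clone for each cell (uniform prior).

    alt, depth: variants x cells read counts; config: variants x clones, 1 if the
    clone carries the variant. The binomial coefficient is identical for every
    clone, so it cancels in the posterior and is omitted.
    """
    alt, depth, config = (np.asarray(a) for a in (alt, depth, config))
    p = np.where(config == 1, theta1, theta0)       # expected alt fraction, variants x clones
    ref = depth - alt
    # cells x clones log-likelihood; variants with zero depth contribute nothing
    loglik = alt.T @ np.log(p) + ref.T @ np.log1p(-p)
    loglik -= loglik.max(axis=1, keepdims=True)     # stabilise before exponentiating
    post = np.exp(loglik)
    return post / post.sum(axis=1, keepdims=True)


# Hypothetical example: variant 0 is in both clones, variant 1 only in clone 1.
config = [[1, 1],
          [0, 1]]
alt = [[3, 2],
       [0, 3]]          # variants x cells
depth = [[6, 4],
         [5, 6]]
post = clone_posterior(alt, depth, config)
print(post.argmax(axis=1))  # most likely clone per cell
```

Only variants that distinguish clones (here variant 1) shift the posterior; shared variants add the same term to every clone and cancel.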
bioRxiv | 2018
Stephanie Maria Linker; Lara Urban; Stephen J. Clark; Mariya Chhatriwala; Shradha Amatya; Davis J. McCarthy; Ingo Ebersberger; Ludovic Vallier; Wolf Reik; Oliver Stegle; Marc Jan Bonder
Background: Alternative splicing is a key mechanism in eukaryotic cells to increase the effective number of functionally distinct gene products. Using bulk RNA sequencing, splicing variation has been studied both across human tissues and in genetically diverse individuals. This has identified disease-relevant splicing events, as well as associations between splicing and genomic variations, including sequence composition and conservation. However, variability in splicing between single cells from the same tissue and its determinants remain poorly understood. Results: We applied parallel DNA methylation and transcriptome sequencing to differentiating human induced pluripotent stem cells to characterize splicing variation (exon skipping) and its determinants. Our results show that splicing rates in single cells can be accurately predicted based on sequence composition and other genomic features. We also identified a moderate but significant contribution from DNA methylation to splicing variation across cells. By combining sequence information and DNA methylation, we derived an accurate model (AUC = 0.85) for predicting different splicing modes of individual cassette exons. The model explains conventional inclusion and exclusion patterns, as well as more subtle modes of cell-to-cell variation in splicing. Finally, we identified and characterized associations between DNA methylation and splicing changes during cell differentiation. Conclusions: Our study yields new insights into alternative splicing at the single-cell level and reveals a previously underappreciated link between DNA methylation variation and splicing.
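Splicing rates for cassette exons are commonly summarised per cell as a percent spliced-in (PSI) value. The abstract does not state how the study quantified it, so this sketch assumes a common junction-read definition: average the reads on the two inclusion junctions, compare them with the reads on the skipping junction, and return NaN for cells with too little coverage. The function name and the `min_reads=5` cut-off are hypothetical.

```python
import numpy as np


def cassette_psi(inc_up, inc_down, skip, min_reads=5):
    """Per-cell percent spliced-in (as a fraction) for one cassette exon.

    inc_up / inc_down: reads on the upstream and downstream inclusion junctions;
    skip: reads on the exon-skipping junction. Cells with fewer than `min_reads`
    junction reads in total get NaN.
    """
    inc_up, inc_down, skip = (np.asarray(a, dtype=float) for a in (inc_up, inc_down, skip))
    inclusion = (inc_up + inc_down) / 2.0      # average the two inclusion junctions
    total = inclusion + skip
    psi = np.full(total.shape, np.nan)
    ok = (inc_up + inc_down + skip >= min_reads) & (total > 0)
    psi[ok] = inclusion[ok] / total[ok]
    return psi


# Hypothetical junction counts for three cells; the third has too few reads.
print(cassette_psi([10, 0, 1], [10, 0, 0], [10, 20, 1]))
```

Averaging the two inclusion junctions puts inclusion and skipping on the same per-junction scale, since an included exon produces two junctions for every one produced by skipping.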
Genome Biology | 2016
Tomislav Ilicic; Jong Kyoung Kim; Aleksandra A. Kolodziejczyk; Frederik Otzen Bagger; Davis J. McCarthy; John C. Marioni; Sarah A. Teichmann