Publication


Featured research published by Andrew McDavid.


Nature Genetics | 2012

Detectable clonal mosaicism from birth to old age and its relationship to cancer

Cathy C. Laurie; Cecelia A. Laurie; Kenneth Rice; Kimberly F. Doheny; Leila R. Zelnick; Caitlin P. McHugh; Hua Ling; Kurt N. Hetrick; Elizabeth W. Pugh; Christopher I. Amos; Qingyi Wei; Li-E Wang; Jeffrey E. Lee; Kathleen C. Barnes; Nadia N. Hansel; Rasika A. Mathias; Denise Daley; Terri H. Beaty; Alan F. Scott; Ingo Ruczinski; Rob Scharpf; Laura J. Bierut; Sarah M. Hartz; Maria Teresa Landi; Neal D. Freedman; Lynn R. Goldin; David Ginsburg; Jun-Jun Li; Karl C. Desch; Sara S. Strom

We detected clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies. This detection method requires a relatively high frequency of cells with the same abnormal karyotype (>5–10%; presumably of clonal origin) in the presence of normal cells. The frequency of detectable clonal mosaicism in peripheral blood is low (<0.5%) from birth until 50 years of age, after which it rapidly rises to 2–3% in the elderly. Many of the mosaic anomalies are characteristic of those found in hematological cancers and identify common deleted regions with genes previously associated with these cancers. Although only 3% of subjects with detectable clonal mosaicism had any record of hematological cancer before DNA sampling, those without a previous diagnosis have an estimated tenfold higher risk of a subsequent hematological cancer (95% confidence interval = 6–18).


Genome Biology | 2015

MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data

Greg Finak; Andrew McDavid; Masanao Yajima; Jingyuan Deng; Vivian H. Gersuk; Alex K. Shalek; Chloe K. Slichter; Hannah W. Miller; M. Juliana McElrath; Martin Prlic; Peter S. Linsley; Raphael Gottardo

Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST.
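The cellular detection rate defined above (the fraction of genes expressed in a cell) can be illustrated with a short sketch. This is not the MAST implementation: the function name, the toy expression matrix, and the detection threshold of zero are all assumptions made for illustration.

```python
def cellular_detection_rate(expression, threshold=0.0):
    """Return, for each cell, the fraction of genes with expression above threshold.

    `expression` is a list of cells, each a list of per-gene expression values.
    The threshold of 0.0 is an illustrative assumption, not a value from the paper.
    """
    return [
        sum(1 for value in cell if value > threshold) / len(cell)
        for cell in expression
    ]


# Hypothetical toy data: 3 cells x 4 genes.
cells = [
    [0.0, 3.2, 0.0, 5.1],
    [1.4, 2.2, 0.0, 4.0],
    [0.0, 0.0, 0.0, 2.5],
]
cdr = cellular_detection_rate(cells)
print(cdr)
```

In a regression setting, a per-cell value like this would enter the model as a covariate so that differences in detection are not mistaken for treatment effects.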


Current Protocols in Human Genetics | 2011

Quality Control Procedures for Genome‐Wide Association Studies

Stephen D. Turner; Loren L. Armstrong; Yuki Bradford; Christopher S. Carlson; Dana C. Crawford; Andrew Crenshaw; Mariza de Andrade; Kimberly F. Doheny; Jonathan L. Haines; Geoffrey Hayes; Gail P. Jarvik; Lan Jiang; Iftikhar J. Kullo; Rongling Li; Hua Ling; Teri A. Manolio; Martha E. Matsumoto; Catherine A. McCarty; Andrew McDavid; Daniel B. Mirel; Justin Paschall; Elizabeth W. Pugh; Luke V. Rasmussen; Russell A. Wilke; Rebecca L. Zuvich; Marylyn D. Ritchie

Genome‐wide association studies (GWAS) are being conducted at an unprecedented rate in population‐based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research. Curr. Protoc. Hum. Genet. 68:1.19.1‐1.19.18


Bioinformatics | 2013

Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments

Andrew McDavid; Greg Finak; Pratip K. Chattopadyay; Maria Dominguez; Laurie Lamoreaux; Steven S. Ma; Mario Roederer; Raphael Gottardo

Motivation: Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions now enables high-throughput single-cell gene expression measurement, allowing assessment of cellular heterogeneity. However, few analytic tools have been developed specifically for the statistical and analytical challenges of single-cell quantitative polymerase chain reaction data. Results: We present a statistical framework for the exploration, quality control and analysis of single-cell gene expression data from microfluidic arrays. We assess accuracy and within-sample heterogeneity of single-cell expression and develop quality control criteria to filter unreliable cell measurements. We propose a statistical model accounting for the fact that genes at the single-cell level can be on (and a continuous expression measure is recorded) or dichotomously off (and the recorded expression is zero). Based on this model, we derive a combined likelihood ratio test for differential expression that incorporates both the discrete and continuous components. Using an experiment that examines treatment-specific changes in expression, we show that this combined test is more powerful than either the continuous or dichotomous component in isolation, or a t-test on the zero-inflated data. Although developed for measurements from a specific platform (Fluidigm), these tools are generalizable to other multi-parametric measures over large numbers of events. Availability: All results presented here were obtained using the SingleCellAssay R package available on GitHub (http://github.com/RGLab/SingleCellAssay). Supplementary information: Supplementary data are available at Bioinformatics online.
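The idea of a combined likelihood ratio test over a discrete (on/off) component and a continuous component can be sketched as follows. This is a simplified illustration, not the SingleCellAssay implementation: it assumes two groups, a Bernoulli model for detection, a normal model with common variance for the positive values, and a chi-square reference distribution with 2 degrees of freedom for the summed statistic. All function names and data are hypothetical.

```python
import math


def bernoulli_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k detections out of n cells."""
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def rss(values):
    """Residual sum of squares around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def combined_lrt(group_a, group_b):
    """Return (discrete stat, continuous stat, combined stat, p-value)."""
    pos_a = [v for v in group_a if v > 0]
    pos_b = [v for v in group_b if v > 0]
    if not pos_a or not pos_b:
        raise ValueError("sketch requires positive values in both groups")

    # Discrete part: do the groups differ in the frequency of detection?
    ll_alt = bernoulli_loglik(len(pos_a), len(group_a)) + bernoulli_loglik(
        len(pos_b), len(group_b)
    )
    ll_null = bernoulli_loglik(
        len(pos_a) + len(pos_b), len(group_a) + len(group_b)
    )
    stat_discrete = 2 * (ll_alt - ll_null)

    # Continuous part: do the positive values differ in mean (common variance)?
    rss_alt = rss(pos_a) + rss(pos_b)
    rss_null = rss(pos_a + pos_b)
    if rss_alt == 0:
        raise ValueError("sketch requires variation within the positive values")
    n_pos = len(pos_a) + len(pos_b)
    stat_continuous = n_pos * math.log(rss_null / rss_alt)

    combined = stat_discrete + stat_continuous
    # Survival function of a chi-square with 2 degrees of freedom is exp(-x/2).
    p_value = math.exp(-combined / 2)
    return stat_discrete, stat_continuous, combined, p_value


# Hypothetical expression values; zeros mean "not detected".
control = [0, 0, 0, 1.0, 1.2]
treated = [2.0, 2.5, 3.0, 0, 2.2]
result = combined_lrt(control, treated)
print(result)
```

Summing the two statistics lets a gene register as differentially expressed when the groups differ in detection frequency, in expression level among detected cells, or in both.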


Genetic Epidemiology | 2011

Pitfalls of Merging GWAS Data: Lessons Learned in the eMERGE Network and Quality Control Procedures to Maintain High Data Quality

Rebecca L. Zuvich; Loren L. Armstrong; Suzette J. Bielinski; Yuki Bradford; Christopher S. Carlson; Dana C. Crawford; Andrew Crenshaw; Mariza de Andrade; Kimberly F. Doheny; Jonathan L. Haines; M. Geoffrey Hayes; Gail P. Jarvik; Lan Jiang; Iftikhar J. Kullo; Rongling Li; Hua Ling; Teri A. Manolio; Martha E. Matsumoto; Catherine A. McCarty; Andrew McDavid; Daniel B. Mirel; Lana M. Olson; Justin Paschall; Elizabeth W. Pugh; Luke V. Rasmussen; Laura J. Rasmussen-Torvik; Stephen D. Turner; Russell A. Wilke; Marylyn D. Ritchie

Genome‐wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient reuse of genetic data to yield meaningful genotype–phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE‐I) Network is a National Human Genome Research Institute‐supported consortium composed of five sites to perform various genetic association studies using DNA repositories and EMR systems. Each eMERGE site has developed EMR‐based algorithms to comprise a core set of 14 phenotypes for extraction of study samples from each site's DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or at the Center for Inherited Disease Research using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample and marker quality and various batch effects. Upon completion of the genotyping and QC analyses for each site's primary study, the eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset reentered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here, we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion of the eMERGE QC pipeline for merged datasets. These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE‐II, and also serve as a starting point for investigators merging multiple genotype datasets accessible through the National Center for Biotechnology Information in the database of Genotypes and Phenotypes. Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process. Genet. Epidemiol. 35:887–898, 2011.


Human Molecular Genetics | 2013

Genetic variation associated with circulating monocyte count in the eMERGE Network

David R. Crosslin; Andrew McDavid; Noah Weston; Xiuwen Zheng; Eugene Hart; Mariza de Andrade; Iftikhar J. Kullo; Catherine A. McCarty; Kimberly F. Doheny; Elizabeth W. Pugh; Abel N. Kho; M. Geoffrey Hayes; Marylyn D. Ritchie; Alexander Saip; Dana C. Crawford; Paul K. Crane; Katherine M. Newton; David Carrell; Carlos J. Gallego; Michael A. Nalls; Rongling Li; Daniel B. Mirel; Andrew Crenshaw; David Couper; Toshiko Tanaka; Frank J. A. van Rooij; Ming-Huei Chen; Albert V. Smith; Neil A. Zakai; Qiong Yango

With white blood cell count emerging as an important risk factor for chronic inflammatory diseases, genetic associations of differential leukocyte types, specifically monocyte count, are providing novel candidate genes and pathways to further investigate. Circulating monocytes play a critical role in vascular diseases such as in the formation of atherosclerotic plaque. We performed joint and ancestry-stratified genome-wide association analyses to identify variants specifically associated with monocyte count in 11 014 subjects in the electronic Medical Records and Genomics Network. In the joint and European ancestry samples, we identified novel associations in the chromosome 16 interferon regulatory factor 8 (IRF8) gene (P-value = 2.78×10^-16, β = -0.22). Other monocyte associations include novel missense variants in the chemokine-binding protein 2 (CCBP2) gene (P-value = 1.88×10^-7, β = 0.30) and a region of replication found in ribophorin I (RPN1) (P-value = 2.63×10^-16, β = -0.23) on chromosome 3. The CCBP2 and RPN1 region is located near the GATA binding protein 2 gene, which has been previously shown to be associated with coronary heart disease. On chromosome 9, we found a novel association in the prostaglandin reductase 1 gene (P-value = 2.29×10^-7, β = 0.16), which is downstream from lysophosphatidic acid receptor 1. This region has previously been shown to be associated with monocyte count. We also replicated monocyte associations of genome-wide significance (P-value = 5.68×10^-17, β = -0.23) at the integrin, alpha 4 gene on chromosome 2. The novel IRF8 results and further replications provide supporting evidence of genetic regions associated with monocyte count.


PLOS Computational Biology | 2014

Modeling Bi-modality Improves Characterization of Cell Cycle on Gene Expression in Single Cells

Andrew McDavid; Lucas Dennis; Patrick Danaher; Greg Finak; Michael Krouse; Alice Wang; Philippa Webster; Joseph Beechem; Raphael Gottardo

Advances in high-throughput, single cell gene expression are allowing interrogation of cell heterogeneity. However, there is concern that the cell cycle phase of a cell might bias characterizations of gene expression at the single-cell level. We assess the effect of cell cycle phase on gene expression in single cells by measuring 333 genes in 930 cells across three phases and three cell lines. We determine each cell's phase non-invasively without chemical arrest and use it as a covariate in tests of differential expression. We observe bi-modal gene expression, a previously-described phenomenon, wherein the expression of otherwise abundant genes is either strongly positive, or undetectable within individual cells. This bi-modality is likely both biologically and technically driven. Irrespective of its source, we show that it should be modeled to draw accurate inferences from single cell expression experiments. To this end, we propose a semi-continuous modeling framework based on the generalized linear model, and use it to characterize genes with consistent cell cycle effects across three cell lines. Our new computational framework improves the detection of previously characterized cell-cycle genes compared to approaches that do not account for the bi-modality of single-cell data. We use our semi-continuous modeling framework to estimate single cell gene co-expression networks. These networks suggest that in addition to having phase-dependent shifts in expression (when averaged over many cells), some, but not all, canonical cell cycle genes tend to be co-expressed in groups in single cells. We estimate the amount of single cell expression variability attributable to the cell cycle. We find that the cell cycle explains only 5%–17% of expression variability, suggesting that the cell cycle will not tend to be a large nuisance factor in analysis of the single cell transcriptome.


Biostatistics | 2014

Mixture models for single-cell assays with applications to vaccine studies

Greg Finak; Andrew McDavid; Pratip K. Chattopadhyay; Maria Dominguez; Steve De Rosa; Mario Roederer; Raphael Gottardo

Blood and tissue are composed of many functionally distinct cell subsets. In immunological studies, these can be measured accurately only using single-cell assays. The characterization of these small cell subsets is crucial to decipher system-level biological changes. For this reason, an increasing number of studies rely on assays that provide single-cell measurements of multiple genes and proteins from bulk cell samples. A common problem in the analysis of such data is to identify biomarkers (or combinations of biomarkers) that are differentially expressed between two biological conditions (e.g. before/after stimulation), where expression is defined as the proportion of cells expressing that biomarker (or biomarker combination) in the cell subset(s) of interest. Here, we present a Bayesian hierarchical framework based on a beta-binomial mixture model for testing for differential biomarker expression using single-cell assays. Our model allows the inference to be subject specific, as is typically required when assessing vaccine responses, while borrowing strength across subjects through common prior distributions. We propose two approaches for parameter estimation: an empirical-Bayes approach using an Expectation-Maximization algorithm and a fully Bayesian one based on a Markov chain Monte Carlo algorithm. We compare our method against classical approaches for single-cell assays including Fisher's exact test, a likelihood ratio test, and basic log-fold changes. Using several experimental assays measuring proteins or genes at single-cell level and simulations, we show that our method has higher sensitivity and specificity than alternative methods. Additional simulations show that our framework is also robust to model misspecification. Finally, we demonstrate how our approach can be extended to testing multivariate differential expression across multiple biomarker combinations using a Dirichlet-multinomial model and illustrate this approach using single-cell gene expression data and simulations.
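As a rough illustration of the building block behind a beta-binomial model for cell counts, the sketch below evaluates the beta-binomial probability mass function: the probability of observing k positive cells out of n when the underlying proportion has a Beta(a, b) prior. It is not the authors' mixture model or their estimation procedure; the function names and the parameter values are assumptions chosen for the example.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) for K ~ BetaBinomial(n, a, b), computed on the log scale."""
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    )
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


# Hypothetical example: 10 cells with a flat Beta(1, 1) prior on the proportion.
probs = [beta_binomial_pmf(k, 10, 1.0, 1.0) for k in range(11)]
print(probs)
```

With a flat prior every count from 0 to n is equally likely; an informative prior shared across subjects is what would allow a hierarchical model to borrow strength between them.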


Nature Biotechnology | 2016

The contribution of cell cycle to heterogeneity in single-cell RNA-seq data

Andrew McDavid; Greg Finak; Raphael Gottardo

affected and the number of bytes transferred; and second, logging compute access for the user, event time, operation performed, API used to make the access, the resource modified (such as virtual machines, disks, firewalls, machine images and networks) and network traffic in bytes, including notes about whether traffic was between or within compute zones (North America, Europe, Asia-Pacific, China), ingress (to the cloud) or egress (from the cloud). Logging and monitoring are typically not a requirement for non-PHI data, but we believe that they are a necessary step toward confidence in the security implementation. Administrators should perform routine logging and mine the logs, using simple scripts or queries, for unexpected access patterns. We recommend the following three monitoring practices: first, as Google and other cloud providers send out security bulletins [18] with details of vulnerabilities and patches, we recommend that administrators monitor that researchers use OS images with the latest security patches; second, because even for a generally HIPAA-compliant cloud provider (e.g., GCP), beta service offerings are not covered by HIPAA, for IRB-guided studies requiring HIPAA compliance, administrators should either disallow service via quota or monitor for use; and third, we recommend that administrators stay informed regarding DUAs for especially sensitive projects to make sure there are no inadvertent violations. Note that logging eventually results in data storage and hence cost. Administrators must keep track of these costs and plan for suitable strategies to manage log data volume.


PLOS ONE | 2013

Confirmation of the reported association of clonal chromosomal mosaicism with an increased risk of incident hematologic cancer.

Ursula M. Schick; Andrew McDavid; Paul K. Crane; Noah Weston; Kelly Ehrlich; Katherine M. Newton; Robert B. Wallace; Ebony Bookman; Tabitha A. Harrison; Aaron K. Aragaki; David R. Crosslin; Sophia S. Wang; Alex P. Reiner; Rebecca D. Jackson; Ulrike Peters; Eric B. Larson; Gail P. Jarvik; Christopher S. Carlson

Chromosomal abnormalities provide clinical utility in the diagnosis and treatment of hematologic malignancies, and may be predictive of malignant transformation in individuals without apparent clinical presentation of a hematologic cancer. In an effort to confirm previous reports of an association between clonal mosaicism and incident hematologic cancer, we applied the anomDetectBAF algorithm to call chromosomal anomalies in genotype data from previously conducted Genome Wide Association Studies (GWAS). The genotypes were initially collected from DNA derived from peripheral blood of 12,176 participants in the Group Health electronic Medical Records and Genomics study (eMERGE) and the Women’s Health Initiative (WHI). We detected clonal mosaicism in 169 individuals (1.4%) and large clonal mosaic events (>2 Mb) in 117 (1.0%) individuals. Though only 9.5% of clonal mosaic carriers had an incident diagnosis of hematologic cancer (multiple myeloma, myelodysplastic syndrome, lymphoma, or leukemia), the carriers had a 5.5-fold increased risk (95% CI: 3.3–9.3; p-value = 7.5×10^-11) of developing these cancers subsequently. Carriers of large mosaic anomalies showed particularly pronounced risk of subsequent leukemia (HR = 19.2, 95% CI: 8.9–41.6; p-value = 7.3×10^-14). Thus we independently confirm the association between detectable clonal mosaicism and hematologic cancer found previously in two recent publications.
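The carrier percentages quoted in this abstract follow directly from the reported counts. The short check below only reproduces that arithmetic from the numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the abstract.
participants = 12176
mosaic_carriers = 169
large_event_carriers = 117

# Percentages rounded to one decimal place, as in the abstract.
pct_mosaic = round(100 * mosaic_carriers / participants, 1)
pct_large = round(100 * large_event_carriers / participants, 1)
print(pct_mosaic, pct_large)
```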

Collaboration


Dive into Andrew McDavid's collaborations.

Top Co-Authors

Raphael Gottardo
Fred Hutchinson Cancer Research Center

Christopher S. Carlson
Fred Hutchinson Cancer Research Center

Greg Finak
Fred Hutchinson Cancer Research Center

Marylyn D. Ritchie
Pennsylvania State University

Rongling Li
National Institutes of Health