Paul J. McMurdie
Stanford University
Publication
Featured research published by Paul J. McMurdie.
PLOS ONE | 2013
Paul J. McMurdie; Susan Holmes
Background: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Results: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. Conclusions: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.
PLOS Computational Biology | 2014
Paul J. McMurdie; Susan Holmes
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
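To make concrete what rarefying does to the data, here is a minimal Python sketch. It is not the paper's code: the counts, the sample names, and the helper `rarefy` are hypothetical, and only the standard library is used. Each sample is randomly subsampled without replacement to the smallest library size, so most reads from deeper samples are thrown away, and any sample below the chosen depth would be dropped entirely.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample a taxon-count vector without replacement to `depth` reads.

    Returns None when the sample has fewer than `depth` reads, which is
    how rarefying ends up discarding whole samples.
    """
    if sum(counts) < depth:
        return None
    rng = random.Random(seed)
    # Expand the counts into one taxon label per read, then draw `depth` reads.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    rarefied = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        rarefied[taxon] += 1
    return rarefied

# Hypothetical counts for three samples over four taxa.
samples = {
    "A": [500, 300, 150, 50],       # 1,000 reads
    "B": [60, 30, 8, 2],            # 100 reads
    "C": [5000, 2500, 2000, 500],   # 10,000 reads
}

depth = min(sum(c) for c in samples.values())
rarefied = {name: rarefy(c, depth) for name, c in samples.items()}

total_reads = sum(sum(c) for c in samples.values())
discarded = total_reads - depth * len(samples)
fraction_discarded = discarded / total_reads
```

In this toy case the even depth is 100 reads, so 10,800 of the 11,100 observed reads are discarded before any test is run. The model-based alternatives discussed in the abstract keep all counts and account for library size within the model instead.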
Nature Methods | 2016
Benjamin J. Callahan; Paul J. McMurdie; Michael J Rosen; Andrew W Han; Amy Jo A Johnson; Susan Holmes
We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.
Methods in Enzymology | 2013
Jose A. Navas-Molina; Juan Manuel Peralta-Sánchez; Antonio Gonzalez; Paul J. McMurdie; Yoshiki Vázquez-Baeza; Zhenjiang Xu; Luke K. Ursell; Christian L. Lauber; Hong-Wei Zhou; Se Jin Song; James Huntley; Gail Ackermann; Donna Berg-Lyons; Susan Holmes; J. Gregory Caporaso; Rob Knight
High-throughput DNA sequencing technologies, coupled with advanced bioinformatics tools, have enabled rapid advances in microbial ecology and our understanding of the human microbiome. QIIME (Quantitative Insights Into Microbial Ecology) is an open-source bioinformatics software package designed for microbial community analysis based on DNA sequence data, which provides a single analysis framework for analysis of raw sequence data through publication-quality statistical analyses and interactive visualizations. In this chapter, we demonstrate the use of the QIIME pipeline to analyze microbial communities obtained from several sites on the bodies of transgenic and wild-type mice, as assessed using 16S rRNA gene sequences generated on the Illumina MiSeq platform. We present our recommended pipeline for performing microbial community analysis and provide guidelines for making critical choices in the process. We present examples of some of the types of analyses that are enabled by QIIME and discuss how other tools, such as phyloseq and R, can be applied to expand upon these analyses.
Proceedings of the National Academy of Sciences of the United States of America | 2015
Daniel B. DiGiulio; Benjamin J. Callahan; Paul J. McMurdie; Elizabeth K. Costello; Deirdre J. Lyell; Anna Robaczewska; Christine L. Sun; Daniela S. Aliaga Goltsman; Ronald J. Wong; Gary M. Shaw; David K. Stevenson; Susan Holmes; David A. Relman
Significance: The human indigenous microbial communities (microbiota) play critical roles in health and may be especially important for mother and fetus during pregnancy. Using a case-control cohort of 40 women, we characterized weekly variation in the vaginal, gut, and oral microbiota during and after pregnancy. Microbiota membership remained relatively stable at each body site during pregnancy. An altered vaginal microbial community was associated with preterm birth; this finding was corroborated by an analysis of samples from an additional cohort of nine women. We also discovered an abrupt change in the vaginal microbiota at delivery that persisted in some cases for at least 1 y. Our findings suggest that pregnancy outcomes might be predicted by features of the microbiota early in gestation.
Despite the critical role of the human microbiota in health, our understanding of microbiota compositional dynamics during and after pregnancy is incomplete. We conducted a case-control study of 49 pregnant women, 15 of whom delivered preterm. From 40 of these women, we analyzed bacterial taxonomic composition of 3,767 specimens collected prospectively and weekly during gestation and monthly after delivery from the vagina, distal gut, saliva, and tooth/gum. Linear mixed-effects modeling, medoid-based clustering, and Markov chain modeling were used to analyze community temporal trends, community structure, and vaginal community state transitions. Microbiota community taxonomic composition and diversity remained remarkably stable at all four body sites during pregnancy (P > 0.05 for trends over time). Prevalence of a Lactobacillus-poor vaginal community state type (CST 4) was inversely correlated with gestational age at delivery (P = 0.0039). Risk for preterm birth was more pronounced for subjects with CST 4 accompanied by elevated Gardnerella or Ureaplasma abundances. This finding was validated with a set of 246 vaginal specimens from nine women (four of whom delivered preterm). Most women experienced a postdelivery disturbance in the vaginal community characterized by a decrease in Lactobacillus species and an increase in diverse anaerobes such as Peptoniphilus, Prevotella, and Anaerococcus species. This disturbance was unrelated to gestational age at delivery and persisted for up to 1 y. These findings have important implications for predicting premature labor, a major global health problem, and for understanding the potential impact of a persistent, altered postpartum microbiota on maternal health, including outcomes of pregnancies following short interpregnancy intervals.
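The abstract mentions Markov chain modeling of vaginal community state transitions without giving details. As a rough illustration only, the Python sketch below estimates first-order transition probabilities by counting consecutive state pairs. The weekly state sequences and the function name `transition_probabilities` are hypothetical, and this simple maximum-likelihood count is an assumption, not necessarily the procedure used in the study.

```python
from collections import defaultdict

def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from state sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Count each observed (current state, next state) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())
        # Normalize each row so its outgoing probabilities sum to 1.
        probs[state] = {nxt: n / total for nxt, n in row.items()}
    return probs

# Hypothetical weekly community state types (CSTs) for two subjects.
sequences = [
    ["CST1", "CST1", "CST1", "CST4", "CST4"],
    ["CST4", "CST4", "CST1", "CST1"],
]
probs = transition_probabilities(sequences)
```

With these toy sequences, three of the four transitions out of CST1 stay in CST1 and two of the three transitions out of CST4 stay in CST4. Large self-transition probabilities like these are how a fitted chain would show that community states persist from week to week.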
PLOS Genetics | 2009
Paul J. McMurdie; Sebastian Behrens; Jochen A Müller; Jonathan Göke; Kirsti M. Ritalahti; Ryan Wagner; Eugene Goltsman; Alla Lapidus; Susan Holmes; Frank E. Löffler; Alfred M. Spormann
Vinyl chloride (VC) is a human carcinogen and widespread priority pollutant. Here we report the first, to our knowledge, complete genome sequences of microorganisms able to respire VC, Dehalococcoides sp. strains VS and BAV1. Notably, the respective VC reductase encoding genes, vcrAB and bvcAB, were found embedded in distinct genomic islands (GEIs) with different predicted integration sites, suggesting that these genes were acquired horizontally and independently by distinct mechanisms. A comparative analysis that included two previously sequenced Dehalococcoides genomes revealed a contextually conserved core that is interrupted by two high plasticity regions (HPRs) near the Ori. These HPRs contain the majority of GEIs and strain-specific genes identified in the four Dehalococcoides genomes, an elevated number of repeated elements including insertion sequences (IS), as well as 91 of 96 rdhAB, genes that putatively encode terminal reductases in organohalide respiration. Only three core rdhA orthologous groups were identified, and only one of these groups is supported by synteny. The low number of core rdhAB, contrasted with the high rdhAB numbers per genome (up to 36 in strain VS), as well as their colocalization with GEIs and other signatures for horizontal transfer, suggests that niche adaptation via organohalide respiration is a fundamental ecological strategy in Dehalococcoides. This adaptation has been exacted through multiple mechanisms of recombination that are mainly confined within HPRs of an otherwise remarkably stable, syntenic, streamlined genome among the smallest of any free-living microorganism.
Applied and Environmental Microbiology | 2008
Sebastian Behrens; Mohammad F. Azizian; Paul J. McMurdie; Andrew Sabalowsky; Mark E. Dolan; Lew Semprini; Alfred M. Spormann
We investigated the distribution and activity of chloroethene-degrading microorganisms and associated functional genes during reductive dehalogenation of tetrachloroethene to ethene in a laboratory continuous-flow column. Using real-time PCR, we quantified “Dehalococcoides” species 16S rRNA and chloroethene-reductive dehalogenase (RDase) genes (pceA, tceA, vcrA, and bvcA) in nucleic acid extracts from different sections of the column. Dehalococcoides 16S rRNA gene copies were highest at the inflow port [(3.6 ± 0.6) × 10^6 (mean ± standard deviation) per gram soil] where the electron donor and acceptor were introduced into the column. The highest transcript numbers for tceA, vcrA, and bvcA were detected 5 to 10 cm from the column inflow. bvcA was the most highly expressed of all RDase genes and the only vinyl chloride reductase-encoding transcript detectable close to the column outflow. Interestingly, no expression of pceA was detected in the column, despite the presence of the genes in the microbial community throughout the column. By comparing the 16S rRNA gene copy numbers to the sum of all four RDase genes, we found that 50% of the Dehalococcoides population in the first part of the column did not contain either one of the known chloroethene RDase genes. Analysis of 16S rRNA gene clone libraries from both ends of the flow column revealed a microbial community dominated by members of Firmicutes and Actinobacteria. Higher clone sequence diversity was observed near the column outflow. The results presented have implications for our understanding of the ecophysiology of reductively dehalogenating Dehalococcoides spp. and their role in bioremediation of chloroethenes.
The ISME Journal | 2017
Benjamin J. Callahan; Paul J. McMurdie; Susan Holmes
Recent advances have made it possible to analyze high-throughput marker-gene sequencing data without resorting to the customary construction of molecular operational taxonomic units (OTUs): clusters of sequencing reads that differ by less than a fixed dissimilarity threshold. New methods control errors sufficiently such that amplicon sequence variants (ASVs) can be resolved exactly, down to the level of single-nucleotide differences over the sequenced gene region. The benefits of finer resolution are immediately apparent, and arguments for ASV methods have focused on their improved resolution. Less obvious, but we believe more important, are the broad benefits that derive from the status of ASVs as consistent labels with intrinsic biological meaning identified independently from a reference database. Here we discuss how these features grant ASVs the combined advantages of closed-reference OTUs—including computational costs that scale linearly with study size, simple merging between independently processed data sets, and forward prediction—and of de novo OTUs—including accurate measurement of diversity and applicability to communities lacking deep coverage in reference databases. We argue that the improvements in reusability, reproducibility and comprehensiveness are sufficiently great that ASVs should replace OTUs as the standard unit of marker-gene analysis and reporting.
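The point about ASVs as consistent labels can be shown with a small Python sketch. The sequences, counts, and the helper `merge_asv_tables` are hypothetical toy values, not taken from the paper. Because an ASV is identified by its exact sequence, tables from independently processed studies can be merged by key, whereas de novo OTU labels are assigned per data set and need not refer to the same sequences.

```python
from collections import Counter

def merge_asv_tables(*tables):
    """Merge ASV count tables keyed by exact sequence, summing shared entries."""
    merged = Counter()
    for table in tables:
        merged.update(table)  # Counter.update adds counts for matching keys
    return dict(merged)

# Hypothetical ASV tables from two independently processed studies.
study_1 = {"ACGTACGT": 120, "ACGTACGA": 30}
study_2 = {"ACGTACGT": 45, "TTGTACGT": 10}
merged = merge_asv_tables(study_1, study_2)

# Hypothetical de novo OTU labels: the same label can point to different
# representative sequences in different studies, so labels cannot be
# matched across data sets without re-clustering.
otu_study_1 = {"OTU_1": "ACGTACGT", "OTU_2": "ACGTACGA"}
otu_study_2 = {"OTU_1": "TTGTACGT", "OTU_2": "ACGTACGT"}
conflicting = [k for k in otu_study_1 if otu_study_1[k] != otu_study_2.get(k)]
```

Here the shared sequence is summed across studies with a single dictionary merge, while both OTU labels collide. The single pass over each table is also consistent with the linear scaling in study size that the abstract attributes to closed-reference OTUs and ASVs.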
The ISME Journal | 2012
Luke C Burow; Dagmar Woebken; Brad M. Bebout; Paul J. McMurdie; Steven W. Singer; Jennifer Pett-Ridge; Leslie Prufert-Bebout; Alfred M. Spormann; Peter K. Weber; Tori M. Hoehler
Hydrogen (H2) release from photosynthetic microbial mats has contributed to the chemical evolution of Earth and could potentially be a source of renewable H2 in the future. However, the taxonomy of H2-producing microorganisms (hydrogenogens) in these mats has not been previously determined. With combined biogeochemical and molecular studies of microbial mats collected from Elkhorn Slough, Monterey Bay, California, we characterized the mechanisms of H2 production and identified a dominant hydrogenogen. Net production of H2 was observed within the upper photosynthetic layer (0–2 mm) of the mats under dark and anoxic conditions. Pyrosequencing of rRNA gene libraries generated from this layer demonstrated the presence of 64 phyla, with Bacteroidetes, Cyanobacteria and Proteobacteria dominating the sequences. Sequencing of rRNA transcripts obtained from this layer demonstrated that Cyanobacteria dominated rRNA transcript pyrotag libraries. An OTU affiliated to Microcoleus spp. was the most abundant OTU in both rRNA gene and transcript libraries. Depriving mats of sunlight resulted in an order of magnitude decrease in subsequent nighttime H2 production, suggesting that newly fixed carbon is critical to H2 production. Suppression of nitrogen (N2)-fixation in the mats did not suppress H2 production, which indicates that co-metabolic production of H2 during N2-fixation is not an important contributor to H2 production. Concomitant production of organic acids is consistent with fermentation of recently produced photosynthate as the dominant mode of H2 production. Analysis of rRNA % transcript:% gene ratios and H2-evolving bidirectional [NiFe] hydrogenase % transcript:% gene ratios indicated that Microcoleus spp. are dominant hydrogenogens in the Elkhorn Slough mats.
Pacific Symposium on Biocomputing | 2011
Paul J. McMurdie; Susan Holmes
We present a detailed description of a new Bioconductor package, phyloseq, for integrated data and analysis of taxonomically-clustered phylogenetic sequencing data in conjunction with related data types. The phyloseq package integrates abundance data, phylogenetic information and covariates so that exploratory transformations, plots, and confirmatory testing and diagnostic plots can be carried out seamlessly. The package is built following the S4 object-oriented framework of the R language so that once the data have been input the user can easily transform, plot and analyze the data. We present some examples that highlight the methods and the ease with which we can leverage existing packages.