Publication


Featured research published by Stephen Nayfach.


Genome Biology | 2015

Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome.

Stephen Nayfach; Katherine S. Pollard

Average genome size is an important, yet often overlooked, property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism. Importantly, we found that average genome size variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.
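The normalization idea can be illustrated with a minimal Python sketch. It assumes that the average genome size (AGS) has already been estimated (for example by MicrobeCensus). It then converts a gene's read count into reads per kilobase per genome equivalent (RPKG), where genome equivalents are total sequenced bases divided by AGS. The function names and example numbers are hypothetical, and this is not MicrobeCensus's own code.

```python
def genome_equivalents(total_bases: float, avg_genome_size_bp: float) -> float:
    """Number of genome copies' worth of DNA sequenced: total bases / AGS."""
    if avg_genome_size_bp <= 0:
        raise ValueError("average genome size must be positive")
    return total_bases / avg_genome_size_bp


def rpkg(read_count: float, gene_length_bp: float, genome_equiv: float) -> float:
    """Reads per kilobase of gene per genome equivalent (hypothetical helper)."""
    if gene_length_bp <= 0 or genome_equiv <= 0:
        raise ValueError("gene length and genome equivalents must be positive")
    return read_count / (gene_length_bp / 1000) / genome_equiv


# Two hypothetical samples: same sequencing depth and same reads hitting a 1 kb gene,
# but different average genome sizes.
ge_small = genome_equivalents(1_000_000_000, 2_500_000)  # 400.0 genome equivalents
ge_large = genome_equivalents(1_000_000_000, 5_000_000)  # 200.0 genome equivalents
print(rpkg(500, 1000, ge_small))  # 1.25
print(rpkg(500, 1000, ge_large))  # 2.5
```

With identical read counts, the sample with the larger AGS contains fewer genome equivalents, so the gene's per-genome abundance is twice as high. A comparison based on raw read counts would miss this difference.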


Cell | 2017

Discovery of Reactive Microbiota-Derived Metabolites that Inhibit Host Proteases

Chun-Jun Guo; Fang-Yuan Chang; Thomas P. Wyche; Keriann M. Backus; Timothy M. Acker; Masanori Funabashi; Mao Taketani; Mohamed S. Donia; Stephen Nayfach; Katherine S. Pollard; Charles S. Craik; Benjamin F. Cravatt; Jon Clardy; Christopher A. Voigt; Michael A. Fischbach

The gut microbiota modulate host biology in numerous ways, but little is known about the molecular mediators of these interactions. Previously, we found a widely distributed family of nonribosomal peptide synthetase gene clusters in gut bacteria. Here, by expressing a subset of these clusters in Escherichia coli or Bacillus subtilis, we show that they encode pyrazinones and dihydropyrazinones. At least one of the 47 clusters is present in 88% of the National Institutes of Health Human Microbiome Project (NIH HMP) stool samples, and they are transcribed under conditions of host colonization. We present evidence that the active form of these molecules is the initially released peptide aldehyde, which bears potent protease inhibitory activity and selectively targets a subset of cathepsins in human cell proteomes. Our findings show that an approach combining bioinformatics, synthetic biology, and heterologous gene cluster expression can rapidly expand our knowledge of the metabolic potential of the microbiota while avoiding the challenges of cultivating fastidious commensals.
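A prevalence figure such as "at least one of the 47 clusters is present in 88% of stool samples" is the fraction of samples in which any member of a cluster set is detected. The Python sketch below computes that fraction from per-sample detection sets. The rule for calling a cluster "present" (coverage or alignment thresholds) is left as an assumption, and the sample and cluster names are hypothetical.

```python
from typing import Mapping, Set


def prevalence_any(sample_hits: Mapping[str, Set[str]], clusters: Set[str]) -> float:
    """Fraction of samples in which at least one cluster in `clusters` is detected.

    `sample_hits` maps a sample ID to the set of cluster IDs detected in it; the
    detection rule that produced those sets is assumed, not specified here.
    """
    if not sample_hits:
        raise ValueError("no samples given")
    positive = sum(1 for hits in sample_hits.values() if hits & clusters)
    return positive / len(sample_hits)


# Hypothetical detection results for four stool samples.
hits = {
    "sample1": {"clusterA"},
    "sample2": set(),
    "sample3": {"clusterB", "clusterC"},
    "sample4": {"unrelated"},
}
print(prevalence_any(hits, {"clusterA", "clusterB", "clusterC"}))  # 0.5
```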


Cell | 2016

Toward Accurate and Quantitative Comparative Metagenomics

Stephen Nayfach; Katherine S. Pollard

Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized.


PLOS Computational Biology | 2015

Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes

Stephen Nayfach; Patrick H. Bradley; Stacia K. Wyman; Timothy J. Laurent; Alexander G. Williams; Jonathan A. Eisen; Katherine S. Pollard; Thomas J. Sharpton

Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn's disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.


Bioinformatics | 2015

MetaQuery: A web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome

Stephen Nayfach; Michael A. Fischbach; Katherine S. Pollard

Summary: Microbiome researchers frequently want to know how abundant a particular microbial gene or pathway is across different human hosts, including its association with disease and its co-occurrence with other genes or microbial taxa. With thousands of publicly available metagenomes, these questions should be easy to answer. However, computational barriers prevent most researchers from conducting such analyses. We address this problem with MetaQuery, a web application for rapid and quantitative analysis of specific genes in the human gut microbiome. The user inputs one or more query genes, and our software returns the estimated abundance of these genes across 1267 publicly available fecal metagenomes from American, European and Chinese individuals. In addition, our application performs downstream statistical analyses to identify features that are associated with gene variation, including other query genes (i.e. gene co-variation), taxa, clinical variables (e.g. inflammatory bowel disease and diabetes) and average genome size. The speed and accessibility of MetaQuery are a step toward democratizing metagenomics research, which should allow many researchers to query the abundance and variation of specific genes in the human gut microbiome. Availability and implementation: http://metaquery.docpollard.org. Supplementary information: Supplementary data are available at Bioinformatics online.


bioRxiv | 2015

Population genetic analyses of metagenomes reveal extensive strain-level variation in prevalent human-associated bacteria

Stephen Nayfach; Katherine S. Pollard

Deep sequencing has the potential to shed light on the functional and phylogenetic heterogeneity of microbial populations in the environment. Here we present PhyloCNV, an integrated computational pipeline for quantifying species abundance and strain-level genomic variation from shotgun metagenomes. Our method leverages a comprehensive database of >30,000 reference genomes which we accurately clustered into species groups using a panel of universal-single-copy genes. Given a shotgun metagenome, PhyloCNV will rapidly and automatically identify gene copy number variants and single-nucleotide variants present in abundant bacterial species. We applied PhyloCNV to >500 faecal metagenomes from the United States, Europe, China, Peru, and Tanzania and present the first global analysis of strain-level variation and biogeography in the human gut microbiome. On average there is 8.5x more nucleotide diversity of strains between different individuals than within individuals, with elevated strain-level diversity in hosts from Peru and Tanzania that live rural lifestyles. For many, but not all common gut species, a significant proportion of inter-sample strain-level genetic diversity is explained by host geography. Eubacterium rectale, for example, has a highly structured population that tracks with host country, while strains of Bacteroides uniformis and other species are structured independently of their hosts. Finally, we discovered that the gene content of some bacterial strains diverges at short evolutionary timescales during which few nucleotide variants accumulate. These findings shed light onto the recent evolutionary history of microbes in the human gut and highlight the extensive differences in the gene content of closely related bacterial strains. 
PhyloCNV is freely available at: https://github.com/snayfach/PhyloCNV.

We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single nucleotide polymorphisms, from shotgun metagenomes. Our method leverages a database of >30,000 bacterial reference genomes which we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare single nucleotide variants to reveal extensive vertical transmission of strains at birth but colonization with strains unlikely to derive from the mother at later time points. This pattern was missed with species-level analysis, because the infant gut microbiome composition converges towards that of an adult over time. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data is analyzed at a higher taxonomic resolution.
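Comparisons of strain-level diversity within and between hosts, like those above, rest on per-site nucleotide diversity computed from allele frequencies. The sketch below uses a simple Nei-style definition that is assumed for illustration and is not necessarily the exact estimator used by PhyloCNV or MIDAS. Within-sample diversity is the probability that two alleles drawn from one sample differ, and between-sample diversity is the probability that one allele drawn from each sample differs. Sample-size corrections and coverage filtering are omitted.

```python
from typing import List, Sequence, Tuple

# Allele frequencies at one site (e.g. A, C, G, T), assumed to sum to 1.
SiteFreqs = Sequence[float]


def pi_within(site: SiteFreqs) -> float:
    """Probability that two alleles drawn (with replacement) from one sample differ."""
    return 1.0 - sum(f * f for f in site)


def pi_between(site_x: SiteFreqs, site_y: SiteFreqs) -> float:
    """Probability that one allele drawn from each sample differs."""
    return 1.0 - sum(fx * fy for fx, fy in zip(site_x, site_y))


def mean_diversity(sample_x: List[SiteFreqs], sample_y: List[SiteFreqs]) -> Tuple[float, float]:
    """Average (within, between) diversity over sites aligned between two samples."""
    if not sample_x or len(sample_x) != len(sample_y):
        raise ValueError("samples must cover the same, non-empty set of sites")
    n = len(sample_x)
    within = sum((pi_within(a) + pi_within(b)) / 2 for a, b in zip(sample_x, sample_y)) / n
    between = sum(pi_between(a, b) for a, b in zip(sample_x, sample_y)) / n
    return within, between


# Toy data: two sites in two hosts. Site 1 is fixed for different alleles (A vs G);
# site 2 is polymorphic (A/C at 50:50) in both hosts.
host1 = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
host2 = [[0.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
within, between = mean_diversity(host1, host2)
print(within, between, between / within)  # 0.25 0.75 3.0
```

In this toy example the between-host diversity is three times the within-host diversity, because the two hosts carry different fixed alleles at the first site.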


bioRxiv | 2014

Average genome size estimation enables accurate quantification of gene family abundance and sheds light on the functional ecology of the human microbiome

Stephen Nayfach; Katherine S. Pollard

Average genome size (AGS) is an important, yet often overlooked property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate AGS from short-read metagenomics data and applied our tool to over 1,300 human microbiome samples. We found that AGS differs significantly within and between body sites and tracks with major functional and taxonomic differences. For example, in the gut, AGS ranges from 2.5 to 5.8 megabases and is positively correlated with the abundance of Bacteroides and polysaccharide metabolism. Furthermore, we found that AGS variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.


PLOS ONE | 2018

A most wanted list of conserved microbial protein families with no known domains

Stacia K. Wyman; Aram Avila-Herrera; Stephen Nayfach; Katherine S. Pollard

The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271 965 protein families from the SFams database and discovered many with no functional annotation, including >118 000 families lacking any known protein domain. From these, we prioritized 6 668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a “most wanted” list of genes to prioritize for further characterization.
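The FUnkFam selection criteria stated above (no known domain, at least three sequences, organisms from at least two distinct classes) can be expressed as a simple filter. This Python sketch assumes a hypothetical ProteinFamily record holding a domain count and the taxonomic class of each member's source organism. It approximates phylogenetic breadth as the number of distinct classes; the published pipeline's actual data structures and breadth measure may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinFamily:
    # Hypothetical record; field names are illustrative only.
    family_id: str
    n_domains: int             # annotated protein domains in the family
    member_classes: List[str]  # taxonomic class of each member's source organism


def is_funkfam_candidate(fam: ProteinFamily, min_seqs: int = 3, min_classes: int = 2) -> bool:
    """True if the family has no known domain and spans enough sequences and classes."""
    return (
        fam.n_domains == 0
        and len(fam.member_classes) >= min_seqs
        and len(set(fam.member_classes)) >= min_classes
    )


def prioritize(families: List[ProteinFamily]) -> List[ProteinFamily]:
    """Keep candidates and rank them by breadth (number of distinct classes), widest first."""
    candidates = [f for f in families if is_funkfam_candidate(f)]
    return sorted(candidates, key=lambda f: len(set(f.member_classes)), reverse=True)


fams = [
    ProteinFamily("fam1", 0, ["Bacilli", "Bacilli", "Clostridia"]),
    ProteinFamily("fam2", 1, ["Bacilli", "Clostridia", "Bacteroidia"]),  # has a known domain
    ProteinFamily("fam3", 0, ["Bacilli", "Clostridia", "Bacteroidia", "Bacteroidia"]),
    ProteinFamily("fam4", 0, ["Bacilli", "Bacilli", "Bacilli"]),  # only one class
]
print([f.family_id for f in prioritize(fams)])  # ['fam3', 'fam1']
```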


bioRxiv | 2017

A most wanted list of conserved protein families with no known domains

Stacia K. Wyman; Aram Avila-Herrera; Stephen Nayfach; Katherine S. Pollard

The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271 965 protein families from the SFams database and discovered many with no functional annotation, including >118 000 families lacking any known protein domain. From these, we prioritized 6 668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a “most wanted” list of genes to characterize.


Genome Research | 2016

An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography

Stephen Nayfach; Beltran Rodriguez-Mueller; Nandita R. Garud; Katherine S. Pollard

Collaboration

Top Co-Authors

Michael A. Fischbach

California Institute for Quantitative Biosciences

Aram Avila-Herrera

Lawrence Livermore National Laboratory