Publication


Featured research published by Katherine S. Pollard.


bioRxiv | 2015

Population genetic analyses of metagenomes reveal extensive strain-level variation in prevalent human-associated bacteria

Stephen Nayfach; Katherine S. Pollard

Deep sequencing has the potential to shed light on the functional and phylogenetic heterogeneity of microbial populations in the environment. Here we present PhyloCNV, an integrated computational pipeline for quantifying species abundance and strain-level genomic variation from shotgun metagenomes. Our method leverages a comprehensive database of >30,000 reference genomes which we accurately clustered into species groups using a panel of universal-single-copy genes. Given a shotgun metagenome, PhyloCNV will rapidly and automatically identify gene copy number variants and single-nucleotide variants present in abundant bacterial species. We applied PhyloCNV to >500 faecal metagenomes from the United States, Europe, China, Peru, and Tanzania and present the first global analysis of strain-level variation and biogeography in the human gut microbiome. On average there is 8.5x more nucleotide diversity of strains between different individuals than within individuals, with elevated strain-level diversity in hosts from Peru and Tanzania that live rural lifestyles. For many, but not all common gut species, a significant proportion of inter-sample strain-level genetic diversity is explained by host geography. Eubacterium rectale, for example, has a highly structured population that tracks with host country, while strains of Bacteroides uniformis and other species are structured independently of their hosts. Finally, we discovered that the gene content of some bacterial strains diverges at short evolutionary timescales during which few nucleotide variants accumulate. These findings shed light onto the recent evolutionary history of microbes in the human gut and highlight the extensive differences in the gene content of closely related bacterial strains. 
PhyloCNV is freely available at: https://github.com/snayfach/PhyloCNV.

We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single nucleotide polymorphisms, from shotgun metagenomes. Our method leverages a database of >30,000 bacterial reference genomes which we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare single nucleotide variants to reveal extensive vertical transmission of strains at birth but colonization with strains unlikely to derive from the mother at later time points. This pattern was missed with species-level analysis, because the infant gut microbiome composition converges towards that of an adult over time. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data is analyzed at a higher taxonomic resolution.


bioRxiv | 2018

Chromatin features constrain structural variation across evolutionary timescales

Geoffrey Fudenberg; Katherine S. Pollard

The potential impact of structural variants includes not only the duplication or deletion of coding sequences, but also the perturbation of non-coding DNA regulatory elements and structural chromatin features, including topological domains (TADs). Structural variants disrupting TAD boundaries have been implicated both in cancer and developmental disease; this likely occurs via ‘enhancer hijacking’, whereby removal of the TAD boundary exposes enhancers to new target transcription start sites (TSSs). With this functional role, we hypothesized that boundaries would display evidence for negative selection. Here we demonstrate that the chromatin landscape constrains structural variation both within healthy humans and across primate evolution. In contrast, in patients with developmental delay, variants occur remarkably uniformly across genomic features, suggesting a potentially broad role for enhancer hijacking in human disease.


bioRxiv | 2018

A Metagenomic Meta-Analysis Reveals Functional Signatures of Health and Disease in the Human Gut Microbiome

Courtney R Armour; Stephen Nayfach; Katherine S. Pollard; Thomas J. Sharpton

While recent research indicates that human health depends, in part, upon the symbiotic relationship between gut microbes and their host, the specific interactions between host and microbe that define health are poorly resolved. Metagenomic clinical studies clarify this definition by revealing gut microbial taxa and functions that stratify healthy and diseased individuals. However, the typical single-disease focus of microbiome studies limits insight into which microbiome features robustly associate with health, indicate general deviations from health, or predict specific diseases. Additionally, the focus on taxonomy may limit our understanding of how the microbiome relates to health given observations that different taxonomic members can fulfill similar functional roles. To improve our understanding of the association between the gut microbiome and health, we integrated about 2,000 gut metagenomes obtained from eight clinical studies in a statistical meta-analysis. We identify characteristics of the gut microbiome that associate generally with disease, including functional alpha-diversity, beta-diversity, and beta-dispersion. Moreover, we resolve microbiome modules that stratify diseased individuals from controls in a manner independent of study-specific effects. Many of the differentially abundant functions overlap multiple diseases suggesting a role in host health, while others are specific to a single disease and may associate with disease-specific etiologies. Our results clarify potential microbiome-mediated mechanisms of disease and reveal features of the microbiome that may be useful for the development of microbiome-based diagnostics. Ultimately, our study clarifies the definition of a healthy microbiome and how perturbations to it associate with disease.
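The abstract above names alpha-diversity and beta-diversity among the features that associate with disease, without stating which metrics were used. As a rough illustration only, the sketch below assumes the Shannon index for alpha-diversity and Bray-Curtis dissimilarity for beta-diversity, applied to hypothetical vectors of gene-function abundances. The function names and inputs are assumptions made for this example, not the study's pipeline.

```python
import math

def shannon_index(abundances):
    """Shannon alpha-diversity (natural log) of one sample's abundance vector."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no counts")
    h = 0.0
    for count in abundances:
        if count > 0:  # zero-abundance functions contribute nothing
            p = count / total
            h -= p * math.log(p)
    return h

def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples over the same functions."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same set of functions")
    numerator = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator

# Hypothetical abundances of four gene functions in two samples.
control = [25, 25, 25, 25]
case = [70, 20, 10, 0]
print(shannon_index(control), shannon_index(case))
print(bray_curtis(control, case))
```

An evenly distributed sample yields a higher Shannon index than a skewed one, and Bray-Curtis ranges from 0 (identical samples) to 1 (no shared functions).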


bioRxiv | 2018

phylogenize: a web tool to identify microbial genes underlying environment associations

Patrick H. Bradley; Katherine S. Pollard

Summary: Microbes differ in prevalence across environments, but in most cases the causes remain opaque. Phylogenetic comparative methods have emerged as powerful, specific methods to identify microbial genes underlying differences in community composition. However, applying these methods currently requires computational expertise and sequenced isolates or shotgun metagenomes, limiting their wider adoption. We present phylogenize, a web server that allows researchers to apply phylogenetic regression to 16S amplicon as well as shotgun sequencing data and to visualize results. Using data from the Human Microbiome Project, we show that phylogenize draws similar conclusions from 16S and from shotgun sequencing. Additionally, we apply phylogenize to 16S data from the Earth Microbiome Project, revealing both known and candidate pathways involved in plant colonization. phylogenize has broad applicability to the analysis of both human-associated and environmental microbiomes.

Availability: phylogenize is available at https://phylogenize.org with source code available at https://bitbucket.org/pbradz/phylogenize.

Summary: Phylogenetic comparative methods are powerful but presently under-utilized ways to identify microbial genes underlying differences in community composition. These methods help to identify functionally important genes because they test for associations beyond those expected when related microbes occupy similar environments. We present phylogenize, a pipeline with web, QIIME2, and R interfaces that allows researchers to perform phylogenetic regression on 16S amplicon and shotgun sequencing data and to visualize results. phylogenize applies broadly to both host-associated and environmental microbiomes. Using Human Microbiome Project and Earth Microbiome Project data, we show that phylogenize draws similar conclusions from 16S versus shotgun sequencing and reveals both known and candidate pathways associated with host colonization.

Availability: phylogenize is available at https://phylogenize.org and https://bitbucket.org/pbradz/phylogenize.


bioRxiv | 2018

Most regulatory interactions are not in linkage disequilibrium

Sean Whalen; Katherine S. Pollard

Linkage disequilibrium (LD) and genomic proximity are commonly used to map non-coding variants to genes, despite increasing examples of causal variants outside the LD block of the gene they regulate. We compared chromatin contacts in 22 cell types to LD across billions of pairs of loci in the human genome and found no concordance, even at genomic distances below 25 kilobases where both tend to be high. Gene expression and ontology data suggest that chromatin contacts identify regulatory variants more reliably than do LD and genomic proximity. We conclude that the genomic architectures of genetic and physical interactions are independent, with important implications for gene regulatory evolution and precision medicine.
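For readers unfamiliar with how LD between a pair of loci is quantified, the following is a minimal sketch of the standard r² statistic computed from phased biallelic haplotypes. The function name and the toy haplotypes are assumptions for illustration; the abstract does not describe how the authors computed LD or which software they used.

```python
def ld_r2(haplotypes, i, j):
    """r^2 between biallelic sites i and j.

    haplotypes: list of sequences of 0/1 alleles, one sequence per phased haplotype.
    Returns NaN when either site is monomorphic, since r^2 is undefined there.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[i] for h in haplotypes) / n   # frequency of allele 1 at site i
    p_b = sum(h[j] for h in haplotypes) / n   # frequency of allele 1 at site j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")
    return d * d / denom

# Toy data: sites 0 and 1 always co-occur; site 2 is independent of site 0.
toy = [
    (0, 0, 0),
    (0, 0, 1),
    (1, 1, 0),
    (1, 1, 1),
]
print(ld_r2(toy, 0, 1))
print(ld_r2(toy, 0, 2))
```

A comparison like the one in the abstract would pair such an LD value with a chromatin contact measurement for the same two loci; that step is not sketched here.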


bioRxiv | 2017

The geometry of the distance-decay of similarity in ecological communities

Joshua Ladau; Jessica L. Green; Katherine S. Pollard

Understanding beta-diversity has strong implications for evaluating the extent of biodiversity and formulating effective conservation policy. Here, we show that the distance-decay relationship, an important measure of beta-diversity, follows a universal form which we call the piecewise quadratic model. To derive the piecewise quadratic model, we develop a new conceptual framework which is based on geometric probability and several key insights about the roles of study design (e.g., plot dimensions and spatial distributions). We fit the piecewise quadratic model to six empirical distance-decay relationships, spanning a range of taxa and spatial scales, including surveys of tropical vegetation, mammals, and amphibians. We find that the model predicts the functional form of the relationships extremely well, with coefficients of determination in excess of 0.95. Moreover, the model predicts a phase transition at distance scales where sample plots are overlapping, which we confirm empirically. Our framework and model provide a fundamental, quantitative link between distance-decay relationships and the shapes of ranges of taxa.


bioRxiv | 2016

Cryptic functional variation in the human gut microbiome

Patrick H. Bradley; Katherine S. Pollard

While human gut microbiomes vary significantly in taxonomic composition, biological pathway abundance is surprisingly invariable across hosts. We hypothesized that healthy microbiomes appear functionally redundant due to factors that obscure differences in gene abundance across hosts. To account for these biases, we developed a powerful test of gene variability, applicable to shotgun metagenomes from any environment. Our analysis of healthy stool metagenomes reveals thousands of genes whose abundance differs significantly between people consistently across studies, including glycolytic enzymes, lipopolysaccharide biosynthetic genes, and secretion systems. Even housekeeping pathways contain a mix of variable and invariable genes, though most deeply conserved genes are significantly invariable. Variable genes tend to be associated with Proteobacteria, as opposed to taxa used to define enterotypes or the dominant phyla Bacteroidetes and Firmicutes. These results establish limits on functional redundancy and predict specific genes and taxa that may drive physiological differences between gut microbiomes.

Impact Statement: A statistical test for gene variability reveals extensive functional differences between healthy human microbiomes.

Background: The human gut microbiome harbors microbes that perform diverse biochemical functions. Previous work suggested that functional variation between gut microbiota is small relative to taxonomic variation. However, these conclusions were largely based on broad pathways and qualitative patterns. Identifying microbial genes with highly variable or invariable abundance across hosts requires a new statistical test.

Results: We develop a model for microbiome gene abundance that allows for differences in means between studies and accounts for the mean-variance relationship in shotgun data. Applying a test based on this model to stool metagenomes from three populations of healthy adults, we discover many significantly variable genes, including components of central carbon metabolism and other pathways comprised primarily of more stable genes. By integrating taxonomic profiles into our test for gene variability, we reveal that Proteobacteria are a major source of variable genes. Stable genes tend to have broad phylogenetic distributions, but several two-component signaling pathways and carbohydrate utilization gene families have relatively constant levels across hosts despite being taxonomically restricted.

Conclusions: Gene-level tests shed light on adaptation to the gut environment, and highlight microbially-encoded functions that may respond to or cause variability in host traits.


bioRxiv | 2016

Features of ChIP-seq data peak calling algorithms with good operating characteristics

Reuben Thomas; Sean Thomas; Alisha K. Holloway; Katherine S. Pollard

Author description: Reuben Thomas is a Staff Research Scientist in the Bioinformatics Core at Gladstone Institutes. Sean Thomas is a Staff Research Scientist in the Bioinformatics Core at Gladstone Institutes. Alisha K. Holloway is the Director of Bioinformatics at Phylos Biosciences, visiting scientist at Gladstone Institutes and Adjunct Assistant Professor in Biostatistics at the University of California, San Francisco. Katherine S. Pollard is a Senior Investigator at Gladstone Institutes and Professor of Biostatistics at University of California, San Francisco.

Key Points: Peak-calling using ChIP-seq data consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. Twelve features of the two sub-problems of peak-calling methods are identified. Methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not. Methods that use windows of different sizes to scan the genome for potential peaks are more powerful than ones that do not. Methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test.

Abstract: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in analysis of these data. Peak-calling consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. We surveyed 30 methods and identified 12 features of the two sub-problems that distinguish methods from each other. We picked six methods (GEM, MACS2, MUSIC, BCP, TM and ZINBA) that span this feature space and used a combination of 300 simulated ChIP-seq data sets, 3 real data sets and mathematical analyses to identify features of methods that allow some to perform better than others. We prove that methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not. Methods that use windows of different sizes are more powerful than ones that do not. For statistical testing of candidate peaks, methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test. BCP and MACS2 have the best operating characteristics on simulated transcription factor binding data. GEM has the highest fraction of the top 500 peaks containing the binding motif of the immunoprecipitated factor, with 50% of its peaks within 10 base pairs (bp) of a motif. BCP and MUSIC perform best on histone data. These findings provide guidance and rationale for selecting the best peak caller for a given application.
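The abstract contrasts Poisson and Binomial tests for ranking candidate peaks. The sketch below shows, under simplifying assumptions, what each kind of test might compute for a single candidate window: an upper-tail probability for the observed ChIP read count. The function names, the background rate `lam`, and the null proportion `p` are hypothetical choices for this example and are not taken from any of the surveyed peak callers.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), where lam is an assumed background rate."""
    term = math.exp(-lam)      # P(X = 0)
    cdf_below_k = 0.0
    for i in range(k):
        cdf_below_k += term    # accumulate P(X = i)
        term *= lam / (i + 1)  # step to P(X = i + 1)
    return max(0.0, 1.0 - cdf_below_k)

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Here n is assumed to be ChIP plus input reads in the window and p the
    fraction expected from ChIP under the null (for example 0.5 at equal depth).
    """
    return sum(
        math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )

# Hypothetical window: 30 ChIP reads, 10 input reads, background rate of 10.
chip_reads, input_reads = 30, 10
print(poisson_upper_tail(chip_reads, lam=10.0))
print(binomial_upper_tail(chip_reads, chip_reads + input_reads, p=0.5))
```

Candidate peaks would then be ranked by these tail probabilities, smallest first. The sketch does not reproduce the power comparison reported in the abstract.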


bioRxiv | 2015

Protein binding and methylation on looping chromatin accurately predict distal regulatory interactions

Sean Whalen; Rebecca M. Truty; Katherine S. Pollard

Identifying the gene targets of distal regulatory sequences is a challenging problem with the potential to illuminate the causal underpinnings of complex diseases. However, current experimental methods to map enhancer-promoter interactions genome-wide are limited by their cost and complexity. We present TargetFinder, a computational method that reconstructs a cell’s three-dimensional regulatory landscape from two-dimensional genomic features. TargetFinder achieves outstanding predictive accuracy across diverse cell lines with a false discovery rate up to fifteen times smaller than common heuristics, and reveals that distal regulatory interactions are characterized by distinct signatures of protein interactions and epigenetic marks on the DNA loop between an active enhancer and targeted promoter. Much of this signature is shared across cell types, shedding light on the role of chromatin organization in gene regulation and establishing TargetFinder as a method to accurately map long-range regulatory interactions using a small number of easily acquired datasets.


mSystems | 2018

Existing Climate Change Will Lead to Pronounced Shifts in the Diversity of Soil Prokaryotes

Joshua Ladau; Yu Shi; Xin Jing; Jin-Sheng He; Litong Chen; Xiangui Lin; Noah Fierer; Jack A. Gilbert; Katherine S. Pollard; Haiyan Chu

Collaboration


Dive into Katherine S. Pollard's collaborations.

Top Co-Authors

Joshua Ladau

University of California

Geoffrey Fudenberg

Massachusetts Institute of Technology
