Publication


Featured research published by David Paez-Espino.


Standards in Genomic Sciences | 2015

The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

Marcel Huntemann; Natalia Ivanova; Konstantinos Mavromatis; H. James Tripp; David Paez-Espino; Krishnaveni Palaniappan; Ernest Szeto; Manoj Pillay; I-Min A. Chen; Amrita Pati; Torben Nielsen; Victor Markowitz; Nikos C. Kyrpides

The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.


Nature Biotechnology | 2017

1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

Supratim Mukherjee; Rekha Seshadri; Neha Varghese; Emiley A. Eloe-Fadrosh; Jan P. Meier-Kolthoff; Markus Göker; R. Cameron Coates; Michalis Hadjithomas; Georgios A. Pavlopoulos; David Paez-Espino; Yasuo Yoshikuni; Axel Visel; William B. Whitman; George M Garrity; Jonathan A. Eisen; Philip Hugenholtz; Amrita Pati; Natalia Ivanova; Tanja Woyke; Hans-Peter Klenk; Nikos C. Kyrpides

We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster with potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.


Nucleic Acids Research | 2017

IMG/VR: A database of cultured and uncultured DNA viruses and retroviruses

David Paez-Espino; I-Min A. Chen; Krishna Palaniappan; Anna Ratner; Ken Chu; Ernest Szeto; Manoj Pillay; Jinghua Huang; Victor Markowitz; Torben Nielsen; Marcel Huntemann; T. B.K. Reddy; Georgios A. Pavlopoulos; Matthew B. Sullivan; Barbara J. Campbell; Feng Chen; Katherine D. McMahon; Steve J. Hallam; Vincent J. Denef; Ricardo Cavicchioli; Sean M. Caffrey; Wolfgang R. Streit; John Webster; Kim M. Handley; Ghasem H. Salekdeh; Nicolas Tsesmetzis; João C. Setubal; Phillip B. Pope; Wen Tso Liu; Adam R. Rivers

Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community.


The ISME Journal | 2015

Antarctic archaea–virus interactions: metaproteome-led analysis of invasion, evasion and adaptation

Bernhard Tschitschko; Timothy J. Williams; Michelle A. Allen; David Paez-Espino; Nikos C. Kyrpides; Ling Zhong; Mark J. Raftery; Ricardo Cavicchioli

Despite knowledge that viruses are abundant in natural ecosystems, there is limited understanding of which viruses infect which hosts, and how both hosts and viruses respond to those interactions—interactions that ultimately shape community structure and dynamics. In Deep Lake, Antarctica, intergenera gene exchange occurs rampantly within the low complexity, haloarchaea-dominated community, strongly balanced by distinctions in niche adaptation which maintain sympatric speciation. By performing metaproteomics for the first time on haloarchaea, genomic variation of S-layer, archaella and other cell surface proteins was linked to mechanisms of infection evasion. CRISPR defense systems were found to be active, with haloarchaea responding to at least eight distinct types of viruses, including those infecting between genera. The role of BREX systems in defending against viruses was also examined. Although evasion and defense were evident, both hosts and viruses also may benefit from viruses carrying and expressing host genes, thereby potentially enhancing genetic variation and phenotypic differences within populations. The data point to a complex inter-play leading to a dynamic optimization of host–virus interactions. This comprehensive overview was achieved only through the integration of results from metaproteomics, genomics and metagenomics.


Nature Biotechnology | 2018

Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection

Rekha Seshadri; Sinead C. Leahy; Graeme T. Attwood; Koon Hoong Teh; Suzanne C. Lambie; Adrian L. Cookson; Emiley A. Eloe-Fadrosh; Georgios A. Pavlopoulos; Michalis Hadjithomas; Neha Varghese; David Paez-Espino; Nikola Palevich; Peter H. Janssen; Ron S. Ronimus; Samantha Noel; Priya Soni; Kerri Reilly; Todd Atherly; Cherie J. Ziemer; André-Denis G. Wright; Suzanne Ishaq; Michael A. Cotta; Stephanie Thompson; Katie Crosley; Nest McKain; R. John Wallace; Harry J. Flint; Jennifer C. Martin; Robert J Forster; Robert J Gruninger

Productivity of ruminant livestock depends on the rumen microbiota, which ferment indigestible plant polysaccharides into nutrients used for growth. Understanding the functions carried out by the rumen microbiota is important for reducing greenhouse gas production by ruminants and for developing biofuels from lignocellulose. We present 410 cultured bacteria and archaea, together with their reference genomes, representing every cultivated rumen-associated archaeal and bacterial family. We evaluate polysaccharide degradation, short-chain fatty acid production and methanogenesis pathways, and assign specific taxa to functions. A total of 336 organisms were present in available rumen metagenomic data sets, and 134 were present in human gut microbiome data sets. Comparison with the human microbiome revealed rumen-specific enrichment for genes encoding de novo synthesis of vitamin B12, ongoing evolution by gene loss and potential vertical inheritance of the rumen microbiome based on underrepresentation of markers of environmental stress. We estimate that our Hungate genome resource represents ∼75% of the genus-level bacterial and archaeal taxa present in the rumen.


mBio | 2017

On the origin of reverse transcriptase-using CRISPR-Cas systems and their hyperdiverse, enigmatic spacer repertoires

Sukrit Silas; Kira S. Makarova; Sergey Shmakov; David Paez-Espino; Georg Mohr; Yi Liu; Michelle Davison; Simon Roux; Siddharth R. Krishnamurthy; Becky Xu Hua Fu; Loren Hansen; David Wang; Matthew B. Sullivan; Andrew D. Millard; Martha R. J. Clokie; Devaki Bhaya; Alan M. Lambowitz; Nikos C. Kyrpides; Eugene V. Koonin; Andrew Fire

Cas1 integrase is the key enzyme of the clustered regularly interspaced short palindromic repeat (CRISPR)-Cas adaptation module that mediates acquisition of spacers derived from foreign DNA by CRISPR arrays. In diverse bacteria, the cas1 gene is fused (or adjacent) to a gene encoding a reverse transcriptase (RT) related to group II intron RTs. An RT-Cas1 fusion protein has been recently shown to enable acquisition of CRISPR spacers from RNA. Phylogenetic analysis of the CRISPR-associated RTs demonstrates monophyly of the RT-Cas1 fusion, and coevolution of the RT and Cas1 domains. Nearly all such RTs are present within type III CRISPR-Cas loci, but their phylogeny does not parallel the CRISPR-Cas type classification, indicating that RT-Cas1 is an autonomous functional module that is disseminated by horizontal gene transfer and can function with diverse type III systems. To compare the sequence pools sampled by RT-Cas1-associated and RT-lacking CRISPR-Cas systems, we obtained samples of a commercially grown cyanobacterium, Arthrospira platensis. Sequencing of the CRISPR arrays uncovered a highly diverse population of spacers. Spacer diversity was particularly striking for the RT-Cas1-containing type III-B system, where no saturation was evident even with millions of sequences analyzed. In contrast, analysis of the RT-lacking type III-D system yielded a highly diverse pool but reached a point where fewer novel spacers were recovered as sequencing depth was increased. Matches could be identified for a small fraction of the non-RT-Cas1-associated spacers, and for only a single RT-Cas1-associated spacer. Thus, the principal source(s) of the spacers, particularly the hypervariable spacer repertoire of the RT-associated arrays, remains unknown.

IMPORTANCE: While the majority of CRISPR-Cas immune systems adapt to foreign genetic elements by capturing segments of invasive DNA, some systems carry reverse transcriptases (RTs) that enable adaptation to RNA molecules. From analysis of available bacterial sequence data, we find evidence that RT-based RNA adaptation machinery has been able to join with CRISPR-Cas immune systems in many, diverse bacterial species. To investigate whether the abilities to adapt to DNA and RNA molecules are utilized for defense against distinct classes of invaders in nature, we sequenced CRISPR arrays from samples of commercial-scale open-air cultures of Arthrospira platensis, a cyanobacterium that contains both RT-lacking and RT-containing CRISPR-Cas systems. We uncovered a diverse pool of naturally occurring immune memories, with the RT-lacking locus acquiring a number of segments matching known viral or bacterial genes, while the RT-containing locus has acquired spacers from a distinct sequence pool for which the source remains enigmatic.


Nature Microbiology | 2018

Murine colitis reveals a disease-associated bacteriophage community

Breck A. Duerkop; Manuel Kleiner; David Paez-Espino; Wenhan Zhu; Brian Bushnell; Brian Hassell; Sebastian E. Winter; Nikos C. Kyrpides; Lora V. Hooper

The dysregulation of intestinal microbial communities is associated with inflammatory bowel diseases (IBD). Studies aimed at understanding the contribution of the microbiota to inflammatory diseases have primarily focused on bacteria, yet the intestine harbours a viral component dominated by prokaryotic viruses known as bacteriophages (phages). Phage numbers are elevated at the intestinal mucosal surface and phages increase in abundance during IBD, suggesting that phages play an unidentified role in IBD. We used a sequence-independent approach for the selection of viral contigs and then applied quantitative metagenomics to study intestinal phages in a mouse model of colitis. We discovered that during colitis the intestinal phage population is altered and transitions from an ordered state to a stochastic dysbiosis. We identified phages specific to pathobiotic hosts associated with intestinal disease, whose abundances are altered during colitis. Additionally, phage populations in healthy and diseased mice overlapped with phages from healthy humans and humans with IBD. Our findings indicate that intestinal phage communities are altered during inflammatory disease, establishing a platform for investigating phage involvement in IBD.

Quantitative metagenomics reveals an altered bacteriophage community in a mouse model of colitis, which overlaps with that observed in humans with inflammatory bowel disease (IBD), providing a tool for interrogating phage dynamics in IBD.


Standards in Genomic Sciences | 2016

Erratum to: The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

Marcel Huntemann; Natalia Ivanova; Konstantinos Mavromatis; H. James Tripp; David Paez-Espino; Krishnaveni Palaniappan; Ernest Szeto; Manoj Pillay; I-Min A. Chen; Amrita Pati; Torben Nielsen; Victor Markowitz; Nikos C. Kyrpides

This corrects the article DOI: 10.1186/s40793-015-0077-y.


Scientific Reports | 2018

Investigation of recombination-intense viral groups and their genes in the Earth’s virome

Jan P. Meier-Kolthoff; Jumpei Uchiyama; Hiroko Yahara; David Paez-Espino; Koji Yahara

Bacteriophages (phages), or bacterial viruses, are the most abundant and diverse biological entities that impact the global ecosystem. Recent advances in metagenomics have revealed their rampant abundance in the biosphere. A fundamental aspect of bacteriophages that remains unexplored in metagenomic data is the process of recombination as a driving force in evolution that occurs among different viruses within the same bacterial host. Here, we systematically examined signatures of recombination in every gene from 211 species-level viral groups in a recently obtained dataset of the Earth’s virome that contain corresponding information on the host bacterial species. Our study revealed that signatures of recombination are widespread (84%) among the diverse viral groups. We identified 25 recombination-intense viral groups, widely distributed across the viral taxonomy, and present in bacterial species living in the human oral cavity. We also revealed a significant inverse association between the recombination-intense viral groups and Type II restriction endonucleases, that could be effective in reducing recombination among phages in a cell. Furthermore, we identified recombination-intense genes that are significantly enriched for encoding phage morphogenesis proteins. Changes in the viral genomic sequence by recombination may be important to escape cleavage by the host bacterial immune systems.


Science | 2018

Programmed DNA destruction by miniature CRISPR-Cas14 enzymes

Lucas B. Harrington; David Burstein; Janice S. Chen; David Paez-Espino; Enbo Ma; Isaac P. Witte; Joshua C. Cofsky; Nikos C. Kyrpides; Jillian F. Banfield; Jennifer A. Doudna

A programmable type of CRISPR system. CRISPR-Cas9 systems have been causing a revolution in biology. Harrington et al. describe the discovery and technological implementation of an additional type of CRISPR system based on an extracompact effector protein, Cas14. Metagenomics data, particularly from uncultivated samples, uncovered the CRISPR-Cas14 systems containing all the components necessary for adaptive immunity in prokaryotes. At half the size of class 2 CRISPR effectors, Cas14 appears to target single-stranded DNA without class 2 sequence restrictions. By leveraging this activity, a fast and high-fidelity nucleic acid detection system enabled detection of single-nucleotide polymorphisms. (Science, this issue p. 839)

Identification, characterization, and technological implementation of additional archaea-derived CRISPR-Cas14 systems are described.

CRISPR-Cas systems provide microbes with adaptive immunity to infectious nucleic acids and are widely employed as genome editing tools. These tools use RNA-guided Cas proteins whose large size (950 to 1400 amino acids) has been considered essential to their specific DNA- or RNA-targeting activities. Here we present a set of CRISPR-Cas systems from uncultivated archaea that contain Cas14, a family of exceptionally compact RNA-guided nucleases (400 to 700 amino acids). Despite their small size, Cas14 proteins are capable of targeted single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements. Moreover, target recognition by Cas14 triggers nonspecific cutting of ssDNA molecules, an activity that enables high-fidelity single-nucleotide polymorphism genotyping (Cas14-DETECTR). Metagenomic data show that multiple CRISPR-Cas14 systems evolved independently and suggest a potential evolutionary origin of single-effector CRISPR-based adaptive immunity.

Collaboration


Top co-authors of David Paez-Espino.

Amrita Pati, Joint Genome Institute
Ernest Szeto, Lawrence Berkeley National Laboratory
Manoj Pillay, Lawrence Berkeley National Laboratory
Natalia Ivanova, United States Department of Energy
Torben Nielsen, United States Department of Energy
Victor Markowitz, Lawrence Berkeley National Laboratory
Enbo Ma, University of California