Guillaume Jospin
University of California, Davis
Publication
Featured research published by Guillaume Jospin.
Bioinformatics | 2015
David A. Coil; Guillaume Jospin; Aaron E. Darling
MOTIVATION: Open-source bacterial genome assembly remains inaccessible to many biologists because of its complexity. Few software solutions exist that are capable of automating all steps in the process of de novo genome assembly from Illumina data. RESULTS: A5-miseq can produce high-quality microbial genome assemblies on a laptop computer without any parameter tuning. A5-miseq does this by automating the process of adapter trimming, quality filtering, error correction, contig and scaffold generation and detection of misassemblies. Unlike the original A5 pipeline, A5-miseq can use long reads from the Illumina MiSeq, use read pairing information during contig generation and includes several improvements to read trimming. Together, these changes result in substantially improved assemblies that recover a more complete set of reference genes than previous methods. AVAILABILITY: A5-miseq is licensed under the GPL open-source license. Source code and precompiled binaries for Mac OS X 10.6+ and Linux 2.6.15+ are available from http://sourceforge.net/projects/ngopt CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PeerJ | 2014
Aaron E. Darling; Guillaume Jospin; Eric Lowe; Frederick A. Matsen; Holly M. Bik; Jonathan A. Eisen
Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).
PLOS ONE | 2013
Dongying Wu; Guillaume Jospin; Jonathan A. Eisen
With the astonishing rate at which genomic and metagenomic sequence data sets are accumulating, there are many reasons to constrain the data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited for the tasks at hand. Such gene families have generally been referred to as “marker” genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities (e.g., construction of species trees, phylogenetic-based assignment of metagenomic sequence reads to taxonomic groups, phylogeny-based assessment of alpha- and beta-diversity of microbial communities from metagenomic data). We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers means that we needed to develop and apply a set of somewhat novel criteria for identification of the best candidates for such markers. The criteria we focused on included universality across the taxa of interest, ability to be used to produce robust phylogenetic trees that reflect as much as possible the evolution of the species from which the genes come, and low variation in copy number across taxa. We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels including 40 for “all bacteria and archaea”, 114 for “all bacteria” (greatly expanding on the ∼30 commonly used), and 100s to 1000s for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than possible previously.
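Two of the criteria named in the abstract above, universality across taxa and low variation in copy number, can be illustrated with a small filter. This is only a sketch under stated assumptions, not the authors' protocol: the function name `select_marker_candidates`, the input layout, and both thresholds are hypothetical, and the third criterion (ability to produce robust phylogenetic trees) is not modeled here.

```python
from statistics import pvariance


def select_marker_candidates(copy_counts, min_universality=0.95, max_copy_variance=0.1):
    """Return gene families that pass two illustrative marker criteria.

    copy_counts maps family name -> {genome name: copy number}, with 0 meaning
    the family is absent from that genome. Both thresholds are hypothetical
    values chosen for illustration only.
    """
    selected = []
    for family, counts in copy_counts.items():
        values = list(counts.values())
        if not values:
            continue
        # Universality: fraction of genomes carrying at least one copy.
        universality = sum(1 for c in values if c > 0) / len(values)
        # Copy-number variation: population variance of copies per genome.
        variance = pvariance(values)
        if universality >= min_universality and variance <= max_copy_variance:
            selected.append(family)
    return sorted(selected)


if __name__ == "__main__":
    toy_counts = {
        "famA": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},  # universal, single copy
        "famB": {"g1": 1, "g2": 0, "g3": 0, "g4": 1},  # missing in half the genomes
        "famC": {"g1": 1, "g2": 5, "g3": 1, "g4": 2},  # universal, variable copy number
    }
    print(select_marker_candidates(toy_counts))
```

In this toy input only the family that is present once in every genome passes both filters; real data would need thresholds tuned to the taxonomic level being studied.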
The ISME Journal | 2013
Joshua Ladau; Thomas J. Sharpton; Mariel M. Finucane; Guillaume Jospin; Steven W. Kembel; James P. O'Dwyer; Alexander F. Koeppel; Jessica L. Green; Katherine S. Pollard
Genomic approaches to characterizing bacterial communities are revealing significant differences in diversity and composition between environments. But bacterial distributions have not been mapped at a global scale. Although current community surveys are far too sparse to map global diversity patterns directly, there is now sufficient data to fit accurate models of how bacterial distributions vary across different environments and to make global scale maps from these models. We apply this approach to map the global distributions of bacteria in marine surface waters. Our spatially and temporally explicit predictions suggest that bacterial diversity peaks in temperate latitudes across the world’s oceans. These global peaks are seasonal, occurring 6 months apart in the two hemispheres, in the boreal and austral winters. This pattern is quite different from the tropical, seasonally consistent diversity patterns observed for most macroorganisms. However, like other marine organisms, surface water bacteria are particularly diverse in regions of high human environmental impacts on the oceans. Our maps provide the first picture of bacterial distributions at a global scale and suggest important differences between the diversity patterns of bacteria compared with other organisms.
PeerJ | 2014
James Angus Chandler; Pamela M. James; Guillaume Jospin; Jenna M. Lang
Drosophila suzukii is an introduced pest insect that feeds on undamaged, attached fruit. This diet is distinct from the fallen, decomposing fruits utilized by most other species of Drosophila. Since the bacterial microbiota of Drosophila, and of many other animals, is affected by diet, we hypothesized that the bacteria associated with D. suzukii are distinct from those of other Drosophila. Using 16S rDNA PCR and Illumina sequencing, we characterized the bacterial communities of larval and adult D. suzukii collected from undamaged, attached cherries in California, USA. We find that the bacterial communities associated with these samples of D. suzukii contain a high frequency of Tatumella. Gluconobacter and Acetobacter, two taxa with known associations with Drosophila, were also found, although at lower frequency than Tatumella in four of the five samples examined. Sampling D. suzukii from different locations and/or while feeding on different fruits is needed to determine the generality of the results determined by these samples. Nevertheless, this is, to our knowledge, the first study characterizing the bacterial communities of this ecologically unique and economically important species of Drosophila.
PeerJ | 2015
Madison I. Dunitz; Jenna M. Lang; Guillaume Jospin; Aaron E. Darling; Jonathan A. Eisen; David A. Coil
The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, have become much easier for research labs with access to standard molecular biology and computational tools. However, there is a confusing variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome, empowering even a lab or classroom with limited resources and bioinformatics experience to perform it.
BMC Bioinformatics | 2012
Thomas J. Sharpton; Guillaume Jospin; Dongying Wu; Morgan G. I. Langille; Katherine S. Pollard; Jonathan A. Eisen
Background: New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. Results: We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as “Sifting Families,” or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology–based analyses. Conclusions: We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
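The iterative "sifting" idea described in the abstract above, in which homologs of known families are removed before searching for new families, can be sketched roughly as follows. This is an illustrative toy and not the SFam implementation: the function names are hypothetical, `difflib.SequenceMatcher` stands in for a real global-alignment similarity score, the comparison against only the first member of each family is a simplification, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; NOT a real global alignment."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def sift_and_cluster(families, new_sequences, threshold=0.8):
    """One update round: sift known homologs, then cluster the remainder.

    families is a list of lists of sequences; the first member of each list
    is treated as the family representative (a simplifying assumption).
    Returns a new list of families and leaves the input untouched.
    """
    families = [list(members) for members in families]
    leftovers = []

    # Step 1: sift out sequences that match an existing family.
    for seq in new_sequences:
        for members in families:
            if similarity(seq, members[0]) >= threshold:
                members.append(seq)
                break
        else:
            leftovers.append(seq)

    # Step 2: greedy de novo clustering of whatever was not sifted out.
    new_families = []
    for seq in leftovers:
        for members in new_families:
            if similarity(seq, members[0]) >= threshold:
                members.append(seq)
                break
        else:
            new_families.append([seq])

    return families + new_families


if __name__ == "__main__":
    known = [["MKTAYIAKQR"]]
    batch = ["MKTAYIAKQK", "GGGSSSPPPL", "GGGSSSPPPV"]
    for members in sift_and_cluster(known, batch):
        print(members)
```

Because sifted sequences never enter the de novo step, the number of pairwise comparisons in step 2 shrinks as the catalog of known families grows, which is the source of the reduced computational cost that the abstract refers to.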
bioRxiv | 2016
Varvara K. Kozyreva; Guillaume Jospin; Alexander L. Greninger; James Watt; Jonathan A. Eisen; Vishnu Chaturvedi
ABSTRACT: Shigella sonnei has caused unusually large outbreaks of shigellosis in California in 2014 and 2015. Preliminary data indicated the involvement of two distinct bacterial populations, one from San Diego and San Joaquin (SDi/SJo) and one from the San Francisco (SFr) Bay area. Whole-genome analysis and antibiotic susceptibility testing of 68 outbreak and archival isolates of S. sonnei were performed to investigate the microbiological factors related to these outbreaks. Both SDi/SJo and SFr populations, as well as almost all of the archival S. sonnei isolates, belonged to sequence type 152 (ST152). Genome-wide single nucleotide polymorphism (SNP) analysis clustered the majority of California (CA) isolates to an earlier described lineage III. Isolates in the SDi/SJo population had a novel lambdoid bacteriophage carrying genes encoding Shiga toxin (STX) that were most closely related to that found in Escherichia coli O104:H4. However, the STX genes (stx1A and stx1B) from this novel phage had sequences most similar to the phages from Shigella flexneri and S. dysenteriae. The isolates in the SFr population were resistant to ciprofloxacin due to point mutations in gyrA and parC genes and were related to the fluoroquinolone-resistant S. sonnei clade within lineage III that originated in South Asia. The emergence of a highly virulent S. sonnei strain and introduction of a fluoroquinolone-resistant strain reflect the changing traits of this pathogen in California. Enhanced monitoring is advocated for early detection of future outbreaks caused by such strains. IMPORTANCE: Shigellosis is an acute diarrheal disease causing nearly half a million infections, 6,000 hospitalizations, and 70 deaths annually in the United States. S. sonnei caused two unusually large outbreaks in 2014 and 2015 in California. We used whole-genome sequencing to understand the pathogenic potential of bacteria involved in these outbreaks. Our results suggest the persistence of a local S. sonnei SDi/SJo clone in California since at least 2008. Recently, a derivative of the original clone acquired the ability to produce Shiga toxin (STX) via exchanges of bacteriophages with other bacteria. STX production is connected with more severe disease, including bloody diarrhea. A second population of S. sonnei that caused an outbreak in the San Francisco area was resistant to fluoroquinolones and showed evidence of connection to a fluoroquinolone-resistant lineage from South Asia. These emerging trends in S. sonnei populations in California must be monitored for future risks of the spread of increasingly virulent and resistant clones.
Frontiers in Cellular and Infection Microbiology | 2014
Sima Tokajian; Jonathan A. Eisen; Guillaume Jospin; Anna Farra; David A. Coil
Objective: The emergence of extended-spectrum β-lactamase (ESBL)-producing bacteria is now a critical concern. The ESBL-producing Klebsiella pneumoniae constitutes one of the most common multidrug-resistant (MDR) groups of gram-negative bacteria involved in nosocomial infections worldwide. In this study we report on the molecular characterization through whole genome sequencing of an ESBL-producing K. pneumoniae strain, LAU-KP1, isolated from a stool sample from a patient admitted for a gastrointestinal procedure/surgery at the Lebanese American University Medical Center-Rizk Hospital (LAUMCRH) in Lebanon. Methods: Illumina paired-end libraries were prepared and sequenced, which resulted in 4,220,969 high-quality reads. All sequence processing and assembly were performed using the A5 assembly pipeline. Results: The initial assembly produced 86 contigs, for which no scaffolding was obtained. The final collection of contigs was submitted to GenBank. The final draft genome sequence consists of a combined 5,632,663 bases with 57% G+C content. Automated annotation was performed using the RAST annotation server. Sequencing analysis revealed that the isolate harbored different β-lactamase genes, including blaOXA-1, blaCTX-M-15, blaSHV-11, and blaTEM-1b. The isolate was also characterized by the concomitant presence of other resistance determinants, most notably aac(6′)-Ib-cr and qnrB1. The entire plasmid content was also investigated and revealed homology with four major plasmids: pKPN-IT, pBS512_2, pRSF1010_SL1344, and pKPN3. Conclusions: The potential role of K. pneumoniae as a reservoir for ESBL genes and other resistance determinants is, along with the presence of key factors that favor the spread of antimicrobial resistance, a clear cause of concern. The problem that carbapenem-non-susceptible ESBL isolates are posing in hospitals should be reconsidered through systematic exploration and molecular characterization.
Genome Announcements | 2015
David A. Coil; Alexandra Alexiev; Corrin Wallis; Ciaran O'Flynn; Oliver Deusch; Ian J. Davis; Alexander Horsfall; Nicola Kirkwood; Guillaume Jospin; Jonathan A. Eisen; Stephen Harris; Aaron E. Darling
ABSTRACT: We present the draft genome sequences for 26 strains of Porphyromonas (P. canoris, P. gulae, P. cangingivalis, P. macacae, and 7 unidentified) and an unidentified member of the Porphyromonadaceae family. All of these strains were isolated from the canine oral cavity, from dogs with and without early periodontal disease.