Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Sourav Chatterji is active.

Publication


Featured researches published by Sourav Chatterji.


PLOS ONE | 2009

Assembling the Marine Metagenome, One Cell at a Time

Tanja Woyke; Gary Xie; Alex Copeland; José M. González; Cliff Han; Hajnalka Kiss; Jimmy Hw Saw; Pavel Senin; Chi Yang; Sourav Chatterji; Jan Fang Cheng; Jonathan A. Eisen; Michael E. Sieracki; Ramunas Stepanauskas

The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex microbial community of marine bacterioplankton. A combination of single cell genomics and metagenomics enabled us to analyze the genome content, metabolic adaptations, and biogeography of these taxa.


PLOS ONE | 2012

Accounting for alignment uncertainty in phylogenomics

Martin Wu; Sourav Chatterji; Jonathan A. Eisen

Uncertainty in multiple sequence alignments has a large impact on phylogenetic analyses. Little has been done to evaluate the quality of individual positions in protein sequence alignments, which directly impact the accuracy of phylogenetic trees. Here we describe ZORRO, a probabilistic masking program that accounts for alignment uncertainty by assigning confidence scores to each alignment position. Using the BALIBASE database and in simulation studies, we demonstrate that masking by ZORRO significantly reduces the alignment uncertainty and improves the tree accuracy.


research in computational molecular biology | 2008

CompostBin: a DNA composition-based algorithm for binning environmental shotgun reads

Sourav Chatterji; Ichitaro Yamazaki; Zhaojun Bai; Jonathan A. Eisen

A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of this metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins. Unlike previous methods that seek to bin assembled contigs and often require training on known reference genomes, CompostBin has the ability to accurately bin raw sequence reads without need for assembly or training. CompostBin uses a novel weighted PCA algorithm to project the high dimensional DNA composition data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We demonstrate the algorithms accuracy on a variety of low to medium complexity data sets.


PLOS ONE | 2009

Complete Genome Sequence of the Aerobic CO-Oxidizing Thermophile Thermomicrobium roseum

Dongying Wu; Jason Raymond; Martin Wu; Sourav Chatterji; Qinghu Ren; Joel E. Graham; Donald A. Bryant; Frank T. Robb; Albert S. Colman; Luke J. Tallon; Jonathan H. Badger; Ramana Madupu; Naomi L. Ward; Jonathan A. Eisen

In order to enrich the phylogenetic diversity represented in the available sequenced bacterial genomes and as part of an “Assembling the Tree of Life” project, we determined the genome sequence of Thermomicrobium roseum DSM 5159. T. roseum DSM 5159 is a red-pigmented, rod-shaped, Gram-negative extreme thermophile isolated from a hot spring that possesses both an atypical cell wall composition and an unusual cell membrane that is composed entirely of long-chain 1,2-diols. Its genome is composed of two circular DNA elements, one of 2,006,217 bp (referred to as the chromosome) and one of 919,596 bp (referred to as the megaplasmid). Strikingly, though few standard housekeeping genes are found on the megaplasmid, it does encode a complete system for chemotaxis including both chemosensory components and an entire flagellar apparatus. This is the first known example of a complete flagellar system being encoded on a plasmid and suggests a straightforward means for lateral transfer of flagellum-based motility. Phylogenomic analyses support the recent rRNA-based analyses that led to T. roseum being removed from the phylum Thermomicrobia and assigned to the phylum Chloroflexi. Because T. roseum is a deep-branching member of this phylum, analysis of its genome provides insights into the evolution of the Chloroflexi. In addition, even though this species is not photosynthetic, analysis of the genome provides some insight into the origins of photosynthesis in the Chloroflexi. Metabolic pathway reconstructions and experimental studies revealed new aspects of the biology of this species. For example, we present evidence that T. roseum oxidizes CO aerobically, making it the first thermophile known to do so. In addition, we propose that glycosylation of its carotenoids plays a crucial role in the adaptation of the cell membrane to this bacteriums thermophilic lifestyle. Analyses of published metagenomic sequences from two hot springs similar to the one from which this strain was isolated, show that close relatives of T. roseum DSM 5159 are present but have some key differences from the strain sequenced.


Genome Biology | 2006

Reference based annotation with GeneMapper

Sourav Chatterji; Lior Pachter

We introduce GeneMapper, a program for transferring annotations from a well annotated genome to other genomes. Drawing on high quality curated annotations, GeneMapper enables rapid and accurate annotation of newly sequenced genomes and is suitable for both finished and draft genomes. GeneMapper uses a profile based approach for mapping genes into multiple species, improving upon the standard pairwise approach. GeneMapper is freely available for academic use.


research in computational molecular biology | 2004

Multiple organism gene finding by collapsed gibbs sampling

Sourav Chatterji; Lior Pachter

The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then numerous variants of the original idea have emerged, however in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8Mb of sequence in each organism. We show that our approach compares favorably with existing ab-initio approaches to gene finding, including pairwise comparison based gene prediction methods which make explicit use of alignments. Furthermore, excellent performance can be obtained with as little as 4 organisms, and the method overcomes a number of difficulties of previous comparison based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.


international parallel and distributed processing symposium | 2003

Performance evaluation of two emerging media processors: VIRAM and Imagine

Sourav Chatterji; Manikandan Narayanan; Jason Duell; Leonid Oliker

This work presets two emerging media microprocessors, VIRAM and Imagine, and compares the implementation strategies and performance results of these unique architectures. VIRAM is a complete system on a chip which uses PIM technology to combine vector processing with embedded DRAM. Imagine is a programmable streaming architecture with a specialized memory hierarchy designed for computationally intensive data-parallel codes. First, we preset a simple and effective approach for understanding and optimizing vector/stream applications. Performance results are then presented from a number of multimedia benchmarks and a computationally intensive scientific kernel. We explore the complex interactions between programming paradigms, the architectural support at the ISA level and the underlying microarchitecture of these two systems. Our long term goal is to evaluate leading media microprocessors as possible building blocks for future high performance systems.


symposium on principles of database systems | 2002

On the complexity of approximate query optimization

Sourav Chatterji; Sai Surya Kiran Evani; Sumit Ganguly; Mahesh Datt Yemmanuru

In this work, we study the complexity of the problem of approximate query optimization. We show that, for any Δ > 0, the problem of finding a join order sequence whose cost is within a factor 2<sup>Θ(<i>log</i><sup>1-Δ</sup>(<i>K</i>))</sup> of <i>K,</i> where <i>K</i> is the cost of the optimal join order sequence is NP-Hard. The complexity gap remains if the number of edges in the query graph is constrained to be a given function <i>e</i>(<i>n</i>) of the number of vertices <i>n</i> of the query graph, where <i>n</i>(<i>n</i> - 1)/2 - Θ(<i>n</i><sup>τ</sup>) ≥ <i>e</i>(<i>n</i>) ≥ <i>n</i> + Θ(<i>n</i><sup>τ</sup>) and τ is any constant between 0 and 1. These results show that, unless P=NP, the query optimization problem cannot be approximately solved by an algorithm that runs in polynomial time and has a competitive ratio that is within some polylogarithmic factor of the optimal cost.


Journal of Computational Biology | 2005

Large multiple organism gene finding by collapsed Gibbs sampling.

Sourav Chatterji; Lior Pachter

The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then, numerous variants of the original idea have emerged: however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison based gene prediction methods which make explicit use of alignments. Furthermore, excellent performance can be obtained with as little as four organisms, and the method overcomes a number of difficulties of previous comparison based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.


Genome Biology | 2004

Prediction of Saccharomyces cerevisiae replication origins

Adam M. Breier; Sourav Chatterji; Nicholas R. Cozzarelli

Collaboration


Dive into the Sourav Chatterji's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar

Lior Pachter

University of California

View shared research outputs
Top Co-Authors

Avatar

Adam M. Breier

University of California

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Jason Duell

Lawrence Berkeley National Laboratory

View shared research outputs
Top Co-Authors

Avatar

Leonid Oliker

Lawrence Berkeley National Laboratory

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Martin Wu

University of Virginia

View shared research outputs
Researchain Logo
Decentralizing Knowledge