Publication


Featured research published by Hamidreza Chitsaz.


GigaScience | 2013

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Keith Bradnam; Joseph Fass; Anton Alexandrov; Paul Baranay; Michael Bechner; Inanc Birol; Sébastien Boisvert; Jarrod Chapman; Guillaume Chapuis; Rayan Chikhi; Hamidreza Chitsaz; Wen Chi Chou; Jacques Corbeil; Cristian Del Fabbro; Roderick R. Docking; Richard Durbin; Dent Earl; Scott J. Emrich; Pavel Fedotov; Nuno A. Fonseca; Ganeshkumar Ganapathy; Richard A. Gibbs; Sante Gnerre; Élénie Godzaridis; Steve Goldstein; Matthias Haimel; Giles Hall; David Haussler; Joseph Hiatt; Isaac Ho

Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.


Nature Biotechnology | 2011

Efficient de novo assembly of single-cell bacterial genomes from short-read data sets

Hamidreza Chitsaz; Joyclyn Yee-Greenbaum; Glenn Tesler; Mary-Jane Lombardo; Christopher L. Dupont; Jonathan H. Badger; Mark Novotny; Douglas B. Rusch; Louise Fraser; Niall Anthony Gormley; Ole Schulz-Trieglaff; Geoffrey Paul Smith; Dirk Evers; Pavel A. Pevzner; Roger S. Lasken

Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assembly of reads from single Escherichia coli and Staphylococcus aureus cells captures >91% of genes within contigs, approaching the 95% captured from an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of an uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotaxic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.
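The idea of a progressively increasing coverage cutoff can be illustrated with a toy assembly graph. The sketch below is not the authors' implementation; the edge format `(from_node, to_node, length, coverage)` and the function names `condense` and `progressive_cutoff` are assumptions made for illustration. At each step, edges below the current cutoff are dropped and unbranched paths are merged with a length-weighted average coverage, so a genuinely low-coverage region can be absorbed into a well-covered neighbor before the cutoff rises past it.

```python
from collections import defaultdict

def condense(edges):
    """Merge edge pairs u->v->w whenever v has exactly one incoming and one outgoing edge."""
    merged = True
    while merged:
        merged = False
        ins, outs = defaultdict(list), defaultdict(list)
        for e in edges:
            outs[e[0]].append(e)
            ins[e[1]].append(e)
        for v in list(ins):
            if len(ins[v]) == 1 and len(outs[v]) == 1:
                a, b = ins[v][0], outs[v][0]
                if a is b:  # self-loop, nothing to merge
                    continue
                length = a[2] + b[2]
                # Length-weighted average coverage of the merged edge.
                cov = (a[2] * a[3] + b[2] * b[3]) / length
                edges = [e for e in edges if e is not a and e is not b]
                edges.append((a[0], b[1], length, cov))
                merged = True
                break
    return edges

def progressive_cutoff(edges, max_cutoff):
    """Raise the coverage cutoff one step at a time, condensing after each removal."""
    for cutoff in range(1, max_cutoff + 1):
        edges = [e for e in edges if e[3] >= cutoff]
        edges = condense(edges)
    return edges

# Hypothetical toy graph: a true low-coverage region (B->C) and an erroneous tip (B->X).
toy_edges = [
    ("A", "B", 100, 20.0),
    ("B", "C", 50, 2.0),
    ("C", "D", 100, 30.0),
    ("B", "X", 10, 1.0),
]
result = progressive_cutoff(toy_edges, max_cutoff=3)
fixed = condense([e for e in toy_edges if e[3] >= 3])
print(len(result), len(fixed))
```

In this toy case a single fixed cutoff of 3 discards the low-coverage region and leaves two fragments, while the progressive version first merges that region with its high-coverage neighbor and ends with one contig-like edge.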


Proceedings of the National Academy of Sciences of the United States of America | 2013

Candidate phylum TM6 genome recovered from a hospital sink biofilm provides genomic insights into this uncultivated phylum

Jeffrey S. McLean; Mary-Jane Lombardo; Jonathan H. Badger; Anna Edlund; Mark Novotny; Joyclyn Yee-Greenbaum; Nikolay Vyahhi; Adam P Hall; Youngik Yang; Christopher L. Dupont; Michael G. Ziegler; Hamidreza Chitsaz; Andrew E. Allen; Shibu Yooseph; Glenn Tesler; Pavel A. Pevzner; Robert Friedman; Kenneth H. Nealson; J. C. Venter; Roger S. Lasken

Significance: This research highlights the discovery and genome reconstruction of a member of the globally distributed yet uncultivated candidate phylum TM6 (designated TM6SC1). Apart from the 16S rRNA gene, no genomic information is available for this cosmopolitan phylum. This report also introduces a mini-metagenomic approach based on the use of high-throughput single-cell genomics techniques and assembly tools that address a widely recognized issue: how to effectively capture and sequence the currently uncultivated bacterial species that make up the “dark matter of life.” Amplifying and sequencing random pools of 100 events enabled an estimated 90% recovery of the TM6SC1 genome.

The “dark matter of life” describes microbes and even entire divisions of bacterial phyla that have evaded cultivation and have yet to be sequenced. We present a genome from the globally distributed but elusive candidate phylum TM6 and uncover its metabolic potential. TM6 was detected in a biofilm from a sink drain within a hospital restroom by analyzing cells using a highly automated single-cell genomics platform. We developed an approach for increasing throughput and effectively improving the likelihood of sampling rare events based on forming small random pools of single-flow–sorted cells, amplifying their DNA by multiple displacement amplification and sequencing all cells in the pool, creating a “mini-metagenome.” A recently developed single-cell assembler, SPAdes, in combination with contig binning methods, allowed the reconstruction of genomes from these mini-metagenomes. A total of 1.07 Mb was recovered in seven contigs for this member of TM6 (JCVI TM6SC1), estimated to represent 90% of its genome. High nucleotide identity between a total of three TM6 genome drafts generated from pools that were independently captured, amplified, and assembled provided strong confirmation of a correct genomic sequence. TM6 is likely a Gram-negative organism and possibly a symbiont of an unknown host (non-free-living), in part based on its small genome, low GC content, and lack of biosynthesis pathways for most amino acids and vitamins. Phylogenomic analysis of conserved single-copy genes confirms that TM6SC1 is a deeply branching phylum.


The ISME Journal | 2014

Single-cell genome and metatranscriptome sequencing reveal metabolic interactions of an alkane-degrading methanogenic community

Mallory Embree; Harish Nagarajan; Narjes S. Movahedi; Hamidreza Chitsaz; Karsten Zengler

Microbial interactions have a key role in global geochemical cycles. Although we possess significant knowledge about the general biochemical processes occurring in microbial communities, we are often unable to decipher key functions of individual microorganisms within the environment in part owing to the inability to cultivate or study them in isolation. Here, we circumvent this shortcoming through the use of single-cell genome sequencing and a novel low-input metatranscriptomics protocol to reveal the intricate metabolic capabilities and microbial interactions of an alkane-degrading methanogenic community. This methanogenic consortium oxidizes saturated hydrocarbons under anoxic conditions through a thus-far-uncharacterized biochemical process. The genome sequence of a dominant bacterial member of this community, belonging to the genus Smithella, was sequenced and served as the basis for subsequent analysis through metabolic reconstruction. Metatranscriptomic data generated from less than 500 pg of mRNA highlighted metabolically active genes during anaerobic alkane oxidation in comparison with growth on fatty acids. These data sets suggest that Smithella is not activating hexadecane by fumarate addition. Differential expression assisted in the identification of hypothetical proteins with no known homology that may be involved in hexadecane activation. Additionally, the combination of 16S rDNA sequence and metatranscriptomic data enabled the study of other prevalent organisms within the consortium and their interactions with Smithella, thus yielding a comprehensive characterization of individual constituents at the genome scale during methanogenic alkane oxidation.


Bioinformatics | 2012

SEQuel: improving the accuracy of genome assemblies

Roy Ronen; Christina Boucher; Hamidreza Chitsaz; Pavel A. Pevzner

Motivation: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model. Results: SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly. Availability: SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/.
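A rough way to picture a positional de Bruijn graph is to key each k-mer by its approximate position as well as its sequence. The sketch below is only an illustration under stated assumptions and is not SEQuel's code: the input format (a read sequence paired with an approximate start position), the tolerance parameter `delta`, and the function name `positional_kmers` are all hypothetical. Two occurrences of the same k-mer are treated as one vertex only when their positions differ by at most `delta`.

```python
def positional_kmers(reads, k, delta):
    """Group k-mer occurrences by sequence and approximate position.

    reads: list of (sequence, approximate_start_position) pairs.
    Returns a dict mapping k-mer -> list of [representative_position, count].
    """
    nodes = {}
    for seq, start in reads:
        for i in range(len(seq) - k + 1):
            kmer, pos = seq[i:i + k], start + i
            clusters = nodes.setdefault(kmer, [])
            for cluster in clusters:
                if abs(cluster[0] - pos) <= delta:
                    cluster[1] += 1  # same k-mer at nearly the same position: same vertex
                    break
            else:
                clusters.append([pos, 1])  # same k-mer far away: a separate vertex
    return nodes

# Hypothetical read containing the repeated k-mer ACGT at offsets 0 and 4.
reads = [("ACGTACGT", 0)]
tight = positional_kmers(reads, k=4, delta=1)
loose = positional_kmers(reads, k=4, delta=10)
print(tight["ACGT"], loose["ACGT"])
```

With a small `delta` the two copies of the repeat stay distinct, whereas a large `delta` collapses them as an ordinary de Bruijn graph would.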


Bioinformatics and Biomedicine | 2012

De novo co-assembly of bacterial genomes from multiple single cells

Narjes S. Movahedi; Elmirasadat Forouzmand; Hamidreza Chitsaz

Recent progress in DNA amplification techniques, particularly multiple displacement amplification (MDA), has made it possible to sequence and assemble bacterial genomes from a single cell. However, the quality of single-cell genome assembly has not yet reached the quality of normal multicell genome assembly due to the coverage bias and errors caused by MDA. Using a template of more than one cell for MDA or combining separate MDA products has been shown to improve the result of genome assembly from a few single cells, but providing identical single cells, a necessary step for these approaches, is a challenge. As a solution to this problem, we give an algorithm for de novo co-assembly of bacterial genomes from multiple single cells. Our novel method not only detects the outlier cells in a pool, it also identifies and eliminates their genomic sequences from the final assembly. Our proposed co-assembly algorithm is based on the colored de Bruijn graph, which has recently been proposed for de novo structural variation detection. Our results show that de novo co-assembly of bacterial genomes from multiple single cells outperforms single-cell assembly of each individual cell in all standard metrics. Moreover, co-assembly outperforms mixed assembly, in which the input datasets are simply concatenated. We implemented our algorithm in a software tool called HyDA, which is available from http://compbio.cs.wayne.edu/software/hyda.
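The core bookkeeping of a colored de Bruijn graph, where each k-mer carries one count per input dataset, can be sketched in a few lines. This is a minimal illustration and not HyDA's implementation: the function names are hypothetical, and `shared_fraction` is an assumed stand-in heuristic for spotting an outlier cell, not the criterion used in the paper.

```python
from collections import defaultdict

def colored_kmer_counts(datasets, k):
    """Count each k-mer separately per dataset (one 'color' per single cell)."""
    counts = defaultdict(lambda: [0] * len(datasets))
    for color, reads in enumerate(datasets):
        for read in reads:
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]][color] += 1
    return dict(counts)

def shared_fraction(counts, color):
    """Fraction of this color's k-mers that also occur in at least one other color."""
    own = [c for c in counts.values() if c[color] > 0]
    if not own:
        return 0.0
    shared = sum(1 for c in own if sum(1 for x in c if x > 0) > 1)
    return shared / len(own)

# Hypothetical reads: cells 0 and 1 overlap, cell 2 is unrelated.
datasets = [["ACGTAC"], ["CGTACG"], ["TTTTGG"]]
counts = colored_kmer_counts(datasets, k=4)
print([round(shared_fraction(counts, c), 2) for c in range(3)])
```

Keeping the colors separate, rather than concatenating the datasets, is what allows a cell whose k-mers are shared with no other cell to be recognized and excluded.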


PLOS ONE | 2016

Enhancing Extraction of Drug-Drug Interaction from Literature Using Neutral Candidates, Negation, and Clause Dependency

Behrouz Bokharaeian; Alberto Díaz; Hamidreza Chitsaz

Motivation: Supervised biomedical relation extraction plays an important role in biomedical natural language processing, endeavoring to obtain the relations between biomedical entities. Drug-drug interactions (DDIs), which are investigated in the present paper, are notably among the critical biomedical relations. Thus far many methods have been developed with the aim of extracting DDI relations. However, there has been a scarcity of comprehensive studies on the effects of negation, complex sentences, clause dependency, and neutral candidates in the course of DDI extraction from biomedical articles. Results: Our study proposes clause dependency features and a number of features for identifying neutral candidates as well as negation cues and scopes. Furthermore, our experiments indicate that the proposed features significantly improve the performance of the relation extraction task when combined with other kernel methods. We characterize the contribution of each category of features and conclude that neutral candidate features have the most prominent role among the three categories.


Bioinformatics | 2013

Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities

Zeinab Taghavi; Narjes S. Movahedi; Sorin Drǎghici; Hamidreza Chitsaz

Motivation: Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Results: Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Availability: Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.


Journal of Biomedical Semantics | 2017

SNPPhenA: a corpus for extracting ranked associations of single-nucleotide polymorphisms and phenotypes from literature

Behrouz Bokharaeian; Alberto Díaz; Nasrin Taghizadeh; Hamidreza Chitsaz; Ramyar Chavoshinejad

Background: Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variations influencing common diseases and phenotypes. Recently, some corpora and methods have been developed with the purpose of extracting mutations and diseases from texts. However, there is no available corpus for extracting associations from texts that is annotated with linguistic-based negation, modality markers, neutral candidates, and confidence level of associations. Method: In this research, different steps were presented so as to produce the SNPPhenA corpus. They include automatic Named Entity Recognition (NER) followed by the manual annotation of SNP and phenotype names, annotation of the SNP-phenotype associations and their level of confidence, as well as modality markers. Moreover, the produced corpus was annotated with negation scopes and cues as well as neutral candidates, which play a crucial role in handling negation and modality in relation extraction tasks. Result: The agreement between annotators was measured by Cohen’s Kappa coefficient, where the resulting scores indicated the reliability of the corpus. The Kappa score was 0.79 for annotating the associations and 0.80 for the confidence degree of associations. The basic statistics of the annotated features of the corpus are also presented, along with the results of our first experiments on the extraction of ranked SNP-phenotype associations. The prepared guideline documents make the corpus easier to use. The corpus, guidelines and inter-annotator agreement analysis are available on the website of the corpus: http://nil.fdi.ucm.es/?q=node/639. Conclusion: Specifying the confidence degree of SNP-phenotype associations from articles helps identify the strength of associations, which could in turn assist genomics scientists in determining phenotypic plasticity and the importance of environmental factors. Moreover, our first experiments with the corpus show that linguistic-based confidence alongside other non-linguistic features can be utilized to estimate the strength of the observed SNP-phenotype associations. Trial Registration: Not Applicable
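For readers unfamiliar with the agreement measure cited in this abstract, Cohen's Kappa compares the observed agreement between two annotators, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two label lists; the function name and the toy labels are illustrative assumptions and are not taken from the SNPPhenA corpus.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; Kappa is formally
        # undefined here, and returning 1.0 is a convention assumed for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations of four candidate associations.
annotator_1 = ["assoc", "assoc", "none", "none"]
annotator_2 = ["assoc", "none", "none", "none"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(kappa)
```

Here the annotators agree on three of four items (p_o = 0.75) while chance agreement is 0.5, giving a Kappa of 0.5, well below the 0.79 and 0.80 reported for the corpus.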

Collaboration


Top Co-Authors

David Haussler

University of California

Dent Earl

University of California

Indrakshi Ray

Colorado State University

Isaac Ho

United States Department of Energy

Joseph Fass

University of California