Publication


Featured research published by Chirag Jain.


Nature Methods | 2017

Critical assessment of metagenome interpretation − a benchmark of computational metagenomics software

Alexander Sczyrba; Peter Hofmann; Peter Belmann; David Koslicki; Stefan Janssen; Johannes Dröge; Ivan Gregor; Stephan Majda; Jessika Fiedler; Eik Dahms; Andreas Bremges; Adrian Fritz; Ruben Garrido-Oter; Tue Sparholt Jørgensen; Nicole Shapiro; Philip D. Blood; Alexey Gurevich; Yang Bai; Dmitrij Turaev; Matthew Z. DeMaere; Rayan Chikhi; Niranjan Nagarajan; Christopher Quince; Fernando Meyer; Monika Balvočiūtė; Lars Hestbjerg Hansen; Søren J. Sørensen; Burton K H Chia; Bertrand Denis; Jeff Froula

Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.


Nature Methods | 2017

Critical Assessment of Metagenome Interpretation — a benchmark of metagenomics software

Alexander Sczyrba; Peter Hofmann; Peter Belmann; David Koslicki; Stefan Janssen; Johannes Dröge; Ivan Gregor; Stephan Majda; Jessika Fiedler; Eik Dahms; Andreas Bremges; Adrian Fritz; Ruben Garrido-Oter; Tue Sparholt Jørgensen; Nicole Shapiro; Philip D. Blood; Alexey Gurevich; Yang Bai; Dmitrij Turaev; Matthew Z. DeMaere; Rayan Chikhi; Niranjan Nagarajan; Christopher Quince; Fernando Meyer; Monika Balvočiūtė; Lars Hestbjerg Hansen; Søren J. Sørensen; Burton K H Chia; Bertrand Denis; Jeff Froula


IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications

Patrick Flick; Chirag Jain; Tony Pan; Srinivas Aluru

Dramatic advances in DNA sequencing technology have made it possible to study microbial environments by direct sequencing of environmental DNA samples. Yet, due to the huge volume and high data complexity, current de novo assemblers cannot handle large metagenomic datasets or fail to perform assembly with acceptable quality. This paper presents the first parallel solution for decomposing the metagenomic assembly problem without compromising the post-assembly quality. We transform this problem into that of finding weakly connected components in the de Bruijn graph. We propose a novel distributed memory algorithm to identify the connected subgraphs, and present strategies to minimize the communication volume. We demonstrate the scalability of our algorithm on a soil metagenome dataset with 1.8 billion reads. Our approach achieves a runtime of 22 minutes using 1280 Intel Xeon cores for a 421 GB uncompressed FASTQ dataset. Moreover, our solution is generalizable to finding connected components in arbitrary undirected graphs.
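The abstract reduces metagenome partitioning to finding weakly connected components of the de Bruijn graph. The sketch below illustrates only that reduction on a single machine, using a union-find (disjoint-set) structure over k-mers; it is not the paper's distributed-memory algorithm. It also assumes forward-strand k-mers only, whereas assemblers usually merge each k-mer with its reverse complement. The function name `read_partitions` and the small k used in the example are illustrative assumptions.

```python
from typing import Dict, List


class DisjointSet:
    """Union-find keyed by k-mer string, with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}

    def add(self, x: str) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


def read_partitions(reads: List[str], k: int) -> List[List[int]]:
    """Group read indices by the connected component of their k-mers.

    Consecutive k-mers in a read overlap by k-1 bases, i.e. they form a
    de Bruijn edge; identical k-mers in different reads share one node,
    which is what links reads together. Reads shorter than k are skipped.
    """
    dsu = DisjointSet()
    first_kmer: List[str] = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for km in kmers:
            dsu.add(km)
        for a, b in zip(kmers, kmers[1:]):
            dsu.union(a, b)
        first_kmer.append(kmers[0] if kmers else "")
    groups: Dict[str, List[int]] = {}
    for idx, km in enumerate(first_kmer):
        if km:  # every k-mer of a read lies in one component, so its first k-mer suffices
            groups.setdefault(dsu.find(km), []).append(idx)
    return list(groups.values())
```

For example, `read_partitions(["ACGTAC", "GTACGG", "TTTTTT"], 4)` returns `[[0, 1], [2]]`: the first two reads share the 4-mer `GTAC`, so they fall into one partition that can be assembled independently of the third.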


bioRxiv | 2017

High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries

Chirag Jain; Luis M. Rodriguez-R; Adam M. Phillippy; Konstantinos T. Konstantinidis; Srinivas Aluru

A fundamental question in microbiology is whether there is a continuum of genetic diversity among genomes or clear species boundaries prevail instead. Answering this question requires robust measurement of whole-genome relatedness among thousands of genomes and from divergent phylogenetic lineages. Whole-genome similarity metrics such as Average Nucleotide Identity (ANI) can provide the resolution needed for this task, overcoming several limitations of traditional techniques used for the same purposes. Although the number of genomes currently available may be adequate, the associated bioinformatics tools for analysis are lagging behind these developments and cannot scale to large datasets. Here, we present a new method, FastANI, to compute ANI using alignment-free approximate sequence mapping. Our analyses demonstrate that FastANI produces an accurate ANI estimate and is up to three orders of magnitude faster when compared to an alignment (e.g., BLAST)-based approach. We leverage FastANI to compute pairwise ANI values among all prokaryotic genomes available in the NCBI database. Our results reveal a clear genetic discontinuity among the database genomes, with 99.8% of the total 8 billion genome pairs analyzed showing either >95% intra-species ANI or <83% inter-species ANI values. We further show that this discontinuity is recovered with or without the most frequently represented species in the database and is robust to historic additions in the public genome databases. Therefore, 95% ANI represents an accurate threshold for demarcating almost all currently named prokaryotic species, and wide species boundaries may exist for prokaryotes.
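FastANI averages per-fragment identities obtained from alignment-free mappings of a query genome against a reference. The following is a simplified, hypothetical stand-in for that idea, not FastANI's algorithm: it splits the query into fixed-length fragments (3,000 bp here, an assumed default), estimates each fragment's identity from exact k-mer containment c as c^(1/k) (under independent substitutions, a k-mer survives unmutated with probability identity^k), and averages the fragments that pass an assumed 80% cutoff. FastANI's MinHash-based mapping and reciprocal matching are omitted; k = 16 and all function names are assumptions.

```python
from typing import List, Optional, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fragment_identity(fragment: str, ref_kmers: Set[str], k: int) -> float:
    """Identity estimate c ** (1/k), where c is the fraction of the
    fragment's k-mers that also occur in the reference."""
    frag_kmers = kmer_set(fragment, k)
    if not frag_kmers:
        return 0.0
    containment = len(frag_kmers & ref_kmers) / len(frag_kmers)
    return containment ** (1.0 / k)


def approx_ani(query: str, reference: str, k: int = 16,
               frag_len: int = 3000, min_identity: float = 0.80) -> Optional[float]:
    """Mean identity over non-overlapping query fragments that pass the cutoff.

    Returns None when no fragment passes, i.e. the genomes look unrelated.
    A trailing partial fragment shorter than frag_len is ignored.
    """
    ref_kmers = kmer_set(reference, k)
    identities: List[float] = []
    for start in range(0, len(query) - frag_len + 1, frag_len):
        ident = fragment_identity(query[start:start + frag_len], ref_kmers, k)
        if ident >= min_identity:  # discard fragments without a credible match
            identities.append(ident)
    if not identities:
        return None
    return sum(identities) / len(identities)
```

A genome compared with itself gives exactly 1.0, while a copy with roughly 1% random substitutions gives an estimate close to 0.99.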


Research in Computational Molecular Biology | 2017

A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases

Chirag Jain; Alexander Dilthey; Sergey Koren; Srinivas Aluru; Adam M. Phillippy

Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this paper, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290x faster than BWA-MEM with a lower memory footprint and recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥ 5 kbp in length) to the complete NCBI RefSeq database containing 838 Gbp of sequence and > 60,000 genomes.
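The abstract pairs minimizer-based candidate search with MinHash identity estimation. The sketch below shows only the second ingredient, under stated assumptions: a bottom-s MinHash sketch estimates the Jaccard similarity J of two k-mer sets, and the Mash distance formula d = -(1/k) ln(2J / (1 + J)) converts J into an identity estimate 1 - d. Whether the paper uses exactly this conversion is an assumption, as are k = 16 and s = 1000 in the usage; BLAKE2b from `hashlib` is used only because, unlike Python's built-in `hash()`, it is deterministic across runs.

```python
import hashlib
import math
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str) -> int:
    # 64-bit deterministic hash of a k-mer
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq: str, k: int, s: int) -> List[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hash values."""
    return sorted({_hash(km) for km in kmers(seq, k)})[:s]


def jaccard_estimate(a: List[int], b: List[int], s: int) -> float:
    """Take the s smallest hashes of the union of both sketches and report
    the fraction that appear in both sketches."""
    union_bottom = sorted(set(a) | set(b))[:s]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def identity_from_jaccard(j: float, k: int) -> float:
    """Mash-style conversion: identity = 1 + ln(2j / (1 + j)) / k."""
    if j <= 0.0:
        return 0.0
    return 1.0 + math.log(2.0 * j / (1.0 + j)) / k
```

For identical sequences the Jaccard estimate is 1 and the identity is exactly 1.0; for a 20 kbp sequence with about 1% substitutions, `identity_from_jaccard(jaccard_estimate(sa, sb, 1000), 16)` lands near 0.99. Because the sketch has a fixed size s, comparing it costs the same regardless of sequence length, which is the property that makes this kind of estimate attractive for large reference databases.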


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2017

Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems

Tony Pan; Patrick Flick; Chirag Jain; Yongchao Liu; Srinivas Aluru

Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases per 3-day experiment from a single sequencer. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's …


Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale | 2016

A Self-Correcting Connected Components Algorithm

Piyush Sao; Oded Green; Chirag Jain; Richard W. Vuduc


Research in Computational Molecular Biology | 2018

A Fast Adaptive Algorithm for Computing Whole-Genome Homology Maps

Chirag Jain; Sergey Koren; Alexander Dilthey; Adam M. Phillippy; Srinivas Aluru


bioRxiv | 2018

MetaMaps - Strain-level metagenomic assignment and compositional estimation for long reads

Alexander Dilthey; Chirag Jain; Sergey Koren; Adam M. Phillippy


Bioinformatics | 2018

A fast adaptive algorithm for computing whole-genome homology maps

Chirag Jain; Sergey Koren; Alexander Dilthey; Adam M. Phillippy; Srinivas Aluru
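The Kmerind abstract in this list describes a k-mer index whose implementation is parallel while its API keeps sequential semantics. A common way to build such an index on distributed memory is to assign every k-mer to an owner process by hashing it, exchange k-mers all-to-all, and count them locally on each owner. The sketch below simulates that pattern inside one Python process; it does not reproduce Kmerind's own interface, whether Kmerind partitions exactly this way is an assumption, and the CRC32 owner function and names such as `distributed_kmer_count` are hypothetical.

```python
import zlib
from collections import Counter
from typing import Dict, List


def owner(kmer: str, num_ranks: int) -> int:
    # Deterministic hash so every simulated rank agrees on ownership.
    return zlib.crc32(kmer.encode()) % num_ranks


def distributed_kmer_count(local_reads: List[List[str]], k: int) -> List[Dict[str, int]]:
    """Simulate len(local_reads) ranks, each holding its own reads.

    1. Each rank extracts k-mers and buckets them by owner rank.
    2. An all-to-all exchange delivers each bucket to its owner.
    3. Each owner counts what it received, so every k-mer's count
       lives on exactly one rank.
    """
    num_ranks = len(local_reads)
    # outboxes[src][dst]: k-mers that rank src must send to rank dst
    outboxes: List[List[List[str]]] = [[[] for _ in range(num_ranks)]
                                       for _ in range(num_ranks)]
    for src, reads in enumerate(local_reads):
        for read in reads:
            for i in range(len(read) - k + 1):
                km = read[i:i + k]
                outboxes[src][owner(km, num_ranks)].append(km)
    tables: List[Dict[str, int]] = []
    for dst in range(num_ranks):
        received: Counter = Counter()
        for src in range(num_ranks):
            received.update(outboxes[src][dst])  # "receive" from every rank
        tables.append(dict(received))
    return tables
```

Because ownership is a pure function of the k-mer, a lookup from any rank can be routed to a single owner without a global directory; that is what lets a caller treat the partitioned tables as one logical index.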

Collaboration


Top Co-Authors

Srinivas Aluru
Georgia Institute of Technology

Adam M. Phillippy
National Institutes of Health

Patrick Flick
Georgia Institute of Technology

Tony Pan
Georgia Institute of Technology

Alexander Dilthey
National Institutes of Health

Sergey Koren
National Institutes of Health

Jeff Froula
Joint Genome Institute

Oded Green
Georgia Institute of Technology