Publication


Featured research published by Kiran Garimella.


Genome Research | 2010

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey B. Gabriel; Mark J. Daly; Mark A. DePristo

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
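The MapReduce idea behind a coverage calculator can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the GATK API: the read representation (a start position and a length on a single contig) and the names `map_read`, `reduce_coverage`, and `coverage` are hypothetical.

```python
from collections import Counter
from functools import reduce

def map_read(read):
    """Map step: emit a count of 1 for every reference position the read covers."""
    start, length = read  # hypothetical read model: (start position, read length)
    return Counter(range(start, start + length))

def reduce_coverage(total, partial):
    """Reduce step: merge one read's counts into the running per-position depth."""
    total.update(partial)  # Counter.update adds counts rather than replacing them
    return total

def coverage(reads):
    """Apply the map step to each read, then fold the results into one depth table."""
    return reduce(reduce_coverage, map(map_read, reads), Counter())

# Three toy reads on one contig (made-up data for illustration only).
reads = [(0, 4), (2, 4), (3, 2)]
depth = coverage(reads)
```

Because the per-read calculation (`map_read`) is separate from the traversal and merging logic, the traversal could in principle be sharded or parallelized without touching the analysis code, which is the separation of concerns the abstract describes.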


Nature Genetics | 2011

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Mark A. DePristo; Eric Banks; Ryan Poplin; Kiran Garimella; Jared Maguire; Christopher Hartl; Anthony A. Philippakis; Guillermo Del Angel; Manuel A. Rivas; Matt Hanna; Aaron McKenna; Timothy Fennell; Andrew Kernytsky; Andrey Sivachenko; Kristian Cibulskis; Stacey B. Gabriel; David Altshuler; Mark J. Daly

Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.


Nature | 2012

Patterns and rates of exonic de novo mutations in autism spectrum disorders

Benjamin M. Neale; Yan Kou; Li Liu; Avi Ma'ayan; Kaitlin E. Samocha; Aniko Sabo; Chiao-Feng Lin; Christine Stevens; Li-San Wang; Vladimir Makarov; Pazi Penchas Polak; Seungtai Yoon; Jared Maguire; Emily L. Crawford; Nicholas G. Campbell; Evan T. Geller; Otto Valladares; Chad Shafer; Han Liu; Tuo Zhao; Guiqing Cai; Jayon Lihm; Ruth Dannenfelser; Omar Jabado; Zuleyma Peralta; Uma Nagaswamy; Donna M. Muzny; Jeffrey G. Reid; Irene Newsham; Yuanqing Wu

Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, are consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increases risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case–control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.


Current Protocols in Bioinformatics | 2013

From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline.

Geraldine A. Van der Auwera; Mauricio O. Carneiro; Christopher Hartl; Ryan Poplin; Guillermo Del Angel; Ami Levy-Moonshine; Tadeusz Jordan; Khalid Shakir; David Roazen; Joel Thibault; Eric Banks; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark A. DePristo

This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high‐quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data‐processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK. Curr. Protoc. Bioinform. 43:11.10.1‐11.10.33.


Nature | 2014

A polygenic burden of rare disruptive mutations in schizophrenia

Shaun Purcell; Jennifer L. Moran; Menachem Fromer; Douglas M. Ruderfer; Nadia Solovieff; Panos Roussos; Colm O'Dushlaine; K D Chambert; Sarah E. Bergen; Anna K. Kähler; Laramie Duncan; Eli A. Stahl; Giulio Genovese; Esperanza Fernández; Mark O. Collins; Noboru H. Komiyama; Jyoti S. Choudhary; Patrik K. E. Magnusson; Eric Banks; Khalid Shakir; Kiran Garimella; Timothy Fennell; Mark DePristo; Seth G. N. Grant; Stephen J. Haggarty; Stacey Gabriel; Edward M. Scolnick; Eric S. Lander; Christina M. Hultman; Patrick F. Sullivan

Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.


Nature Genetics | 2011

Variation in genome-wide mutation rates within and between human families.

Donald F. Conrad; Jonathan E. M. Keebler; Mark A. DePristo; Sarah J. Lindsay; Yujun Zhang; Ferran Casals; Youssef Idaghdour; Chris Hartl; Carlos Torroja; Kiran Garimella; Martine Zilversmit; Reed A. Cartwright; Guy A. Rouleau; Mark J. Daly; Eric A. Stone

J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline. Diverse studies have supported Haldane's contention of a higher average mutation rate in the male germline in a variety of mammals, including humans. Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas, in contrast, in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.


Nature Genetics | 2012

Exome sequencing and the genetic basis of complex traits

Adam Kiezun; Kiran Garimella; Ron Do; Nathan O. Stitziel; Benjamin M. Neale; Paul J. McLaren; Namrata Gupta; Pamela Sklar; Patrick F. Sullivan; Jennifer L. Moran; Christina M. Hultman; Paul Lichtenstein; Patrik K. E. Magnusson; Thomas Lehner; Yin Yao Shugart; Alkes L. Price; Paul I. W. de Bakker; Shaun Purcell; Shamil R. Sunyaev

Shamil Sunyaev and colleagues present exome sequencing methods and their applications in studies to identify the genetic basis of human complex traits. They include analyses of the whole-exome sequences of 438 individuals from across several studies.


Genome Biology | 2011

The functional spectrum of low-frequency coding variation.

Gabor T. Marth; Fuli Yu; Amit Indap; Kiran Garimella; Simon Gravel; Wen Fung Leong; Chris Tyler-Smith; Matthew N. Bainbridge; Thomas W. Blackwell; Xiangqun Zheng-Bradley; Yuan Chen; Danny Challis; Laura Clarke; Edward V. Ball; Kristian Cibulskis; David Neil Cooper; Bob Fulton; Chris Hartl; Dan Koboldt; Donna M. Muzny; Richard Smith; Carrie Sougnez; Chip Stewart; Alistair Ward; Jin Yu; Yali Xue; David Altshuler; Carlos Bustamante; Andrew G. Clark; Mark J. Daly

Background: Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. Results: The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. Conclusions: This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.


Science Translational Medicine | 2012

Exome Sequencing Can Improve Diagnosis and Alter Patient Management

Tracy Dixon-Salazar; Jennifer L. Silhavy; Nitin Udpa; Jana Schroth; Ashleigh E. Schaffer; Jesus Olvera; Vineet Bafna; Maha S. Zaki; Ghada M.H. Abdel-Salam; Lobna Mansour; Laila Selim; Sawsan Abdel-Hadi; Naima Marzouki; Tawfeg Ben-Omran; Nouriya A. Al-Saana; F. Müjgan Sönmez; Figen Celep; Matloob Azam; Kiley J. Hill; Adrienne Collazo; Ali G. Fenstermaker; Gaia Novarino; Naiara Akizu; Kiran Garimella; Carrie Sougnez; Carsten Russ; Stacey Gabriel; Joseph G. Gleeson

Exome sequencing of 118 patients with neurodevelopmental disorders shows that this technique is useful for identifying new pathogenic mutations and for correcting diagnosis in ~10% of cases.

A Needle in a Haystack

Exome sequencing enables evaluation of all protein-coding variants in an individual genome and promises to revolutionize the practice of clinical genetics as it moves from the lab into the clinic. Bringing this technology to the clinic affords the opportunity not just to identify new disease-causing mutations but also to clarify disease presentation and diagnosis. There are many challenges to implementing this technology, however, including which patients to select for analysis, how to rank and prioritize the genetic variants, and how to align the data with the clinical record. In new work, Dixon-Salazar et al. studied a cohort of 118 probands with genetic forms of neurodevelopmental disease, all derived from consanguineous unions, using exome sequencing. All patients were previously excluded for genes most likely to cause their disease. The authors analyzed the exome sequences with a standardized bioinformatic pipeline. They found mutations in known disease-causing genes that in about 10% of cases led to a change in the underlying diagnosis. In 19% of cases, they identified mutations in genes not previously linked to disease. In the remaining cases, the genetic causes remained elusive. Thus, exome sequencing may both improve diagnosis and lead to alterations in patient management in some patients with neurodevelopmental disorders. However, analysis of more than one individual will be required to increase the success rate of identifying the causative mutation in most cases.

The translation of “next-generation” sequencing directly to the clinic is still being assessed but has the potential for genetic diseases to reduce costs, advance accuracy, and point to unsuspected yet treatable conditions. To study its capability in the clinic, we performed whole-exome sequencing in 118 probands with a diagnosis of a pediatric-onset neurodevelopmental disease in which most known causes had been excluded. Twenty-two genes not previously identified as disease-causing were identified in this study (19% of cohort), further establishing exome sequencing as a useful tool for gene discovery. New genes identified included EXOC8 in Joubert syndrome and GFM2 in a patient with microcephaly, simplified gyral pattern, and insulin-dependent diabetes. Exome sequencing uncovered 10 probands (8% of cohort) with mutations in genes known to cause a disease different from the initial diagnosis. Upon further medical evaluation, these mutations were found to account for each proband’s disease, leading to a change in diagnosis, some of which led to changes in patient management. Our data provide proof of principle that genomic strategies are useful in clarifying diagnosis in a proportion of patients with neurodevelopmental disorders.


Nature Genetics | 2012

Extremely low-coverage sequencing and imputation increases power for genome-wide association studies

Bogdan Pasaniuc; Nadin Rohland; Paul J. McLaren; Kiran Garimella; Noah Zaitlen; Heng Li; Namrata Gupta; Benjamin M. Neale; Mark J. Daly; Pamela Sklar; Patrick F. Sullivan; Sarah E. Bergen; Jennifer L. Moran; Christina M. Hultman; Paul Lichtenstein; Patrik K. E. Magnusson; Shaun Purcell; David W. Haas; Liming Liang; Shamil R. Sunyaev; Nick Patterson; Paul I. W. de Bakker; David Reich; Alkes L. Price

Genome-wide association studies (GWAS) have proven to be a powerful method to identify common genetic variants contributing to susceptibility to common diseases. Here, we show that extremely low-coverage sequencing (0.1–0.5×) captures almost as much of the common (>5%) and low-frequency (1–5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r2 of 0.71 using off-target data (0.24× average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome-sequencing data sets, we show that association statistics obtained using extremely low-coverage sequencing data attain similar P values at known associated variants as data from genotyping arrays, without an excess of false positives. Within the context of reductions in sample preparation and sequencing costs, funds invested in extremely low-coverage sequencing can yield several times the effective sample size of GWAS based on SNP array data and a commensurate increase in statistical power.
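To make the link between imputation accuracy and power concrete, the sketch below uses a common rule of thumb, which is an assumption here and not necessarily the exact model used in the paper: testing an imputed genotype with squared correlation r2 to the true genotype behaves roughly like testing the true genotype in r2 times as many samples. The function name `effective_sample_size` is hypothetical; the inputs are the 909 samples and mean r2 of 0.71 quoted in the abstract.

```python
def effective_sample_size(n_samples, mean_r2):
    """Approximate effective sample size for association tests on imputed genotypes.

    Assumes the rule of thumb N_eff = N * r2, where r2 is the squared correlation
    between imputed and true genotypes.
    """
    if not 0.0 <= mean_r2 <= 1.0:
        raise ValueError("mean_r2 must lie between 0 and 1")
    return n_samples * mean_r2

# Figures quoted in the abstract: 909 samples, mean r2 of 0.71.
n_eff = effective_sample_size(909, 0.71)
```

Under this approximation, the abstract's cost argument amounts to saying that a cheaper per-sample assay can raise the sample count by more than the factor lost to imperfect imputation.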
