David Altshuler | Researchain

Publication

Featured research published by David Altshuler.

Genome Research | 2010

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey B. Gabriel; Mark J. Daly; Mark A. DePristo

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genome pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
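
The MapReduce split described in the abstract above can be illustrated with a toy depth-of-coverage calculation. This is a minimal Python sketch and not GATK code: the interval-based read model and the names `traverse_loci`, `map_depth`, `reduce_sum`, and `mean_coverage` are hypothetical, chosen only for illustration. The traversal stands in for the shared data-management layer, while the map and reduce functions carry the analysis-specific logic.

```python
from functools import reduce
from typing import Iterator, List, Tuple

# Assumption: a read is modeled as a half-open interval [start, end) on one contig.
Read = Tuple[int, int]
Locus = Tuple[int, List[Read]]


def traverse_loci(reads: List[Read], start: int, end: int) -> Iterator[Locus]:
    """Framework side: yield each position with the reads that overlap it."""
    for pos in range(start, end):
        yield pos, [r for r in reads if r[0] <= pos < r[1]]


def map_depth(locus: Locus) -> int:
    """Analysis side (map): depth of coverage at a single locus."""
    _pos, pileup = locus
    return len(pileup)


def reduce_sum(total: int, depth: int) -> int:
    """Analysis side (reduce): accumulate per-locus depths."""
    return total + depth


def mean_coverage(reads: List[Read], start: int, end: int) -> float:
    """Mean depth over [start, end), computed as map followed by reduce."""
    depths = map(map_depth, traverse_loci(reads, start, end))
    return reduce(reduce_sum, depths, 0) / (end - start)


reads = [(0, 5), (2, 8), (4, 10)]
print(mean_coverage(reads, 0, 10))
```

Because the analysis code only sees one locus at a time, a framework built this way is free to shard the traversal across threads or machines without changes to the map and reduce functions.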

Nature Genetics | 2011

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Mark A. DePristo; Eric Banks; Ryan Poplin; Kiran Garimella; Jared Maguire; Christopher Hartl; Anthony A. Philippakis; Guillermo Del Angel; Manuel A. Rivas; Matt Hanna; Aaron McKenna; Timothy Fennell; Andrew Kernytsky; Andrey Sivachenko; Kristian Cibulskis; Stacey B. Gabriel; David Altshuler; Mark J. Daly

Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.

Nature Genetics | 2003

PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes

Vamsi K. Mootha; Cecilia M. Lindgren; Karl-Fredrik Eriksson; Aravind Subramanian; Smita Sihag; Joseph Lehar; Pere Puigserver; Emma Carlsson; Martin Ridderstråle; Esa Laurila; Nicholas E. Houstis; Mark J. Daly; Nick Patterson; Jill P. Mesirov; Todd R. Golub; Pablo Tamayo; Bruce M. Spiegelman; Eric S. Lander; Joel N. Hirschhorn; David Altshuler; Leif Groop

DNA microarrays can be used to identify gene expression changes characteristic of human disease. This is challenging, however, when relevant differences are subtle at the level of individual genes. We introduce an analytical strategy, Gene Set Enrichment Analysis, designed to detect modest but coordinate changes in the expression of groups of functionally related genes. Using this approach, we identify a set of genes involved in oxidative phosphorylation whose expression is coordinately decreased in human diabetic muscle. Expression of these genes is high at sites of insulin-mediated glucose disposal, activated by PGC-1α and correlated with total-body aerobic capacity. Our results associate this gene set with clinically important variation in human metabolism and illustrate the value of pathway relationships in the analysis of genomic profiling experiments.
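
The idea of detecting modest but coordinated shifts in a gene set can be sketched in a few lines. The abstract above does not spell out the statistic, so this example assumes a Kolmogorov-Smirnov-style running sum over a ranked gene list; the step sizes, the function name `enrichment_score`, and the gene labels are illustrative assumptions, not a reproduction of the published implementation.

```python
import math
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Maximum of a running sum that rises on set members and falls otherwise.

    Step sizes are chosen so the running sum returns to zero at the end of
    the list; a high maximum means set members cluster near the top.
    """
    n = len(ranked_genes)
    g = sum(1 for gene in ranked_genes if gene in gene_set)
    if g == 0 or g == n:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    hit_step = math.sqrt((n - g) / g)
    miss_step = math.sqrt(g / (n - g))

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in gene_set else -miss_step
        best = max(best, running)
    return best


# Genes ranked by, for example, difference in expression between two groups.
ranked = ["A", "B", "C", "D", "E", "F"]
top_score = enrichment_score(ranked, {"A", "B"})     # members at the top
bottom_score = enrichment_score(ranked, {"E", "F"})  # members at the bottom
print(top_score, bottom_score)
```

In practice the significance of such a score is usually judged against scores obtained after permuting the sample labels; that step is omitted here.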

Nature | 2016

Analysis of protein-coding genetic variation in 60,706 humans

Monkol Lek; Konrad J. Karczewski; Eric Vallabh Minikel; Kaitlin E. Samocha; Eric Banks; Timothy Fennell; Anne H. O’Donnell-Luria; James S. Ware; Andrew Hill; Beryl B. Cummings; Taru Tukiainen; Daniel P. Birnbaum; Jack A. Kosmicki; Laramie Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David Neil Cooper; Nicole Deflaux; Mark A. DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel P. Howrigan; Adam Kiezun

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human ‘knockout’ variants in protein-coding genes.

Science | 2007

Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels

Richa Saxena; Benjamin F. Voight; Valeriya Lyssenko; Noël P. Burtt; Paul I. W. de Bakker; Hong Chen; Jeffrey J. Roix; Sekar Kathiresan; Joel N. Hirschhorn; Mark J. Daly; Thomas Edward Hughes; Leif Groop; David Altshuler; Peter Almgren; Jose C. Florez; Joanne M. Meyer; Kristin Ardlie; Kristina Bengtsson Boström; Bo Isomaa; Guillaume Lettre; Ulf Lindblad; Helen N. Lyon; Olle Melander; Christopher Newton-Cheh; Peter Nilsson; Marju Orho-Melander; Lennart Råstam; Elizabeth K. Speliotes; Marja-Riitta Taskinen; Tiinamaija Tuomi

New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D—in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and an intron of CDKAL1—and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases.

Nature | 2010

Integrating common and rare genetic variation in diverse human populations

David Altshuler; Richard A. Gibbs; Leena Peltonen; Emmanouil T. Dermitzakis; Stephen F. Schaffner; Fuli Yu; Penelope E. Bonnen; Paul I. W. de Bakker; Panos Deloukas; Stacey Gabriel; R. Gwilliam; Sarah Hunt; Michael Inouye; Xiaoming Jia; Aarno Palotie; Melissa Parkin; Pamela Whittaker; Kyle Chang; Alicia Hawes; Lora Lewis; Yanru Ren; David A. Wheeler; Donna M. Muzny; C. Barnes; Katayoon Darvishi; Joshua M. Korn; K. Kristiansson; Cin-Ty A. Lee; S. A. McCarroll; James Nemesh

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called ‘HapMap 3’, includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.

Nature Genetics | 1999

Characterization of single-nucleotide polymorphisms in coding regions of human genes

Michele Cargill; David Altshuler; James S. Ireland; Pamela Sklar; Kristin Ardlie; Nila Patil; Charles R. Lane; Esther P. Lim; Nilesh Kalyanaraman; James Nemesh; Liuda Ziaugra; Lisa Friedland; Alex Rolfe; Janet A. Warrington; Robert J. Lipshutz; George Q. Daley; Eric S. Lander

Nature Genet. 14, 415–420 (1996). Due to a cloning error, the sequence reported for ING1 was incorrect. The error appears to have been a result of a compression introducing a frameshift and of the ING1 gene encoding several differentially spliced isoforms that contain a common 3′ exon, one of which is of a size very similar to that reported in the publication above.

Nature Genetics | 2005

Efficiency and power in genetic association studies

Paul I. W. de Bakker; Roman Yelensky; Itsik Pe'er; Stacey Gabriel; Mark J. Daly; David Altshuler

We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Do pairwise or multimarker methods maximize efficiency and power? To what extent is power compromised when tags are selected from an incomplete resource such as HapMap? We addressed these questions using genotype data from the HapMap ENCODE project, association studies simulated under a realistic disease model, and empirical correction for multiple hypothesis testing. We demonstrate a haplotype-based tagging method that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency. Examining all observed haplotypes for association, rather than just those that are proxies for known SNPs, increases power to detect rare causal alleles, at the cost of reduced power to detect common causal alleles. Power is robust to the completeness of the reference panel from which tags are selected. These findings have implications for prioritizing tag SNPs and interpreting association studies.

Nature Genetics | 2000

The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes

David Altshuler; Joel N. Hirschhorn; Mia Klannemark; Cecilia M. Lindgren; Marie-Claude Vohl; James Nemesh; Charles R. Lane; Stephen F. Schaffner; Stacey Bolk; Carl Brewer; Tiinamaija Tuomi; Daniel Gaudet; Thomas J. Hudson; Mark J. Daly; Leif Groop; Eric S. Lander

Genetic association studies are viewed as problematic and plagued by irreproducibility. Many associations have been reported for type 2 diabetes, but none have been confirmed in multiple samples and with comprehensive controls. We evaluated 16 published genetic associations to type 2 diabetes and related sub-phenotypes using a family-based design to control for population stratification, and replication samples to increase power. We were able to confirm only one association, that of the common Pro12Ala polymorphism in peroxisome proliferator-activated receptor-γ (PPARγ) with type 2 diabetes. By analysing over 3,000 individuals, we found a modest (1.25-fold) but significant (P=0.002) increase in diabetes risk associated with the more common proline allele (∼85% frequency). Moreover, our results resolve a controversy about common variation in PPARγ. An initial study found a threefold effect, but four of five subsequent publications failed to confirm the association. All six studies are consistent with the odds ratio we describe. The data implicate inherited variation in PPARγ in the pathogenesis of type 2 diabetes. Because the risk allele occurs at such high frequency, its modest effect translates into a large population attributable risk—influencing as much as 25% of type 2 diabetes in the general population.

Nature Genetics | 2008

Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes

Eleftheria Zeggini; Laura J. Scott; Richa Saxena; Benjamin F. Voight; Jonathan Marchini; Tianle Hu; Paul I. W. de Bakker; Gonçalo R. Abecasis; Peter Almgren; Gitte Andersen; Kristin Ardlie; Kristina Bengtsson Boström; Richard N. Bergman; Lori L. Bonnycastle; Knut Borch-Johnsen; Noël P. Burtt; Hong Chen; Peter S. Chines; Mark J. Daly; Parimal Deodhar; Chia-Jen Ding; Alex S. F. Doney; William L. Duren; Katherine S. Elliott; Michael R. Erdos; Timothy M. Frayling; Rachel M. Freathy; Lauren Gianniny; Harald Grallert; Niels Grarup

Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D). Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and ∼2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975. We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P = 5.0 × 10⁻¹⁴), CDC123-CAMK1D (P = 1.2 × 10⁻¹⁰), TSPAN8-LGR5 (P = 1.1 × 10⁻⁹), THADA (P = 1.1 × 10⁻⁹), ADAMTS9 (P = 1.2 × 10⁻⁸) and NOTCH2 (P = 4.1 × 10⁻⁸) gene regions. Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D.