Mark J. Daly | Researchain

Publication

Featured research published by Mark J. Daly.

American Journal of Human Genetics | 2007

PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses

Shaun Purcell; Benjamin M. Neale; Kathe Todd-Brown; Lori Thomas; Manuel A. Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I. W. de Bakker; Mark J. Daly; Pak Sham

Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
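The identity-by-state (IBS) information the abstract emphasizes can be illustrated with a minimal Python sketch. It is not PLINK's C/C++ implementation: the 0/1/2 minor-allele-count coding, the use of `None` for a missing call, and the function names `ibs_proportion` and `ibs_distance_matrix` are assumptions chosen for illustration. At each SNP two individuals share 2 - |g1 - g2| alleles IBS; averaging over SNPs gives a similarity, and one minus that similarity is a distance that could feed a clustering step for detecting stratification.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # minor-allele count 0, 1, or 2; None = missing call


def ibs_proportion(g1: Sequence[Genotype], g2: Sequence[Genotype]) -> float:
    """Average proportion of alleles shared identical by state (IBS)."""
    shared = 0    # total alleles shared IBS across compared SNPs
    compared = 0  # SNPs with calls in both individuals
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs where either genotype is missing
        shared += 2 - abs(a - b)  # 2 = IBS2, 1 = IBS1, 0 = IBS0
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return shared / (2 * compared)


def ibs_distance_matrix(samples: Sequence[Sequence[Genotype]]) -> List[List[float]]:
    """Pairwise distances defined as 1 - IBS proportion."""
    n = len(samples)
    return [
        [1.0 - ibs_proportion(samples[i], samples[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    print(ibs_proportion([0, 1, 2, None], [0, 2, 0, 1]))  # 0.5
```

In the usage line the fourth SNP is skipped because one call is missing, leaving 3 shared alleles out of 6 possible.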

Genome Research | 2010

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey B. Gabriel; Mark J. Daly; Mark A. DePristo

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, the complexity of accessing and manipulating the data produced by these machines limits the scope and ease with which many professionals can answer scientific questions. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency, and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
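The map/reduce pattern the GATK is built around can be sketched for the coverage-calculator case. This is a Python illustration, not the GATK's Java API: the `Read` record (a 0-based start and an aligned length, ignoring CIGAR operations, mapping quality, and multiple contigs) and all function names are hypothetical. The map step turns each read into per-position counts, and the reduce step merges them into a running depth total, so the data traversal is kept separate from the analysis logic.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read on a single contig."""
    start: int   # 0-based leftmost reference position
    length: int  # number of reference bases covered (no CIGAR handling)


def map_read(read: Read) -> Counter:
    """Map step: one count for every reference position the read covers."""
    return Counter(range(read.start, read.start + read.length))


def reduce_depth(total: Counter, partial: Counter) -> Counter:
    """Reduce step: add one read's coverage into the running total."""
    total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def depth_of_coverage(reads: Iterable[Read]) -> Counter:
    """Per-position depth: map every read, then fold the results together."""
    return reduce(reduce_depth, map(map_read, reads), Counter())


if __name__ == "__main__":
    depth = depth_of_coverage([Read(0, 3), Read(2, 2)])
    print(sorted(depth.items()))  # [(0, 1), (1, 1), (2, 2), (3, 1)]
```

Because the reduce step is associative, partial totals computed on separate chunks of reads could be merged the same way, which is the property that makes this style amenable to parallelization.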

Genomics | 1987

MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations

Eric S. Lander; Philip Green; Jeff Abrahamson; Aaron Barlow; Mark J. Daly; Stephen E. Lincoln; Lee Newburg

With the advent of RFLPs, genetic linkage maps are now being assembled for a number of organisms including both inbred experimental populations such as maize and outbred natural populations such as humans. Accurate construction of such genetic maps requires multipoint linkage analysis of particular types of pedigrees. We describe here a computer package, called MAPMAKER, designed specifically for this purpose. The program uses an efficient algorithm that allows simultaneous multipoint analysis of any number of loci. MAPMAKER also includes an interactive command language that makes it easy for a geneticist to explore linkage data. MAPMAKER has been applied to the construction of linkage maps in a number of organisms, including the human and several plants, and we outline the mapping strategies that have been used.
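MAPMAKER's multipoint algorithm is considerably more elaborate than anything shown here, but the LOD score that underlies linkage mapping can be sketched for the simplest two-point case. The sketch assumes phase-known, fully informative meioses (as in a backcross) with `r` recombinants out of `n`; the function names are hypothetical. The LOD score compares the likelihood theta^r * (1 - theta)^(n - r) with free recombination (theta = 0.5) on a log10 scale, and the maximum-likelihood estimate of theta is r / n, capped at 0.5.

```python
import math
from typing import Tuple


def _log10_term(count: int, p: float) -> float:
    """count * log10(p), using the convention 0 * log10(0) = 0."""
    if count == 0:
        return 0.0
    if p == 0.0:
        return -math.inf
    return count * math.log10(p)


def lod_score(r: int, n: int, theta: float) -> float:
    """Two-point LOD for r recombinants in n phase-known meioses at fraction theta."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    log_l_linked = _log10_term(r, theta) + _log10_term(n - r, 1.0 - theta)
    log_l_unlinked = n * math.log10(0.5)  # free recombination, theta = 0.5
    return log_l_linked - log_l_unlinked


def max_lod(r: int, n: int) -> Tuple[float, float]:
    """Maximum-likelihood theta (r / n, capped at 0.5) and its LOD score."""
    if n <= 0:
        raise ValueError("need at least one meiosis")
    theta_hat = min(r / n, 0.5)
    return theta_hat, lod_score(r, n, theta_hat)


if __name__ == "__main__":
    theta, lod = max_lod(0, 10)
    print(theta, round(lod, 2))  # 0.0 3.01
```

Ten meioses with no recombinants give a maximum LOD of about 3.01, just above the traditional threshold of 3 for declaring linkage.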

Nature Genetics | 2011

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Mark A. DePristo; Eric Banks; Ryan Poplin; Kiran Garimella; Jared Maguire; Christopher Hartl; Anthony A. Philippakis; Guillermo Del Angel; Manuel A. Rivas; Matt Hanna; Aaron McKenna; Timothy Fennell; Andrew Kernytsky; Andrey Sivachenko; Kristian Cibulskis; Stacey B. Gabriel; David Altshuler; Mark J. Daly

Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. Here we discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
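Step (iv) depends on turning the bases observed at a site into genotype likelihoods. The sketch below uses a deliberately simple model and is not the GATK's genotyping code: reads are assumed independent, a base with Phred quality Q is wrong with probability 10^(-Q/10), and errors are assumed to fall evenly on the three other bases. For a diploid genotype {a1, a2}, each read contributes P(b | G) = 0.5 P(b | a1) + 0.5 P(b | a2). The function names and the `(base, quality)` pileup format are hypothetical.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

BASES = "ACGT"


def phred_to_error(q: int) -> float:
    """Error probability for a Phred-scaled base quality q."""
    return 10.0 ** (-q / 10.0)


def base_likelihood(base: str, allele: str, error: float) -> float:
    """P(observed base | true allele); errors spread evenly over the other 3 bases."""
    return 1.0 - error if base == allele else error / 3.0


def genotype_log10_likelihoods(pileup: List[Tuple[str, int]]) -> Dict[str, float]:
    """log10 P(pileup | G) for each of the 10 unordered diploid genotypes."""
    likelihoods: Dict[str, float] = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for base, qual in pileup:
            e = phred_to_error(qual)
            # Each chromosome is equally likely to have produced the read.
            p = 0.5 * base_likelihood(base, a1, e) + 0.5 * base_likelihood(base, a2, e)
            total += math.log10(p)  # reads assumed independent
        likelihoods[a1 + a2] = total
    return likelihoods


if __name__ == "__main__":
    pileup = [("A", 30)] * 5 + [("G", 30)] * 5
    ll = genotype_log10_likelihoods(pileup)
    print(max(ll, key=ll.get))  # AG
```

With five high-quality A reads and five G reads, the heterozygous genotype `AG` has the highest likelihood. A full caller would go on to combine such likelihoods with a prior and, as step (v) describes, separate true variation from artifacts.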

Nature Genetics | 2003

PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes

Vamsi K. Mootha; Cecilia M. Lindgren; Karl-Fredrik Eriksson; Aravind Subramanian; Smita Sihag; Joseph Lehar; Pere Puigserver; Emma Carlsson; Martin Ridderstråle; Esa Laurila; Nicholas E. Houstis; Mark J. Daly; Nick Patterson; Jill P. Mesirov; Todd R. Golub; Pablo Tamayo; Bruce M. Spiegelman; Eric S. Lander; Joel N. Hirschhorn; David Altshuler; Leif Groop

DNA microarrays can be used to identify gene expression changes characteristic of human disease. This is challenging, however, when relevant differences are subtle at the level of individual genes. We introduce an analytical strategy, Gene Set Enrichment Analysis, designed to detect modest but coordinate changes in the expression of groups of functionally related genes. Using this approach, we identify a set of genes involved in oxidative phosphorylation whose expression is coordinately decreased in human diabetic muscle. Expression of these genes is high at sites of insulin-mediated glucose disposal, activated by PGC-1α and correlated with total-body aerobic capacity. Our results associate this gene set with clinically important variation in human metabolism and illustrate the value of pathway relationships in the analysis of genomic profiling experiments.
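The idea behind Gene Set Enrichment Analysis, detecting a coordinated shift of a whole gene set toward one end of a ranked list, can be sketched with an unweighted, Kolmogorov-Smirnov-style running sum. This is an illustrative assumption rather than the paper's exact statistic, and it omits the permutation testing needed to judge significance; the function name and inputs are hypothetical. Genes are ranked by differential expression, the running sum rises at each member of the set and falls otherwise, and the score is the sum's largest signed deviation from zero.

```python
from typing import AbstractSet, Sequence


def enrichment_score(ranked_genes: Sequence[str], gene_set: AbstractSet[str]) -> float:
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov style).

    ranked_genes should be ordered from most up- to most down-regulated.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up by 1/n_hit at set members and down by 1/n_miss elsewhere,
        # so the walk returns to zero at the end of the list.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest signed deviation from zero
    return best


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
    print(enrichment_score(ranked, {"g1", "g2"}))  # 1.0
```

A set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom a negative score; modest but consistent shifts across many genes can produce a large score even when no single gene stands out.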

Nature | 2009

Common polygenic variation contributes to risk of schizophrenia and bipolar disorder

Shaun Purcell; Naomi R. Wray; Jennifer Stone; Peter M. Visscher; Michael Conlon O'Donovan; Patrick F. Sullivan; Pamela Sklar; Douglas M. Ruderfer; Andrew McQuillin; Derek W. Morris; Colm O’Dushlaine; Aiden Corvin; Peter Holmans; Michael C. O’Donovan; Stuart MacGregor; Hugh Gurling; Douglas Blackwood; Nicholas John Craddock; Michael Gill; Christina M. Hultman; George Kirov; Paul Lichtenstein; Walter J. Muir; Michael John Owen; Carlos N. Pato; Edward M. Scolnick; David St Clair; Nigel Melville Williams; Lyudmila Georgieva; Ivan Nikolov

Schizophrenia is a severe mental disorder with a lifetime risk of about 1%, characterized by hallucinations, delusions and cognitive deficits, with heritability estimated at up to 80%. We performed a genome-wide association study of 3,322 European individuals with schizophrenia and 3,587 controls. Here we show, using two analytic approaches, the extent to which common genetic variation underlies the risk of schizophrenia. First, we implicate the major histocompatibility complex. Second, we provide molecular genetic evidence for a substantial polygenic component to the risk of schizophrenia involving thousands of common alleles of very small effect. We show that this component also contributes to the risk of bipolar disorder, but not to several non-psychiatric diseases.
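One common way to summarize a polygenic component made of thousands of common alleles of very small effect is a score that sums risk-allele counts weighted by effect sizes estimated in an independent discovery sample, keeping only SNPs below a p-value threshold. The sketch below assumes that approach; the SNP identifiers, odds ratios, p-values, and threshold are hypothetical placeholders, not values from the study.

```python
import math
from typing import Dict


def polygenic_score(
    dosages: Dict[str, int],
    odds_ratios: Dict[str, float],
    p_values: Dict[str, float],
    p_threshold: float,
) -> float:
    """Sum of log(odds ratio) x risk-allele count over SNPs passing p_threshold.

    dosages: risk-allele counts (0, 1, 2) for one target-sample individual.
    odds_ratios, p_values: per-SNP results from an independent discovery sample.
    """
    score = 0.0
    for snp, odds_ratio in odds_ratios.items():
        if p_values[snp] >= p_threshold:
            continue  # SNP not selected at this threshold
        if snp not in dosages:
            continue  # not genotyped in the target individual
        score += math.log(odds_ratio) * dosages[snp]
    return score


if __name__ == "__main__":
    # Hypothetical discovery results and one individual's genotypes.
    ors = {"rs_a": 1.10, "rs_b": 0.90, "rs_c": 1.05}
    ps = {"rs_a": 0.01, "rs_b": 0.20, "rs_c": 0.60}
    person = {"rs_a": 2, "rs_b": 1, "rs_c": 2}
    print(round(polygenic_score(person, ors, ps, p_threshold=0.5), 4))  # 0.0853
```

Comparing scores between cases and controls in the target sample then tests whether the selected alleles carry signal collectively, even if no single SNP is significant on its own.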

Nature | 2016

Analysis of protein-coding genetic variation in 60,706 humans

Monkol Lek; Konrad J. Karczewski; Eric Vallabh Minikel; Kaitlin E. Samocha; Eric Banks; Timothy Fennell; Anne H. O’Donnell-Luria; James S. Ware; Andrew Hill; Beryl B. Cummings; Taru Tukiainen; Daniel P. Birnbaum; Jack A. Kosmicki; Laramie Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David Neil Cooper; Nicole Deflaux; Mark A. DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel P. Howrigan; Adam Kiezun

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants and to identify genes subject to strong selection against various classes of mutation: 3,230 genes show near-complete depletion of predicted protein-truncating variants, and 72% of these genes have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human ‘knockout’ variants in protein-coding genes.
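A simple version of the candidate-variant filtering mentioned above discards variants whose allele frequency in the reference population (allele count AC divided by allele number AN) is too high to be compatible with a rare disorder. The `Variant` record, the treatment of AN = 0, and the threshold below are assumptions for illustration, not ExAC's published recommendations. With 60,706 diploid individuals, AN is at most 2 × 60,706 = 121,412 at a fully called site.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical per-variant summary from a reference panel."""
    variant_id: str
    allele_count: int   # AC: copies of the alternate allele observed
    allele_number: int  # AN: total alleles called at the site


def allele_frequency(v: Variant) -> float:
    """AC / AN; a site with no calls (AN == 0) is treated as frequency 0."""
    return v.allele_count / v.allele_number if v.allele_number > 0 else 0.0


def filter_candidates(variants: Iterable[Variant], max_af: float) -> List[Variant]:
    """Keep variants rare enough in the reference panel to remain candidates."""
    return [v for v in variants if allele_frequency(v) <= max_af]


if __name__ == "__main__":
    panel = [
        Variant("var1", 1, 121_412),    # singleton at a fully called site
        Variant("var2", 300, 121_000),  # AF ~ 0.0025, too common for this threshold
        Variant("var3", 0, 0),          # not called in the panel: kept, with caution
    ]
    kept = filter_candidates(panel, max_af=1e-4)
    print([v.variant_id for v in kept])  # ['var1', 'var3']
```

Treating uncalled sites as frequency 0 keeps them in the candidate list, which errs on the side of not discarding a variant for lack of data; a stricter pipeline might flag them separately instead.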

Science | 2007

Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels

Richa Saxena; Benjamin F. Voight; Valeriya Lyssenko; Noël P. Burtt; Paul I. W. de Bakker; Hong Chen; Jeffrey J. Roix; Sekar Kathiresan; Joel N. Hirschhorn; Mark J. Daly; Thomas Edward Hughes; Leif Groop; David Altshuler; Peter Almgren; Jose C. Florez; Joanne M. Meyer; Kristin Ardlie; Kristina Bengtsson Boström; Bo Isomaa; Guillaume Lettre; Ulf Lindblad; Helen N. Lyon; Olle Melander; Christopher Newton-Cheh; Peter Nilsson; Marju Orho-Melander; Lennart Råstam; Elizabeth K. Speliotes; Marja-Riitta Taskinen; Tiinamaija Tuomi

New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D (in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and in an intron of CDKAL1) and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We also identified and confirmed the association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases.
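The abstract does not describe the association tests used, but a common building block of a case-control genome-wide scan is a per-SNP comparison of allele counts. The sketch below assumes a 1-degree-of-freedom allelic chi-square test on a 2 × 2 table with all cells nonzero, plus the allelic odds ratio; the function name and argument order are hypothetical. A real scan would repeat this across hundreds of thousands of SNPs and account for multiple testing and population structure.

```python
import math
from typing import Tuple


def allelic_test(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float, float]:
    """1-df allelic chi-square test on a 2 x 2 table of allele counts.

    Returns (chi-square statistic, p-value, odds ratio for the alt allele).
    All four counts are assumed to be positive.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    if min(min(row) for row in table) <= 0:
        raise ValueError("this sketch requires all allele counts to be positive")
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    chi2, p, odds = allelic_test(60, 40, 40, 60)
    print(round(chi2, 2), round(p, 4), round(odds, 2))  # 8.0 0.0047 2.25
```

An odds ratio above 1 means the counted allele is more frequent in cases; an odds ratio below 1, as for a protective allele, means it is less frequent.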

Science | 2006

A genome-wide association study identifies IL23R as an inflammatory bowel disease gene

Richard H. Duerr; Kent D. Taylor; Steven R. Brant; John D. Rioux; Mark S. Silverberg; Mark J. Daly; A. Hillary Steinhart; Clara Abraham; Miguel Regueiro; Anne M. Griffiths; Themistocles Dassopoulos; Alain Bitton; Huiying Yang; Stephan R. Targan; Lisa W. Datta; Emily O. Kistner; L. Philip Schumm; Annette Lee; Peter K. Gregersen; M. Michael Barmada; Jerome I. Rotter; Dan L. Nicolae; Judy H. Cho

The inflammatory bowel diseases Crohn's disease and ulcerative colitis are common, chronic disorders that cause abdominal pain, diarrhea, and gastrointestinal bleeding. To identify genetic factors that might contribute to these disorders, we performed a genome-wide association study. We found a highly significant association between Crohn's disease and the IL23R gene on chromosome 1p31, which encodes a subunit of the receptor for the proinflammatory cytokine interleukin-23. An uncommon coding variant (rs11209026, c.1142G>A, p.Arg381Gln) confers strong protection against Crohn's disease, and additional noncoding IL23R variants are independently associated. Replication studies confirmed IL23R associations in independent cohorts of patients with Crohn's disease or ulcerative colitis. These results and previous studies on the proinflammatory role of IL-23 prioritize this signaling pathway as a therapeutic target in inflammatory bowel disease.

Nature Reviews Genetics | 2005

Genome-wide association studies for common diseases and complex traits

Joel N. Hirschhorn; Mark J. Daly

Genetic factors strongly affect susceptibility to common diseases and also influence disease-related quantitative traits. Identifying the relevant genes has been difficult, in part because each causal gene makes only a small contribution to overall heritability. Genetic association studies offer a potentially powerful approach for mapping causal genes with modest effects, but are limited because only a small number of genes can be studied at a time. Genome-wide association studies will soon become possible, and could open new frontiers in our understanding and treatment of disease. However, the execution and analysis of such studies will require great care.
