Publication


Featured research published by Paul I. W. de Bakker.


American Journal of Human Genetics | 2007

PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses

Shaun Purcell; Benjamin M. Neale; Kathe Todd-Brown; Lori Thomas; Manuel A. Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I. W. de Bakker; Mark J. Daly; Pak Sham

Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
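As a rough illustration of the identity-by-state (IBS) idea mentioned in this abstract, the sketch below computes the proportion of alleles two individuals share across a set of biallelic markers. It is a minimal example and not PLINK's implementation. The function name `ibs_proportion` and the 0/1/2 allele-count coding with `None` for missing calls are assumptions made for this example.

```python
from typing import Optional, Sequence


def ibs_proportion(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Proportion of alleles shared identical by state between two individuals.

    Genotypes are coded as counts of one allele (0, 1 or 2); None marks a
    missing call. This coding is an assumption of the sketch.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")

    shared = 0  # alleles shared IBS, summed over markers
    called = 0  # markers where both individuals have a genotype call
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip markers with a missing call
        shared += 2 - abs(ga - gb)  # 2, 1 or 0 alleles shared at this marker
        called += 1

    if called == 0:
        raise ValueError("no markers with calls in both individuals")
    return shared / (2 * called)


if __name__ == "__main__":
    # Toy genotypes for two hypothetical individuals at four markers.
    print(ibs_proportion([0, 1, 2, None], [1, 1, 0, 2]))
```

Averaging over many markers gives a genome-wide similarity measure. Estimating identity by descent from such data additionally requires allele frequencies, which this sketch does not attempt.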


Science | 2007

Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels

Richa Saxena; Benjamin F. Voight; Valeriya Lyssenko; Noël P. Burtt; Paul I. W. de Bakker; Hong Chen; Jeffrey J. Roix; Sekar Kathiresan; Joel N. Hirschhorn; Mark J. Daly; Thomas Edward Hughes; Leif Groop; David Altshuler; Peter Almgren; Jose C. Florez; Joanne M. Meyer; Kristin Ardlie; Kristina Bengtsson Boström; Bo Isomaa; Guillaume Lettre; Ulf Lindblad; Helen N. Lyon; Olle Melander; Christopher Newton-Cheh; Peter Nilsson; Marju Orho-Melander; Lennart Råstam; Elizabeth K. Speliotes; Marja-Riitta Taskinen; Tiinamaija Tuomi

New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D—in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and an intron of CDKAL1—and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases.


Proteins | 2003

Structure validation by Cα geometry: ϕ,ψ and Cβ deviation

Simon C. Lovell; Ian W. Davis; W. Bryan Arendall; Paul I. W. de Bakker; J. Michael Word; Michael G. Prisant; Jane S. Richardson; David C. Richardson

Geometrical validation around the Cα is described, with a new Cβ measure and updated Ramachandran plot. Deviation of the observed Cβ atom from ideal position provides a single measure encapsulating the major structure‐validation information contained in bond angle distortions. Cβ deviation is sensitive to incompatibilities between sidechain and backbone caused by misfit conformations or inappropriate refinement restraints. A new ϕ,ψ plot using density‐dependent smoothing for 81,234 non‐Gly, non‐Pro, and non‐prePro residues with B < 30 from 500 high‐resolution proteins shows sharp boundaries at critical edges and clear delineation between large empty areas and regions that are allowed but disfavored. One such region is the γ‐turn conformation near +75°,−60°, counted as forbidden by common structure‐validation programs; however, it occurs in well‐ordered parts of good structures, it is overrepresented near functional sites, and strain is partly compensated by the γ‐turn H‐bond. Favored and allowed ϕ,ψ regions are also defined for Pro, pre‐Pro, and Gly (important because Gly ϕ,ψ angles are more permissive but less accurately determined). Details of these accurate empirical distributions are poorly predicted by previous theoretical calculations, including a region left of α‐helix, which rates as favorable in energy yet rarely occurs. A proposed factor explaining this discrepancy is that crowding of the two‐peptide NHs permits donating only a single H‐bond. New calculations by Hu et al. [Proteins 2002 (this issue)] for Ala and Gly dipeptides, using mixed quantum mechanics and molecular mechanics, fit our nonrepetitive data in excellent detail. To run our geometrical evaluations on a user‐uploaded file, see MOLPROBITY (http://kinemage.biochem.duke.edu) or RAMPAGE (http://www‐cryst.bioc.cam.ac.uk/rampage). Proteins 2003;50:437–450.


Nature Genetics | 2005

Efficiency and power in genetic association studies

Paul I. W. de Bakker; Roman Yelensky; Itsik Pe'er; Stacey Gabriel; Mark J. Daly; David Altshuler

We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Do pairwise or multimarker methods maximize efficiency and power? To what extent is power compromised when tags are selected from an incomplete resource such as HapMap? We addressed these questions using genotype data from the HapMap ENCODE project, association studies simulated under a realistic disease model, and empirical correction for multiple hypothesis testing. We demonstrate a haplotype-based tagging method that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency. Examining all observed haplotypes for association, rather than just those that are proxies for known SNPs, increases power to detect rare causal alleles, at the cost of reduced power to detect common causal alleles. Power is robust to the completeness of the reference panel from which tags are selected. These findings have implications for prioritizing tag SNPs and interpreting association studies.


Nature Genetics | 2008

Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes

Eleftheria Zeggini; Laura J. Scott; Richa Saxena; Benjamin F. Voight; Jonathan Marchini; Tianle Hu; Paul I. W. de Bakker; Gonçalo R. Abecasis; Peter Almgren; Gitte Andersen; Kristin Ardlie; Kristina Bengtsson Boström; Richard N. Bergman; Lori L. Bonnycastle; Knut Borch-Johnsen; Noël P. Burtt; Hong Chen; Peter S. Chines; Mark J. Daly; Parimal Deodhar; Chia-Jen Ding; Alex S. F. Doney; William L. Duren; Katherine S. Elliott; Michael R. Erdos; Timothy M. Frayling; Rachel M. Freathy; Lauren Gianniny; Harald Grallert; Niels Grarup

Genome-wide association (GWA) studies have identified multiple loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D). Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to identify variants with modest effects, we carried out meta-analysis of three T2D GWA scans comprising 10,128 individuals of European descent and ∼2.2 million SNPs (directly genotyped and imputed), followed by replication testing in an independent sample with an effective sample size of up to 53,975. We detected at least six previously unknown loci with robust evidence for association, including the JAZF1 (P = 5.0 × 10^-14), CDC123-CAMK1D (P = 1.2 × 10^-10), TSPAN8-LGR5 (P = 1.1 × 10^-9), THADA (P = 1.1 × 10^-9), ADAMTS9 (P = 1.2 × 10^-8) and NOTCH2 (P = 4.1 × 10^-8) gene regions. Our results illustrate the value of large discovery and follow-up samples for gaining further insights into the inherited basis of T2D.


Nature Genetics | 2009

Common variants at 30 loci contribute to polygenic dyslipidemia

Sekar Kathiresan; Cristen J. Willer; Gina M. Peloso; Serkalem Demissie; Kiran Musunuru; Eric E. Schadt; Lee M. Kaplan; Derrick Bennett; Yun Li; Toshiko Tanaka; Benjamin F. Voight; Lori L. Bonnycastle; Anne U. Jackson; Gabriel Crawford; Aarti Surti; Candace Guiducci; Noël P. Burtt; Sarah Parish; Robert Clarke; Diana Zelenika; Kari Kubalanza; Mario A. Morken; Laura J. Scott; Heather M. Stringham; Pilar Galan; Amy J. Swift; Johanna Kuusisto; Richard N. Bergman; Jouko Sundvall; Markku Laakso

Blood low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and triglyceride levels are risk factors for cardiovascular disease. To dissect the polygenic basis of these traits, we conducted genome-wide association screens in 19,840 individuals and replication in up to 20,623 individuals. We identified 30 distinct loci associated with lipoprotein concentrations (each with P < 5 × 10^-8), including 11 loci that reached genome-wide significance for the first time. The 11 newly defined loci include common variants associated with LDL cholesterol near ABCG8, MAFB, HNF1A and TIMD4; with HDL cholesterol near ANGPTL4, FADS1-FADS2-FADS3, HNF4A, LCAT, PLTP and TTC39B; and with triglycerides near AMAC1L2, FADS1-FADS2-FADS3 and PLTP. The proportion of individuals exceeding clinical cut points for high LDL cholesterol, low HDL cholesterol and high triglycerides varied according to an allelic dosage score (P < 10^-15 for each trend). These results suggest that the cumulative effect of multiple common variants contributes to polygenic dyslipidemia.


Bioinformatics | 2008

SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap

Andrew D. Johnson; Robert E. Handsaker; Sara L. Pulit; Marcia M. Nizzari; Christopher J. O'Donnell; Paul I. W. de Bakker

Summary: The interpretation of genome-wide association results is confounded by linkage disequilibrium between nearby alleles. We have developed a flexible bioinformatics query tool for single-nucleotide polymorphisms (SNPs) to identify and to annotate nearby SNPs in linkage disequilibrium (proxies) based on HapMap. By offering functionality to generate graphical plots for these data, the SNAP server will facilitate interpretation and comparison of genome-wide association study results, and the design of fine-mapping experiments (by delineating genomic regions harboring associated variants and their proxies). Availability: The SNAP server is available at http://www.broad.mit.edu/mpg/snap/.
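To make the notion of a proxy SNP concrete, the sketch below computes the standard pairwise linkage disequilibrium statistic r² between two biallelic SNPs from phased haplotypes. It is an illustrative example only and does not reproduce the SNAP server's code. The function name `ld_r_squared` and the 0/1 allele coding are assumptions made for this example.

```python
from typing import Sequence, Tuple


def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Pairwise r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (a, b) giving the allele (0 or 1) carried at
    SNP A and SNP B. This coding is an assumption of the sketch.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in haplotypes) / n   # frequency of the 1-1 haplotype

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")

    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


if __name__ == "__main__":
    # Toy phased haplotypes in which the two SNPs always travel together.
    print(ld_r_squared([(0, 0), (1, 1), (0, 0), (1, 1)]))
```

A SNP pair with r² close to 1 carries nearly the same information, which is why one SNP can stand in as a proxy for the other. The threshold used to call a proxy is a choice of the analyst and is not fixed by this sketch.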


Nature Genetics | 2010

Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci

Eli A. Stahl; Soumya Raychaudhuri; Elaine F. Remmers; Gang Xie; Stephen Eyre; Brian Thomson; Yonghong Li; Fina Kurreeman; Alexandra Zhernakova; Anne Hinks; Candace Guiducci; Robert Chen; Lars Alfredsson; Christopher I. Amos; Kristin Ardlie; Anne Barton; John Bowes; Elisabeth Brouwer; Noël P. Burtt; Joseph J. Catanese; Jonathan S. Coblyn; Marieke J. H. Coenen; Karen H. Costenbader; Lindsey A. Criswell; J. Bart A. Crusius; Jing Cui; Paul I. W. de Bakker; Philip L. De Jager; Bo Ding; Paul Emery

To identify new genetic risk factors for rheumatoid arthritis, we conducted a genome-wide association study meta-analysis of 5,539 autoantibody-positive individuals with rheumatoid arthritis (cases) and 20,169 controls of European descent, followed by replication in an independent set of 6,768 rheumatoid arthritis cases and 8,806 controls. Of 34 SNPs selected for replication, 7 new rheumatoid arthritis risk alleles were identified at genome-wide significance (P < 5 × 10^-8) in an analysis of all 41,282 samples. The associated SNPs are near genes of known immune function, including IL6ST, SPRED2, RBPJ, CCR6, IRF5 and PXK. We also refined associations at two established rheumatoid arthritis risk loci (IL2RA and CCL21) and confirmed the association at AFF3. These new associations bring the total number of confirmed rheumatoid arthritis risk loci to 31 among individuals of European ancestry. An additional 11 SNPs replicated at P < 0.05, many of which are validated autoimmune risk alleles, suggesting that most represent genuine rheumatoid arthritis risk alleles.


Nature Genetics | 2008

Integrated detection and population-genetic analysis of SNPs and copy number variation

Steven A. McCarroll; Finny Kuruvilla; Joshua M. Korn; Simon Cawley; James Nemesh; Alec Wysoker; Michael H. Shapero; Paul I. W. de Bakker; Julian Maller; Andrew Kirby; Amanda L. Elliott; Melissa Parkin; Earl Hubbell; Teresa Webster; Rui Mei; James Veitch; Patrick J Collins; Robert E. Handsaker; Steve Lincoln; Marcia M. Nizzari; John E. Blume; Keith W. Jones; Rich Rava; Mark J. Daly; Stacey Gabriel; David Altshuler

Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.


Nature Genetics | 2006

A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC

Paul I. W. de Bakker; Gil McVean; Pardis C. Sabeti; Marcos M Miretti; Todd Green; Jonathan Marchini; Xiayi Ke; Alienke J. Monsuur; Pamela Whittaker; Marcos Delgado; Jonathan Morrison; Angela Richardson; Emily Walsh; Xiaojiang Gao; Luana Galver; John Hart; David A. Hafler; Margaret A. Pericak-Vance; John A. Todd; Mark J. Daly; John Trowsdale; Cisca Wijmenga; Tim J Vyse; Stephan Beck; Sarah S. Murray; Mary Carrington; Simon G. Gregory; Panos Deloukas; John D. Rioux

The proteins encoded by the classical HLA class I and class II genes in the major histocompatibility complex (MHC) are highly polymorphic and are essential in self versus non-self immune recognition. HLA variation is a crucial determinant of transplant rejection and susceptibility to a large number of infectious and autoimmune diseases. Yet identification of causal variants is problematic owing to linkage disequilibrium that extends across multiple HLA and non-HLA genes in the MHC. We therefore set out to characterize the linkage disequilibrium patterns between the highly polymorphic HLA genes and background variation by typing the classical HLA genes and >7,500 common SNPs and deletion-insertion polymorphisms across four population samples. The analysis provides informative tag SNPs that capture much of the common variation in the MHC region and that could be used in disease association studies, and it provides new insight into the evolutionary dynamics and ancestral origins of the HLA loci and their haplotypes.

Collaboration


Dive into Paul I. W. de Bakker's collaborations.

Top Co-Authors

Soumya Raychaudhuri

Brigham and Women's Hospital

Cisca Wijmenga

University Medical Center Groningen

Albert Hofman

Erasmus University Rotterdam
