Khalid Shakir
Broad Institute
Publications
Featured research published by Khalid Shakir.
Nature | 2016
Monkol Lek; Konrad J. Karczewski; Eric Vallabh Minikel; Kaitlin E. Samocha; Eric Banks; Timothy Fennell; Anne H. O’Donnell-Luria; James S. Ware; Andrew Hill; Beryl B. Cummings; Taru Tukiainen; Daniel P. Birnbaum; Jack A. Kosmicki; Laramie Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David Neil Cooper; Nicole Deflaux; Mark A. DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel P. Howrigan; Adam Kiezun
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human ‘knockout’ variants in protein-coding genes.
Nature | 2012
Benjamin M. Neale; Yan Kou; Li Liu; Avi Ma'ayan; Kaitlin E. Samocha; Aniko Sabo; Chiao-Feng Lin; Christine Stevens; Li-San Wang; Vladimir Makarov; Pazi Penchas Polak; Seungtai Yoon; Jared Maguire; Emily L. Crawford; Nicholas G. Campbell; Evan T. Geller; Otto Valladares; Chad Shafer; Han Liu; Tuo Zhao; Guiqing Cai; Jayon Lihm; Ruth Dannenfelser; Omar Jabado; Zuleyma Peralta; Uma Nagaswamy; Donna M. Muzny; Jeffrey G. Reid; Irene Newsham; Yuanqing Wu
Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, is consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case–control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.
Current Protocols in Bioinformatics | 2013
Geraldine A. Van der Auwera; Mauricio O. Carneiro; Christopher Hartl; Ryan Poplin; Guillermo Del Angel; Ami Levy-Moonshine; Tadeusz Jordan; Khalid Shakir; David Roazen; Joel Thibault; Eric Banks; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark A. DePristo
This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high‐quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data‐processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK. Curr. Protoc. Bioinform. 43:11.10.1‐11.10.33.
Nature | 2014
Shaun Purcell; Jennifer L. Moran; Menachem Fromer; Douglas M. Ruderfer; Nadia Solovieff; Panos Roussos; Colm O'Dushlaine; K D Chambert; Sarah E. Bergen; Anna K. Kähler; Laramie Duncan; Eli A. Stahl; Giulio Genovese; Esperanza Fernández; Mark O. Collins; Noboru H. Komiyama; Jyoti S. Choudhary; Patrik K. E. Magnusson; Eric Banks; Khalid Shakir; Kiran Garimella; Timothy Fennell; Mark DePristo; Seth G. N. Grant; Stephen J. Haggarty; Stacey Gabriel; Edward M. Scolnick; Eric S. Lander; Christina M. Hultman; Patrick F. Sullivan
Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.
Human Molecular Genetics | 2011
Minal Çalışkan; Jessica X. Chong; Lawrence H. Uricchio; Rebecca Anderson; Peixian Chen; Carrie Sougnez; Kiran Garimella; Stacey Gabriel; Mark A. DePristo; Khalid Shakir; Dietrich Matern; Soma Das; Darrel Waggoner; Dan L. Nicolae; Carole Ober
Exome sequencing is a powerful tool for discovery of the Mendelian disease genes. Previously, we reported a novel locus for autosomal recessive non-syndromic mental retardation (NSMR) in a consanguineous family [Nolan, D.K., Chen, P., Das, S., Ober, C. and Waggoner, D. (2008) Fine mapping of a locus for nonsyndromic mental retardation on chromosome 19p13. Am. J. Med. Genet. A, 146A, 1414-1422]. Using linkage and homozygosity mapping, we previously localized the gene to chromosome 19p13. The parents of this sibship were recently included in an exome sequencing project. Using a series of filters, we narrowed the putative causal mutation to a single variant site that segregated with NSMR: the mutation was homozygous in five affected siblings but in none of eight unaffected siblings. This mutation causes a substitution of a leucine for a highly conserved proline at amino acid 182 in TECR (trans-2,3-enoyl-CoA reductase), a synaptic glycoprotein. Our results reveal the value of massively parallel sequencing for identification of novel disease genes that could not be found using traditional approaches and identifies only the seventh causal mutation for autosomal recessive NSMR.
PLOS Genetics | 2013
Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine T. Lim; Corneliu A. Bodea; Donna M. Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark A. DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum
We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analysis of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.
bioRxiv | 2017
Ryan Poplin; Valentin Ruano-Rubio; Mark A. DePristo; Timothy Fennell; Mauricio O. Carneiro; Geraldine A. Van der Auwera; David E. Kling; Laura Gauthier; Ami Levy-Moonshine; David Roazen; Khalid Shakir; Joel Thibault; Sheila Chandran; Chris Whelan; Monkol Lek; Stacey Gabriel; Mark J. Daly; Benjamin M. Neale; Daniel G. MacArthur; Eric Banks
Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), that determines genotype likelihoods independently per-sample but performs joint calling across all samples within a project simultaneously. We show by calling over 90,000 samples from the Exome Aggregation Consortium (ExAC) that, in contrast to other algorithms, the HC-RCM scales efficiently to very large sample sizes without loss in accuracy; and that the accuracy of indel variant calling is superior in comparison to other algorithms. More importantly, the HC-RCM produces a fully squared-off matrix of genotypes across all samples at every genomic position being investigated. The HC-RCM is a novel, scalable, assembly-based algorithm with abundant applications for population genetics and clinical studies.
American Journal of Human Genetics | 2014
Sophie R. Wang; Vineeta Agarwala; Jason Flannick; Charleston W. K. Chiang; David Altshuler; Alisa Manning; Christopher Hartl; Pierre Fontanillas; Todd Green; Eric Banks; Mark A. DePristo; Ryan Poplin; Khalid Shakir; Timothy Fennell; Jacquelyn Murphy; Noël P. Burtt; Stacey Gabriel; Christian Fuchsberger; Hyun Min Kang; Xueling Sim; Clement Ma; Adam E. Locke; Thomas W. Blackwell; Anne U. Jackson; Tanya M. Teslovich; Heather M. Stringham; Peter S. Chines; Phoenix Kwan; Jeroen R. Huyghe; Adrian Tan