Publication


Featured research published by Shamil R. Sunyaev.


Nature Methods | 2010

A method and server for predicting damaging missense mutations

Ivan Adzhubei; Steffen Schmidt; Leonid Peshkin; Vasily Ramensky; Anna Gerasimova; Peer Bork; Alexey S. Kondrashov; Shamil R. Sunyaev

To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which differs from the earlier tool PolyPhen (ref. 1) in its set of predictive features, its alignment pipeline, and its method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1), which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparing a property of the wild-type (ancestral, normal) allele with the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site (ref. 2). The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a naive Bayes classifier (Supplementary Methods).
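The final classification step can be illustrated with a minimal naive Bayes sketch. Everything specific in it is hypothetical: the two features (`conservation`, `buried`), their discretized bins, the per-class likelihood tables, and the 50% prior are placeholders chosen for illustration, not PolyPhen-2's actual features or trained parameters. The sketch only shows how per-feature likelihoods combine into a posterior probability under the assumption that features are conditionally independent given the class.

```python
import math

# Hypothetical per-class likelihoods P(feature bin | class) for two discretized
# features. These are illustrative placeholders, not PolyPhen-2 parameters.
LIKELIHOODS = {
    "damaging": {
        "conservation": {"low": 0.1, "mid": 0.3, "high": 0.6},
        "buried": {"no": 0.4, "yes": 0.6},
    },
    "neutral": {
        "conservation": {"low": 0.5, "mid": 0.3, "high": 0.2},
        "buried": {"no": 0.7, "yes": 0.3},
    },
}


def posterior_damaging(features: dict[str, str], prior_damaging: float = 0.5) -> float:
    """P(damaging | features), treating features as independent given the class."""
    priors = {"damaging": prior_damaging, "neutral": 1.0 - prior_damaging}
    log_scores = {}
    for cls, tables in LIKELIHOODS.items():
        score = math.log(priors[cls])
        for name, value in features.items():
            score += math.log(tables[name][value])
        log_scores[cls] = score
    # Normalize in log space (log-sum-exp) to avoid underflow with many features.
    top = max(log_scores.values())
    weights = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    return weights["damaging"] / sum(weights.values())


print(round(posterior_damaging({"conservation": "high", "buried": "yes"}), 3))  # prints 0.857
```

Working in log space makes no numerical difference with only two features, but it is the usual idiom once many likelihoods are multiplied together.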
Figure 1 PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar (ref. 3; light green). UniRef100 (solid ...

We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles in the UniProt database that cause human Mendelian diseases and have known effects on molecular function, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar (ref. 3), consists of all 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as shown by its receiver operating characteristic curves, was consistently superior to that of PolyPhen (Fig. 1b), and it also compared favorably with three other popular prediction tools (refs. 4–6; Fig. 1c). At a false positive rate of 20%, PolyPhen-2 achieves true positive rates of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for the lower accuracy of predictions on HumVar is that the nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
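Reading a true positive rate off the ROC curve at a fixed false positive rate, as in the 20% figure quoted above, amounts to choosing a score threshold from the non-damaging set and then measuring the damaging set against it. The sketch below assumes that a higher score means "more likely damaging" and uses small made-up score lists; it is a generic illustration, not the evaluation code used for PolyPhen-2.

```python
from typing import Optional


def tpr_at_fpr(
    damaging_scores: list[float],
    neutral_scores: list[float],
    max_fpr: float = 0.20,
) -> Optional[tuple[float, float, float]]:
    """Return (threshold, fpr, tpr) for the lowest threshold with FPR <= max_fpr.

    A variant is called damaging when its score is >= threshold. Because FPR and
    TPR can only fall as the threshold rises, the lowest qualifying threshold
    gives the highest TPR. Returns None if no observed score qualifies.
    """
    for t in sorted(set(damaging_scores) | set(neutral_scores)):
        fpr = sum(s >= t for s in neutral_scores) / len(neutral_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in damaging_scores) / len(damaging_scores)
            return t, fpr, tpr
    return None


# Made-up classifier scores (higher = more likely damaging).
damaging = [0.9, 0.8, 0.95, 0.6, 0.3]
neutral = [0.1, 0.2, 0.4, 0.7, 0.05]
print(tpr_at_fpr(damaging, neutral))  # prints (0.6, 0.2, 0.8)
```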
For a mutation, PolyPhen-2 calculates the naive Bayes posterior probability that the mutation is damaging and reports estimates of the false positive rate (the chance that the mutation is classified as damaging when it is in fact non-damaging) and the true positive rate (the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnosing Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, for dense mapping of regions identified by genome-wide association studies, and for analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.


Nature | 2013

Mutational heterogeneity in cancer and the search for new cancer-associated genes.

Michael S. Lawrence; Petar Stojanov; Paz Polak; Gregory V. Kryukov; Kristian Cibulskis; Andrey Sivachenko; Scott L. Carter; Chip Stewart; Craig H. Mermel; Steven A. Roberts; Adam Kiezun; Peter S. Hammerman; Aaron McKenna; Yotam Drier; Lihua Zou; Alex H. Ramos; Trevor J. Pugh; Nicolas Stransky; Elena Helman; Jaegil Kim; Carrie Sougnez; Lauren Ambrogio; Elizabeth Nickerson; Erica Shefler; Maria L. Cortes; Daniel Auclair; Gordon Saksena; Douglas Voet; Michael S. Noble; Daniel DiCara

Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour–normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour–normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
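Why mutational heterogeneity inflates false positives can be seen with a one-gene Poisson calculation. The sketch tests whether a gene carries more mutations than expected, first under a single genome-wide background rate and then under a higher gene-specific rate (as might apply, for example, to a late-replicating gene). The gene size, cohort size, rates, and observed count are hypothetical, and this simple model is not MutSigCV, which models heterogeneity in far more detail.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term             # add P(X = i)
        term *= lam / (i + 1)   # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


# Hypothetical gene and cohort.
bases_covered = 10_000
n_samples = 1_000
observed_mutations = 20

global_rate = 1e-6         # mutations per base per sample, genome-wide average (assumed)
gene_specific_rate = 2e-6  # higher local background rate for this gene (assumed)

p_global = poisson_sf(observed_mutations, global_rate * bases_covered * n_samples)
p_local = poisson_sf(observed_mutations, gene_specific_rate * bases_covered * n_samples)
print(f"p (global rate) = {p_global:.4f}, p (gene-specific rate) = {p_local:.2f}")
```

Under the global rate, 20 mutations look significant (p below 0.01); once the gene's own higher background rate is used, the same count is unremarkable (p around 0.5). Repeated across thousands of genes, this is the kind of artefact the abstract describes.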


Science | 2012

Systematic Localization of Common Disease-Associated Variation in Regulatory DNA

Matthew T. Maurano; Richard Humbert; Eric Rynes; Robert E. Thurman; Eric Haugen; Hao Wang; Alex Reynolds; Richard Sandstrom; Hongzhu Qu; Jennifer A. Brody; Anthony Shafer; Fidencio Neri; Kristen Lee; Tanya Kutyavin; Sandra Stehling-Sun; Audra K. Johnson; Theresa K. Canfield; Erika Giste; Morgan Diegel; Daniel Bates; R. Scott Hansen; Shane Neph; Peter J. Sabo; Shelly Heimfeld; Antony Raubitschek; Steven F. Ziegler; Chris Cotsapas; Nona Sotoodehnia; Ian A. Glass; Shamil R. Sunyaev

Predictions of Genetic Disease

Many genome-wide association studies (GWAS) have identified loci and variants associated with disease, but the ability to predict disease on the basis of these genetic variants remains small. Maurano et al. (p. 1190; see the Perspective by Schadt and Chang; see the cover) characterize the location of GWAS variants in the genome with respect to their proximity to regulatory DNA [marked by deoxyribonuclease I (DNase I) hypersensitive sites] by tissue type, disease, and enrichments in physiologically relevant transcription factor binding sites and networks. They found many noncoding disease associations in regulatory DNA, indicating tissue- and developmental stage-specific regulatory roles for many common genetic variants and thus enabling links to be made between gene regulation and adult-onset disease.

Genetic variants that have been associated with diseases are concentrated in regulatory regions of the genome.

Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure–related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn’s disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms.
Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
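The core measurement, how often disease-associated variants fall inside DHSs compared with a background set, can be sketched as an interval-overlap count followed by a fold-enrichment ratio. The sketch assumes a single chromosome, sorted non-overlapping half-open intervals, and made-up coordinates; it does not reproduce the published analysis, which used genome-wide DHS maps across many cell and tissue types.

```python
import bisect


def make_lookup(intervals: list[tuple[int, int]]):
    """Sort half-open [start, end) intervals and return a membership test.

    Assumes the intervals do not overlap one another.
    """
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]

    def contains(pos: int) -> bool:
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
        return i >= 0 and pos < ends[i]

    return contains


def fold_enrichment(test_positions, background_positions, intervals) -> float:
    """Fraction of test positions in intervals over the same fraction for background.

    Raises ZeroDivisionError if no background position falls in an interval.
    """
    contains = make_lookup(intervals)

    def fraction_inside(positions) -> float:
        return sum(contains(p) for p in positions) / len(positions)

    return fraction_inside(test_positions) / fraction_inside(background_positions)


# Hypothetical coordinates on one chromosome.
dhs = [(500, 650), (100, 200), (1000, 1100)]
gwas_snps = [150, 520, 600, 1050, 3000]
background_snps = [50, 150, 300, 400, 700, 800, 900, 1200, 1500, 2000]
print(fold_enrichment(gwas_snps, background_snps, dhs))  # prints 8.0
```

Four of the five hypothetical GWAS SNPs fall inside a DHS versus one of the ten background SNPs, an 8-fold enrichment. Binary search keeps each lookup logarithmic in the number of DHSs, which matters when there are millions of them.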


Nature | 2012

The accessible chromatin landscape of the human genome.

Robert E. Thurman; Eric Rynes; Richard Humbert; Jeff Vierstra; Matthew T. Maurano; Eric Haugen; Nathan C. Sheffield; Andrew B. Stergachis; Hao Wang; Benjamin Vernot; Kavita Garg; Sam John; Richard Sandstrom; Daniel Bates; Lisa Boatman; Theresa K. Canfield; Morgan Diegel; Douglas Dunn; Abigail K. Ebersol; Tristan Frum; Erika Giste; Audra K. Johnson; Ericka M. Johnson; Tanya Kutyavin; Bryan R. Lajoie; Bum Kyu Lee; Kristen Lee; Darin London; Dimitra Lotakis; Shane Neph

DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.
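Connecting distal DHSs to promoters through their cross-cell-type ("transcellular") DNase I sensitivity patterns can be sketched as correlating accessibility profiles across cell types and keeping strongly correlated pairs. The profiles, the 0.7 cutoff, and the identifiers below are hypothetical, and a real analysis would also restrict candidate pairs by genomic distance; this illustrates the idea and is not the published pipeline.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def link_distal_to_promoters(distal, promoters, min_r=0.7):
    """Return (distal_id, promoter_id, r) for every pair with r >= min_r."""
    links = []
    for d_id, d_profile in distal.items():
        for p_id, p_profile in promoters.items():
            r = pearson(d_profile, p_profile)
            if r >= min_r:
                links.append((d_id, p_id, r))
    return links


# Hypothetical DNase I signal in five cell types.
distal = {"dhsA": [0.1, 0.9, 0.8, 0.2, 0.1]}
promoters = {
    "P1": [0.2, 1.0, 0.7, 0.1, 0.2],  # co-varies with dhsA
    "P2": [0.9, 0.1, 0.2, 0.8, 0.9],  # anti-correlated with dhsA
}
for d_id, p_id, r in link_distal_to_promoters(distal, promoters):
    print(d_id, p_id, round(r, 2))
```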


Nature | 2012

Patterns and rates of exonic de novo mutations in autism spectrum disorders

Benjamin M. Neale; Yan Kou; Li Liu; Avi Ma'ayan; Kaitlin E. Samocha; Aniko Sabo; Chiao-Feng Lin; Christine Stevens; Li-San Wang; Vladimir Makarov; Pazi Penchas Polak; Seungtai Yoon; Jared Maguire; Emily L. Crawford; Nicholas G. Campbell; Evan T. Geller; Otto Valladares; Chad Shafer; Han Liu; Tuo Zhao; Guiqing Cai; Jayon Lihm; Ruth Dannenfelser; Omar Jabado; Zuleyma Peralta; Uma Nagaswamy; Donna M. Muzny; Jeffrey G. Reid; Irene Newsham; Yuanqing Wu

Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, is consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case–control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.
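The inference that most observed de novo events are unconnected to ASD rests on comparing observed and expected rates. If cases carry de novo events at rate r_obs and the background expectation is r_exp, at most (r_obs − r_exp)/r_obs of the observed events can be attributed to the excess. The rates below are hypothetical, chosen only to represent a "modestly higher" observed rate; they are not the study's estimates.

```python
def attributable_fraction(observed_rate: float, expected_rate: float) -> float:
    """Fraction of observed events in excess of the expected background rate."""
    if observed_rate <= 0:
        raise ValueError("observed_rate must be positive")
    return max(0.0, (observed_rate - expected_rate) / observed_rate)


# Hypothetical rates of de novo nonsynonymous events per exome.
observed = 0.60
expected = 0.50
frac = attributable_fraction(observed, expected)
print(f"{frac:.1%} of observed events exceed the background rate")  # prints 16.7% ...
```

With these numbers, about five of every six observed events would be expected even in the absence of any ASD-related excess.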


Science | 2012

Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes

Jacob A. Tennessen; Abigail W. Bigham; Timothy D. O'Connor; Wenqing Fu; Eimear E. Kenny; Simon Gravel; Sean McGee; Ron Do; Xiaoming Liu; Goo Jun; Hyun Min Kang; Daniel M. Jordan; Suzanne M. Leal; Stacey Gabriel; Mark J. Rieder; Gonçalo R. Abecasis; David Altshuler; Deborah A. Nickerson; Eric Boerwinkle; Shamil R. Sunyaev; Carlos Bustamante; Michael J. Bamshad; Joshua M. Akey

A Deep Look Into Our Genes

Recent debates have focused on the degree of genetic variation and its impact upon health at the genomic level in humans (see the Perspective by Casals and Bertranpetit). Tennessen et al. (p. 64, published online 17 May), looking at all of the protein-coding genes in the human genome, and Nelson et al. (p. 100, published online 17 May), looking at genes that encode drug targets, address this question through deep sequencing efforts on samples from multiple individuals. The findings suggest that most human variation is rare, not shared between populations, and that rare variants are likely to play a role in human health.

Most functionally consequential variants in protein-coding genes are rare and, thus, difficult to find. As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ~313 genes per genome, and ~95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
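The "rare" category above is defined by minor allele frequency (MAF below 0.5%). A minimal sketch of that classification from per-site allele counts follows, assuming every one of the 2,440 individuals is called at every site (4,880 chromosomes); the alternate-allele counts are made up. The last line is a quick arithmetic check that 2.3% of 13,595 SNVs is consistent with the ~313 figure quoted in the abstract.

```python
def minor_allele_frequency(alt_count: int, allele_number: int) -> float:
    """MAF from the alternate-allele count and the number of called alleles."""
    af = alt_count / allele_number
    return min(af, 1.0 - af)


def fraction_rare(alt_counts: list[int], allele_number: int, threshold: float = 0.005) -> float:
    """Fraction of sites whose MAF is below `threshold` (0.5% by default)."""
    mafs = [minor_allele_frequency(ac, allele_number) for ac in alt_counts]
    return sum(m < threshold for m in mafs) / len(mafs)


AN = 2 * 2440  # diploid individuals -> chromosomes, assuming complete calls

# Hypothetical alternate-allele counts at five sites; the last site is rare
# because the reference allele is the minor one there.
counts = [1, 3, 24, 600, 4870]
print(fraction_rare(counts, AN))  # prints 0.8

# Arithmetic check on the per-genome figure.
print(round(0.023 * 13_595))  # prints 313
```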


Proceedings of the National Academy of Sciences of the United States of America | 2012

The mystery of missing heritability: Genetic interactions create phantom heritability

Or Zuk; Eliana Hechter; Shamil R. Sunyaev; Eric S. Lander

Human genetics has been haunted by the mystery of “missing heritability” of common traits. Although studies have discovered >1,200 variants associated with common diseases and traits, these variants typically appear to explain only a minority of the heritability. The proportion of heritability explained by a set of variants is the ratio of (i) the heritability due to these variants (numerator), estimated directly from their observed effects, to (ii) the total heritability (denominator), inferred indirectly from population data. The prevailing view has been that the explanation for missing heritability lies in the numerator—that is, in as-yet undiscovered variants. While many variants surely remain to be found, we show here that a substantial portion of missing heritability could arise from overestimation of the denominator, creating “phantom heritability.” Specifically, (i) estimates of total heritability implicitly assume the trait involves no genetic interactions (epistasis) among loci; (ii) this assumption is not justified, because models with interactions are also consistent with observable data; and (iii) under such models, the total heritability may be much smaller and thus the proportion of heritability explained much larger. For example, 80% of the currently missing heritability for Crohn’s disease could be due to genetic interactions, if the disease involves interaction among three pathways. In short, missing heritability need not directly correspond to missing variants, because current estimates of total heritability may be significantly inflated by genetic interactions. Finally, we describe a method for estimating heritability from isolated populations that is not inflated by genetic interactions.
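The ratio argument above can be made concrete with a few numbers. All heritability values below are hypothetical. `h2_pop` stands for the total heritability inferred from population data under the no-interaction assumption, and `h2_true` for the smaller total heritability that a model with interactions could imply; the last quantity expresses, in the spirit of the Crohn's disease example, what share of the apparently missing heritability comes from the inflated denominator rather than from undiscovered variants. That share is an illustrative definition, not necessarily the paper's exact formula.

```python
def fraction_explained(h2_known: float, h2_total: float) -> float:
    """Proportion of heritability explained = numerator / denominator."""
    return h2_known / h2_total


def phantom_share(h2_known: float, h2_pop: float, h2_true: float) -> float:
    """Share of the apparent gap (h2_pop - h2_known) caused by inflation of h2_pop."""
    return (h2_pop - h2_true) / (h2_pop - h2_known)


# Hypothetical values, for illustration only.
h2_known = 0.10  # heritability due to discovered variants
h2_pop = 0.50    # total heritability estimated assuming no epistasis
h2_true = 0.20   # total heritability under an assumed interaction model

print(f"apparent fraction explained: {fraction_explained(h2_known, h2_pop):.0%}")
print(f"fraction explained under the interaction model: {fraction_explained(h2_known, h2_true):.0%}")
print(f"share of missing heritability that is phantom: {phantom_share(h2_known, h2_pop, h2_true):.0%}")
```

With these numbers the known variants appear to explain 20% of heritability but actually explain 50%, and three quarters of the apparent gap is phantom.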


Current protocols in human genetics | 2013

Predicting Functional Effect of Human Missense Mutations Using PolyPhen‐2

Ivan Adzhubei; Daniel M. Jordan; Shamil R. Sunyaev

PolyPhen‐2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. It performs functional annotation of single‐nucleotide polymorphisms (SNPs), maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes, and builds conservation profiles. It then estimates the probability of the missense mutation being damaging based on a combination of all these properties. PolyPhen‐2 features include a high‐quality multiple protein sequence alignment pipeline and a prediction method employing machine‐learning classification. The software also integrates the UCSC Genome Browser's human genome annotations and MultiZ multiple alignments of vertebrate genomes with the human genome. PolyPhen‐2 is capable of analyzing large volumes of data produced by next‐generation sequencing projects, thanks to built‐in support for high‐performance computing environments like Grid Engine and Platform LSF. Curr. Protoc. Hum. Genet. 76:7.20.1‐7.20.41.


Nature | 2014

Guidelines for investigating causality of sequence variants in human disease

Daniel G. MacArthur; Teri A. Manolio; David Dimmock; Heidi L. Rehm; Jay Shendure; Gonçalo R. Abecasis; David Adams; Russ B. Altman; Euan A. Ashley; Jeffrey C. Barrett; Leslie G. Biesecker; Donald F. Conrad; Greg M. Cooper; Nancy J. Cox; Mark J. Daly; Mark Gerstein; David B. Goldstein; Joel N. Hirschhorn; Suzanne M. Leal; Len A. Pennacchio; John A. Stamatoyannopoulos; Shamil R. Sunyaev; David Valle; Benjamin F. Voight; Wendy Winckler; Chris Gunter

The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.


American Journal of Human Genetics | 2007

Most Rare Missense Alleles Are Deleterious in Humans: Implications for Complex Disease and Association Studies

Gregory V. Kryukov; Len A. Pennacchio; Shamil R. Sunyaev

The accumulation of mildly deleterious missense mutations in individual human genomes has been proposed to be a genetic basis for complex diseases. The plausibility of this hypothesis depends on quantitative estimates of the prevalence of mildly deleterious de novo mutations and polymorphic variants in humans and on the intensity of selective pressure against them. We combined analysis of mutations causing human Mendelian diseases, of human-chimpanzee divergence, and of systematic data on human genetic variation and found that ~20% of new missense mutations in humans result in a loss of function, whereas ~27% are effectively neutral. Thus, the remaining 53% of new missense mutations have mildly deleterious effects. These mutations give rise to many low-frequency deleterious allelic variants in the human population, as is evident from a new data set of 37 genes sequenced in >1,500 individual human chromosomes. Surprisingly, up to 70% of low-frequency missense alleles are mildly deleterious and are associated with a heterozygous fitness loss in the range 0.001-0.003. Thus, the low allele frequency of an amino acid variant can, by itself, serve as a predictor of its functional significance. Several recent studies have reported a significant excess of rare missense variants in candidate genes or pathways in individuals with extreme values of quantitative phenotypes. These studies would be unlikely to yield results if most rare variants were neutral or if rare variants were not a significant contributor to the genetic component of phenotypic inheritance. Our results provide a justification for these types of candidate-gene (pathway) association studies and imply that mutation-selection balance may be a feasible evolutionary mechanism underlying some common diseases.
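Two quantitative points in this abstract can be checked with a short sketch: the 20% / 27% / 53% partition of new missense mutations, and why alleles with a heterozygous fitness loss of 0.001–0.003 are expected to stay rare. The second uses the textbook deterministic mutation-selection balance approximation q ≈ μ/s_het; the mutation rate μ = 1e-8 per site per generation is an assumed order-of-magnitude value, not a figure from the paper.

```python
# Partition of new missense mutations reported in the abstract.
LOSS_OF_FUNCTION = 0.20
EFFECTIVELY_NEUTRAL = 0.27
mildly_deleterious = 1.0 - LOSS_OF_FUNCTION - EFFECTIVELY_NEUTRAL
print(f"mildly deleterious: {mildly_deleterious:.0%}")


def equilibrium_frequency(mu: float, s_het: float) -> float:
    """Deterministic mutation-selection balance, q ~= mu / s_het.

    Approximation for a rare deleterious allele whose cost is paid mainly
    by heterozygotes (s_het = heterozygous fitness loss).
    """
    return mu / s_het


MU = 1e-8  # ASSUMED per-site, per-generation mutation rate (order of magnitude)
for s_het in (0.001, 0.003):
    print(f"s_het = {s_het}: q ~ {equilibrium_frequency(MU, s_het):.1e}")
```

Equilibrium frequencies of order 1e-6 to 1e-5 under this assumption are consistent with the abstract's observation that many low-frequency missense alleles are mildly deleterious.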

Collaboration



Top Co-Authors

Peer Bork

University of Würzburg

Christopher A. Cassa

Brigham and Women's Hospital

Ron Do

Icahn School of Medicine at Mount Sinai
