Publication


Featured research published by Yakir A. Reshef.


Nature Genetics | 2015

Partitioning heritability by functional annotation using genome-wide association summary statistics

Hilary Finucane; Brendan Bulik-Sullivan; Alexander Gusev; Gosia Trynka; Yakir A. Reshef; Po-Ru Loh; Verneri Anttila; Han Xu; Chongzhi Zang; Kyle Kai-How Farh; Stephan Ripke; Felix R. Day; Shaun Purcell; Eli A. Stahl; Sara Lindström; John Perry; Yukinori Okada; Soumya Raychaudhuri; Mark J. Daly; Nick Patterson; Benjamin M. Neale; Alkes L. Price

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type–specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease–specific enrichment of heritability in FANTOM5 enhancers and many cell type–specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.
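The core idea of stratified LD score regression can be illustrated with a short least-squares fit. This is a minimal sketch, not the published implementation. It assumes the model E[χ²_j] = N Σ_C τ_C ℓ(j, C) + b, where ℓ(j, C) is SNP j's LD score with respect to annotation C. The published model writes the intercept as N·a + 1; here it is a free intercept b. The sketch also omits the regression weights and block-jackknife standard errors used in practice. The function name and the toy inputs are hypothetical.

```python
import numpy as np


def stratified_ld_score_regression(chi2, ld_scores, n):
    """Unweighted least-squares sketch of stratified LD score regression.

    Assumed model: E[chi2_j] = n * sum_C tau_C * l(j, C) + b,
    where l(j, C) is SNP j's LD score with respect to annotation C and the
    free intercept b absorbs confounding (1 in the absence of bias).
    chi2:      (M,) GWAS chi-square statistics, one per SNP
    ld_scores: (M, C) annotation-specific LD scores
    n:         GWAS sample size
    Returns (tau, b).
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    m, c = ld_scores.shape
    # Regress on raw LD scores plus an intercept column; the slopes equal n * tau.
    design = np.column_stack([ld_scores, np.ones(m)])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return coef[:c] / n, coef[c]


# Toy usage on noiseless synthetic inputs (not real GWAS data).
rng = np.random.default_rng(0)
ld = rng.uniform(1.0, 50.0, size=(1000, 2))
tau_true = np.array([1e-7, 5e-7])
n = 50_000
chi2 = n * ld @ tau_true + 1.0
tau_hat, intercept = stratified_ld_score_regression(chi2, ld, n)
```

Regressing on the raw LD scores and dividing the slopes by n afterwards keeps the design matrix well conditioned.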


Nature Genetics | 2016

Reference-based phasing using the Haplotype Reference Consortium panel

Po-Ru Loh; Petr Danecek; Pier Francesco Palamara; Christian Fuchsberger; Yakir A. Reshef; Hilary Finucane; Sebastian Schoenherr; Lukas Forer; Shane McCarthy; Gonçalo R. Abecasis; Richard Durbin; Alkes L. Price

Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes–based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.
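The abstract states that Eagle2's data structure is based on the positional Burrows-Wheeler transform (PBWT, Durbin 2014). The sketch below shows only the basic PBWT prefix-array update. At each site, haplotypes are stably partitioned by allele, so haplotypes that share long matches ending at that site become adjacent. It does not reproduce Eagle2's own data structure. The function name and example haplotypes are illustrative.

```python
def pbwt_prefix_orders(haplotypes):
    """Return positional prefix arrays a_0, ..., a_N for binary haplotypes.

    haplotypes: list of M equal-length sequences of 0/1 alleles (N sites).
    a_k lists haplotype indices sorted by their reversed prefix over
    sites 0..k-1, so haplotypes sharing long matches ending at site k-1
    end up adjacent.
    """
    m = len(haplotypes)
    n_sites = len(haplotypes[0]) if m else 0
    order = list(range(m))  # a_0: original order (empty prefix)
    orders = [order]
    for k in range(n_sites):
        # Stable partition by the allele at site k: the 0s keep their
        # relative order, followed by the 1s in their relative order.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


haps = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
orders = pbwt_prefix_orders(haps)
```

Here `orders[2]` is `[2, 3, 0, 1]`. Haplotypes 2 and 3 both end their two-site prefix with allele 0 at site 1, so they are grouped first. Each update costs O(M) per site, which is what makes PBWT-style indexing attractive for very large reference panels.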


Proceedings of the National Academy of Sciences of the United States of America | 2014

Cleaning up the record on the maximal information coefficient and equitability.

David N. Reshef; Yakir A. Reshef; Michael Mitzenmacher; Pardis C. Sabeti

Although we appreciate Kinney and Atwal’s interest in equitability and the maximal information coefficient (MIC), we believe they misrepresent our work. We highlight a few of our main objections below.

[Fig. 1: Equitability of MIC and mutual information across a subset of the noise models analyzed in refs. 1 and 4.]

Regarding our original paper (1), Kinney and Atwal (2) state “MIC is said to satisfy not just the heuristic notion of equitability, but also the mathematical criterion of R² equitability,” the latter being their formalization of the heuristic notion that we introduced. This statement is simply false. We were explicit in our paper that our claims regarding MIC’s performance were based on large-scale simulations: “We tested MIC’s equitability through simulations….[These] show that, for a large collection of test functions with varied sample sizes, noise levels, and noise models, MIC roughly equals the coefficient of determination R² relative to each respective noiseless function.” Although we mathematically proved several things about MIC, none of our claims imply that it satisfies Kinney and Atwal’s R² equitability, which would require that MIC exactly equal R² in the infinite data limit. Thus, their proof that no dependence measure can satisfy R² equitability, although interesting, does not uncover any error in our work, and their suggestion that it does is a gross misrepresentation.

Kinney and Atwal seem ready to toss out equitability as a useful criterion based on their theoretical result. We argue, however, that regardless of whether “perfect” equitability is possible, approximate notions of equitability remain the right goal for many data exploration settings. Just as the theory of NP-completeness does not suggest we stop thinking about NP-complete problems, but instead that we look for approximations and solutions in restricted cases, an impossibility result about perfect equitability provides focus for further research but does not mean that useful solutions are unattainable. Similarly, as others have noted (3), Kinney and Atwal’s proof requires a highly permissive noise model, and so the attainability of R² equitability under more limited noise models such as those in our work remains an open question.

Finally, the authors argue that mutual information is more equitable than MIC. However, they provide as justification only a single noise model, only at limiting sample sizes (n ≥ 5,000). As we’ve shown in follow-up work (4), which they themselves cite but fail to address, MIC is more equitable than mutual information estimation under many other realistic noise models even at a sample size of 5,000. Kinney and Atwal have stated, “…it matters how one defines noise” (5), and a useful statistic must indeed be robust to a wide range of noise models. Equally importantly, we’ve established in both our original and follow-up work that at sample sizes below 5,000, MIC is more equitable than mutual information estimates across all noise models tested. MIC’s superior equitability in these settings is not an “artifact” we neglected, as Kinney and Atwal suggest, but rather a weakness of mutual information estimation and an important consideration for practitioners.

We expect that the understanding of equitability and MIC will improve over time and that better methods may arise. However, accurate representations of the work thus far will allow researchers in the area to move forward most productively and collectively.
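Equitability is judged against the R² of noisy data relative to its noiseless function. The sketch below computes that yardstick, assuming it is defined as the squared Pearson correlation between f(X) and the observed Y. It does not compute MIC. An equitable measure should assign similar scores to relationships of different functional forms that share the same value of this quantity. The function name, the sine relationship and the noise level are hypothetical choices for illustration.

```python
import numpy as np


def r2_relative_to_noiseless(x, y, f):
    """Squared Pearson correlation between f(x) and the observed y.

    Assumed definition of the 'R^2 relative to the noiseless function'
    used as the noise-level yardstick in equitability benchmarks.
    """
    fx = f(np.asarray(x, dtype=float))
    r = np.corrcoef(fx, np.asarray(y, dtype=float))[0, 1]
    return float(r ** 2)


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=2000)
signal = lambda t: np.sin(6 * t)  # hypothetical noiseless relationship
clean = r2_relative_to_noiseless(x, signal(x), signal)
noisy = r2_relative_to_noiseless(
    x, signal(x) + rng.normal(0.0, 0.5, size=x.size), signal
)
```

With no noise the value is 1. Adding independent noise lowers it toward Var(f(X)) / (Var(f(X)) + σ²).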


Nature Genetics | 2018

Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types

Hilary Finucane; Yakir A. Reshef; Verneri Anttila; Kamil Slowikowski; Alexander Gusev; Andrea Byrnes; Steven Gazal; Po-Ru Loh; Caleb Lareau; Noam Shoresh; Giulio Genovese; Arpiar Saunders; Evan Z. Macosko; Samuela Pollack; John Richard Perry; Jason D. Buenrostro; Bradley E. Bernstein; Soumya Raychaudhuri; Steven A. McCarroll; Benjamin M. Neale; Alkes L. Price

We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.

A new method tests whether disease heritability is enriched near genes with high tissue-specific expression. The authors use gene expression data together with GWAS summary statistics for 48 diseases and traits to identify disease-relevant tissues.
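Before heritability enrichment can be tested, the annotation step described above has to be carried out. The sketch below makes two assumptions. First, it ranks genes by a precomputed specificity score. Second, it marks SNPs that fall within a window around the top-ranked genes on one chromosome. The resulting binary annotation would then be passed to stratified LD score regression. The abstract does not give the gene-set size or window size, so `fraction` and `window` are placeholders. All gene names and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right


def top_specific_genes(specificity, fraction):
    """Return the `fraction` of genes with the highest specificity score."""
    ranked = sorted(specificity, key=specificity.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


def gene_window_annotation(snp_positions, gene_spans, window):
    """Mark SNPs within `window` bp of any selected gene (single chromosome).

    snp_positions: sorted list of SNP base-pair positions
    gene_spans:    list of (start, end) coordinates of the selected genes
    window:        flank size in bp added on each side (placeholder value)
    Returns a list of 0/1 annotation values, one per SNP.
    """
    annot = [0] * len(snp_positions)
    for start, end in gene_spans:
        lo = bisect_left(snp_positions, start - window)
        hi = bisect_right(snp_positions, end + window)
        for idx in range(lo, hi):
            annot[idx] = 1
    return annot


specificity = {"GENE_A": 3.2, "GENE_B": 0.1, "GENE_C": 1.5, "GENE_D": -0.4}
spans = {"GENE_A": (10_000, 12_000), "GENE_B": (20_000, 21_000),
         "GENE_C": (40_000, 45_000), "GENE_D": (60_000, 61_000)}
top = top_specific_genes(specificity, fraction=0.5)
annotation = gene_window_annotation(
    [5_000, 9_500, 11_000, 30_000, 41_000], [spans[g] for g in top], window=1_000
)
```

Here `top` is `["GENE_A", "GENE_C"]` and `annotation` is `[0, 1, 1, 0, 1]`.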


bioRxiv | 2015

Partitioning heritability by functional category using GWAS summary statistics

Hilary Finucane; Brendan Bulik-Sullivan; Alexander Gusev; Gosia Trynka; Yakir A. Reshef; Po-Ru Loh; Verneri Anttila; Han Xu; Chongzhi Zang; Kyle Farh; Stephan Ripke; Felix R. Day; S Purcell; Eli A. Stahl; Sara Lindström; John Perry; Yukinori Okada; Soumya Raychaudhuri; Mark J. Daly; Nick Patterson; Benjamin M. Neale; Alkes L. Price

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.


Nature Genetics | 2018

Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights

Alexander Gusev; Nicholas Mancuso; Hyejung Won; Maria Kousi; Hilary Finucane; Yakir A. Reshef; Lingyun Song; Alexias Safi; Steven A. McCarroll; Benjamin M. Neale; Roel A. Ophoff; Michael Conlon O'Donovan; Gregory E. Crawford; Daniel H. Geschwind; Nicholas Katsanis; Patrick F. Sullivan; Bogdan Pasaniuc; Alkes L. Price

Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown. We performed a transcriptome-wide association study (TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. This large-scale connection of associations to target genes, tissues, and regulatory features is an essential step in moving toward a mechanistic understanding of GWAS.

A transcriptome-wide association study integrating genome-wide association data with expression data from brain, blood and adipose tissues identifies new candidate susceptibility genes for schizophrenia, providing a step toward understanding the underlying biology.
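A summary-statistic TWAS combines three inputs: a gene's expression-prediction weights, the GWAS z-scores of its cis SNPs, and a reference LD matrix. The sketch below assumes the commonly used statistic z_TWAS = wᵀz / √(wᵀΣw). The abstract does not spell out the exact statistic used here. Weight estimation and allele alignment are outside the sketch, and the function name is illustrative.

```python
import numpy as np


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS association z-score for one gene.

    z_TWAS = w^T z / sqrt(w^T Sigma w)
    weights: (p,) expression-prediction weights for the gene's cis SNPs
    gwas_z:  (p,) GWAS z-scores for the same SNPs, aligned to the same alleles
    ld:      (p, p) SNP correlation (LD) matrix from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    sigma = np.asarray(ld, dtype=float)
    return float(w @ z / np.sqrt(w @ sigma @ w))


# Two correlated SNPs with equal weights (toy numbers).
z_gene = twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
```

The denominator accounts for LD between the SNPs. Without it, correlated SNPs would be counted as independent evidence and the statistic would be inflated.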


Journal of Computational Biology | 2012

Comparing Pedigree Graphs

Bonnie Kirkpatrick; Yakir A. Reshef; Hilary Finucane; Haitao Jiang; Binhai Zhu; Richard M. Karp

Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this article, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present (1) several algorithms that are fast and exact in various special cases, and (2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.


Nature Genetics | 2018

Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits

Farhad Hormozdiari; Steven Gazal; Bryce van de Geijn; Hilary Finucane; Chelsea J.-T. Ju; Po-Ru Loh; Armin Schoech; Yakir A. Reshef; Xuanyao Liu; Luke O’Connor; Alexander Gusev; Eleazar Eskin; Alkes L. Price

There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84× for expression QTLs (eQTLs); P = 1.19 × 10⁻³¹) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80× for eQTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10⁻³⁵). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.

A new set of functional annotations based on fine-mapped molecular quantitative trait loci from GTEx and BLUEPRINT consortium data is enriched for disease heritability across 41 diseases and complex traits.
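The fold enrichments quoted above (such as 5.84× and 17.06×) are read here under the usual stratified LD score regression convention. Under that convention, enrichment is the share of heritability in an annotation divided by the share of SNPs in it. The sketch below assumes per-SNP heritability values have already been estimated. Annotation values may be binary or continuous in [0, 1], such as a causal posterior probability. The function name and inputs are illustrative.

```python
def heritability_enrichment(per_snp_h2, annotation):
    """Enrichment = (share of h2 in annotation) / (share of SNPs in annotation).

    per_snp_h2: estimated heritability contributed by each SNP
    annotation: per-SNP annotation values, binary or continuous in [0, 1]
    """
    total_h2 = sum(per_snp_h2)
    annot_h2 = sum(a * h for a, h in zip(annotation, per_snp_h2))
    prop_h2 = annot_h2 / total_h2
    prop_snps = sum(annotation) / len(annotation)
    return prop_h2 / prop_snps


# One of four SNPs is annotated and carries 40% of the heritability.
fold = heritability_enrichment([0.4, 0.1, 0.1, 0.4], [1, 0, 0, 0])
```

Here the annotated SNP holds 25% of SNPs but 40% of heritability, giving a 1.6-fold enrichment.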


Random Structures and Algorithms | 2013

On Extractors and Exposure-Resilient Functions for Sublogarithmic Entropy

Yakir A. Reshef; Salil P. Vadhan

We study resilient functions and exposure-resilient functions in the low-entropy regime. A resilient function (a.k.a. deterministic extractor for oblivious bit-fixing sources) maps any distribution on n-bit strings in which k bits are uniformly random and the rest are fixed into an output distribution that is close to uniform. With exposure-resilient functions, all the input bits are random, but we ask that the output be close to uniform conditioned on any subset of n − k input bits. In this paper, we focus on the case that k is sublogarithmic in n. We simplify and improve an explicit construction of resilient functions for k sublogarithmic in n due to Kamp and Zuckerman (SICOMP 2006), achieving error exponentially small in k rather than polynomially small in k. Our main result is that when k is sublogarithmic in n, the short output length of this construction (O(log k) output bits) is optimal for extractors computable by a large class of space-bounded streaming algorithms. Next, we show that a random function is a resilient function with high probability if and only if k is superlogarithmic in n, suggesting that our main result may apply more generally. In contrast, we show that a random function is a static (resp. adaptive) exposure-resilient function with high probability even if k is as small as a constant (resp. log log n). No explicit exposure-resilient functions achieving these parameters are known.
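The definition of a resilient function can be checked exhaustively for tiny n. The sketch below computes the worst-case statistical distance from uniform of a 1-bit output over every oblivious bit-fixing source with k random bits. Parity is a standard example with zero error for any k ≥ 1, but it outputs only one bit. It is not the Kamp-Zuckerman construction discussed above, which outputs O(log k) bits. The brute-force checker and the AND comparison are illustrative only.

```python
from itertools import combinations, product


def bitfixing_error(f, n, k):
    """Worst-case distance of f's 1-bit output from uniform over all
    oblivious bit-fixing sources on n bits with k uniformly random bits."""
    worst = 0.0
    for random_pos in combinations(range(n), k):
        fixed_pos = [i for i in range(n) if i not in random_pos]
        for fixed_vals in product((0, 1), repeat=n - k):
            ones = 0
            for rand_vals in product((0, 1), repeat=k):
                x = [0] * n
                for i, v in zip(fixed_pos, fixed_vals):
                    x[i] = v
                for i, v in zip(random_pos, rand_vals):
                    x[i] = v
                ones += f(x)
            # For a 1-bit output, statistical distance is |Pr[f = 1] - 1/2|.
            worst = max(worst, abs(ones / 2 ** k - 0.5))
    return worst


def parity(bits):
    return sum(bits) % 2


def and_all(bits):
    return int(all(bits))
```

`bitfixing_error(parity, 5, 1)` is 0.0 because any single free bit flips the parity. `bitfixing_error(and_all, 3, 1)` is 0.5 because fixing any bit to 0 makes the output constant.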


The Annals of Applied Statistics | 2018

An empirical study of the maximal and total information coefficients and leading measures of dependence

David N. Reshef; Yakir A. Reshef; Pardis C. Sabeti; Michael Mitzenmacher

In exploratory data analysis, we are often interested in identifying promising pairwise associations for further analysis while filtering out weaker ones. This can be accomplished by computing a measure of dependence on all variable pairs and examining the highest-scoring pairs, provided the measure of dependence used assigns similar scores to equally noisy relationships of different types. This property, called equitability and previously formalized, can be used to assess measures of dependence along with the power of their corresponding independence tests and their runtime. Here we present an empirical evaluation of the equitability, power against independence, and runtime of several leading measures of dependence. These include the two recently introduced and simultaneously computable statistics MICe, whose goal is equitability, and TICe, whose goal is power against independence. Regarding equitability, our analysis finds that MICe is the most equitable method on functional relationships in most of the settings we considered. Regarding power against independence, we find that TICe and Heller and Gorfine’s SDDP share state-of-the-art performance, with several other methods achieving excellent power as well. Our analyses also show evidence for a trade-off between power against independence and equitability consistent with recent theoretical work. Our results suggest that a fast and useful strategy for achieving a combination of power against independence and equitability is to filter relationships by TICe and then to rank the remaining ones using MICe. We confirm our findings on a set of data collected by the World Health Organization.
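The suggested two-stage strategy has two steps. First, filter variable pairs with a statistic tuned for power against independence. Then rank the survivors with an equitable statistic. The sketch below shows only the control flow, with pluggable scoring functions. It does not implement TICe or MICe. The squared Pearson correlation is used purely as a stand-in scorer, and all names, the threshold and the data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[Sequence[float], Sequence[float]], float]


def filter_then_rank(
    data: Dict[str, Sequence[float]],
    power_score: Scorer,
    equitable_score: Scorer,
    threshold: float,
) -> List[Tuple[Tuple[str, str], float]]:
    """Keep variable pairs whose power_score passes threshold, then rank the
    survivors by equitable_score (highest first)."""
    names = sorted(data)
    kept = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if power_score(data[a], data[b]) >= threshold:
                kept.append(((a, b), equitable_score(data[a], data[b])))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


def pearson_r2(u: Sequence[float], v: Sequence[float]) -> float:
    """Stand-in scorer (squared Pearson correlation); NOT TICe or MICe."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov * cov / (var_u * var_v)


data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 1, 3, 2]}
ranked = filter_then_rank(data, pearson_r2, pearson_r2, threshold=0.5)
```

Only the pair `("x", "y")` passes the filter here. The cheap filtering pass limits how many pairs need the second, ranking statistic, which is the speed argument made above.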

Collaboration



Top Co-Authors


David N. Reshef

Massachusetts Institute of Technology
