Publication


Featured research published by Elior Rahmani.


Nature Methods | 2016

Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies

Elior Rahmani; Noah Zaitlen; Yael Baran; Celeste Eng; Donglei Hu; Joshua M. Galanter; Sam S. Oh; Esteban G. Burchard; Eleazar Eskin; James Zou; Eran Halperin

In epigenome-wide association studies (EWAS), different methylation profiles of distinct cell types may lead to false discoveries. We introduce ReFACTor, a method based on principal component analysis (PCA) and designed for the correction of cell type heterogeneity in EWAS. ReFACTor does not require knowledge of cell counts, and it provides improved estimates of cell type composition, resulting in improved power and control for false positives in EWAS. Corresponding software is available at http://www.cs.tau.ac.il/~heran/cozygene/software/refactor.html.
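The general idea of adjusting an association analysis with principal components can be sketched as follows. This is a minimal illustration using ordinary PCA on simulated data, not ReFACTor's own algorithm; the function names, matrix sizes, and simulated effects are all assumptions made for the example.

```python
import numpy as np

def top_principal_components(meth, k):
    """Return the top-k principal component scores of a samples x sites matrix."""
    centered = meth - meth.mean(axis=0)            # center each CpG site
    # Thin SVD: centered = u @ diag(s) @ vt
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]                        # sample scores on the first k PCs

def residualize(y, covariates):
    """Regress the covariates (plus an intercept) out of y by least squares."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

# Simulated data (hypothetical): one hidden cell-type proportion drives all sites.
rng = np.random.default_rng(0)
n_samples, n_sites = 200, 500
cell_fraction = rng.uniform(0.2, 0.8, size=n_samples)
site_effects = rng.normal(0.0, 1.0, size=n_sites)
noise = rng.normal(0.0, 0.1, size=(n_samples, n_sites))
meth = np.outer(cell_fraction, site_effects) + noise

pcs = top_principal_components(meth, k=1)
# How well the first PC tracks the hidden composition (sign is arbitrary).
corr_with_composition = abs(np.corrcoef(pcs[:, 0], cell_fraction)[0, 1])
# Methylation at one site after removing the component, as one would before testing it.
adjusted_site = residualize(meth[:, 0], pcs)
```

In this toy setting the first component recovers the simulated composition closely because a single factor dominates the variance; real data carry several cell types and other confounders, which is the situation the abstract addresses.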


Epigenetics & Chromatin | 2017

Genome-wide methylation data mirror ancestry information

Elior Rahmani; Liat Shenhav; Regev Schweiger; Paul Yousefi; Karen Huen; Brenda Eskenazi; Celeste Eng; Scott Huntsman; Donglei Hu; Joshua M. Galanter; Sam S. Oh; Melanie Waldenberger; Konstantin Strauch; Harald Grallert; Thomas Meitinger; Christian Gieger; Nina Holland; Esteban G. Burchard; Noah Zaitlen; Eran Halperin

Background: Genetic data are known to harbor information about human demographics, and genotyping data are commonly used for capturing ancestry information by leveraging genome-wide differences between populations. In contrast, it is not clear to what extent population structure is captured by whole-genome DNA methylation data. Results: We demonstrate, using three large-cohort 450K methylation array data sets, that ancestry information signal is mirrored in genome-wide DNA methylation data and that it can be further isolated more effectively by leveraging the correlation structure of CpGs with cis-located SNPs. Based on these insights, we propose a method, EPISTRUCTURE, for the inference of ancestry from methylation data, without the need for genotype data. Conclusions: EPISTRUCTURE can be used to infer ancestry information of individuals based on their methylation data in the absence of corresponding genetic data. Although genetic data are often collected in epigenetic studies of large cohorts, these are typically not made publicly available, making the application of EPISTRUCTURE especially useful for anyone working on public data. Implementation of EPISTRUCTURE is available in GLINT, our recently released toolset for DNA methylation analysis at: http://glint-epigenetics.readthedocs.io.


Arthritis & Rheumatism | 2017

Rheumatoid Arthritis Naive T Cells Share Hypermethylation Sites With Synoviocytes

Brooke Rhead; Calliope Holingue; Michael W. Cole; Xiaorong Shao; Hong L. Quach; Diana Quach; Khooshbu Shah; Elizabeth Sinclair; John Graf; Thomas M. Link; Ruby Harrison; Elior Rahmani; Eran Halperin; Wei Wang; Gary S. Firestein; Lisa F. Barcellos; Lindsey A. Criswell

To determine whether differentially methylated CpGs in synovium‐derived fibroblast‐like synoviocytes (FLS) of patients with rheumatoid arthritis (RA) were also differentially methylated in RA peripheral blood (PB) samples.


Nature Methods | 2017

Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation

Elior Rahmani; Noah Zaitlen; Yael Baran; Celeste Eng; Donglei Hu; Joshua M. Galanter; Sam S. Oh; Esteban G. Burchard; Eleazar Eskin; James Zou; Eran Halperin

Rahmani et al. reply: Zheng et al.1 discuss potential pitfalls in our evaluation of ReFACTor2, a reference-free method to account for cell-type heterogeneity. Below, we reproduce their analysis and demonstrate that conclusions cannot be drawn on the basis of their results owing to conceptual and technical flaws in their analysis. We show with our reanalysis and further evidence from experiments on a total of 10 data sets that ReFACTor has improved performance over alternative methods, including the reference-based method of Houseman et al.3. Zheng et al.1 claim that more evidence needs to be provided to determine whether ReFACTor is applicable to tissues other than blood. They generated a “gold standard” set of “true positives” and “true negatives” for breast cancer differentially methylated CpGs (DMCs) and compared ReFACTor to Surrogate Variable Analysis (SVA)4 using EWAS data. There are multiple problems with this analysis. First, the list of ‘true positives’ is unreliable owing to the fact that only two control individuals were used for its construction (Supplementary Note 1). We show through a simple permutation analysis that using only two controls is likely to result in tens of thousands of spurious ‘true positives’ (Supplementary Fig. 1). Therefore, benchmarking on these ‘true positives’ is an invalid approach. Second, Zheng et al.1 report improved sensitivity for SVA; however, they do not report that a simple unadjusted analysis using a standard Bonferroni significance level achieves considerably better sensitivity and greater specificity than SVA (Supplementary Table 1). Thus, the metric used to evaluate performance is also invalid, as the naive method that does not adjust for cell-type heterogeneity outperforms a method that does. A detailed description of this experiment as well as additional flaws in their analysis is given in Supplementary Note 1. 
The focus of Zheng et al.1 on the potential loss of power in the case of many true positives is of interest. Because a reliable gold standard is currently not available, we examined this scenario by splitting a large set of breast cancer samples (n = 305)5 into two groups on the basis of the reference-based cell-composition estimates provided by Zheng et al.1. One group was labeled as controls, and differential methylation effects were added to all samples in the other group in more than 20,000 sites. The results (Supplementary Note 2 and Supplementary Tables 2–4) show that ReFACTor and SVA obtain similar sensitivity, but ReFACTor captures the cell composition substantially better than SVA and thus adjusts well for false positives, whereas SVA suffers from thousands of false positives. In contrast to the argument of Zheng et al.1, when ReFACTor is correctly applied (Zheng et al.1 did not follow our guidelines), the ReFACTor components are dominated by information about cell-type composition rather than disease status (Supplementary Note 1 and Supplementary Fig. 2). Zheng et al.1 next consider our original experiment in which FACS cell counts were available2. They argue that successively adding components may cause overfitting. However, our point in that section of Rahmani et al.2 was to evaluate the relative performance of different methods as a function of model dimension, and thus there is no issue of overfitting (Supplementary Note 3). They evaluated ReFACTor by measuring the correlation between each cell type and ReFACTor components selected via likelihood ratio test (LRT) and observed that ReFACTor only slightly improves upon the reference-based approach. However, LRT depends on sample size, hence we re-evaluated ReFACTor using LRT with all 560 samples in the data set (as opposed to a subset). 
Our analysis revealed more significant components, which leads to a substantial improvement, far outperforming the reference-based approach (Supplementary Note 3 and Supplementary Fig. 3). Finally, Zheng et al.1 try to demonstrate the advantage of the reference-based method3 using a very small data set with known cell composition (n = 18)6. However, in their analysis, Zheng et al.1 did not correct for known batch effects, and we found that adjusting for batch information produces similar performance for ReFACTor and the reference-based method (Supplementary Fig. 4). Furthermore, such a small sample size cannot provide statistically significant evidence for the improvement of any method. Specifically, using multiple subsampled FACS data sets of 18 samples, we observed that the performance of both methods was highly variable (Supplementary Fig. 5). Moreover, Zheng et al.1 relied on a method for determining the dimension of the data (RMT)7. We found that the number of dimensions estimated by RMT is linearly determined by the sample size (R² > 0.95), making it inapplicable (Supplementary Fig. 6 and Supplementary Note 3). Given that firm conclusions cannot be drawn based on small data sets, we further evaluated the performance of ReFACTor and the reference-based method using five large whole-blood data sets (minimum n = 312). We divided the samples in each data set into two groups on the basis of cell-composition distribution (Supplementary Note 3). Then, we conducted an EWAS on the assignment into groups as the phenotype. In this scenario, the assignment into groups is expected to be correlated with the true underlying cell composition, and an insufficient correction will lead to spurious associations. We found that ReFACTor consistently outperformed the reference-based method; particularly, the


Bioinformatics | 2014

EPIQ-efficient detection of SNP-SNP epistatic interactions for quantitative traits.

Ya’ara Arkin; Elior Rahmani; Marcus E. Kleber; Reijo Laaksonen; Winfried März; Eran Halperin

Motivation: Gene–gene interactions are of potential biological and medical interest, as they can shed light on both the inheritance mechanism of a trait and on the underlying biological mechanisms. Evidence of epistatic interactions has been reported in both humans and other organisms. Unlike single-locus genome-wide association studies (GWAS), which proved efficient in detecting numerous genetic loci related with various traits, interaction-based GWAS have so far produced very few reproducible discoveries. Such studies introduce a great computational and statistical burden by necessitating a large number of hypotheses to be tested including all pairs of single nucleotide polymorphisms (SNPs). Thus, many software tools have been developed for interaction-based case–control studies, some leading to reliable discoveries. For quantitative data, on the other hand, only a handful of tools exist, and the computational burden is still substantial. Results: We present an efficient algorithm for detecting epistasis in quantitative GWAS, achieving a substantial runtime speedup by avoiding the need to exhaustively test all SNP pairs using metric embedding and random projections. Unlike previous metric embedding methods for case–control studies, we introduce a new embedding, where each SNP is mapped to two Euclidean spaces. We implemented our method in a tool named EPIQ (EPIstasis detection for Quantitative GWAS), and we show by simulations that EPIQ requires hours of processing time where other methods require days and sometimes weeks. Applying our method to a dataset from the Ludwigshafen risk and cardiovascular health study, we discovered a pair of SNPs with a near-significant interaction (P = 2.2 × 10^-13), in only 1.5 h on 10 processors. Availability: https://github.com/yaarasegre/EPIQ
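To give a sense of the scale of the exhaustive search this abstract refers to, the short sketch below counts the SNP pairs and the corresponding Bonferroni-corrected threshold. The SNP count of 500,000 and the significance level of 0.05 are illustrative assumptions and are not taken from the paper; the helper names are hypothetical.

```python
from math import comb

def n_snp_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs an exhaustive interaction scan must test."""
    return comb(n_snps, 2)

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

n_snps = 500_000                      # assumed array size, for illustration only
pairs = n_snp_pairs(n_snps)
threshold = bonferroni_threshold(0.05, pairs)
print(f"{pairs:,} pairwise tests; per-test threshold {threshold:.2e}")
```

The quadratic growth in the number of pairs is what makes both the runtime and the multiple-testing burden of interaction scans so much heavier than in single-locus GWAS.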


Genetics | 2017

RL-SKAT: An Exact and Efficient Score Test for Heritability and Set Tests

Regev Schweiger; Omer Weissbrod; Elior Rahmani; Martina Müller-Nurasyid; Sonja Kunze; Christian Gieger; Melanie Waldenberger; Saharon Rosset; Eran Halperin

Testing for the existence of variance components in linear mixed models is a fundamental task in many applicative fields. In statistical genetics, the score test has recently become instrumental in the task of testing an association between a set of genetic markers and a phenotype. With few markers, this amounts to set-based variance component tests, which attempt to increase power in association studies by aggregating weak individual effects. When the entire genome is considered, it allows testing for the heritability of a phenotype, defined as the proportion of phenotypic variance explained by genetics. In the popular score-based Sequence Kernel Association Test (SKAT) method, the assumed distribution of the score test statistic is uncalibrated in small samples, with a correction being computationally expensive. This may cause severe inflation or deflation of P-values, even when the null hypothesis is true. Here, we characterize the conditions under which this discrepancy holds, and show it may occur also in large real datasets, such as a dataset from the Wellcome Trust Case Control Consortium 2 (n = 13,950) study, and, in particular, when the individuals in the sample are unrelated. In these cases, the SKAT approximation tends to be highly overconservative and therefore underpowered. To address this limitation, we suggest an efficient method to calculate exact P-values for the score test in the case of a single variance component and a continuous response vector, which can speed up the analysis by orders of magnitude. Our results enable fast and accurate application of the score test in heritability and in set-based association tests. Our method is available in http://github.com/cozygene/RL-SKAT.
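For readers unfamiliar with the setting, the quantities named in this abstract are commonly written as below. This is the standard single-variance-component linear mixed model; the symbols ($X$, $\beta$, $K$, $\sigma_g^2$, $\sigma_e^2$) are conventional notation assumed here, not notation quoted from the paper.

```latex
% Phenotype y for n individuals, covariates X, and a genetic similarity (kernel) matrix K
y = X\beta + g + e, \qquad
g \sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
e \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I_n\right)

% Heritability: the proportion of phenotypic variance explained by genetics
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}

% Testing for the existence of the variance component
H_0:\ \sigma_g^2 = 0 \qquad \text{versus} \qquad H_1:\ \sigma_g^2 > 0
```

When $K$ is built from a small set of markers the test is a set-based association test; when it is built from the entire genome it is a test for heritability, which matches the two uses described above.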


Research in Computational Molecular Biology | 2017

A Bayesian Framework for Estimating Cell Type Composition from DNA Methylation Without the Need for Methylation Reference

Elior Rahmani; Regev Schweiger; Liat Shenhav; Eleazar Eskin; Eran Halperin

Genome-wide DNA methylation levels measured from a target tissue across a population have become ubiquitous over the last few years, as methylation status is suggested to hold great potential for better understanding the role of epigenetics. Different cell types are known to have different methylation profiles. Therefore, in the common scenario where methylation levels are collected from heterogeneous sources such as blood, convoluted signals are formed according to the cell type composition of the samples. Knowledge of the cell type proportions is important for statistical analysis, and it may provide novel biological insights and contribute to our understanding of disease biology. Since high resolution cell counting is costly and often logistically impractical to obtain in large studies, targeted methods that are inexpensive and practical for estimating cell proportions are needed. Although a supervised approach has been shown to provide reasonable estimates of cell proportions, this approach leverages scarce reference methylation data from sorted cells which are not available for most tissues and are not appropriate for any target population. Here, we introduce BayesCCE, a Bayesian semi-supervised method that leverages prior knowledge on the cell type composition distribution in the studied tissue. As we demonstrate, such prior information is substantially easier to obtain compared to appropriate reference methylation levels from sorted cells. Using real and simulated data, we show that our proposed method is able to construct a set of components, each corresponding to a single cell type, and together providing up to 50% improvement in correlation when compared with existing reference-free methods. 
We further make a design suggestion for future data collection efforts by showing that results can be further improved using cell count measurements for a small subset of individuals in the study sample or by incorporating external data of individuals with measured cell counts. Our approach provides a new opportunity to investigate cell compositions in genomic studies of tissues for which it was not possible before.


Bioinformatics | 2017

GLINT: a user-friendly toolset for the analysis of high-throughput DNA-methylation array data

Elior Rahmani; Reut Yedidim; Liat Shenhav; Regev Schweiger; Omer Weissbrod; Noah Zaitlen; Eran Halperin

Summary: GLINT is a user-friendly command-line toolset for fast analysis of genome-wide DNA methylation data generated using the Illumina human methylation arrays. GLINT, which does not require any programming proficiency, allows easy execution of an epigenome-wide association study (EWAS) analysis pipeline under different models while accounting for known confounders in methylation data. Availability and Implementation: GLINT is command-line software, freely available at https://github.com/cozygene/glint/releases. It requires Python 2.7 and several freely available Python packages. Further information and documentation as well as a quick start tutorial are available at http://glint-epigenetics.readthedocs.io.


Research in Computational Molecular Biology | 2017

Using Stochastic Approximation Techniques to Efficiently Construct Confidence Intervals for Heritability

Regev Schweiger; Eyal Fisher; Elior Rahmani; Liat Shenhav; Saharon Rosset; Eran Halperin

Estimation of heritability is an important task in genetics. The use of linear mixed models (LMMs) to determine narrow-sense SNP-heritability and related quantities has received much recent attention, due to its ability to account for variants with small effect sizes. Typically, heritability estimation under LMMs uses the restricted maximum likelihood (REML) approach. The common way to report the uncertainty in REML estimation uses standard errors (SE), which rely on asymptotic properties. However, these assumptions are often violated because of the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and inflated or deflated confidence intervals. In addition, for larger datasets (e.g., tens of thousands of individuals), the construction of SEs itself may require considerable time, as it requires expensive matrix inversions and multiplications.


bioRxiv | 2017

An exact and efficient score test for variance components models

Regev Schweiger; Omer Weissbrod; Elior Rahmani; Martina Müller-Nurasyid; Sonja Kunze; Christian Gieger; Melanie Waldenberger; Saharon Rosset; Eran Halperin

Testing for the existence of variance components in linear mixed models is a fundamental task in many applicative fields. In statistical genetics, the score test has recently become instrumental in the task of testing an association between a set of genetic markers and a phenotype. With few markers, this amounts to set-based variance component tests, which attempt to increase power in association studies by aggregating weak individual effects. When the entire genome is considered, it allows testing for the heritability of a phenotype, defined as the proportion of phenotypic variance explained by genetics. In the popular score-based Sequence Kernel Association Test (SKAT) method, the assumed distribution of the score test statistic is uncalibrated in small samples, with a correction being computationally expensive. This may cause severe inflation or deflation of p-values, even when the null hypothesis is true. Here, we characterize the conditions under which this discrepancy holds, and show it may occur also in large real datasets, such as a dataset from the Wellcome Trust Case Control Consortium 2 (n=13,950) study, and in particular when the individuals in the sample are unrelated. In these cases the SKAT approximation tends to be highly over-conservative and therefore underpowered. To address this limitation, we suggest an efficient method to calculate exact p-values for the score test in the case of a single variance component and a continuous response vector, which can speed up the analysis by orders of magnitude. Our results enable fast and accurate application of the score test in heritability and in set-based association tests. Our method is available in http://github.com/cozygene/RL-SKAT.

Collaboration


Top Co-Authors

Eran Halperin

University of California

Omer Weissbrod

Technion – Israel Institute of Technology

Noah Zaitlen

University of California

Celeste Eng

University of California

Christian Gieger

Pennington Biomedical Research Center

Donglei Hu

University of California

Eleazar Eskin

University of California