Publication


Featured research published by Regev Schweiger.


Intelligent Systems in Molecular Biology | 2011

Generative probabilistic models for protein–protein interaction networks – the biclique perspective

Regev Schweiger; Michal Linial; Nathan Linial

Motivation: Much of the large-scale molecular data from living cells can be represented in terms of networks. Such networks occupy a central position in cellular systems biology. In the protein–protein interaction (PPI) network, nodes represent proteins and edges represent connections between them, based on experimental evidence. As PPI networks are rich and complex, a mathematical model is sought to capture their properties and shed light on PPI evolution. The mathematical literature contains various generative models of random graphs. It is a major, still largely open question which of these models (if any) can properly reproduce various biologically interesting networks. Here, we consider this problem where the graph at hand is the PPI network of Saccharomyces cerevisiae. We try to distinguish between a model family which performs a process of copying neighbors, represented by the duplication–divergence (DD) model, and models which do not copy neighbors, with the Barabási–Albert (BA) preferential attachment model as a leading example. Results: The observed property of the network is the distribution of maximal bicliques in the graph. This is a novel criterion to distinguish between models in this area. It is particularly appropriate for this purpose, since it reflects the graph's growth pattern under either model. This test clearly favors the DD model. In particular, for the BA model, the vast majority (92.9%) of the bicliques with both sides ≥4 must be already embedded in the model's seed graph, whereas the corresponding figure for the DD model is only 5.1%. Our results, based on the biclique perspective, conclusively show that a naïve unmodified DD model can capture a key aspect of PPI networks. Supplementary information: Supplementary data are available at Bioinformatics online.
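The two growth processes contrasted in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the `retain_prob` and `m` parameters, and the seed graph are all assumptions made for the example, and the DD variant shown is the simplest one (copy a random node's neighbors, keep each copied edge independently).

```python
import random


def _adjacency(seed_edges):
    """Build an undirected adjacency map from a list of (u, v) integer pairs."""
    adj = {}
    for u, v in seed_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def duplication_divergence(seed_edges, n_nodes, retain_prob, rng):
    """DD-style growth: a new node copies a random node's neighbors,
    keeping each copied edge independently with probability retain_prob."""
    adj = _adjacency(seed_edges)
    while len(adj) < n_nodes:
        new = max(adj) + 1
        parent = rng.choice(sorted(adj))
        kept = {nb for nb in sorted(adj[parent]) if rng.random() < retain_prob}
        adj[new] = kept
        for nb in kept:
            adj[nb].add(new)
    return adj


def preferential_attachment(seed_edges, n_nodes, m, rng):
    """BA-style growth: a new node links to m distinct existing nodes,
    each chosen with probability proportional to its current degree.
    Assumes m is at most the number of nodes in the seed graph."""
    adj = _adjacency(seed_edges)
    # Each node appears in the pool once per incident edge.
    pool = [u for u in sorted(adj) for _ in adj[u]]
    while len(adj) < n_nodes:
        new = max(adj) + 1
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        adj[new] = set(chosen)
        for t in sorted(chosen):
            adj[t].add(new)
            pool.extend([t, new])
    return adj


rng = random.Random(0)
seed = [(0, 1), (1, 2), (0, 2)]  # hypothetical triangle seed graph
dd_graph = duplication_divergence(seed, 50, 0.4, rng)
ba_graph = preferential_attachment(seed, 50, 2, rng)
```

In the DD sketch a new node shares neighbors with its parent, so the pair and their common neighbors form a biclique; the BA sketch has no such copying step, which is the intuition behind using maximal bicliques to tell the two families apart.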


Epigenetics & Chromatin | 2017

Genome-wide methylation data mirror ancestry information

Elior Rahmani; Liat Shenhav; Regev Schweiger; Paul Yousefi; Karen Huen; Brenda Eskenazi; Celeste Eng; Scott Huntsman; Donglei Hu; Joshua M. Galanter; Sam S. Oh; Melanie Waldenberger; Konstantin Strauch; Harald Grallert; Thomas Meitinger; Christian Gieger; Nina Holland; Esteban G. Burchard; Noah Zaitlen; Eran Halperin

Background: Genetic data are known to harbor information about human demographics, and genotyping data are commonly used for capturing ancestry information by leveraging genome-wide differences between populations. In contrast, it is not clear to what extent population structure is captured by whole-genome DNA methylation data. Results: We demonstrate, using three large-cohort 450K methylation array data sets, that ancestry information signal is mirrored in genome-wide DNA methylation data and that it can be further isolated more effectively by leveraging the correlation structure of CpGs with cis-located SNPs. Based on these insights, we propose a method, EPISTRUCTURE, for the inference of ancestry from methylation data, without the need for genotype data. Conclusions: EPISTRUCTURE can be used to infer ancestry information of individuals based on their methylation data in the absence of corresponding genetic data. Although genetic data are often collected in epigenetic studies of large cohorts, these are typically not made publicly available, making the application of EPISTRUCTURE especially useful for anyone working on public data. Implementation of EPISTRUCTURE is available in GLINT, our recently released toolset for DNA methylation analysis at: http://glint-epigenetics.readthedocs.io.


Nucleic Acids Research | 2010

PANDORA: analysis of protein and peptide sets through the hierarchical integration of annotations

Nadav Rappoport; Menachem Fromer; Regev Schweiger; Michal Linial

Derivation of biological meaning from large sets of proteins or genes is a frequent task in genomic and proteomic studies. Such sets often arise from experimental methods including large-scale gene expression experiments and mass spectrometry (MS) proteomics. Large sets of genes or proteins are also the outcome of computational methods such as BLAST search and homology-based classifications. We have developed the PANDORA web server, which functions as a platform for the advanced biological analysis of sets of genes, proteins, or proteolytic peptides. First, the input set is mapped to a set of corresponding proteins. Then, an analysis of the protein set produces a graph-based hierarchy which highlights intrinsic relations amongst biological subsets, in light of their different annotations from multiple annotation resources. PANDORA integrates a large collection of annotation sources (GO, UniProt Keywords, InterPro, Enzyme, SCOP, CATH, Gene-3D, NCBI taxonomy and more) that comprise ∼200 000 different annotation terms associated with ∼3.2 million sequences from UniProtKB. Statistical enrichment based on a binomial approximation of the hypergeometric distribution and corrected for multiple hypothesis tests is calculated using several background sets, including major gene-expression DNA-chip platforms. Users can also visualize either standard or user-defined binary and quantitative properties alongside the proteins. PANDORA 4.2 is available at http://www.pandora.cs.huji.ac.il.
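The enrichment statistic mentioned in this abstract can be illustrated with a small sketch. It is not PANDORA's implementation: the function names and the example numbers are assumptions, and Bonferroni is used only as one possible multiple-testing correction, since the abstract does not say which correction the server applies.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn without replacement from a
    background of N proteins, K of which carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def binomial_tail(k, n, p):
    """P(X >= k) for n independent draws, each annotated with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value (an assumed choice of correction)."""
    return min(1.0, p_value * n_tests)


# Hypothetical numbers: 400 of 20,000 background proteins carry a term,
# and 5 of the 50 proteins in the input set carry it.
N, K, n, k = 20000, 400, 50, 5
exact = hypergeom_tail(k, n, K, N)
approx = binomial_tail(k, n, K / N)
corrected = bonferroni(approx, n_tests=1000)
```

When the input set is small relative to the background, drawing with and without replacement are nearly equivalent, so the binomial tail with p = K/N tracks the hypergeometric tail closely while being cheaper to evaluate.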


American Journal of Human Genetics | 2016

Fast and Accurate Construction of Confidence Intervals for Heritability

Regev Schweiger; Shachar Kaufman; Reijo Laaksonen; Marcus E. Kleber; Winfried März; Eleazar Eskin; Saharon Rosset; Eran Halperin

Estimation of heritability is fundamental in genetic studies. Recently, heritability estimation using linear mixed models (LMMs) has gained popularity because these estimates can be obtained from unrelated individuals collected in genome-wide association studies. Typically, heritability estimation under LMMs uses the restricted maximum likelihood (REML) approach. Existing methods for the construction of confidence intervals and estimators of SEs for REML rely on asymptotic properties. However, these assumptions are often violated because of the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and inflated or deflated confidence intervals. Here, we show that the estimation of confidence intervals by state-of-the-art methods is inaccurate, especially when the true heritability is relatively low or relatively high. We further show that these inaccuracies occur in datasets including thousands of individuals. Such biases are present, for example, in estimates of heritability of gene expression in the Genotype-Tissue Expression project and of lipid profiles in the Ludwigshafen Risk and Cardiovascular Health study. We also show that often the probability that the genetic component is estimated as 0 is high even when the true heritability is bounded away from 0, emphasizing the need for accurate confidence intervals. We propose a computationally efficient method, ALBI (accurate LMM-based heritability bootstrap confidence intervals), for estimating the distribution of the heritability estimator and for constructing accurate confidence intervals. Our method can be used as an add-on to existing methods for estimating heritability and variance components, such as GCTA, FaST-LMM, GEMMA, or EMMAX.
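For orientation, the standard LMM formulation behind this kind of heritability estimate can be written as follows. The notation is an assumption introduced here, not taken from the abstract: $y$ is the phenotype vector for $n$ individuals, $X$ the covariates, $K$ the kinship matrix, and $\sigma_g^2$, $\sigma_e^2$ the genetic and environmental variance components.

```latex
\begin{aligned}
y &= X\beta + g + e, \qquad g \sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad e \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I_n\right), \\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}, \qquad 0 \le h^2 \le 1 .
\end{aligned}
```

The constraint $0 \le h^2 \le 1$ is the bounded parameter space the abstract refers to: near either boundary the estimator's distribution is not well approximated by a normal distribution, which is why intervals built from asymptotic standard errors can be miscalibrated.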


Genetics | 2017

RL-SKAT: An Exact and Efficient Score Test for Heritability and Set Tests

Regev Schweiger; Omer Weissbrod; Elior Rahmani; Martina Müller-Nurasyid; Sonja Kunze; Christian Gieger; Melanie Waldenberger; Saharon Rosset; Eran Halperin

Testing for the existence of variance components in linear mixed models is a fundamental task in many applicative fields. In statistical genetics, the score test has recently become instrumental in the task of testing an association between a set of genetic markers and a phenotype. With few markers, this amounts to set-based variance component tests, which attempt to increase power in association studies by aggregating weak individual effects. When the entire genome is considered, it allows testing for the heritability of a phenotype, defined as the proportion of phenotypic variance explained by genetics. In the popular score-based Sequence Kernel Association Test (SKAT) method, the assumed distribution of the score test statistic is uncalibrated in small samples, with a correction being computationally expensive. This may cause severe inflation or deflation of P-values, even when the null hypothesis is true. Here, we characterize the conditions under which this discrepancy holds, and show it may occur also in large real datasets, such as a dataset from the Wellcome Trust Case Control Consortium 2 (n = 13,950) study, and, in particular, when the individuals in the sample are unrelated. In these cases, the SKAT approximation tends to be highly overconservative and therefore underpowered. To address this limitation, we suggest an efficient method to calculate exact P-values for the score test in the case of a single variance component and a continuous response vector, which can speed up the analysis by orders of magnitude. Our results enable fast and accurate application of the score test in heritability and in set-based association tests. Our method is available in http://github.com/cozygene/RL-SKAT.


Research in Computational Molecular Biology | 2017

A Bayesian Framework for Estimating Cell Type Composition from DNA Methylation Without the Need for Methylation Reference

Elior Rahmani; Regev Schweiger; Liat Shenhav; Eleazar Eskin; Eran Halperin

Genome-wide DNA methylation levels measured from a target tissue across a population have become ubiquitous over the last few years, as methylation status is suggested to hold great potential for better understanding the role of epigenetics. Different cell types are known to have different methylation profiles. Therefore, in the common scenario where methylation levels are collected from heterogeneous sources such as blood, convoluted signals are formed according to the cell type composition of the samples. Knowledge of the cell type proportions is important for statistical analysis, and it may provide novel biological insights and contribute to our understanding of disease biology. Since high resolution cell counting is costly and often logistically impractical to obtain in large studies, targeted methods that are inexpensive and practical for estimating cell proportions are needed. Although a supervised approach has been shown to provide reasonable estimates of cell proportions, this approach leverages scarce reference methylation data from sorted cells which are not available for most tissues and are not appropriate for any target population. Here, we introduce BayesCCE, a Bayesian semi-supervised method that leverages prior knowledge on the cell type composition distribution in the studied tissue. As we demonstrate, such prior information is substantially easier to obtain compared to appropriate reference methylation levels from sorted cells. Using real and simulated data, we show that our proposed method is able to construct a set of components, each corresponding to a single cell type, and together providing up to 50% improvement in correlation when compared with existing reference-free methods. We further make a design suggestion for future data collection efforts by showing that results can be further improved using cell count measurements for a small subset of individuals in the study sample or by incorporating external data of individuals with measured cell counts. Our approach provides a new opportunity to investigate cell compositions in genomic studies of tissues for which it was not possible before.


Bioinformatics | 2017

GLINT: a user-friendly toolset for the analysis of high-throughput DNA-methylation array data

Elior Rahmani; Reut Yedidim; Liat Shenhav; Regev Schweiger; Omer Weissbrod; Noah Zaitlen; Eran Halperin

Summary: GLINT is a user-friendly command-line toolset for fast analysis of genome-wide DNA methylation data generated using the Illumina human methylation arrays. GLINT, which does not require any programming proficiency, allows easy execution of an Epigenome-Wide Association Study analysis pipeline under different models while accounting for known confounders in methylation data. Availability and Implementation: GLINT is command-line software, freely available at https://github.com/cozygene/glint/releases. It requires Python 2.7 and several freely available Python packages. Further information and documentation as well as a quick start tutorial are available at http://glint-epigenetics.readthedocs.io.


Research in Computational Molecular Biology | 2017

Using Stochastic Approximation Techniques to Efficiently Construct Confidence Intervals for Heritability

Regev Schweiger; Eyal Fisher; Elior Rahmani; Liat Shenhav; Saharon Rosset; Eran Halperin

Estimation of heritability is an important task in genetics. The use of linear mixed models (LMMs) to determine narrow-sense SNP-heritability and related quantities has received much recent attention, due to its ability to account for variants with small effect sizes. Typically, heritability estimation under LMMs uses the restricted maximum likelihood (REML) approach. The common way to report the uncertainty in REML estimation uses standard errors (SE), which rely on asymptotic properties. However, these assumptions are often violated because of the bounded parameter space, statistical dependencies, and limited sample size, leading to biased estimates and inflated or deflated confidence intervals. In addition, for larger datasets (e.g., tens of thousands of individuals), the construction of SEs itself may require considerable time, as it requires expensive matrix inversions and multiplications.


bioRxiv | 2018

FactorialHMM: Fast and exact inference in factorial hidden Markov models

Regev Schweiger; Yaniv Erlich; Shai Carmi

Motivation: Hidden Markov models (HMMs) are powerful tools for modeling processes along the genome. In a standard genomic HMM, observations are drawn, at each genomic position, from a distribution whose parameters depend on a hidden state; the hidden states evolve along the genome as a Markov chain. Often, the hidden state is the Cartesian product of multiple processes, each evolving independently along the genome. Inference in these so-called Factorial HMMs has a naïve running time that scales as the square of the number of possible states, which by itself increases exponentially with the number of subchains; such a running time scaling is impractical for many applications. While faster algorithms exist, there is no available implementation suitable for developing bioinformatics applications. Results: We developed FactorialHMM, a Python package for fast exact inference in Factorial HMMs. Our package allows simulating either directly from the model or from the posterior distribution of states given the observations. Additionally, we allow the inference of all key quantities related to HMMs: (1) the (Viterbi) sequence of states with the highest posterior probability; (2) the likelihood of the data; and (3) the posterior probability (given all observations) of the marginal and pairwise state probabilities. The running time and space requirement of all procedures is linearithmic in the number of possible states. Our package is highly modular, providing the user with maximal flexibility for developing downstream applications. Availability: https://github.com/regevs/factorialhmm
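To make the scaling argument concrete, here is a sketch of the naive forward algorithm on the product state space. It deliberately shows the quadratic-time baseline the abstract describes, not the package's fast algorithm, and it does not use the FactorialHMM API; every name below (`joint_transition`, `forward_likelihood`, the `emission` callback) is hypothetical.

```python
from itertools import product
from math import prod


def joint_transition(chain_transitions, s, t):
    """Probability of moving from joint state s to joint state t when
    each subchain evolves independently with its own transition matrix."""
    return prod(T[a][b] for T, a, b in zip(chain_transitions, s, t))


def forward_likelihood(chain_inits, chain_transitions, emission, observations):
    """Naive forward algorithm over the Cartesian product of subchain states.
    emission(state, obs) returns P(obs | joint state). The cost per position
    is quadratic in the number of joint states."""
    states = list(product(*[range(len(pi)) for pi in chain_inits]))
    alpha = {
        s: prod(pi[a] for pi, a in zip(chain_inits, s)) * emission(s, observations[0])
        for s in states
    }
    for obs in observations[1:]:
        alpha = {
            t: emission(t, obs)
            * sum(alpha[s] * joint_transition(chain_transitions, s, t) for s in states)
            for t in states
        }
    return sum(alpha.values())


# Hypothetical single binary chain whose state is observed with 20% noise.
inits = [[0.5, 0.5]]
transitions = [[[0.9, 0.1], [0.1, 0.9]]]


def noisy_emission(state, obs):
    return 0.8 if state[0] == obs else 0.2


likelihood = forward_likelihood(inits, transitions, noisy_emission, [0, 0])
```

With M binary subchains there are 2^M joint states, so the inner sum above touches 4^M state pairs at every position; this is the cost that a factorial-aware algorithm avoids by exploiting the independence of the subchains.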


bioRxiv | 2018

Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology

Elior Rahmani; Regev Schweiger; Brooke Rhead; Lindsey A. Criswell; Lisa F. Barcellos; Eleazar Eskin; Saharon Rosset; Sriram Sankararaman; Eran Halperin

High costs and technical limitations of cell sorting and single-cell techniques currently restrict the collection of large-scale, cell-type-specific DNA methylation data for a large number of individuals. This, in turn, impedes our ability to tackle key biological questions that pertain to variation within a population, such as identification of disease-associated genes at a cell-type-specific resolution. Here, we show mathematically and experimentally that cell-type-specific methylation levels of an individual can be learned from its tissue-level bulk data, as if the sample has been profiled with a single-cell resolution and then signals were aggregated in each cell population separately. Thus, our proposed approach provides an unprecedented way to perform powerful large-scale epigenetic studies with cell-type-specific resolution using relatively easily obtainable large tissue-level data. We revisit previous studies with methylation and reveal novel associations with leukocyte composition in blood and multiple novel cell-type-specific associations with rheumatoid arthritis (RA). For the latter, further evidence demonstrates correlation of the associated CpGs with cell-type-specific expression of known RA risk genes, thus rendering our results consistent with the possibility that contributors to RA pathogenesis are regulated by cell-type-specific changes in methylation.

Collaboration


Top Co-Authors

Eran Halperin
University of California

Liat Shenhav
University of California

Omer Weissbrod
Technion – Israel Institute of Technology

Eleazar Eskin
University of California

Michal Linial
Hebrew University of Jerusalem

Christian Gieger
Pennington Biomedical Research Center