Publication


Featured research published by Andrea Rau.


Briefings in Bioinformatics | 2013

A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

Marie-Agnès Dillies; Andrea Rau; Julie Aubert; Christelle Hennequet-Antier; Marine Jeanmougin; Nicolas Servant; Céline Keime; Guillemette Marot; David Castel; Jordi Estellé; Gregory Guernec; Bernd Jagla; Luc Jouneau; Denis Laloë; Caroline Le Gall; Brigitte Schaëffer; Stéphane Le Crom; Mickael Guedj; Florence Jaffrézic

During the last 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there has been no clear consensus on the appropriate normalization method to be used or the impact of a chosen method on the downstream analysis. In this work, we focus on a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on the use of varied real and simulated datasets involving different species and experimental designs to represent data characteristics commonly observed in practice. Based on this comparison study, we propose practical recommendations on the appropriate normalization method to be used and its impact on the differential analysis of RNA-seq data.


Bioinformatics | 2013

Data-based filtering for replicated high-throughput transcriptome sequencing experiments

Andrea Rau; Mélina Gallopin; Gilles Celeux; Florence Jaffrézic

Motivation: RNA sequencing is now widely performed to study differential expression among experimental conditions. As tests are performed on a large number of genes, stringent false-discovery rate control is required at the expense of detection power. Ad hoc filtering techniques are regularly used to moderate this correction by removing genes with low signal, with little attention paid to their impact on downstream analyses. Results: We propose a data-driven method based on the Jaccard similarity index to calculate a filtering threshold for replicated RNA sequencing data. In comparisons with alternative data filters regularly used in practice, we demonstrate the effectiveness of our proposed method to correctly filter lowly expressed genes, leading to increased detection power for moderately to highly expressed genes. Interestingly, this data-driven threshold varies among experiments, highlighting the interest of the method proposed here. Availability: The proposed filtering method is implemented in the R package HTSFilter available on Bioconductor. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
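The idea of a Jaccard-based filtering threshold can be illustrated with a minimal Python sketch. It assumes one simplified reading of the abstract: for each candidate threshold, every sample is reduced to a binary vector (gene count above the threshold or not), the Jaccard index is computed for each pair of replicates within a condition, and the threshold with the highest mean index is kept. The function names and the toy data are hypothetical, and the HTSFilter package itself may differ in details such as normalization and smoothing.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two boolean sequences (0.0 if neither has a True)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def global_similarity(counts, conditions, s):
    """Mean Jaccard index over replicate pairs sharing a condition, at threshold s.

    counts: one row per gene, one column per sample (hypothetical layout).
    conditions: one label per sample.
    """
    n_samples = len(conditions)
    # Binary "expressed above s" vector for each sample.
    above = [[row[j] > s for row in counts] for j in range(n_samples)]
    pairs = [(i, j) for i, j in combinations(range(n_samples), 2)
             if conditions[i] == conditions[j]]
    if not pairs:
        raise ValueError("at least one condition needs two replicates")
    return sum(jaccard(above[i], above[j]) for i, j in pairs) / len(pairs)


def choose_threshold(counts, conditions, candidates):
    """Return the candidate threshold that maximizes the mean Jaccard index."""
    return max(candidates, key=lambda s: global_similarity(counts, conditions, s))


# Toy data (invented for illustration): two noisy low-count genes,
# two consistently expressed genes, two replicates per condition.
counts = [
    [0, 2, 1, 0],
    [3, 0, 0, 2],
    [40, 50, 45, 60],
    [100, 120, 90, 110],
]
conditions = ["A", "A", "B", "B"]
threshold = choose_threshold(counts, conditions, [0, 1, 5, 10])
kept = [row for row in counts if max(row) > threshold]
```

In this toy example the low-count genes disagree between replicates, so raising the threshold past them increases replicate similarity. Keeping genes whose maximum count exceeds the threshold is one possible filtering rule, chosen here only for illustration.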


Statistical Applications in Genetics and Molecular Biology | 2010

An Empirical Bayesian Method for Estimating Biological Networks from Temporal Microarray Data

Andrea Rau; Florence Jaffrézic; Jean Louis Foulley; R. W. Doerge

Gene regulatory networks refer to the interactions that occur among genes and other cellular products. The topology of these networks can be inferred from measurements of changes in gene expression over time. However, because the measurement device (i.e., microarrays) typically yields information on thousands of genes over few biological replicates, these systems are quite difficult to elucidate. An approach with proven effectiveness for inferring networks is the Dynamic Bayesian Network. We have developed an iterative empirical Bayesian procedure with a Kalman filter that estimates the posterior distributions of network parameters. We compare our method to similar existing methods on simulated data and real microarray time series data. We find that the proposed method performs comparably on both model-based and data-based simulations in considerably less computational time. The R and C code used to implement the proposed method are publicly available in the R package ebdbNet.


Bioinformatics | 2015

Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models.

Andrea Rau; Cathy Maugis-Rabusseau; Marie-Laure Martin-Magniette; Gilles Celeux

MOTIVATION In recent years, gene expression studies have increasingly made use of high-throughput sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression (DGE) has flourished, primarily in the context of normalization and differential analysis. RESULTS In this work, we focus on the question of clustering DGE profiles as a means to discover groups of co-expressed genes. We propose a Poisson mixture model using a rigorous framework for parameter estimation as well as the choice of the appropriate number of clusters. We illustrate co-expression analyses using our approach on two real RNA-seq datasets. A set of simulation studies also compares the performance of the proposed model with that of several related approaches developed to cluster RNA-seq or serial analysis of gene expression data. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented in the open-source R package HTSCluster, available on CRAN. CONTACT [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.


BMC Bioinformatics | 2014

Differential meta-analysis of RNA-seq data from multiple studies

Andrea Rau; Guillemette Marot; Florence Jaffrézic

Background: High-throughput sequencing is now regularly used for studies of the transcriptome (RNA-seq), particularly for comparisons among experimental conditions. For the time being, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential expression. As their cost continues to decrease, it is likely that additional follow-up studies will be conducted to re-address the same biological question. Results: We demonstrate how p-value combination techniques previously used for microarray meta-analyses can be used for the differential analysis of RNA-seq data from multiple related studies. These techniques are compared to a negative binomial generalized linear model (GLM) including a fixed study effect on simulated data and real data on human melanoma cell lines. The GLM with fixed study effect performed well for low inter-study variation and small numbers of studies, but was outperformed by the meta-analysis methods for moderate to large inter-study variability and larger numbers of studies. Conclusions: The p-value combination techniques illustrated here are a valuable tool to perform differential meta-analyses of RNA-seq data by appropriately accounting for biological and technical variability within studies as well as additional study-specific effects. An R package metaRNASeq is available on the CRAN (http://cran.r-project.org/web/packages/metaRNASeq).
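As an illustration of what a p-value combination can look like, the following Python sketch applies Fisher's method to the per-study p-values of a single gene. Fisher's method is used here as one well-known combination technique; the abstract does not say which techniques were compared, so this choice, the function names, and the example values are assumptions, and the sketch is not the metaRNASeq implementation. It also assumes the per-study p-values are independent.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Combine independent p-values for one gene with Fisher's method.

    The statistic -2 * sum(log p) follows a chi-square distribution with
    2 * k degrees of freedom under the null, where k is the number of studies.
    """
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


# Hypothetical p-values for one gene in two studies.
combined = fisher_combine([0.05, 0.05])
```

Two studies that each give only moderate evidence produce a smaller combined p-value, which is how a combination approach can recover detection power when single studies have few replicates. In a full analysis the combined p-values would still need a multiple-testing correction across genes.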


BMC Systems Biology | 2013

Joint estimation of causal effects from observational and intervention gene expression data

Andrea Rau; Florence Jaffrézic; Gregory Nuel

Background: In recent years, there has been great interest in using transcriptomic data to infer gene regulatory networks. For the time being, methodological development in this area has primarily made use of graphical Gaussian models for observational wild-type data, resulting in undirected graphs that are not able to accurately highlight causal relationships among genes. In the present work, we seek to improve the estimation of causal effects among genes by jointly modeling observational transcriptomic data with arbitrarily complex intervention data obtained by performing partial, single, or multiple gene knock-outs or knock-downs. Results: Using the framework of causal Gaussian Bayesian networks, we propose a Markov chain Monte Carlo algorithm with a Mallows proposal model and analytical likelihood maximization to sample from the posterior distribution of causal node orderings, and in turn, to estimate causal effects. The main advantage of the proposed algorithm over previously proposed methods is its flexibility to accommodate any kind of intervention design, including partial or multiple knock-out experiments. Using simulated data as well as data from the Dialogue for Reverse Engineering Assessments and Methods (DREAM) 2007 challenge, the proposed method was compared to two alternative approaches: one requiring a complete, single knock-out design, and one able to model only observational data. Conclusions: The proposed algorithm was found to perform as well as, and in most cases better than, the alternative methods in terms of accuracy for the estimation of causal effects. In addition, multiple knock-outs proved to contribute valuable additional information compared to single knock-outs. Finally, the simulation study confirmed that it is not possible to estimate the causal ordering of genes from observational data alone. In all cases, we found that the inclusion of intervention experiments enabled more accurate estimation of causal regulatory relationships than the use of wild-type data alone.


PLOS ONE | 2013

A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data

Mélina Gallopin; Andrea Rau; Florence Jaffrézic

Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data.
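The overdispersion property mentioned in the abstract can be checked with a short derivation for a single count. This is a generic univariate Poisson log-normal model with assumed symbols $Y$, $\theta$, $\mu$ and $\sigma^2$, not the full hierarchical model of the paper.

```latex
% Assumed model: Y | \theta \sim \mathrm{Poisson}(e^{\theta}), \quad \theta \sim \mathcal{N}(\mu, \sigma^2)
\begin{align*}
m := \mathbb{E}[Y] &= \mathbb{E}\left[e^{\theta}\right] = e^{\mu + \sigma^2/2}, \\
\operatorname{Var}(Y) &= \mathbb{E}\left[\operatorname{Var}(Y \mid \theta)\right]
                        + \operatorname{Var}\left(\mathbb{E}[Y \mid \theta]\right) \\
                      &= m + m^2\left(e^{\sigma^2} - 1\right) \;\ge\; m .
\end{align*}
```

The variance equals the mean only when $\sigma^2 = 0$, which recovers the plain Poisson case. Any positive $\sigma^2$ gives a variance larger than the mean.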


PLOS ONE | 2014

Impact of the genetic background on the composition of the chicken plasma MiRNome in response to a stress.

Marie-Laure Endale Ahanda; Tatiana Zerjal; Sophie Dhorne-Pollet; Andrea Rau; Amanda M. Cooksey; Elisabetta Giuffra

Circulating extra-cellular microRNAs (miRNAs) have emerged as promising minimally invasive markers in human medicine. We evaluated miRNAs isolated from total plasma as biomarker candidates of a response to an abiotic stress (feed deprivation) in a livestock species. Two chicken lines selected for high (R+) and low (R−) residual feed intake were chosen as an experimental model because of their extreme divergence in feed intake and energy metabolism. Adult R+ and R− cocks were sampled after 16 hours of feed deprivation and again four hours after re-feeding. More than 292 million sequence reads were generated by small RNA-seq of total plasma RNA. A total of 649 mature miRNAs were identified; after quality filtering, 148 miRNAs were retained for further analyses. We identified 23 and 19 differentially abundant miRNAs between feeding conditions and between lines respectively, with only two miRNAs identified in both comparisons. We validated a panel of six differentially abundant miRNAs by RT-qPCR on a larger number of plasma samples and checked their response to feed deprivation in liver. Finally, we evaluated the conservation and tissue distribution of differentially abundant miRNAs in plasma across a variety of red jungle fowl tissues. We show that the chicken plasma miRNome reacts promptly to the alteration of the animal physiological condition driven by a feed deprivation stress. The plasma content of stress-responsive miRNAs is strongly influenced by the genetic background, with differences reflecting the phenotypic divergence acquired through long-term selection, as evidenced by the profiles of conserved miRNAs with a regulatory role in energy metabolism (gga-miR-204, gga-miR-let-7f-5p and gga-miR-122-5p). 
These results reinforce the emerging view in human medicine that even small genetic differences can have a considerable impact on the resolution of biomarker studies, and provide support for the emerging interest in miRNAs as potential novel and minimally invasive biomarkers for livestock species.


Statistics and Computing | 2012

Reverse engineering gene regulatory networks using approximate Bayesian computation

Andrea Rau; Florence Jaffrézic; Jean Louis Foulley; R. W. Doerge

Gene regulatory networks are collections of genes that interact with one another and with other substances in the cell. By measuring gene expression over time using high-throughput technologies, it may be possible to reverse engineer, or infer, the structure of the gene network involved in a particular cellular process. These gene expression data typically have a high dimensionality and a limited number of biological replicates and time points. Due to these issues and the complexity of biological systems, the problem of reverse engineering networks from gene expression data demands a specialized suite of statistical tools and methodologies. We propose a non-standard adaptation of a simulation-based approach known as Approximate Bayesian Computing based on Markov chain Monte Carlo sampling. This approach is particularly well suited for the inference of gene regulatory networks from longitudinal data. The performance of this approach is investigated via simulations and using longitudinal expression data from a genetic repair system in Escherichia coli.


Briefings in Bioinformatics | 2017

Transformation and model choice for RNA-seq co-expression analysis

Andrea Rau; Cathy Maugis-Rabusseau

Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.

Collaboration


Dive into Andrea Rau's collaborations.

Top Co-Authors

Florence Jaffrézic
Institut national de la recherche agronomique

Cathy Maugis-Rabusseau
Institut de Mathématiques de Toulouse

Denis Laloë
Institut national de la recherche agronomique

Christophe Klopp
Institut national de la recherche agronomique

Tatiana Zerjal
Université Paris-Saclay

Marco Moroldo
Institut national de la recherche agronomique

Silvia Vincent-Naulleau
Institut national de la recherche agronomique

Adeline Goubil
Institut national de la recherche agronomique