Cathy Maugis-Rabusseau
Institut de Mathématiques de Toulouse
Publication
Featured research published by Cathy Maugis-Rabusseau.
Bioinformatics | 2015
Andrea Rau; Cathy Maugis-Rabusseau; Marie-Laure Martin-Magniette; Gilles Celeux
MOTIVATION: In recent years, gene expression studies have increasingly made use of high-throughput sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression (DGE) has flourished, primarily in the context of normalization and differential analysis. RESULTS: In this work, we focus on the question of clustering DGE profiles as a means to discover groups of co-expressed genes. We propose a Poisson mixture model using a rigorous framework for parameter estimation as well as the choice of the appropriate number of clusters. We illustrate co-expression analyses using our approach on two real RNA-seq datasets. A set of simulation studies also compares the performance of the proposed model with that of several related approaches developed to cluster RNA-seq or serial analysis of gene expression data. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in the open-source R package HTSCluster, available on CRAN. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
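To make the idea of a Poisson mixture concrete, the following is a minimal sketch of EM for a univariate mixture of Poisson distributions. It is a deliberately simplified illustration and not the HTSCluster model, which is multivariate and has its own parameterization; the function names `poisson_logpmf` and `fit_poisson_mixture` and the quantile-based initialisation are assumptions made for this example, and the choice of the number of clusters is not covered.

```python
import math


def poisson_logpmf(x, lam):
    """Log of the Poisson probability mass function at count x."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def fit_poisson_mixture(counts, k, n_iter=100):
    """Fit a k-component univariate Poisson mixture by EM (illustrative only)."""
    n = len(counts)
    ordered = sorted(counts)
    # Hypothetical initialisation: rates at evenly spaced quantiles,
    # floored so that every rate stays strictly positive.
    rates = [max(ordered[int((j + 0.5) * n / k)], 0.5) for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count,
        # computed on the log scale for numerical stability.
        resp = []
        for x in counts:
            logs = [math.log(weights[j]) + poisson_logpmf(x, rates[j])
                    for j in range(k)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing proportions and component rates.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            weights[j] = max(mass / n, 1e-12)
            weighted_sum = sum(r[j] * x for r, x in zip(resp, counts))
            rates[j] = max(weighted_sum / max(mass, 1e-12), 1e-6)
    # Assign each observation to its most probable component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, rates, labels
```

For example, `fit_poisson_mixture([1, 2, 3, 2, 28, 30, 32, 30], 2)` separates the low counts from the high counts and returns one estimated rate per group.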
Computational Statistics & Data Analysis | 2016
Panagiotis Papastamoulis; Marie-Laure Martin-Magniette; Cathy Maugis-Rabusseau
Modelling heterogeneity in large datasets of counts under the presence of covariates demands advanced clustering methods. To this end, a mixture of Poisson regressions is proposed. Conditionally on the covariates and a cluster, the multivariate distribution is a product of independent Poisson distributions. A variety of different parameterizations is taken into account for the slope of the conditional log-means. Also considered is the case of partitioning the response variables into sets of replicates sharing the same conditional log-mean up to an additive constant. Model parameters are estimated via an Expectation-Maximization algorithm with Newton-Raphson steps. In particular, an efficient initialization is introduced in order to improve the inference: a splitting scheme is combined with a Small-EM strategy. Simulations and applications to two real high-throughput sequencing datasets highlight improvements in parameter estimation. The proposed methodology is implemented in the R package poisson.glm.mix, available on CRAN.
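The Newton-Raphson step mentioned in the abstract can be illustrated on a much smaller problem: a single weighted Poisson regression with one covariate, log(mu) = a + b*x, where the weights play the role of the EM posterior probabilities for one cluster. This is a sketch under those simplifying assumptions, not the poisson.glm.mix implementation; the function name `weighted_poisson_newton` and the starting values are hypothetical.

```python
import math


def weighted_poisson_newton(x, y, w, n_iter=25):
    """Maximise the weighted Poisson log-likelihood sum_i w_i * (y_i * eta_i - exp(eta_i)),
    with eta_i = a + b * x_i, by Newton-Raphson (illustrative only)."""
    # Hypothetical start: intercept at the log of the weighted mean, zero slope.
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    a, b = math.log(mean_y), 0.0
    for _ in range(n_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Gradient (score) of the weighted log-likelihood.
        g_a = sum(wi * (yi - mi) for wi, yi, mi in zip(w, y, mu))
        g_b = sum(wi * xi * (yi - mi) for wi, xi, yi, mi in zip(w, x, y, mu))
        # Entries of the 2x2 information matrix (negative Hessian).
        s0 = sum(wi * mi for wi, mi in zip(w, mu))
        s1 = sum(wi * xi * mi for wi, xi, mi in zip(w, x, mu))
        s2 = sum(wi * xi * xi * mi for wi, xi, mi in zip(w, x, mu))
        det = s0 * s2 - s1 * s1
        # Newton update: theta <- theta + information^{-1} * gradient.
        a += (s2 * g_a - s1 * g_b) / det
        b += (s0 * g_b - s1 * g_a) / det
    return a, b
```

Within an EM algorithm, a routine of this kind would be called once per cluster in each M-step, with `w` set to that cluster's current posterior probabilities.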
Briefings in Bioinformatics | 2017
Andrea Rau; Cathy Maugis-Rabusseau
Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
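As an illustration of what a data transformation ahead of a Gaussian mixture model might look like, the sketch below converts one gene's counts into a normalized expression profile and applies two candidate transformations. The abstract does not name the transformations, so the arcsine and logit choices here, the function names, and the clamping constant `eps` are assumptions made for this example; fitting the mixture and selecting among transformations are not shown.

```python
import math


def expression_profile(counts_row):
    """Proportion of a gene's total count observed in each sample."""
    total = sum(counts_row)
    return [c / total for c in counts_row]


def arcsin_transform(profile):
    """Arcsine square-root transformation of a profile (assumed choice)."""
    return [math.asin(math.sqrt(p)) for p in profile]


def logit_transform(profile, eps=1e-9):
    """Base-2 logit transformation of a profile (assumed choice).
    Values are clamped to [eps, 1 - eps] so the logarithm stays finite."""
    out = []
    for p in profile:
        p = min(max(p, eps), 1.0 - eps)
        out.append(math.log2(p / (1.0 - p)))
    return out
```

Both transformations map proportions in the unit interval onto a wider scale, which is one way to make a Gaussian assumption more plausible for profile data.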
Briefings in Bioinformatics | 2016
Guillem Rigaill; Sandrine Balzergue; Véronique Brunaud; Eddy Blondet; Andrea Rau; Odile Rogier; José Caius; Cathy Maugis-Rabusseau; Ludivine Soubigou-Taconnat; Sébastien Aubourg; Claire Lurin; Marie-Laure Martin-Magniette; Etienne Delannoy
Numerous statistical pipelines are now available for the differential analysis of gene expression measured with RNA-sequencing technology. Most of them are based on similar statistical frameworks after normalization, differing primarily in the choice of data distribution, mean and variance estimation strategy and data filtering. We propose an evaluation of the impact of these choices when few biological replicates are available through the use of synthetic data sets. This framework is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models. Our results show the relevance of a proper modeling of the mean by using linear or generalized linear modeling. Once the mean is properly modeled, the impact of the other parameters on the performance of the test is much less important. Finally, we propose to use the simple visualization of the raw P-value histogram as a practical evaluation criterion of the performance of differential analysis methods on real data sets.
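The raw P-value histogram suggested above as a diagnostic can be computed with a few lines of code. This is a minimal sketch; the function name `pvalue_histogram` and the default of 20 bins are assumptions for illustration. Under the null hypothesis P-values are expected to be roughly uniform, so a well-behaved analysis typically shows a flat histogram with a possible peak near zero.

```python
def pvalue_histogram(pvalues, n_bins=20):
    """Count P-values falling into n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError("P-values must lie in [0, 1]")
        # A P-value of exactly 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts
```

For instance, `pvalue_histogram([0.0, 0.04, 0.5, 1.0], n_bins=10)` places two values in the first bin, one in the sixth, and one in the last.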
Advances in Data Analysis and Classification | 2018
Gilles Celeux; Cathy Maugis-Rabusseau; Mohammed Sedki
Several methods for variable selection have been proposed in model-based clustering and classification. These make use of backward or forward procedures to define the roles of the variables. Unfortunately, such stepwise procedures are slow and the resulting algorithms inefficient when analyzing large data sets with many variables. In this paper, we propose an alternative regularization approach for variable selection in model-based clustering and classification. In our approach the variables are first ranked using a lasso-like procedure in order to avoid slow stepwise algorithms. Thus, the variable selection methodology of Maugis et al. (Comput Stat Data Anal 53:3872–3882, 2009b) can be efficiently applied to high-dimensional data sets.
arXiv: Applications | 2014
Gilles Celeux; Marie-Laure Martin-Magniette; Cathy Maugis-Rabusseau; Adrian E. Raftery
ESAIM: Probability and Statistics | 2013
Cathy Maugis-Rabusseau; Bertrand Michel
Archive | 2011
Andrea Rau; Gilles Celeux; Marie-Laure Martin-Magniette; Cathy Maugis-Rabusseau
Bernoulli | 2016
Béatrice Laurent; Clément Marteau; Cathy Maugis-Rabusseau
Journal of Applied Statistics | 2018
Antoine Godichon-Baggioni; Cathy Maugis-Rabusseau; Andrea Rau