Catherine Matias

Centre national de la recherche scientifique

Publication

Featured research published by Catherine Matias.

Annals of Statistics | 2009

IDENTIFIABILITY OF PARAMETERS IN LATENT STRUCTURE MODELS WITH MANY OBSERVED VARIABLES

Elizabeth S. Allman; Catherine Matias; John A. Rhodes

While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.

Electronic Journal of Statistics | 2009

Inferring sparse Gaussian graphical models with latent structure

Christophe Ambroise; Julien Chiquet; Catherine Matias

Our concern is selecting the concentration matrix's nonzero coefficients for a sparse Gaussian graphical model in a high-dimensional setting. This corresponds to estimating the graph of conditional dependencies between the variables. We describe a novel framework taking into account a latent structure on the concentration matrix. This latent structure is used to drive a penalty matrix and thus to recover a graphical model with a constrained topology. Our method uses an ℓ1-penalized likelihood criterion. Inference of the graph of conditional dependencies between the variates and of the hidden variables is performed simultaneously in an iterative EM-like algorithm named SIMoNe (Statistical Inference for Modular Networks). Performance is illustrated on synthetic as well as real data, the latter concerning breast cancer. For gene regulation networks, our method can provide useful insight into both the mutual influence existing between genes and the modules existing in the network.

Bioinformatics | 2009

SIMoNe: Statistical Inference for MOdular NEtworks

Julien Chiquet; Alexander Smith; Gilles Grasseau; Catherine Matias; Christophe Ambroise

Summary: The R package SIMoNe (Statistical Inference for MOdular NEtworks) enables inference of gene-regulatory networks based on partial correlation coefficients from microarray experiments. Modelling gene expression data with a Gaussian graphical model (hereafter GGM), the algorithm estimates non-zero entries of the concentration matrix, in a sparse and possibly high-dimensional setting. Its originality lies in the fact that it searches for a latent modular structure to drive the inference procedure through adaptive penalization of the concentration matrix. Availability: Under the GNU General Public Licence at http://cran.r-project.org/web/packages/simone/

Bernoulli | 2015

Convergence of the groups posterior distribution in latent or stochastic block models

Mahendra Mariadassou; Catherine Matias

We propose a unified framework for studying both latent and stochastic block models, which are used to cluster simultaneously rows and columns of a data matrix. In this new framework, we study the behaviour of the groups posterior distribution, given the data. We characterize whether it is possible to asymptotically recover the actual groups on the rows and columns of the matrix. In other words, we establish sufficient conditions for the groups posterior distribution to converge (as the size of the data increases) to a Dirac mass located at the actual (random) groups configuration. In particular, we highlight some cases where the model assumes symmetries in the matrix of connection probabilities that prevent correct recovery of the groups. We also discuss the validity of these results when the proportion of non-null entries in the data matrix converges to zero.

Scandinavian Journal of Statistics | 2006

Parameter Estimation in Pair-hidden Markov Models

Ana Arribas-Gil; Elisabeth Gassiat; Catherine Matias

This paper deals with parameter estimation in pair-hidden Markov models. We first provide a rigorous formalism for these models and discuss possible definitions of likelihoods. The model is biologically motivated and therefore naturally leads to restrictions on the parameter space. Existence of two different information divergence rates is established and a divergence property is shown under additional assumptions. This yields consistency for the parameter in parametrization schemes for which the divergence property holds. Simulations illustrate different cases which are not covered by our results.

Electronic Journal of Statistics | 2008

Adaptivity in convolution models with partially known noise distribution

Cristina Butucea; Catherine Matias; Christophe Pouet

We consider a semiparametric convolution model. We observe random variables having a distribution given by the convolution of some unknown density f and some partially known noise density g. In this work, g is assumed exponentially smooth with stable law having unknown self-similarity index s. In order to ensure identifiability of the model, we restrict our attention to polynomially smooth, Sobolev-type densities f, with smoothness parameter β. In this context, we first provide a consistent estimation procedure for s. This estimator is then plugged into three different procedures: estimation of the unknown density f, of the functional ∫f² and goodness-of-fit test of the hypothesis H0: f = f0, where the alternative H1 is expressed with respect to the L2-norm.

Systematic Biology | 2015

Cophylogeny Reconstruction via an Approximate Bayesian Computation

Christian Baudet; Beatrice Donati; Blerina Sinaimeri; Pierluigi Crescenzi; Christian Gautier; Catherine Matias; Marie-France Sagot

Despite an increasingly vast literature on cophylogenetic reconstructions for studying host–parasite associations, understanding the common evolutionary history of such systems remains a problem that is far from being solved. Most algorithms for host–parasite reconciliation use an event-based model, where the events include in general (a subset of) cospeciation, duplication, loss, and host switch. All known parsimonious event-based methods then assign a cost to each type of event in order to find a reconstruction of minimum cost. The main problem with this approach is that the cost of the events strongly influences the reconciliation obtained. Some earlier approaches attempt to avoid this problem by finding a Pareto set of solutions and hence by considering event costs under some minimization constraints. To deal with this problem, we developed an algorithm, called Coala, for estimating the frequency of the events based on an approximate Bayesian computation approach. The benefits of this method are 2-fold: (i) it provides more confidence in the set of costs to be used in a reconciliation, and (ii) it allows estimation of the frequency of the events in cases where the data set consists of trees with a large number of taxa. We evaluate our method on simulated and on biological data sets. We show that in both cases, for the same pair of host and parasite trees, different sets of frequencies for the events lead to equally probable solutions. Moreover, often these solutions differ greatly in terms of the number of inferred events. It appears crucial to take this into account before attempting any further biological interpretation of such reconciliations. More generally, we also show that the set of frequencies can vary widely depending on the input host and parasite trees. Indiscriminately applying a standard vector of costs may thus not be a good strategy.

Mathematical Methods of Statistics | 2014

Asymptotic normality and efficiency of the maximum likelihood estimator for the parameter of a ballistic random walk in a random environment

Mikael Falconnet; Dasha Loukianova; Catherine Matias

We consider a one-dimensional ballistic random walk evolving in a parametric independent and identically distributed random environment. We study the asymptotic properties of the maximum likelihood estimator of the parameter based on a single observation of the path until the time it reaches a distant site. We prove asymptotic normality for this consistent estimator as the distant site tends to infinity and establish that it achieves the Cramér-Rao bound. We also explore in a simulation setting the numerical behavior of asymptotic confidence regions for the parameter value.

Annales De L Institut Henri Poincare-probabilites Et Statistiques | 2009

Adaptive goodness-of-fit testing from indirect observations

Cristina Butucea; Catherine Matias; Christophe Pouet

Royal Society Open Science | 2017

Revealing the hidden structure of dynamic ecological networks

Vincent Miele; Catherine Matias