Catherine Matias
Centre national de la recherche scientifique
Publications
Featured research published by Catherine Matias.
Annals of Statistics | 2009
Elizabeth S. Allman; Catherine Matias; John A. Rhodes
While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.
Electronic Journal of Statistics | 2009
Christophe Ambroise; Julien Chiquet; Catherine Matias
Our concern is selecting the concentration matrix's nonzero coefficients for a sparse Gaussian graphical model in a high-dimensional setting. This corresponds to estimating the graph of conditional dependencies between the variables. We describe a novel framework taking into account a latent structure on the concentration matrix. This latent structure is used to drive a penalty matrix and thus to recover a graphical model with a constrained topology. Our method uses an $\ell_1$ penalized likelihood criterion. Inference of the graph of conditional dependencies between the variates and of the hidden variables is performed simultaneously in an iterative EM-like algorithm named SIMoNe (Statistical Inference for Modular Networks). Performance is illustrated on synthetic as well as real data, the latter concerning breast cancer. For gene regulation networks, our method can provide useful insight both into the mutual influence existing between genes and into the modules existing in the network.
Bioinformatics | 2009
Julien Chiquet; Alexander Smith; Gilles Grasseau; Catherine Matias; Christophe Ambroise
SUMMARY: The R package SIMoNe (Statistical Inference for MOdular NEtworks) enables inference of gene-regulatory networks based on partial correlation coefficients from microarray experiments. Modelling gene expression data with a Gaussian graphical model (hereafter GGM), the algorithm estimates non-zero entries of the concentration matrix, in a sparse and possibly high-dimensional setting. Its originality lies in the fact that it searches for a latent modular structure to drive the inference procedure through adaptive penalization of the concentration matrix. AVAILABILITY: Under the GNU General Public Licence at http://cran.r-project.org/web/packages/simone/
Bernoulli | 2015
Mahendra Mariadassou; Catherine Matias
We propose a unified framework for studying both latent and stochastic block models, which are used to cluster simultaneously rows and columns of a data matrix. In this new framework, we study the behaviour of the groups posterior distribution, given the data. We characterize whether it is possible to asymptotically recover the actual groups on the rows and columns of the matrix. In other words, we establish sufficient conditions for the groups posterior distribution to converge (as the size of the data increases) to a Dirac mass located at the actual (random) groups configuration. In particular, we highlight some cases where the model assumes symmetries in the matrix of connection probabilities that prevent correct recovery of the groups. We also discuss the validity of these results when the proportion of non-null entries in the data matrix converges to zero.
Scandinavian Journal of Statistics | 2006
Ana Arribas-Gil; Elisabeth Gassiat; Catherine Matias
This paper deals with parameter estimation in pair-hidden Markov models. We first provide a rigorous formalism for these models and discuss possible definitions of likelihoods. The model is biologically motivated and therefore naturally leads to restrictions on the parameter space. Existence of two different information divergence rates is established and a divergence property is shown under additional assumptions. This yields consistency for the parameter in parametrization schemes for which the divergence property holds. Simulations illustrate different cases which are not covered by our results. Copyright 2006 Board of the Foundation of the Scandinavian Journal of Statistics.
Electronic Journal of Statistics | 2008
Cristina Butucea; Catherine Matias; Christophe Pouet
We consider a semiparametric convolution model. We observe random variables having a distribution given by the convolution of some unknown density $f$ and some partially known noise density $g$. In this work, $g$ is assumed exponentially smooth with stable law having unknown self-similarity index $s$. In order to ensure identifiability of the model, we restrict our attention to polynomially smooth, Sobolev-type densities $f$, with smoothness parameter $\beta$. In this context, we first provide a consistent estimation procedure for $s$. This estimator is then plugged into three different procedures: estimation of the unknown density $f$, of the functional $\int f^2$ and goodness-of-fit test of the hypothesis $H_0 : f = f_0$, where the alternative $H_1$ is expressed with respect to $L_2$-norm (i.e. has the form 2
Systematic Biology | 2015
Christian Baudet; Beatrice Donati; Blerina Sinaimeri; Pierluigi Crescenzi; Christian Gautier; Catherine Matias; Marie-France Sagot
Despite an increasingly vast literature on cophylogenetic reconstructions for studying host–parasite associations, understanding the common evolutionary history of such systems remains a problem that is far from being solved. Most algorithms for host–parasite reconciliation use an event-based model, where the events include in general (a subset of) cospeciation, duplication, loss, and host switch. All known parsimonious event-based methods then assign a cost to each type of event in order to find a reconstruction of minimum cost. The main problem with this approach is that the cost of the events strongly influences the reconciliation obtained. Some earlier approaches attempt to avoid this problem by finding a Pareto set of solutions and hence by considering event costs under some minimization constraints. To deal with this problem, we developed an algorithm, called Coala, for estimating the frequency of the events based on an approximate Bayesian computation approach. The benefits of this method are 2-fold: (i) it provides more confidence in the set of costs to be used in a reconciliation, and (ii) it allows estimation of the frequency of the events in cases where the data set consists of trees with a large number of taxa. We evaluate our method on simulated and on biological data sets. We show that in both cases, for the same pair of host and parasite trees, different sets of frequencies for the events lead to equally probable solutions. Moreover, often these solutions differ greatly in terms of the number of inferred events. It appears crucial to take this into account before attempting any further biological interpretation of such reconciliations. More generally, we also show that the set of frequencies can vary widely depending on the input host and parasite trees. Indiscriminately applying a standard vector of costs may thus not be a good strategy.
Mathematical Methods of Statistics | 2014
Mikael Falconnet; Dasha Loukianova; Catherine Matias
We consider a one-dimensional ballistic random walk evolving in a parametric independent and identically distributed random environment. We study the asymptotic properties of the maximum likelihood estimator of the parameter based on a single observation of the path till the time it reaches a distant site. We prove asymptotic normality for this consistent estimator as the distant site tends to infinity and establish that it achieves the Cramér-Rao bound. We also explore in a simulation setting the numerical behavior of asymptotic confidence regions for the parameter value.
Annales de l'Institut Henri Poincaré, Probabilités et Statistiques | 2009
Cristina Butucea; Catherine Matias; Christophe Pouet
Royal Society Open Science | 2017
Vincent Miele; Catherine Matias