Christophe Ambroise
Centre national de la recherche scientifique
Publication
Featured research published by Christophe Ambroise.
Proceedings of the National Academy of Sciences of the United States of America | 2002
Christophe Ambroise; Geoffrey J. McLachlan
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
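The difference between internal and external cross-validation can be sketched as follows. This is a minimal illustration, not the authors' code: the pure-noise data, the mean-difference gene ranking, the nearest-centroid rule, and all function names (`top_genes`, `centroid_predict`, `cv_error`) are assumptions chosen for brevity. The only point it demonstrates is where gene selection happens relative to the folds.

```python
import random

def top_genes(X, y, rows, k):
    """Rank genes by absolute difference of class means, using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]
    def score(g):
        mean_pos = sum(X[i][g] for i in pos) / len(pos)
        mean_neg = sum(X[i][g] for i in neg) / len(neg)
        return abs(mean_pos - mean_neg)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def centroid_predict(X, y, train, genes, i):
    """Nearest-centroid prediction for sample i, restricted to `genes`."""
    dists = {}
    for label in (0, 1):
        members = [j for j in train if y[j] == label]
        centroid = [sum(X[j][g] for j in members) / len(members) for g in genes]
        dists[label] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(dists, key=dists.get)

def cv_error(X, y, folds, k, external):
    """Cross-validated error; gene selection is redone per fold only if `external`."""
    all_rows = list(range(len(X)))
    # Internal (biased) variant: genes are chosen once, using every sample.
    fixed = None if external else top_genes(X, y, all_rows, k)
    errors = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in all_rows if i not in held]
        # External variant: selection sees the training part of the fold only.
        genes = top_genes(X, y, train, k) if external else fixed
        errors += sum(centroid_predict(X, y, train, genes, i) != y[i] for i in fold)
    return errors / len(X)

# Hypothetical data: 40 samples, 1000 "genes" of pure noise, so no rule
# should genuinely beat chance (50% error).
rng = random.Random(0)
n, p, k = 40, 1000, 10
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [0] * 20 + [1] * 20
# Ten stratified folds: each holds two samples of each class.
folds = [list(range(f, n, 10)) for f in range(10)]

biased_err = cv_error(X, y, folds, k, external=False)
external_err = cv_error(X, y, folds, k, external=True)
print(f"selection outside CV: {biased_err:.2f}  selection inside CV: {external_err:.2f}")
```

On noise data of this kind, the internal estimate would be expected to look optimistic while the external one stays near chance, which is the selection bias the abstract describes.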
The Annals of Applied Statistics | 2011
Pierre Latouche; Etienne Birmelé; Christophe Ambroise
Complex systems in nature and in society are often represented as networks, describing the rich set of interactions between objects of interest. Many deterministic and probabilistic clustering methods have been developed to analyze such structures. Given a network, almost all of them partition the vertices into disjoint clusters, according to their connection profile. However, recent studies have shown that these techniques were too restrictive and that most of the existing networks contained overlapping clusters. To tackle this issue, we present in this paper the Overlapping Stochastic Block Model. Our approach allows the vertices to belong to multiple clusters, and, to some extent, generalizes the well-known Stochastic Block Model [Nowicki and Snijders (2001)]. We show that the model is generically identifiable within classes of equivalence and we propose an approximate inference procedure, based on global and local variational techniques. Using toy data sets as well as the French Political Blogosphere network and the transcriptional network of Saccharomyces cerevisiae, we compare our work with other approaches.
Neurocomputing | 2000
Christophe Ambroise; Geneviève Sèze; Fouad Badran; Sylvie Thiria
This paper presents a new method for segmenting multispectral satellite images. The proposed method is unsupervised and consists of two steps. During the first step the pixels of a learning set are summarized by a set of codebook vectors using a Probabilistic Self-Organizing Map (PSOM, Statistique et methodes neuronales, Dunod, Paris, 1997). In a second step the codebook vectors of the map are clustered using Agglomerative Hierarchical Clustering (AHC, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, 1996). Each pixel takes the label of its nearest codebook vector. A practical application to Meteosat images illustrates the relevance of our approach.
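The second step and the final labelling can be sketched as below. This is a rough illustration under stated assumptions: the codebook is taken as already learned (PSOM training is not reproduced), average linkage on squared Euclidean distance is an arbitrary choice, and the toy two-channel values and the names `agglomerate` and `segment` are hypothetical.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two spectral vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(codebook, n_clusters):
    """Average-linkage AHC over codebook vectors; returns one label per vector."""
    clusters = [[i] for i in range(len(codebook))]

    def linkage(c1, c2):
        total = sum(sqdist(codebook[i], codebook[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        a, b = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[a] += clusters[b]
        del clusters[b]

    labels = [0] * len(codebook)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def segment(pixels, codebook, codebook_labels):
    """Each pixel takes the label of its nearest codebook vector."""
    def nearest(pixel):
        return min(range(len(codebook)), key=lambda k: sqdist(pixel, codebook[k]))
    return [codebook_labels[nearest(pixel)] for pixel in pixels]

# Hypothetical two-channel codebook, standing in for a trained map.
codebook = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
codebook_labels = agglomerate(codebook, n_clusters=2)
pixels = [(0.02, 0.1), (0.97, 0.9), (0.2, 0.0)]
pixel_labels = segment(pixels, codebook, codebook_labels)
print(codebook_labels, pixel_labels)  # [0, 0, 1, 1] [0, 1, 0]
```

Clustering the codebook rather than the pixels keeps the hierarchical step cheap, since its cost depends on the map size and not on the image size.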
Pattern Recognition | 2008
Hugo Zanghi; Christophe Ambroise; Vincent Miele
In the context of graph clustering, we consider the problem of simultaneously estimating both the partition of the graph nodes and the parameters of an underlying mixture of affiliation networks. In numerous applications the rapid increase of data size over time makes classical clustering algorithms too slow because of their high computational cost. In such situations online clustering algorithms are an efficient alternative to classical batch algorithms. We present an original online algorithm for graph clustering based on an Erdős-Rényi graph mixture. The relevance of the algorithm is illustrated using both simulated and real data sets. The real data set is a network extracted from the French political blogosphere and presents an interesting community organization.
Archive | 1997
Christophe Ambroise; Mô Dang; Gérard Govaert
A clustering algorithm for spatial data is presented. It seeks a fuzzy partition which is optimal according to a criterion interpretable as a penalized likelihood. We propose to penalize the energy function exhibited by Hathaway (1986) with a term taking into account spatial contiguity constraints. The structure of the EM algorithm may be used to maximize the proposed criterion. The Maximization step is then unchanged and the Expectation step becomes iterative. The efficiency of the new clustering algorithm has been tested with biological images and compared with other clustering techniques.
Statistical Modelling | 2012
Pierre Latouche; Etienne Birmelé; Christophe Ambroise
It is now widely accepted that knowledge can be acquired from networks by clustering their vertices according to the connection profiles. Many methods have been proposed and in this paper we concentrate on the Stochastic Block Model (SBM). The clustering of vertices and the estimation of SBM model parameters have been subject to previous work, and numerous inference strategies such as variational expectation maximization (EM) and classification EM have been proposed. However, SBM still suffers from a lack of criteria to estimate the number of components in the mixture. To our knowledge, only one model-based criterion, Integrated Complete-data Likelihood (ICL), has been derived for SBM in the literature. It relies on an asymptotic approximation of the integrated complete-data likelihood and recent studies have shown that it tends to be too conservative in the case of small networks. To tackle this issue, we propose a new criterion that we call Integrated Likelihood Variational Bayes (ILvb), based on a non-asymptotic approximation of the marginal likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm.
Statistics and Computing | 2011
Julien Chiquet; Yves Grandvalet; Christophe Ambroise
Gaussian Graphical Models provide a convenient framework for representing dependencies between variables. Recently, this tool has attracted considerable interest for the discovery of biological networks. The literature focuses on the case where a single network is inferred from a set of measurements. But, as wet-lab data is typically scarce, several assays, where the experimental conditions affect interactions, are usually merged to infer a single network. In this paper, we propose two approaches for estimating multiple related graphs, by rendering the closeness assumption into an empirical prior or group penalties. We provide quantitative results demonstrating the benefits of the proposed approaches. The methods presented in this paper are embedded in the R package simone from version 1.0-0 onward.
Electronic Journal of Statistics | 2009
Christophe Ambroise; Julien Chiquet; Catherine Matias
Our concern is selecting the concentration matrix's nonzero coefficients for a sparse Gaussian graphical model in a high-dimensional setting. This corresponds to estimating the graph of conditional dependencies between the variables. We describe a novel framework taking into account a latent structure on the concentration matrix. This latent structure is used to drive a penalty matrix and thus to recover a graphical model with a constrained topology. Our method uses an $\ell_1$-penalized likelihood criterion. Inference of the graph of conditional dependencies between the variates and of the hidden variables is performed simultaneously in an iterative EM-like algorithm named SIMoNe (Statistical Inference for Modular Networks). Performance is illustrated on synthetic as well as real data, the latter concerning breast cancer. For gene regulation networks, our method can provide useful insight both into the mutual influence existing between genes and into the modules existing in the network.
PLOS ONE | 2011
Matthieu Bouaziz; Christophe Ambroise; Mickael Guedj
Pattern Recognition Letters | 2006
Aurélien Cord; Christophe Ambroise; Jean Pierre Cocquerez