Gérard Govaert
Centre national de la recherche scientifique
Publications
Featured research published by Gérard Govaert.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2000
Christophe Biernacki; Gilles Celeux; Gérard Govaert
We propose a method for assessing mixture models in a cluster analysis setting based on the integrated completed likelihood. For this purpose, the observed data are assigned to unknown clusters using a maximum a posteriori operator. Then, the integrated completed likelihood (ICL) is approximated using the Bayesian information criterion (BIC). Numerical experiments on simulated and real data show that the resulting ICL criterion performs well both for choosing a mixture model and a relevant number of clusters. In particular, ICL appears to be more robust than BIC to violation of some of the mixture model assumptions, and it can select a number of clusters leading to a sensible partitioning of the data.
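A minimal sketch of the criterion, using scikit-learn's GaussianMixture rather than the authors' code: ICL is approximated here as BIC plus twice the classification entropy of the posterior probabilities (the paper's MAP version replaces the soft posteriors with hard assignments). The helper name icl_bic and the simulated data are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def icl_bic(gmm, X):
    """ICL approximated as BIC plus twice the classification entropy.
    Lower is better, matching scikit-learn's BIC convention."""
    tau = gmm.predict_proba(X)  # posterior probabilities t_ik
    entropy = -np.sum(tau * np.log(np.clip(tau, 1e-300, None)))
    return gmm.bic(X) + 2.0 * entropy

# choose the number of clusters by minimizing ICL
X = np.random.default_rng(0).normal(size=(300, 2))
scores = {k: icl_bic(GaussianMixture(n_components=k, random_state=0).fit(X), X)
          for k in range(1, 6)}
best_k = min(scores, key=scores.get)
```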
Computational Statistics & Data Analysis | 1992
Gilles Celeux; Gérard Govaert
Setting the optimization-based clustering methods under the classification maximum likelihood approach, we define and study a general Classification EM algorithm. We then derive from this algorithm two stochastic algorithms, incorporating random perturbations, to reduce the initial-position dependence of classical optimization clustering algorithms. Numerical experiments, reported for the variance criterion, show that both stochastic algorithms perform well compared with the standard k-means algorithm, which is a particular version of the Classification EM algorithm.
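A compact sketch of the Classification EM idea, assuming a spherical, equal-proportion Gaussian mixture with a common fixed variance, for which the C step reduces to nearest-centre assignment and the algorithm coincides with k-means; the function name cem_spherical is an illustrative assumption.

```python
import numpy as np

def cem_spherical(X, k, n_iter=50, seed=0):
    """Classification EM for a spherical, equal-proportion Gaussian mixture
    with common fixed variance: the E and C steps collapse to nearest-centre
    assignment, so this is exactly the k-means algorithm."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # E + C steps: hard MAP assignment (here, nearest centre)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)
        # M step: re-estimate each centre from its cluster
        centers = np.array([X[z == j].mean(0) if np.any(z == j) else centers[j]
                            for j in range(k)])
    return z, centers
```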
Computational Statistics & Data Analysis | 2003
Christophe Biernacki; Gilles Celeux; Gérard Govaert
Simple methods for choosing sensible starting values for the EM algorithm, used to obtain maximum likelihood parameter estimates in mixture models, are compared. They are based on random initialization, using a classification EM algorithm (CEM), a stochastic EM algorithm (SEM), or previous short runs of EM itself. These initializations are embedded in a search/run/select strategy whose three steps can be repeated. They are compared in the context of multivariate Gaussian mixtures through numerical experiments on both simulated and real data sets, within a fixed number of iterations. The main conclusions of these numerical experiments are the following. Simple random initialization, which is probably the most common way of starting EM, is often outperformed by strategies using CEM, SEM, or short runs of EM before running EM itself. It also appears that compounding the steps is generally profitable, since a single run of EM can often lead to suboptimal solutions. Otherwise, none of the strategies tested can be regarded as uniformly best, and it is difficult to characterize situations where a particular strategy can be expected to outperform the others. However, the strategy initiating EM with short runs of EM can be recommended. This strategy, which to our knowledge had not been used before the present study, has several advantages: it is simple, performs well in many situations, presupposes no particular form of the mixture to be fitted to the data, and seems little sensitive to noisy data.
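A sketch of the recommended "short runs of EM" strategy, assuming scikit-learn's GaussianMixture (the paper predates this library, and the parameter choices below are arbitrary): several brief EM runs from random starts, then a long run warm-started from the best of them. The short runs typically stop before convergence, which is the point of the strategy.

```python
from sklearn.mixture import GaussianMixture

def em_with_short_runs(X, k, n_starts=10, short_iter=5, seed=0):
    """Run several short EM chains from random starts, keep the one with the
    highest lower bound on the log-likelihood, then continue it to
    convergence with a long EM run."""
    best = None
    for s in range(n_starts):
        g = GaussianMixture(n_components=k, max_iter=short_iter,
                            init_params="random", random_state=seed + s).fit(X)
        if best is None or g.lower_bound_ > best.lower_bound_:
            best = g
    # long run warm-started from the best short run's parameters
    return GaussianMixture(n_components=k, max_iter=500,
                           weights_init=best.weights_, means_init=best.means_,
                           precisions_init=best.precisions_).fit(X)
```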
Pattern Recognition | 2003
Gérard Govaert; Mohamed Nadif
Basing cluster analysis on mixture models has become a classical and powerful approach. Until now, this approach, which makes it possible to explain some classical clustering criteria such as the well-known k-means criterion and to propose more general ones, has been developed to classify a set of objects measured on a set of variables. For this kind of data, most clustering procedures are designed to construct an optimal partition of the objects or, sometimes, of the variables; other methods, called block clustering methods, consider the two sets simultaneously and organize the data into homogeneous blocks. In this work, a new mixture model, called the block mixture model, is proposed to handle this situation. It embeds the simultaneous clustering of objects and variables in a mixture approach. We first consider this probabilistic model in a general context and develop a new algorithm for simultaneous partitioning based on the CEM algorithm. We then focus on binary data and show that our approach extends a block clustering method previously proposed for this case. Simplicity, fast convergence and the ability to process large data sets are the main advantages of the proposed approach.
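The following toy sketch conveys the block (co-)clustering idea for binary data with hard assignments, in the spirit of a CEM-type algorithm; it assumes equal mixing proportions and omits them from the criterion, so it is a simplification for illustration, not the authors' algorithm.

```python
import numpy as np

def block_cem_binary(X, g, m, n_iter=20, seed=0):
    """Alternately reassign rows and columns of a binary matrix to the
    (row-class, column-class) blocks whose Bernoulli parameters best
    explain them, then re-estimate the block parameters."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = rng.integers(g, size=n)  # row labels
    w = rng.integers(m, size=d)  # column labels
    for _ in range(n_iter):
        # M step: Bernoulli parameter of each block
        alpha = np.array([[X[np.ix_(z == k, w == l)].mean()
                           if np.any(z == k) and np.any(w == l) else 0.5
                           for l in range(m)] for k in range(g)])
        alpha = np.clip(alpha, 1e-6, 1 - 1e-6)
        L1, L0 = np.log(alpha), np.log(1 - alpha)
        # C step on rows: per-row counts of ones in each column class
        S = np.stack([X[:, w == l].sum(1) for l in range(m)], axis=1)
        N = np.array([(w == l).sum() for l in range(m)])
        z = (S @ L1.T + (N - S) @ L0.T).argmax(1)
        # C step on columns, symmetrically
        T = np.stack([X[z == k].sum(0) for k in range(g)], axis=1)
        M = np.array([(z == k).sum() for k in range(g)])
        w = (T @ L1 + (M - T) @ L0).argmax(1)
    return z, w
```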
Computational Statistics & Data Analysis | 2006
Christophe Biernacki; Gilles Celeux; Gérard Govaert; Florent Langrognet
The Mixture Modeling (MIXMOD) program fits mixture models to a given data set for the purposes of density estimation, clustering or discriminant analysis. A large variety of algorithms to estimate the mixture parameters are proposed (EM, Classification EM, Stochastic EM), and it is possible to combine these to yield different strategies for obtaining a sensible maximum for the likelihood (or complete-data likelihood) function. MIXMOD is currently intended to be used for multivariate Gaussian mixtures, and fourteen different Gaussian models can be distinguished according to different assumptions regarding the component variance matrix eigenvalue decomposition. Moreover, different information criteria for choosing a parsimonious model (the number of mixture components, for instance) are included, their suitability depending on the particular perspective (cluster analysis or discriminant analysis). Written in C++, MIXMOD is interfaced with SCILAB and MATLAB. The program, the statistical documentation and the user guide are available on the internet at the following address: http://www-math.univ-fcomte.fr/mixmod/index.php.
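MIXMOD itself is written in C++ with SCILAB and MATLAB interfaces; as a rough Python analogue only, scikit-learn's four covariance structures play the role of a small subset of the fourteen eigenvalue-decomposition models, and BIC can be compared across them. The data below are a placeholder.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(1).normal(size=(200, 3))  # placeholder data
# each covariance_type constrains the structure of the component variance
# matrices, as the fourteen MIXMOD models do at a finer grain
for cov in ("spherical", "diag", "tied", "full"):
    g = GaussianMixture(n_components=3, covariance_type=cov,
                        random_state=0).fit(X)
    print(cov, g.bic(X))
```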
Pattern Recognition Letters | 1999
Christophe Biernacki; Gilles Celeux; Gérard Govaert
The entropy criterion NEC has shown good performance for choosing the number of clusters arising from a mixture model, but it is not valid for deciding between one cluster and more than one. This note presents a natural extension of the criterion to deal with this situation. Illustrative experiments show the good behavior of the modified entropy criterion.
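A sketch of the extended criterion, again assuming scikit-learn's GaussianMixture: NEC(K) is the classification entropy divided by the log-likelihood gain over the one-component model, and the extension amounts to setting NEC(1) = 1, so a single cluster is retained unless some K gives NEC(K) < 1. The function name nec is an illustrative assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def nec(X, k):
    """NEC(K) = entropy(K) / (logL(K) - logL(1)), with NEC(1) set to 1 as in
    the extension discussed in the note."""
    if k == 1:
        return 1.0
    gk = GaussianMixture(n_components=k, random_state=0).fit(X)
    g1 = GaussianMixture(n_components=1, random_state=0).fit(X)
    tau = gk.predict_proba(X)
    entropy = -np.sum(tau * np.log(np.clip(tau, 1e-300, None)))
    gain = (gk.score(X) - g1.score(X)) * len(X)  # total log-likelihood gain
    return entropy / gain
```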
Journal of Statistical Computation and Simulation | 1999
Christophe Biernacki; Gérard Govaert
Using an eigenvalue decomposition of variance matrices, Celeux and Govaert (1993) obtained numerous and powerful models for Gaussian model-based clustering and discriminant analysis. Through Monte Carlo simulations, we compare the performance of several classical criteria for selecting these models: information criteria such as AIC, the Bayesian criterion BIC, classification criteria such as NEC, and cross-validation. In the clustering context, the information criteria and BIC outperform the classification criteria. In the discriminant analysis context, cross-validation shows good performance, but the information criteria and BIC give satisfactory results as well with, by far, less computing time.
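A miniature version of such a comparison, assuming scikit-learn's GaussianMixture and its aic/bic methods (cross-validation and NEC are omitted for brevity): every (number of components, covariance model) pair is scored and the argmin under each criterion returned.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_model(X, ks=(1, 2, 3, 4),
                 cov_types=("spherical", "diag", "tied", "full")):
    """Score every (K, covariance model) pair with AIC and BIC and return
    the best pair under each criterion."""
    best = {"aic": (np.inf, None), "bic": (np.inf, None)}
    for k in ks:
        for ct in cov_types:
            g = GaussianMixture(n_components=k, covariance_type=ct,
                                random_state=0).fit(X)
            for name, value in (("aic", g.aic(X)), ("bic", g.bic(X))):
                if value < best[name][0]:
                    best[name] = (value, (k, ct))
    return best
```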
Journal of Statistical Computation and Simulation | 1993
Gilles Celeux; Gérard Govaert
Generally, the mixture and classification approaches via maximum likelihood have been contrasted under different underlying assumptions. In the classification approach, the mixing proportions are assumed to be equal whereas, in the mixture approach, they are supposed to be unknown. In this paper, Monte Carlo numerical experiments comparing both approaches, mixture and classification, under both assumptions, equal and unknown mixing proportions, are reported. These numerical experiments show that the assumption on the mixing proportions is a more sensitive factor than the choice of the clustering approach, especially in the small sample setting. Moreover, the differences between the finite sample and asymptotic behaviour of both approaches are analyzed through additional simulations.
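The contrast between the two assumptions can be made concrete in a tiny univariate EM where the mixing proportions are either held equal or re-estimated freely; everything below (function name, parameterization) is an illustrative assumption, not the paper's experimental setup.

```python
import numpy as np

def em_univariate(x, k, equal_proportions=False, n_iter=100, seed=0):
    """Univariate Gaussian-mixture EM; with equal_proportions=True the
    mixing proportions stay fixed at 1/k, otherwise they are re-estimated
    at each M step."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E step: posterior probabilities (normalizing constants cancel)
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        tau = dens / dens.sum(1, keepdims=True)
        # M step: update means, standard deviations and, optionally, proportions
        nk = tau.sum(0)
        mu = (tau * x[:, None]).sum(0) / nk
        sigma = np.sqrt((tau * (x[:, None] - mu) ** 2).sum(0) / nk)
        if not equal_proportions:
            pi = nk / len(x)
    return pi, mu, sigma
```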
Computational Statistics & Data Analysis | 2008
Gérard Govaert; Mohamed Nadif
The block or simultaneous clustering problem on a set of objects and a set of variables is embedded in the mixture model. Two algorithms have been developed: block EM, as part of the maximum likelihood and fuzzy approaches, and block CEM, as part of the classification maximum likelihood approach. A unified framework for obtaining different variants of block EM is proposed. These variants are studied and their performances evaluated in comparison with block CEM, two-way EM and two-way CEM, i.e., EM and CEM applied separately to the two sets.
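The "two-way" baselines used in the comparison apply an ordinary one-way clustering to rows and columns independently; a minimal sketch, using k-means from scikit-learn as a stand-in for one-way EM/CEM:

```python
from sklearn.cluster import KMeans

def two_way_clustering(X, g, m, seed=0):
    """Cluster the objects (rows) and the variables (columns) separately,
    rather than jointly as in block EM / block CEM."""
    z = KMeans(n_clusters=g, n_init=10, random_state=seed).fit_predict(X)
    w = KMeans(n_clusters=m, n_init=10, random_state=seed).fit_predict(X.T)
    return z, w
```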
Pattern Recognition Letters | 1998
Christophe Ambroise; Gérard Govaert
Ambroise et al. (1996) proposed a clustering algorithm that is well suited to dealing with spatial data. This algorithm, derived from the EM algorithm (Dempster et al., 1977), was designed for penalized likelihood estimation in situations with unobserved class labels. Very satisfactory empirical results led us to believe that this algorithm converges (Ambroise et al., 1996), but its convergence had not been proven theoretically. In this paper, we present sufficient conditions and a proof of the convergence. A practical application illustrates the use of the algorithm.
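In the spatial algorithm of Ambroise et al. (1996), the penalized likelihood leads to an E step that is itself a fixed-point iteration; the sketch below shows only that step, assuming dens holds the component densities weighted by the mixing proportions and V is a symmetric spatial adjacency matrix. It is a simplified illustration, not the authors' implementation.

```python
import numpy as np

def nem_e_step(dens, V, beta, n_inner=10):
    """Fixed-point E step of a spatially penalized EM: the posteriors are
    reweighted by exp(beta * V @ c), so neighbouring sites (encoded by the
    adjacency matrix V) are pushed towards the same class."""
    c = dens / dens.sum(1, keepdims=True)  # start from the plain posteriors
    for _ in range(n_inner):
        w = dens * np.exp(beta * (V @ c))
        c = w / w.sum(1, keepdims=True)
    return c
```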