Publication


Featured research published by Christophe Biernacki.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2000

Assessing a mixture model for clustering with the integrated completed likelihood

Christophe Biernacki; Gilles Celeux; Gérard Govaert

We propose a method for assessing mixture models in a cluster analysis setting based on the integrated completed likelihood. For this purpose, the observed data are assigned to unknown clusters using a maximum a posteriori operator. Then, the integrated completed likelihood (ICL) is approximated using the Bayesian information criterion (BIC). Numerical experiments on simulated and real data show that the resulting ICL criterion performs well both for choosing a mixture model and a relevant number of clusters. In particular, ICL appears to be more robust than BIC to violation of some of the mixture model assumptions, and it can select a number of clusters leading to a sensible partitioning of the data.
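
A minimal sketch of the BIC-based approximation, assuming scikit-learn's GaussianMixture (its bic() is on the lower-is-better -2 log-likelihood scale, so the ICL surrogate here is BIC plus twice the classification entropy):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def icl(gmm, X):
    # Posterior membership probabilities t_ik for each sample.
    tau = gmm.predict_proba(X)
    # Entropy of the soft classification; near zero for well-separated clusters.
    entropy = -np.sum(tau * np.log(np.clip(tau, 1e-300, None)))
    # sklearn's BIC is on the -2 log-likelihood scale, hence the factor 2.
    return gmm.bic(X) + 2.0 * entropy

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(4, 1, (150, 2))])
scores = {k: icl(GaussianMixture(k, random_state=0).fit(X), X)
          for k in range(1, 6)}
print(min(scores, key=scores.get))  # ICL-selected number of components
```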


Computational Statistics & Data Analysis | 2003

Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models

Christophe Biernacki; Gilles Celeux; Gérard Govaert

Simple methods to choose sensible starting values for the EM algorithm, in order to get maximum likelihood parameter estimates in mixture models, are compared. They are based on random initialization, using a classification EM algorithm (CEM), a stochastic EM algorithm (SEM), or previous short runs of EM itself. These initializations are included in a search/run/select strategy which can be compounded by repeating the three steps. They are compared in the context of multivariate Gaussian mixtures on the basis of numerical experiments on both simulated and real data sets, for a target number of iterations. The main conclusions of these numerical experiments are the following. Simple random initialization, which is probably the most widely employed way of initiating EM, is often outperformed by strategies using CEM, SEM or short runs of EM before running EM. Also, compounding appears generally profitable, since using a single run of EM can often lead to suboptimal solutions. Otherwise, none of the experimental strategies can be regarded as the best one, and it is difficult to characterize situations where a particular strategy can be expected to outperform the others. However, the strategy of initiating EM with short runs of EM can be recommended. This strategy, which as far as we know had not been used before the present study, has some advantages: it is simple, performs well in many situations, presupposes no particular form of the mixture to be fitted to the data, and seems relatively insensitive to noisy data.
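
A rough sketch of the short-runs strategy, assuming scikit-learn's GaussianMixture: several cheap EM runs are capped at a few iterations each, and the best run's parameters seed one long run.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
k = 2

# Short-run phase: many cheap EM runs from random starts, 5 iterations each.
# (Short runs stop early, so sklearn may emit ConvergenceWarning; expected.)
short_runs = [GaussianMixture(k, max_iter=5, n_init=1, init_params="random",
                              random_state=s).fit(X) for s in range(10)]
best = max(short_runs, key=lambda g: g.score(X))  # highest mean log-likelihood

# Long-run phase: full EM warm-started from the best short run's parameters.
final = GaussianMixture(k, max_iter=500,
                        weights_init=best.weights_,
                        means_init=best.means_,
                        precisions_init=best.precisions_).fit(X)
print(final.means_)
```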


Computational Statistics & Data Analysis | 2006

Model-based cluster and discriminant analysis with the MIXMOD software

Christophe Biernacki; Gilles Celeux; Gérard Govaert; Florent Langrognet

The Mixture Modeling (MIXMOD) program fits mixture models to a given data set for the purposes of density estimation, clustering or discriminant analysis. A large variety of algorithms to estimate the mixture parameters is proposed (EM, Classification EM, Stochastic EM), and it is possible to combine these to yield different strategies for obtaining a sensible maximum of the likelihood (or complete-data likelihood) function. MIXMOD is currently intended to be used for multivariate Gaussian mixtures, and fourteen different Gaussian models can be distinguished according to different assumptions regarding the eigenvalue decomposition of the component variance matrices. Moreover, different information criteria for choosing a parsimonious model (the number of mixture components, for instance) are included, their suitability depending on the particular perspective (cluster analysis or discriminant analysis). Written in C++, MIXMOD is interfaced with SCILAB and MATLAB. The program, the statistical documentation and the user guide are available on the internet at the following address: http://www-math.univ-fcomte.fr/mixmod/index.php.
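
The fourteen models come from the standard eigenvalue decomposition of each component variance matrix (notation of Celeux and Govaert):

    \Sigma_k = \lambda_k D_k A_k D_k^\top

where \lambda_k = |\Sigma_k|^{1/d} sets the component's volume, the orthogonal matrix D_k its orientation, and the diagonal matrix A_k (with |A_k| = 1) its shape; requiring any of these factors to be common across components, or taking D_k or A_k to be the identity, generates the parsimonious family.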


Pattern Recognition Letters | 1999

An improvement of the NEC criterion for assessing the number of clusters in a mixture model

Christophe Biernacki; Gilles Celeux; Gérard Govaert

The entropy criterion NEC has shown good performance for choosing the number of clusters arising from a mixture model, but it was not valid for deciding between one and more than one cluster. This note presents a natural extension of the criterion to deal with this situation. Illustrative experiments exhibit the good behavior of this modified entropy criterion.
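
For context, the usual definition (standard in this literature, not quoted from the note): for K > 1 components,

    NEC(K) = E(K) / (L(K) - L(1)),

where E(K) is the entropy of the fuzzy classification and L(K) the maximized log-likelihood with K components; the extension amounts to fixing NEC(1) = 1, so that more than one cluster is retained only when some K > 1 achieves NEC(K) < 1.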


Journal of Statistical Computation and Simulation | 1999

Choosing models in model-based clustering and discriminant analysis

Christophe Biernacki; Gérard Govaert

Using an eigenvalue decomposition of variance matrices, Celeux and Govaert (1993) obtained numerous and powerful models for Gaussian model-based clustering and discriminant analysis. Through Monte Carlo simulations, we compare the performance of many classical criteria for selecting these models: information criteria such as AIC, the Bayesian criterion BIC, classification criteria such as NEC, and cross-validation. In the clustering context, information criteria and BIC outperform the classification criteria. In the discriminant analysis context, cross-validation shows good performance, but information criteria and BIC give satisfactory results as well with, by far, less computing time.
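
scikit-learn exposes only four of these covariance structures, but a minimal analogue of the comparison looks like the sketch below (cross_val_score falls back on the model's mean held-out log-likelihood as its score):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score

X = np.random.default_rng(1).normal(size=(300, 3))

for cov in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(2, covariance_type=cov, random_state=0)
    bic = gmm.fit(X).bic(X)                    # lower is better
    cv = cross_val_score(gmm, X, cv=5).mean()  # held-out log-lik, higher is better
    print(f"{cov:10s}  BIC={bic:9.1f}  CV log-lik={cv:7.3f}")
```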


Statistics & Probability Letters | 2003

Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with EM

Christophe Biernacki; Stéphane Chrétien

As is well known, the likelihood of a Gaussian mixture is unbounded for any parameters placing a Dirac at an observed sample point. The behavior of the EM algorithm near a degenerate solution is studied. It is established that there exists a domain of attraction around degeneracy and that convergence to these particular solutions is extremely fast, confirming what many practitioners have noted in their experiments. Some available proposals to avoid degeneracy are discussed, but the presented convergence results make it possible to defend the pragmatic approach to the degeneracy problem in EM, which consists of random restarts.
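
A toy illustration of the unboundedness (mine, not from the paper): pin one component's mean at a data point and shrink its standard deviation; the log-likelihood diverges.

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.0, 1.2, 2.1, 3.3])

def loglik(sigma1):
    # Two-component mixture; component 1 collapses onto the point x[0].
    comp1 = norm.pdf(x, loc=x[0], scale=sigma1)
    comp2 = norm.pdf(x, loc=x.mean(), scale=x.std())
    return np.log(0.5 * comp1 + 0.5 * comp2).sum()

for s in [1.0, 0.1, 1e-2, 1e-4]:
    print(f"sigma1={s:g}  log-likelihood={loglik(s):.2f}")  # grows without bound
```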


Neurocomputing | 2014

Mixture of Gaussians for distance estimation with missing data

Emil Eirola; Amaury Lendasse; Vincent Vandewalle; Christophe Biernacki

Many data sets have missing values in practical application contexts, but the majority of commonly studied machine learning methods cannot be applied directly when there are incomplete samples. However, most such methods only depend on the relative differences between samples rather than their particular values, and thus one useful approach is to directly estimate the pairwise distances between all samples in the data set. This is accomplished by fitting a Gaussian mixture model to the data and using it to derive estimates for the distances. A variant of the model for high-dimensional data with missing values is also studied. Experimental simulations confirm that the proposed method provides accurate estimates compared to alternative methods for estimating distances. In particular, using the mixture model for estimating distances is on average more accurate than using the same model to impute any missing values and then calculating distances. The experimental evaluation additionally shows that more accurate distance estimates lead to improved prediction performance for classification and regression tasks when used as inputs for a neural network.
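
The core identity behind this (a standard property of Gaussian conditionals; the reading is mine, not quoted from the abstract): conditioning each incomplete sample on its observed entries under the fitted mixture yields a conditional mean \tilde{x}_i and conditional covariance \tilde{\Sigma}_i, and for independent samples

    E[\|x_i - x_j\|^2] = \|\tilde{x}_i - \tilde{x}_j\|^2 + tr(\tilde{\Sigma}_i) + tr(\tilde{\Sigma}_j).

Plain conditional-mean imputation keeps only the first term, which is why it tends to underestimate distances.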


Statistics and Computing | 2004

Initializing EM using the properties of its trajectories in Gaussian mixtures

Christophe Biernacki

A strategy is proposed to initialize the EM algorithm in the multivariate Gaussian mixture context. It consists in randomly drawing, with a low computational cost in many situations, initial mixture parameters in an appropriate space including all possible EM trajectories. This space is simply defined by two relations, satisfied by any EM iteration, between the first two empirical moments and the mixture parameters. An experimental study on simulated and real data sets clearly shows that this strategy outperforms classical methods, since it has the nice property of widely exploring local maxima of the likelihood function.
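
The two relations are presumably the moment constraints that every EM M-step preserves in the Gaussian case: with proportions p_k, means \mu_k and covariances \Sigma_k,

    \sum_k p_k \mu_k = \bar{x}   and   \sum_k p_k (\Sigma_k + \mu_k \mu_k^\top) = S + \bar{x}\bar{x}^\top,

where \bar{x} and S are the empirical mean and covariance matrix; drawing initial parameters inside this set keeps every start on a surface that EM iterations never leave.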


Computational Statistics & Data Analysis | 2013

A generative model for rank data based on insertion sort algorithm

Christophe Biernacki; Julien Jacques

An original and meaningful probabilistic generative model for full rank data is proposed. Rank data arise from a sorting mechanism which is generally unobservable to statisticians. Assuming that this process relies on paired comparisons, the insertion sort algorithm is known to be the best candidate for minimizing the number of potential paired misclassifications for a moderate number of objects to be ordered. Combining this optimality argument with a Bernoulli event at each paired comparison step, a model is obtained that possesses desirable theoretical properties, among them unimodality, symmetry and identifiability. Maximum likelihood estimation can also be performed easily through an EM or a SEM-Gibbs algorithm (depending on the number of objects to be ordered) by involving the latent initial presentation order of the objects. Finally, the practical relevance of the proposal is illustrated through its adequacy with several real data sets and a comparison with a standard rank data model.
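
A rough simulation of the generative mechanism as I read it; the linear insertion scheme and i.i.d. comparison errors with success probability pi are simplifications of mine, not the paper's exact algorithm.

```python
import numpy as np

def sample_rank(reference, pi, rng):
    """Draw one ranking: objects arrive in a random presentation order
    and are placed by an insertion sort whose paired comparisons are
    each correct with probability pi."""
    presentation = list(rng.permutation(len(reference)))
    result = []
    for obj in presentation:
        pos = 0
        for placed in result:
            # True ordering according to the reference ranking.
            obj_after = reference.index(obj) > reference.index(placed)
            # The comparison reports the truth with probability pi.
            says_after = obj_after if rng.random() < pi else not obj_after
            if says_after:
                pos += 1
            else:
                break
        result.insert(pos, obj)
    return result

rng = np.random.default_rng(0)
print(sample_rank([0, 1, 2, 3], pi=0.9, rng=rng))
```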


Journal of Applied Statistics | 2010

Extension of model-based classification for binary data when training and test populations differ

Julien Jacques; Christophe Biernacki

Standard discriminant analysis supposes that both the training sample and the test sample are drawn from the same population. When these samples arise from populations differing in their descriptive parameters, a generalization of discriminant analysis consists of adapting the classification rule related to the training population into a rule related to the test population by estimating a link map between the two populations. This paper extends existing work in the multinormal context to the case of binary data. In order to define a link map between the two binary populations, it is assumed that the binary data result from the discretization of latent Gaussian data. An estimation method and a robustness study are presented, and two applications in a biological context illustrate this work.
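
In standard notation the modelling idea reads as follows (the affine form of the link is an assumption carried over from the multinormal case, not a detail given in the abstract): each binary entry is a thresholded latent Gaussian,

    x^{(j)} = 1\{z^{(j)} \ge \tau_j\},   z \sim N(\mu, \Sigma),

and the test population's latent vector is linked to the training population's through an affine map z^* = D z + b, whose parameters are estimated from the test sample before re-deriving the classification rule.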

Collaboration


Dive into Christophe Biernacki's collaborations.

Top Co-Authors

Gérard Govaert (Centre national de la recherche scientifique)
Florent Langrognet (University of Franche-Comté)
Loic Yengo (University of Queensland)
Pierre Frankhauser (University of Franche-Comté)
Stéphane Chrétien (University of Franche-Comté)