
Publication


Featured research published by Vincent Vandewalle.


Neurocomputing | 2014

Mixture of Gaussians for distance estimation with missing data

Emil Eirola; Amaury Lendasse; Vincent Vandewalle; Christophe Biernacki

Many data sets have missing values in practical application contexts, but the majority of commonly studied machine learning methods cannot be applied directly when there are incomplete samples. However, most such methods only depend on the relative differences between samples rather than their particular values, so one useful approach is to directly estimate the pairwise distances between all samples in the data set. This is accomplished by fitting a Gaussian mixture model to the data and using it to derive estimates for the distances. A variant of the model for high-dimensional data with missing values is also studied. Experimental simulations confirm that the proposed method provides accurate estimates compared to alternative methods for estimating distances. In particular, using the mixture model for estimating distances is on average more accurate than using the same model to impute any missing values and then calculating distances. The experimental evaluation additionally shows that estimating distances more accurately leads to improved prediction performance for classification and regression tasks when the distances are used as inputs for a neural network.
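As a sketch of the core computation, shown here for the single-Gaussian special case (the paper's mixture version averages these conditional moments over components; the function names below are illustrative, not from any published code): the expected squared distance between two partially observed samples combines conditionally imputed means with the traces of the conditional covariances of the missing coordinates.

```python
import numpy as np

def conditional_moments(x, mu, cov):
    """Fill missing entries (NaN) of x with their conditional mean given
    the observed entries under N(mu, cov); also return the conditional
    covariance of the imputed coordinates, embedded in full dimension."""
    m = np.isnan(x)                      # mask of missing coordinates
    xhat = x.copy()
    var = np.zeros_like(cov)
    if m.any():
        o = ~m
        # Partitioned-Gaussian conditioning: mu_m + C_mo C_oo^{-1} (x_o - mu_o)
        C_oo = cov[np.ix_(o, o)]
        C_mo = cov[np.ix_(m, o)]
        xhat[m] = mu[m] + C_mo @ np.linalg.solve(C_oo, x[o] - mu[o])
        var[np.ix_(m, m)] = cov[np.ix_(m, m)] - C_mo @ np.linalg.solve(C_oo, C_mo.T)
    return xhat, var

def expected_sq_distance(xi, xj, mu, cov):
    """E[||xi - xj||^2] when some coordinates of xi, xj are missing,
    assuming the two samples are independent draws from N(mu, cov)."""
    xi_hat, vi = conditional_moments(xi, mu, cov)
    xj_hat, vj = conditional_moments(xj, mu, cov)
    return float(np.sum((xi_hat - xj_hat) ** 2) + np.trace(vi) + np.trace(vj))
```

With no missing values the trace terms vanish and this reduces to the plain squared Euclidean distance, which is the sanity check the estimator should pass.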


Communications in Statistics - Theory and Methods | 2017

Model-based clustering of Gaussian copulas for mixed data

Matthieu Marbac; Christophe Biernacki; Vincent Vandewalle

Clustering of mixed data is important yet challenging due to a shortage of conventional distributions for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Indeed copulas, and Gaussian copulas in particular, are powerful tools for easily modeling the distribution of multivariate variables. This model clusters data sets with continuous, integer, and ordinal variables (all having a cumulative distribution function) by considering the intra-component dependencies in a similar way to the Gaussian mixture. Indeed, each component of the Gaussian copula mixture produces a correlation coefficient for each pair of variables and its univariate margins follow standard distributions (Gaussian, Poisson, and ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As an interesting by-product, this model generalizes many well-known approaches and provides tools for visualization based on its parameters. The Bayesian inference is achieved with a Metropolis-within-Gibbs sampler. The numerical experiments, on simulated and real data, illustrate the benefits of the proposed model: flexible and meaningful parameterization combined with visualization features.
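The generative side of a single Gaussian-copula component can be sketched as follows, assuming illustrative margins (one Gaussian continuous variable and one Poisson count); the actual model additionally mixes several such components and performs Bayesian inference, which this sketch omits.

```python
import numpy as np
from scipy.stats import norm, poisson

def sample_gaussian_copula_mixed(n, R, rng=None):
    """Draw n samples whose dependence is a Gaussian copula with
    correlation matrix R; margin 0 is continuous N(10, 2), margin 1 is
    an integer Poisson(3) count (margins chosen purely for illustration)."""
    rng = np.random.default_rng(rng)
    z = rng.multivariate_normal(np.zeros(len(R)), R, size=n)  # latent Gaussian
    u = norm.cdf(z)                      # uniform margins, copula dependence
    x_cont = norm.ppf(u[:, 0], loc=10.0, scale=2.0)   # continuous margin
    x_int = poisson.ppf(u[:, 1], mu=3.0)              # integer margin
    return np.column_stack([x_cont, x_int])
```

The design point illustrated here is the one the abstract relies on: the correlation matrix R carries all the dependence, while each margin can follow whatever standard distribution matches the variable's nature.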


Journal of Classification | 2015

Model-Based Clustering for Conditionally Correlated Categorical Data

Matthieu Marbac; Christophe Biernacki; Vincent Vandewalle

An extension of the latent class model is presented for clustering categorical data by relaxing the classical “class conditional independence assumption” of variables. This model consists of grouping the variables into inter-independent and intra-dependent blocks in order to capture the main intra-class correlations. The dependency between variables grouped inside the same block of a class is taken into account by mixing two extreme distributions, which are respectively the independence and the maximum dependency distributions. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model. Indeed, it produces a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood by means of an EM algorithm. Moreover, a Gibbs sampler is used for model selection in order to overcome the computational intractability of the combinatorial problems involved in the block structure search. Two applications on medical and biological data sets show the relevance of this new model. The results strengthen the view that this model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.
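For context, the baseline being extended, the classical latent class model with full conditional independence, can be fit by a short EM loop. This is an illustrative sketch of that baseline only; it makes no attempt at the paper's block structure or Gibbs model search, and all names are invented for the example.

```python
import numpy as np

def lcm_em(X, K, L, n_iter=100, seed=0):
    """EM for the classical latent class model: K classes, d categorical
    variables each with L levels, variables independent given the class.
    X is an (n, d) integer array with values in {0, ..., L-1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                     # class proportions
    # theta[k, j, l] = P(X_j = l | class k), randomly initialised
    theta = rng.dirichlet(np.ones(L), size=(K, d))
    onehot = np.eye(L)[X]                        # (n, d, L) indicator coding
    for _ in range(n_iter):
        # E step: class responsibilities, stabilised before exponentiating
        logp = np.log(pi) + np.einsum('ndl,kdl->nk', onehot, np.log(theta))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M step: weighted level frequencies (tiny smoothing avoids log 0)
        pi = r.mean(axis=0)
        theta = np.einsum('nk,ndl->kdl', r, onehot) + 1e-6
        theta /= theta.sum(axis=2, keepdims=True)
    return pi, theta
```

On data with two well-separated response patterns, the two recovered classes should each capture roughly half of the samples.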


Advances in Data Analysis and Classification | 2016

Latent class model with conditional dependency per modes to cluster categorical data

Matthieu Marbac; Christophe Biernacki; Vincent Vandewalle

We propose a parsimonious extension of the classical latent class model to cluster categorical data by relaxing the conditional independence assumption. Under this new mixture model, named the conditional modes model (CMM), variables are grouped into conditionally independent blocks. Each block follows a parsimonious multinomial distribution in which the few free parameters model the probabilities of the most likely levels, while the remaining probability mass is uniformly spread over the other levels of the block. Thus, when the conditional independence assumption holds, this model defines parsimonious versions of the standard latent class model. Moreover, when this assumption is violated, the proposed model brings out the main intra-class dependencies between variables, thus summarizing each class with relatively few characteristic levels. Model selection is carried out by a hybrid MCMC algorithm that does not require preliminary parameter estimation. Maximum likelihood estimation is then performed via an EM algorithm only for the best model. The model properties are illustrated on simulated data and on three real data sets using the associated R package CoModes. The results show that this model reduces the biases induced by the conditional independence assumption while providing meaningful parameters.


ICNAAM 2011: International Conference on Numerical Analysis and Applied Mathematics | 2011

Label Switching in Mixtures

Christophe Biernacki; Vincent Vandewalle

We propose a posterior distribution for which the latent partition is restricted to a special numbering leading to the largest separation from its permutations. Two different measures of separation are proposed: the first is global but intractable even for very small sample sizes (Kullback divergence); the second is local and thus very easy to compute (difference of distributions at the MAP). A Gibbs algorithm allows easy sampling from this new distribution. The procedure is general enough to apply directly to any distribution, and experiments in Gaussian and multinomial settings show particularly encouraging results.
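A minimal sketch of the relabeling idea: align each MCMC draw to a reference estimate (e.g. the MAP) by exhaustive search over component permutations. This is an illustrative simplification in the spirit of the paper's local "difference at the MAP" criterion, not its exact restricted-posterior construction; the function name is invented for the example.

```python
import numpy as np
from itertools import permutations

def relabel(sample_means, ref_means):
    """Undo label switching for one MCMC draw: choose the component
    permutation that brings the drawn means closest (in squared error)
    to a reference ordering such as the MAP estimate.
    Exhaustive search is fine for the small K typical of mixtures."""
    K = len(ref_means)
    best, best_cost = None, np.inf
    for perm in permutations(range(K)):
        cost = sum(np.sum((sample_means[p] - ref_means[k]) ** 2)
                   for k, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return np.array(best)
```

The returned permutation can then be applied to the draw's means, covariances, and weights so that component k means the same thing across the whole chain.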


Computational Statistics & Data Analysis | 2018

A tractable multi-partitions clustering

Matthieu Marbac; Vincent Vandewalle

In the framework of model-based clustering, a model allowing several latent class variables is proposed. This model assumes that the distribution of the observed data can be factorized into several independent blocks of variables. Each block is assumed to follow a latent class model (i.e., a mixture with the conditional independence assumption). The proposed model includes variable selection as a special case and is able to cope with the mixed-data setting. The simplicity of the model allows the repartition of the variables into blocks and the mixture parameters to be estimated simultaneously, thus avoiding running an EM algorithm for each possible repartition of variables into blocks. For the proposed method, a model is defined by the number of blocks, the number of clusters inside each block, and the repartition of variables into blocks. Model selection can be done with two information criteria, the BIC and the MICL, for which an efficient optimization is proposed. The performance of the model is investigated on simulated and real data. It is shown that the proposed method gives a rich interpretation of the data set at hand (i.e., analysis of the repartition of the variables into blocks and analysis of the clusters produced by each block of variables).
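The additivity that makes the search over partitions tractable can be sketched as follows: because blocks are independent, the criterion for a whole partition is the sum of per-block criteria. Purely for illustration, each block is scored here by the BIC of a single multivariate Gaussian rather than the latent class models used in the paper, and all names are invented for the example.

```python
import numpy as np

def gaussian_block_bic(Xb):
    """BIC of one block of columns under a single multivariate Gaussian
    (an illustrative stand-in for the per-block model in the paper)."""
    n, d = Xb.shape
    mu = Xb.mean(axis=0)
    cov = np.atleast_2d(np.cov(Xb, rowvar=False, bias=True)) + 1e-6 * np.eye(d)
    diff = Xb - mu
    # Gaussian log-likelihood via slogdet for numerical stability
    ll = -0.5 * (n * (d * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1])
                 + np.einsum('ni,ij,nj->', diff, np.linalg.inv(cov), diff))
    n_params = d + d * (d + 1) // 2          # mean + covariance parameters
    return ll - 0.5 * n_params * np.log(n)

def partition_bic(X, partition):
    """Blocks are independent, so the BIC of a variable partition is the
    sum of per-block BICs; this is the additivity exploited for search."""
    return sum(gaussian_block_bic(X[:, list(b)]) for b in partition)
```

On data where two variables are correlated and a third is independent, the partition grouping the correlated pair should outscore both the fully independent and the fully joint alternatives.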


Computational Statistics & Data Analysis | 2013

A predictive deviance criterion for selecting a generative model in semi-supervised classification

Vincent Vandewalle; Christophe Biernacki; Gilles Celeux; Gérard Govaert


arXiv: Methodology | 2014

Finite mixture model of conditional dependencies modes to cluster categorical data.

Matthieu Marbac; Christophe Biernacki; Vincent Vandewalle


IEEE PHM 2017 | 2017

Survival analysis with complex covariates: a model-based clustering preprocessing step

Vincent Vandewalle; Christophe Biernacki


ICB Seminars 2017 - 154th Seminar on “Statistics and clinical practice” | 2017

Dealing with missing data through mixture models

Vincent Vandewalle; Christophe Biernacki

Collaboration


Dive into Vincent Vandewalle's collaborations.

Top Co-Authors

Matthieu Marbac (French Institute for Research in Computer Science and Automation)
Cathy Maugis (Institut national des sciences appliquées de Toulouse)
Jean-Michel Poggi (Paris Descartes University)
Lionel Cucala (University of Montpellier)
Stéphane Chrétien (University of Franche-Comté)
Gérard Govaert (Centre national de la recherche scientifique)