Marie Chavent
University of Bordeaux
Publication
Featured research published by Marie Chavent.
Pattern Recognition Letters | 2006
Francisco de A. T. de Carvalho; Renata M. C. R. de Souza; Marie Chavent; Yves Lechevallier
This paper presents a partitional dynamic clustering method for interval data based on adaptive Hausdorff distances. Dynamic clustering algorithms are iterative two-step relocation algorithms involving the construction of the clusters at each iteration and the identification of a suitable representation or prototype (means, axes, probability laws, groups of elements, etc.) for each cluster by locally optimizing an adequacy criterion that measures the fitting between the clusters and their corresponding representatives. In this paper, each pattern is represented by a vector of intervals. Adaptive Hausdorff distances are the measures used to compare two interval vectors. The adaptive distances change at each iteration, for each cluster, according to its intra-class structure. The advantage of these adaptive distances is that the clustering algorithm is able to recognize clusters of different shapes and sizes. To evaluate this method, experiments with real and synthetic interval data sets were performed. The evaluation is based on an external cluster validity index (the corrected Rand index) within the framework of a Monte Carlo experiment with 100 replications. These experiments showed the usefulness of the proposed method.
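The (non-adaptive) Hausdorff comparison the method builds on can be sketched as follows: for two intervals [a1, b1] and [a2, b2], the Hausdorff distance reduces to max(|a1 − a2|, |b1 − b2|), and interval vectors are compared coordinate-wise. Function names are illustrative, and the paper's adaptive per-cluster weights are omitted.

```python
# Illustrative sketch, not the authors' code: Hausdorff comparison of
# interval data represented as (lower, upper) pairs.

def hausdorff_interval(i1, i2):
    """Hausdorff distance between two intervals given as (lower, upper)."""
    (a1, b1), (a2, b2) = i1, i2
    return max(abs(a1 - a2), abs(b1 - b2))

def hausdorff_vector(x, y):
    """Sum of coordinate-wise Hausdorff distances between interval vectors."""
    return sum(hausdorff_interval(xi, yi) for xi, yi in zip(x, y))

x = [(1.0, 3.0), (0.0, 2.0)]
y = [(2.0, 5.0), (1.0, 2.5)]
print(hausdorff_vector(x, y))  # 2.0 + 1.0 = 3.0
```

In the adaptive variant described in the abstract, each coordinate-wise term would additionally carry a cluster-specific weight that is re-estimated at every iteration.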
Pattern Recognition Letters | 1998
Marie Chavent
The proposed divisive clustering method simultaneously builds a hierarchy of a set of objects and provides a monothetic characterization of each cluster of the hierarchy. A division is performed according to the within-cluster inertia criterion, which is minimized among the bipartitions induced by a set of binary questions. In order to improve the clustering, the algorithm revises at each step the division which has induced the cluster chosen for division.
Journal of Classification | 2002
Marie Chavent; Yves Lechevallier
In order to extend the dynamical clustering algorithm to interval data sets, we define the prototype of a cluster by optimization of a classical adequacy criterion based on the Hausdorff distance. Once this class prototype is properly defined, we give a simple and convergent algorithm for this new type of interval data.
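A minimal sketch of the kind of prototype such a criterion leads to, using the standard identity that the Hausdorff distance between two intervals equals |c1 − c2| + |r1 − r2| (c = midpoint, r = half-length): the interval minimizing the sum of Hausdorff distances to a cluster takes the median midpoint and the median half-length. Function names are illustrative, not from the paper.

```python
# Hedged sketch: optimal prototype interval for a cluster of intervals,
# assuming the midpoint/half-length decomposition of the Hausdorff distance.
import statistics

def interval_prototype(intervals):
    """Prototype interval for a cluster of (lower, upper) intervals."""
    mids = [(a + b) / 2 for a, b in intervals]   # interval midpoints
    radii = [(b - a) / 2 for a, b in intervals]  # interval half-lengths
    c, r = statistics.median(mids), statistics.median(radii)
    return (c - r, c + r)

cluster = [(0.0, 2.0), (1.0, 5.0), (2.0, 4.0)]
print(interval_prototype(cluster))  # (2.0, 4.0)
```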
Phytochemistry | 2001
Sébastien Mongrand; Alain Badoc; Brigitte Patouille; Chantal Lacomblez; Marie Chavent; Claude Cassagne; Jean-Jacques Bessoule
The fatty acid composition of photosynthetic tissues from 137 species of gymnosperms belonging to 14 families was determined by gas chromatography. Statistical analysis clearly discriminated four groups. Ginkgoaceae, Cycadaceae, Stangeriaceae, Zamiaceae, Sciadopityaceae, Podocarpaceae, Cephalotaxaceae, Taxaceae, Ephedraceae and Welwitschiaceae are in the first group, while Cupressaceae and Araucariaceae are mainly in the second one. The third and fourth groups, composed of Pinaceae species, are characterized by the genera Larix, and Abies and Cedrus, respectively. Principal component and discriminant analyses and divisive hierarchical clustering analysis of the 43 Pinaceae species were also performed. A clear-cut separation of the genera Abies, Larix, and Cedrus from the other Pinaceae was evidenced. In addition, a mass analysis of the two main chloroplastic lipids from 14 gymnosperms was performed. The results point to a notable originality of gymnosperms: in several species, and contrary to angiosperms, the amount of digalactosyldiacylglycerol exceeds that of monogalactosyldiacylglycerol.
Journal of Classification | 2012
Julie Josse; Marie Chavent; Benoit Liquet; François Husson
A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulations and a real dataset. Results are promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s homogeneity analysis framework.
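A much-simplified numeric illustration of the EM-type iteration described above. The paper's algorithm operates on MCA's indicator matrix and adds regularization; this plain iterative-PCA sketch, with illustrative names, only conveys the alternation between low-rank reconstruction and re-imputation.

```python
# Illustrative sketch of EM-type iterative imputation with a low-rank PCA
# reconstruction; not the missMDA implementation.
import numpy as np

def iterative_pca_impute(X, rank=1, n_iter=100):
    """Impute NaNs in X by alternating rank-r SVD reconstruction and imputation."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu  # rank-r reconstruction
        filled[mask] = approx[mask]                         # re-impute missing cells
    return filled

X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan]])
print(iterative_pca_impute(X, rank=1))
```

Observed cells are never modified; only the missing entries are updated at each pass, which is what makes the procedure equivalent to minimizing the loss over the non-missing elements.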
Computational Statistics & Data Analysis | 2007
Marie Chavent; Yves Lechevallier; Olivier Briant
DIVCLUS-T is a divisive hierarchical clustering algorithm based on a monothetic bipartitional approach allowing the dendrogram of the hierarchy to be read as a decision tree. It is designed for either numerical or categorical data. Like the Ward agglomerative hierarchical clustering algorithm and the k-means partitioning algorithm, it is based on the minimization of the inertia criterion. However, unlike Ward and k-means, it provides a simple and natural interpretation of the clusters. The price paid by construction in terms of inertia by DIVCLUS-T for this additional interpretation is studied by applying the three algorithms on six databases from the UCI Machine Learning repository.
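One monothetic bipartition step in this spirit can be sketched for a single numerical variable: among binary questions of the form "x <= c", keep the cut that minimizes the within-cluster inertia of the resulting bipartition. The real algorithm handles several variables and categorical questions and builds the full divisive hierarchy; the code below is an illustration with made-up names, not the authors' implementation.

```python
# Hedged sketch of one monothetic split minimizing within-cluster inertia.

def within_inertia(values):
    """Sum of squared deviations from the mean (inertia of one cluster)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_binary_question(x):
    """Best cut point c for the question 'x <= c' on one numerical variable."""
    xs = sorted(x)
    cuts = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]
    return min(
        cuts,
        key=lambda c: within_inertia([v for v in x if v <= c])
        + within_inertia([v for v in x if v > c]),
    )

data = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
print(best_binary_question(data))  # 4.55, between the two obvious groups
```

Because each division is labeled by the winning binary question, the resulting dendrogram can be read as a decision tree, which is the interpretability gain the abstract describes.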
Archive | 2004
Marie Chavent
The Hausdorff distance between two sets is used in this paper to compare hyper-rectangles. An explicit formula for the optimum class prototype is found in the particular case of the Hausdorff distance for the L ∞ norm. When used for dynamical clustering of interval data, this prototype will ensure that the clustering criterion decreases at each iteration.
Communications in Statistics-theory and Methods | 2008
Marie Chavent; Jérôme Saracco
The uncertainty or variability of the data may be treated by considering, rather than a single value for each observation, the interval of values in which it may fall. This article studies the derivation of basic descriptive statistics for interval-valued datasets. We propose a geometrical approach to the determination of summary statistics (central tendency and dispersion measures) for interval-valued variables.
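As a generic illustration (not the article's geometrical derivation), summary statistics for an interval-valued variable are often built from the bounds and midpoints; the helpers below are hypothetical.

```python
# Generic sketch of summary statistics for interval-valued data.
import statistics

def interval_mean(intervals):
    """Mean interval: average of lower bounds, average of upper bounds."""
    lows, ups = zip(*intervals)
    return (statistics.mean(lows), statistics.mean(ups))

def midpoint_std(intervals):
    """Dispersion of the interval midpoints (population standard deviation)."""
    mids = [(a + b) / 2 for a, b in intervals]
    return statistics.pstdev(mids)

data = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
print(interval_mean(data))  # (1.0, 3.0)
print(midpoint_std(data))   # midpoints 1, 2, 3 -> about 0.816
```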
Advances in Data Analysis and Classification | 2012
Marie Chavent; Vanessa Kuentz-Simonet; Jérôme Saracco
Kiers (Psychometrika 56:197–212, 1991) considered the orthogonal rotation in PCAMIX, a principal component method for a mixture of qualitative and quantitative variables. PCAMIX includes the ordinary principal component analysis and multiple correspondence analysis (MCA) as special cases. In this paper, we give a new presentation of PCAMIX where the principal components and the squared loadings are obtained from a Singular Value Decomposition. The loadings of the quantitative variables and the principal coordinates of the categories of the qualitative variables are also obtained directly. In this context, we propose a computationally efficient procedure for varimax rotation in PCAMIX and a direct solution for the optimal angle of rotation. A simulation study shows the good computational behavior of the proposed algorithm. An application on a real data set illustrates the interest of using rotation in MCA. All source codes are available in the R package “PCAmixdata”.
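The paper derives a direct solution for the optimal rotation angle; as a point of reference for what varimax does to a loading matrix, here is the generic iterative SVD-based varimax algorithm, a standard formulation and not the paper's procedure.

```python
# Generic iterative varimax rotation of a p x k loading matrix; not the
# PCAmixdata implementation.
import numpy as np

def varimax(loadings, gamma=1.0, n_iter=50, tol=1e-8):
    """Rotate loadings by an orthogonal matrix maximizing the varimax criterion."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(n_iter):
        L = loadings @ R
        # Gradient-like update solved via SVD (standard varimax iteration).
        tmp = loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        U, s, Vt = np.linalg.svd(tmp)
        R = U @ Vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R
```

Since the rotation matrix is orthogonal, the total sum of squared loadings is preserved; only their distribution across components changes, which simplifies interpretation.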
Archive | 2010
Marie Chavent; Vanessa Kuentz; Jérôme Saracco
In the framework of clustering, the usual aim is to cluster observations and not variables. However the issue of clustering variables clearly appears for dimension reduction, selection of variables or in some case studies. A simple approach for the clustering of variables could be to construct a dissimilarity matrix between the variables and to apply classical clustering methods. But specific methods have been developed for the clustering of variables. In this context center-based clustering algorithms have been proposed for the clustering of quantitative variables. In this article we extend this approach to categorical variables. The homogeneity criterion of a cluster of categorical variables is based on correlation ratios and Multiple Correspondence Analysis is used to determine the latent variable of each cluster. A simulation study shows that the method recovers well the underlying simulated clusters of variables. Finally an application on a real data set also highlights the practical benefits of the proposed approach.
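A hedged sketch of the center-based idea the abstract extends: for quantitative variables, each cluster's latent variable is its first principal component, and variables are reassigned to the cluster whose latent variable they correlate with most (squared correlation). The paper's actual contribution, handling categorical variables via correlation ratios and MCA, is not reproduced here; all names are illustrative.

```python
# Illustrative k-means-style clustering of quantitative variables around
# latent components; a simplified stand-in, not the authors' method.
import numpy as np

def cluster_variables(X, k, n_iter=20):
    """Cluster the columns of X (n x p) into k groups of correlated variables."""
    p = X.shape[1]
    labels = np.arange(p) % k               # simple deterministic start
    for _ in range(n_iter):
        latents = []
        for j in range(k):
            cols = X[:, labels == j]
            if cols.shape[1] == 0:          # empty cluster: fall back to one column
                cols = X[:, [j % p]]
            Z = (cols - cols.mean(0)) / cols.std(0)
            U, s, Vt = np.linalg.svd(Z, full_matrices=False)
            latents.append(U[:, 0])         # first principal component scores
        L = np.column_stack(latents)
        corr = np.corrcoef(X.T, L.T)[:p, p:] ** 2  # squared variable-latent correlations
        labels = corr.argmax(axis=1)        # reassign each variable
    return labels

rng = np.random.default_rng(0)
t1, t2 = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([t1, t2,
                     t1 + 0.05 * rng.normal(size=200),
                     t2 + 0.05 * rng.normal(size=200)])
print(cluster_variables(X, 2))  # columns {0, 2} and {1, 3} end up together
```

For categorical variables, the squared correlation would be replaced by a correlation ratio and the latent variable obtained by MCA, which is the extension the paper proposes.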