Gilbert Saporta
Conservatoire national des arts et métiers
Publication
Featured research published by Gilbert Saporta.
Computational Statistics & Data Analysis | 2007
Marie Plasse; Ndeye Niang; Gilbert Saporta; Alexandre Villeminot; Laurent Leblond
A method to analyse links between binary attributes in a large sparse data set is proposed. First, the variables are clustered to obtain homogeneous clusters of attributes. Association rules are then mined within each cluster. A graphical comparison of several rule relevancy indexes is presented and used to extract the best rules for the application concerned. The proposed methodology is illustrated by an industrial application from the automotive industry with more than 80 000 vehicles, each described by more than 3000 rare attributes.
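As an illustration of what a rule relevancy index measures, here is a minimal sketch in Python. The abstract does not name the indexes compared in the paper; support, confidence and lift are used here only because they are the standard ones, and the function name `rule_indexes` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def rule_indexes(rows: Sequence[Sequence[int]], a: int, b: int) -> Dict[str, float]:
    """Support, confidence and lift of the rule 'attribute a => attribute b'.

    rows is a binary table: one row per object, one 0/1 column per attribute.
    These three indexes are an assumption chosen for illustration.
    """
    n = len(rows)
    n_a = sum(1 for r in rows if r[a])            # objects having attribute a
    n_b = sum(1 for r in rows if r[b])            # objects having attribute b
    n_ab = sum(1 for r in rows if r[a] and r[b])  # objects having both

    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical toy data: 6 vehicles described by 3 binary attributes.
vehicles = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]

idx = rule_indexes(vehicles, 0, 1)
print(idx)
```

With rare attributes, support is small for almost every rule, which is one reason to compare several indexes rather than rely on a single threshold.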
Journal of the Royal Society Interface | 2008
Serge Zaugg; Gilbert Saporta; E. Emiel van Loon; Heiko Schmaljohann; Felix Liechti
Bird identification with radar is important for bird migration research, environmental impact assessments (e.g. wind farms), aircraft security and radar meteorology. In a study on bird migration, radar signals from birds, insects and ground clutter were recorded. Signals from birds show a typical pattern due to wing flapping. The data were labelled by experts into the four classes BIRD, INSECT, CLUTTER and UFO (unidentifiable signals). We present a classification algorithm aimed at automatic recognition of bird targets. Variables related to signal intensity and wing flapping pattern were extracted (via continuous wavelet transform). We used support vector classifiers to build predictive models. We estimated classification performance via cross-validation on four datasets. When data from the same dataset were used for training and testing the classifier, the classification performance was extremely to moderately high. When data from one dataset were used for training and the three remaining datasets were used as test sets, the performance was lower but still extremely to moderately high. This shows that the method generalizes well across different locations or times. Our method provides substantial time savings when birds must be identified in large collections of radar signals, and it represents the first substantial step in developing a real-time bird identification radar system. We provide some guidelines and ideas for future research.
Computational Statistics & Data Analysis | 2005
Cristian Preda; Gilbert Saporta
Clusterwise linear regression is studied when the set of predictor variables forms an L²-continuous stochastic process. For each cluster, the estimators of the regression coefficients are given by partial least squares regression. The number of clusters is treated as unknown and the convergence of the clusterwise algorithm is discussed. The approach is compared with other methods via an application on stock-exchange data.
Computational Statistics & Data Analysis | 2002
Gilbert Saporta
Data fusion is concerned with the problem of merging databases coming from different sources into a single database when variables are absent or missing in some files. After a survey of the main techniques, we present some new approaches, in particular one based on homogeneity analysis, and outline future directions. We stress validation problems and caveats.
Computational Statistics & Data Analysis | 2009
Maurizio Vichi; Gilbert Saporta
A constrained principal component analysis, which aims at a simultaneous clustering of objects and a partitioning of variables, is proposed. The new methodology allows us to identify components with maximum variance, each one a linear combination of a subset of variables. All the subsets form a partition of variables. Simultaneously, a partition of objects is also computed maximizing the between cluster variance. The methodology is formulated in a semi-parametric least-squares framework as a quadratic mixed continuous and integer problem. An alternating least-squares algorithm is proposed to solve the clustering and disjoint PCA. Two applications are given to show the features of the methodology.
Communications in Statistics - Theory and Methods | 2003
Dimitris Karlis; Gilbert Saporta; Antonis Spinakis
A vast literature has been devoted to the assessment of the proper number of eigenvalues that have to be retained in Principal Components Analysis. Most of the publications are based either on distributional assumptions for the underlying populations or on empirical evidence. In addition, techniques based on bootstrap or cross-validation have been proposed despite the computational effort implied. In this paper a simple technique based on a control chart approach is proposed for selecting the number of principal components to retain for the analysis. This approach accounts for the sampling variability which can lead to the selection of components that are not in fact statistically significant. The method is compared with other methods and is found to be superior regardless of the underlying distributional properties of the population as well as the existing structure. An illustrative example is provided.
Journal of Econometrics | 1983
J.-C. Deville; Gilbert Saporta
Correspondence analysis is a technique for studying the relationship between two nominal variables which uses mainly simultaneous graphical displays. It has been generalized to more than two variables under the name of ‘multiple correspondence analysis’. ‘Qualitative harmonic analysis’ is another extension towards individual time-series where one observes the evolution of a nominal variable through a finite period of time. The present paper is based essentially on the concept of multidimensional scaling by means of barycentric representation.
COMPSTAT 2002 | 2002
Gilbert Saporta; Genane Youness
We propose a methodology for finding the empirical distribution of Rand’s measure of association when the two partitions only differ by chance. For that purpose we simulate data coming from a latent profile model and we partition them according to 2 groups of variables. We also study two other indices: the first is based on an adaptation of McNemar’s test, the second is Jaccard’s index. Surprisingly, the distributions of the 3 indices are bimodal.
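For readers unfamiliar with these indices, the following minimal Python sketch computes Rand's index and Jaccard's index between two partitions of the same units by counting pairs. It uses the standard pair-counting definitions; the function names and the toy partitions are hypothetical, and the McNemar-based index and the simulation from the latent profile model are not reproduced here.

```python
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def pair_counts(p1: Sequence[Hashable], p2: Sequence[Hashable]) -> Tuple[int, int, int, int]:
    """Count pairs of units by how the two partitions treat them.

    a: together in both, b: together in p1 only,
    c: together in p2 only, d: separated in both.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(p1)), 2):
        same1 = p1[i] == p1[j]
        same2 = p2[i] == p2[j]
        if same1 and same2:
            a += 1
        elif same1:
            b += 1
        elif same2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(p1: Sequence[Hashable], p2: Sequence[Hashable]) -> float:
    """Proportion of pairs on which the two partitions agree."""
    a, b, c, d = pair_counts(p1, p2)
    return (a + d) / (a + b + c + d)


def jaccard_index(p1: Sequence[Hashable], p2: Sequence[Hashable]) -> float:
    """Like Rand's index, but ignoring pairs separated in both partitions."""
    a, b, c, _ = pair_counts(p1, p2)
    return a / (a + b + c) if (a + b + c) else 1.0


# Hypothetical toy partitions of 6 units (cluster labels are arbitrary).
part1 = [0, 0, 0, 1, 1, 1]
part2 = [0, 0, 1, 1, 1, 1]

print(rand_index(part1, part2), jaccard_index(part1, part2))
```

Both indices depend only on which units are grouped together, so relabelling the clusters of either partition leaves them unchanged.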
Advances in Data Analysis and Classification | 2010
Genane Youness; Gilbert Saporta
We propose a procedure based on a latent variable model for the comparison of two partitions of different units described by the same set of variables. The null hypothesis here is that the two partitions come from the same underlying mixture model. We define a method of “projecting” partitions using a supervised classification method: once one partition is taken as a reference, the individuals of the second data set are allocated to the clusters of the reference partition. This gives two partitions of the same units of the second data set, the original and the projected one, and we evaluate their difference by usual measures of association. The empirical distributions of the association measures are derived by simulation.
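The projection step can be sketched in a few lines of Python. The abstract does not say which supervised classification method is used; this sketch assumes a nearest-centroid rule purely for illustration, uses Rand's index as the measure of association, and all names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence

Point = Sequence[float]


def centroids(points: Sequence[Point], labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Mean vector of each cluster of the reference partition."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for p, k in zip(points, labels):
        acc = sums.setdefault(k, [0.0] * len(p))
        for t, v in enumerate(p):
            acc[t] += v
        counts[k] = counts.get(k, 0) + 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}


def project(points: Sequence[Point], cents: Dict[Hashable, List[float]]) -> List[Hashable]:
    """Allocate each unit to the reference cluster with the nearest centroid.

    Nearest-centroid allocation is an assumption of this sketch.
    """
    def sqdist(p: Point, c: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(p, c))

    return [min(cents, key=lambda k: sqdist(p, cents[k])) for p in points]


def rand_index(p1: Sequence[Hashable], p2: Sequence[Hashable]) -> float:
    """Proportion of pairs of units treated the same way by both partitions."""
    pairs = list(combinations(range(len(p1)), 2))
    agree = sum((p1[i] == p1[j]) == (p2[i] == p2[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical reference data set with its partition.
ref_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
ref_labels = ["a", "a", "a", "b", "b", "b"]

# Hypothetical second data set (different units, same variables) with its own partition.
new_X = [(0.2, 0.1), (0.9, 0.8), (5.1, 4.9), (6.0, 6.2), (2.0, 2.0)]
new_labels = [0, 0, 1, 1, 1]

projected = project(new_X, centroids(ref_X, ref_labels))
print(projected, rand_index(new_labels, projected))
```

In the procedure described above, the observed value of the association measure would then be compared with its empirical distribution under the null hypothesis, which this sketch does not simulate.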
Archive | 2010
Valentina Stan; Gilbert Saporta
In the PLS approach, it is frequently assumed that the blocks of variables satisfy the assumption of unidimensionality. To satisfy this hypothesis as well as possible, we use variable clustering methods. We illustrate the joint use of variable clustering and PLS structural equation modeling on customer satisfaction data provided by PSA Company (Peugeot Citroen). The data are satisfaction scores on 32 manifest variables given by 2,922 customers.