Maurizio Vichi
Sapienza University of Rome
Publications
Featured research published by Maurizio Vichi.
Computational Statistics & Data Analysis | 2001
Maurizio Vichi; Henk A. L. Kiers
A discrete clustering model and a continuous factorial model are fitted simultaneously to two-way data, with the aim of identifying the best partition of the objects, described by the best orthogonal linear combinations of the variables (factors) according to the least-squares criterion. This methodology, named factorial k-means analysis for its features, has a very wide range of applications because it fulfills a double objective: data reduction and synthesis, simultaneously in the direction of objects and variables; and variable selection in cluster analysis, identifying the variables that contribute most to the classification of the objects. The least-squares fitting problem proposed here is formalized mathematically as a constrained quadratic minimization problem with mixed variables. An iterative alternating least-squares algorithm based on two main steps is proposed to solve it. Starting from the cluster centroids, the algorithm finds the subspace projection that yields the smallest distances between object points and centroids. After the centroids are updated, the partition is obtained by assigning objects to the closest centroids. The algorithm decreases the least-squares criterion at each step and thus converges to a (possibly local) optimum. Two data sets are analyzed to show the features of the factorial k-means model. Because the algorithm is fast, researchers can also apply the technique to large data sets.
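The two alternating steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and helper names are hypothetical, the data are centered, the start uses PCA followed by k-means (a tandem initialization chosen here because a purely random start tends to pull the subspace toward low-variance noise directions), and the loadings are taken as the eigenvectors of X'(P_U - I)X with the largest eigenvalues, where P_U projects onto the cluster-indicator columns.

```python
import numpy as np

def _assign(Y, C):
    """Assign each row of Y to its closest centroid; return labels and loss."""
    d = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    return labels, d[np.arange(len(Y)), labels].sum()

def _init_labels(Y, k, n_lloyd=20):
    """Hypothetical helper: farthest-point seeding followed by Lloyd steps."""
    C = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(Y[dist.argmax()])
    C = np.array(C)
    for _ in range(n_lloyd):
        labels, _ = _assign(Y, C)
        C = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels

def factorial_kmeans(X, k, q, max_iter=100, tol=1e-10):
    """Sketch of factorial k-means: minimize ||X A - U Ybar||^2 with A'A = I."""
    X = X - X.mean(axis=0)                       # work with centered data
    n = X.shape[0]
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    labels = _init_labels(X @ Vt[:q].T, k)       # tandem start: PCA, then k-means
    prev = np.inf
    for _ in range(max_iter):
        U = np.eye(k)[labels]                    # n x k binary membership matrix
        G = np.linalg.pinv(U.T @ U)              # pinv tolerates empty clusters
        P = U @ G @ U.T                          # projector onto cluster indicators
        # Subspace step: the q leading eigenvectors of X'(P - I)X minimize the
        # within-cluster residual of the projected points XA.
        _, V = np.linalg.eigh(X.T @ (P - np.eye(n)) @ X)
        A = V[:, -q:]
        Y = X @ A                                # objects in the reduced space
        C = G @ U.T @ Y                          # centroids for the current partition
        labels, loss = _assign(Y, C)             # partition step: closest centroid
        if prev - loss <= tol:                   # criterion no longer decreases
            break
        prev = loss
    return labels, A, loss
```

Because neither step can increase the criterion, the loop stops once the loss stops decreasing; the result is a local optimum that depends on the starting partition.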
Archive | 2003
Martin Schader; Wolfgang Gaul; Maurizio Vichi
The volume presents new developments in data analysis and classification and gives an overview of the state of the art in these scientific fields and their applications. Areas that receive considerable attention in the book are clustering, discrimination, data analysis, and statistics, as well as applications in economics, biology, and medicine. The reader will find material on recent technical and methodological developments and a large number of application papers demonstrating the usefulness of the newly developed techniques.
Computational Statistics & Data Analysis | 2009
Maurizio Vichi; Gilbert Saporta
A constrained principal component analysis is proposed that aims at simultaneously clustering the objects and partitioning the variables. The new methodology identifies components with maximum variance, each a linear combination of a subset of the variables; together, these subsets form a partition of the variables. Simultaneously, a partition of the objects is computed by maximizing the between-cluster variance. The methodology is formulated in a semi-parametric least-squares framework as a quadratic mixed continuous and integer problem. An alternating least-squares algorithm is proposed to solve the clustering and disjoint PCA problem. Two applications illustrate the features of the methodology.
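The variable-partitioning half of the method can be illustrated with a simplified sketch. It is an assumption-laden stand-in, not the authors' algorithm: object clustering is omitted, each block's component is taken as the first principal component of its variables, and the partition of the variables is improved by a hypothetical greedy rule that moves one variable at a time whenever the total variance explained by the block components increases.

```python
import numpy as np

def _block_pc_variance(S, idx):
    """Variance of the first principal component of the variables in idx."""
    if len(idx) == 0:
        return 0.0
    return np.linalg.eigvalsh(S[np.ix_(idx, idx)])[-1]

def disjoint_pca(X, q, max_sweeps=50):
    """Greedy sketch of disjoint PCA: each variable loads on exactly one component."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(X)                       # covariance matrix
    p = S.shape[0]
    groups = np.arange(p) % q                    # start: round-robin variable partition

    def explained(g):
        return sum(_block_pc_variance(S, np.flatnonzero(g == j)) for j in range(q))

    best = explained(groups)
    for _ in range(max_sweeps):
        improved = False
        for v in range(p):                       # try moving variable v to every other block
            for j in range(q):
                if j == groups[v]:
                    continue
                trial = groups.copy()
                trial[v] = j
                value = explained(trial)
                if value > best + 1e-12:
                    groups, best, improved = trial, value, True
        if not improved:                         # no single move helps: local optimum
            break
    A = np.zeros((p, q))                         # disjoint loading matrix
    for j in range(q):
        idx = np.flatnonzero(groups == j)
        if len(idx):
            _, V = np.linalg.eigh(S[np.ix_(idx, idx)])
            A[idx, j] = V[:, -1]
    return groups, A, best
```

Each row of the returned loading matrix has a single nonzero entry, which is the disjointness constraint the text describes; the greedy search only reaches a local optimum.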
Journal of Classification | 2007
Maurizio Vichi; Roberto Rocci; Henk A. L. Kiers
In this paper, two techniques for clustering the units and reducing the dimensionality of the variables and occasions of a three-mode data set are discussed. They can be seen as simultaneous versions of two sequential procedures: k-means followed by Tucker2, and Tucker2 followed by k-means. The two techniques, T3Clus and 3Fk-means, are compared theoretically and empirically through a simulation study, which shows that neither technique outperforms the other in every case. These results motivate combining the two techniques in a single general model, named CT3Clus, which has T3Clus and 3Fk-means as special cases. A further simulation study shows the effectiveness of the proposal.
Psychometrika | 2001
A. D. Gordon; Maurizio Vichi
Methodology is described for fitting a fuzzy consensus partition to a set of partitions of the same set of objects. Three models defining median partitions are described: two are obtained from a least-squares fit of a set of membership functions, and the third (proposed by Pittau and Vichi) from a least-squares fit of a set of joint membership functions. The models are illustrated by application to both a set of hard partitions and a set of fuzzy partitions, and they are compared with an alternative approach to obtaining a consensus fuzzy partition proposed by Sato and Sato; some interesting differences in the results are discussed.
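When the classes of all input partitions can be put in correspondence, a least-squares consensus of membership matrices takes a simple form: the element-wise mean of the aligned matrices minimizes the sum of squared differences, and it is still a valid fuzzy partition because each row sums to 1. The pure-Python sketch that follows assumes that simple setting; the brute-force label alignment (feasible only for a few classes) and the function names are illustrative choices, not the three models of the paper.

```python
from itertools import permutations

def align(reference, U):
    """Permute the columns (classes) of U to best match the reference."""
    k = len(reference[0])

    def agreement(perm):
        # Permuting columns keeps the Frobenius norm fixed, so maximizing the
        # inner product is the same as minimizing the squared distance.
        return sum(r[j] * u[perm[j]] for r, u in zip(reference, U) for j in range(k))

    best = max(permutations(range(k)), key=agreement)
    return [[u[best[j]] for j in range(k)] for u in U]

def least_squares_consensus(partitions):
    """Mean of the aligned membership matrices (rows = objects, columns = classes)."""
    reference = partitions[0]
    aligned = [align(reference, U) for U in partitions]
    n, k, m = len(reference), len(reference[0]), len(aligned)
    return [[sum(M[i][j] for M in aligned) / m for j in range(k)] for i in range(n)]

hard = [
    [[1, 0], [1, 0], [0, 1]],
    [[0, 1], [0, 1], [1, 0]],   # same partition as the first, classes relabeled
    [[1, 0], [0, 1], [0, 1]],
]
consensus = least_squares_consensus(hard)
# The second object gets memberships 2/3 and 1/3: it joins the first object
# in two of the three hard partitions.
```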
Computational Statistics & Data Analysis | 2008
Roberto Rocci; Maurizio Vichi
New methodologies for two-mode (objects and variables) multi-partitioning of two-way data are presented. In particular, by reanalyzing double k-means, which identifies a unique partition for each mode of the data, a relevant extension is discussed that allows more than one partition of one mode to be specified, conditionally on the partition of the other mode. The performance of this generalized double k-means is tested both by a simulation study and by an application to gene microarray data.
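The baseline double k-means that the paper reanalyzes can be sketched with NumPy under a common formulation assumed here: the data matrix is approximated by block means defined by one object partition and one variable partition (X approximately U Ybar V'), and the two partitions are updated in turn. The sketch covers only this baseline, not the conditional multi-partition extension; the function names and the requirement that the user supply starting partitions are assumptions.

```python
import numpy as np

def block_means(X, u, v, k, m):
    """Mean of X over each (object cluster, variable cluster) block."""
    Y = np.zeros((k, m))
    for a in range(k):
        for b in range(m):
            block = X[np.ix_(u == a, v == b)]
            Y[a, b] = block.mean() if block.size else 0.0
    return Y

def double_kmeans(X, k, m, u, v, max_iter=100, tol=1e-12):
    """Alternate object and variable reassignment for X ~ U Ybar V'."""
    u, v = np.asarray(u), np.asarray(v)
    prev = np.inf
    for _ in range(max_iter):
        Y = block_means(X, u, v, k, m)
        R = Y[:, v]                              # k x p profile of each object cluster
        u = ((X[:, None, :] - R[None]) ** 2).sum(axis=2).argmin(axis=1)
        Y = block_means(X, u, v, k, m)
        C = Y[u, :]                              # n x m profile seen by each variable
        v = ((X[:, :, None] - C[:, None, :]) ** 2).sum(axis=0).argmin(axis=1)
        Y = block_means(X, u, v, k, m)
        loss = ((X - Y[u][:, v]) ** 2).sum()     # squared error of the block model
        if prev - loss < tol:
            break
        prev = loss
    return u, v, Y, loss

# Toy checkerboard: start with one misplaced object and one misplaced variable
X = np.array([[0, 0, 0, 5, 5, 5]] * 4 + [[5, 5, 5, 0, 0, 0]] * 4, dtype=float)
u, v, Y, loss = double_kmeans(X, 2, 2, u=[0, 0, 0, 1, 1, 1, 1, 1], v=[0, 0, 1, 1, 1, 1])
# u.tolist() == [0, 0, 0, 0, 1, 1, 1, 1], v.tolist() == [0, 0, 0, 1, 1, 1], loss == 0.0
```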
Computational Statistics & Data Analysis | 2010
Marieke E. Timmerman; Eva Ceulemans; Henk A. L. Kiers; Maurizio Vichi
Factorial K-means analysis (FKM) and Reduced K-means analysis (RKM) are clustering methods that aim at simultaneously clustering the objects and reducing the dimension of the variables. Because the literature so far lacks a comprehensive comparison of FKM and RKM, a theoretical and simulation-based comparison is provided. It is shown theoretically how the relative performance of FKM and RKM is affected by the presence of residuals within the clustering subspace and/or within its orthocomplement in the observed data. The simulation study confirmed that, for both FKM and RKM, cluster membership recovery generally deteriorates as the overlap between clusters increases. It also confirmed the conjectures that, for FKM, subspace recovery deteriorates as the subspace residuals grow relative to the complement residuals, and that the reverse holds for RKM. As such, FKM and RKM complement each other. When the majority of the variables reflect the clustering structure, and/or standardized variables are analyzed, RKM can be expected to perform reasonably well. However, because both RKM and FKM may suffer from subspace and membership recovery problems, it is essential to evaluate their solutions critically in light of the substantive content of the clustering problem at hand.
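The theoretical contrast can be made concrete by comparing only the subspace (loading) update of the two methods for a fixed partition, under the usual least-squares formulations assumed here: RKM takes the leading eigenvectors of the between-cluster scatter X'P_U X, whereas FKM takes the leading eigenvectors of X'(P_U - I)X, that is, the directions with the smallest within-cluster scatter. In the hypothetical example, the clustering direction carries large within-cluster residuals and its orthocomplement carries very small ones, so FKM's update drifts into the complement while RKM's does not. This is an illustration, not a reproduction of the paper's simulation study.

```python
import numpy as np

def loading_update(X, labels, q, method):
    """Subspace step for a fixed partition: 'rkm' or 'fkm'."""
    X = X - X.mean(axis=0)
    n = X.shape[0]
    U = np.eye(labels.max() + 1)[labels]          # binary membership matrix
    P = U @ np.linalg.pinv(U.T @ U) @ U.T         # projector onto cluster indicators
    if method == "rkm":
        M = X.T @ P @ X                           # between-cluster scatter
    elif method == "fkm":
        M = X.T @ (P - np.eye(n)) @ X             # minus within-cluster scatter
    else:
        raise ValueError("method must be 'rkm' or 'fkm'")
    _, V = np.linalg.eigh(M)
    return V[:, -q:]                              # q leading eigenvectors

rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 20)
signal = np.where(labels == 0, 3.0, -3.0) + rng.normal(0, 1.0, 40)  # noisy cluster axis
complement = rng.normal(0, 0.05, (40, 2))                           # nearly noiseless
X = np.column_stack([signal, complement])
a_rkm = loading_update(X, labels, 1, "rkm")   # stays on the cluster axis (column 0)
a_fkm = loading_update(X, labels, 1, "fkm")   # moves into the low-residual complement
```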
Angle Orthodontist | 2009
Rosalia Leonardi; Ersilia Barbato; Maurizio Vichi; Mario Caltabiano
OBJECTIVE To test the null hypothesis that there is no increased prevalence of skeletal anomalies and/or normal variants, as evidenced by the cephalometric radiographs of patients with palatally displaced canines (PDC). MATERIALS AND METHODS The treatment records of 38 white subjects between 14 and 20 years old with PDC were collected and evaluated retrospectively. Inclusion criteria required that the case records include good-quality panoramic radiographs and lateral cephalometric radiographs with the first four cervical vertebrae clearly visible. The anomalies recorded for each case included sella bridge, atlanto-occipital ligament calcification or ponticulus posticus, and posterior arch atlas deficiency. The control group consisted of 70 consecutively treated subjects who had no other dental anomalies and whose maxillary canines had erupted normally. Fisher's exact test and Pearson's chi-square test were used to determine possible statistically significant differences in the incidence of skeletal anomalies and/or normal variants between the group of patients with PDC and the control group. RESULTS The prevalence of skull anomalies and normal variants seen in cephalometric radiographs was increased in patients with PDC. Statistically significant differences between subjects with PDC and the control group were observed for ponticulus posticus (Pearson's chi-square, P < .050; Fisher's exact test, P < .052), sella bridge (Pearson's chi-square, P < .042; Fisher's exact test, P < .042), and posterior arch deficiency (Pearson's chi-square, P < .047; Fisher's exact test, P < .039). CONCLUSIONS The null hypothesis was rejected. There is an increased prevalence of skull skeletal anomalies and/or normal variants in patients with PDC.
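The two tests named in the methods can be computed for a 2 x 2 table (group by presence of an anomaly) with the Python standard library alone. This sketch assumes positive row and column totals, uses the two-sided Fisher p-value that sums all tables no more probable than the observed one, and omits Yates' continuity correction for the chi-square test; the function names are hypothetical. The study's counts are not reported in this abstract, so the example uses Fisher's classic tea-tasting table rather than the study's data.

```python
from math import comb, erfc, sqrt

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table at most as probable as the observed one (small float slack)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df, no continuity correction)."""
    table = [[a, b], [c, d]]
    rows, cols, n = [a + b, c + d], [a + c, b + d], a + b + c + d
    stat = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    return stat, erfc(sqrt(stat / 2))   # P(chi2_1 > stat) = erfc(sqrt(stat / 2))

# Tea-tasting table: 3 of 4 cups of each kind identified correctly
p_fisher = fisher_exact_2x2(3, 1, 1, 3)      # 34/70, about 0.486
stat, p_chi2 = pearson_chi2_2x2(3, 1, 1, 3)  # stat = 2.0, p about 0.157
```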
Journal of Classification | 2011
Roberto Rocci; Stefano Antonio Gattone; Maurizio Vichi
Reduced K-means (RKM) and Factorial K-means (FKM) are two data reduction techniques that incorporate principal component analysis and K-means into a unified methodology to obtain a reduced set of components for the variables and an optimal partition of the objects. RKM finds clusters in a reduced space by maximizing the between-clusters deviance without imposing any condition on the within-clusters deviance, so clusters are isolated but may be heterogeneous. FKM, on the other hand, identifies clusters in a reduced space by minimizing the within-clusters deviance without imposing any condition on the between-clusters deviance, so clusters are homogeneous but may not be isolated. The two techniques give different results because the total deviance in the reduced space is not constant (it depends on the chosen subspace); hence minimizing the within-clusters deviance is not equivalent to maximizing the between-clusters deviance. In this paper, a modification of the two techniques is introduced to avoid the aforementioned weaknesses. It is shown that the two modified methods give the same results, thus merging RKM and FKM into a new methodology, called Factor Discriminant K-means (FDKM) because it combines Linear Discriminant Analysis and K-means. The paper examines several theoretical properties of FDKM and assesses its performance in a simulation study. An application to real-world data shows the features of FDKM.
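The source of the disagreement, and one way to remove it, can be checked numerically. For loadings A with A'A = I, the total deviance tr(A'X'XA) changes with A, so minimizing the within-cluster part (FKM) and maximizing the between-cluster part (RKM) can select different subspaces. If the data are first whitened so that Z'Z = I (an LDA-style normalization assumed here for illustration, not necessarily the exact FDKM formulation), the total deviance in any q-dimensional subspace equals q, and the two subspace updates coincide. The helper names are hypothetical.

```python
import numpy as np

def whiten(X):
    """Center X and rescale it so that Z'Z = I (requires full column rank)."""
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(Xc.T @ Xc)
    return Xc @ V @ np.diag(w ** -0.5) @ V.T

def projector(labels):
    """Orthogonal projector onto the span of the cluster-indicator columns."""
    U = np.eye(labels.max() + 1)[labels]
    return U @ np.linalg.pinv(U.T @ U) @ U.T

def span_projector(M, q):
    """Projector onto the q leading eigenvectors of the symmetric matrix M."""
    _, V = np.linalg.eigh(M)
    A = V[:, -q:]
    return A @ A.T

def rkm_fkm_subspaces(Z, labels, q):
    P = projector(labels)
    I = np.eye(len(Z))
    rkm = span_projector(Z.T @ P @ Z, q)          # maximize between-cluster deviance
    fkm = span_projector(Z.T @ (P - I) @ Z, q)    # minimize within-cluster deviance
    return rkm, fkm

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4)) * [1.0, 2.0, 0.5, 3.0]   # unequal variances
labels = np.arange(30) % 3
raw_rkm, raw_fkm = rkm_fkm_subspaces(X - X.mean(axis=0), labels, 2)
w_rkm, w_fkm = rkm_fkm_subspaces(whiten(X), labels, 2)
```

After whitening, Z'(P - I)Z = Z'PZ - I, so both matrices share eigenvectors and the two criteria select the same subspace; on the raw data they generally do not.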
The International Journal of Biostatistics | 2008
Francesca Martella; Marco Alfò; Maurizio Vichi
A challenge in microarray data analysis is discovering local structures composed of sets of genes that show homogeneous expression patterns across subsets of conditions. We present an extension of the mixture of factor analyzers (MFA) model that allows simultaneous clustering of genes and conditions. The proposed model is rather flexible, since it models the density of high-dimensional data as a mixture of Gaussian distributions with a particular component-specific covariance structure. Specifically, a binary, row-stochastic matrix representing tissue membership is used to cluster tissues (experimental conditions), whereas the traditional mixture approach is used to cluster the genes. An alternating expectation conditional maximization (AECM) algorithm is proposed for parameter estimation; experiments on simulated and real data show the efficiency of the method as a general approach to biclustering. The Matlab code of the algorithm is available from the authors upon request.