Publication


Featured research published by Alfonso Gordaliza.


Annals of Statistics | 2008

A general trimming approach to robust cluster analysis

Luis Angel García-Escudero; Alfonso Gordaliza; Carlos Matrán; Agustín Mayo-Iscar

We introduce a new method for performing clustering with the aim of fitting clusters with different scatters and weights. It is designed to handle a proportion of contaminating data, which guarantees the robustness of the method. As a characteristic feature, restrictions on the ratio between the maximum and the minimum eigenvalues of the groups' scatter matrices are introduced. This makes the problem well defined and guarantees the consistency of the sample solutions to the population ones. The method covers a wide range of clustering approaches depending on the strength of the chosen restrictions. Our proposal includes an algorithm for approximately solving the sample problem.
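As a rough illustration of the eigenvalue-ratio restriction mentioned in this abstract, the sketch below computes the ratio between the largest and smallest eigenvalues across a set of group scatter matrices and compares it with a bound. The function names and the bound `c` are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def eigenvalue_ratio(scatter_matrices):
    """Ratio of the largest to the smallest eigenvalue over all group scatter matrices."""
    # eigvalsh is appropriate because scatter matrices are symmetric.
    eigvals = np.concatenate([np.linalg.eigvalsh(S) for S in scatter_matrices])
    return eigvals.max() / eigvals.min()

def satisfies_restriction(scatter_matrices, c):
    """True when the eigenvalue ratio does not exceed the (hypothetical) bound c >= 1."""
    return eigenvalue_ratio(scatter_matrices) <= c

# Two illustrative diagonal scatter matrices: eigenvalues {1, 2} and {0.5, 4}.
S1 = np.diag([1.0, 2.0])
S2 = np.diag([0.5, 4.0])
ratio = eigenvalue_ratio([S1, S2])  # 4.0 / 0.5
```

A bound close to 1 forces nearly spherical, equally scattered groups, while a large bound leaves the group scatters almost unrestricted, which matches the abstract's remark that the strength of the restriction determines the kind of clustering obtained.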


Journal of the American Statistical Association | 1999

Robustness Properties of k Means and Trimmed k Means

Luis Angel García-Escudero; Alfonso Gordaliza

The generalized k means method is based on the minimization of the discrepancy between a random variable (or a sample of this random variable) and a set with k points, measured through a penalty function Φ. As in the M estimators setting (k = 1), a penalty function, Φ, with unbounded derivative, Ψ, naturally leads to nonrobust generalized k means. Surprisingly, however, the lack of robustness also extends to the case of bounded Ψ; that is, generalized k means do not inherit the robustness properties of the M estimator from which they came. In an attempt to robustify the generalized k means method, the generalized trimmed k means method arises from combining the k means idea with a so-called impartial trimming procedure. In this article we study the performance of generalized k means and generalized trimmed k means from the viewpoint of Hampel's robustness criteria; that is, we investigate the influence function, breakdown point, and qualitative robustness, confirming the superiority provided by the trimming. We inc...
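The idea of combining k means with impartial trimming can be sketched as follows, assuming the squared Euclidean penalty (Φ(x) = x²) and a simple Lloyd-type iteration. The function name `trimmed_kmeans`, its parameters, and the use of fixed initial centers are assumptions for this illustration; a practical implementation would typically try several random starts.

```python
import numpy as np

def _assign(X, centers, n_keep):
    """Nearest-center labels and the indices of the n_keep closest points."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    keep = np.argsort(d2.min(axis=1))[:n_keep]
    return labels, keep

def trimmed_kmeans(X, init_centers, alpha, n_iter=100):
    """Lloyd-type trimmed k means sketch; trimmed points get label -1."""
    n = X.shape[0]
    n_keep = n - int(np.floor(n * alpha))  # points that survive trimming
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        labels, keep = _assign(X, centers, n_keep)
        new_centers = centers.copy()
        for j in range(centers.shape[0]):
            members = keep[labels[keep] == j]
            if members.size > 0:  # leave empty clusters where they are
                new_centers[j] = X[members].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels, keep = _assign(X, centers, n_keep)
    trimmed = np.ones(n, dtype=bool)
    trimmed[keep] = False
    labels = labels.copy()
    labels[trimmed] = -1
    return centers, labels

# Two small clusters plus one gross outlier at (50, 50).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [50, 50]], dtype=float)
centers, labels = trimmed_kmeans(X, init_centers=X[[0, 4]], alpha=0.2)
```

With these inputs the outlier is the single trimmed point, so the two centers are the means of the four points in each cluster rather than being pulled toward (50, 50).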


Advances in Data Analysis and Classification | 2010

A review of robust clustering methods

Luis Angel García-Escudero; Alfonso Gordaliza; Carlos Matrán; Agustín Mayo-Iscar

Deviations from theoretical assumptions, together with the presence of a certain amount of outlying observations, are common in many practical statistical applications. This is also the case when applying Cluster Analysis methods, where those troubles could lead to unsatisfactory clustering results. Robust Clustering methods are aimed at avoiding these unsatisfactory results. Moreover, there exist certain connections between robust procedures and Cluster Analysis that make Robust Clustering an appealing unifying framework. A review of different robust clustering approaches in the literature is presented. Special attention is paid to methods based on trimming, which try to discard the most outlying data when carrying out the clustering process.


Journal of Classification | 2005

A Proposal for Robust Curve Clustering

Luis Angel García-Escudero; Alfonso Gordaliza

Functional data sets appear in many areas of science. Although each data point may be seen as a large finite-dimensional vector, it is preferable to think of them as functions, and many classical multivariate techniques have been generalized for this kind of data. A widely used technique for dealing with functional data is to choose a finite-dimensional basis and find the best projection of each curve onto this basis. Therefore, given a functional basis, an approach for doing curve clustering relies on applying the k-means methodology to the fitted basis coefficients corresponding to all the curves in the data set. Unfortunately, a serious drawback follows from the lack of robustness of k-means. Trimmed k-means clustering (Cuesta-Albertos, Gordaliza, and Matrán 1997) provides a robust alternative to the use of k-means and, consequently, it may be successfully used in this functional framework. The proposed approach will be exemplified by considering cubic B-spline bases, but other bases can be applied analogously depending on the application at hand.


Journal of Computational and Graphical Statistics | 2003

Trimming Tools in Exploratory Data Analysis

Luis Angel García-Escudero; Alfonso Gordaliza; Carlos Matrán

Exploratory graphical tools based on trimming are proposed for detecting main clusters in a given dataset. The trimming is obtained by resorting to trimmed k-means methodology. The analysis always reduces to the examination of real-valued curves, even in the multivariate case. As the technique is based on a robust clustering criterion, it is able to handle the presence of different kinds of outliers. An algorithm is proposed to carry out this (computer-intensive) method. As with classical k-means, the method is especially oriented to mixtures of spherical distributions. A possible generalization is outlined to overcome this drawback.


Journal of Approximation Theory | 1991

Best approximations to random variables based on trimming procedures

Alfonso Gordaliza

Let X be an ℝⁿ-valued random variable; for a class of suitable nondecreasing functions Φ: ℝ⁺ → ℝ⁺ and α ∈ (0, 1), a family of best approximations to X based on trimming procedures is obtained. Existence and a characterization which relates the best approximations and the best trimming sets are obtained. The problem of uniqueness is studied for real-valued random variables.


Test | 1999

Multivariate L-estimation

Ricardo Fraiman; Jean Meloche; Luis Angel García-Escudero; Alfonso Gordaliza; Xuming He; Ricardo A. Maronna; Victor J. Yohai; Simon J. Sheather; Joseph W. McKean; Christopher G. Small; Andrew T. A. Wood

In one dimension, order statistics and ranks are widely used because they form a basis for distribution-free tests and some robust estimation procedures. In more than one dimension, the concept of order statistics and ranks is not clear, and several definitions have been proposed in recent years. The proposed definitions are based on different concepts of depth. In this paper, we define a new notion of order statistics and ranks for multivariate data based on density estimation. The resulting ranks are invariant under affine transformations and asymptotically distribution free. We use the corresponding order statistics to define a class of multivariate estimators of location that can be regarded as multivariate L-estimators. Under mild assumptions on the underlying distribution, we show the asymptotic normality of the estimators. A modification of the proposed estimates results in a high breakdown point procedure that can deal with patches of outliers. The main idea is to order the observations according to their likelihood f(X1), ..., f(Xn). If the density f happens to be ellipsoidal, the above ranking is similar to the rankings that are derived from the various notions of depth. We propose to define a ranking based on a kernel estimate of the density f. One advantage of estimating the likelihoods is that the underlying distribution does not need to have a density. In addition, because the approximate likelihoods are only used to rank the observations, they can be derived from a density estimate using a fixed bandwidth. This fixed bandwidth overcomes the curse of dimensionality that typically plagues density estimation in high dimension.


Statistics and Computing | 2011

Exploring the number of groups in robust model-based clustering

Luis Angel García-Escudero; Alfonso Gordaliza; Carlos Matrán; Agustín Mayo-Iscar

Two key questions in Clustering problems are how to determine the number of groups properly and how to measure the strength of group assignments. These questions are especially involved when the presence of a certain fraction of outlying data is also expected. Any answer to these two key questions should depend on the assumed probabilistic model, the allowed group scatters, and what we understand by noise. With this in mind, some exploratory “trimming-based” tools are presented in this work together with their justifications. The monitoring of optimal values reached when solving a robust clustering criterion and the use of some “discriminant” factors are the basis for these exploratory tools.


Advances in Data Analysis and Classification | 2014

A constrained robust proposal for mixture modeling avoiding spurious solutions

Luis Angel García-Escudero; Alfonso Gordaliza; Agustín Mayo-Iscar

The high prevalence of spurious solutions and the disturbing effect of outlying observations in mixture modeling are well-known problems that pose serious difficulties for non-expert practitioners of these kinds of models in different applied areas. An approach which combines the use of Trimmed Maximum Likelihood ideas and the imposition of restrictions on the maximization problem is presented and studied in this paper. The proposed methodology is shown to have nice mathematical properties as well as good performance in avoiding the appearance of spurious solutions in a quite automatic manner.


Journal of the American Statistical Association | 2005

Generalized Radius Processes for Elliptically Contoured Distributions

Luis Angel García-Escudero; Alfonso Gordaliza

The use of Mahalanobis distances has a long history in statistics. Given a sample of size n and general location and scatter estimators, mn and Σn, we can define “generalized” radii as the Mahalanobis distances of the observations with respect to mn and Σn. If we wish to trim observations based on the estimators mn and Σn, then it is natural to first remove the most remote ones (i.e., those with the largest radii). With this in mind, we define a process that maps the trimming proportion, α in (0, 1], to the generalized radius of the observation that has just been removed by this level of trimming. We analyze the asymptotic behavior of this process for elliptically contoured distributions. We show that the limit law depends only on the elliptical family considered and how Σn serves to estimate the underlying “scale” factor through its determinant. We carry out Monte Carlo simulations for finite sample sizes, and outline an application for assessing fit to a fixed elliptical family and also for the case where a proportion of outlying observations is discarded.
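The mapping from a trimming proportion to a radius can be sketched as below, assuming the generalized radii are taken as (unsquared) Mahalanobis distances and that trimming a proportion α removes the ⌈nα⌉ most remote observations. Both conventions, as well as the function names, are assumptions for this example and may differ from the paper's exact definitions.

```python
import numpy as np

def generalized_radii(X, m, S):
    """Mahalanobis distances of the rows of X with respect to location m and scatter S."""
    diff = X - m
    S_inv = np.linalg.inv(S)
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    return np.sqrt(d2)

def radius_at_trimming(radii, alpha):
    """Radius of the last observation removed when a proportion alpha in (0, 1] is trimmed."""
    n = radii.size
    n_removed = int(np.ceil(n * alpha))  # assumed trimming convention
    sorted_desc = np.sort(radii)[::-1]   # most remote observations first
    return sorted_desc[n_removed - 1]

# Illustrative data with location 0 and identity scatter, so radii are Euclidean norms.
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
radii = generalized_radii(X, m=np.zeros(2), S=np.eye(2))
r_quarter = radius_at_trimming(radii, 0.25)  # largest radius
r_half = radius_at_trimming(radii, 0.5)      # second largest radius
```

Here the location and scatter are fixed by hand to keep the example transparent; in practice mn and Σn would be estimated from the sample.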

Collaboration


Dive into Alfonso Gordaliza's collaborations.

Top Co-Authors

Carlos Matrán

University of Valladolid

Christophe Croux

Katholieke Universiteit Leuven
