José E. Chacón
University of Extremadura
Publications
Featured research published by José E. Chacón.
Electronic Journal of Statistics | 2013
José E. Chacón; Tarn Duong
Important information concerning a multivariate data set, such as clusters and modal regions, is contained in the derivatives of the probability density function. Despite this importance, nonparametric estimation of higher order derivatives of density functions has received only relatively scant attention. Kernel estimators of density functions are widely used as they exhibit excellent theoretical and practical properties, though their generalization to density derivatives has progressed more slowly due to the mathematical intractabilities encountered in the crucial problem of bandwidth (or smoothing parameter) selection. This paper presents the first fully automatic, data-based bandwidth selectors for multivariate kernel density derivative estimators. This is achieved by synthesizing recent advances in matrix analytic theory which allow mathematically and computationally tractable representations of higher order derivatives of multivariate vector valued functions. The theoretical asymptotic properties as well as the finite sample behaviour of the proposed selectors are studied. In addition, we explore in detail the applications of the new data-driven methods for two other statistical problems: clustering and bump hunting. The introduced techniques are combined with the mean shift algorithm to develop novel automatic, nonparametric clustering procedures which are shown to outperform mixture-model cluster analysis and other recent nonparametric approaches in practice. Furthermore, the advantage of the use of smoothing parameters designed for density derivative estimation for feature significance analysis for bump hunting is illustrated with a real data example.
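The mean shift clustering mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel and a single scalar bandwidth supplied by hand; the paper's data-based bandwidth matrix selectors are not reproduced here, and the function names `mean_shift` and `assign_clusters` as well as the `merge_radius` parameter are hypothetical choices made for this illustration.

```python
import numpy as np


def mean_shift(points, bandwidth, n_iter=200, tol=1e-6):
    """Move a copy of every point uphill on a Gaussian kernel density estimate.

    `bandwidth` is a hand-picked scalar (an assumption of this sketch),
    not a data-based selector.
    """
    points = np.asarray(points, dtype=float)
    x = points.copy()
    for _ in range(n_iter):
        # Squared distances between current positions and the fixed data points.
        d2 = ((x[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)
        # Kernel-weighted mean of the data: the mean shift update.
        x_new = (w @ points) / w.sum(axis=1, keepdims=True)
        shift = np.abs(x_new - x).max()
        x = x_new
        if shift < tol:
            break
    return x


def assign_clusters(endpoints, merge_radius):
    """Give the same label to points whose trajectories end near the same mode."""
    labels = np.empty(len(endpoints), dtype=int)
    modes = []
    for i, p in enumerate(endpoints):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < merge_radius:
                labels[i] = k
                break
        else:
            # No existing mode is close enough, so start a new cluster.
            modes.append(p)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)
```

Each point is assigned to the mode its trajectory converges to, so the number of clusters is determined by the density estimate rather than fixed in advance. The result depends strongly on the bandwidth, which is the quantity the paper proposes to select automatically.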
Statistical Science | 2015
José E. Chacón
Despite the popularity of clustering, it is widely recognized that the investigation of some of its theoretical aspects has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, whereas for other statistical problems the theoretical population goal is clearly defined (as in regression or classification), for some of the clustering methodologies it is difficult to specify the population goal to which the data-based clustering algorithms should try to get close. This paper aims to provide some insight into the theoretical foundations of clustering by focusing on two main objectives: to provide an explicit formulation for the ideal population goal of the modal clustering methodology, which understands clusters as regions of high density; and to present two new loss functions, applicable in fact to any clustering methodology, to evaluate the performance of a data-based clustering algorithm with respect to the ideal population goal. In particular, it is shown that only mild conditions on a sequence of density estimators are needed to ensure that the sequence of modal clusterings that they induce is consistent.
Journal of Nonparametric Statistics | 2007
José E. Chacón; J. Montanero; Agustín García Nogales
In the context of kernel density estimation, we give a characterization of the kernels for which the parametric mean integrated squared error (MISE) rate $n^{-1}$ may be obtained, where $n$ is the sample size. Also, for the cases where this rate is attainable, we give an asymptotic bandwidth choice that makes the kernel estimator consistent in mean integrated squared error at that rate and a numerical example showing the superior performance of the superkernel estimator when the bandwidth is properly chosen. †Research supported by Spanish Ministerio de Ciencia y Tecnología project MTM2005-06348.
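For reference, the quantities named in this abstract can be written out as follows. The notation is the usual textbook one and is an assumption here, since the abstract does not fix symbols: $K$ is the kernel, $h$ the bandwidth, $f$ the true density and $X_1, \dots, X_n$ the sample.

```latex
\hat f_{nh}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),
\qquad
\mathrm{MISE}(h) = \mathbb{E} \int \bigl(\hat f_{nh}(x) - f(x)\bigr)^{2} \, dx .
```

With a nonnegative second-order kernel and a sufficiently smooth density, the best attainable MISE is of order $n^{-4/5}$, which is slower than the parametric rate $n^{-1}$ discussed in the abstract.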
Computers & Geosciences | 2011
José E. Chacón; G. Mateu-Figueras; Josep A. Martín-Fernández
Common simplifications of the bandwidth matrix cannot be applied to existing kernels for density estimation with compositional data. In this paper, kernel density estimation methods are modified on the basis of recent developments in compositional data analysis and bandwidth matrix selection theory. The isometric log-ratio normal kernel is used to define a new estimator in which the smoothing parameter is chosen from the most general class of bandwidth matrices on the basis of a recently proposed plug-in algorithm. Simulated and real examples illustrate the behaviour of our approach and show the advantage of the new estimator over existing methods.
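The idea of smoothing compositional data in isometric log-ratio (ilr) coordinates with an unconstrained bandwidth matrix can be sketched as below. Several things are assumptions of this illustration rather than the paper's method: the Helmert-type basis (any orthonormal basis of the clr hyperplane would do), the function names, and in particular `normal_scale_bandwidth`, which is a simple normal-scale rule used as a stand-in for the plug-in algorithm the abstract refers to.

```python
import numpy as np


def ilr_basis(D):
    """Orthonormal (Helmert-type) basis of the clr hyperplane, shape (D, D-1)."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V


def ilr(compositions):
    """Isometric log-ratio coordinates of strictly positive compositions."""
    x = np.asarray(compositions, dtype=float)
    logx = np.log(x)
    # Centred log-ratio: subtract the mean log of each composition.
    clr = logx - logx.mean(axis=-1, keepdims=True)
    return clr @ ilr_basis(x.shape[-1])


def normal_scale_bandwidth(z):
    """Stand-in full bandwidth matrix (normal-scale rule), NOT the plug-in selector."""
    n, d = z.shape
    factor = (4.0 / (n * (d + 2.0))) ** (2.0 / (d + 4.0))
    return factor * np.atleast_2d(np.cov(z, rowvar=False))


def kde_full_bandwidth(z_eval, z_data, H):
    """Gaussian KDE in ilr coordinates with an unconstrained bandwidth matrix H."""
    d = z_data.shape[1]
    H_inv = np.linalg.inv(H)
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(H))
    diff = z_eval[:, None, :] - z_data[None, :, :]
    # Quadratic form diff' H^{-1} diff for every (evaluation, data) pair.
    q = np.einsum("ijk,kl,ijl->ij", diff, H_inv, diff)
    return norm_const * np.exp(-0.5 * q).mean(axis=1)
```

Because `H` is a full symmetric matrix, the kernel contours may be rotated ellipses in ilr space, which is what "the most general class of bandwidth matrices" allows. The values returned are densities of the ilr coordinates, not of the raw proportions.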
Statistics and Computing | 2015
José E. Chacón; Tarn Duong
Many developments in mathematics involve the computation of higher order derivatives of Gaussian density functions. The analysis of univariate Gaussian random variables is a well-established field whereas the analysis of their multivariate counterparts consists of a body of results which are more dispersed. These latter results generally fall into two main categories: theoretical expressions which reveal the deep structure of the problem, or computational algorithms which can mask the connections with closely related problems. In this paper, we unify existing results and develop new results in a framework which is both conceptually cogent and computationally efficient. We focus on the underlying connections between higher order derivatives of Gaussian density functions, the expected value of products of quadratic forms in Gaussian random variables, and V-statistics of degree two based on Gaussian density functions. These three sets of results are combined into an analysis of non-parametric data smoothers.
Communications in Statistics - Theory and Methods | 2013
José E. Chacón; Carlos Tenreiro
Communications in Statistics - Theory and Methods | 2007
José E. Chacón; J. Montanero; Agustín García Nogales; P. Pérez
Journal of Nonparametric Statistics | 2009
José E. Chacón; J. Montanero; Agustín García Nogales; P. Pérez
Test | 2010
José E. Chacón; T. Duong
Canadian Journal of Statistics - Revue Canadienne de Statistique | 2009
José E. Chacón
The choice of the bandwidth is a crucial issue for kernel density estimation. Among all the data-dependent methods for choosing the bandwidth, the direct plug-in method has shown a particularly good performance in practice. This procedure is based on estimating an asymptotic approximation of the optimal bandwidth, using two “pilot” kernel estimation stages. Although two pilot stages seem to be enough for most densities, for a long time the problem of how to choose an appropriate number of stages has remained open. Here we propose an automatic (i.e., data-based) method for choosing the number of stages to be employed in the plug-in bandwidth selector. Asymptotic properties of the method are presented and an extensive simulation study is carried out to compare its small-sample performance with that of the most recommended bandwidth selectors in the literature.
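The two-stage direct plug-in procedure described in this abstract can be sketched for univariate data as follows. This is the standard textbook form with a Gaussian kernel and a normal-scale starting value, written here as an illustration under those assumptions. The number of stages is fixed at two, so the automatic choice of the number of stages proposed in the paper is not implemented. All function names are hypothetical.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def phi(x):
    """Standard normal density."""
    return np.exp(-0.5 * x ** 2) / SQRT_2PI


def phi4(x):
    """Fourth derivative of the standard normal density."""
    return (x ** 4 - 6 * x ** 2 + 3) * phi(x)


def phi6(x):
    """Sixth derivative of the standard normal density."""
    return (x ** 6 - 15 * x ** 4 + 45 * x ** 2 - 15) * phi(x)


def psi_hat(data, g, r, kernel_deriv):
    """Kernel estimate of psi_r = E[f^(r)(X)] using pilot bandwidth g."""
    n = len(data)
    u = (data[:, None] - data[None, :]) / g
    return kernel_deriv(u).sum() / (n ** 2 * g ** (r + 1))


def dpi_bandwidth_two_stage(data):
    """Two-stage direct plug-in bandwidth for a Gaussian kernel (sketch)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    sigma = data.std(ddof=1)
    # Starting point: value of psi_8 if the data were normal with scale sigma.
    psi8 = 105.0 / (32.0 * np.sqrt(np.pi) * sigma ** 9)
    # First pilot stage: estimate psi_6.
    g1 = (-2.0 * phi6(0.0) / (psi8 * n)) ** (1.0 / 9.0)
    psi6 = psi_hat(data, g1, 6, phi6)
    # Second pilot stage: estimate psi_4.
    g2 = (-2.0 * phi4(0.0) / (psi6 * n)) ** (1.0 / 7.0)
    psi4 = psi_hat(data, g2, 4, phi4)
    # Asymptotically optimal bandwidth with R(K) = 1 / (2 sqrt(pi)), mu_2(K) = 1.
    return (1.0 / (2.0 * np.sqrt(np.pi) * psi4 * n)) ** 0.2
```

Each pilot stage replaces an unknown density functional by a kernel estimate whose own bandwidth depends on the next higher functional. Adding stages pushes the normal-scale assumption further back, which is why the number of stages matters for densities far from normal.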