Stefano Antonio Gattone

Publication

Featured research published by Stefano Antonio Gattone.

Journal of Classification | 2011

A New Dimension Reduction Method: Factor Discriminant K-means

Roberto Rocci; Stefano Antonio Gattone; Maurizio Vichi

Reduced K-means (RKM) and Factorial K-means (FKM) are two data reduction techniques incorporating principal component analysis and K-means into a unified methodology to obtain a reduced set of components for variables and an optimal partition for objects. RKM finds clusters in a reduced space by maximizing the between-clusters deviance without imposing any condition on the within-clusters deviance, so that clusters are isolated but they might be heterogeneous. On the other hand, FKM identifies clusters in a reduced space by minimizing the within-clusters deviance without imposing any condition on the between-clusters deviance. Thus, clusters are homogeneous, but they might not be isolated. The two techniques give different results because the total deviance in the reduced space for the two methodologies is not constant; hence the minimization of the within-clusters deviance is not equivalent to the maximization of the between-clusters deviance. In this paper a modification of the two techniques is introduced to avoid the aforementioned weaknesses. It is shown that the two modified methods give the same results, thus merging RKM and FKM into a new methodology. It is called Factor Discriminant K-means (FDKM), because it combines Linear Discriminant Analysis and K-means. The paper examines several theoretical properties of FDKM and evaluates its performance in a simulation study. An application to real-world data is presented to show the features of FDKM.
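The point about the total deviance can be checked numerically. The following is a minimal sketch, not the authors' algorithm: it fixes a partition and computes the total, within-clusters and between-clusters deviance of the data projected onto two different orthonormal loading matrices. The function name `deviances`, the simulated data and the two projections are assumptions made for illustration only.

```python
import numpy as np

def deviances(X, labels, A):
    """Total, within and between deviance of X projected onto the columns of A."""
    Y = (X - X.mean(axis=0)) @ A            # centred scores in the reduced space
    total = np.sum(Y ** 2)
    within = 0.0
    between = 0.0
    for k in np.unique(labels):
        Yk = Y[labels == k]
        centroid = Yk.mean(axis=0)
        within += np.sum((Yk - centroid) ** 2)
        between += len(Yk) * np.sum(centroid ** 2)   # grand mean of Y is zero
    return total, within, between

rng = np.random.default_rng(0)
# Simulated data: 60 objects, 5 variables with unequal spread (illustrative).
X = rng.normal(size=(60, 5)) * np.array([3.0, 2.0, 1.0, 1.0, 0.5])
labels = np.repeat([0, 1, 2], 20)           # an arbitrary fixed partition

# Two orthonormal projections onto 2 dimensions (hypothetical choices).
A1 = np.eye(5)[:, :2]
A2 = np.eye(5)[:, 3:]

t1, w1, b1 = deviances(X, labels, A1)
t2, w2, b2 = deviances(X, labels, A2)
print(f"A1: total={t1:.1f} within={w1:.1f} between={b1:.1f}")
print(f"A2: total={t2:.1f} within={w2:.1f} between={b2:.1f}")
```

For each projection the total equals within plus between, but the total itself changes with the projection. When the loading matrix is also being optimized, lowering the within-clusters deviance therefore does not automatically raise the between-clusters deviance.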

Statistical Methods and Applications | 2011

Adaptive cluster sampling with a data driven stopping rule

Stefano Antonio Gattone; Tonio Di Battista

Adaptive cluster sampling (ACS) is a suitable sampling design for rare and clustered populations. In environmental and ecological applications, biological populations are generally animals or plants with a highly patchy spatial distribution. However, ACS is a less efficient design when the study population is not rare and has low aggregation, since the final sample size can easily get out of control. In this paper, a new variant of ACS is proposed in order to improve the performance (in terms of precision and cost) of ACS versus simple random sampling (SRS). The idea is to detect the optimal sample size by means of a data-driven stopping rule that determines when to stop the adaptive procedure. Introducing a stopping rule means that the theoretical basis of ACS is no longer respected, so the behaviour of the ordinary estimators used in ACS is explored by using Monte Carlo simulations. Results show that the proposed variant of ACS makes it possible to control the effective sample size and to prevent the excessive efficiency loss typical of ACS when the population is less clustered than anticipated. The proposed strategy may be recommended especially when no prior information about the population structure is available, as it does not require prior knowledge of the degree of rarity and clustering of the population of interest.

Environmental and Ecological Statistics | 2004

Multivariate bootstrap confidence regions for abundance vector using

Tonio Di Battista; Stefano Antonio Gattone

Abundance vector estimation is a well investigated problem in statistical ecology. The use of simple random sampling with replacement or replicated sampling ensures good asymptotic properties of the abundance vector estimators. However, real surveys are based on small sample sizes, and assuming any specific distribution of the abundance vector estimator may be hazardous. In this paper we focus our attention on situations where the population is not too large and the sample size is small. We propose bootstrap multivariate confidence regions based on data depth. Data depth is a geometrical concept of ordering data from the center outwardly in higher dimensions. Simplicial depth, Tukey's depth and Mahalanobis depth are presented. In order to build confidence regions in the presence of a skewed distribution of the abundance vector estimator, the use of Tukey's depth is suggested. The proposed method has been applied to the benthic community of Lake Lesina. A comparison with Mahalanobis depth and standard existing methods is reported.
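As a rough illustration of a depth-based bootstrap region, the sketch below uses Mahalanobis depth, chosen here only because it has a closed form; the abstract recommends Tukey's depth for skewed estimators, which needs a dedicated algorithm. The simulated counts, the number of replicates `B` and the 95% level are assumptions, not values from the paper.

```python
import numpy as np

def mahalanobis_depth(points, center, cov):
    """Depth 1 / (1 + d^2), where d^2 is the squared Mahalanobis distance."""
    diff = points - center
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return 1.0 / (1.0 + d2)

rng = np.random.default_rng(1)
# Hypothetical counts of 3 species observed in 15 sampling units.
sample = rng.poisson(lam=[20.0, 5.0, 1.5], size=(15, 3)).astype(float)

# Bootstrap the mean abundance vector by resampling units with replacement.
B = 2000
idx = rng.integers(0, len(sample), size=(B, len(sample)))
boot_means = sample[idx].mean(axis=1)        # shape (B, 3)

center = boot_means.mean(axis=0)
cov = np.cov(boot_means, rowvar=False)
depth = mahalanobis_depth(boot_means, center, cov)

# Keep the 95% deepest replicates: they outline the confidence region.
cutoff = np.quantile(depth, 0.05)
region = boot_means[depth >= cutoff]
print(region.shape, region.min(axis=0), region.max(axis=0))
```

Replacing `mahalanobis_depth` with another depth function leaves the rest of the procedure unchanged, which is what makes a comparison between depth notions straightforward.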

Journal of Computational and Graphical Statistics | 2012

Clustering Curves on a Reduced Subspace

Stefano Antonio Gattone; Roberto Rocci

The aim of this article is to propose a procedure to cluster functional observations in a subspace of reduced dimension. The dimensional reduction is obtained by constraining the cluster centroids to lie in a subspace which preserves the maximum amount of discriminative information contained in the original data. The model is estimated by using penalized least squares to take into account the functional nature of the data. The smoothing is carried out within the clustering and its amount is adaptively calibrated. A simulation study shows how the combination of these two elements, feature extraction and automatic data-driven smoothing, improves the performance of clustering by reducing irrelevant and redundant information in the data. The effectiveness of the proposal is demonstrated by an application to a real dataset regarding a speech recognition problem. Implementation details of the algorithm together with computer code are available in the online supplements.

Environmental and Ecological Statistics | 2016

Adaptive cluster sampling with clusters selected without replacement and stopping rule

Stefano Antonio Gattone; Esha Mohamed; Tonio Di Battista

Adaptive cluster sampling (ACS) has received much attention in recent years since it yields more precise estimates than conventional sampling designs when applied to rare and clustered populations. These results, however, depend on the availability of some prior knowledge about the spatial distribution and the absolute abundance of the population under study. This prior information helps the researcher to select a suitable critical value that triggers the adaptive search, the neighborhood definition and the initial sample size. A poorly configured ACS design would worsen the performance of the adaptive estimators. In particular, one of the greatest weaknesses of ACS is the inability to control the final sampling effort if, for example, the critical value is set too low. To overcome this drawback one can introduce ACS with clusters selected without replacement, where the number of distinct clusters to be selected is fixed in advance, or ACS with a stopping rule, which stops the adaptive sampling when a predetermined sample size limit is reached or when a given stopping rule is verified. However, the stopping rule breaks down the theoretical basis for the unbiasedness of the ACS estimators, introducing an unknown amount of bias in the estimates. The current study improves the performance of ACS when applied to patchy and clustered but not rare populations and/or less clustered populations. This is done by combining the stopping rule with ACS without replacement of clusters so as to further limit the sampling effort, in the form of traveling expenses, by avoiding repeat observations and by reducing the final sample size. The performance of the proposed design is investigated using simulated and real data.
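The simplest stopping rule mentioned above, a predetermined sample size limit, can be sketched on a grid population. This is an illustrative sketch only: it does not implement the paper's combined design or its estimators, and the function name `acs_with_size_limit`, the four-cell neighbourhood, the simulated grid and all parameter values are assumptions.

```python
from collections import deque
import numpy as np

def acs_with_size_limit(grid, n_initial, threshold, max_size, rng):
    """Adaptive cluster sampling on a grid, stopped once max_size cells are sampled."""
    rows, cols = grid.shape
    start = rng.choice(rows * cols, size=n_initial, replace=False)
    sampled = {(int(u) // cols, int(u) % cols) for u in start}
    queue = deque(sorted(sampled))
    while queue:
        r, c = queue.popleft()
        if grid[r, c] <= threshold:          # critical value not exceeded: no expansion
            continue
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in sampled:
                if len(sampled) >= max_size: # stopping rule: size limit reached
                    return sampled
                sampled.add((nr, nc))
                queue.append((nr, nc))
    return sampled

rng = np.random.default_rng(42)
grid = np.zeros((10, 10), dtype=int)
grid[2:6, 3:8] = rng.poisson(4.0, size=(4, 5)) + 1   # one dense, non-rare patch

capped = acs_with_size_limit(grid, 8, 0, 15, rng)
uncapped = acs_with_size_limit(grid, 8, 0, grid.size, np.random.default_rng(7))
print(len(capped), len(uncapped))
```

Setting `max_size` to the grid size recovers ordinary ACS, where the final sample size depends entirely on how many initial units land in the patch. As the abstract notes, truncating the expansion is what introduces bias in the usual ACS estimators.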

Archive | 2011

Dealing with FDA Estimation Methods

Tonio Di Battista; Stefano Antonio Gattone; Angela De Sanctis

In many different research fields, such as medicine, physics and economics, the real phenomenon observed on each statistical unit is described by a curve or an assigned function. In this framework, a suitable statistical approach is Functional Data Analysis based on the use of basis functions. An alternative method, using Functional Analysis tools, is considered in order to estimate functional statistics. Assuming a parametric family of functional data, the problem of computing summary statistics of the same parametric form when the set of all functions having that parametric form does not constitute a linear space is investigated. The central idea is to compute statistics on the parameters instead of on the functions themselves.
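A small numerical example may make the central idea concrete. The exponential family a·exp(-b·x) used below is an assumption chosen for illustration, not a family taken from the paper. The set of such curves is not a linear space, so the pointwise mean of several curves generally falls outside the family, whereas the curve built from the averaged parameters stays inside it.

```python
import numpy as np

def curve(x, a, b):
    """Member of the illustrative parametric family a * exp(-b * x)."""
    return a * np.exp(-b * x)

x = np.linspace(0.0, 5.0, 50)
# Hypothetical parameters (a, b) for three observed curves.
params = np.array([[1.0, 0.5], [2.0, 1.5], [1.5, 3.0]])

# Statistic computed on the functions: the pointwise mean curve.
pointwise_mean = np.mean([curve(x, a, b) for a, b in params], axis=0)

# Statistic computed on the parameters: the curve of the mean parameters.
a_bar, b_bar = params.mean(axis=0)
parametric_mean = curve(x, a_bar, b_bar)

# A member of the family has a log that is linear in x, so the second
# differences of its log vanish on an evenly spaced grid.
curv_pointwise = np.diff(np.log(pointwise_mean), 2)
curv_parametric = np.diff(np.log(parametric_mean), 2)
print(np.abs(curv_pointwise).max(), np.abs(curv_parametric).max())
```

The second differences are essentially zero for the parameter-based mean and clearly non-zero for the pointwise mean, so only the former has the same parametric form as the data.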

Journal of Classification | 2011

Heterogeneity Measures in Customer Satisfaction Analysis

Pasquale Valentini; Tonio Di Battista; Stefano Antonio Gattone

In this paper we deal with the problem of identifying a valid way to characterize heterogeneity in the analysis of customer satisfaction, observing the phenomenon from a new perspective. In the literature, the variability of a Customer Satisfaction index is measured by the standard deviation or the coefficient of variation. In this way, heterogeneity among customers may be masked. To overcome this drawback, we provide a new approach to the construction of a multi-dimensional measure of heterogeneity of the Customer Satisfaction index that does not depend on the choice of a particular heterogeneity index. The approach is based on heterogeneity profiles, which lead to a more detailed description of heterogeneity than alternative measures. Moreover, a latent class model is used for classifying individuals into distinct groups based on responses to a set of items. Once groups are formed, Customer Satisfaction researchers can draw conclusions about the level of satisfaction and the characteristics of groups in terms of heterogeneity.

Advances in Data Analysis and Classification | 2018

A data driven equivariant approach to constrained Gaussian mixture modeling

Roberto Rocci; Stefano Antonio Gattone; Roberto Di Mari

Maximum likelihood estimation of Gaussian mixture models with different class-specific covariance matrices is known to be problematic. This is due to the unboundedness of the likelihood, together with the presence of spurious maximizers. Existing methods to bypass this obstacle are based on the fact that unboundedness is avoided if the eigenvalues of the covariance matrices are bounded away from zero. This can be done by imposing some constraints on the covariance matrices, i.e. by incorporating a priori information on the covariance structure of the mixture components. The present work introduces a constrained approach, where the class conditional covariance matrices are shrunk towards a pre-specified target matrix

international symposium on distributed computing | 2017