Publication


Featured research published by Paola Zuccolotto.


Journal of Computational and Graphical Statistics | 2008

A Bias Correction Algorithm for the Gini Variable Importance Measure in Classification Trees

Marco Sandri; Paola Zuccolotto

This article considers a measure of variable importance frequently used in variable-selection methods based on decision trees and tree-based ensemble models. These models include CART, random forests, and gradient boosting machines. The measure of variable importance is defined as the total heterogeneity reduction produced by a given covariate on the response variable when the sample space is recursively partitioned. Despite its popularity, some authors have shown that this measure is biased to the extent that, under certain conditions, there may be dangerous effects on variable selection. Here we present a simple and effective method for bias correction, focusing on the easily generalizable case of the Gini index as a measure of heterogeneity.
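The heterogeneity reduction described in the abstract above can be illustrated with a minimal sketch. It computes the Gini index of a node and the decrease obtained from a single binary split; the function names `gini` and `gini_decrease` are hypothetical, and the sketch shows only the uncorrected quantity, not the bias-correction method of the article.

```python
from collections import Counter

def gini(labels):
    """Gini heterogeneity index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Heterogeneity reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

# Toy node with two balanced classes (illustrative data, not from the article).
parent = ["a", "a", "b", "b"]
pure_split = gini_decrease(parent, ["a", "a"], ["b", "b"])     # children are pure
useless_split = gini_decrease(parent, ["a", "b"], ["a", "b"])  # children mirror the parent
```

In a tree, the importance of a covariate is typically obtained by summing such decreases over every node split on that covariate, usually weighted by the share of observations reaching the node.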


Advances in Data Analysis and Classification | 2012

Sensory analysis in the food industry as a tool for marketing decisions

Maria Iannario; Marica Manisera; Domenico Piccolo; Paola Zuccolotto

In the food industry, sensory analysis can be useful to direct marketing decisions concerning not only products, for example product positioning with respect to competitors, but also market segmentation, customer relationship management, advertising strategies and price policies. In this paper we show how information useful for marketing management can be obtained by combining the results from CUB models and algorithmic data mining techniques (specifically, variable importance measurements from Random Forests). A case study on sensory evaluation of different varieties of Italian espresso is presented.


Advances in Data Analysis and Classification | 2011

A tail dependence-based dissimilarity measure for financial time series clustering

Giovanni De Luca; Paola Zuccolotto

In this paper we propose a clustering procedure aimed at grouping time series with an association between extremely low values, measured by the lower tail dependence coefficient. Firstly, we estimate the coefficient using an Archimedean copula function. Then, we propose a dissimilarity measure based on tail dependence coefficients and a two-step procedure to be used with clustering algorithms which require that the objects we want to cluster have a geometric interpretation. We show how the results of the clustering applied to financial returns could be used to construct defensive portfolios reducing the effect of a simultaneous financial crisis.
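As a rough illustration of the two ingredients named in the abstract above, the sketch below assumes a Clayton copula (one member of the Archimedean family, chosen here only for illustration), whose lower tail dependence coefficient has the closed form 2^(-1/theta). The dissimilarity `-log(lambda)` and both function names are assumptions made for this example and may differ from the measure actually proposed in the paper.

```python
import math

def clayton_lower_tail_dependence(theta):
    """Lower tail dependence coefficient of a Clayton copula, valid for theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 2.0 ** (-1.0 / theta)

def tail_dissimilarity(lam):
    """Assumed dissimilarity: -log(lambda). Strong tail dependence gives a small value."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lambda must lie in (0, 1]")
    return -math.log(lam)

# Hypothetical copula parameters for two pairs of return series.
weak = clayton_lower_tail_dependence(0.5)    # weaker association between extreme lows
strong = clayton_lower_tail_dependence(4.0)  # stronger association between extreme lows
d_weak, d_strong = tail_dissimilarity(weak), tail_dissimilarity(strong)
```

Under this choice, pairs of series that tend to crash together end up close to each other, so a clustering algorithm groups them, and a defensive portfolio would pick assets from different clusters.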


Statistics and Computing | 2010

Analysis and correction of bias in Total Decrease in Node Impurity measures for tree-based algorithms

Marco Sandri; Paola Zuccolotto

Variable selection is one of the main problems faced by data mining and machine learning techniques. These techniques are often, more or less explicitly, based on some measure of variable importance. This paper considers Total Decrease in Node Impurity (TDNI) measures, a popular class of variable importance measures defined in the field of decision trees and tree-based ensemble methods, like Random Forests and Gradient Boosting Machines. In spite of their wide use, some measures of this class are known to be biased and some correction strategies have been proposed. The aim of this paper is twofold: first, to investigate the source and the characteristics of bias in TDNI measures using the notions of informative and uninformative splits; second, to extend a bias-correction algorithm, recently proposed for the Gini measure in the context of classification, to the entire class of TDNI measures and to investigate its performance in the regression framework using simulated and real data.


CLADAG 2005 | 2006

Variable Selection Using Random Forests

Marco Sandri; Paola Zuccolotto

One of the main topics in the development of predictive models is the identification of variables which are predictors of a given outcome. Automated model selection methods, such as backward or forward stepwise regression, are classical solutions to this problem, but are generally based on strong assumptions about the functional form of the model or the distribution of residuals. In this paper an alternative selection method, based on the technique of Random Forests, is proposed in the context of classification, with an application to a real dataset.


Pattern Recognition Letters | 2014

Modeling “don’t know” responses in rating scales

Marica Manisera; Paola Zuccolotto

We propose a probabilistic framework for the treatment of “don’t know” responses in surveys aimed at investigating human perceptions through expressed ratings. The rationale behind the proposal is that “don’t know” is a valid response to all extents because it informs about a specific state of mind of the respondent, and therefore, it is not correct to treat it as a missing value, as it is usually treated. The actual insightfulness of the proposed model depends on the chosen probability distributions. The required assumptions of these distributions first pertain to the expressed ratings and then to the state of mind of “don’t know” respondents toward the ratings. Regarding the former, we worked in the CUB model framework, while for the latter, we proposed using the Uniform distribution for formal and empirical reasons. We show that these two choices provide a solution that is both tractable and easy to interpret, where “don’t know” responses can be taken into account by simply adjusting one parameter in the model.
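For readers unfamiliar with the CUB framework mentioned above, here is a minimal sketch of the standard CUB probability mass function without covariates: a mixture of a shifted binomial (weight `pi`) and a discrete uniform (weight `1 - pi`) over `m` rating categories. The function name `cub_pmf` and the parameter values are illustrative assumptions; the sketch does not implement the paper's adjustment for “don’t know” responses.

```python
from math import comb

def cub_pmf(m, pi, xi):
    """Probabilities of ratings 1..m under a CUB model with parameters pi and xi."""
    probs = []
    for r in range(1, m + 1):
        # Shifted binomial component on the support 1..m.
        shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        # Mixture with a discrete uniform component.
        probs.append(pi * shifted_binomial + (1 - pi) / m)
    return probs

# Hypothetical example: a 7-point scale with low uncertainty and high ratings favoured.
example = cub_pmf(m=7, pi=0.8, xi=0.2)
```

A smaller `pi` gives more weight to the uniform component, which is commonly read as higher uncertainty in the responses.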


Computational Statistics & Data Analysis | 2006

Regime-switching Pareto distributions for ACD models

Giovanni De Luca; Paola Zuccolotto

Refinements have been proposed for the autoregressive conditional duration model within the framework of financial durations. It is argued that a Pareto distribution is a meaningful representation for durations. The model is analyzed under the hypothesis of regime-switching parameters with different transition functions governed both by an observable and a latent variable.


Computational Statistics & Data Analysis | 2014

Modeling rating data with Nonlinear CUB models

Marica Manisera; Paola Zuccolotto

A general statistical model for ordinal or rating data, which includes some existing approaches as special cases, is proposed. The focus is on the CUB models and a new class of models, called Nonlinear CUB, which generalize CUB. In the framework of the Nonlinear CUB models, it is possible to express a transition probability, i.e. the probability of increasing one rating point at a given step of the decision process. Transition probabilities and the related transition plots are able to describe the state of mind of the respondents about the response scale used to express judgments. Unlike classical CUB, the Nonlinear CUB models are able to model decision processes with non-constant transition probabilities.


Statistical Methods and Applications | 2007

Principal components of sample estimates: an approach through symbolic data analysis

Paola Zuccolotto

This paper deals with the analysis of datasets, where the subjects are described by the estimated means of a p-dimensional variable. Classical statistical methods of data analysis do not treat measurements affected by intrinsic variability, as in the case of estimates, so that the heterogeneity induced among subjects by this condition is not taken into account. In this paper a way to solve the problem is suggested in the context of symbolic data analysis, whose specific aim is to handle data tables where single valued measurements are substituted by complex data structures like frequency distributions, intervals, and sets of values. A principal component analysis is carried out according to this proposal, with a significant improvement in the treatment of information.


Journal of Multivariate Analysis | 2015

Identifiability of a model for discrete frequency distributions with a multidimensional parameter space

Marica Manisera; Paola Zuccolotto

This paper is concerned with the identifiability of models depending on a multidimensional parameter vector, aimed at fitting a probability distribution to discrete observed data, with a special focus on a recently proposed mixture model. Starting from the necessary and sufficient condition derived from the definition of identifiability, we describe a general method to verify whether a specific model is identifiable or not. This procedure is then applied to investigate the identifiability of a recently proposed mixture model for rating data, Nonlinear CUB, which is an extension of a class of mixture models called CUB (Combination of Uniform and Binomial). Formal proofs and a numerical study show that some sufficient conditions for identifiability of Nonlinear CUB are always satisfied, provided that in the estimation procedure one quantity is fixed at a relatively small value.

Collaboration


Dive into Paola Zuccolotto's collaborations.

Top Co-Authors

Giovanni De Luca

University of Naples Federico II

Maria Iannario

University of Naples Federico II

Giorgia Rivieccio

University of Naples Federico II

Angelica Grasso

Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico
