Domenico Perrotta | Researchain

Publication

Featured research published by Domenico Perrotta.

Electronic Journal of Statistics | 2014

Monitoring robust regression

Marco Riani; Andrea Cerioli; Anthony C. Atkinson; Domenico Perrotta

Robust methods are little applied (although much studied by statisticians). We monitor very robust regression by looking at the behaviour of residuals and test statistics as we smoothly change the robustness of parameter estimation from a breakdown point of 50% to non-robust least squares. The resulting procedure provides insight into the structure of the data including outliers and the presence of more than one population. Monitoring overcomes the hindrances to the routine adoption of robust methods, being informative about the choice between the various robust procedures. Methods tuned to give nominal high efficiency fail with our most complicated example. We find that the most informative analyses come from S estimates combined with Tukey's biweight or with the optimal ρ functions. For our major example with 1,949 observations and 13 explanatory variables, we combine robust S estimation with regression using the forward search, so obtaining an understanding of the importance of individual observations, which is missing from standard robust procedures. We discover that the data come from two different populations. They also contain six outliers. Our analyses are accompanied by numerous graphs. Algebraic results are contained in two appendices, the second of which provides useful new results on the absolute odd moments of elliptically truncated multivariate normal random variables.

Statistical Science | 2014

A Parametric Framework for the Comparison of Methods of Very Robust Regression

Marco Riani; Anthony C. Atkinson; Domenico Perrotta

There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers in the data. Differences in the behaviour of these algorithms depend on the distance between the regression data and the outliers. We introduce a parameter $\lambda$ that defines a parametric path in the space of models and enables us to study, in a systematic way, the properties of estimators as the groups of data move from being far apart to close together. We examine, as a function of $\lambda$, the variance and squared bias of five estimators and we also consider their power when used in the detection of outliers. This systematic approach provides tools for gaining knowledge and better understanding of the properties of robust estimators.

Advances in Data Analysis and Classification | 2014

Robust clustering around regression lines with high density regions

Andrea Cerioli; Domenico Perrotta

Robust methods are needed to fit regression lines when outliers are present. In a clustering framework, outliers can be extreme observations, high leverage points, but also data points which lie among the groups. Outliers are also of paramount importance in the analysis of international trade data, which motivate our work, because they may provide information about anomalies like fraudulent transactions. In this paper we show that robust techniques can fail when a large proportion of non-contaminated observations fall in a small region, which is a likely occurrence in many international trade data sets. In such instances, the effect of a high-density region is so strong that it can override the benefits of trimming and other robust devices. We propose to solve the problem by sampling a much smaller subset of observations which preserves the cluster structure and retains the main outliers of the original data set. This goal is achieved by defining the retention probability of each point as an inverse function of the estimated density function for the whole data set. We motivate our proposal as a thinning operation on a point pattern generated by different components. We then apply robust clustering methods to the thinned data set for the purposes of classification and outlier detection. We show the advantages of our method both in empirical applications to international trade examples and through a simulation study.

Computational Statistics & Data Analysis | 2012

Benchmark testing of algorithms for very robust regression: FS, LMS and LTS

Francesca Torti; Domenico Perrotta; Anthony C. Atkinson; Marco Riani

The methods of very robust regression resist up to 50% of outliers. The algorithms for very robust regression rely on selecting numerous subsamples of the data. New algorithms for LMS and LTS estimators that have increased computational efficiency due to improved combinatorial sampling are proposed. These and other publicly available algorithms are compared for outlier detection. Timings and estimator quality are also considered. An algorithm using the forward search (FS) has the best properties for both size and power of the outlier tests.

Advances in Data Analysis and Classification | 2009

New robust dynamic plots for regression mixture detection

Domenico Perrotta; Marco Riani; Francesca Torti

The forward search is a powerful general method for detecting multiple masked outliers and for determining their effect on inferences about models fitted to data. From the monitoring of a series of statistics based on subsets of data of increasing size we obtain multiple views of any hidden structure. One of the problems of the forward search has always been the lack of an automatic link among the great variety of plots which are monitored. Often, many interesting features emerge unexpectedly during the progression of the forward search only when a specific combination of forward plots is inspected at the same time. Thus, the analyst should be able to interact with the plots and redefine or refine the links among them. In the absence of dynamic linking and interaction tools, the analyst risks missing relevant hidden information. In this paper we fill this gap and provide the user with a set of new robust graphical tools whose power will be demonstrated on several regression problems. Through the analysis of real and simulated data we give a series of examples where dynamic interaction with different “robust plots” is used to highlight the presence of groups of outliers and regression mixtures and appraise the effect that these hidden groups exert on the fitted model.

Archive | 2010

Detecting Price Outliers in European Trade Data with the Forward Search

Domenico Perrotta; Francesca Torti

We describe empirical work in the domain of clustering and outlier detection, for the analysis of European trade data. It is our first attempt to evaluate benefits and limitations of the forward search approach for regression and multivariate analysis (Atkinson and Riani, Robust diagnostic regression analysis, Springer, 2000; Atkinson et al., Exploring multivariate data with the forward search, Springer, 2004), within a concrete application scenario and in relation to a comparable backward method developed in the JRC by Arsenis et al. (Price outliers in EU external trade data, Enlargement and Integration Workshop 2005, 2005). Our findings suggest that the automatic clustering based on Mahalanobis distances may be inappropriate in the presence of a high-density area in the dataset. Follow-up work is discussed extensively in Riani et al. (Fitting mixtures of regression lines with the forward search, Mining massive data sets for security, IOS, 2008).

Soft Methods in Probability and Statistics | 2013

Robustness Issues in Text Mining

Marco Turchi; Domenico Perrotta; Marco Riani; Andrea Cerioli

Classification and Data Mining | 2013

Issues on Clustering and Data Gridding

Jukka Heikkonen; Domenico Perrotta; Marco Riani; Francesca Torti

Statistical Methods and Applications | 2018

Discussion of “The power of monitoring: how to make the most of a contaminated multivariate sample”

Domenico Perrotta; Francesca Torti

Journal of Business & Economic Statistics | 2018

Goodness-of-Fit Testing for the Newcomb-Benford Law With Application to the Detection of Customs Fraud

Lucio Barabesi; Andrea Cerasa; Andrea Cerioli; Domenico Perrotta
