Publication


Featured research published by Giovanna Menardi.


Data Mining and Knowledge Discovery | 2014

Training and assessing classification rules with imbalanced data

Giovanna Menardi; Nicola Torelli

The problem of modeling binary responses by using cross-sectional data has been addressed with a number of satisfying solutions that draw on both parametric and nonparametric methods. However, there exist many real situations where one of the two responses (usually the most interesting for the analysis) is rare. It has been widely reported that this class imbalance heavily compromises the process of learning, because the model tends to focus on the prevalent class and to ignore the rare events. A skewed class distribution affects not only the estimation of the classification model but also the evaluation of its accuracy, because the scarcity of data leads to poor estimates of that accuracy. In this work, the effects of class imbalance on model training and model assessment are discussed. Moreover, a unified and systematic framework for dealing with the problem of imbalanced classification is proposed, based on a smoothed bootstrap re-sampling technique. The proposed technique is founded on a sound theoretical basis and an extensive empirical study shows that it outperforms the main alternative remedies for imbalanced learning problems.
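The smoothed bootstrap idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smoothed_bootstrap`, the single Gaussian bandwidth `h` shared by all features, and the toy data are assumptions made for illustration only.

```python
import random

def smoothed_bootstrap(X, y, n_new, h=0.5, seed=0):
    """Generate n_new synthetic (features, label) pairs with roughly balanced classes.

    X: list of feature lists; y: list of 0/1 labels.
    h: hypothetical Gaussian kernel standard deviation, shared by every feature.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(X, y):
        by_class[label].append(row)
    X_new, y_new = [], []
    for _ in range(n_new):
        label = rng.choice([0, 1])            # pick a class with equal probability
        centre = rng.choice(by_class[label])  # bootstrap step: resample an observed point
        # smoothing step: draw from a Gaussian kernel centred on the resampled point
        X_new.append([rng.gauss(value, h) for value in centre])
        y_new.append(label)
    return X_new, y_new

# Toy imbalanced data: four observations of class 0, one of class 1.
X = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0], [-0.1, 0.2], [3.0, 3.1]]
y = [0, 0, 0, 0, 1]
X_new, y_new = smoothed_bootstrap(X, y, n_new=200)
```

Because the class is drawn first with equal probability, the synthetic sample is approximately balanced even though the rare class has a single observation in the toy data.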


Statistics and Computing | 2014

An advancement in clustering via nonparametric density estimation

Giovanna Menardi; Adelchi Azzalini

Density-based clustering methods hinge on the idea of associating groups with the connected components of the level sets of the density underlying the data, to be estimated by a nonparametric method. These methods claim some desirable properties and generally good performance, but they involve a non-trivial computational effort, required for the identification of the connected regions. In a previous work, the use of spatial tessellation such as the Delaunay triangulation has been proposed, because it suitably generalizes the univariate procedure for detecting the connected components. However, its computational complexity grows exponentially with the dimensionality of data, thus making the triangulation unfeasible for high dimensions. Our aim is to overcome the limitations of Delaunay triangulation. We discuss the use of an alternative procedure for identifying the connected regions associated with the level sets of the density. By measuring the extent of possible valleys of the density along the segment connecting pairs of observations, the proposed procedure shifts the formulation from a space with arbitrary dimension to a univariate one, thus yielding benefits in both computation and visualization.
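A rough sketch of the segment-based idea follows. The valley measure used here (how far the estimated density dips below the lower of the two endpoint densities) is a simplified, hypothetical stand-in, not the measure defined in the paper; the function names, bandwidth, and data are likewise assumptions.

```python
import math

def kde(point, data, h):
    """Gaussian product-kernel density estimate at `point` with bandwidth h."""
    dim = len(point)
    norm = len(data) * (h * math.sqrt(2 * math.pi)) ** dim
    total = 0.0
    for obs in data:
        sq_dist = sum((p - o) ** 2 for p, o in zip(point, obs))
        total += math.exp(-sq_dist / (2 * h * h))
    return total / norm

def valley_depth(a, b, data, h=0.5, steps=50):
    """Hypothetical valley measure along the segment from a to b.

    The density is evaluated at equally spaced points of the segment, so the
    problem is univariate whatever the dimension of the data.
    """
    dens = []
    for i in range(steps + 1):
        t = i / steps
        point = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        dens.append(kde(point, data, h))
    return max(0.0, min(dens[0], dens[-1]) - min(dens))

# Two well-separated toy groups in the plane.
data = [[0.0, 0.0], [0.3, 0.1], [-0.2, 0.2], [0.1, -0.3],
        [5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.3]]
within = valley_depth(data[0], data[1], data)   # pair inside one group
between = valley_depth(data[0], data[4], data)  # pair across the two groups
```

A pair with a deep valley between its members would be treated as belonging to different connected components, while a pair with no valley would be linked.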


Statistics and Computing | 2011

Density-based Silhouette diagnostics for clustering methods

Giovanna Menardi

Silhouette information evaluates the quality of the partition detected by a clustering technique. Since it is based on a measure of distance between the clustered observations, its standard formulation is not adequate when a density-based clustering technique is used. In this work we propose a suitable modification of the Silhouette information aimed at evaluating the quality of clusters in a density-based framework. It is based on the estimation of the data posterior probabilities of belonging to the clusters and may be used to measure our confidence about data allocation to the clusters as well as to choose the best partition among different ones.
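One way to turn posterior probabilities into a silhouette-like score is sketched below. The specific formula (log ratio of the posterior of the assigned cluster to the largest posterior among the other clusters, rescaled by the largest absolute value) is an assumption for illustration; the abstract does not spell it out. The sketch also assumes that all posteriors are strictly positive and that at least one log ratio is nonzero.

```python
import math

def density_silhouette(posteriors, labels):
    """Silhouette-like scores from estimated cluster membership probabilities.

    posteriors[i][k]: estimated probability that observation i belongs to cluster k.
    labels[i]: index of the cluster that observation i was assigned to.
    """
    raw = []
    for probs, k in zip(posteriors, labels):
        best_other = max(p for j, p in enumerate(probs) if j != k)
        raw.append(math.log(probs[k] / best_other))
    scale = max(abs(r) for r in raw)  # rescale so that scores lie in [-1, 1]
    return [r / scale for r in raw]

# Hypothetical posteriors for four observations and two clusters.
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]
labels = [0, 0, 1, 0]
scores = density_silhouette(posteriors, labels)
```

Scores close to 1 indicate confident allocations, scores near 0 indicate observations on the border between clusters, and negative scores flag observations whose assigned cluster is not the most probable one (the last observation in the toy data).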


Archive | 2010

Preserving the Clustering Structure by a Projection Pursuit Approach

Giovanna Menardi; Nicola Torelli

A projection pursuit technique is proposed to reduce the dimensionality of a data set while preserving its clustering structure. It is based on Silverman’s (J R Stat Soc B 43:97–99, 1981) critical bandwidth. We show that the critical bandwidth is scale equivariant and that this property allows us to keep affine invariance of the projection pursuit solution.


Journal of Statistical Computation and Simulation | 2013

Reducing data dimension for cluster detection

Giovanna Menardi; Nicola Torelli

Clustering high-dimensional data is often a challenging task both because of the computational burden required to run any technique, and because the difficulty in interpreting clusters generally increases with the data dimension. In this work, a method for finding low-dimensional representations of high-dimensional data is discussed, specifically conceived to preserve possible clusters in data. It is based on the critical bandwidth, a nonparametric statistic to test unimodality, related to kernel density estimation. Some useful properties of this statistic are highlighted and an adjustment to use it as a basis for reducing dimensionality is suggested. The method is illustrated by simulated and real data examples.
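The critical bandwidth can be approximated numerically in one dimension as the smallest bandwidth at which a Gaussian kernel density estimate has a single mode. The sketch below finds it by bisection and counts modes on a grid; the function names, the grid size, the number of bisection steps, and the data are assumptions, and the grid-based mode count is only an approximation.

```python
import math

def count_modes(data, h, grid_size=400):
    """Count local maxima of an (unnormalised) Gaussian kernel estimate on a grid."""
    start = min(data) - 3 * h
    step = (max(data) + 3 * h - start) / (grid_size - 1)
    dens = []
    for i in range(grid_size):
        x = start + i * step
        dens.append(sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data))
    return sum(1 for i in range(1, grid_size - 1)
               if dens[i - 1] < dens[i] > dens[i + 1])

def critical_bandwidth(data, iterations=40):
    """Approximate the smallest bandwidth giving a unimodal estimate, by bisection.

    Relies on the number of modes of a Gaussian kernel estimate not increasing
    as the bandwidth grows.
    """
    lo, hi = 0.0, max(data) - min(data)  # a bandwidth equal to the range is unimodal
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if count_modes(data, mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi

two_groups = [0.0, 0.2, 0.4, 0.5, 3.0, 3.2, 3.3, 3.6]
one_group = [0.0, 0.2, 0.4, 0.5]
h_two = critical_bandwidth(two_groups)
h_one = critical_bandwidth(one_group)
h_two_scaled = critical_bandwidth([4 * x for x in two_groups])
```

Data with two separated groups need much more smoothing to look unimodal, so `h_two` is larger than `h_one`. Rescaling the data by a constant rescales the critical bandwidth by the same constant, which is the scale equivariance that the projection pursuit approach relies on.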


Proceedings of The European Physical Society Conference on High Energy Physics — PoS(EPS-HEP2017) | 2017

Hemisphere Mixing: A Fully Data-Driven Model Of QCD Multijet Backgrounds For LHC Searches

Martino Dall'Osso; Pablo de Castro Manzano; Tommaso Dorigo; Livio Finos; Grzegorz Kotkowski; Giovanna Menardi; Bruno Scarpa

A novel method is proposed here to precisely model the multi-dimensional features of QCD multi-jet events in hadron collisions. The method relies on the schematization of high-pT QCD processes as 2->2 reactions made complex by sub-leading effects. The construction of libraries of hemispheres from experimental data and the definition of a suitable nearest-neighbor-based association map allow for the generation of artificial events that reproduce with surprising accuracy the kinematics of the QCD component of original data, while remaining insensitive to small signal contaminations. The method is succinctly described and its performance is tested in the case of the search for the hh->bbbb process at the LHC.


Archive | 2013

Multidimensional Connected Set Detection in Clustering Based on Nonparametric Density Estimation

Giovanna Menardi

Clustering methods based on nonparametric density estimation hinge on the idea of identifying groups with the level sets of the probability distribution underlying data. Any section of such distribution, at a given threshold, identifies a level set, being the region with density greater than the threshold. The aim is to find the maximum connected components of this region, as the threshold varies. In this way, a hierarchical structure of the number of groups for each threshold is created. In multidimensional spaces, identification of the connected sets is nontrivial. The use of spatial tessellation such as the Delaunay triangulation has been successfully adopted to this aim but its computational complexity is too high for large dimensions. We discuss the use of an alternative procedure for identifying the connected regions associated with the level sets of a density function. The proposed procedure claims a computational complexity which depends only mildly on the data dimension, thus overcoming the main limitations of the spatial tessellation. The main idea behind this contribution is to emulate the unidimensional procedure to identify connected sets. The method is illustrated with some numerical examples.
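In one dimension the connected components of a level set are simply intervals, which is what makes the unidimensional procedure easy. The following sketch approximates them on a grid for a Gaussian kernel estimate; the function names, bandwidth, grid size, thresholds, and data are illustrative assumptions rather than the procedure from the paper.

```python
import math

def kde_1d(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    total = sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)
    return total / (len(data) * h * math.sqrt(2 * math.pi))

def level_set_components(data, h, threshold, grid_size=500):
    """Grid approximation of the intervals where the estimate exceeds threshold."""
    start = min(data) - 3 * h
    step = (max(data) + 3 * h - start) / (grid_size - 1)
    intervals, current = [], None
    for i in range(grid_size):
        x = start + i * step
        if kde_1d(x, data, h) > threshold:
            # extend the open interval, or start a new one
            current = (current[0], x) if current else (x, x)
        elif current:
            intervals.append(current)  # density fell below the threshold
            current = None
    if current:
        intervals.append(current)
    return intervals

data = [0.0, 0.2, 0.4, 0.5, 3.0, 3.2, 3.3, 3.6]
low = level_set_components(data, h=0.3, threshold=1e-5)
middle = level_set_components(data, h=0.3, threshold=0.1)
high = level_set_components(data, h=0.3, threshold=0.6)
```

Sweeping the threshold from low to high gives the hierarchy described in the abstract: here a single component at the low threshold splits into two at the middle one, and the level set is empty at the high one.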


Archive | 2011

On the Use of Boosting Procedures to Predict the Risk of Default

Giovanna Menardi; Federico Tedeschi; Nicola Torelli

Statistical models have been widely applied with the aim of evaluating the risk of default of enterprises. However, a typical problem is that the occurrence of the default event is rare, and this class imbalance strongly affects the performance of traditional classifiers. Boosting is a general class of methods which iteratively improves the accuracy of any weak learner, but it suffers from some drawbacks in the presence of unbalanced classes. The performance of standard boosting procedures in dealing with unbalanced classes is discussed and a new algorithm is proposed.


Journal of Statistical Software | 2014

Clustering via Nonparametric Density Estimation: The R Package pdfCluster

Adelchi Azzalini; Giovanna Menardi


R Journal | 2014

ROSE: a Package for Binary Imbalanced Learning

Nicola Lunardon; Giovanna Menardi; Nicola Torelli
