Silvia Cateni

Sant'Anna School of Advanced Studies

Publication

Featured research published by Silvia Cateni.

Neurocomputing | 2014

A method for resampling imbalanced datasets in binary classification tasks for real-world problems

Silvia Cateni; Valentina Colla; Marco Vannucci

The paper presents a novel resampling method for binary classification problems on imbalanced datasets. Imbalanced datasets are frequently found in many industrial applications: for instance, the occurrence of particular product defects, the diagnosis of severe diseases in a series of patients or machine faults are rare events whose detection is of utmost importance. The proposed resampling method combines an oversampling and an undersampling technique. Several tests have been carried out to assess the effectiveness of the proposed method. Four classifiers, based respectively on Support Vector Machines, Decision Trees, labelled Self-Organizing Maps and Bayesian classifiers, have been developed and applied to binary classification on four datasets: a synthetic dataset, a widely used public dataset and two datasets coming from industrial applications. The results obtained in the tests are presented and discussed in the paper; in particular, the performance achieved by the four classifiers with the proposed resampling approach is compared with that obtained without any resampling, with the classical SMOTE approach (a widely applied and well-known resampling technique), and with another approach coupling informed SMOTE-based oversampling and informed clustering-based undersampling.
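The abstract does not spell out how the oversampling and undersampling steps are combined, so the sketch below only illustrates the two generic ingredients it names and is not the paper's method: SMOTE-style interpolation creates synthetic minority samples, and random undersampling thins out the majority class. The function names and the `target_size`, `k` and `seed` parameters are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples).
    """
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the sample itself
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def combined_resample(majority, minority, target_size, k=3, seed=0):
    """Hypothetical combination: randomly undersample the majority class to at
    most target_size samples and SMOTE-oversample the minority class to at
    least target_size samples."""
    rng = random.Random(seed)
    reduced_majority = rng.sample(majority, min(target_size, len(majority)))
    n_new = max(0, target_size - len(minority))
    augmented_minority = list(minority) + smote_like(minority, n_new, k, rng)
    return reduced_majority, augmented_minority
```

Random undersampling is the simplest possible choice here; an informed variant, such as the clustering-based undersampling mentioned in the abstract, would replace `rng.sample` with a selection rule.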

international conference on robotics and automation | 2008

Outlier Detection Methods for Industrial Applications

Silvia Cateni; Valentina Colla; Marco Vannucci

An outlier is an observation (or measurement) that differs from the other values contained in a given dataset. Outliers can be due to several causes. A measurement can be incorrectly observed, recorded or entered into the process computer; alternatively, the observed datum can come from a population different from the one describing the normal situation, in which case it is correctly measured but represents a rare event. Different definitions of outlier exist in the literature; the most commonly cited are the following:
- “An outlier is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” (Hawkins, 1980).
- “An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of the dataset” (Barnett & Lewis, 1994).
- “An outlier is an observation that lies outside the overall pattern of a distribution” (Moore and McCabe, 1999).
- “Outliers are those data records that do not follow any pattern in an application” (Chen et al., 2002).
- “An outlier in a set of data is an observation or a point that is considerably dissimilar or inconsistent with the remainder of the data” (Ramaswamy et al., 2000).

Many data mining algorithms try either to minimize the influence of outliers, for instance on the final model to be developed, or to eliminate them in the data pre-processing phase. However, a data miner should be careful when automatically detecting and eliminating outliers because, if the data are correct, their elimination can cause the loss of important hidden information (Kantardzic, 2003). In some data mining applications outlier detection is the main goal, and the detected outliers are the essential result of the data analysis (Sane & Ghatol, 2006). Outlier detection techniques find applications in credit card fraud detection, network robustness analysis, network intrusion detection, financial applications and marketing (Han & Kamber, 2001).

A more exhaustive list of applications that exploit outlier detection is provided below (Hodge, 2004):
- Fraud detection: fraudulent applications for credit cards or state benefits, or fraudulent usage of credit cards or mobile phones.
- Loan application processing: fraudulent applications or potentially problematic customers.
- Intrusion detection, such as unauthorized access in computer networks.
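As a concrete instance of the definition “an observation that lies outside the overall pattern of a distribution”, the sketch below flags univariate outliers with the modified z-score of Iglewicz and Hoaglin, which measures the distance from the median in units of the median absolute deviation (MAD). This test is an illustrative choice and not necessarily one of the methods discussed in the chapter; the 3.5 threshold is the cut-off commonly recommended for this score, and `modified_z_outliers` is a hypothetical name.

```python
import statistics


def modified_z_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Modified z-score (Iglewicz & Hoaglin): 0.6745 * (x - median) / MAD, where
    MAD is the median absolute deviation from the median.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # most values coincide with the median: the score is undefined
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 25.0, 10.0]
print(modified_z_outliers(readings))  # [6]
```

The robust scale estimate is the point of this design: with the plain mean and standard deviation, a single extreme value inflates the spread and can partly mask itself.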

intelligent systems design and applications | 2009

General Purpose Input Variables Extraction: A Genetic Algorithm Based Procedure GIVE A GAP

Silvia Cateni; Valentina Colla; Marco Vannucci

The paper presents an application of genetic algorithms to the problem of input variable selection for the design of neural systems. The basic idea of the proposed method is to use genetic algorithms to select the set of variables to be fed to the neural networks. However, the main concept behind this approach is far more general and does not depend on the particular model adopted: it can be applied to a wide range of systems, including non-neural ones, and with a variety of performance indicators. The proposed method has been tested on a simple case study in order to demonstrate its effectiveness. The results obtained in the processing of experimental data are presented and discussed.
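The abstract does not specify the genetic operators, so the following is a minimal sketch under assumed choices (binary masks, binary tournament selection, uniform crossover, bit-flip mutation, elitism). Each chromosome marks which candidate inputs are fed to the model and, in line with the model-independent idea described above, the fitness function is supplied by the caller. `ga_select`, its parameters and the toy fitness are hypothetical.

```python
import random


def ga_select(n_vars, fitness, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Genetic search over binary masks of length n_vars (1 = variable used).

    `fitness(mask)` is supplied by the caller; higher is better.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry over the best mask found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


RELEVANT = {0, 3, 5}  # hypothetical "truly useful" inputs for the toy fitness


def toy_fitness(mask):
    hits = sum(mask[i] for i in RELEVANT)
    extras = sum(mask) - hits
    return hits - 0.1 * extras  # reward useful inputs, mildly penalise the rest


best_mask = ga_select(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

In a real application `fitness` would train and validate a model on the selected columns, which is expensive, so caching fitness values per mask is usually worthwhile.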

Journal of Intelligent and Fuzzy Systems | 2013

A multivariate fuzzy system applied for outliers detection

Silvia Cateni; Valentina Colla; Gianluca Nastasi

The paper presents an application of fuzzy logic to the problem of outlier detection. The overall purpose of the work is to point out anomalous data due to different causes by combining several traditional methods for outlier detection in multivariate datasets; this combination is achieved through a fuzzy inference system. Moreover, the proposed solution aims to be automatic and self-adaptive, as some parameters required for combining the different approaches are automatically evaluated from the available data, without the need for a priori assumptions or information on a subset of the available data. The proposed method therefore belongs to the class of unsupervised outlier detection methods. To demonstrate the effectiveness of the developed method, extensive tests have been performed on both a simple case study and a database coming from a real industrial context, where the data have to be filtered before being exploited for process control purposes. The achieved numerical results are presented and discussed.
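A zero-order Sugeno fuzzy inference system is one simple way to merge the scores of several outlier detectors; the sketch below illustrates that idea under stated assumptions and is not the paper's system. It assumes each detector's score has already been normalised to [0, 1], uses linear "low"/"high" membership functions, and fires one rule per low/high combination whose consequent is the share of detectors saying "high". The data-driven, self-adaptive parameter estimation described in the abstract is not modelled, and `fuzzy_outlier_degree` is a hypothetical name.

```python
from itertools import product

# Linear membership functions on a normalised detector score s in [0, 1]
MEMBERSHIP = {"low": lambda s: 1.0 - s, "high": lambda s: s}


def fuzzy_outlier_degree(scores):
    """Combine n detector scores in [0, 1] with zero-order Sugeno inference.

    One rule per low/high combination, e.g. for two detectors:
    IF s1 is high AND s2 is low THEN degree = 0.5.
    AND is the minimum; the output is the strength-weighted mean of consequents.
    """
    num = den = 0.0
    for labels in product(("low", "high"), repeat=len(scores)):
        strength = min(MEMBERSHIP[lab](s) for lab, s in zip(labels, scores))
        consequent = labels.count("high") / len(scores)
        num += strength * consequent
        den += strength
    return num / den if den > 0 else 0.0


# Hypothetical normalised scores from, e.g., a distance-based and a density-based detector
print(fuzzy_outlier_degree([0.9, 0.8]))  # ≈ 0.79
print(fuzzy_outlier_degree([0.1, 0.2]))  # ≈ 0.21
```

The number of rules grows as 2^n with n detectors, which is harmless for the handful of traditional methods such a combination typically merges.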

european modelling symposium | 2014

A Hybrid Feature Selection Method for Classification Purposes

Silvia Cateni; Valentina Colla; Marco Vannucci

This paper presents a novel combination of filter feature selection algorithms for classification problems. Feature selection is one of the most important issues in pattern recognition, machine learning and computer vision. Its main objectives are reducing dimensionality, improving the performance of machine learning algorithms and increasing the comprehensibility of the process. Exhaustive search is the only method guaranteed to find the optimal subset, but its computational complexity is exponential, since n candidate variables yield 2^n - 1 non-empty subsets. In this paper the set of available variables is first reduced using a combination of filter selection methods, and exhaustive search is then performed on the reduced set in order to obtain a sub-optimal set of variables in a reasonable time. The proposed approach is tested on several commonly used datasets from the UCI repository and on two datasets coming from industrial contexts.
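A minimal sketch of the two-stage idea, under assumptions the abstract leaves open: several filter methods each score every feature (higher meaning more relevant), their rankings are combined by averaging ranks, the `keep` best features form a shortlist, and every non-empty subset of the shortlist is evaluated exhaustively with a caller-supplied `evaluate` function, for instance the cross-validated accuracy of a classifier. Rank averaging, the function names and the toy scores are illustrative choices, not the paper's combination rule.

```python
from itertools import combinations


def average_rank(filter_scores):
    """filter_scores: one list of per-feature scores per filter method
    (higher = more relevant). Returns feature indices sorted by mean rank."""
    n = len(filter_scores[0])
    mean_rank = [0.0] * n
    for scores in filter_scores:
        order = sorted(range(n), key=lambda j: scores[j], reverse=True)
        for rank, j in enumerate(order):
            mean_rank[j] += rank / len(filter_scores)
    return sorted(range(n), key=lambda j: mean_rank[j])


def hybrid_select(filter_scores, evaluate, keep=4):
    """Shortlist the `keep` best-ranked features, then exhaustively search them."""
    shortlist = average_rank(filter_scores)[:keep]
    best_subset, best_value = (), float("-inf")
    for size in range(1, len(shortlist) + 1):
        for subset in combinations(shortlist, size):
            value = evaluate(subset)
            if value > best_value:
                best_subset, best_value = subset, value
    return best_subset, best_value


# Toy example: two hypothetical filter methods scoring six features
filters = [
    [0.9, 0.1, 0.8, 0.2, 0.7, 0.05],
    [0.8, 0.2, 0.9, 0.1, 0.6, 0.0],
]


def toy_evaluate(subset):
    # Stand-in for a classifier's validation score: features 0 and 4 help,
    # and every selected feature costs a little
    return len(set(subset) & {0, 4}) - 0.1 * len(subset)


print(hybrid_select(filters, toy_evaluate, keep=3))  # best subset: (0, 4)
```

With `keep = 10`, the exhaustive stage evaluates 2^10 - 1 = 1023 subsets regardless of how many features the original dataset has.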

Journal of Intelligent Manufacturing | 2018

Implementation and comparison of algorithms for multi-objective optimization based on genetic algorithms applied to the management of an automated warehouse

Gianluca Nastasi; Valentina Colla; Silvia Cateni; Simone Campigli

The paper presents the optimization of management strategies for an existing automated warehouse located in a steelmaking plant. Genetic algorithms are applied to this purpose, and three popular algorithms capable of dealing with multi-objective optimization are compared. The three algorithms, namely the Niched Pareto Genetic Algorithm, the Non-dominated Sorting Genetic Algorithm 2 and the Strength Pareto Evolutionary Algorithm 2, are described in detail and the achieved results are discussed at length; moreover, several statistical tests have been applied in order to evaluate the statistical significance of the obtained results.
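All three algorithms compared in the paper rely on Pareto dominance to rank candidate solutions. The sketch below shows that common building block for minimised objectives: a dominance test and a straightforward peel-off into successive non-dominated fronts, the ranking used by NSGA-II. It is the simple version that rescans the remaining points for each front, not NSGA-II's book-keeping "fast non-dominated sort", and it omits crowding distance; the objective vectors are toy values, not warehouse data.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(points):
    """Peel objective vectors into successive non-dominated fronts (lists of indices)."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A point joins the current front if no other remaining point dominates it
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        front_set = set(front)
        remaining = [i for i in remaining if i not in front_set]
    return fronts


# Toy two-objective values (e.g. two costs to minimise), not data from the paper
pts = [(1, 5), (2, 3), (3, 1), (2, 4), (4, 4), (5, 5)]
print(non_dominated_fronts(pts))  # [[0, 1, 2], [3], [4], [5]]
```

The loop always terminates because every finite, non-empty set of points contains at least one point that no other point dominates.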

Archive | 2013

Variable Selection and Feature Extraction Through Artificial Intelligence Techniques

Silvia Cateni; Marco Vannucci; Marco Vannocci; Valentina Colla


intelligent systems design and applications | 2011

Novel resampling method for the classification of imbalanced datasets for industrial and other real-world problems

Silvia Cateni; Valentina Colla; Marco Vannucci

The paper deals with a novel resampling method to cope with imbalanced datasets in binary classification problems. Imbalanced datasets are frequently found in many industrial applications: for instance, the occurrence of particular product defects or machine faults are rare events whose detection is of utmost importance. In this paper a new resampling method combining an oversampling and an undersampling technique is described. Several tests have been carried out to prove the effectiveness of the proposed approach. Two classifiers, based on Support Vector Machines and Decision Trees, have been designed and applied to binary classification on four datasets: a synthetic dataset, a widely used public dataset and two industrial datasets. The obtained results are presented and discussed in the paper; in particular, the performance achieved by the two classifiers with our resampling approach is compared with that obtained without any resampling and with the classical SMOTE approach.

international conference on artificial intelligence and applications | 2014

A Procedure for Building Reduced Reliable Training Datasets from Real-World Data

Silvia Cateni; Valentina Colla; Marco Vannucci; Marco Vannocci

Dimensionality reduction and anomalous data detection are important tasks in machine learning and data mining applications. Many real-world datasets are affected by errors and variable redundancy, which can cause problems when the data are used to develop accurate models through training procedures that tune the model parameters. In this paper an automatic procedure is proposed that combines the detection of unreliable data with dimensionality reduction, to be applied before the data are exploited to develop a model for prediction purposes. The method has been tested on several datasets from the UCI repository and from industrial fields. The test results are shown and discussed in the paper. The proposed approach provides good prediction accuracy while yielding a minimal but essential dataset.

international work-conference on artificial and natural neural networks | 2015

An Hybrid Ensemble Method Based on Data Clustering and Weak Learners Reliabilities Estimated Through Neural Networks

Marco Vannucci; Valentina Colla; Silvia Cateni

In this paper a novel hybrid ensemble method aimed at improving model accuracy in regression tasks is presented. The proposed ensemble is composed of a strong learner, trained on the whole training dataset, and a set of specialised weak learners, each trained on data from a limited region of the input space determined by means of Self-Organising Map based clustering. In the simulation phase, the strong and weak learners operate alternately according to their self-estimated, sample-specific reliabilities, so that each sample is handled by the most promising learner. The method has been tested on both literature and real-world datasets, achieving satisfactory results.
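The routing step described above can be sketched as follows, with clear caveats: the paper uses Self-Organising Map clustering and neural-network reliability estimators, whereas this sketch simply assigns a sample to the nearest cluster centroid and takes the reliability estimators as caller-supplied functions. The function name, the tuple layout of the learners and the constant stand-in learners are hypothetical.

```python
import math


def hybrid_predict(x, strong, weak_learners):
    """Answer with whichever learner claims the higher reliability for sample x.

    strong: (predict, reliability) pair for the learner trained on all data.
    weak_learners: list of (centroid, predict, reliability) tuples, one per cluster.
    Each reliability(x) is assumed to return a comparable score, e.g. in [0, 1].
    """
    # Only the weak learner of the cluster nearest to x competes with the strong one
    _, weak_predict, weak_rel = min(weak_learners, key=lambda w: math.dist(w[0], x))
    strong_predict, strong_rel = strong
    if weak_rel(x) > strong_rel(x):
        return weak_predict(x)
    return strong_predict(x)  # ties go to the strong learner


# Hypothetical stand-ins: constant predictors and constant reliabilities
strong = (lambda x: 0.0, lambda x: 0.6)
weak = [
    ((0.0, 0.0), lambda x: 1.0, lambda x: 0.9),    # confident specialist
    ((10.0, 10.0), lambda x: 2.0, lambda x: 0.3),  # unreliable specialist
]
print(hybrid_predict((0.5, 0.5), strong, weak))  # 1.0
print(hybrid_predict((9.0, 9.0), strong, weak))  # 0.0
```

Letting the strong learner win ties keeps the ensemble's behaviour no worse than the global model wherever the specialists are not clearly more confident.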
