Armando Segatori | Researchain

Publication

Featured research published by Armando Segatori.

Information Sciences | 2016

A MapReduce solution for associative classification of big data

Alessio Bechini; Armando Segatori

Associative classifiers have proven to be very effective in classification problems. Unfortunately, the algorithms used for learning these classifiers are not able to adequately manage big data because of time complexity and memory constraints. To overcome such drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once CARs have been mined, the proposed scheme performs a distributed rule pruning. The set of surviving CARs is used to classify unlabeled patterns. The memory usage and time complexity for each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution for associative classifiers turns out to be suitable for practically addressing big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time.
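As a rough illustration of the MapReduce pattern this abstract refers to, the sketch below counts how often each (item, class) pair occurs across data partitions and keeps the pairs that reach a minimum support. It is a single-process simulation, not the paper's distributed FP-Growth; the function names, the record layout (a list of items plus a class label), and the `min_support` threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit ((item, class_label), 1) for every distinct item in one record."""
    items, label = record
    for item in set(items):
        yield (item, label), 1


def reduce_phase(pairs):
    """Sum the emitted counts per key, as a reducer would after the shuffle."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def frequent_item_class_pairs(partitions, min_support):
    """Simulate map + reduce over all partitions and filter by support (hypothetical helper)."""
    emitted = chain.from_iterable(
        map_phase(record) for part in partitions for record in part
    )
    counts = reduce_phase(emitted)
    return {key: c for key, c in counts.items() if c >= min_support}


# Two partitions standing in for data blocks held by different cluster nodes.
partitions = [
    [(["x", "y"], "pos"), (["x"], "pos")],
    [(["x", "z"], "neg"), (["y"], "pos")],
]
frequent = frequent_item_class_pairs(partitions, min_support=2)
```

In a real Hadoop job the map and reduce functions would run on different nodes and the framework would handle grouping by key; here that grouping is folded into `reduce_phase`.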

IEEE Transactions on Fuzzy Systems | 2018

On Distributed Fuzzy Decision Trees for Big Data

Armando Segatori; Witold Pedrycz

Fuzzy decision trees (FDTs) have been shown to be an effective solution in the framework of fuzzy classification. The approaches proposed so far to FDT learning, however, have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme shaped according to the MapReduce programming model for generating both binary and multiway FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are then used as an input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets for evaluating the behavior of the scheme along three dimensions: 1) performance in terms of classification accuracy, model complexity, and execution time; 2) scalability when varying the number of computing units; and 3) ability to efficiently accommodate an increasing dataset size. We have demonstrated that the proposed scheme turns out to be suitable for managing big datasets even with modest commodity hardware support. Finally, we have used the distributed decision tree learning algorithm implemented in the MLlib library and the Chi-FRBCS-BigData algorithm, a MapReduce distributed fuzzy rule-based classification system, for comparative analysis.
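The two ingredients named in this abstract, a strong fuzzy partition and fuzzy information gain, can be sketched in a few lines. The code below is a minimal single-machine illustration, assuming triangular membership functions with given core points and a membership-weighted class entropy; the paper's actual discretizer (which chooses the cut points) and its distributed implementation are not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def triangular_memberships(x: float, cores: Sequence[float]) -> List[float]:
    """Membership degrees of x in a strong triangular partition with sorted cores.

    "Strong" means the degrees always sum to 1 for any x.
    """
    mu = [0.0] * len(cores)
    if x <= cores[0]:
        mu[0] = 1.0
        return mu
    if x >= cores[-1]:
        mu[-1] = 1.0
        return mu
    for i in range(len(cores) - 1):
        left, right = cores[i], cores[i + 1]
        if left <= x <= right:
            w = (x - left) / (right - left)
            mu[i], mu[i + 1] = 1.0 - w, w
            break
    return mu


def fuzzy_entropy(weights: Sequence[float], labels: Sequence[str]) -> float:
    """Class entropy where each instance counts with its membership weight."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = {}
    for w, y in zip(weights, labels):
        mass[y] = mass.get(y, 0.0) + w
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def fuzzy_information_gain(values, labels, cores) -> float:
    """Parent entropy minus the cardinality-weighted entropy of the fuzzy sets."""
    memberships = [triangular_memberships(v, cores) for v in values]
    parent = fuzzy_entropy([1.0] * len(labels), labels)
    n = float(len(labels))
    children = 0.0
    for j in range(len(cores)):
        w_j = [m[j] for m in memberships]
        children += (sum(w_j) / n) * fuzzy_entropy(w_j, labels)
    return parent - children


degrees = triangular_memberships(2.5, [0.0, 5.0, 10.0])
gain = fuzzy_information_gain([0.0, 0.0, 10.0, 10.0], ["a", "a", "b", "b"], [0.0, 10.0])
```

Because the partition is strong, the fuzzy cardinalities of the sets add up to the number of instances, so the weights used for the child entropies sum to 1.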

Information Sciences | 2016

On the influence of feature selection in fuzzy rule-based regression model generation

Michela Antonelli; Pietro Ducange; Armando Segatori

Fuzzy rule-based models have been extensively used in regression problems. Besides high accuracy, one of the most appreciated characteristics of these models is their interpretability, which is generally measured in terms of complexity. Complexity is affected by the number of features used for generating the model: the lower the number of features, the lower the complexity. Feature selection can therefore contribute considerably not only to speeding up the learning process, but also to improving the interpretability of the final model. Nevertheless, very few methods for selecting features before learning regression models have been proposed in the literature. In this paper, we focus on these methods, which perform feature selection as a pre-processing step. In particular, we have adapted two state-of-the-art feature selection algorithms, namely NMIFS and CFS, originally proposed for classification, to deal with regression. Further, we have proposed FMIFS, a novel forward sequential feature selection approach, based on the minimal-redundancy-maximal-relevance criterion, which can directly manage fuzzy partitions. The relevance and the redundancy of a feature are measured in terms of, respectively, the fuzzy mutual information between the feature and the output variable, and the average fuzzy mutual information between the feature and the already selected features. The stopping criterion for the sequential selection is based on the average values of relevance and redundancy of the already selected features. We have performed two experiments on twenty regression datasets. In the first experiment, we aimed to show the effectiveness of feature selection in fuzzy rule-based regression model generation by comparing the mean square errors achieved by the fuzzy rule-based models generated using all the features, and the features selected by FMIFS, NMIFS and CFS. In order to avoid possible biases related to the specific algorithm, we adopted the well-known Wang and Mendel algorithm for generating the fuzzy rule-based models. We show that the mean square errors obtained by models generated by using the features selected by FMIFS are on average similar to the values achieved by using all the features and lower than the ones obtained by employing the subset of features selected by NMIFS and CFS. In the second experiment, we intended to evaluate how feature selection can reduce the convergence time of the evolutionary fuzzy systems, which are probably the most effective fuzzy techniques for tackling regression problems. By using a state-of-the-art multi-objective evolutionary fuzzy system based on rule learning and membership function tuning, we show that the number of evaluations can be considerably reduced when pre-processing the dataset by feature selection.
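The forward sequential selection described here follows the general minimal-redundancy-maximal-relevance pattern, which can be sketched as a greedy loop. The code below is only an illustration of that pattern: relevance and pairwise redundancy are supplied as precomputed numbers rather than fuzzy mutual information, and the stopping rule (stop when the best remaining score is not positive) is a simplifying assumption, not the criterion used by FMIFS. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple


def mrmr_forward_selection(
    relevance: Dict[str, float],
    redundancy: Dict[Tuple[str, str], float],
) -> List[str]:
    """Greedy forward selection: relevance minus average redundancy with chosen features."""

    def pair(a: str, b: str) -> float:
        # Redundancy is symmetric; look the pair up in either order.
        return redundancy.get((a, b), redundancy.get((b, a), 0.0))

    def score(feature: str, selected: List[str]) -> float:
        if not selected:
            return relevance[feature]
        avg_red = sum(pair(feature, s) for s in selected) / len(selected)
        return relevance[feature] - avg_red

    selected: List[str] = []
    remaining = set(relevance)
    while remaining:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda f: score(f, selected))
        if score(best, selected) <= 0.0:  # assumed stopping rule
            break
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = {"a": 0.9, "b": 0.8, "c": 0.3}
redundancy = {("a", "b"): 0.85, ("a", "c"): 0.1, ("b", "c"): 0.9}
chosen = mrmr_forward_selection(relevance, redundancy)
```

In this toy input, feature `b` is highly relevant but nearly duplicates `a`, so the less relevant but complementary `c` is picked second and `b` is never added.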

Expert Systems With Applications | 2015

A novel associative classification model based on a fuzzy frequent pattern mining algorithm

Michela Antonelli; Pietro Ducange; Armando Segatori

We propose a novel efficient fuzzy associative classification approach. We exploit a fuzzy version of the FP-Growth algorithm. We perform an experimental analysis on 17 classification datasets. We compare our approach with three well-known associative classifiers. Associative classification models are based on two different data mining paradigms, namely pattern classification and association rule mining. These models are very popular for building highly accurate classifiers and have been employed in a number of real-world applications. In recent years, several studies and different algorithms have been proposed to integrate associative classification models with fuzzy set theory, leading to the so-called fuzzy associative classifiers. In this paper, we propose a novel efficient fuzzy associative classification approach based on a fuzzy frequent pattern mining algorithm. Fuzzy items are generated by discretizing the input variables and defining strong fuzzy partitions on the intervals resulting from these discretizations. Then, fuzzy associative classification rules are mined by employing a fuzzy extension of the FP-Growth algorithm, one of the most efficient frequent pattern mining algorithms. Finally, a set of highly accurate classification rules is generated after a pruning stage. We tested our approach on seventeen real-world datasets and compared the achieved results with the ones obtained by using both a non-fuzzy associative classifier, namely CMAR, and two recent state-of-the-art classifiers, namely FARC-HD and D-MOFARC, based on fuzzy association rules. Using non-parametric statistical tests, we show that our approach outperforms CMAR and achieves accuracies similar to FARC-HD and D-MOFARC.

The Internet of Things | 2014

Low-Effort Support to Efficient Urban Parking in a Smart City Perspective

Alessio Bechini; Armando Segatori

The Internet of Things (IoT) is today considered one of the most important enabling technologies for developing a wide variety of smart services aimed at assisting the final user in the urban environment. In this chapter, we present how IoT can be employed to develop a system for the effective and efficient management of urban parking, thus providing a small, yet relevant contribution to the implementation of a real Smart City. Our system relies on the identification of each single parking slot but, unlike other approaches proposed in recent years, it does not require dedicated sensors and/or infrastructure, so it can be regarded as a low-cost and low-effort solution. Indeed, it collects parking data from a mobile application on the drivers’ mobile devices and possibly identifies each slot by QR codes deployed on the single parking spots. The amount of data collected by the system on parking occupancy allows inferring valuable information that can be used by local governments. For instance, it will be possible to define appropriate pricing schemes so as to promote underused parking areas. The employment of an SOA design guarantees the integration of the developed system with other existing services within a Smart City.

Information Sciences | 2017

A distributed approach to multi-objective evolutionary generation of fuzzy rule-based classifiers from big data

Andrea Ferranti; Armando Segatori; Michela Antonelli; Pietro Ducange

In recent years, multi-objective evolutionary algorithms (MOEAs) have been extensively used to generate sets of fuzzy rule-based classifiers (FRBCs) with different trade-offs between accuracy and interpretability. Since the computation of the accuracy for each chromosome evaluation requires a scan of the overall training set, these approaches have proved to be very expensive in terms of execution time and memory occupation. For this reason, they have not yet been applied to very large datasets. On the other hand, it is precisely for these datasets that interpretability of classifiers would be very desirable. The recent advent of a number of open source cluster computing frameworks has, however, opened new interesting perspectives. In this paper, we exploit one of these frameworks, namely Apache Spark, and propose the first distributed multi-objective evolutionary approach to learn concurrently the rule and data bases of FRBCs by maximizing accuracy and minimizing complexity. During the evolutionary process, the computation of the fitness is divided among the cluster nodes, thus allowing the designer to distribute both the computational complexity and the dataset storage. We have performed a number of experiments on ten real-world big datasets, evaluating our distributed approach in terms of both classification rate and scalability, and comparing it with two well-known state-of-the-art distributed classifiers. Finally, we have evaluated the achievable speedup on a small computer cluster. We show that the distributed version can efficiently extract compact rule bases with high accuracy, preserving the interpretability of the rule base, and can manage big datasets even with modest hardware support.
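The trade-off between accuracy and complexity that a multi-objective approach explores can be made concrete with a small Pareto-dominance filter. This is a generic sketch, not the evolutionary algorithm from the paper: each candidate classifier is reduced to an assumed `(accuracy, complexity)` pair, where accuracy is maximized and complexity (for instance, a rule count) is minimized. The names and sample values are hypothetical.

```python
from typing import List, Tuple

Solution = Tuple[float, int]  # (accuracy to maximize, complexity to minimize)


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


candidates = [(0.90, 30), (0.85, 10), (0.80, 20), (0.90, 40)]
front = pareto_front(candidates)
```

Here `(0.80, 20)` is dropped because `(0.85, 10)` is both more accurate and simpler, and `(0.90, 40)` is dropped because `(0.90, 30)` matches its accuracy with fewer rules.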

IEEE International Conference on Fuzzy Systems | 2015

A new approach to fuzzy random forest generation

Adriano Donato De Matteis; Armando Segatori

Random forests have proved to be very effective classifiers, which can achieve very high accuracies. Although a number of papers have discussed the use of fuzzy sets for coping with uncertain data in decision tree learning, fuzzy random forests have not been particularly investigated in the fuzzy community. In this paper, we first propose a simple method for generating fuzzy decision trees by creating fuzzy partitions for continuous variables during the learning phase. Then, we discuss how the method can be used for generating forests of fuzzy decision trees. Finally, we show how these fuzzy random forests achieve accuracies higher than two fuzzy rule-based classifiers recently proposed in the literature. Also, we highlight how fuzzy random forests are more tolerant to noise in datasets than classical crisp random forests.

IEEE Transactions on Systems, Man, and Cybernetics | 2018

A Distributed Fuzzy Associative Classifier for Big Data

Armando Segatori; Alessio Bechini; Pietro Ducange

Fuzzy associative classification has not been widely analyzed in the literature, although associative classifiers (ACs) have proved to be very effective in different real domain applications. The main reason is that learning fuzzy ACs is a very heavy task, especially when dealing with large datasets. To overcome this drawback, in this paper, we propose an efficient distributed fuzzy associative classification approach based on the MapReduce paradigm. The approach exploits a novel distributed discretizer based on fuzzy entropy for efficiently generating fuzzy partitions of the attributes. Then, a set of candidate fuzzy association rules is generated by employing a distributed fuzzy extension of the well-known FP-Growth algorithm. Finally, this set is pruned by using three purposely adapted types of pruning. We implemented our approach on the popular Hadoop framework. Hadoop allows distributing storage and processing of very large data sets on computer clusters built from commodity hardware. We have performed extensive experimentation and a detailed analysis of the results using six very large datasets with up to 11 000 000 instances. We have also experimented with different types of reasoning methods. Focusing on accuracy, model complexity, computation time, and scalability, we compare the results achieved by our approach with those obtained by two distributed nonfuzzy ACs recently proposed in the literature. We highlight that, although the accuracies are comparable, the complexity of the classifiers generated by the fuzzy distributed approach, evaluated in terms of number of rules, is lower than that of the nonfuzzy classifiers.

IEEE International Conference on Fuzzy Systems | 2016

A Multi-objective evolutionary fuzzy system for big data

Andrea Ferranti; Armando Segatori

One of the most appealing features of fuzzy rule-based classifiers is the capability of explaining how the conclusions are inferred. This feature is hard to preserve when fuzzy rules are extracted from a very large amount of data. In this paper, we propose a distributed version of PAES-RCS, a multi-objective evolutionary approach to learn concurrently the rule and data bases of fuzzy rule-based classifiers by maximizing accuracy and minimizing complexity. PAES-RCS has proven to be very efficient in obtaining satisfactory approximations of the Pareto front using a limited number of iterations. We implemented the distributed version of PAES-RCS by using Apache Spark as the data processing framework. We discuss the effectiveness of our approach in terms of classification rate and scalability by performing a number of experiments on three real-world big datasets. Further, we compare our approach with other well-known state-of-the-art algorithms in terms of both accuracy and complexity, and evaluate the achievable speedup on a small computer cluster. We show that the distributed version can efficiently extract compact rule bases with high accuracy and allows handling big datasets even with modest hardware support.

Systems, Man and Cybernetics | 2016

Spreading fuzzy random forests with MapReduce

Alessio Bechini; Adriano Donato De Matteis; Armando Segatori

Random forests are currently considered among the most accurate and efficient classifiers. Moreover, fuzzy implementations of random forests have recently been proposed to exploit the ability of fuzzy decision trees to cope with uncertain data. Whenever the size of training sets grows substantially, as happens in the case of Big Data, ordinary implementations of classifiers become inadequate, and fuzzy random forests are no exception. In this paper, we consider a method that generates fuzzy partitions of the continuous attributes during decision tree learning, and we propose a distributed implementation of fuzzy random forests based on this method. The implementation relies on the MapReduce programming model and the Apache Hadoop framework. It is shown that such a model can easily accommodate an effective distribution strategy for the computation, yielding good scalability figures. The novel distributed algorithm makes fuzzy random forests able to deal with extremely large data sets, both in the learning and in the classification phases, thus fostering their applicability in the modern scenario of increasingly frequent data deluges.