Clara Pizzuti | Researchain

Publication

Featured research published by Clara Pizzuti.

European Conference on Principles of Data Mining and Knowledge Discovery | 2002

Fast Outlier Detection in High Dimensional Spaces

Fabrizio Angiulli; Clara Pizzuti

In this paper we propose a new definition of distance-based outlier that considers, for each point, the sum of the distances from its k nearest neighbors, called its weight. Outliers are the points with the largest weights. To compute these weights, we find the k nearest neighbors of each point quickly by linearizing the search space through the Hilbert space-filling curve. The algorithm consists of two phases. The first provides an approximate solution, within a small factor, after at most d + 1 scans of the data set at a low time cost, where d is the number of dimensions of the data set. During each scan, the number of candidate points for the solution set is considerably reduced. The second phase returns the exact solution with a single scan that further examines a small fraction of the data set. Experimental results show that the algorithm always finds the exact solution during the first phase, after far fewer than d + 1 steps, and that it scales linearly in both the dimensionality and the size of the data set.
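The weight definition above can be illustrated with a brute-force sketch. This is not the Hilbert-curve algorithm from the paper: it compares every pair of points, and the function names `knn_weights` and `top_outliers` are hypothetical, chosen only for this example.

```python
import math

def knn_weights(points, k):
    """Weight of each point: sum of distances to its k nearest neighbors.

    Brute force over all pairs (assumption: small data set), not the
    space-filling-curve search described in the abstract.
    """
    weights = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        weights.append(sum(dists[:k]))
    return weights

def top_outliers(points, k, n):
    """Indices of the n points with the largest weight."""
    weights = knn_weights(points, k)
    return sorted(range(len(points)), key=lambda i: weights[i], reverse=True)[:n]

# Four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(top_outliers(points, k=2, n=1))  # the far-away point, index 4
```

The quadratic cost of this sketch is what a faster nearest-neighbor search is meant to avoid.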

IEEE Transactions on Knowledge and Data Engineering | 2005

Outlier mining in large high-dimensional data sets

Fabrizio Angiulli; Clara Pizzuti

A new definition of distance-based outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large, high-dimensional data set are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k nearest neighbors. Outliers are the points with the largest weights. HilOut uses a space-filling curve to linearize the data set, and it consists of two phases. The first phase provides an approximate solution, within a rough factor, after at most d + 1 sorts and scans of the data set, with time cost quadratic in d and linear in N and in k, where d is the number of dimensions and N is the number of points in the data set. During this phase, the algorithm isolates candidate outliers and reduces this set at each iteration. If the size of this set becomes n, the algorithm stops, reporting the exact solution. The second phase calculates the exact solution with a final scan that further examines the candidate outliers remaining after the first phase. Experimental results show that the algorithm always stops, reporting the exact solution, during the first phase after far fewer than d + 1 steps. We present both an in-memory and a disk-based implementation of HilOut, and a thorough scaling analysis on real and synthetic data sets showing that the algorithm scales well in both cases.

Parallel Problem Solving from Nature | 2008

GA-Net: A Genetic Algorithm for Community Detection in Social Networks

Clara Pizzuti

The problem of community structure detection in complex networks has been intensively investigated in recent years. In this paper we propose a genetic-based approach to discover communities in social networks. The algorithm optimizes a simple but effective fitness function able to identify densely connected groups of nodes with sparse connections between groups. The method is efficient because the variation operators are modified to take into consideration only the actual correlations among the nodes, thus considerably reducing the search space of possible solutions. Experiments on synthetic and real-life networks show the capability of the method to successfully detect the network structure.

IEEE Transactions on Knowledge and Data Engineering | 2006

Distance-based detection and prediction of outliers

Fabrizio Angiulli; Stefano Basta; Clara Pizzuti

A distance-based outlier detection method is proposed that finds the top outliers in an unlabeled data set and provides a subset of it, called the outlier detection solving set, that can be used to predict the outlierness of new unseen objects. The solving set includes enough points to permit the detection of the top outliers by considering only a subset of all the pairwise distances from the data set. The properties of the solving set are investigated, and algorithms for computing it, with subquadratic time requirements, are proposed. Experiments on synthetic and real data sets to evaluate the effectiveness of the approach are presented. A scaling analysis of the solving set size is performed, and the false positive rate, that is, the fraction of new objects misclassified as outliers using the solving set instead of the overall data set, is shown to be negligible. Finally, to investigate the accuracy in separating outliers from inliers, an ROC analysis of the method is performed. The results show that using the solving set instead of the data set guarantees a comparable quality of prediction, but at a lower computational cost.

IEEE Transactions on Evolutionary Computation | 2012

A Multiobjective Genetic Algorithm to Find Communities in Complex Networks

Clara Pizzuti

A multiobjective genetic algorithm to uncover community structure in complex networks is proposed. The algorithm optimizes two objective functions able to identify densely connected groups of nodes having sparse inter-connections. The method generates a set of network divisions at different hierarchical levels in which solutions at deeper levels, consisting of a higher number of modules, are contained in solutions having a lower number of communities. The number of modules is automatically determined by the better tradeoff values of the objective functions. Experiments on synthetic and real-life networks show that the algorithm successfully detects the network structure and that it is competitive with state-of-the-art approaches.
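The idea of keeping only the best tradeoffs between two objectives can be sketched as a non-dominated filter. This is a minimal sketch under stated assumptions: both objectives are minimized, each candidate is just a pair of scores, and the names `dominates` and `pareto_front` are hypothetical. The abstract does not say which objective functions or selection scheme the algorithm uses.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one (assumption: lower scores are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]

# Made-up (objective_1, objective_2) scores for five candidate partitions.
candidates = [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.9, 0.9)]
front = pareto_front(candidates)
print(front)  # [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1)]
```

Each point on the front corresponds to one tradeoff; a multiobjective method returns this set rather than a single optimum.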

International Conference on Tools with Artificial Intelligence | 2007

An Adaptive Distributed Ensemble Approach to Mine Concept-Drifting Data Streams

Gianluigi Folino; Clara Pizzuti; Giandomenico Spezzano

An adaptive boosting ensemble algorithm for classifying homogeneous distributed data streams is presented. The method builds an ensemble of classifiers by using Genetic Programming (GP) to inductively generate decision trees, each trained on different parts of the distributed training set. The approach adopts a co-evolutionary platform to support a cooperative model of GP. A change detection strategy, based on self-similarity of the ensemble behavior and measured by its fractal dimension, makes it possible to capture time-evolving trends and patterns in the stream and to reveal changes in evolving data streams. The approach tracks online the deviation of ensemble accuracy over time and recomputes the ensemble if the deviation exceeds a pre-specified threshold. This allows the maintenance of an accurate and up-to-date ensemble of classifiers for continuous flows of data with concept drift. Experimental results on a real-life data set show the validity of the approach.

IEEE Transactions on Knowledge and Data Engineering | 2014

An Evolutionary Multiobjective Approach for Community Discovery in Dynamic Networks

Francesco Folino; Clara Pizzuti

The discovery of evolving communities in dynamic networks is an important research topic that poses challenging tasks. Evolutionary clustering is a recent framework for clustering dynamic networks that introduces the concept of temporal smoothness inside the community structure detection method. Evolutionary-based clustering approaches try to maximize cluster accuracy with respect to incoming data of the current time step, and to minimize clustering drift from one time step to the next. To optimize both of these competing objectives, an input parameter is needed that controls the user's degree of preference for either the snapshot quality or the temporal quality. In this paper the detection of communities with temporal smoothness is formulated as a multiobjective problem and a method based on genetic algorithms is proposed. The main advantage of the algorithm is that it automatically provides a solution representing the best trade-off between the accuracy of the clustering obtained and the deviation from one time step to the next. Experiments on synthetic data sets show the very good performance of the method when compared with state-of-the-art approaches.

International Conference on Tools with Artificial Intelligence | 2009

A Multi-objective Genetic Algorithm for Community Detection in Networks

Clara Pizzuti

A multiobjective genetic algorithm to uncover community structure in complex networks is proposed. The algorithm optimizes two objective functions able to identify densely connected groups of nodes having sparse interconnections. The method generates a set of network divisions at different hierarchical levels in which solutions at deeper levels, consisting of a higher number of modules, are contained in solutions having a lower number of communities. The number of modules is automatically determined by the better tradeoff values of the objective functions. Experiments on synthetic and real-life networks show the capability of the method to successfully detect the network structure.

Genetic and Evolutionary Computation Conference | 2008

Community detection in social networks with genetic algorithms

Clara Pizzuti

A new genetic algorithm to detect communities in social networks is presented. The algorithm uses a fitness function able to identify groups of nodes in the network having dense intra-connections and sparse inter-connections. The variation operators employed are suitably adapted to take into account the actual links among the nodes. These modified operators make the method efficient because the space of possible solutions is considerably reduced. Experiments on a real-life network show the capability of the method to successfully identify the network structure.
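One way to restrict variation operators to actual links is sketched below. It assumes a locus-based adjacency encoding, which the abstract does not spell out: gene i holds a neighbor j of node i, and communities are the connected components of those i-j pairs. The helper names `decode` and `mutate` and the toy graph are hypothetical.

```python
import random

def decode(genes):
    """Group nodes into communities: node i and node genes[i] share a community."""
    parent = list(range(len(genes)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(genes):
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(genes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def mutate(genes, adjacency, rate, rng):
    """Mutation that can only replace a gene with one of that node's real neighbors."""
    return [rng.choice(adjacency[i]) if rng.random() < rate else g
            for i, g in enumerate(genes)]

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
genes = [1, 2, 0, 4, 5, 3]
print(decode(genes))  # [[0, 1, 2], [3, 4, 5]]
mutated = mutate(genes, adjacency, rate=0.5, rng=random.Random(0))
```

Because every gene value is an existing edge, no individual can place two unconnected nodes directly together, which is one way the space of candidate solutions shrinks.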

European Conference on Genetic Programming | 2000

Genetic Programming and Simulated Annealing: A Hybrid Method to Evolve Decision Trees

Gianluigi Folino; Clara Pizzuti; Giandomenico Spezzano

A method for the data mining task of data classification, suitable for implementation on massively parallel architectures, is proposed. The method combines genetic programming and simulated annealing to evolve a population of decision trees. A cellular automaton is used to realise a fine-grained parallel implementation of genetic programming through the diffusion model, and the annealing schedule is used to decide the acceptance of a new solution. Preliminary experimental results, obtained by simulating the behaviour of the cellular automaton on a sequential machine, show significantly better performance than C4.5.
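The role of the annealing schedule in accepting a new solution can be sketched with the standard Metropolis rule. This is an assumption: the abstract does not state which acceptance rule or cooling schedule the method uses, and the function name `accept` and the cost values are hypothetical.

```python
import math
import random

def accept(current_cost, candidate_cost, temperature, rng):
    """Metropolis acceptance (assumed rule): always take an improvement,
    take a worse candidate with probability exp(-delta / temperature).
    Requires temperature > 0."""
    if candidate_cost <= current_cost:
        return True
    delta = candidate_cost - current_cost
    return rng.random() < math.exp(-delta / temperature)

rng = random.Random(42)
print(accept(1.0, 0.5, temperature=1.0, rng=rng))   # improvement: True
print(accept(1.0, 2.0, temperature=1e-9, rng=rng))  # near-zero temperature: False
```

At high temperature worse candidates are accepted often, which keeps the search exploring; as the temperature falls the rule approaches plain hill climbing.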