Alessandro Lulli
University of Pisa
Publications
Featured research published by Alessandro Lulli.
International Symposium on Computers and Communications | 2015
Alessandro Lulli; Laura Ricci; Emanuele Carlini; Patrizio Dazzi; Claudio Lucchese
The problem of finding connected components in a graph is common to several applications dealing with graph analytics, such as social network analysis, web graph mining and image processing. The exponentially growing size of graphs requires the definition of appropriate computational models and algorithms for their processing on high-throughput distributed architectures. In this paper we present cracker, an efficient iterative algorithm to detect connected components in large graphs. The strategy of cracker is to iteratively grow a spanning tree for each connected component of the graph. Nodes added to such trees are discarded from the computation in subsequent iterations. We provide an extensive experimental evaluation considering a wide variety of synthetic and real-world graphs. The experimental evaluation shows that cracker consistently outperforms state-of-the-art approaches both in terms of total computation time and volume of messages exchanged.
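To make the connected-components problem concrete, the following is a minimal single-machine sketch of a simple iterative baseline (minimum-label propagation). It is not cracker and does not grow spanning trees or discard nodes; the function name `connected_components` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Baseline min-label propagation, NOT the cracker algorithm: in each
    round every vertex adopts the smallest label seen among itself and
    its neighbours, until no label changes.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Every vertex starts in its own component.
    label = {v: v for v in neighbours}

    changed = True
    while changed:
        changed = False
        new_label = {}
        for v in neighbours:
            best = min([label[v]] + [label[u] for u in neighbours[v]])
            new_label[v] = best
            if best != label[v]:
                changed = True
        label = new_label
    return label


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

The number of rounds in this baseline grows with the graph diameter, and every vertex stays active until the end, which is the kind of cost that removing finished nodes from the computation is meant to reduce.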
European Conference on Parallel Processing | 2014
Emanuele Carlini; Patrizio Dazzi; Andrea Esposito; Alessandro Lulli; Laura Ricci
A significant part of the data produced every day by online services is structured as a graph. Therefore, there is a need for efficient processing and analysis solutions for large-scale graphs. Among other problems, balanced graph partitioning is a well-known NP-complete problem with a wide range of applications. Several solutions have been proposed so far; however, most of the existing state-of-the-art algorithms are not directly applicable in very large-scale distributed scenarios. A recently proposed, promising alternative exploits a vertex-centric heuristic to solve the balanced graph partitioning problem. That algorithm is massively parallel: there is no central coordination, and each node is processed independently. Unfortunately, we found that the algorithm is not directly exploitable in current BSP-like distributed programming frameworks. In this paper we present the adaptations we applied to the original algorithm while implementing it on Spark, a state-of-the-art distributed framework for data processing.
Very Large Data Bases | 2016
Alessandro Lulli; Matteo Dell'Amico; Pietro Michiardi; Laura Ricci
We present NG-DBSCAN, an approximate density-based clustering algorithm that operates on arbitrary data and any symmetric distance measure. The distributed design of our algorithm makes it scalable to very large datasets; its approximate nature makes it fast, yet capable of producing high-quality clustering results. We provide a detailed overview of the steps of NG-DBSCAN, together with their analysis. Our results, obtained through an extensive experimental campaign with real and synthetic data, substantiate our claims about NG-DBSCAN's performance and scalability.
Future Generation Computer Systems | 2016
Emanuele Carlini; Alessandro Lulli; Laura Ricci
Distributed query processing is of paramount importance in next-generation distribution services, such as Internet of Things (IoT) and cyber–physical systems. Although several supports for multi-attribute range queries have been proposed for peer-to-peer systems, these solutions must be rethought to fully meet the requirements of new computational paradigms for IoT, like fog computing. This paper proposes dragon, an efficient support for distributed multi-dimensional range query processing targeting efficient query resolution on highly dynamic data. In dragon, nodes at the edges of the network collect and publish multi-dimensional data. The nodes collectively manage an aggregation tree storing data digests, which are then exploited, when resolving queries, to prune the sub-trees containing few or no relevant matches. Multi-attribute queries are managed by linearizing the attribute space through space-filling curves. We extensively analysed different aggregation and query resolution strategies in a wide spectrum of experimental set-ups. We show that dragon efficiently manages fast-changing data values. Further, we show that dragon resolves queries by contacting fewer nodes than a similar state-of-the-art approach.
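The linearization step can be illustrated with a small sketch. The abstract does not say which space-filling curve dragon uses, so the Z-order (Morton) curve below is an assumption chosen for simplicity; the function names `interleave_bits` and `deinterleave_bits` and the 16-bit attribute width are likewise hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Map a 2-D point of non-negative integers to a 1-D Z-order key.

    Assumed curve: Z-order (Morton). Bits of x occupy the even positions
    of the key and bits of y the odd positions, so points that are close
    in both attributes tend to receive nearby keys.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def deinterleave_bits(key: int, bits: int = 16) -> tuple:
    """Invert interleave_bits, recovering (x, y) from a Z-order key."""
    x = y = 0
    for i in range(bits):
        x |= ((key >> (2 * i)) & 1) << i
        y |= ((key >> (2 * i + 1)) & 1) << i
    return x, y


if __name__ == "__main__":
    key = interleave_bits(3, 5)
    print(key, deinterleave_bits(key))
```

With such a mapping, a multi-attribute value becomes a single key, so a tree or overlay ordered on one dimension can index it; a rectangular range query then corresponds to one or more intervals of keys.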
International Conference on Big Data | 2015
Alessandro Lulli; Thibault Debatty; Matteo Dell'Amico; Pietro Michiardi; Laura Ricci
Clustering items using textual features is an important problem with many applications, such as root-cause analysis of spam campaigns, as well as identifying common topics in social media. Due to the sheer size of such data, algorithmic scalability becomes a major concern. In this work, we present our approach for text clustering that builds an approximate k-NN graph, which is then used to compute connected components representing clusters. Our focus is to understand the scalability / accuracy tradeoff that underlies our method: we do so through an extensive experimental campaign, where we use real-life datasets, and show that even rough approximations of k-NN graphs are sufficient to identify valid clusters. Our method is scalable and can be easily tuned to meet requirements stemming from different application domains.
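The pipeline described above (k-NN graph, then connected components as clusters) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it builds an exact k-NN graph by brute force rather than an approximate one, uses Jaccard similarity over whitespace-separated tokens, and adds a hypothetical `min_sim` threshold to drop weak edges. The names `jaccard`, `knn_graph` and `clusters` are invented for this example.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_graph(docs, k, min_sim):
    """Brute-force k-NN graph over documents (exact, not approximate).

    Each document is linked to its k most similar documents, keeping
    only edges whose similarity is at least min_sim (an assumption of
    this sketch).
    """
    tokens = [set(d.lower().split()) for d in docs]
    edges = []
    for i, ti in enumerate(tokens):
        scored = sorted(
            ((jaccard(ti, tj), j) for j, tj in enumerate(tokens) if j != i),
            reverse=True,
        )
        edges.extend((i, j) for sim, j in scored[:k] if sim >= min_sim)
    return edges


def clusters(n, edges):
    """Connected components of the graph, computed with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        "cheap pills online",
        "cheap pills online now",
        "meeting agenda monday",
        "monday meeting agenda notes",
    ]
    print(clusters(len(docs), knn_graph(docs, k=1, min_sim=0.5)))
```

The brute-force graph construction compares every pair of documents, which is exactly the quadratic cost that an approximate k-NN graph is meant to avoid on large datasets.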
International Symposium on Computers and Communications | 2016
Alessandro Lulli; Lorenzo Gabrielli; Patrizio Dazzi; Matteo Dell'Amico; Pietro Michiardi; Mirco Nanni; Laura Ricci
Statistical authorities promote and safeguard the production and publication of official statistics that serve the public good. One of their duties is to monitor the presence of individuals region by region. Traditionally, this activity has been conducted by means of censuses and surveys. Nowadays, technology opens new possibilities, such as continuous sensing of presences by leveraging the data associated with mobile devices, e.g., the calling behaviour of users. In this paper, we first propose a purpose-built similarity function able to capture the similarity between individuals' call behaviours. Second, we make use of a clustering algorithm able to handle arbitrary metrics, leading to good internal and external consistency of clusters. The approach provides better population estimates than the state of the art when compared with real census data. The scalability and flexibility that characterise the proposed framework enable novel scenarios for the characterization of people by means of data derived from mobile users, ranging from the near-real-time estimation of presences to the definition of complex, uncommon user archetypes.
IEEE Transactions on Parallel and Distributed Systems | 2017
Alessandro Lulli; Emanuele Carlini; Patrizio Dazzi; Claudio Lucchese; Laura Ricci
Finding connected components is a fundamental task in applications dealing with graph analytics, such as social network analysis, web graph mining and image processing. The exponentially growing size of today's graphs has required the definition of new computational models and algorithms for their efficient processing on highly distributed architectures. In this paper we present cracker, an efficient iterative MapReduce-like algorithm to detect connected components in large graphs. The strategy of cracker is to transform the input graph into a set of trees, one for each connected component in the graph. Nodes are iteratively removed from the graph and added to the trees, reducing the amount of computation at each iteration. We prove the correctness of the algorithm, evaluate its computational cost and provide an extensive experimental evaluation considering a wide variety of synthetic and real-world graphs. The experimental results show that cracker consistently outperforms state-of-the-art approaches both in terms of total computation time and volume of messages exchanged.
Self-Adaptive and Self-Organizing Systems | 2015
Alessandro Lulli; Laura Ricci; Emanuele Carlini; Patrizio Dazzi
The computation of node centrality is of great importance for the analysis of graphs. The current flow betweenness is an interesting centrality index that is computed by considering how information travels along all the possible paths of a graph. The current flow betweenness exploits basic results from electrical circuits, i.e., Kirchhoff's laws, to evaluate the centrality of vertices. For very large graphs composed of millions of nodes, the computation of the current flow betweenness may exceed the computational capability of a single machine. In this paper we propose a solution that estimates the current flow betweenness in a distributed setting, by defining a vertex-centric, gossip-based algorithm. Each node, relying on its local information, generates new flows in a self-adaptive way to improve the betweenness of all the nodes of the graph. Our experimental evaluation shows that our proposal achieves high correlation with the exact current flow betweenness, and provides a good centrality measure for large graphs.
European Conference on Parallel Processing | 2016
Emanuele Carlini; Alessandro Lulli; Laura Ricci
The development and evaluation of a proper mobility model is essential for evaluating a system that manages a virtual world. In distributed virtual environments this is even more important, because each avatar requires a consistent view of a world that is usually split across multiple machines. Several models have been proposed in the literature to describe avatars’ mobility, but a single environment that supports the generation of traces from different models, and so enables a simple comparison among them, is still lacking. In this work we present a tool that implements popular mobility models and supports the generation of traces from them. This may help developers to easily validate their systems using several mobility models. Our tool provides a unified format to describe the traces, enables the generation of traces for thousands of avatars and defines an API enabling the integration of additional models.
European Conference on Parallel Processing | 2015
Alessandro Lulli; Patrizio Dazzi; Laura Ricci; Emanuele Carlini
The processing of graphs in a parallel and distributed fashion is a constantly rising trend, due to the size of today’s graphs. This paper proposes a multi-layer graph overlay approach to support the orchestration of distributed, vertex-centric computations targeting large graphs. Our approach takes inspiration from overlay networks, a widely exploited approach for information dissemination, aggregation and computing orchestration in massively distributed systems. We propose Telos, an environment supporting the definition of multi-layer graph overlays, which provides each vertex with a layered, vertex-centric view of the graph. Telos is defined on top of Apache Spark and has been evaluated by considering two well-known graph problems. We present a set of experimental results showing the effectiveness of our approach.