Stefano Lodi | Researchain

Publication

Featured research published by Stefano Lodi.

Lecture Notes in Computer Science | 2003

Agent-based distributed data mining: the KDEC scheme

Matthias Klusch; Stefano Lodi; Gianluca Moro

One key aspect of exploiting the huge amount of autonomous and heterogeneous data sources on the Internet is not only how to retrieve, collect and integrate relevant information but also how to discover previously unknown, implicit and valuable knowledge. In recent years several approaches to distributed data mining and knowledge discovery have been developed, but only a few of them make use of intelligent agents. This paper argues for the potential added value of using agent technology in the domain of knowledge discovery. We briefly review and classify existing approaches to agent-based distributed data mining, propose a novel approach to distributed data clustering based on density estimation, and discuss issues of its agent-oriented implementation.
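As a rough illustration of clustering by density estimation over distributed sources, the following Python sketch has each site evaluate a local kernel density sum on a shared 1-D grid, adds the per-site samples into a global estimate, and labels each point with the grid peak it reaches by hill climbing. The function names, the Gaussian kernel, the bandwidth `h`, the grid and the sample data are all assumptions made for this example; it is not the KDEC protocol described in the paper.

```python
import math

def local_density(points, grid, h):
    """Unnormalized Gaussian kernel density sum of one site's points, sampled on the grid."""
    return [sum(math.exp(-0.5 * ((g - p) / h) ** 2) for p in points) for g in grid]

def global_density(site_samples):
    """Add the per-site samples grid point by grid point."""
    return [sum(vals) for vals in zip(*site_samples)]

def climb(i, dens):
    """Step to the denser neighbouring grid index until no neighbour is strictly denser."""
    while True:
        best = i
        for j in (i - 1, i + 1):
            if 0 <= j < len(dens) and dens[j] > dens[best]:
                best = j
        if best == i:
            return i
        i = best

def cluster(points, grid, dens):
    """Label each point with the grid index of the density peak it climbs to."""
    step = grid[1] - grid[0]
    labels = []
    for p in points:
        start = min(len(grid) - 1, max(0, round((p - grid[0]) / step)))
        labels.append(climb(start, dens))
    return labels

# Hypothetical data held by two sites; only density samples are shared, not the points.
site_a = [1.0, 1.2, 0.9, 5.1]
site_b = [5.0, 4.8, 1.0, 5.2]
grid = [i * 0.1 for i in range(71)]   # shared grid covering 0.0 .. 7.0
h = 0.4                               # assumed kernel bandwidth
dens = global_density([local_density(site_a, grid, h),
                       local_density(site_b, grid, h)])
labels_a = cluster(site_a, grid, dens)
labels_b = cluster(site_b, grid, dens)
```

In this toy setting each site can label its own points locally once it has the global density samples, which is the property that makes a density-based formulation attractive for distributed sources.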

IEEE Transactions on Knowledge and Data Engineering | 1998

Consistency checking in complex object database schemata with integrity constraints

Domenico Beneventano; Sonia Bergamaschi; Stefano Lodi; Claudio Sartori

Integrity constraints are rules that should guarantee the integrity of a database. Provided an adequate mechanism to express them is available, the following question arises: is there any way to populate a database which satisfies the constraints supplied by a database designer? That is, does the database schema, including constraints, admit at least one nonempty model? This work answers the above question in a complex object database environment, providing a theoretical framework with the following ingredients: (1) two alternative formalisms, able to express a relevant set of state integrity constraints in a declarative style; (2) two specialized reasoners, based on the tableaux calculus, able to check the consistency of complex object database schemata expressed with the two formalisms. The proposed formalisms share a common kernel, which supports complex objects and object identifiers, and which allows the expression of acyclic descriptions of classes, nested relations and views, built up by means of the recursive use of record, quantified set, and object type constructors and by the intersection, union, and complement operators. Furthermore, the kernel formalism allows the declarative formulation of typing constraints and integrity rules. In order to improve the expressiveness and maintain the decidability of the reasoning activities, we extend the kernel formalism in two alternative directions. The first formalism, OLCP, introduces the capability of expressing path relations. Because cyclic schemas are extremely useful, we introduce a second formalism, OLCD, with the capability of expressing cyclic descriptions but disallowing the expression of path relations. In fact, we show that the reasoning activity in OLCDP (i.e., OLCP with cycles) is undecidable.

IEEE/WIC/ACM International Conference on Intelligent Agent Technology | 2003

The role of agents in distributed data mining: issues and benefits

Matthias Klusch; Stefano Lodi; Gianluca Moro

The increasing demand to extend data mining technology to data sets inherently distributed among a large number of autonomous and heterogeneous sources over a network with limited bandwidth has motivated the development of several approaches to distributed data mining and knowledge discovery, of which only a few make use of agents. We briefly review existing approaches and argue for the potential added value of using agent technology in the domain of knowledge discovery, discussing both issues and benefits. We also propose an approach to distributed data clustering, outline its agent-oriented implementation, and examine potential privacy-violating attacks to which agents may be exposed.

IEEE Transactions on Knowledge and Data Engineering | 2013

Distributed Strategies for Mining Outliers in Large Data Sets

Fabrizio Angiulli; Stefano Basta; Stefano Lodi; Claudio Sartori

We introduce a distributed method for detecting distance-based outliers in very large data sets. Our approach is based on the concept of outlier detection solving set [2], which is a small subset of the data set that can also be employed for predicting novel outliers. The method exploits parallel computation in order to obtain vast time savings. Indeed, beyond preserving the correctness of the result, the proposed schema exhibits excellent performance. From the theoretical point of view, for common settings, our algorithm is expected to be at least three orders of magnitude faster than the classical nested-loop-like approach to detecting outliers. Experimental results show that the algorithm is efficient and that its running time scales quite well for an increasing number of nodes. We also discuss a variant of the basic strategy which reduces the amount of data to be transferred in order to improve both the communication cost and the overall runtime. Importantly, the solving set computed by our approach in a distributed environment has the same quality as that produced by the corresponding centralized method.
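For context, the nested-loop baseline that the abstract compares against can be sketched in a few lines of Python. The sketch assumes one common definition of a distance-based outlier score, namely the sum of the distances from a point to its k nearest neighbours, and reports the n highest-scoring points. The function names and the sample data are hypothetical, and this is the quadratic baseline, not the solving-set algorithm itself.

```python
import math

def knn_weight(i, data, k):
    """Sum of distances from data[i] to its k nearest neighbours (excluding itself)."""
    dists = sorted(math.dist(data[i], q) for j, q in enumerate(data) if j != i)
    return sum(dists[:k])

def top_outliers(data, k, n):
    """Indices of the n points with the largest weight, via a full nested loop."""
    scored = [(knn_weight(i, data, k), i) for i in range(len(data))]
    scored.sort(reverse=True)
    return [i for _, i in scored[:n]]

# Hypothetical data: four tightly grouped points and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
outliers = top_outliers(points, k=2, n=1)   # index 4, the isolated point
```

Every point is compared with every other point, so the cost grows quadratically with the data set size, which is the cost that motivates both the solving-set idea and its distribution over nodes.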

Journal of Visual Languages and Computing | 2004

VidaMine: a visual data mining environment

Stephen Kimani; Stefano Lodi; Tiziana Catarci; Giuseppe Santucci; Claudio Sartori

That the already vast and ever-increasing amounts of data still present formidable challenges to effective and efficient acquisition of knowledge is by no means an exaggeration. The knowledge discovery process entails more than just the application of data mining strategies. There are many other aspects including, but not limited to: planning, data pre-processing, data integration, evaluation and presentation. The human-vision channel is capable of recognizing and understanding data in an instant. Effective visual strategies can be used to tap the outstanding human visual channel in extracting useful information from data. Unlike in most research efforts, the exploitation should be employed not just at the beginning or at the end of the knowledge discovery process but across the entire discovery process. In essence, this calls for the development of an effective user/visual component, the development of an overall framework that can support the entire discovery process/all discovery phases, and the strategic placement of the visual component in that framework. Key issues of this component will be the open architecture, allowing extensions and adaptations to specific mining environments, and the precise semantics and syntax, allowing an optimal integration between the presentation and the computation.

Adaptive Agents and Multi-Agents Systems | 2003

Issues of agent-based distributed data mining

Matthias Klusch; Stefano Lodi; Gianluca Moro

Matthias Klusch, Deduction and Multiagent Systems, German Research Centre for Artificial Intelligence, Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany

Stefano Lodi, Department of Electronics, Computer Science and Systems, IEIIT-BO/CNR, University of Bologna, Viale Risorgimento 2, 40136 Bologna BO, Italy

Gianluca Moro, Department of Electronics, Computer Science and Systems, University of Bologna, Via Rasi e Spinelli 176, 47023 Cesena FC, Italy

European Conference on Parallel Processing | 2010

A distributed approach to detect outliers in very large data sets

Fabrizio Angiulli; Stefano Basta; Stefano Lodi; Claudio Sartori

We propose a distributed approach addressing the problem of distance-based outlier detection in very large data sets. The presented algorithm is based on the concept of outlier detection solving set ([1]), which is a small subset of the data set that can be provably used for predicting novel outliers. The algorithm exploits parallel computation in order to meet two basic needs: (i) the reduction of the run time with respect to the centralized version and (ii) the ability to deal with distributed data sets. The former goal is achieved by decomposing the overall computation into cooperating parallel tasks. Besides preserving the correctness of the result, the proposed schema exhibited excellent performance. As a matter of fact, experimental results showed that the run time scales up with respect to the number of nodes. The latter goal is accomplished by executing each of these parallel tasks only on a portion of the entire data set, so that the proposed algorithm is suitable for use over distributed data sets. Importantly, while solving the distance-based outlier detection task in the distributed scenario, our method computes an outlier detection solving set of the overall data set of the same quality as that computed by the corresponding centralized method.
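One way to see why such a computation can run on portions of the data and still match the centralized result: the k nearest neighbours of a candidate point in the whole data set are always contained in the union of its k nearest neighbours within each partition. The Python sketch below simulates this merge step in a single process. The node lists, function names, and the sum-of-k-nearest-distances score are assumptions for illustration, points are assumed to be distinct, and this is not the protocol described in the paper.

```python
import heapq
import math

def local_knn_dists(candidate, partition, k):
    """Run on one node: the k smallest distances from candidate to that node's points."""
    return heapq.nsmallest(
        k, (math.dist(candidate, p) for p in partition if p != candidate)
    )

def global_weight(candidate, partitions, k):
    """Coordinator: merge the per-node lists and sum the k smallest distances overall."""
    merged = []
    for part in partitions:   # in a real deployment each call would run on a remote node
        merged.extend(local_knn_dists(candidate, part, k))
    return sum(heapq.nsmallest(k, merged))

# Hypothetical data split across two nodes.
node_1 = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
node_2 = [(0.0, 0.1), (0.1, 0.1)]
k = 2
all_points = node_1 + node_2
distributed = {p: global_weight(p, [node_1, node_2], k) for p in all_points}
centralized = {p: global_weight(p, [all_points], k) for p in all_points}
```

Each node sends back at most k distances per candidate rather than its raw data, so the scores computed from the two partitions agree with those computed over the whole data set.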

International Conference on Peer-to-Peer Computing | 2007

W*-Grid: A Robust Decentralized Cross-layer Infrastructure for Routing and Multi-Dimensional Data Management in Wireless Ad-Hoc Sensor Networks

Gabriele Monti; Gianluca Moro; Stefano Lodi

Sensor networks are usually composed of small units able to sense and transmit to a sink elementary data which are subsequently processed by an external machine. However, recent improvements in the memory and computational power of sensors, together with the reduction of energy consumption, are rapidly changing the potential of such systems, moving the attention towards data-centric sensor networks.
This paper presents W*-Grid, a fully decentralized and robust infrastructure for self-organizing data-centric sensor networks, where wireless communications occur through multi-hop routing among devices. The solution extends W-Grid by strongly improving the network recovery performance from link and/or device failures. In particular, W*-Grid guarantees, by construction, at least two disjoint paths between each pair of nodes. This implies that recovery in W*-Grid occurs without broadcast transmissions, guaranteeing robustness while drastically reducing energy consumption. Extensive simulations show the efficiency, robustness and traffic load of the resulting networks under several scenarios of device density and number of coordinates.

International Conference on High Performance Computing and Simulation | 2013

Fast outlier detection using a GPU

Fabrizio Angiulli; Stefano Basta; Stefano Lodi; Claudio Sartori

The availability of cost-effective data collection and storage hardware has allowed organizations to accumulate very large data sets, which are a potential source of previously unknown valuable information. The process of discovering interesting patterns in such large data sets is referred to as data mining. Outlier detection is a data mining task consisting of the discovery of observations which deviate substantially from the rest of the data, and has many important practical applications. Outlier detection in very large data sets is, however, computationally very demanding and currently requires high-performance computing facilities. We propose a family of parallel algorithms for Graphics Processing Units (GPUs), derived from two distance-based outlier detection algorithms: the BruteForce and the SolvingSet. We analyze their performance with an extensive set of experiments, comparing the GPU implementations with the base CPU versions and obtaining significant speedups.

Iberoamerican Congress on Pattern Recognition | 2010

A new algorithm for training SVMs using approximate minimal enclosing balls

Emanuele Frandi; Maria Grazia Gasparo; Stefano Lodi; Ricardo Ñanculef; Claudio Sartori

It has been shown that many kernel methods can be equivalently formulated as minimal enclosing ball (MEB) problems in a certain feature space. Exploiting this reduction, efficient algorithms to scale up Support Vector Machines (SVMs) and other kernel methods have been introduced under the name of Core Vector Machines (CVMs). In this paper, we study a new algorithm to train SVMs based on an instance of the Frank-Wolfe optimization method recently proposed to approximate the solution of the MEB problem. We show that, specialized to SVM training, this algorithm can scale better than CVMs at the price of a slightly lower accuracy.
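To make the Frank-Wolfe connection concrete, here is a minimal Python sketch of a classical iteration for approximating a minimal enclosing ball: at step t the centre moves towards the current farthest point with step size 1/(t+1), which can be read as a Frank-Wolfe-style step on the MEB problem. It works directly on input-space points with Euclidean distance, whereas SVM training would operate in a kernel-induced feature space; the function name, the iteration count and the test data are assumptions, and this is not the training algorithm studied in the paper.

```python
import math

def approx_meb(points, iterations=1000):
    """Approximate the minimal enclosing ball of points; returns (center, radius)."""
    center = list(points[0])
    for t in range(1, iterations + 1):
        # Linear subproblem of the Frank-Wolfe step: pick the point farthest from the centre.
        far = max(points, key=lambda p: math.dist(center, p))
        step = 1.0 / (t + 1)
        center = [c + step * (f - c) for c, f in zip(center, far)]
    radius = max(math.dist(center, p) for p in points)
    return center, radius

# Hypothetical data: corners of a square whose exact MEB has centre (1, 1), radius sqrt(2).
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = approx_meb(square)
```

Each iteration touches only one extreme point, which is what makes this family of methods attractive for large training sets; the price is that the returned ball is only approximately minimal.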
