
Publication


Featured research published by Luca Becchetti.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

A reference collection for web spam

Carlos Castillo; Debora Donato; Luca Becchetti; Paolo Boldi; Stefano Leonardi; Massimo Santini; Sebastiano Vigna

We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether or not the hosts are Web spam. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.


Knowledge Discovery and Data Mining | 2008

Efficient semi-streaming algorithms for local triangle counting in massive graphs

Luca Becchetti; Paolo Boldi; Carlos Castillo; Aristides Gionis

In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V, E), we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks. For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log |V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results on massive graphs, demonstrating the practical efficiency of our approach.
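
For intuition, here is a minimal in-memory sketch of the min-hash idea, assuming simple salted hashing in place of true min-wise independent permutations and ignoring the semi-streaming constraints of the paper; all names are illustrative. It recovers each node's incident-triangle count from estimated Jaccard similarities of neighbour sets.

```python
import random
from collections import defaultdict

def minhash_signatures(adj, num_hashes=64, seed=0):
    """One min-hash per salt over each node's neighbour set.

    adj: dict node -> set of neighbours (assumes no isolated nodes).
    Salted tuple hashing stands in for the min-wise independent
    permutations of Broder et al. used in the paper.
    """
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(num_hashes)]
    return {v: [min(hash((s, u)) for u in nbrs) for s in salts]
            for v, nbrs in adj.items()}

def jaccard_estimate(sig_u, sig_v):
    """Fraction of signature slots on which the min-hashes agree."""
    return sum(a == b for a, b in zip(sig_u, sig_v)) / len(sig_u)

def local_triangle_estimates(adj, sig):
    """T(v) ≈ (1/2) Σ_{u∈N(v)} |N(u) ∩ N(v)|, recovering the
    intersection size from the Jaccard estimate J via
    |A ∩ B| = J · (|A| + |B|) / (1 + J)."""
    t = defaultdict(float)
    for v, nbrs in adj.items():
        for u in nbrs:
            j = jaccard_estimate(sig[u], sig[v])
            if j > 0:
                t[v] += j * (len(adj[u]) + len(adj[v])) / (1 + j)
    return {v: x / 2 for v, x in t.items()}

# Toy usage: in a 4-clique every node touches exactly 3 triangles.
clique = {v: {u for u in range(4) if u != v} for v in range(4)}
print(local_triangle_estimates(clique, minhash_signatures(clique)))
```

The printed estimates should be close to 3 for every node; accuracy improves as num_hashes grows.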


International World Wide Web Conference | 2012

Online team formation in social networks

Aris Anagnostopoulos; Luca Becchetti; Carlos Castillo; Aristides Gionis; Stefano Leonardi

We study the problem of online team formation. We consider a setting in which people possess different skills and compatibility among potential team members is modeled by a social network. A sequence of tasks arrives in an online fashion, and each task requires a specific set of skills. The goal is to form a new team upon arrival of each task, so that (i) each team possesses all skills required by the task, (ii) each team has small communication overhead, and (iii) the workload of performing the tasks is balanced among people in the fairest possible way. We propose efficient algorithms that address all these requirements: our algorithms form teams that always satisfy the required skills, provide approximation guarantees with respect to team communication overhead, and are online-competitive with respect to load balancing. Experiments performed on collaboration networks among film actors and scientists confirm that our algorithms are successful at balancing these conflicting requirements. This is the first paper that simultaneously addresses all these aspects; previous work has either focused on minimizing coordination for a single task or on balancing the workload while neglecting coordination costs.
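
The abstract leaves the algorithms at a high level. Purely as an illustration of the setting (an assumed toy baseline, not the authors' method), the sketch below greedily covers each incoming task's skills with the currently least-loaded qualified person; it addresses skill coverage and load balance but ignores the communication-overhead guarantee that the paper also provides.

```python
from collections import defaultdict

def form_team(task_skills, skills_of, load):
    """Greedy baseline: for each required skill, pick the least-loaded
    person who has it. Satisfies requirement (i) (skill coverage) and
    heuristically addresses (iii) (load balancing); requirement (ii),
    social-network coordination cost, is not modeled here."""
    team = set()
    for skill in task_skills:
        candidates = [p for p, s in skills_of.items() if skill in s]
        if not candidates:
            raise ValueError(f"no one has skill {skill!r}")
        team.add(min(candidates, key=lambda p: load[p]))
    for p in team:
        load[p] += 1
    return team

# Hypothetical people and an online sequence of three tasks.
skills_of = {"ann": {"ml", "nlp"}, "bob": {"ml"}, "eve": {"systems"}}
load = defaultdict(int)
for task in [{"ml"}, {"ml", "systems"}, {"nlp"}]:
    print(form_team(task, skills_of, load))
```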


ACM Transactions on the Web | 2008

Link analysis for Web spam detection

Luca Becchetti; Carlos Castillo; Debora Donato; Ricardo A. Baeza-Yates; Stefano Leonardi

We propose link-based techniques for automatic detection of Web spam, a term referring to pages that use deceptive techniques to obtain undeservedly high scores in search engines. Web spam is widespread and difficult to combat, mostly because the large size of the Web renders many algorithms infeasible in practice. We perform a statistical analysis of a large collection of Web pages. In particular, we compute statistics of the links in the vicinity of every Web page, applying rank propagation and probabilistic counting over the entire Web graph in a scalable way. These statistical features are used to build Web spam classifiers which only consider the link structure of the Web, regardless of page contents. We then present a study of the performance of each of the classifiers alone, as well as their combined performance, by testing them over a large collection of Web link spam. After tenfold cross-validation, our best classifiers have a performance comparable to that of state-of-the-art spam classifiers that use content attributes, but are orthogonal to content-based methods.
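
As a hedged sketch of what link-based features can look like in code: the two features below (power-iteration PageRank and in-degree) are simplified stand-ins for the paper's richer rank-propagation and probabilistic-counting statistics; the feature vectors would then feed a standard classifier.

```python
def pagerank(adj, damping=0.85, iters=50):
    """Plain power-iteration PageRank; adj: node -> list of out-links."""
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) / n for v in nodes}
        for v, outs in adj.items():
            if outs:
                share = damping * pr[v] / len(outs)
                for u in outs:
                    nxt[u] += share
            else:  # dangling node: spread its mass uniformly
                for u in nodes:
                    nxt[u] += damping * pr[v] / n
        pr = nxt
    return pr

def link_features(adj):
    """Two toy link features per host: PageRank and in-degree.
    A real spam detector would use many more link statistics."""
    pr = pagerank(adj)
    indeg = {v: 0 for v in adj}
    for v, outs in adj.items():
        for u in outs:
            indeg[u] += 1
    return {v: (pr[v], indeg[v]) for v in adj}

# Hypothetical three-host Web graph.
web = {"a": ["b"], "b": ["a", "c"], "c": []}
print(link_features(web))
```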


Conference on Information and Knowledge Management | 2010

Power in unity: forming teams in large-scale community systems

Aris Anagnostopoulos; Luca Becchetti; Carlos Castillo; Aristides Gionis; Stefano Leonardi

The Internet has enabled collaboration among groups at a previously unseen scale. A key problem for large collaboration groups is to allocate tasks effectively. An effective task assignment method should consider both how fit teams are for each job and how fair the assignment is to team members, in the sense that no one should be overloaded or unfairly singled out. The assignment has to be done automatically or semi-automatically, given that it is difficult and time-consuming to keep track of the skills and the workload of each person. The method must also be computationally efficient. In this paper we present a general framework for task assignment problems. We provide a formal treatment of how to represent teams and tasks. We propose alternative functions for measuring the fitness of a team performing a task and we discuss desirable properties of those functions. We then focus on one class of task-assignment problems, characterize the complexity of the problem, and provide algorithms with provable approximation guarantees, as well as lower bounds. We also present experimental results showing that our methods are useful in practice in several application scenarios.
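
As a toy instance of the framework's ingredients (illustrative assumptions, not the paper's definitions): a coverage-based fitness function paired with a min-max fairness cost.

```python
def coverage_fitness(team, task_skills, skills_of):
    """Fraction of the task's required skills covered by the team."""
    covered = set().union(*(skills_of[p] for p in team)) if team else set()
    return len(covered & task_skills) / len(task_skills)

def fairness_cost(assignment):
    """Max load over people: a min-max fairness objective, so lower
    is fairer. assignment: person -> list of assigned tasks."""
    return max((len(tasks) for tasks in assignment.values()), default=0)

# Hypothetical usage: two people cover 2 of 3 required skills.
skills_of = {"ann": {"ml"}, "bob": {"db"}}
print(coverage_fitness({"ann", "bob"}, {"ml", "db", "ui"}, skills_of))
```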


Journal of the ACM | 2004

Nonclairvoyant scheduling to minimize the total flow time on single and parallel machines

Luca Becchetti; Stefano Leonardi

Scheduling a sequence of jobs released over time when the processing time of a job is only known at its completion is a classical problem in CPU scheduling in time-sharing operating systems. A widely used measure for the responsiveness of the system is the average flow time of the jobs, that is, the average time spent by jobs in the system between release and completion. The Windows NT and Unix operating system scheduling policies are based on the Multilevel Feedback algorithm. In this article, we prove that a randomized version of the Multilevel Feedback algorithm is competitive for single and parallel machine systems, providing, in our opinion, a theoretical validation of an idea that has proven effective in practice over the last two decades. The randomized Multilevel Feedback algorithm (RMLF) was first proposed by Kalyanasundaram and Pruhs for a single machine, achieving an O(log n log log n) competitive ratio for minimizing the average flow time against the online adaptive adversary, where n is the number of jobs that are released. We present a version of RMLF working for any number m of parallel machines. We show for RMLF a first O(log n log(n/m)) competitiveness result against the oblivious adversary on parallel machines. We also show that the same RMLF algorithm surprisingly achieves a tight O(log n) competitive ratio against the oblivious adversary on a single machine, therefore matching the lower bound for this case.
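
A unit-step sketch of the deterministic Multilevel Feedback rule that RMLF randomizes: jobs enter the highest-priority queue, and a job exhausting the quantum of queue i drops to queue i+1. The exponentially growing quanta 2^i are a common textbook choice assumed here; RMLF's randomized quanta and the parallel-machine version are not modeled.

```python
from collections import deque

def mlf_flow_time(jobs):
    """Unit-step simulation of Multilevel Feedback on one machine.

    jobs: list of (release_time, processing_time) with integer times.
    Queue i grants a quantum of 2**i; a job that exhausts it drops to
    queue i+1. Returns total flow time (completion minus release).
    """
    jobs = sorted((r, i, p) for i, (r, p) in enumerate(jobs))
    queues = [deque()]                 # queues[i] holds (job_id, level)
    remaining, used, release = {}, {}, {}
    total_flow, t, next_job, done = 0, 0, 0, 0
    while done < len(jobs):
        while next_job < len(jobs) and jobs[next_job][0] <= t:
            r, jid, p = jobs[next_job]
            remaining[jid], used[jid], release[jid] = p, 0, r
            queues[0].append((jid, 0))
            next_job += 1
        level = next((i for i, q in enumerate(queues) if q), None)
        if level is None:              # machine idle: jump to next release
            t = jobs[next_job][0]
            continue
        jid, lvl = queues[level][0]    # run front job of best queue
        remaining[jid] -= 1
        used[jid] += 1
        t += 1
        if remaining[jid] == 0:
            queues[level].popleft()
            total_flow += t - release[jid]
            done += 1
        elif used[jid] == 2 ** lvl:    # quantum exhausted: demote
            queues[level].popleft()
            used[jid] = 0
            if lvl + 1 == len(queues):
                queues.append(deque())
            queues[lvl + 1].append((jid, lvl + 1))
    return total_flow

print(mlf_flow_time([(0, 3), (0, 1), (2, 2)]))  # short jobs finish early
```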


Journal of Discrete Algorithms | 2006

Online weighted flow time and deadline scheduling

Luca Becchetti; Stefano Leonardi; Alberto Marchetti-Spaccamela; Kirk Pruhs

In this paper we study some aspects of weighted flow time. We first show that the online algorithm Highest Density First is an O(1)-speed O(1)-approximation algorithm for P | r_i, pmtn | ∑ w_i F_i. We then consider a related deadline scheduling problem that involves minimizing the weight of the jobs unfinished by some unknown deadline D on a uniprocessor. We show that any c-competitive online algorithm for weighted flow time must also be c-competitive for deadline scheduling. We then give an O(1)-competitive algorithm for deadline scheduling.
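
Highest Density First itself is simple to state: at every moment, run the pending job with maximum density w_i / p_i. A unit-step sketch under that definition (the O(1)-speed resource augmentation used in the analysis is not modeled):

```python
def hdf_weighted_flow(jobs):
    """Unit-step Highest Density First on one machine.

    jobs: list of (release_time, processing_time, weight).
    At each step runs the pending job with maximum weight/size
    density. Returns total weighted flow time, i.e. sum of w_i * F_i.
    """
    remaining = {i: p for i, (_, p, _) in enumerate(jobs)}
    t, total = 0, 0
    while remaining:
        pending = [i for i in remaining if jobs[i][0] <= t]
        if not pending:                # machine idle until next release
            t = min(jobs[i][0] for i in remaining)
            continue
        i = max(pending, key=lambda j: jobs[j][2] / jobs[j][1])
        remaining[i] -= 1
        t += 1
        if remaining[i] == 0:
            total += jobs[i][2] * (t - jobs[i][0])  # w_i * flow time
            del remaining[i]
    return total

# A heavy short job released at t=1 preempts the light long one.
print(hdf_weighted_flow([(0, 4, 1), (1, 1, 3)]))
```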


ACM Transactions on Algorithms | 2009

Latency-constrained aggregation in sensor networks

Luca Becchetti; Alberto Marchetti-Spaccamela; Andrea Vitaletti; Peter Korteweg; Martin Skutella; Leen Stougie

A sensor network consists of sensing devices which may exchange data through wireless communication; sensor networks are highly energy constrained, since they are usually battery operated. Data aggregation is a possible way to save energy: nodes may delay data in order to aggregate them into a single packet before forwarding them towards some central node (sink). However, many applications impose constraints on the maximum delay of data; this translates into latency constraints for data arriving at the sink. We study the problem of data aggregation to minimize maximum energy consumption under latency constraints on sensed data delivery, and we assume unique communication paths that form an intree rooted at the sink. We prove that the offline problem is strongly NP-hard and we design a 2-approximation algorithm; the latter uses a novel rounding technique. Almost all real-life sensor networks are managed online by simple distributed algorithms in the nodes. In this context we consider both the case in which sensor nodes are synchronized and the case in which they are not. We assess the performance of the algorithms by competitive analysis. We also provide lower bounds for the models we consider, in some cases showing optimality of the algorithms we propose. Most of our results also hold when minimizing the total energy consumption of all nodes.
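
As a hedged illustration of the core trade-off, the sketch below implements the simplest conceivable online policy at a single node: hold arriving data and transmit one aggregated packet only when the oldest held datum would otherwise miss its latency bound. The one-time-unit-per-hop accounting and parameter names are assumptions for the sketch, not the paper's model.

```python
def lazy_aggregation(arrivals, depth, latency_bound):
    """Toy policy at one node, `depth` hops from the sink.

    arrivals: sorted arrival times of sensed data at this node.
    A datum arriving at time a must reach the sink by a + latency_bound;
    assuming one time unit per hop, it must leave this node by
    a + latency_bound - depth. Returns the packet send times: fewer
    sends means more aggregation, hence lower transmission energy.
    """
    sends, held_since = [], None
    for a in arrivals:
        if held_since is None:
            held_since = a
        deadline = held_since + latency_bound - depth
        if a >= deadline:          # flush before the oldest datum expires
            sends.append(deadline)
            held_since = a
    if held_since is not None:
        sends.append(held_since + latency_bound - depth)
    return sends

# Four data items are delivered in three packets instead of four.
print(lazy_aggregation([0, 1, 2, 9], depth=3, latency_bound=5))
```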


European Symposium on Algorithms | 2004

Modeling Locality: A Probabilistic Analysis of LRU and FWF

Luca Becchetti

In this paper we explore the effects of locality on the performance of paging algorithms. Traditional competitive analysis fails to explain important properties of paging assessed by practical experience. In particular, the competitive ratios of paging algorithms that are known to be efficient in practice (e.g. LRU) are as poor as those of naive heuristics (e.g. FWF). It has been recognized that the main reason for these discrepancies lies in an unsatisfactory modelling of locality of reference exhibited by real request sequences.
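
Both algorithms are a few lines each, which makes the contrast easy to reproduce: LRU evicts the least recently used page, while Flush-When-Full (FWF) empties the entire cache on a fault that finds it full. A minimal sketch:

```python
from collections import OrderedDict

def lru_faults(requests, k):
    """Page faults of LRU with cache size k."""
    cache, faults = OrderedDict(), 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)            # refresh recency on a hit
        else:
            faults += 1
            if len(cache) == k:
                cache.popitem(last=False)   # evict least recently used
            cache[p] = True
    return faults

def fwf_faults(requests, k):
    """Page faults of Flush-When-Full with cache size k."""
    cache, faults = set(), 0
    for p in requests:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                cache = set()               # flush the whole cache
            cache.add(p)
    return faults

# A bursty sequence with locality; compare the two fault counts.
seq = ([1, 2] * 5 + [3, 4] * 5) * 3
print(lru_faults(seq, 3), fwf_faults(seq, 3))
```

On sequences with strong locality LRU typically incurs noticeably fewer faults than FWF, even though both are k-competitive in the classical model, which is exactly the gap the paper's probabilistic analysis addresses.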


Foundations of Computer Science | 2003

Average case and smoothed competitive analysis of the multi-level feedback algorithm

Luca Becchetti; Stefano Leonardi; Alberto Marchetti-Spaccamela; Guido Schäfer; Tjark Vredeveld

In this paper, we introduce the notion of smoothed competitive analysis of online algorithms. Smoothed analysis has been proposed by Spielman and Teng (2001) to explain the behavior of algorithms that work well in practice while performing very poorly from a worst-case analysis point of view. We apply this notion to analyze the Multi-Level Feedback (MLF) algorithm to minimize the total flow time on a sequence of jobs released over time, when the processing time of a job is only known at time of completion. The initial processing times are integers in the range [1, 2^K]. We use a partial bit randomization model, where the initial processing times are smoothed by changing the k least significant bits under a quite general class of probability distributions. We show that MLF admits a smoothed competitive ratio of O((2^k/σ)^3 + (2^k/σ)^2 · 2^(K-k)), where σ denotes the standard deviation of the distribution. In particular, we obtain a competitive ratio of O(2^(K-k)) if σ = Θ(2^k). We also prove an Ω(2^(K-k)) lower bound for any deterministic algorithm that is run on processing times smoothed according to the partial bit randomization model. For various other smoothing models, we give a higher lower bound of Ω(2^K). A direct consequence of our result is also the first average-case analysis of MLF. We show a constant expected ratio of the total flow time of MLF to the optimum under several distributions, including the uniform distribution.
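
The smoothing model is compact enough to state in code. The sketch assumes the k least significant bits are replaced uniformly at random, one member of the class of distributions the paper allows (uniform replacement gives σ = Θ(2^k)):

```python
import random

def partial_bit_randomization(p, k, rng=random):
    """Smooth an integer processing time p by replacing its k least
    significant bits with uniformly random bits, keeping the
    high-order bits intact."""
    return (p >> k << k) | rng.getrandbits(k)

# p = 13 = 0b1101 with k = 3: results are uniform over 8..15.
rng = random.Random(42)
print([partial_bit_randomization(13, 3, rng) for _ in range(5)])
```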

Collaboration


Dive into Luca Becchetti's collaborations.

Top Co-Authors

Stefano Leonardi (Sapienza University of Rome)
Andrea Vitaletti (Sapienza University of Rome)
Andrea E. F. Clementi (University of Rome Tor Vergata)
Emanuele Natale (University of Rome Tor Vergata)
Carlos Castillo (Qatar Computing Research Institute)
Luca Trevisan (University of California)
Riccardo Silvestri (Sapienza University of Rome)