Publication


Featured research published by Raffaele Perego.


ACM Transactions on Information Systems | 2006

Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data

Tiziano Fagni; Raffaele Perego; Fabrizio Silvestri; Salvatore Orlando

This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy aimed at efficiently exploiting the temporal and spatial locality present in the stream of processed queries. SDC extracts from historical usage data the results of the most frequently submitted queries and stores them in a static, read-only portion of the cache. The remaining entries of the cache are dynamically managed according to a given replacement policy and are used for those queries that cannot be satisfied by the static portion. Moreover, we improve the hit ratio of SDC by using an adaptive prefetching strategy, which anticipates future requests while introducing only a limited overhead on the back-end WSE. We experimentally demonstrate the superiority of SDC over purely static and purely dynamic policies by measuring the hit ratio achieved on three large query logs while varying the cache parameters and the replacement policy used for managing the dynamic part of the cache. Finally, we deploy and measure the throughput achieved by a concurrent version of our caching system. Our tests show how the SDC cache can be efficiently exploited by many threads that concurrently serve the queries of different users.
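
To make the static/dynamic split concrete, the following is a minimal Python sketch of such a two-level query-result cache. All names, sizes, and the LRU policy for the dynamic section are illustrative assumptions; the paper's implementation, replacement policies, and prefetching logic are more sophisticated.

    from collections import Counter, OrderedDict

    class StaticDynamicCache:
        """Toy two-level query-result cache: a read-only static section filled
        offline with the most frequent past queries, plus an LRU-managed
        dynamic section for everything else (illustrative sketch only)."""

        def __init__(self, query_log, static_size, dynamic_size, backend):
            self.backend = backend                            # callable: query -> results
            top_queries = Counter(query_log).most_common(static_size)
            self.static = {q: backend(q) for q, _ in top_queries}
            self.dynamic = OrderedDict()
            self.dynamic_size = dynamic_size

        def lookup(self, query):
            if query in self.static:                          # static hit
                return self.static[query]
            if query in self.dynamic:                         # dynamic hit: refresh recency
                self.dynamic.move_to_end(query)
                return self.dynamic[query]
            results = self.backend(query)                     # miss: query the back-end WSE
            self.dynamic[query] = results
            if len(self.dynamic) > self.dynamic_size:         # evict least recently used
                self.dynamic.popitem(last=False)
            return results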


IEEE Transactions on Knowledge and Data Engineering | 2006

Fast and memory efficient mining of frequent closed itemsets

Claudio Lucchese; Salvatore Orlando; Raffaele Perego

This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database, and adopts a particular visit and partitioning strategy of the search space based on an original theoretical framework that formalizes the problem of closed itemset mining in detail. The algorithm adopts several optimizations aimed at saving both space and time in computing itemset closures and their supports. In particular, since one of the main problems in this type of algorithm is the multiple generation of the same closed itemset, we propose a new effective and memory-efficient pruning technique which, unlike previous proposals, does not require the whole set of closed patterns mined so far to be kept in main memory. This technique also permits each visited partition of the search space to be mined independently, in any order and, thus, also in parallel. Tests conducted on many publicly available datasets show that our algorithm is scalable and outperforms other state-of-the-art algorithms such as CLOSET+ and FP-CLOSE, in some cases by more than one order of magnitude. More importantly, the performance improvements become more and more significant as the support threshold is decreased.
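
For readers unfamiliar with closed itemsets, the sketch below shows the naive definition of a closure: the intersection of all transactions containing an itemset; an itemset is closed when it equals its own closure. This is only a conceptual illustration; the paper's algorithm computes closures far more efficiently over a bitwise vertical layout.

    def closure(itemset, transactions):
        """Naive closure: intersect all transactions that contain the itemset."""
        covering = [t for t in transactions if itemset <= t]
        if not covering:
            return set(itemset)
        result = set(covering[0])
        for t in covering[1:]:
            result &= t
        return result

    # A closed itemset equals its own closure:
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
    print(closure({"a", "b"}, db))   # {'a', 'b'} -> closed
    print(closure({"b"}, db))        # {'a', 'b'} -> {'b'} is not closed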


IEEE Transactions on Evolutionary Computation | 2001

A hybrid heuristic for the traveling salesman problem

Ranieri Baraglia; José Ignacio Hidalgo; Raffaele Perego

The combination of genetic and local search heuristics has been shown to be an effective approach to solving the traveling salesman problem (TSP). This paper describes a new hybrid algorithm that exploits a compact genetic algorithm in order to generate high-quality tours, which are then refined by means of the Lin-Kernighan (LK) local search. The local optima found by the LK local search are in turn exploited by the evolutionary part of the algorithm in order to improve the quality of its simulated population. The results of several experiments conducted on different TSP instances with up to 13,509 cities show the efficacy of the symbiosis between the two heuristics.
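
A minimal skeleton of the generate-and-refine loop described above, with 2-opt standing in for the Lin-Kernighan local search and random tour sampling standing in for the compact GA (both are simplifying assumptions, not the paper's actual components):

    import random

    def tour_length(tour, dist):
        return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

    def two_opt(tour, dist):
        """Simple 2-opt local search; a stand-in for the stronger Lin-Kernighan move."""
        improved = True
        while improved:
            improved = False
            for i in range(1, len(tour) - 1):
                for j in range(i + 1, len(tour)):
                    candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                    if tour_length(candidate, dist) < tour_length(tour, dist):
                        tour, improved = candidate, True
        return tour

    def hybrid_tsp(dist, generations=20):
        """Generate candidate tours, refine each with local search, and keep the
        best local optimum found (random sampling replaces the compact GA here)."""
        n = len(dist)
        best = two_opt(random.sample(range(n), n), dist)
        for _ in range(generations):
            candidate = two_opt(random.sample(range(n), n), dist)
            if tour_length(candidate, dist) < tour_length(best, dist):
                best = candidate
        return best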


international conference on data mining | 2002

Adaptive and resource-aware mining of frequent sets

Salvatore Orlando; Paolo Palmerini; Raffaele Perego; Fabrizio Silvestri

The performance of an algorithm that mines frequent sets from transactional databases may depend heavily on the specific features of the data being analyzed. Moreover, some architectural characteristics of the computational platform used - e.g., the available main memory - can dramatically change its runtime behavior. In this paper we present DCI (Direct Count & Intersect), an efficient algorithm for discovering frequent sets in large databases. Thanks to the multiple heuristic strategies adopted, DCI can adapt its behavior not only to the features of the specific computing platform, but also to the features of the dataset being mined, making it very effective at mining both short and long patterns from sparse and dense datasets. Finally, we also discuss the parallelization strategies adopted in the design of ParDCI, a distributed and multi-threaded implementation of DCI.
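
The intersection-based counting that gives DCI its name can be illustrated with plain Python sets: map each item to its tidset (the ids of the transactions containing it) and obtain the support of an itemset as the size of the intersection of its items' tidsets. The real algorithm works on compressed bit-vectors and adapts its strategy to dataset density; the following is only an illustrative sketch.

    from itertools import combinations

    def frequent_k_itemsets(transactions, min_support, k=2):
        """Count k-itemset supports by intersecting per-item tidsets (toy version)."""
        tidsets = {}
        for tid, t in enumerate(transactions):
            for item in t:
                tidsets.setdefault(item, set()).add(tid)
        frequent = {}
        for items in combinations(sorted(tidsets), k):
            common = set.intersection(*(tidsets[i] for i in items))
            if len(common) >= min_support:
                frequent[items] = len(common)
        return frequent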


web search and data mining | 2011

Identifying task-based sessions in search engine query logs

Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Fabrizio Silvestri; Gabriele Tolomei

The research challenge addressed in this paper is to devise effective techniques for identifying task-based sessions, i.e., sets of possibly non-contiguous queries issued by the user of a Web search engine to carry out a given task. In order to evaluate and compare different approaches, we built, by means of a manual labeling process, a ground truth in which the queries of a given query log have been grouped into tasks. Our analysis of this ground truth shows that users tend to perform more than one task at the same time, since about 75% of the submitted queries involve multi-tasking activity. We formally define the Task-based Session Discovery Problem (TSDP) as the problem of best approximating the manually annotated tasks, and we propose several variants of well-known clustering algorithms, as well as a novel efficient heuristic algorithm, specifically tuned for solving the TSDP. These algorithms also exploit the collaborative knowledge collected by Wiktionary and Wikipedia to detect query pairs that are not lexically similar but are actually semantically related. The proposed algorithms have been evaluated on the above ground truth and are shown to perform better than state-of-the-art approaches, because they effectively take into account the multi-tasking behavior of users.
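
As a rough illustration of combining lexical and semantic evidence, the sketch below groups queries into tasks when they share terms (Jaccard similarity on words) or are flagged as related by an external predicate standing in for the Wiktionary/Wikipedia signal. The threshold, the `related` predicate, and the greedy grouping are all assumptions; the paper's clustering algorithms are considerably more refined.

    def jaccard(q1, q2):
        a, b = set(q1.split()), set(q2.split())
        return len(a & b) / len(a | b) if a | b else 0.0

    def group_into_tasks(queries, related, threshold=0.5):
        """Greedily attach each query to the first task containing a lexically
        similar or semantically related query; otherwise start a new task."""
        tasks = []
        for q in queries:
            for task in tasks:
                if any(jaccard(q, p) >= threshold or related(q, p) for p in task):
                    task.append(q)
                    break
            else:
                tasks.append([q])
        return tasks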


data warehousing and knowledge discovery | 2001

Enhancing the Apriori Algorithm for Frequent Set Counting

Salvatore Orlando; Paolo Palmerini; Raffaele Perego

In this paper we propose DCP, a new algorithm for solving the Frequent Set Counting problem, which enhances Apriori. Our goal was to optimize the initial iterations of Apriori, i.e., the most time-consuming ones when datasets characterized by short or medium-length frequent patterns are considered. The main improvements regard the use of an innovative method for storing candidate sets of items and counting their support, and the exploitation of effective pruning techniques that significantly reduce the size of the dataset as execution progresses.
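
The dataset-pruning idea mentioned above can be sketched in a few lines: after each iteration, drop items that are no longer frequent and transactions that have become too short to contain a candidate of the next length. This is a generic illustration, not DCP's actual data structures.

    def prune_dataset(transactions, frequent_items, k):
        """Keep only frequent items and transactions long enough to support
        a candidate itemset of length k (toy version of dataset pruning)."""
        pruned = []
        for t in transactions:
            kept = {item for item in t if item in frequent_items}
            if len(kept) >= k:
                pruned.append(kept)
        return pruned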


eurographics | 1993

Parallel 3D Delaunay Triangulation

Paolo Cignoni; Claudio Montani; Raffaele Perego; Roberto Scopigno

The paper deals with the parallelization of Delaunay triangulation algorithms, giving more emphasis to practical issues and implementation than to theoretical complexity. Two parallel implementations are presented. The first one is built on De Wall, an E^d triangulator based on an original interpretation of the divide & conquer paradigm. The second is based on an incremental construction algorithm. The parallelization strategies are presented and evaluated. The target parallel machine is a distributed computing environment composed of coarse-grain processing nodes. Results of the first implementations are reported and compared with the performance of the serial versions running on a Unix workstation.


conference on information and knowledge management | 2013

Learning relatedness measures for entity linking

Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Salvatore Trani

Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes errors in disambiguating the linked entities. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We then propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
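
One simple way to cast relatedness learning as a pairwise learning-to-rank problem is to train a linear classifier on feature-vector differences between a more-related and a less-related entity pair, then read a relatedness score off the learned weights. This is a minimal sketch under that assumption, not the paper's actual method or feature set.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_relatedness_ranker(preference_pairs):
        """preference_pairs: list of (better, worse) numeric feature vectors, where
        `better` describes an entity pair judged more related than `worse`."""
        X = np.array([np.asarray(b) - np.asarray(w) for b, w in preference_pairs] +
                     [np.asarray(w) - np.asarray(b) for b, w in preference_pairs])
        y = np.array([1] * len(preference_pairs) + [0] * len(preference_pairs))
        model = LogisticRegression().fit(X, y)
        # Higher score means the features describe a more related entity pair.
        return lambda features: float(model.decision_function(np.asarray(features).reshape(1, -1)))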


Proceedings of the 1992 workshop on Volume visualization | 1992

Parallel volume visualization on a hypercube architecture

Claudio Montani; Raffaele Perego; Roberto Scopigno

A parallel solution to the visualization of high-resolution volume data is presented. Based on the ray tracing (RT) visualization technique, the system works on a distributed-memory MIMD architecture. A hybrid strategy for ray tracing parallelization is applied, using ray dataflow within an image-partition approach. This strategy allows the flexible and effective management of huge datasets on architectures with limited local memory. The dataset is distributed over the nodes using a slice-partitioning technique. The simple data partition chosen implies a straightforward communication pattern among the visualization processes, which improves both software design and efficiency while providing deadlock prevention. The partitioning technique used and the network interconnection topology allow for the efficient implementation of a static load-balancing technique through pre-rendering of a low-resolution image. Details related to the practical issues involved in the parallelization of volumetric RT are discussed, with particular reference to deadlock and termination issues.
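
The static load-balancing step can be pictured as follows: a low-resolution pre-render yields a per-region cost estimate, and contiguous image regions are then grouped so that each node receives roughly the same estimated work. Splitting by image columns is an assumption made only for this sketch.

    def balanced_image_partition(column_costs, num_nodes):
        """Group contiguous image columns into num_nodes partitions of roughly
        equal estimated rendering cost (estimates come from a pre-render)."""
        target = sum(column_costs) / num_nodes
        partitions, current, accumulated = [], [], 0.0
        for column, cost in enumerate(column_costs):
            current.append(column)
            accumulated += cost
            if accumulated >= target and len(partitions) < num_nodes - 1:
                partitions.append(current)
                current, accumulated = [], 0.0
        partitions.append(current)
        return partitions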


international acm sigir conference on research and development in information retrieval | 2004

Assigning identifiers to documents to enhance the clustering property of fulltext indexes

Fabrizio Silvestri; Salvatore Orlando; Raffaele Perego

Web search engines provide a large-scale text document retrieval service by processing huge inverted-file indexes. Inverted-file indexes allow fast query resolution and good memory utilization, since their d-gap representation can be effectively and efficiently compressed by using variable-length encoding methods. This paper proposes and evaluates some algorithms aimed at finding an assignment of document identifiers that minimizes the average value of the d-gaps, thus enhancing the effectiveness of traditional compression methods. We ran several tests over the Google contest collection in order to validate the proposed techniques. The experiments demonstrated the scalability and effectiveness of our algorithms. Using the proposed algorithms, we were able to considerably improve (by up to 20.81%) the compression ratios of several encoding schemes.
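
To make the objective concrete, the sketch below computes the average d-gap of a set of posting lists and reassigns document identifiers so that documents with similar content signatures receive consecutive ids. The signature-sorting heuristic is only an illustrative assumption; the paper evaluates more elaborate clustering-based assignments.

    def average_d_gap(posting_lists):
        """Average gap between consecutive document ids across all posting lists;
        smaller gaps compress better under variable-length encodings."""
        total_gaps, total_postings = 0, 0
        for postings in posting_lists:
            postings = sorted(postings)
            total_gaps += postings[0] + sum(b - a for a, b in zip(postings, postings[1:]))
            total_postings += len(postings)
        return total_gaps / total_postings

    def reassign_ids(doc_signatures):
        """Map old document ids to new ones by sorting documents on a content
        signature, so similar documents end up with consecutive identifiers."""
        order = sorted(range(len(doc_signatures)), key=lambda d: doc_signatures[d])
        return {old_id: new_id for new_id, old_id in enumerate(order)}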
