Rossano Venturini | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Rossano Venturini is active.

Explore More

Publication

Featured researches published by Rossano Venturini.

ACM Journal of Experimental Algorithms | 2009

Compressed text indexes: From theory to practice

Paolo Ferragina; Rodrigo González; Gonzalo Navarro; Rossano Venturini

A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This represents a significant advancement over the (full-)text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, this algorithmic technology has matured up to a point where theoretical research is giving way to practical developments. Nonetheless this requires significant programming skills, a deep engineering effort, and a strong algorithmic background to dig into the research results. To date only isolated implementations and focused comparisons of compressed indexes have been reported, and they missed a common API, which prevented their re-use or deployment within other applications. The goal of this article is to fill this gap. First, we present the existing implementations of compressed indexes from a practitioners point of view. Second, we introduce the Pizza&Chili site, which offers tuned implementations and a standardized API for the most successful compressed full-text self-indexes, together with effective test-beds and scripts for their automatic validation and test. Third, we show the results of our extensive experiments on these codes with the aim of demonstrating the practical relevance of this novel algorithmic technology.

international acm sigir conference on research and development in information retrieval | 2014

Partitioned Elias-Fano indexes

Giuseppe Ottaviano; Rossano Venturini

The Elias-Fano representation of monotone sequences has been recently applied to the compression of inverted indexes, showing excellent query performance thanks to its efficient random access and search operations. While its space occupancy is competitive with some state-of-the-art methods such as gamma-delta-Golomb codes and PForDelta, it fails to exploit the local clustering that inverted lists usually exhibit, namely the presence of long subsequences of close identifiers. In this paper we describe a new representation based on partitioning the list into chunks and encoding both the chunks and their endpoints with Elias-Fano, hence forming a two-level data structure. This partitioning enables the encoding to better adapt to the local statistics of the chunk, thus exploiting clustering and improving compression. We present two partition strategies, respectively with fixed and variable-length chunks. For the latter case we introduce a linear-time optimization algorithm which identifies the minimum-space partition up to an arbitrarily small approximation factor. We show that our partitioned Elias-Fano indexes offer significantly better compression than plain Elias-Fano, while preserving their query time efficiency. Furthermore, compared with other state-of-the-art compressed encodings, our indexes exhibit the best compression ratio/query time trade-off.

conference on information and knowledge management | 2010

VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming

Fabrizio Silvestri; Rossano Venturini

Encoding lists of integers efficiently is important for many applications in different fields. Adjacency lists of large graphs are usually encoded to save space and to improve decoding speed. Inverted indexes of Information Retrieval systems keep the lists of postings compressed in order to exploit the memory hierarchy. Secondary indexes of DBMSs are stored similarly to inverted indexes in IR systems. In this paper we propose Vector of Splits Encoding (VSEncoding), a novel class of encoders that work by optimally partitioning a list of integers into blocks which are efficiently compressed by using simple encoders. In previous works heuristics were applied during the partitioning step. Instead, we find the optimal solution by using a dynamic programming approach. Experiments show that our class of encoders outperform all the existing methods in literature by more than 10% (with the exception of Binary Interpolative Coding with which they, roughly, tie) still retaining a very fast decompression algorithm.

ACM Transactions on Algorithms | 2010

The compressed permuterm index

Paolo Ferragina; Rossano Venturini

The Permuterm index [Garfield 1976] is a time-efficient and elegant solution to the string dictionary problem in which pattern queries may possibly include one wild-card symbol (called Tolerant Retrieval problem). Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size. In this article we propose the Compressed Permuterm Index which solves the Tolerant Retrieval problem in time proportional to the length of the searched pattern, and space close to the kth order empirical entropy of the indexed dictionary. We also design a dynamic version of this index that allows to efficiently manage insertion in, and deletion from, the dictionary of individual strings. The result is based on a simple variant of the Burrows-Wheeler Transform, defined on a dictionary of strings of variable length, that allows to efficiently solve the Tolerant Retrieval problem via known (dynamic) compressed indexes [Navarro and Mäkinen 2007]. We will complement our theoretical study with a significant set of experiments that show that the Compressed Permuterm Index supports fast queries within a space occupancy that is close to the one achievable by compressing the string dictionary via gzip or bzip. This improves known approaches based on Front-Coding [Witten et al. 1999] by more than 50% in absolute space occupancy, still guaranteeing comparable query time.

european conference on information retrieval | 2012

How random walks can help tourism

Claudio Lucchese; Raffaele Perego; Fabrizio Silvestri; Hossein Vahabi; Rossano Venturini

On-line photo sharing services allow users to share their touristic experiences. Tourists can publish photos of interesting locations or monuments visited, and they can also share comments, annotations, and even the GPS traces of their visits. By analyzing such data, it is possible to turn colorful photos into metadata-rich trajectories through the points of interest present in a city. In this paper we propose a novel algorithm for the interactive generation of personalized recommendations of touristic places of interest based on the knowledge mined from photo albums and Wikipedia. The distinguishing features of our approach are multiple. First, the underlying recommendation model is built fully automatically in an unsupervised way and it can be easily extended with heterogeneous sources of information. Moreover, recommendations are personalized according to the places previously visited by the user. Finally, such personalized recommendations can be generated very efficiently even on-line from a mobile device.

international acm sigir conference on research and development in information retrieval | 2015

QuickScorer: A Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees

Claudio Lucchese; Franco Maria Nardini; Salvatore Orlando; Raffaele Perego; Nicola Tonellotto; Rossano Venturini

Learning-to-Rank models based on additive ensembles of regression trees have proven to be very effective for ranking query results returned by Web search engines, a scenario where quality and efficiency requirements are very demanding. Unfortunately, the computational cost of these ranking models is high. Thus, several works already proposed solutions aiming at improving the efficiency of the scoring process by dealing with features and peculiarities of modern CPUs and memory hierarchies. In this paper, we present QuickScorer, a new algorithm that adopts a novel bitvector representation of the tree-based ranking model, and performs an interleaved traversal of the ensemble by means of simple logical bitwise operations. The performance of the proposed algorithm are unprecedented, due to its cache-aware approach, both in terms of data layout and access patterns, and to a control flow that entails very low branch mis-prediction rates. The experiments on real Learning-to-Rank datasets show that QuickScorer is able to achieve speedups over the best state-of-the-art baseline ranging from 2x to 6.5x.

international acm sigir conference on research and development in information retrieval | 2012

Efficient query recommendations in the long tail via center-piece subgraphs

Francesco Bonchi; Raffaele Perego; Fabrizio Silvestri; Hossein Vahabi; Rossano Venturini

We present a recommendation method based on the well-known concept of center-piece subgraph, that allows for the time/space efficient generation of suggestions also for rare, i.e., long-tail queries. Our method is scalable with respect to both the size of datasets from which the model is computed and the heavy workloads that current web search engines have to deal with. Basically, we relate terms contained into queries with highly correlated queries in a query-flow graph. This enables a novel recommendation generation method able to produce recommendations for approximately 99% of the workload of a real-world search engine. The method is based on a graph having term nodes, query nodes, and two kinds of connections: term-query and query-query. The first connects a term to the queries in which it is contained, the second connects two query nodes if the likelihood that a user submits the second query after having issued the first one is sufficiently high. On such large graph we need to compute the center-piece subgraph induced by terms contained into queries. In order to reduce the cost of the above computation, we introduce a novel and efficient method based on an inverted index representation of the model. We experiment our solution on two real-world query logs and we show that its effectiveness is comparable (and in some case better) than state-of-the-art methods for head-queries. More importantly, the quality of the recommendations generated remains very high also for long-tail queries, where other methods fail even to produce any suggestion. Finally, we extensively investigate scalability and efficiency issues and we show the viability of our method in real world search engines.

international acm sigir conference on research and development in information retrieval | 2007

Compressed permuterm index

Paolo Ferragina; Rossano Venturini

Recently [Manning et al., 2007] resorted the Permuterm indexof Garfield (1976) as a time-efficient and elegant solution to the string dictionary problem in which pattern queries may possibly include one wild-card symbol (called, Tolerant Retrieval problem). Unfortunately the Permuterm index is space inefficient because its quadruples the dictionary size. In this paper we propose the Compressed Permuterm Index which solves the Tolerant Retrieval problem in optimal query time, i.e. time proportional to the length of the searched pattern, and space close to the k-th order empirical entropy of the indexed dictionary. Our index can be used to solve also more sophisticated queries which involve several wild-card symbols, or require to prefix-match multiple fields in a database of records.The result is based on an elegant variant of the Burrows-Wheeler Transform defined on a dictionary of strings of variable length, which allows to easily adapt known compressed indexes [Makinen-Navarro, 2007] to solve the Tolerant Retrieval problem. Experiments show that our index supports fast queries within a space occupancy that is close to the one achievable by compressing the string dictionary via gzip, bzip or ppmdi. This improves known approaches based on front-coding by more than 50% in absolute space occupancy, still guaranteeing comparable query time.

web search and data mining | 2015

Optimal Space-time Tradeoffs for Inverted Indexes

Giuseppe Ottaviano; Nicola Tonellotto; Rossano Venturini

Inverted indexes are usually represented by dividing posting lists into constant-sized blocks and representing them with an encoder for sequences of integers. Different encoders yield a different point in the space-time trade-off curve, with the fastest being several times larger than the most space-efficient. An important design decision for an index is thus the choice of the fastest encoding method such that the index fits in the available memory. However, a better usage of the space budget could be obtained by using faster encoders for frequently accessed blocks, and more space-efficient ones those that are rarely accessed. To perform this choice optimally, we introduce a linear time algorithm that, given a query distribution and a set of encoders, selects the best encoder for each index block to obtain the lowest expected query processing time respecting a given space constraint. To demonstrate the effectiveness of this approach we perform an extensive experimental analysis, which shows that our algorithm produces indexes which are significantly faster than single-encoder indexes under several query processing strategies, while respecting the same space constraints.

ACM Transactions on Information Systems | 2016

Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees

Domenico Dato; Claudio Lucchese; Franco Maria Nardini; Salvatore Orlando; Raffaele Perego; Nicola Tonellotto; Rossano Venturini

Learning-to-Rank models based on additive ensembles of regression trees have been proven to be very effective for scoring query results returned by large-scale Web search engines. Unfortunately, the computational cost of scoring thousands of candidate documents by traversing large ensembles of trees is high. Thus, several works have investigated solutions aimed at improving the efficiency of document scoring by exploiting advanced features of modern CPUs and memory hierarchies. In this article, we present QuickScorer, a new algorithm that adopts a novel cache-efficient representation of a given tree ensemble, performs an interleaved traversal by means of fast bitwise operations, and supports ensembles of oblivious trees. An extensive and detailed test assessment is conducted on two standard Learning-to-Rank datasets and on a novel very large dataset we made publicly available for conducting significant efficiency tests. The experiments show unprecedented speedups over the best state-of-the-art baselines ranging from 1.9 × to 6.6 × . The analysis of low-level profiling traces shows that QuickScorer efficiency is due to its cache-aware approach in terms of both data layout and access patterns and to a control flow that entails very low branch mis-prediction rates.

Explore More