Álvaro Barreiro | Researchain

Publication

Featured research published by Álvaro Barreiro.

ACM Transactions on Information Systems | 2010

Probabilistic static pruning of inverted files

Roi Blanco; Álvaro Barreiro

Information retrieval (IR) systems typically compress their indexes in order to increase their efficiency. Static pruning is a form of lossy data compression: it removes from the index the data that are estimated, according to some criterion, to be the least important to retrieval performance. Pruning criteria are generally derived from term weighting functions, which assign weights to terms according to their contribution to a document's contents. Usually, document-term occurrences that are assigned a low weight are removed from the index, on the assumption that those entries contribute little to the document's content. We present a novel pruning technique based on a probabilistic model of IR. We employ the Probability Ranking Principle as a decision criterion for which posting list entries are to be pruned. The proposed approach requires the estimation of three probabilities, which are combined so as to gather all the information needed to apply this criterion. We evaluate the proposed pruning technique on five TREC collections and various retrieval tasks, and show that in almost every situation it outperforms the state of the art in index pruning. The main contribution of this work is a pruning technique that stems directly from the same source as probabilistic retrieval models, and is hence independent of the final model used for retrieval.

Information Processing and Management | 2013

Relevance-based language modelling for recommender systems

Javier Parapar; Alejandro Bellogín; Pablo Castells; Álvaro Barreiro

Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches for explicitly introducing the concept of relevance into the statistical Language Modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the pseudo relevance feedback task. On the other hand, recommender systems are a fertile research area in which users are provided with personalised recommendations in many applications. In this paper, we propose an adaptation of the Relevance Modelling framework to effectively suggest recommendations to a user. We also propose a probabilistic clustering technique for the neighbour selection process, as a way to obtain a better approximation of the set of relevant items in the pseudo relevance feedback process. Although these techniques are well known in the Information Retrieval field, they have not yet been applied to recommender systems, and, as the empirical evaluation shows, each proposal individually outperforms several baseline methods. Furthermore, combining both approaches yields even larger improvements in effectiveness.
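
The abstract does not give the paper's estimators, so the following is only a minimal sketch of the general idea under stated assumptions: the target user's rated items play the role of query terms, neighbouring users play the role of pseudo-relevant documents (with a uniform prior), and item probabilities are smoothed with Jelinek-Mercer. The scoring follows an RM1-style sum over neighbours. The function name, the `ratings` input format and the value of `lam` are hypothetical, and the paper's probabilistic clustering for neighbour selection is not shown.

```python
from math import prod


def rm1_scores(user_items, neighbours, ratings, lam=0.5):
    """Score unseen items with an RM1-style relevance model (illustrative sketch).

    ratings: dict user -> dict item -> rating (hypothetical input format).
    neighbours: users treated as pseudo-relevant "documents" (uniform prior).
    user_items: items rated by the target user, treated as "query terms".
    lam: Jelinek-Mercer weight on the neighbour's own profile (assumed value).
    """
    # Collection model p(i|C): each item's share of the total rating mass.
    total = sum(sum(r.values()) for r in ratings.values())
    coll = {}
    for r in ratings.values():
        for item, value in r.items():
            coll[item] = coll.get(item, 0.0) + value / total

    def p(item, v):
        # Smoothed p(i|v): mix the neighbour's profile with the collection model.
        profile = ratings[v]
        ml = profile.get(item, 0.0) / sum(profile.values())
        return lam * ml + (1 - lam) * coll.get(item, 0.0)

    scores = {}
    for item in coll:
        if item in user_items:
            continue  # only score items the target user has not rated
        # RM1-style: sum over neighbours of p(i|v) * prod_j p(j|v).
        scores[item] = sum(
            p(item, v) * prod(p(j, v) for j in user_items) for v in neighbours
        )
    return scores
```

For example, with `ratings = {"u": {"a": 5, "b": 3}, "v1": {"a": 4, "b": 2, "c": 5}, "v2": {"a": 1, "d": 5}}`, user `"u"` and neighbours `["v1", "v2"]`, item `"c"` scores higher than `"d"`, because it comes from the neighbour whose profile better matches the user's items.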

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1999

Using a belief revision operator for document ranking in extended Boolean models

David E. Losada; Álvaro Barreiro

This paper argues that Belief Revision can serve as a theoretical framework for document ranking in Extended Boolean Models. For a model of Information Retrieval based on propositional logic, we propose a similarity measure that is equivalent to a case of the P-Norm model. It therefore shares the good properties and behaviour of the P-Norm. Moreover, the measure is theoretically guaranteed to follow the notion of proximity between the documents and the query. The logical model deals naturally with incomplete descriptions of documents, and similarity values are also obtained in that case.
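
The abstract states that the proposed measure is equivalent to "a P-Norm case" without saying which one. For reference, here is a small sketch of the standard unweighted extended Boolean P-Norm formulas for OR and AND queries, where each `d_i` is the document's weight in [0, 1] for the i-th query term. It does not implement the belief revision operator itself; the function names are hypothetical.

```python
def pnorm_or(weights, p):
    """Unweighted P-Norm OR: ((sum of d_i^p) / m) ** (1/p)."""
    m = len(weights)
    return (sum(d ** p for d in weights) / m) ** (1.0 / p)


def pnorm_and(weights, p):
    """Unweighted P-Norm AND: 1 - ((sum of (1 - d_i)^p) / m) ** (1/p)."""
    m = len(weights)
    return 1.0 - (sum((1.0 - d) ** p for d in weights) / m) ** (1.0 / p)
```

With `p = 1`, both operators reduce to the average term weight (0.5 for a document with weights `[1.0, 0.0]`). As `p` grows, OR approaches the maximum weight and AND approaches the minimum, recovering strict Boolean behaviour.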

The Computer Journal | 2001

A logical model of information retrieval based on propositional logic and belief revision

David E. Losada; Álvaro Barreiro

This thesis proposes a logical model of the Information Retrieval (IR) problem. Starting from a basic formalism, several classical IR tasks have been formalised, their computational costs studied, and efficient implementations proposed. Throughout, the advantages of using a logical approach have been emphasised. The representational flexibility of logic has made it possible to build a homogeneous framework in which the different elements involved in the IR problem are modelled. First, the basic IR problem has been modelled within a logical formalism. Next, an efficient implementation of the proposed model has been defined. This implementation has allowed the model to be evaluated with standard IR test collections, and these experiments make it possible to assess the performance of the proposed theoretical model quantitatively. The model has then been extended to handle retrieval situations and to model the relevance feedback process, which shows that a formal framework can handle extensions in a homogeneous way. Finally, the notions of term similarity and inverse document frequency have been included in the model. These last extensions have been accompanied by their corresponding evaluation tests. The main contributions of this research are the following. First, the proposed theoretical model has been implemented and evaluated, ensuring its real-world applicability; indeed, very few logical approaches to IR have been implemented and evaluated. The basic model can represent classical vectors with binary weights and, moreover, our relevance measure corresponds to the classical query-document inner product. In this way, we have formalised classical tasks as cases within the model. However, the proposed model is inherently more expressive than classical formalisms.

European Conference on Information Retrieval | 2007

Static pruning of terms in inverted files

Roi Blanco; Álvaro Barreiro

This paper addresses the problem of identifying collection-dependent stop-words in order to reduce the size of inverted files. We present four methods for automatically recognising stop-words, analyse the tradeoff between efficiency and effectiveness, and compare them with a previous pruning approach. The experiments show that in some situations stop-word pruning is competitive with other inverted file reduction techniques.
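
The abstract does not name its four methods. As a rough illustration of term-level pruning, the sketch below uses one simple heuristic, ranking terms by document frequency (equivalently, lowest idf), and then drops the posting lists of the top-ranked terms. This heuristic is an assumption for illustration and is not necessarily one of the paper's methods; the function names and index layout are hypothetical.

```python
def df_stopwords(index, k):
    """Return the k terms with the highest document frequency.

    index: dict term -> list of doc ids (the term's posting list).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(index, key=lambda t: (-len(index[t]), t))
    return ranked[:k]


def prune_terms(index, stopwords):
    """Drop whole posting lists; return the pruned index and the fraction of postings removed."""
    stop = set(stopwords)
    pruned = {t: postings for t, postings in index.items() if t not in stop}
    total = sum(len(p) for p in index.values())
    removed = total - sum(len(p) for p in pruned.values())
    return pruned, removed / total
```

Because high-frequency terms own the longest posting lists, removing only a few of them can discard a large share of the index, which is where the efficiency/effectiveness tradeoff studied in the paper arises.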

European Conference on Information Retrieval | 2006

TSP and cluster-based solutions to the reassignment of document identifiers

Roi Blanco; Álvaro Barreiro

Recent studies have shown that it is possible to reduce the size of Inverted Files (IF) by reassigning the document identifiers of the original collection, as this lowers the distance between the positions of documents related to a single term. Variable-bit encoding schemes can exploit this reduction in the average gap and decrease the total number of bits per document pointer. This paper presents an efficient solution to the reassignment problem, which consists of reducing the dimensionality of the input data with an SVD transformation and treating the problem as a Travelling Salesman Problem (TSP). We also present several efficient solutions based on clustering. Finally, we combine the TSP and clustering strategies for reordering the document identifiers. We present experimental tests and performance results on two TREC text collections, obtaining good compression ratios with low running times, and suggest that scalable solutions for web collections could be built on the techniques presented here.
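
To make the gap argument concrete, here is a small sketch that converts posting lists to d-gaps and totals their size under Elias gamma coding, once with the original identifiers and once after a reassignment. Elias gamma is chosen here only as one example of a variable-bit code; the abstract does not say which scheme was used. The function names and the `remap` argument (old id to new id) are hypothetical.

```python
def gaps(postings):
    """Convert a list of positive doc ids to d-gaps (first gap measured from 0)."""
    prev = 0
    out = []
    for doc in sorted(postings):
        out.append(doc - prev)
        prev = doc
    return out


def gamma_bits(g):
    """Length in bits of the Elias gamma code for a positive integer g."""
    return 2 * (g.bit_length() - 1) + 1


def index_bits(index, remap=None):
    """Total gamma-coded size of all posting lists, optionally after reassigning ids."""
    total = 0
    for postings in index.values():
        ids = [remap[d] for d in postings] if remap is not None else postings
        total += sum(gamma_bits(g) for g in gaps(ids))
    return total
```

For example, with `index = {"x": [1, 5], "y": [2, 6]}` the original identifiers cost 14 bits, while the reassignment `{1: 1, 5: 2, 2: 3, 6: 4}`, which places documents sharing a term next to each other, reduces this to 6 bits.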

European Conference on Information Retrieval | 2005

Document identifier reassignment through dimensionality reduction

Roi Blanco; Álvaro Barreiro

Most modern retrieval systems use compressed Inverted Files (IF) for indexing. Recent work has shown that it is possible to reduce IF sizes by reassigning the document identifiers of the original collection, as this lowers the average distance between documents related to a single term. Variable-bit encoding schemes can exploit the reduction in the average gap and decrease the total number of bits per document pointer. However, the approaches developed so far require large amounts of time or use an uncontrolled amount of memory. This paper presents an efficient solution to the reassignment problem that consists of reducing the dimensionality of the input data with an SVD transformation. We tested this approach with the Greedy-NN TSP algorithm and with a more efficient variant based on dividing the original problem into sub-problems. We present experimental tests and performance results on two TREC collections, obtaining good compression ratios with low running times. We also report experimental results on the tradeoff between dimensionality reduction, compression, and time performance.
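
The Greedy-NN TSP heuristic mentioned above can be sketched as follows: start from one document, repeatedly move to the most similar unvisited document, and use each document's position in the resulting tour as its new identifier. This is a simplified sketch under stated assumptions. Similarity is Jaccard overlap on raw term sets, and the SVD reduction and the sub-problem variant described in the abstract are both omitted. Document ids are assumed to be integers, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def greedy_nn_order(docs, start):
    """Greedy nearest-neighbour TSP tour over documents.

    docs: dict int doc id -> set of terms.
    Returns a mapping old id -> new id (1-based position in the tour).
    """
    unvisited = set(docs) - {start}
    tour = [start]
    while unvisited:
        last = docs[tour[-1]]
        # Most similar unvisited document; ties go to the smallest id.
        nxt = max(unvisited, key=lambda d: (jaccard(last, docs[d]), -d))
        tour.append(nxt)
        unvisited.remove(nxt)
    return {old: new for new, old in enumerate(tour, start=1)}
```

For example, with `docs = {1: {"x", "a"}, 2: {"y", "b"}, 3: {"x", "a", "c"}, 4: {"y", "b", "c"}}` and `start=1`, the tour visits 1, 3, 4, 2, so documents with overlapping vocabulary receive consecutive identifiers. Each step scans all unvisited documents, so this naive version is quadratic in the number of documents, which is the kind of cost the paper's dimensionality reduction and problem splitting aim to control.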

European Conference on Information Retrieval | 2008

Probabilistic document length priors for language models

Roi Blanco; Álvaro Barreiro

This paper addresses the issue of devising a new document prior for the language modeling (LM) approach to Information Retrieval. The prior is based on term statistics, is derived in a probabilistic fashion, and offers a novel way of considering document length. Furthermore, we developed a new way of combining document length priors with the query likelihood estimate, based on the risk of accepting the latter as a score. This prior has been combined with a document retrieval language model that uses Jelinek-Mercer (JM) smoothing, a technique that does not take document length into account. Adding the prior boosts retrieval performance, so that the combination outperforms an LM with a document-length-dependent smoothing component (Dirichlet prior) as well as another state-of-the-art, high-performing scoring function (BM25). The improvements are significant and robust across different collections and query sizes.
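
The abstract does not give the form of the new prior or of the risk-based combination. The sketch below therefore shows only the general setting: a Jelinek-Mercer query likelihood score, to which a simple log document-length prior is added. This linear prior (p(d) proportional to |d|) is a common baseline and is a stand-in assumption, not the paper's prior, and plain addition in log space is not the paper's combination method. Query terms are assumed to occur somewhere in the collection (otherwise the logarithm is undefined). The function names and the value of `lam` are hypothetical.

```python
import math
from collections import Counter


def jm_log_likelihood(query, doc, collection, lam=0.1):
    """log p(q|d) with Jelinek-Mercer smoothing.

    doc, collection: Counter of term frequencies.
    lam: weight of the collection model (assumed value).
    """
    dlen = sum(doc.values())
    clen = sum(collection.values())
    score = 0.0
    for t in query:
        p = (1 - lam) * doc[t] / dlen + lam * collection[t] / clen
        score += math.log(p)
    return score


def score_with_length_prior(query, doc, collection, lam=0.1):
    """Add log p(d) for a prior proportional to |d| (constant normaliser dropped)."""
    return jm_log_likelihood(query, doc, collection, lam) + math.log(sum(doc.values()))
```

For example, with `doc1 = Counter({"ir": 2, "model": 2})`, `doc2 = Counter({"ir": 2, "model": 2, "logic": 4})`, `collection = doc1 + doc2` and query `["ir"]`, JM alone ranks the shorter `doc1` first, while adding the length prior moves the longer `doc2` ahead. This illustrates why JM, which ignores document length, can benefit from a separate length prior.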

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Boosting static pruning of inverted files

Roi Blanco; Álvaro Barreiro

This paper revisits the static term-based pruning technique presented in Carmel et al., SIGIR 2001, for ad-hoc retrieval, addressing several aspects of its algorithmic design that had not previously been considered. Although the original technique is able to retain precision when a considerable part of the inverted file is removed, we show that precision can be improved in some scenarios if some key design features are properly selected.
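
The abstract refers to, but does not restate, the Carmel et al. technique. As it is commonly described, term-based pruning works on each term's posting list separately: it takes z_t, the k-th highest score in the list, and removes postings that score below eps · z_t. The sketch below follows that description. The scoring function A(t, d) that produces the scores is left as an assumption (for example, a tf-idf or BM25 contribution), and the design refinements studied in this paper are not shown. The function name is hypothetical.

```python
def term_based_prune(index, k, eps):
    """Term-based static pruning in the style commonly attributed to Carmel et al.

    index: dict term -> dict doc id -> score A(t, d) (scoring function assumed).
    For each term, z_t is the k-th highest score in its posting list, and
    postings scoring below eps * z_t are removed. Lists with at most k
    entries are kept whole.
    """
    pruned = {}
    for term, postings in index.items():
        if len(postings) <= k:
            pruned[term] = dict(postings)
            continue
        z_t = sorted(postings.values(), reverse=True)[k - 1]
        pruned[term] = {d: s for d, s in postings.items() if s >= eps * z_t}
    return pruned
```

Because the threshold is relative to each term's own top-k scores, every term keeps at least its k best postings. This preserves the top of each term's ranking, which is what allows precision to be retained even when much of the index is removed.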

Formal Methods | 2003

Embedding term similarity and inverse document frequency into a logical model of information retrieval

David E. Losada; Álvaro Barreiro

We propose a novel approach for incorporating term similarity and inverse document frequency into a logical model of information retrieval. The ability of logic to handle expressive representations, along with the use of these classical notions, are promising characteristics for IR systems. The proposed approach has been implemented efficiently, and experiments on test collections are presented.
