Rodrygo L. T. Santos | Researchain

Publication

Featured research published by Rodrygo L. T. Santos.

international world wide web conferences | 2010

Exploiting query reformulations for web search result diversification

Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis

When a Web user's underlying information need is not clearly specified by the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated with an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest to the effectiveness of our framework when compared to state-of-the-art diversification approaches in the literature. Additionally, by simulating an upper-bound query reformulation mechanism from official TREC data, we draw useful insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.
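To make "the extent to which different aspects are satisfied by the ranking as a whole" concrete, here is a sketch of α-nDCG, one of the measures used by the TREC 2009 Web track diversity task. It is an illustration rather than code from the paper. Several details are assumptions: the judgments format (each document id mapped to the set of aspects it is relevant to), the defaults α = 0.5 and cutoff 10, and the function names. The ideal ranking is approximated greedily, because computing it exactly is NP-hard.

```python
import math
from typing import Dict, List, Set

# Assumed format: document id -> set of query aspects it is relevant to.
Judgments = Dict[str, Set[str]]


def alpha_dcg(ranking: List[str], judgments: Judgments,
              alpha: float = 0.5, depth: int = 10) -> float:
    """alpha-DCG: an aspect's gain shrinks by (1 - alpha) each time it is covered again."""
    times_covered: Dict[str, int] = {}
    score = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        gain = 0.0
        for aspect in judgments.get(doc, set()):
            gain += (1 - alpha) ** times_covered.get(aspect, 0)
            times_covered[aspect] = times_covered.get(aspect, 0) + 1
        score += gain / math.log2(rank + 1)
    return score


def greedy_ideal(judgments: Judgments, alpha: float = 0.5, depth: int = 10) -> List[str]:
    """Greedy approximation of the ideal ranking (the exact ideal is NP-hard)."""
    remaining = sorted(judgments)  # sorted so that ties break deterministically
    times_covered: Dict[str, int] = {}
    ideal: List[str] = []
    while remaining and len(ideal) < depth:
        # Pick the document with the largest marginal alpha-discounted gain.
        best = max(remaining, key=lambda d: sum(
            (1 - alpha) ** times_covered.get(a, 0) for a in judgments[d]))
        ideal.append(best)
        remaining.remove(best)
        for a in judgments[best]:
            times_covered[a] = times_covered.get(a, 0) + 1
    return ideal


def alpha_ndcg(ranking: List[str], judgments: Judgments,
               alpha: float = 0.5, depth: int = 10) -> float:
    ideal = alpha_dcg(greedy_ideal(judgments, alpha, depth), judgments, alpha, depth)
    return alpha_dcg(ranking, judgments, alpha, depth) / ideal if ideal > 0 else 0.0
```

A ranking that places two documents covering the same aspect side by side scores lower than one that interleaves a document covering a new aspect. Because the ideal is only approximated, treat the normalised value as an estimate.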

international acm sigir conference on research and development in information retrieval | 2011

Intent-aware search result diversification

Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis

Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve over state-of-the-art diversification approaches.
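One way to read "applying intent-aware retrieval models when estimating the relevance of documents to different aspects" is as a mixture. Each aspect gets a predicted distribution over intents, and the scores of intent-specific retrieval models are combined accordingly. This sketch assumes those intent probabilities have already been predicted; the abstract says they are learned but not how. The names intent_aware_relevance and most_likely_model_relevance, as well as the toy models, are hypothetical. The sketch shows both soft mixing and hard selection of the single most likely model.

```python
from typing import Callable, Dict

# Hypothetical: a retrieval model scores a (document, aspect) pair.
RetrievalModel = Callable[[str, str], float]


def intent_aware_relevance(doc: str, aspect: str,
                           intent_probs: Dict[str, float],
                           models: Dict[str, RetrievalModel]) -> float:
    """Estimate the relevance of doc to aspect as a mixture of intent-specific
    models, weighted by the (normalised) predicted probability of each intent."""
    total = sum(intent_probs.values())
    if total <= 0:
        raise ValueError("intent probabilities must sum to a positive value")
    return sum(p / total * models[intent](doc, aspect)
               for intent, p in intent_probs.items())


def most_likely_model_relevance(doc: str, aspect: str,
                                intent_probs: Dict[str, float],
                                models: Dict[str, RetrievalModel]) -> float:
    """Alternative: trust only the model of the single most likely intent."""
    best_intent = max(intent_probs, key=intent_probs.get)
    return models[best_intent](doc, aspect)
```

Either estimate could then stand in for the per-aspect relevance term of an explicit diversification objective.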

european conference on information retrieval | 2010

Explicit search result diversification through sub-queries

Rodrygo L. T. Santos; Jie Peng; Craig Macdonald; Iadh Ounis

Queries submitted to a retrieval system are often ambiguous. In such a situation, a sensible strategy is to diversify the ranking of results to be retrieved, in the hope that users will find at least one of these results to be relevant to their information need. In this paper, we introduce xQuAD, a novel framework for search result diversification that builds such a diversified ranking by explicitly accounting for the relationship between documents retrieved for the original query and the possible aspects underlying this query, in the form of sub-queries. We evaluate the effectiveness of xQuAD using a standard TREC collection. The results show that our framework markedly outperforms state-of-the-art diversification approaches under a simulated best-case scenario. Moreover, we show that its effectiveness can be further improved by estimating the relative importance of each identified sub-query. Finally, we show that our framework can still outperform the simulated best-case scenario of the state-of-the-art diversification approaches using sub-queries automatically derived from the baseline document ranking itself.
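The xQuAD objective is commonly stated as a greedy selection. At each step it picks the document d that maximises (1 - λ)·P(d|q) + λ·Σ_i P(q_i|q)·P(d|q_i)·Π_{d_j∈S} (1 - P(d_j|q_i)). Here S is the set already selected, and the product is the probability that sub-query q_i is still unsatisfied. This sketch implements that loop under two assumptions: all probabilities are precomputed, and they are normalised to [0, 1]. The function and argument names are illustrative.

```python
from typing import Dict, List


def xquad(candidates: List[str],
          rel: Dict[str, float],                      # P(d|q)
          subquery_weight: Dict[str, float],          # P(q_i|q): importance of each sub-query
          subquery_rel: Dict[str, Dict[str, float]],  # P(d|q_i)
          lam: float = 0.5, k: int = 10) -> List[str]:
    """Greedy xQuAD-style re-ranking of candidates."""
    selected: List[str] = []
    remaining = list(candidates)
    # Probability that each sub-query is still NOT satisfied by the selected set.
    unsatisfied = {qi: 1.0 for qi in subquery_weight}

    def objective(doc: str) -> float:
        diversity = sum(w * subquery_rel.get(qi, {}).get(doc, 0.0) * unsatisfied[qi]
                        for qi, w in subquery_weight.items())
        return (1 - lam) * rel.get(doc, 0.0) + lam * diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=objective)
        selected.append(best)
        remaining.remove(best)
        # Covering a sub-query with `best` lowers the need to cover it again.
        for qi in unsatisfied:
            unsatisfied[qi] *= 1.0 - subquery_rel.get(qi, {}).get(best, 0.0)
    return selected
```

With lam=0.0 the function reduces to sorting by P(d|q). The subquery_weight values play the role of the "relative importance of each identified sub-query" mentioned in the abstract.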

conference on information and knowledge management | 2010

Selectively diversifying web search results

Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis

Search result diversification is a natural approach for tackling ambiguous queries. Nevertheless, not all queries are equally ambiguous, and hence different queries could benefit from different diversification strategies. A more lenient or more aggressive diversification strategy is typically encoded by existing approaches as a trade-off between promoting relevance or diversity in the search results. In this paper, we propose to learn such a trade-off on a per-query basis. In particular, we examine how the need for diversification can be learnt for each query: given a diversification approach and an unseen query, we predict an effective trade-off between relevance and diversity based on similar previously seen queries. Thorough experiments using the TREC ClueWeb09 collection show that our selective approach can significantly outperform a uniform diversification for both classical and state-of-the-art diversification approaches.
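The abstract predicts a per-query trade-off "based on similar previously seen queries" without naming the learner. A nearest-neighbour regressor is one plausible reading, sketched here as an assumption. Each training query is represented by a hypothetical feature vector paired with the λ that worked best for it, and an unseen query receives the mean λ of its k closest neighbours. The predicted value would then serve as the relevance/diversity trade-off of a diversification approach.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical: (feature vector of a previously seen query, best trade-off lambda for it)
TrainingExample = Tuple[Sequence[float], float]


def predict_tradeoff(query_features: Sequence[float],
                     training: List[TrainingExample], k: int = 3) -> float:
    """Predict lambda for an unseen query as the mean lambda of its k nearest
    training queries under Euclidean distance."""
    if not training:
        raise ValueError("need at least one training query")
    nearest = sorted(training, key=lambda ex: math.dist(query_features, ex[0]))[:k]
    return sum(lam for _, lam in nearest) / len(nearest)
```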

Information Retrieval | 2013

The whens and hows of learning to rank for web search

Craig Macdonald; Rodrygo L. T. Santos; Iadh Ounis

Web search engines are increasingly deploying many features, combined using learning to rank techniques. However, various practical questions remain concerning the manner in which learning to rank should be deployed. For instance, a sample of documents with sufficient recall is used, such that re-ranking of the sample by the learned model brings the relevant documents to the top. However, the properties of the document sample, such as when to stop ranking (i.e. its minimum effective size), remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, the appropriate way to calculate the loss function (i.e. the choice of the learning evaluation measure and the rank depth at which this measure should be calculated) is as yet unclear. In this paper, we address all of these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set is dependent on the type of information need of the queries, the document representation used during sampling and the test evaluation measure. As the sample size is varied, the selected features markedly change; for instance, we find that the link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed ERR measure is not as effective as the traditional NDCG as a learning loss function. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.
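The two learning loss candidates compared above differ in how they discount lower ranks. This sketch computes both from graded relevance labels using their standard definitions. NDCG@k uses gain 2^g - 1 and a log2 position discount. ERR@k models a user who stops at rank r with probability (2^g - 1) / 2^g_max. The sketch assumes the grades list covers every judged document for the query, so that the ideal DCG can be computed from it; the function names are illustrative.

```python
import math
from typing import List


def dcg_at_k(grades: List[int], k: int) -> float:
    # Gain 2^g - 1, discounted by log2(rank + 1) with ranks starting at 1.
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))


def ndcg_at_k(grades: List[int], k: int) -> float:
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


def err_at_k(grades: List[int], k: int, max_grade: int) -> float:
    """Expected Reciprocal Rank: sum over ranks r of (1/r) * P(user stops at r)."""
    p_reach = 1.0  # probability that the user reaches the current rank
    score = 0.0
    for rank, g in enumerate(grades[:k], start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade
        score += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return score
```

Under ERR, documents ranked after a highly relevant one contribute little, because the user has probably already stopped. NDCG credits every relevant document independently of what precedes it.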

international acm sigir conference on research and development in information retrieval | 2010

Blog track research at TREC

Craig Macdonald; Rodrygo L. T. Santos; Iadh Ounis; Ian Soboroff

The TREC Blog track aims to explore information seeking behaviour in the blogosphere, by building reusable test collections for blog-related search tasks. Since its advent in TREC 2006, the Blog track has led to much research in this growing field and has encapsulated cross-pollination from natural language processing research. This paper recaps the tasks addressed at the TREC Blog track thus far, covering the period from 2006 to 2009. In particular, we describe the corpora used, the tasks addressed within the track, and the resulting published research.

european conference on information retrieval | 2009

Integrating Proximity to Subjective Sentences for Blog Opinion Retrieval

Rodrygo L. T. Santos; Ben He; Craig Macdonald; Iadh Ounis

Opinion finding is a challenging retrieval task, where it has been shown that it is especially difficult to improve over a strongly performing topic-relevance baseline. In this paper, we propose a novel approach for opinion finding, which takes into account the proximity of query terms to subjective sentences in a document. We adapt two state-of-the-art opinion detection techniques to identify subjective sentences from the retrieved documents. Our first technique uses the OpinionFinder toolkit to classify the subjectiveness of sentences in a document. Our second technique uses an automatically generated dictionary of subjective terms derived from the document collection itself to identify the most subjective sentences in a document. We extend the Divergence From Randomness (DFR) proximity model to integrate the proximity of query terms to the subjective sentences identified by either of the proposed techniques. We evaluate these techniques on five different strong baselines across two different query datasets from the TREC Blog track. We show that we can significantly improve over the baselines and that, in several settings, our proposed techniques can at least match the top performing systems at the TREC Blog track.
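A simplified illustration of the idea follows. Sentences are first marked as subjective using a dictionary of subjective terms, as in the second technique. Query-term occurrences that lie close to a subjective sentence are then rewarded. This is not the DFR proximity model used in the paper: it substitutes a plain 1/(1 + distance) decay within a fixed token window. The lexicon, the window value and the function names are hypothetical.

```python
from typing import List, Set


def subjective_flags(sentences: List[List[str]], lexicon: Set[str],
                     min_hits: int = 1) -> List[bool]:
    """Flag a sentence as subjective if it contains at least min_hits lexicon terms."""
    return [sum(tok in lexicon for tok in sent) >= min_hits for sent in sentences]


def proximity_score(sentences: List[List[str]], flags: List[bool],
                    query_terms: Set[str], window: int = 10) -> float:
    """Sum 1 / (1 + d) over query-term occurrences, where d is the token distance
    to the nearest token inside a subjective sentence (ignored beyond window)."""
    subjective_positions: List[int] = []
    query_positions: List[int] = []
    pos = 0
    for sent, is_subjective in zip(sentences, flags):
        for tok in sent:
            if is_subjective:
                subjective_positions.append(pos)
            if tok in query_terms:
                query_positions.append(pos)
            pos += 1
    if not subjective_positions:
        return 0.0
    score = 0.0
    for qp in query_positions:
        d = min(abs(qp - sp) for sp in subjective_positions)
        if d <= window:
            score += 1.0 / (1 + d)
    return score
```

In practice, a score like this would be combined with the topic-relevance score of the baseline rather than used on its own.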

Information Retrieval | 2012

On the role of novelty for search result diversification

Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis

Re-ranking the search results in order to promote novel ones has traditionally been regarded as an intuitive diversification strategy. In this paper, we challenge this common intuition and thoroughly investigate the actual role of novelty for search result diversification, based upon the framework provided by the diversity task of the TREC 2009 and 2010 Web tracks. Our results show that existing diversification approaches based solely on novelty cannot consistently improve over a standard, non-diversified baseline ranking. Moreover, when deployed as an additional component by the current state-of-the-art diversification approaches, our results show that novelty does not bring significant improvements, while adding considerable efficiency overheads. Finally, through a comprehensive analysis with simulated rankings of varying quality, we demonstrate that, although inherently limited by the performance of the initial ranking, novelty plays a role in breaking ties between similarly diverse results.
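A classic example of novelty-based diversification is Maximal Marginal Relevance (MMR). It greedily trades relevance against the maximum similarity to documents already selected. This is a generic MMR sketch, not the specific approaches evaluated in the paper. The relevance scores, the pairwise similarity function and λ are assumed inputs.

```python
from typing import Callable, Dict, List


def mmr(candidates: List[str], rel: Dict[str, float],
        sim: Callable[[str, str], float],
        lam: float = 0.5, k: int = 10) -> List[str]:
    """Maximal Marginal Relevance: relevance minus redundancy w.r.t. selected docs."""
    selected: List[str] = []
    remaining = list(candidates)

    def marginal_score(doc: str) -> float:
        # Redundancy is the highest similarity to any already-selected document.
        redundancy = max((sim(doc, s) for s in selected), default=0.0)
        return lam * rel[doc] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note that the redundancy term compares documents only with each other. It knows nothing about which query aspects those documents cover.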

document engineering | 2008

Keeping a digital library clean: new solutions to old problems

Alberto H. F. Laender; Marcos André Gonçalves; Ricardo G. Cota; Anderson A. Ferreira; Rodrygo L. T. Santos; Allan J. C. Silva

Digital libraries (DLs) are complex information systems that involve rich sets of digital objects and their respective metadata, along with multiple organizational structures and services (e.g., searching, browsing, and personalization), and are normally built for a target community of users with specific interests. Central to the success of this type of system is the quality of its services and content. In the context of DLs of scientific literature, among the many problems faced in sustaining their information quality, two specific ones related to information consistency have received a lot of attention from the research community: name disambiguation and missing information for accessing the full text of cataloged documents. In this paper, we examine these two problems and describe the solutions we have proposed to solve them.

Information Retrieval | 2013

Learning to rank query suggestions for adhoc and diversity search

Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis

Query suggestions have become pervasive in modern web search, as a mechanism to guide users towards a better representation of their information need. In this article, we propose a ranking approach for producing effective query suggestions. In particular, we devise a structured representation of candidate suggestions mined from a query log that leverages evidence from other queries with a common session or a common click. This enriched representation not only helps overcome data sparsity for long-tail queries, but also leads to multiple ranking criteria, which we integrate as features for learning to rank query suggestions. To validate our approach, we build upon existing efforts for web search evaluation and propose a novel framework for the quantitative assessment of query suggestion effectiveness. Thorough experiments using publicly available data from the TREC Web track show that our approach provides effective suggestions for adhoc and diversity search.
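The structured representation described above starts from candidates that co-occur with the input query in a session or share a clicked result. This sketch mines those two kinds of evidence from a toy query log and turns them into simple count features that a learning to rank model could consume. The log format (session id, query, optional clicked URL) and the feature names are assumptions. The paper's actual representation and features are richer than these two counts.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical log record: (session_id, query, clicked_url or None)
LogEntry = Tuple[str, str, Optional[str]]


def candidate_features(log: List[LogEntry], query: str) -> Dict[str, Dict[str, int]]:
    """Collect suggestion candidates for `query` with co-session and co-click counts."""
    sessions: Dict[str, Set[str]] = defaultdict(set)  # session id -> queries issued
    clicks: Dict[str, Set[str]] = defaultdict(set)    # clicked URL -> queries leading to it
    for session_id, q, url in log:
        sessions[session_id].add(q)
        if url is not None:
            clicks[url].add(q)

    feats: Dict[str, Dict[str, int]] = defaultdict(lambda: {"co_session": 0, "co_click": 0})
    for queries in sessions.values():
        if query in queries:
            for other in queries - {query}:
                feats[other]["co_session"] += 1
    for queries in clicks.values():
        if query in queries:
            for other in queries - {query}:
                feats[other]["co_click"] += 1
    return dict(feats)
```

Candidates reached only through a shared click, such as a query that never appeared in the same session, still receive evidence. Evidence of this kind is what helps with sparse long-tail queries.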
