Ronald T. Fernández
University of Santiago de Compostela
Publications
Featured research published by Ronald T. Fernández.
Information Retrieval | 2011
Ronald T. Fernández; David E. Losada; Leif Azzopardi
Employing effective methods of sentence retrieval is essential for many tasks in Information Retrieval, such as summarization, novelty detection and question answering. The best performing sentence retrieval techniques attempt to perform matching directly between the sentences and the query. However, in this paper, we posit that the local context of a sentence can provide crucial additional evidence to further improve sentence retrieval. Using a language modeling framework, we propose a novel reformulation of the sentence retrieval problem that extends previous approaches so that the local context is seamlessly incorporated within the retrieval models. In a series of comprehensive experiments, we show that localized smoothing and the prior importance of a sentence can improve retrieval effectiveness. The proposed models significantly and substantially outperform the state of the art and other competitive sentence retrieval baselines on recall-oriented measures, while remaining competitive on precision-oriented measures. This research demonstrates that local context plays an important role in estimating the relevance of a sentence, and that existing sentence retrieval language models can be extended to utilize this evidence effectively.
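One common way to let a sentence borrow evidence from its local context is to interpolate three unigram models: the sentence, its enclosing document, and the whole collection. The sketch below assumes that Jelinek-Mercer-style three-way mixture; it is not the paper's exact estimator, and the names `unigram_model`, `localized_score` and the weights `lam_s`, `lam_d` are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def localized_score(query, sentence, document, collection, lam_s=0.6, lam_d=0.3):
    """Log query likelihood of `sentence` under a three-way interpolation.

    P(w) = lam_s * P(w|sentence) + lam_d * P(w|document)
           + (1 - lam_s - lam_d) * P(w|collection)

    The document component plays the role of the sentence's local context.
    The weights are hypothetical defaults, not values from the paper.
    """
    p_s = unigram_model(sentence)
    p_d = unigram_model(document)
    p_c = unigram_model(collection)
    lam_c = 1.0 - lam_s - lam_d
    score = 0.0
    for w in query:
        p = lam_s * p_s.get(w, 0.0) + lam_d * p_d.get(w, 0.0) + lam_c * p_c.get(w, 0.0)
        if p == 0.0:
            return float("-inf")  # term unseen even in the collection
        score += math.log(p)
    return score


doc_a = "oil prices rose sharply . markets reacted quickly".split()
doc_b = "the match ended with a late goal".split()
collection = doc_a + doc_b
query = ["oil"]
# Neither sentence contains "oil", but only the first comes from a document that does.
s_a = localized_score(query, "markets reacted quickly".split(), doc_a, collection)
s_b = localized_score(query, "a late goal".split(), doc_b, collection)
```

Here `s_a` is higher than `s_b`: the document component rewards the sentence whose surroundings mention the query term, which a sentence-only model could not distinguish.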
String Processing and Information Retrieval | 2007
David E. Losada; Ronald T. Fernández
In this paper we propose a novel sentence retrieval method based on extracting highly frequent terms from top retrieved documents. We compare it against state-of-the-art sentence retrieval techniques, including those based on pseudo-relevance feedback, showing that the approach is robust and competitive. Our results reinforce the idea that top retrieved data is a valuable source for enhancing retrieval systems. This is especially true for short queries, because there are usually few terms matching between the query and the sentence. Moreover, the approach is particularly promising for weak queries. We demonstrate that this novel method is able to significantly improve precision at top ranks when handling poorly specified information needs.
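The idea of mining highly frequent terms from the top retrieved documents can be illustrated as a small expansion step. This is a minimal sketch under stated assumptions: a tiny hypothetical stopword list, raw term counts over the top documents, and a hypothetical weight `beta` for expansion-term matches; the paper's actual term selection and scoring may differ.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a full list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was"}


def frequent_terms(top_docs, k):
    """Return the k most frequent non-stopword terms in the top retrieved documents."""
    counts = Counter(t for doc in top_docs for t in doc if t not in STOPWORDS)
    return [t for t, _ in counts.most_common(k)]


def rank_sentences(query, sentences, top_docs, k=3, beta=0.5):
    """Rank sentences by query-term matches plus down-weighted expansion-term matches."""
    q = set(query)
    expansion = set(frequent_terms(top_docs, k)) - q

    def score(sent):
        terms = set(sent)
        return len(terms & q) + beta * len(terms & expansion)

    return sorted(sentences, key=score, reverse=True)


top_docs = [
    ["hurricane", "storm", "winds", "coast"],
    ["storm", "winds", "damage"],
    ["hurricane", "storm", "flood"],
]
sentences = [["city", "council", "met", "today"], ["storm", "winds", "damaged", "coast"]]
ranked = rank_sentences(["hurricane"], sentences, top_docs)
```

With a one-term query ("hurricane") neither sentence matches directly, yet the second sentence ranks first because it contains "storm" and "winds", the dominant terms in the top documents. This mirrors the abstract's point that short or weak queries benefit most.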
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011
David Elsweiler; David E. Losada; José Carlos Toucedo; Ronald T. Fernández
In this paper we perform a lab-based user study (n=21) of email re-finding behaviour, examining how the characteristics of submitted queries change in different situations. A number of logistic regression models are developed on the query data to explore the relationship between user and contextual variables and query characteristics, including length, the field submitted to, and the use of named entities. We reveal several interesting trends and use the findings to seed a simulated evaluation of various retrieval models. Not only is this an enhancement of existing evaluation methods for Personal Search, but the results show that different models are more effective in different situations, which has implications both for the design of email search tools and for the way algorithms for Personal Search are evaluated.
Conference on Information and Knowledge Management | 2009
Ronald T. Fernández; David E. Losada
Opinion mining has recently become a major research topic. A wide range of techniques has been proposed to enable opinion-oriented information seeking systems. However, little is known about the ability of opinion-related information to improve regular retrieval tasks. Our hypothesis is that standard retrieval methods might benefit from the inclusion of opinion-based features. A sentence retrieval scenario is a natural choice for evaluating this claim. We propose here a formal method for incorporating some opinion-based features of the sentences as query-independent evidence. We show that this incorporation leads to retrieval methods whose performance is significantly better than that of state-of-the-art sentence retrieval models.
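For reference, the standard way to add query-independent evidence to a language-modeling retrieval function is Bayes' rule with a prior over retrieval units. The Dirichlet-smoothed query likelihood below is one common choice, and reading P(s) as an opinion-based prior is an illustrative assumption rather than the paper's exact estimator:

$$
\log P(s \mid q) \overset{\text{rank}}{=} \log P(q \mid s) + \log P(s),
\qquad
P(q \mid s) = \prod_{w \in q} \frac{c(w, s) + \mu \, P(w \mid C)}{|s| + \mu}
$$

Here c(w, s) is the count of term w in sentence s, |s| is the sentence length, μ is the Dirichlet smoothing parameter, P(w | C) is the collection model, and P(s) would grow with the sentence's opinion evidence (for instance, the number of opinion-bearing words it contains). Because P(s) does not depend on q, it can be computed once per sentence at indexing time.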
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007
Ronald T. Fernández; David E. Losada
The aim of this work is to determine the utility of Local Context Analysis (LCA) [5] for retrieving relevant and novel sentences. LCA has been successful in different areas, and here we check whether it is also useful for driving the selection of novel material. We adopt the Novelty task as defined in the TREC conference [2, 4, 3]. Given a set of documents associated with a topic, the task consists of finding the relevant and novel sentences. This problem is of interest to many areas, such as text summarization, web information access and question answering. Some researchers have proposed that the estimation of novelty for a given sentence should be based on the set of previously seen sentences that share common meanings with it [6]. In this way, the degree of redundancy of a sentence s_i is not influenced by past sentences that are totally unrelated to s_i. The intuition is that novelty estimation might be more robust if focused on this related material. In our work we pursue a similar idea, applying LCA to focus the estimation of novelty on query-related terms.
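The idea of focusing novelty estimation on query-related terms can be sketched in two steps: select terms that co-occur with the query in top-ranked sentences, then measure redundancy only over that vocabulary. This is a simplified, LCA-inspired stand-in (it does not reproduce LCA's actual concept-weighting formula), and the redundancy measure (maximum cosine similarity to any earlier sentence) is an assumption. All function names are hypothetical.

```python
import math
from collections import Counter


def focus_vocabulary(query, top_sentences, k):
    """Keep the query terms plus the k terms that co-occur most often
    with query terms in the top-ranked sentences (simplified, LCA-inspired)."""
    q = set(query)
    co = Counter()
    for sent in top_sentences:
        hits = len(set(sent) & q)
        if hits:
            for t in dict.fromkeys(sent):  # unique terms, in order of appearance
                if t not in q:
                    co[t] += hits
    return q | {t for t, _ in co.most_common(k)}


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def novelty_scores(ranked, vocab):
    """Novelty = 1 - max similarity to earlier sentences, using only `vocab` terms."""
    seen, scores = [], []
    for sent in ranked:
        vec = Counter(t for t in sent if t in vocab)
        redundancy = max((cosine(vec, old) for old in seen), default=0.0)
        scores.append(1.0 - redundancy)
        seen.append(vec)
    return scores


ranked = [
    ["quake", "killed", "hundreds", "city"],
    ["quake", "killed", "hundreds", "reporters", "said"],
    ["quake", "triggered", "tsunami"],
]
vocab = focus_vocabulary(["quake"], ranked, k=3)
scores = novelty_scores(ranked, vocab)
```

The second sentence comes out as largely redundant because, once off-topic words such as "reporters" and "said" are pruned, it repeats the first sentence's query-related content; the third sentence scores as more novel.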
Information Processing and Management | 2012
Ronald T. Fernández; David E. Losada
In this paper we propose an effective sentence retrieval method that consists of incorporating query-independent features into standard sentence retrieval models. To meet this aim, we apply a formal methodology and consider different query-independent features. In particular, we show that opinion-based features are promising. Opinion mining is an increasingly important research topic, but little is known about how to improve retrieval algorithms with opinion-based components. In this respect, we consider different kinds of opinion-based features acting as query-independent evidence and study whether their incorporation improves retrieval performance. Furthermore, information needs are usually related to people, locations or organizations. We hypothesize that using these named entities as query-independent features may also improve the estimation of sentence relevance. Finally, the length of the retrieval unit has been shown to be an important component in different retrieval scenarios, so we also include length-based features in our study. Our evaluation demonstrates that, whether in isolation or in combination, these query-independent features substantially improve the performance of state-of-the-art sentence retrieval methods.
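Combining several query-independent features (opinion words, named entities, length) with a query-dependent score can be sketched as a log-linear prior added to a query log-likelihood. This is a minimal sketch assuming precomputed feature counts, a `log1p` transformation and hand-set weights; the feature names, `WEIGHTS`, `log_prior` and `rerank` are hypothetical, not the paper's formulation.

```python
import math

# Hypothetical feature weights; in practice these would be tuned on training topics.
WEIGHTS = {"opinion_words": 0.5, "named_entities": 0.3, "length": 0.2}


def log_prior(features, weights):
    """Unnormalised log P(s): a weighted sum of log(1 + feature count).

    log1p keeps sentences with a zero count from receiving -inf; the shared
    normalising constant is dropped because it does not affect the ranking.
    """
    return sum(w * math.log1p(features[name]) for name, w in weights.items())


def rerank(candidates, weights=WEIGHTS):
    """candidates: (sentence_id, query_log_likelihood, features) triples."""
    scored = [(sid, qll + log_prior(feats, weights)) for sid, qll, feats in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    ("s1", -10.0, {"opinion_words": 0, "named_entities": 1, "length": 12}),
    ("s2", -10.0, {"opinion_words": 3, "named_entities": 1, "length": 12}),
]
ranking = rerank(candidates)
```

With equal query likelihoods, `s2` ranks first because of its opinion words. Setting a weight to zero removes that feature, which makes it easy to test features in isolation or in combination.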
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010
Ronald T. Fernández; Javier Parapar; David E. Losada; Álvaro Barreiro
Novelty detection is a difficult task, particularly at the sentence level. Most approaches proposed in the past consist of re-ordering all sentences according to their novelty scores. However, this re-ordering usually has little value. In fact, a naive baseline with no novelty detection capabilities often yields better performance than any state-of-the-art novelty detection mechanism. We argue that this is because current methods initiate the novelty detection process too early. When few sentences have been seen, it is unlikely that the user is negatively affected by redundancy. Therefore, re-ordering the first sentences may harm performance. We propose a query-dependent method based on cluster analysis to determine where redundancy filtering should start.
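The core idea of delaying novelty detection can be sketched as: keep the first `start` sentences in their retrieval order, then greedily re-order the remainder by novelty with respect to everything already shown. In the paper `start` is chosen per query by cluster analysis; here it is simply a parameter (the query-independent variant), and Jaccard overlap is a stand-in novelty measure. The function names are hypothetical.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def novelty(sentence, shown):
    """1 - maximum overlap with any sentence the user has already seen."""
    return 1.0 - max((jaccard(sentence, s) for s in shown), default=0.0)


def delayed_novelty_rerank(ranked, start):
    """Keep ranked[:start] untouched; greedily pick the most novel sentence afterwards."""
    result = list(ranked[:start])
    remaining = list(ranked[start:])
    while remaining:
        best = max(remaining, key=lambda s: novelty(s, result))
        result.append(best)
        remaining.remove(best)
    return result


ranked = [["a", "b"], ["a", "b", "c"], ["a", "b"], ["x", "y"]]
late = delayed_novelty_rerank(ranked, start=2)
early = delayed_novelty_rerank(ranked, start=0)
```

With `start=2` the top two sentences keep their retrieval positions; with `start=0` the novel sentence `["x", "y"]` jumps to second place and displaces a relevant top-ranked one, which is the kind of early re-ordering the abstract argues can be harmful.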
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011
Ronald T. Fernández
In this thesis we thoroughly study sentence retrieval and novelty detection. We analyze the strengths and weaknesses of current state-of-the-art methods and subsequently propose new mechanisms for sentence retrieval and novelty detection. Retrieval and novelty detection are related tasks: usually, we first apply a retrieval model that estimates the relevance of passages (e.g. sentences) and produces a ranking of passages sorted by relevance. This ranking is then used as the input of a novelty detection module, which tries to filter out redundant passages. Estimating relevance at the sentence level is difficult. Standard methods estimate relevance simply by matching query and sentence terms. However, queries usually contain two or three terms and sentences are also short, so the term overlap between queries and sentences is poor. To address this problem, we study how to enrich this process with additional information: the context. The context refers to the information provided by the surrounding sentences or by the document where the sentence is located. Such context reduces ambiguity and supplies additional information not included in the sentence itself. Additionally, it is important to estimate how important or central a sentence is within the document. These two components, the context and the centrality of sentences, are studied following a formal framework based on Statistical Language Models. We demonstrate that these components yield improvements over current sentence retrieval methods. In this thesis we work with collections of sentences extracted from news articles. News articles not only report facts but also express opinions that people have about a particular event or topic. Therefore, properly estimating which passages are opinionated may help to further improve the estimation of relevance for sentences.
We apply a formal methodology that helps us incorporate opinions into standard sentence retrieval methods. Additionally, we propose simple empirical alternatives for incorporating query-independent features into sentence retrieval models. We demonstrate that incorporating opinions into the estimation of relevance is an important factor that makes sentence retrieval methods more effective. In the course of our study, we also analyze query-independent features based on sentence length and named entities. Combining the context-based approach with opinion-based features is straightforward. We study how to combine these two approaches and the impact of such a combination. We demonstrate that context-based models implicitly promote sentences with opinions and, therefore, opinion-based features do not further improve context-based methods. The second part of this thesis is dedicated to novelty detection at the sentence level. Because novelty actually depends on a retrieval ranking, we consider two approaches: a) the perfect-relevance approach, which uses a ranking in which all sentences are relevant (an ideal setting); and b) the non-perfect-relevance approach, which first applies a sentence retrieval method (so the ranking may contain non-relevant sentences). We first study which baseline performs best and then propose a number of variations. One of the proposed mechanisms is based on vocabulary pruning. We demonstrate that considering terms from the top-ranked sentences in the original ranking helps guide the estimation of novelty. Applying Language Models to support novelty detection is another challenge addressed in this thesis. We apply different smoothing methods (Dirichlet and Jelinek-Mercer) within alternative mechanisms for detecting novelty (Aggregate and Non-Aggregate Models).
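The Aggregate and Non-Aggregate Models can be sketched with KL divergence between smoothed sentence models, a formulation common in sentence-level novelty work. This is an assumption, as the thesis's exact estimators are not given here. The Aggregate variant compares a sentence with one model built from all previously seen sentences; the Non-Aggregate variant compares it with each previous sentence and keeps the smallest divergence. `jm_model`, `novelty_aggregate` and `novelty_non_aggregate` are hypothetical names; Jelinek-Mercer smoothing is used, and a Dirichlet estimate could be swapped into `jm_model`.

```python
import math
from collections import Counter


def collection_model(sentences):
    """Maximum-likelihood background model P(w|C) over all sentences."""
    counts = Counter(t for s in sentences for t in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jm_model(tokens, coll, lam=0.5):
    """Jelinek-Mercer smoothing: (1 - lam) * P_ml(w|tokens) + lam * P(w|C)."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: (1 - lam) * (counts[w] / n if n else 0.0) + lam * coll[w] for w in coll}


def kl(p, q):
    """KL(p || q); q is strictly positive over the collection vocabulary when lam > 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def novelty_aggregate(sent, previous, coll, lam=0.5):
    """Divergence from a single model of all previously seen sentences."""
    if not previous:
        return math.inf  # nothing seen yet: treat as maximally novel
    seen = [t for s in previous for t in s]
    return kl(jm_model(sent, coll, lam), jm_model(seen, coll, lam))


def novelty_non_aggregate(sent, previous, coll, lam=0.5):
    """Divergence from the closest individual previous sentence."""
    if not previous:
        return math.inf
    p = jm_model(sent, coll, lam)
    return min(kl(p, jm_model(s, coll, lam)) for s in previous)


sentences = [["quake", "hit", "city"], ["quake", "hit", "city"], ["tsunami", "warning", "issued"]]
coll = collection_model(sentences)
```

A higher divergence means a more novel sentence: the repeated second sentence gets a score of zero under either variant, while the third sentence diverges clearly from what came before. The two variants differ once earlier sentences cover different subtopics, because the aggregate model blurs them together.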
Additionally, we test a mechanism based on mixture models that uses the Expectation-Maximization algorithm to automatically obtain the novelty score of a sentence. In the last part of this work we demonstrate that most novelty methods lead to a strong re-ordering of the initial ranking. However, we show that the top-ranked sentences in the initial list are usually novel and that re-ordering them is often harmful. Therefore, we propose different mechanisms that determine the position threshold at which novelty detection should be initiated. In this respect, we consider query-independent approaches (a fixed position for all queries) and query-dependent approaches (cluster-based and normalized-score approaches). In summary, we identify important limitations of current sentence retrieval and novelty detection methods and, throughout this thesis, propose alternative methods that are novel and effective. The thesis is available for download at http://www.gsi.dec.usc.es/ir/.
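A mixture-model novelty score can be sketched with EM for a single mixing weight: each token of the sentence is assumed to come either from a model of already-seen text or from a background model, and EM estimates how much of the sentence the seen-text model explains. This two-component setup with both component models held fixed is a simplifying assumption; the thesis's mixture may use different components. `unigram` and `em_redundancy` are hypothetical names.

```python
from collections import Counter


def unigram(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def em_redundancy(sentence, p_old, p_bg, iters=50, lam=0.5):
    """Estimate lam in P(w) = lam * P_old(w) + (1 - lam) * P_bg(w) by EM.

    lam is the share of the sentence explained by already-seen text,
    so 1 - lam can serve as a novelty score.
    """
    if not sentence:
        return 0.0
    for _ in range(iters):
        # E-step: posterior probability that each token came from the seen-text model
        resp = []
        for w in sentence:
            a = lam * p_old.get(w, 0.0)
            b = (1.0 - lam) * p_bg.get(w, 0.0)
            resp.append(a / (a + b) if a + b > 0 else 0.0)
        # M-step: the new mixing weight is the average responsibility
        lam = sum(resp) / len(resp)
    return lam


seen = ["quake", "hit", "city"]
new_sent = ["tsunami", "warning", "issued"]
p_old = unigram(seen)
p_bg = unigram(seen + new_sent)
```

A sentence that repeats the seen text drives the weight towards 1 (low novelty), while a sentence made only of unseen terms drives it to 0 (novelty 1).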
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010
Leif Azzopardi; Ronald T. Fernández; David E. Losada
The retrieval of sentences is a core task within Information Retrieval. In this poster we employ a Language Model that incorporates a prior encoding the importance of sentences within the retrieval model. Then, in a set of comprehensive experiments using the TREC Novelty Tracks, we show that including this prior substantially improves retrieval effectiveness and significantly outperforms the current state of the art in sentence retrieval.
Information Interaction in Context | 2008
Ronald T. Fernández; David E. Losada
Current Information Retrieval systems are often based on topicality: they estimate relevance by measuring the similarity between the user query and each document. These systems do not take into account important contextual information. More specifically, they often do not apply mechanisms to filter out redundant information. We interpret context here as the set of text chunks from the ranked documents that the user has already seen. This is valuable contextual information for guiding the retrieval process in a way that avoids redundancy. It is desirable that the ranking of results be composed of material that is not only relevant but also novel. This means that each document must provide the user with unseen information related to their need. In this work we study different novelty detection approaches that make good use of this contextual information. We show that these techniques can be applied effectively and efficiently at the sentence level.