David E. Losada
University of Santiago de Compostela
Publication
Featured research published by David E. Losada.
Information Retrieval | 2008
David E. Losada; Leif Azzopardi
Document length is widely recognized as an important factor for adjusting retrieval systems. Many models tend to favor the retrieval of either short or long documents and, thus, a length-based correction needs to be applied to avoid any length bias. In language modeling for information retrieval, smoothing methods are applied to move probability mass from document terms to unseen words, in a way that is often dependent on document length. In this article, we perform an in-depth study of this behavior, characterized by document length retrieval trends, for three popular smoothing methods across a number of factors, and of its impact on the length of documents retrieved and on retrieval performance. First, we theoretically analyze the Jelinek–Mercer, Dirichlet prior and two-stage smoothing strategies and, then, conduct an empirical analysis. In our analysis we show how Dirichlet prior smoothing caters for document length more appropriately than Jelinek–Mercer smoothing, which leads to its superior retrieval performance. In a follow-up analysis, we posit that length-based priors can be used to offset any bias in the length retrieval trends stemming from the retrieval formula derived from the smoothing technique. We show that the performance of Jelinek–Mercer smoothing can be significantly improved by using such a prior, which provides a natural and simple alternative for decoupling the query and document modeling roles of smoothing. With the analysis of retrieval behavior conducted in this article, it is possible to understand why Dirichlet prior smoothing performs better than Jelinek–Mercer, and why the performance of the Jelinek–Mercer method is improved by including a length-based prior.
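The two smoothing strategies contrasted in the abstract above can be sketched in a few lines. The function below is an illustrative toy implementation, not the paper's experimental setup (parameter defaults `mu=2000` and `lam=0.5` are conventional values, not the paper's settings); it shows why the Dirichlet prior's effective amount of smoothing shrinks as documents grow, while Jelinek–Mercer interpolates with a fixed weight regardless of length.

```python
import math
from collections import Counter

def lm_score(query, doc, collection, mu=2000.0, lam=0.5, method="dirichlet"):
    """Query log-likelihood under a smoothed unigram language model (a sketch)."""
    doc_tf, doc_len = Counter(doc), len(doc)
    coll_tf, coll_len = Counter(collection), len(collection)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len  # collection (background) probability
        if p_coll == 0:
            continue  # skip terms never seen in the collection
        if method == "dirichlet":
            # Dirichlet prior: the pseudo-count mu is fixed, so its relative
            # influence shrinks as doc_len grows -- a length-dependent correction
            p = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        else:
            # Jelinek-Mercer: fixed interpolation weight, independent of length
            p = lam * (doc_tf[term] / doc_len) + (1 - lam) * p_coll
        score += math.log(p)
    return score
```

Ranking a small toy collection with either method favors the document that actually contains the query term, but only the Dirichlet variant adapts its smoothing strength to document length.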
Information Sciences | 2014
Jose M. Chenlo; David E. Losada
While a number of isolated studies have analysed how different sentence features are beneficial in Sentiment Analysis, a complete picture of their effectiveness is still lacking. In this paper we extend and combine the body of empirical evidence regarding sentence subjectivity classification and sentence polarity classification, and provide a comprehensive analysis of the relative importance of each set of features using data from multiple benchmarks. To the best of our knowledge, this is the first study that evaluates a highly diversified set of sentence features for the two main sentiment classification tasks.
IEEE Transactions on Fuzzy Systems | 2005
Félix Díaz-Hermida; David E. Losada; Alberto Bugarín; Senén Barro
In this paper, we propose a new quantifier fuzzification mechanism which is deeply rooted in the theory of probability. This quantifier fuzzification mechanism skips the nested assumption, which is inherent to other probabilistic quantification methods. The new quantification approach complies with the properties required for determiner fuzzification schemes (DFS) with finite sets and, hence, its good behavior is assured. Moreover, this new approach is suitable for some application domains. In particular, the use of fuzzy quantifiers for implementing quantified query statements in information retrieval exemplifies the adequacy of the new proposal. The new quantifier fuzzification mechanism has been efficiently implemented and empirically tested for a retrieval task. This practical evaluation followed the standard methodology in the field of information retrieval and was conducted against a popular benchmark consisting of a large collection of documents. The retrieval performance evaluation made evident that: 1) the new method can work in realistic scenarios, and 2) it can overcome recent proposals for applying fuzzy quantifiers in information retrieval.
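To make the idea of a quantified query statement concrete, the sketch below evaluates "most of the query terms match the document" using Zadeh's classic relative-quantifier scheme. This is only an illustration of the general mechanism; it is not the probabilistic quantifier fuzzification method proposed in the paper, and the membership breakpoints (0.3, 0.8) are arbitrary choices.

```python
def most(x):
    """Piecewise-linear membership function for the relative quantifier
    'most' (breakpoints 0.3 and 0.8 are illustrative assumptions)."""
    if x <= 0.3:
        return 0.0
    if x >= 0.8:
        return 1.0
    return (x - 0.3) / 0.5

def quantified_sentence(match_degrees, quantifier):
    """Degree to which 'Q of the query terms match the document' holds,
    given per-term fuzzy match degrees in [0, 1] (Zadeh-style evaluation)."""
    proportion = sum(match_degrees) / len(match_degrees)
    return quantifier(proportion)
```

A document matching nearly all query terms satisfies "most" fully, while a document matching almost none does not satisfy it at all; intermediate proportions yield graded degrees, which is what makes such quantifiers useful for ranking.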
ACM Transactions on Information Systems | 2008
David E. Losada; Leif Azzopardi
Although the seminal proposal to introduce language modeling in information retrieval was based on a multivariate Bernoulli model, the predominant modeling approach is now centered on multinomial models. Language modeling for retrieval based on multivariate Bernoulli distributions is seen as inefficient and believed to be less effective than the multinomial model. In this article, we examine the multivariate Bernoulli model with respect to its successor and assess its role in future retrieval systems. In the context of Bayesian learning, these two modeling approaches are described, contrasted, and compared both theoretically and computationally. We show that the query likelihood following a multivariate Bernoulli distribution introduces interesting retrieval features which may be useful for specific retrieval tasks such as sentence retrieval. Then, we address the efficiency aspect and show that algorithms can be designed to perform retrieval efficiently for multivariate Bernoulli models, before performing an empirical comparison to study the behavioral aspects of the models. A series of comparisons is then conducted on a number of test collections and retrieval tasks to determine the empirical and practical differences between the different models. Our results indicate that for sentence retrieval the multivariate Bernoulli model can significantly outperform the multinomial model. However, for the other tasks the multinomial model provides consistently better performance (and in most cases significantly so). An analysis of the various retrieval characteristics reveals that the multivariate Bernoulli model tends to promote long documents whose nonquery terms are informative. While this is detrimental to the task of document retrieval (documents tend to contain considerable nonquery content), it is valuable for other tasks such as sentence retrieval, where the retrieved elements are very short and focused.
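The key structural difference noted above is that under a multivariate Bernoulli model every vocabulary term contributes to the query likelihood, present or absent, so non-query content influences the score. The sketch below illustrates this with a simple smoothed presence estimate based on document frequencies; the exact estimator and smoothing used in the paper may differ.

```python
import math

def bernoulli_ql(query, doc, doc_freq, n_docs, alpha=1.0):
    """Multivariate Bernoulli query log-likelihood (a sketch).

    Query terms contribute log p(t present | d); every other vocabulary
    term contributes log(1 - p(t present | d)). The presence probability
    is a simple Bayesian estimate mixing the document with collection
    document-frequency statistics (an illustrative assumption)."""
    doc_set, q_set = set(doc), set(query)
    score = 0.0
    for t in doc_freq:  # iterate over the whole vocabulary
        df_prior = doc_freq[t] / n_docs               # collection presence rate
        presence = 1.0 if t in doc_set else 0.0
        p = (presence + alpha * df_prior) / (1.0 + alpha)  # smoothed estimate
        p = min(max(p, 1e-9), 1.0 - 1e-9)             # guard the logarithms
        score += math.log(p) if t in q_set else math.log(1.0 - p)
    return score
```

Because absent terms also contribute, scoring is more expensive than in the multinomial case, which is exactly the efficiency concern the article addresses with specialized algorithms.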
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1999
David E. Losada; Álvaro Barreiro
This paper claims that Belief Revision can be seen as a theoretical framework for document ranking in Extended Boolean Models. For a model of Information Retrieval based on propositional logic, we propose a similarity measure which is equivalent to a P-Norm case and, therefore, shares the P-Norm's good properties and behaviour. Moreover, it is theoretically ensured that this measure follows the notion of proximity between documents and the query. The logical model can naturally deal with incomplete descriptions of documents, and similarity values are also obtained for this case.
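For readers unfamiliar with the P-Norm model referenced above, the standard extended Boolean formulas are easy to state: OR is a generalized mean of the term weights and AND is its complement over complemented weights. The sketch below is the textbook P-Norm (Salton, Fox and Wu), not the belief-revision measure of the paper itself.

```python
def pnorm_or(weights, p=2.0):
    """P-Norm extended Boolean OR over term weights in [0, 1]."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)

def pnorm_and(weights, p=2.0):
    """P-Norm extended Boolean AND: complement of the OR of complements."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)
```

At p = 1 both operators collapse to the inner-product-style average, and as p grows they approach strict Boolean max/min, which is the "range of relaxed interpretations" that makes the P-Norm family attractive for ranking.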
The Computer Journal | 2001
David E. Losada; Álvaro Barreiro
This thesis proposes a logical model for the problem of Information Retrieval (IR). Starting from a basic formalism, several classic IR tasks are formalized, their computational costs studied, and efficient implementations proposed. Throughout, the advantages of using a logical approach are emphasized. The representational flexibility of logic allows the creation of a homogeneous framework in which the different elements involved in the IR problem are modeled. First, the basic IR problem is modeled within a logical formalism. Next, an efficient implementation of the proposed model is defined. This implementation enables the evaluation of the model with standard IR test collections, and these experiments provide a quantitative assessment of the performance of the proposed theoretical model. The model is then extended to handle retrieval situations and to model the relevance feedback process, showing that a formal framework can accommodate extensions in a homogeneous way. Finally, the notions of term similarity and inverse document frequency are incorporated into the model, and these last extensions are accompanied by their corresponding evaluation tests. The main contributions of this research are as follows. First, the proposed theoretical model has been implemented and evaluated, ensuring its real applicability; in fact, very few logical approaches to IR have been implemented and evaluated. The basic model can represent classic vectors with binary weights and, moreover, our relevance measure corresponds to the classic query-document inner product. In this way, classic tasks are formalized as cases within the model. However, the proposed model is inherently more expressive than the classic formalisms.
ACM Symposium on Applied Computing | 2004
David E. Losada; Félix Díaz-Hermida; Alberto Bugarín; Senén Barro
In this work we implement and evaluate a fuzzy approach to Information Retrieval whose query language incorporates fuzzy quantifiers. Fuzzy quantified sentences are suitable for imposing additional restrictions in the retrieval process which are not typical in classic information retrieval. Moreover, fuzzy quantifiers can be implemented in different relaxed ways leading to a wide range of methods for combining query terms. The large-scale evaluation conducted here shows clearly the practical benefits obtained in terms of retrieval performance. These empirical results strengthen previous theoretical works that already advanced the adequacy of fuzzy quantifiers for modeling information needs.
Information Retrieval | 2011
Ronald T. Fernández; David E. Losada; Leif Azzopardi
Employing effective methods of sentence retrieval is essential for many tasks in Information Retrieval, such as summarization, novelty detection and question answering. The best performing sentence retrieval techniques attempt to perform matching directly between the sentences and the query. However, in this paper, we posit that the local context of a sentence can provide crucial additional evidence to further improve sentence retrieval. Using a Language Modeling Framework, we propose a novel reformulation of the sentence retrieval problem that extends previous approaches so that the local context is seamlessly incorporated within the retrieval models. In a series of comprehensive experiments, we show that localized smoothing and the prior importance of a sentence can improve retrieval effectiveness. The proposed models significantly and substantially outperform the state of the art and other competitive sentence retrieval baselines on recall-oriented measures, while remaining competitive on precision-oriented measures. This research demonstrates that local context plays an important role in estimating the relevance of a sentence, and that existing sentence retrieval language models can be extended to utilize this evidence effectively.
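The "localized smoothing" idea above can be sketched as a three-level interpolation: the sentence model is smoothed first with its containing document (the local context) and then with the collection. The function below is a simplified illustration; the interpolation weights and the paper's exact estimation details are assumptions, not the published models.

```python
import math
from collections import Counter

def sentence_score(query, sentence, context_doc, collection, lam=0.4, beta=0.3):
    """Sentence retrieval with localized smoothing (a sketch).

    Mixes three language models: the sentence itself (weight lam), its
    containing document as local context (weight beta), and the whole
    collection (remaining mass). Weights are illustrative assumptions."""
    s_tf, s_len = Counter(sentence), len(sentence)
    d_tf, d_len = Counter(context_doc), len(context_doc)
    c_tf, c_len = Counter(collection), len(collection)
    score = 0.0
    for t in query:
        p = (lam * s_tf[t] / s_len
             + beta * d_tf[t] / d_len
             + (1.0 - lam - beta) * c_tf[t] / c_len)
        score += math.log(max(p, 1e-12))  # floor guards unseen terms
    return score
```

With this formulation, a sentence can gain evidence from query terms that appear elsewhere in its document, which is precisely the additional signal the paper argues is lost when sentences are matched against the query in isolation.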
Cross-Language Evaluation Forum | 2016
David E. Losada; Fabio Crestani
Several studies in the literature have shown that the words people use are indicative of their psychological states. In particular, depression was found to be associated with distinctive linguistic patterns. However, there is a lack of publicly available data for doing research on the interaction between language and depression. In this paper, we describe our first steps to fill this gap. We outline the methodology we have adopted to build and make publicly available a test collection on depression and language use. The resulting corpus includes a series of textual interactions written by different subjects. The new collection not only encourages research on differences in language between depressed and non-depressed individuals, but also on the evolution of the language use of depressed individuals. Further, we propose a novel early detection task and define a novel effectiveness measure to systematically compare early detection algorithms. This new measure takes into account both the accuracy of the decisions taken by the algorithm and the delay in detecting positive cases. We also present baseline results with novel detection methods that process users’ interactions in different ways.
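The delay-aware effectiveness measure described above can be sketched as a per-subject cost: false positives and false negatives pay fixed costs, while correct positive decisions pay a latency cost that grows with the number of writings observed before deciding. This is a hedged sketch in the spirit of the measure; the cost constants, the sigmoid latency function and its midpoint `o` are illustrative assumptions, not the paper's published parameterization.

```python
import math

def detection_cost(decision, is_positive, delay, c_fp=0.05, c_fn=1.0, o=5.0):
    """Cost of one early-detection decision (a sketch).

    decision    -- True if the system flagged the subject as positive
    is_positive -- True if the subject is actually a positive case
    delay       -- number of writings processed before deciding
    A true positive pays a cost that rises smoothly with the delay,
    so slow correct detections are penalized (sigmoid is an assumption)."""
    if decision and not is_positive:
        return c_fp                                    # false positive
    if not decision and is_positive:
        return c_fn                                    # missed positive
    if decision and is_positive:
        latency = 1.0 - 1.0 / (1.0 + math.exp(delay - o))
        return latency * c_fn                          # late true positive
    return 0.0                                         # true negative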
Information Retrieval | 2010
David E. Losada
The retrieval of sentences that are relevant to a given information need is a challenging passage retrieval task. In this context, the well-known vocabulary mismatch problem arises severely because of the fine granularity of the task. Short queries, which are usually the rule rather than the exception, aggravate the problem. Consequently, effective sentence retrieval methods tend to apply some form of query expansion, usually based on pseudo-relevance feedback. Nevertheless, there are no extensive studies comparing different statistical expansion strategies for sentence retrieval. In this work we study thoroughly the effect of distinct statistical expansion methods on sentence retrieval. We start from a set of retrieved documents in which relevant sentences have to be found. In our experiments different term selection strategies are evaluated and we provide empirical evidence to show that expansion before sentence retrieval yields competitive performance. This is particularly novel because expansion for sentence retrieval is often done after sentence retrieval (i.e. expansion terms are mined from a ranked set of sentences) and there are no comparative results available between both types of expansion. Furthermore, this comparison is particularly valuable because there are important implications in time efficiency. We also carefully analyze expansion on weak and strong queries and demonstrate clearly that expanding queries before sentence retrieval is not only more convenient for efficiency purposes, but also more effective when handling poor queries.
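The baseline expansion strategy discussed above, pseudo-relevance feedback over the retrieved documents, can be sketched in a few lines. Raw term frequency is only the simplest of the term-selection strategies the paper compares; the function and its parameter names are illustrative.

```python
from collections import Counter

def expand_query(query, top_docs, n_terms=10):
    """Pseudo-relevance feedback sketch: augment the query with the most
    frequent non-query terms from the top-ranked documents. Selecting
    terms before sentence retrieval (from documents) is the strategy the
    study finds both efficient and effective for weak queries."""
    counts = Counter()
    for doc in top_docs:
        counts.update(t for t in doc if t not in query)
    return list(query) + [t for t, _ in counts.most_common(n_terms)]
```

Expanding from documents in this way avoids a second, slower pass over ranked sentences, which is the efficiency advantage the paper highlights.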