Jonathan L. Elsas
Carnegie Mellon University
Publication
Featured research published by Jonathan L. Elsas.
international acm sigir conference on research and development in information retrieval | 2008
Jonathan L. Elsas; Jaime Arguello; Jamie Callan; Jaime G. Carbonell
Blog feed search poses different and interesting challenges from traditional ad hoc document retrieval. The units of retrieval, the blogs, are collections of documents, the blog posts. In this work we adapt a state-of-the-art federated search model to the feed retrieval task, showing a significant improvement over algorithms based on the best performing submissions in the TREC 2007 Blog Distillation task[12]. We also show that typical query expansion techniques such as pseudo-relevance feedback using the blog corpus do not provide any significant performance improvement and in many cases dramatically hurt performance. We perform an in-depth analysis of the behavior of pseudo-relevance feedback for this task and develop a novel query expansion technique using the link structure in Wikipedia. This query expansion technique provides significant and consistent performance improvements for this task, yielding a 22% and 14% improvement in MAP over the unexpanded query for our baseline and federated algorithms respectively.
web search and data mining | 2009
Eytan Adar; Jaime Teevan; Susan T. Dumais; Jonathan L. Elsas
The Web is a dynamic, ever-changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different user visitation patterns. Although change over long intervals has been explored on random (and potentially unvisited) samples of Web pages, little is known about the nature of finer-grained changes to pages that are actively consumed by users, such as those in our sample. We describe algorithms, analyses, and models for characterizing changes in Web content, focusing on both time (by using hourly and sub-hourly crawls) and structure (by looking at page-, DOM-, and term-level changes). Change rates are higher in our behavior-based sample than found in previous work on randomly sampled pages, with a large portion of pages changing more than hourly. Detailed content and structure analyses identify stable and dynamic content within each page. The understanding of Web change we develop in this paper has implications for tools designed to help people interact with dynamic Web content, such as search engines, advertising, and Web browsers.
web search and data mining | 2010
Jonathan L. Elsas; Susan T. Dumais
Many web documents are dynamic, with content changing in varying amounts at varying frequencies. However, current document search algorithms have a static view of the document content, with only a single version of the document in the index at any point in time. In this paper, we present the first published analysis of using the temporal dynamics of document content to improve relevance ranking. We show that there is a strong relationship between the amount and frequency of content change and relevance. We develop a novel probabilistic document ranking algorithm that allows differential weighting of terms based on their temporal characteristics. By leveraging such content dynamics we show significant performance improvements for navigational queries.
international acm sigir conference on research and development in information retrieval | 2009
Jonathan L. Elsas; Jaime G. Carbonell
Online forums host a rich information exchange, often with contributions from many subject matter experts. In this work we evaluate algorithms for thread retrieval in a large and active online forum community. We compare methods that utilize thread structure to a naïve method that treats a thread as a single document. We find that thread structure helps, and additionally selective methods of thread scoring, which only use evidence from a small number of messages in the thread, significantly and consistently outperform inclusive methods which use all the messages in the thread.
conference on information and knowledge management | 2010
Matthew W. Bilotti; Jonathan L. Elsas; Jaime G. Carbonell; Eric Nyberg
This work presents a general rank-learning framework for passage ranking within Question Answering (QA) systems using linguistic and semantic features. The framework enables query-time checking of complex linguistic and semantic constraints over keywords. Constraints are composed of a mixture of keyword and named entity features, as well as features derived from semantic role labeling. The framework supports the checking of constraints of arbitrary length relating any number of keywords. We show that a trained ranking model using this rich feature set achieves greater than a 20% improvement in Mean Average Precision over baseline keyword retrieval models. We also show that constraints based on semantic role labeling features are particularly effective for passage retrieval; when they can be leveraged, a 40% improvement in MAP over the baseline can be realized.
acm/ieee joint conference on digital libraries | 2004
Miles Efron; Jonathan L. Elsas; Gary Marchionini; Junliang Zhang
We describe ongoing research into the application of machine learning techniques for improving access to governmental information in complex digital libraries. Under the auspices of the GovStat Project, our goal is to identify a small number of semantically valid concepts that adequately span the intellectual domain of a collection. The goal of this discovery is twofold. First, we desire a practical aid for information architects. Second, automatically derived document-concept relationships are a necessary precondition for real-world deployment of many dynamic interfaces. The current study compares concept learning strategies based on three document representations: keywords, titles, and full-text. In statistical and user-based studies, human-created keywords provide significant improvements in concept learning over both title-only and full-text representations.
conference on information and knowledge management | 2008
Vitor R. Carvalho; Jonathan L. Elsas; William W. Cohen; Jaime G. Carbonell
Many of the recently proposed algorithms for learning feature-based ranking functions are based on the pairwise preference framework, in which instead of taking documents in isolation, document pairs are used as instances in the learning process. One disadvantage of this process is that a noisy relevance judgment on a single document can lead to a large number of mis-labeled document pairs. This can jeopardize robustness and deteriorate overall ranking performance. In this paper we study the effects of outlying pairs in rank learning with pairwise preferences and introduce a new meta-learning algorithm capable of suppressing these undesirable effects. This algorithm works as a second optimization step in which any linear baseline ranker can be used as input. Experiments on eight different ranking datasets show that this optimization step produces statistically significant performance gains over state-of-the-art methods.
international conference on weblogs and social media | 2008
Jaime Arguello; Jonathan L. Elsas; Jamie Callan; Jaime G. Carbonell
web search and data mining | 2008
Jonathan L. Elsas; Vitor R. Carvalho; Jaime G. Carbonell
text retrieval conference | 2007
Jonathan L. Elsas; Jaime Arguello; Jamie Callan; Jaime G. Carbonell