Publications


Featured research published by Donald Metzler.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

A Markov random field model for term dependencies

Donald Metzler; W. Bruce Croft

This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows arbitrary text features to be incorporated as evidence. In particular, we make use of features based on occurrences of single terms, ordered phrases, and unordered phrases. We explore full independence, sequential dependence, and full dependence variants of the model. A novel approach is developed to train the model by directly maximizing mean average precision rather than the likelihood of the training data. Ad hoc retrieval experiments are presented on several newswire and web collections, including the GOV2 collection used at the TREC 2004 Terabyte Track. The results show that significant improvements are possible by modeling dependencies, especially on the larger web collections.
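For readers who want the shape of the model: the sequential dependence variant reduces to a rank-equivalent weighted combination of unigram, ordered-phrase, and unordered-window potentials. The notation below is a paraphrase of the paper's formulation, not a verbatim quote:

\[
P_\Lambda(D \mid Q) \overset{\mathrm{rank}}{=}
  \lambda_T \sum_{q \in Q} f_T(q, D)
  + \lambda_O \sum_{i=1}^{|Q|-1} f_O(q_i, q_{i+1}, D)
  + \lambda_U \sum_{i=1}^{|Q|-1} f_U(q_i, q_{i+1}, D)
\]

Here f_T, f_O, and f_U score single terms, exact ordered phrases, and unordered windows of adjacent query terms, respectively, and the lambda weights are the parameters tuned against mean average precision.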


Information Processing and Management | 2004

Combining the language model and inference network approaches to retrieval

Donald Metzler; W. Bruce Croft

The inference network retrieval model, as implemented in the InQuery search engine, allows for richly structured queries. However, it incorporates a form of ad hoc tf.idf estimates for word probabilities. Language modeling offers more formal estimation techniques. In this paper we combine the language modeling and inference network approaches into a single framework. The resulting model allows structured queries to be evaluated using language modeling estimates. We explore the issues involved, such as combining beliefs and smoothing of proximity nodes. Experimental results are presented comparing the query likelihood model, the InQuery system, and our new model. The results reaffirm that high quality structured queries outperform unstructured queries and show that our system consistently achieves higher average precision than InQuery.
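As a sketch of where the two approaches meet: term node beliefs in the combined model are language modeling estimates rather than tf.idf ones. With Dirichlet smoothing (parameter \mu, and P(w \mid C) the collection language model), the belief that document D supports term w is

\[
P(w \mid D) = \frac{tf_{w,D} + \mu\, P(w \mid C)}{|D| + \mu},
\]

and structured operators such as #and then combine these beliefs, for example as a product over the beliefs of the operator's children. The smoothing of proximity nodes mentioned above is handled analogously, treating phrase occurrences like term occurrences.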


European Conference on Information Retrieval | 2007

Similarity measures for short segments of text

Donald Metzler; Susan T. Dumais; Christopher Meek

Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
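As an illustration of the purely lexical end of the spectrum studied here (a generic baseline, not the paper's exact method), a term-frequency-weighted cosine between two short segments can be computed as follows:

```python
import math
from collections import Counter

def lexical_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity over simple term-frequency vectors.

    A minimal lexical baseline of the kind compared in the paper;
    the tokenization and weighting choices here are illustrative.
    """
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(lexical_similarity("seattle best hotels", "seattle hotel deals"))
```

Measures like this fail exactly where the abstract says they do: two related queries that share no terms score zero, which is why the paper also considers stemming and language modeling-based representations.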


Information Retrieval | 2007

Linear feature-based models for information retrieval

Donald Metzler; W. Bruce Croft

There have been a number of linear, feature-based models proposed by the information retrieval community recently. Although each model is presented differently, they all share a common underlying framework. In this paper, we explore and discuss the theoretical issues of this framework, including a novel look at the parameter space. We then detail supervised training algorithms that directly maximize the evaluation metric under consideration, such as mean average precision. We present results that show training models in this way can lead to significantly better test set performance compared to other training methods that do not directly maximize the metric. Finally, we show that linear feature-based models can consistently and significantly outperform current state of the art retrieval models with the correct choice of features.
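Direct metric maximization of the kind described here is commonly realized as a coordinate-level search over the weight vector. The sketch below is ours: `score_metric`, the step grid, and the pass count are illustrative placeholders, with the caller supplying the retrieval-and-evaluation loop that maps weights to a metric value such as mean average precision.

```python
import numpy as np

def coordinate_ascent(score_metric, num_features: int, passes: int = 5):
    """Directly maximize an IR evaluation metric over the weights of a
    linear feature-based ranking function score(D, Q) = sum_i w_i * f_i(D, Q).

    `score_metric(weights)` must return the metric obtained by ranking
    the training queries with the given weights.
    """
    weights = np.ones(num_features) / num_features
    best = score_metric(weights)
    for _ in range(passes):
        for i in range(num_features):
            for step in (-0.2, -0.05, 0.05, 0.2):  # simple line-search grid
                trial = weights.copy()
                trial[i] += step
                value = score_metric(trial)
                if value > best:          # greedy: keep any improving move
                    best, weights = value, trial
    return weights, best
```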


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Latent concept expansion using Markov random fields

Donald Metzler; W. Bruce Croft

Query expansion, in the form of pseudo-relevance feedback or relevance feedback, is a common technique used to improve retrieval effectiveness. Most previous approaches have ignored important issues, such as the role of features and the importance of modeling term dependencies. In this paper, we propose a robust query expansion technique based on the Markov random field model for information retrieval. The technique, called latent concept expansion, provides a mechanism for modeling term dependencies during expansion. Furthermore, the use of arbitrary features within the model provides a powerful framework for going beyond the simple term occurrence features that are implicitly used by most other expansion techniques. We evaluate our technique against relevance models, a state-of-the-art language modeling query expansion technique. Our model demonstrates consistent and significant improvements in retrieval effectiveness across several TREC data sets. We also describe how our technique can be used to generate meaningful multi-term concepts for tasks such as query suggestion/reformulation.
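Roughly, and in our own notation: given a set R of top-ranked (pseudo-relevant) documents retrieved with the MRF, latent concept expansion scores a candidate expansion concept e by aggregating its joint likelihood with the original query,

\[
P(e \mid Q) \propto \sum_{D \in R} P_\Lambda(Q, e \mid D),
\]

and the best-scoring single- or multi-term concepts are appended to the query before a second retrieval pass.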


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Improving the estimation of relevance models using large external corpora

Fernando Diaz; Donald Metzler

Information retrieval algorithms leverage various collection statistics to improve performance. Because these statistics are often computed on a relatively small evaluation corpus, we believe using larger, non-evaluation corpora should improve performance. Specifically, we advocate incorporating external corpora based on language modeling. We refer to this process as external expansion. When compared to traditional pseudo-relevance feedback techniques, external expansion is more stable across topics and up to 10% more effective in terms of mean average precision. Our results show that using a high quality corpus that is comparable to the evaluation corpus can be as, if not more, effective than using the web. Our results also show that external expansion outperforms simulated relevance feedback. In addition, we propose a method for predicting the extent to which external expansion will improve retrieval performance. Our new measure demonstrates positive correlation with improvements in mean average precision.
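The heart of external expansion can be written as an interpolation of relevance models, one estimated from the evaluation (target) corpus and one from the large external corpus; the mixture form and the symbol lambda below are our simplification of the paper's language modeling formulation:

\[
P(w \mid \theta_Q) = \lambda\, P(w \mid R_{\text{target}}) + (1 - \lambda)\, P(w \mid R_{\text{external}})
\]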


Conference on Information and Knowledge Management | 2005

Similarity measures for tracking information flow

Donald Metzler; Yaniv Bernstein; W. Bruce Croft; Alistair Moffat; Justin Zobel

Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity -- resulting from summarization, paraphrasing, copying, and stronger forms of topical relevance -- are useful for applications such as information flow analysis and question-answering tasks. In this paper, we explore mechanisms for measuring such intermediate kinds of similarity, focusing on the task of identifying where a particular piece of information originated. We consider both sentence-to-sentence and document-to-document comparison, and have incorporated these algorithms into RECAP, a prototype information flow analysis tool. Our experimental results with RECAP indicate that new mechanisms such as those we propose are likely to be more appropriate than existing methods for identifying the intermediate forms of similarity.
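Purely as an illustration of what "intermediate" similarity can look like (this is not RECAP's algorithm), an asymmetric containment score measures how much of one sentence's vocabulary is reused in another, which separates paraphrase-like reuse from mere topical overlap:

```python
def containment(source: str, derived: str) -> float:
    """Fraction of the source sentence's terms that reappear in a
    candidate derived sentence. Asymmetric by design: it measures reuse
    of the source in the derived text, not the reverse. Illustrative only.
    """
    src = set(source.lower().split())
    der = set(derived.lower().split())
    return len(src & der) / len(src) if src else 0.0
```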


Web Search and Data Mining | 2010

Learning concept importance using a weighted dependence model

Michael Bendersky; Donald Metzler; W. Bruce Croft

Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where relevance at high ranks is particularly critical. Most previous work, however, treats all concepts as equally important, an assumption that often does not hold, especially for longer, more complex queries. In this paper, we show that one of the most effective existing term dependence models can be naturally extended by assigning weights to concepts. We demonstrate that the weighted dependence model can be trained using existing learning-to-rank techniques, even with a relatively small number of training queries. Our study compares the effectiveness of both endogenous (collection-based) and exogenous (based on external sources) features for determining concept importance. To test the weighted dependence model, we perform experiments on both publicly available TREC corpora and a proprietary web corpus. Our experimental results indicate that our model consistently and significantly outperforms both the standard bag-of-words model and the unweighted term dependence model, and that combining endogenous and exogenous features generally results in the best retrieval effectiveness.
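In our notation, the extension replaces the single tied weight per concept type in the unweighted dependence model with a per-concept weight that is itself a linear function of importance features:

\[
\lambda(c) = \sum_j w_j\, \varphi_j(c),
\qquad
score(D, Q) = \sum_{c \in \mathcal{C}(Q)} \lambda(c)\, f(c, D),
\]

where the \varphi_j range over the endogenous (collection-based) and exogenous (external-source) features described above, and the w_j are learned with standard learning-to-rank machinery.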


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

Improving search relevance for implicitly temporal queries

Donald Metzler; Rosie Jones; Fuchun Peng; Ruiqiang Zhang

Many web search queries have implicit intents associated with them that, if detected and used effectively, can be used to improve search quality. For example, a user who enters the query “toyota camry” may wish to find the official web page for the car, reviews about the car, or the location of the closest Toyota dealership. However, since the user only entered a couple of keywords, it may be difficult to accurately determine which of these implicit intents the user actually meant. Given such an ambiguous query, a search engine must use personalization, click information, query log analysis, and other means for determining implicit intent. Rather than solving the general problem of automatically determining user intent, we focus on queries that have a temporally dependent intent. Temporally dependent queries are queries for which the best search results change with time. Simple examples include “new years” and “presidential elections”, which are events that recur over time. The search results for these queries should reflect the freshest, most current results. A slightly more complex example is the query “turkey”. For this query, it may be useful to return turkey recipes or cooking instructions around the Thanksgiving holiday and travel information during peak vacation times. In all of our examples thus far, the events have occurred with (mostly) predictable periodicity. However, for queries such as “oldest person alive”, the best result changes unpredictably, making it difficult for search engines to consistently return correct results. Therefore, temporally dependent queries come in many different forms and pose many challenges to search engines. In this paper, we investigate a subset of temporal queries that we call implicitly year qualified queries. A year qualified query is a query that contains a year. An implicitly year qualified query is a query that does not actually contain a year, yet the user may have implicitly formulated the query with a specific year in mind. An example implicitly year qualified query is “miss universe”. It is plausible that the user actually meant “miss universe 2008”, “miss universe 2007”, or maybe even “miss universe 1990”, yet did not actually include the year in the query.
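One hedged reading of how such queries can be detected, given that the paper works from query logs (the matching and threshold below are our illustration, not its exact procedure): flag a query as a candidate implicitly year qualified query when year-appended variants of it account for a sizable share of its occurrences in the log.

```python
import re
from collections import Counter

YEAR = re.compile(r"\b(19|20)\d{2}\b")

def implicitly_year_qualified(query: str, log: list[str],
                              min_ratio: float = 0.1) -> bool:
    """Heuristic sketch: `query` is a candidate if log queries that are
    the same string plus a year (e.g. 'miss universe 2008') make up at
    least `min_ratio` of its combined traffic. Illustrative only.
    """
    if YEAR.search(query):
        return False  # already explicitly year qualified
    counts = Counter(log)
    base = counts[query]
    yearized = sum(c for q, c in counts.items()
                   if YEAR.search(q) and YEAR.sub("", q).strip() == query)
    return base > 0 and yearized / (base + yearized) >= min_ratio
```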


Information Retrieval | 2005

Analysis of Statistical Question Classification for Fact-Based Questions

Donald Metzler; W. Bruce Croft

Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most approaches in the past have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. Furthermore, we examine the role different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome.
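A minimal sketch of the statistical approach described here, with the learner and feature set as our illustrative stand-ins (the paper evaluates richer syntactic and semantic features than plain bags of words):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative training set: questions labeled by expected answer type.
questions = ["who wrote hamlet", "when did wwii end", "where is the louvre"]
answer_types = ["PERSON", "DATE", "LOCATION"]

# Word and bigram features with a linear classifier stand in for the
# feature sets the paper studies; no hand-crafted rules are needed.
classifier = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
classifier.fit(questions, answer_types)

print(classifier.predict(["who discovered penicillin"]))  # expected: ['PERSON']
```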

Collaboration


Dive into Donald Metzler's collaborations.

Top Co-Authors

W. Bruce Croft
University of Massachusetts Amherst

Trevor Strohman
University of Massachusetts Amherst