Publication


Featured research published by W. Bruce Croft.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Relevance-Based Language Models

Victor Lavrenko; W. Bruce Croft

We explore the relation between classical probabilistic models of information retrieval and the emerging language modeling approaches. It has long been recognized that the primary obstacle to effective performance of classical models is the need to estimate a relevance model: probabilities of words in the relevant class. We propose a novel technique for estimating these probabilities using the query alone. We demonstrate that our technique can produce highly accurate relevance models, addressing important notions of synonymy and polysemy. Our experiments show relevance models outperforming baseline language modeling systems on TREC retrieval and TDT tracking tasks. The main contribution of this work is an effective formal method for estimating a relevance model with no training data.
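As a rough illustration of estimating a relevance model from the query alone, the sketch below follows one common formulation: each document model is weighted by how likely it is to generate the query, and the weighted models are averaged. The function names, the Jelinek-Mercer smoothing, the uniform document prior, and the parameter `lam` are assumptions made for this example and are not taken from the abstract.

```python
from collections import Counter


def smoothed_doc_model(doc_tokens, collection_model, lam=0.5):
    """Mix the document's word frequencies with the collection model.

    Assumes doc_tokens is non-empty and all its words are in collection_model.
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return {w: lam * counts[w] / n + (1 - lam) * p_c
            for w, p_c in collection_model.items()}


def relevance_model(query, docs, lam=0.5):
    """Estimate P(w | R) as a query-likelihood-weighted average of document models."""
    all_tokens = [t for d in docs for t in d]
    total = len(all_tokens)
    collection_model = {w: c / total for w, c in Counter(all_tokens).items()}
    doc_models = [smoothed_doc_model(d, collection_model, lam) for d in docs]

    # Query likelihood P(Q | D) under each smoothed document model.
    likelihoods = []
    for model in doc_models:
        p = 1.0
        for q in query:
            p *= model.get(q, 0.0)
        likelihoods.append(p)

    z = sum(likelihoods)
    if z == 0:
        raise ValueError("no document can generate the query")
    # Uniform document prior (an assumption), so P(D | Q) is the normalized likelihood.
    posteriors = [p / z for p in likelihoods]

    return {w: sum(post * m[w] for post, m in zip(posteriors, doc_models))
            for w in collection_model}
```

With a toy collection in which only one document mentions the query word, words from that document receive more probability mass than words from the others, which is the behavior the abstract attributes to a relevance model.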


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1996

Query Expansion Using Local and Global Document Analysis

Jinxi Xu; W. Bruce Croft

Automatic query expansion has long been suggested as a technique for dealing with the fundamental issue of word mismatch in information retrieval. A number of approaches to expansion have been studied and, more recently, attention has focused on techniques that analyze the corpus to discover word relationships (global techniques) and those that analyze documents retrieved by the initial query (local feedback). In this paper, we compare the effectiveness of these approaches and show that, although global analysis has some advantages, local analysis is generally more effective. We also show that using global analysis techniques.


Communications of the ACM | 1992

Information filtering and information retrieval: two sides of the same coin?

Nicholas J. Belkin; W. Bruce Croft

Information filtering systems are designed for unstructured or semistructured data, as opposed to database applications, which use very structured data. The systems also deal primarily with textual information, but they may also entail images, voice, video or other data types that are part of multimedia information systems. Information filtering systems also involve a large amount of data and streams of incoming data, whether broadcast from a remote source or sent directly by other sources. Filtering is based on descriptions of individual or group information preferences, or profiles, that typically represent long-term interests. Filtering also implies removal of data from an incoming stream rather than finding data in the stream; users see only the data that is extracted. Models of information retrieval and filtering, and lessons for filtering from retrieval research are presented.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

LDA-based document models for ad-hoc retrieval

Xing Wei; W. Bruce Croft

Search algorithms incorporating some form of topic model have a long history in information retrieval. For example, cluster-based retrieval has been studied since the 60s and has recently produced good results in the language model framework. An approach to building topic models based on a formal generative model of documents, Latent Dirichlet Allocation (LDA), is heavily cited in the machine learning literature, but its feasibility and effectiveness in information retrieval are mostly unknown. In this paper, we study how to efficiently use LDA to improve ad-hoc retrieval. We propose an LDA-based document model within the language modeling framework, and evaluate it on several TREC collections. Gibbs sampling is employed to conduct approximate inference in LDA and the computational complexity is analyzed. We show that improvements over retrieval using cluster-based models can be obtained with reasonable efficiency.
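One plausible way to place an LDA estimate inside a language modeling framework is to interpolate it with a smoothed unigram document model. The sketch below assumes that form; the abstract does not spell out the combination, so the linear mixture, the Dirichlet smoothing, and the parameters `mu` and `lam` are illustrative assumptions. The topic-word and document-topic distributions are taken as given inputs (for example, from a Gibbs sampler run separately).

```python
def lda_document_model(word, doc_counts, doc_len, coll_prob,
                       topic_word, doc_topic, mu=1000.0, lam=0.7):
    """Return P(word | document) as a mixture of two estimates.

    doc_counts : dict mapping word -> count in the document
    doc_len    : total number of tokens in the document
    coll_prob  : dict mapping word -> collection probability
    topic_word : list of dicts, topic_word[z][w] = P(w | z)
    doc_topic  : list of floats, doc_topic[z] = P(z | document)
    mu, lam    : hypothetical smoothing and mixing parameters
    """
    # Dirichlet-smoothed unigram estimate from the document and the collection.
    p_unigram = (doc_counts.get(word, 0) + mu * coll_prob[word]) / (doc_len + mu)
    # Topic-based estimate: sum over topics of P(w | z) * P(z | document).
    p_topics = sum(p_z * topic_word[z].get(word, 0.0)
                   for z, p_z in enumerate(doc_topic))
    return lam * p_unigram + (1 - lam) * p_topics
```

A word that never occurs in a document can still receive a noticeable probability if the document's dominant topics generate it, which is the intended benefit of adding the topic term.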


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1995

Searching distributed collections with inference networks

James P. Callan; Zhihong Lu; W. Bruce Croft

The use of information retrieval systems in networked environments raises a new set of issues that have received little attention. These issues include ranking document collections for relevance to a query, selecting the best set of collections from a ranked list, and merging the document rankings that are returned from a set of collections. This paper describes methods of addressing each issue in the inference network model, discusses their implementation in the INQUERY system, and presents experimental results demonstrating their effectiveness.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

A Markov random field model for term dependencies

Donald Metzler; W. Bruce Croft

This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as evidence. In particular, we make use of features based on occurrences of single terms, ordered phrases, and unordered phrases. We explore full independence, sequential dependence, and full dependence variants of the model. A novel approach is developed to train the model that directly maximizes the mean average precision rather than maximizing the likelihood of the training data. Ad hoc retrieval experiments are presented on several newswire and web collections, including the GOV2 collection used at the TREC 2004 Terabyte Track. The results show significant improvements are possible by modeling dependencies, especially on the larger web collections.
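The three feature types named in this abstract (single terms, ordered phrases, unordered phrases) can be illustrated with a small counting sketch for the sequential dependence case, where only adjacent query terms are paired. This is a simplification that returns raw match counts rather than the model's potential functions; the function names and the default window size are assumptions made for the example.

```python
def ordered_count(doc, phrase):
    """Count exact, in-order occurrences of phrase (a list of tokens) in doc."""
    n = len(phrase)
    return sum(1 for i in range(len(doc) - n + 1) if doc[i:i + n] == phrase)


def unordered_count(doc, terms, window=8):
    """Count window start positions whose window contains every term, in any order."""
    needed = set(terms)
    starts = range(max(1, len(doc) - window + 1))
    return sum(1 for i in starts if needed <= set(doc[i:i + window]))


def sequential_dependence_features(query, doc, window=8):
    """Raw feature counts for single terms and for adjacent query-term pairs."""
    pairs = list(zip(query, query[1:]))
    return {
        "term": sum(doc.count(q) for q in query),
        "ordered": sum(ordered_count(doc, list(p)) for p in pairs),
        "unordered": sum(unordered_count(doc, p, window) for p in pairs),
    }
```

A full dependence variant would pair every subset of query terms instead of only adjacent ones, and a ranking function would combine the three feature groups with learned weights.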


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002

Predicting query performance

Stephen Cronen-Townsend; Yun Zhou; W. Bruce Croft

We develop a method for predicting query performance by computing the relative entropy between a query language model and the corresponding collection language model. The resulting clarity score measures the coherence of the language usage in documents whose models are likely to generate the query. We suggest that clarity scores measure the ambiguity of a query with respect to a collection of documents and show that they correlate positively with average precision in a variety of TREC test sets. Thus, the clarity score may be used to identify ineffective queries, on average, without relevance information. We develop an algorithm for automatically setting the clarity score threshold between predicted poorly-performing queries and acceptable queries and validate it using TREC data. In particular, we compare the automatic thresholds to optimum thresholds and also check how frequently results as good are achieved in sampling experiments that randomly assign queries to the two classes.
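The relative entropy at the core of the clarity score can be sketched in a few lines. The example below takes the query language model as a given input, since the abstract does not detail how it is estimated, and assumes every word with non-zero query-model probability also has non-zero collection probability. The helper `unigram_model` and the choice of log base 2 are assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def clarity_score(query_model, collection_model):
    """Relative entropy (KL divergence, in bits) of the query model
    with respect to the collection model."""
    score = 0.0
    for w, p_q in query_model.items():
        if p_q > 0:
            score += p_q * math.log2(p_q / collection_model[w])
    return score
```

A query model identical to the collection model scores zero, while a model concentrated on a few words that are rare in the collection scores high, matching the intuition that focused queries are less ambiguous.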


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1991

Evaluation of an inference network-based retrieval model

Howard R. Turtle; W. Bruce Croft

Network representations have been used in information retrieval since at least the early 1960’s. Networks have been used to support diverse retrieval functions, including browsing [38], document clustering [7], spreading activation search [4], support for multiple search strategies [11], and representation of user knowledge [27] or document content [40]. Recent work suggests that significant improvements in retrieval performance will require techniques that, in some sense, “understand” the content of documents and queries [9, 43] and can be used to infer probable relationships between documents and queries. In this view, information retrieval is an inference or evidential reasoning process in which we estimate the probability that a user’s information need, expressed as one or more queries, is met given a document as “evidence.” Network representations show promise as mechanisms for inferring these kinds of relationships [4, 12].


Database and Expert Systems Applications | 1992

The INQUERY Retrieval System

James P. Callan; W. Bruce Croft; Stephen M. Harding

As larger and more heterogeneous text databases become available, information retrieval research will depend on the development of powerful, efficient and flexible retrieval engines. In this paper, we describe a retrieval system (INQUERY) that is based on a probabilistic retrieval model and provides support for sophisticated indexing and complex query formulation. INQUERY has been used successfully with databases containing nearly 400,000 documents.


ACM Transactions on Information Systems | 2000

Improving the effectiveness of information retrieval with local context analysis

Jinxi Xu; W. Bruce Croft

Techniques for automatic query expansion have been extensively studied in information retrieval as a means of addressing the word mismatch between queries and documents. These techniques can be categorized as either global or local. While global techniques rely on analysis of a whole collection to discover word relationships, local techniques emphasize analysis of the top-ranked documents retrieved for a query. While local techniques have been shown to be more effective than global techniques in general, existing local techniques are not robust and can seriously hurt retrieval when few of the retrieved documents are relevant. We propose a new technique, called local context analysis, which selects expansion terms based on co-occurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.
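The idea of choosing expansion terms by co-occurrence with the query terms in top-ranked documents can be sketched as follows. This is a simplified stand-in and not the scoring function of the paper: the co-occurrence measure (a sum of term-frequency products), the multiplicative combination, and the parameters `k` and `delta` are assumptions chosen to keep the example short.

```python
from collections import Counter


def expansion_terms(query, top_docs, k=3, delta=0.1):
    """Rank candidate terms by co-occurrence with all query terms.

    query    : list of query tokens
    top_docs : list of token lists, the top-ranked documents or passages
    k        : number of expansion terms to return
    delta    : small constant so one missing co-occurrence does not zero the score
    """
    doc_counts = [Counter(d) for d in top_docs]
    candidates = {w for c in doc_counts for w in c} - set(query)

    scores = {}
    for cand in candidates:
        score = 1.0
        for q in query:
            # Co-occurrence: sum over documents of tf(candidate) * tf(query term).
            co = sum(c[cand] * c[q] for c in doc_counts)
            score *= delta + co
        scores[cand] = score

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

The product over query terms favors candidates that co-occur with every query term, so a word that appears only alongside one query term ranks below words that appear alongside all of them.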

Collaboration


Dive into W. Bruce Croft's collaborations.

Top Co-Authors


James P. Callan

Carnegie Mellon University


Qingyao Ai

University of Massachusetts Amherst


Hamed Zamani

University of Massachusetts Amherst


Michael Bendersky

University of Massachusetts Amherst


Jiafeng Guo

Chinese Academy of Sciences


Jangwon Seo

University of Massachusetts Amherst
