James P. Callan | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where James P. Callan is active.

Explore More

Publication

Featured researches published by James P. Callan.

international acm sigir conference on research and development in information retrieval | 1995

Searching distributed collections with inference networks

James P. Callan; Zhihong Lu; W. Bruce Croft

The use of information retrieval systems in networked environments raises a new set of issues that have received little attention. These issues include ranking document collections for relevance to a query, selecting the best set of collections from a ranked list, and merging the document rankings that are returned from a set of collections. This paper describes methods of addressing each issue in the inference network model, discusses their implementation in the INQUERY system, and presents experimental results demonstrating their effectiveness.

database and expert systems applications | 1992

The INQUERY Retrieval System

James P. Callan; W. Bruce Croft; Stephen M. Harding

As larger and more heterogeneous text databases become available, information retrieval research will depend on the development of powerful, efficient and flexible retrieval engines. In this paper, we describe a retrieval system (INQUERY) that is based on a probabilistic retrieval model and provides support for sophisticated indexing and complex query formulation. INQUERY has been used successfully with databases containing nearly 400,000 documents.

international acm sigir conference on research and development in information retrieval | 1996

Training algorithms for linear text classifiers

David D. Lewis; Robert E. Schapire; James P. Callan

Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used in training linear text classifiers. In contrast to most IR methods, theoretical analysis provides performance guarantees and guidance on parameter settings for these algorithms. Experimental data is presented showing Widrow-Hoff and EG to be more effective than the widely used Rocchio algorithm on several categorization and routing tasks.

international acm sigir conference on research and development in information retrieval | 1994

Passage-level evidence in document retrieval

James P. Callan

The increasing lengths of documents in full-text collections encourages renewed interest in the ranking and retrieval of document passages. Past research showed that evidence from passages can improve retrieval results, but it also raised questions about how passages are defined, how they can be ranked efficiently, and what is their proper role in long, structured documents.

international acm sigir conference on research and development in information retrieval | 2002

Novelty and redundancy detection in adaptive filtering

Yi Zhang; James P. Callan; Thomas P. Minka

This paper addresses the problem of extending an adaptive information filtering system to make decisions about the novelty and redundancy of relevant documents. It argues that relevance and redundance should each be modelled explicitly and separately. A set of five redundancy measures are proposed and evaluated in experiments with and without redundancy thresholds. The experimental results demonstrate that the cosine similarity metric and a redundancy measure based on a mixture of language models are both effective for identifying redundant documents.

ACM Transactions on Information Systems | 2001

Query-based sampling of text databases

James P. Callan; Margaret E. Connell

The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGLOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for a acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents query-based sampling, a new technicque for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are crated, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic dtabase selection.

international acm sigir conference on research and development in information retrieval | 2003

Combining document representations for known-item search

Paul Ogilvie; James P. Callan

This paper investigates the pre-conditions for successful combination of document representations formed from structural markup for the task of known-item search. As this task is very similar to work in meta-search and data fusion, we adapt several hypotheses from those research areas and investigate them in this context. To investigate these hypotheses, we present a mixture-based language model and also examine many of the current meta-search algorithms. We find that compatible output from systems is important for successful combination of document representations. We also demonstrate that combining low performing document representations can improve performance, but not consistently. We find that the techniques best suited for this task are robust to the inclusion of poorly performing document representations. We also explore the role of variance of results across systems and its impact on the performance of fusion, with the surprising result that the correct documents have higher variance across document representations than highly ranking incorrect documents.

international acm sigir conference on research and development in information retrieval | 2003

Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002

James Allan; Jay Aslam; Nicholas J. Belkin; Chris Buckley; James P. Callan; W. Bruce Croft; Susan T. Dumais; Norbert Fuhr; Donna Harman; David J. Harper; Djoerd Hiemstra; Thomas Hofmann; Eduard H. Hovy; Wessel Kraaij; John D. Lafferty; Victor Lavrenko; David Lewis; Liz Liddy; R. Manmatha; Andrew McCallum; Jay M. Ponte; John M. Prager; Dragomir R. Radev; Philip Resnik; Stephen E. Robertson; Ron G. Rosenfeld; Salim Roukos; Mark Sanderson; Richard M. Schwartz; Amit Singhal

Information retrieval (IR) research has reached a point where it is appropriate to assess progress and to define a research agenda for the next five to ten years. This report summarizes a discussion of IR research challenges that took place at a recent workshop. The attendees of the workshop considered information retrieval research in a range of areas chosen to give broad coverage of topic areas that engage information retrieval researchers. Those areas are retrieval models, cross-lingual retrieval, Web search, user modeling, filtering, topic detection and tracking, classification, summarization, question answering, metasearch, distributed retrieval, multimedia retrieval, information extraction, as well as testbed requirements for future work. The potential use of language modeling techniques in these areas was also discussed. The workshop identified major challenges within each of those areas. The following are recurring themes that ran throughout: • User and context sensitive retrieval • Multi-lingual and multi-media issues • Better target tasks • Improved objective evaluations • Substantially more labeled data • Greater variety of data sources • Improved formal models Contextual retrieval and global information access were identified as particularly important long-term challenges.

text retrieval conference | 1997

TREC and TIPSTER experiments with INQUERY

James P. Callan; W. Bruce Croft; John Broglio

Abstract INQUERY is a probabilistic information retrieval system based upon a Bayesian inference network model. This paper describes recent improvements to the system as a result of participation in the TIPSTER project and the TREC-2 conference. Improvements include transforming forms-based specifications of information needs into complex structured queries, automatic query expansion, automatic recognition of features in documents, relevance feedback, and simulated document routing. Experiments with one- and two-gigabyte document collections are also described.

conference on information and knowledge management | 2002

A language modeling framework for resource selection and results merging

Luo Si; Rong Jin; James P. Callan; Paul Ogilvie

Statistical language models have been proposed recently for several information retrieval tasks, including the resource selection task in distributed information retrieval. This paper extends the language modeling approach to integrate resource selection, ad-hoc searching, and merging of results from different text databases into a single probabilistic retrieval model. This new approach is designed primarily for Intranet environments, where it is reasonable to assume that resource providers are relatively homogeneous and can adopt the same kind of search engine. Experiments demonstrate that this new, integrated approach is at least as effective as the prior state-of-the-art in distributed IR.

Explore More