Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ellen M. Voorhees is active.

Publication


Featured research published by Ellen M. Voorhees.


international acm sigir conference on research and development in information retrieval | 1994

Query expansion using lexical-semantic relations

Ellen M. Voorhees

Applications such as office automation, news filtering, help facilities in complex systems, and the like require the ability to retrieve documents from full-text databases where vocabulary problems can be particularly severe. Experiments performed on small collections with single-domain thesauri suggest that expanding query vectors with words that are lexically related to the original query words can ameliorate some of the problems of mismatched vocabularies. This paper examines the utility of lexical query expansion in the large, diverse TREC collection. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results show this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the information being sought even when the concepts to be expanded are selected by hand. Less well developed queries can be significantly improved by expansion of hand-chosen concepts. However, an automatic procedure that can approximate the set of hand picked synonym sets has yet to be devised, and expanding by the synonym sets that are automatically generated can degrade retrieval performance.
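The sketch below (Python with NLTK, assuming the WordNet corpus has been downloaded) illustrates the general idea of expanding a query vector with words reachable from WordNet synonym sets over typed links such as hypernymy. It is a minimal illustration only; the paper's weighting scheme and the hand selection of synonym sets are not reproduced.

# Minimal sketch of WordNet-based query expansion, assuming NLTK with the
# WordNet corpus installed (nltk.download('wordnet')). Illustrative only.
from collections import Counter

from nltk.corpus import wordnet as wn


def expand_query(terms, weight=0.3, max_per_term=5):
    """Return a weighted query vector: original terms plus WordNet neighbours."""
    vector = Counter({t: 1.0 for t in terms})
    for term in terms:
        related = set()
        for synset in wn.synsets(term, pos=wn.NOUN):
            # Synonyms from the synset itself.
            related.update(l.lower() for l in synset.lemma_names())
            # Words reachable over typed links (here: hypernyms and hyponyms).
            for linked in synset.hypernyms() + synset.hyponyms():
                related.update(l.lower() for l in linked.lemma_names())
        related.discard(term)
        for word in sorted(related)[:max_per_term]:
            vector[word] += weight  # expansion terms get a lower weight
    return dict(vector)


print(expand_query(["office", "automation"]))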


international acm sigir conference on research and development in information retrieval | 1998

Variations in relevance judgments and the measurement of retrieval effectiveness

Ellen M. Voorhees

Test collections have traditionally been used by information retrieval researchers to improve their retrieval strategies. To be viable as a laboratory tool, a collection must reliably rank different retrieval variants according to their true effectiveness. In particular, the relative effectiveness of two retrieval strategies should be insensitive to modest changes in the relevant document set, since individual relevance assessments are known to vary widely. The test collections developed in the TREC workshops have become the collections of choice in the retrieval research community. To verify their reliability, NIST investigated the effect changes in the relevance assessments have on the evaluation of retrieval results. Very high correlations were found among the rankings of systems produced using different relevance judgment sets. The high correlations indicate that the comparative evaluation of retrieval performance is stable despite substantial differences in relevance judgments, and thus reaffirm the use of the TREC collections as laboratory tools.
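The stability check described here amounts to ranking the same set of systems under each judgment set and measuring rank correlation. A minimal sketch, assuming SciPy is available and using made-up effectiveness scores:

# Rank the same systems by effectiveness under two different qrel sets and
# measure agreement with Kendall's tau. Scores are hypothetical stand-ins.
from scipy.stats import kendalltau

map_qrels_a = {"sysA": 0.31, "sysB": 0.27, "sysC": 0.22, "sysD": 0.18}
map_qrels_b = {"sysA": 0.29, "sysB": 0.28, "sysC": 0.20, "sysD": 0.17}

systems = sorted(map_qrels_a)
tau, p_value = kendalltau(
    [map_qrels_a[s] for s in systems],
    [map_qrels_b[s] for s in systems],
)
print(f"Kendall tau between the two system rankings: {tau:.2f} (p={p_value:.3f})")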


international acm sigir conference on research and development in information retrieval | 2004

Retrieval evaluation with incomplete information

Chris Buckley; Ellen M. Voorhees

This paper examines whether the Cranfield evaluation methodology is robust to gross violations of the completeness assumption (i.e., the assumption that all relevant documents within a test collection have been identified and are present in the collection). We show that current evaluation measures are not robust to substantially incomplete relevance judgments. A new measure is introduced that is both highly correlated with existing measures when complete judgments are available and more robust to incomplete judgment sets. This finding suggests that substantially larger or dynamic test collections built using current pooling practices should be viable laboratory tools, despite the fact that the relevance information will be incomplete and imperfect.
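The measure introduced in this paper is the one now commonly known as bpref. Below is a minimal sketch of its standard formulation, in which each retrieved relevant document is penalized by the fraction of judged nonrelevant documents ranked above it; treat it as an illustration rather than a reference implementation.

# Sketch of the bpref measure; unjudged documents in the ranking are ignored.
def bpref(ranked_docs, judged_relevant, judged_nonrelevant):
    """ranked_docs: system ranking (best first); the judged_* args are sets."""
    R = len(judged_relevant)
    N = len(judged_nonrelevant)
    if R == 0:
        return 0.0
    nonrel_seen = 0
    total = 0.0
    for doc in ranked_docs:
        if doc in judged_nonrelevant:
            nonrel_seen += 1
        elif doc in judged_relevant:
            if N == 0:
                total += 1.0
            else:
                # Penalty grows with the number of judged nonrelevant docs
                # ranked above this relevant one, capped at min(R, N).
                total += 1.0 - min(nonrel_seen, R) / min(R, N)
    return total / R


ranking = ["d3", "d7", "d1", "d9", "d2"]
print(bpref(ranking, {"d3", "d1", "d2"}, {"d7", "d9"}))  # 0.5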


international acm sigir conference on research and development in information retrieval | 2000

Evaluating Evaluation Measure Stability

Chris Buckley; Ellen M. Voorhees

This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules of thumb experimenters use, such as that the number of queries needed for a good experiment is at least 25, and that 50 is better, while challenging other beliefs, such as that the common evaluation measures are equally reliable. As an example, we show that Precision at 30 documents has about twice the average error rate of Average Precision. These results can help information retrieval researchers design experiments that provide a desired level of confidence in their results. In particular, we suggest that researchers using Web measures such as Precision at 10 documents will need many more than 50 queries, or will have to require two methods to have a very large difference in evaluation scores before concluding that the two methods are actually different.
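A rough sketch of the kind of stability experiment behind such error-rate estimates: compare two systems on random query subsets of a given size and count how often the apparent winner flips. The per-query scores below are synthetic stand-ins, and the procedure is a simplification of the paper's methodology.

# Estimate how often two systems swap order on random query subsets.
import random

random.seed(0)

# Hypothetical per-query average precision for two systems over 150 queries.
sys_a = [random.betavariate(2, 5) for _ in range(150)]
sys_b = [max(0.0, s - 0.02 + random.gauss(0, 0.05)) for s in sys_a]


def swap_rate(scores_a, scores_b, subset_size, trials=2000):
    """Fraction of random query subsets on which the overall winner flips."""
    overall = (sum(scores_a) / len(scores_a)) >= (sum(scores_b) / len(scores_b))
    flips = 0
    for _ in range(trials):
        idx = random.sample(range(len(scores_a)), subset_size)
        sample = sum(scores_a[i] for i in idx) >= sum(scores_b[i] for i in idx)
        flips += sample != overall
    return flips / trials


for size in (10, 25, 50):
    print(f"{size:>3} queries: estimated swap rate {swap_rate(sys_a, sys_b, size):.3f}")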


international acm sigir conference on research and development in information retrieval | 1993

Using WordNet to disambiguate word senses for text retrieval

Ellen M. Voorhees

This paper describes an automatic indexing procedure that uses the “IS-A” relations contained within WordNet and the set of nouns contained in a text to select a sense for each polysemous noun in the text. The result of the indexing procedure is a vector in which some of the terms represent word senses instead of word stems. Retrieval experiments comparing the effectiveness of these sense-based vectors vs. stem-based vectors show the stem-based vectors to be superior overall, although the sense-based vectors do improve the performance of some queries. The overall degradation is due in large part to the difficulty of disambiguating senses in short query statements. An analysis of these results suggests two conclusions: the IS-A links define a generalization/specialization hierarchy that is not sufficient to reliably select the correct sense of a noun from the set of fine sense distinctions in WordNet; and missing correct matches because of incorrect sense resolution has a much more deleterious effect on retrieval performance than does making spurious matches.
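As a much-simplified illustration of IS-A-based sense selection (not the paper's exact indexing procedure), the sketch below picks, for each noun, the WordNet sense whose hypernym ancestors overlap most with the ancestors of the other nouns in the text. It assumes NLTK with the WordNet corpus installed.

# Simplified hypernym-overlap sense selection over WordNet noun synsets.
from nltk.corpus import wordnet as wn


def hypernym_closure(synset):
    """All synsets reachable from `synset` by following IS-A (hypernym) links."""
    closure, frontier = set(), [synset]
    while frontier:
        for parent in frontier.pop().hypernyms():
            if parent not in closure:
                closure.add(parent)
                frontier.append(parent)
    return closure


def pick_senses(nouns):
    """For each noun, choose the sense whose ancestors best match the other nouns'."""
    candidates = {n: [(s, hypernym_closure(s)) for s in wn.synsets(n, pos=wn.NOUN)]
                  for n in nouns}
    chosen = {}
    for noun, senses in candidates.items():
        context = set()
        for other, other_senses in candidates.items():
            if other != noun:
                for _, ancestors in other_senses:
                    context |= ancestors
        if senses:
            chosen[noun] = max(senses, key=lambda pair: len(pair[1] & context))[0]
    return chosen


print(pick_senses(["bank", "money", "loan"]))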


text retrieval conference | 2000

Overview of the Sixth Text REtrieval Conference (TREC-6)

Ellen M. Voorhees; Donna Harman

The Text REtrieval Conference is a workshop series designed to encourage research on text retrieval for realistic applications by providing large test collections, uniform scoring procedures and a forum for organizations interested in comparing results. TREC contains two main retrieval tasks plus optional subtasks that allow participants to focus on particular common subproblems in retrieval. The emphasis on individual experiments evaluated in a common setting has proven to be very successful. In the six years since the beginning of TREC, the state of the art in retrieval effectiveness has approximately doubled, and technology transfer among research labs and between research systems and commercial products has accelerated. In addition, TREC has sponsored the first large-scale evaluations of Chinese language retrieval, retrieval of speech and retrieval across different languages.


cross language evaluation forum | 2001

The Philosophy of Information Retrieval Evaluation

Ellen M. Voorhees

Evaluation conferences such as TREC, CLEF, and NTCIR are modern examples of the Cranfield evaluation paradigm. In Cranfield, researchers perform experiments on test collections to compare the relative effectiveness of different retrieval approaches. The test collections allow the researchers to control the effects of different system parameters, increasing the power and decreasing the cost of retrieval experiments as compared to user-based evaluations. This paper reviews the fundamental assumptions and appropriate uses of the Cranfield paradigm, especially as they apply in the context of the evaluation conferences.


international acm sigir conference on research and development in information retrieval | 2000

Building a question answering test collection

Ellen M. Voorhees; Dawn M. Tice

The TREC-8 Question Answering (QA) Track was the first large-scale evaluation of domain-independent question answering systems. In addition to fostering research on the QA task, the track was used to investigate whether the evaluation methodology used for document retrieval is appropriate for a different natural language processing task. As with document relevance judging, assessors had legitimate differences of opinions as to whether a response actually answers a question, but comparative evaluation of QA systems was stable despite these differences. Creating a reusable QA test collection is fundamentally more difficult than creating a document retrieval test collection since the QA task has no equivalent to document identifiers.


international acm sigir conference on research and development in information retrieval | 1995

Learning collection fusion strategies

Ellen M. Voorhees; Narendra K. Gupta; Ben Johnson-Laird

Collection fusion is a data fusion problem in which the results of retrieval runs on separate, autonomous document collections must be merged to produce a single, effective result. This paper explores two collection fusion techniques that learn the number of documents to retrieve from each collection using only the ranked lists of documents returned in response to past queries and those documents' relevance judgments. Retrieval experiments using the TREC test collection demonstrate that the effectiveness of the fusion techniques is within 10% of the effectiveness of a run in which the entire set of documents is treated as a single collection.
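A minimal sketch in the spirit of a learned fusion strategy: estimate each collection's productivity from past queries' relevance judgments, allocate a document budget proportionally, and merge the per-collection rankings. The proportional allocation rule and the names below are assumptions for illustration, not the paper's exact model.

# Learn per-collection weights from past relevance judgments, then fuse.
def learn_weights(training_runs):
    """training_runs: {collection: [relevant docs retrieved per past query]}."""
    avg = {c: sum(hits) / len(hits) for c, hits in training_runs.items()}
    total = sum(avg.values()) or 1.0
    return {c: v / total for c, v in avg.items()}


def fuse(rankings, weights, budget=10):
    """rankings: {collection: ranked doc ids}. Take a weighted share from each."""
    merged = []
    for collection, ranked in rankings.items():
        take = round(weights[collection] * budget)
        merged.extend(ranked[:take])
    return merged[:budget]


weights = learn_weights({"news": [8, 6, 7], "patents": [2, 1, 3]})
print(fuse({"news": [f"n{i}" for i in range(20)],
            "patents": [f"p{i}" for i in range(20)]}, weights))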


international acm sigir conference on research and development in information retrieval | 2003

Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002

James Allan; Jay Aslam; Nicholas J. Belkin; Chris Buckley; James P. Callan; W. Bruce Croft; Susan T. Dumais; Norbert Fuhr; Donna Harman; David J. Harper; Djoerd Hiemstra; Thomas Hofmann; Eduard H. Hovy; Wessel Kraaij; John D. Lafferty; Victor Lavrenko; David Lewis; Liz Liddy; R. Manmatha; Andrew McCallum; Jay M. Ponte; John M. Prager; Dragomir R. Radev; Philip Resnik; Stephen E. Robertson; Ron G. Rosenfeld; Salim Roukos; Mark Sanderson; Richard M. Schwartz; Amit Singhal

Information retrieval (IR) research has reached a point where it is appropriate to assess progress and to define a research agenda for the next five to ten years. This report summarizes a discussion of IR research challenges that took place at a recent workshop. The attendees of the workshop considered information retrieval research in a range of areas chosen to give broad coverage of topic areas that engage information retrieval researchers. Those areas are retrieval models, cross-lingual retrieval, Web search, user modeling, filtering, topic detection and tracking, classification, summarization, question answering, metasearch, distributed retrieval, multimedia retrieval, information extraction, as well as testbed requirements for future work. The potential use of language modeling techniques in these areas was also discussed. The workshop identified major challenges within each of those areas. The following recurring themes ran throughout:

• User and context sensitive retrieval
• Multi-lingual and multi-media issues
• Better target tasks
• Improved objective evaluations
• Substantially more labeled data
• Greater variety of data sources
• Improved formal models

Contextual retrieval and global information access were identified as particularly important long-term challenges.

Collaboration


Dive into Ellen M. Voorhees's collaborations.

Top Co-Authors

Donna Harman
National Institute of Standards and Technology

John S. Garofolo
National Institute of Standards and Technology

Ian Soboroff
National Institute of Standards and Technology

Jaap Kamps
University of Amsterdam

Cedric G. P. Auzanne
National Institute of Standards and Technology