Publication


Featured research published by David D. Lewis.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1992

An evaluation of phrasal and clustered representations on a text categorization task

David D. Lewis

Syntactic phrase indexing and term clustering have been widely explored as text representation techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing languages on a text categorization task, enabling us to study their properties in isolation from query interpretation issues. We show that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing. We also present results suggesting that traditional term clustering methods are unlikely to provide significantly improved text representations. An improved probabilistic text categorization method is also presented.


Human Language Technology | 1992

Feature selection and feature extraction for text categorization

David D. Lewis

The effect of selecting varying numbers and kinds of features for use in predicting category membership was investigated on the Reuters and MUC-3 text categorization data sets. Good categorization performance was achieved using a statistical classifier and a proportional assignment strategy. The optimal feature set size for word-based indexing was found to be surprisingly low (10 to 15 features) despite the large training sets. The extraction of new text features by syntactic analysis and feature clustering was investigated on the Reuters data set. Syntactic indexing phrases, clusters of these phrases, and clusters of words were all found to provide less effective representations than individual words.
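
The abstract names a proportional assignment strategy without spelling it out. The following is a minimal sketch of one plausible reading, assuming a category is assigned to its top-scoring test documents in proportion to how often it occurred in training. The function name, the `train_rate` parameter, and the proportionality constant `k` are hypothetical and not taken from the paper.

```python
import math


def proportional_assignment(scores, train_rate, k=1.0):
    """Pick the test documents that receive one category.

    scores: dict mapping document id -> classifier score for the category.
    train_rate: fraction of training documents that carried the category.
    k: hypothetical proportionality constant (an assumption of this sketch).
    """
    # Number of documents to label, capped at the size of the test set.
    n_assign = min(len(scores), math.ceil(k * train_rate * len(scores)))
    # Rank documents from highest to lowest score and keep the top n_assign.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_assign])


example_scores = {"d1": 0.9, "d2": 0.1, "d3": 0.5, "d4": 0.3}
assigned = proportional_assignment(example_scores, train_rate=0.25, k=2.0)
```

Raising `k` labels more documents per category, which under this reading would trade precision for recall.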


Human Language Technology | 1991

Evaluating text categorization

David D. Lewis

While certain standard procedures are widely used for evaluating text retrieval systems and algorithms, the same is not true for text categorization. Omission of important data from reports is common and methods of measuring effectiveness vary widely. This has made judging the relative merits of techniques for text categorization difficult and has disguised important research issues. In this paper I discuss a variety of ways of evaluating the effectiveness of text categorization systems, drawing both on reported categorization experiments and on methods used in evaluating query-driven retrieval. I also consider the extent to which the same evaluation methods may be used with systems for text extraction, a more complex task. In evaluating either kind of system, the purpose for which the output is to be used is crucial in choosing appropriate evaluation methods.
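
The abstract does not say which measures are discussed. As an illustration only, the sketch below assumes precision and recall, the usual measures borrowed from query-driven retrieval, and shows two common ways of averaging them over categories. The function names and the `(tp, fp, fn)` tuple layout are assumptions of this example.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def macro_average(tables):
    """Compute precision and recall per category, then average over categories."""
    pairs = [precision_recall(*table) for table in tables]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


def micro_average(tables):
    """Pool the counts of all categories, then compute precision and recall once."""
    tp = sum(t[0] for t in tables)
    fp = sum(t[1] for t in tables)
    fn = sum(t[2] for t in tables)
    return precision_recall(tp, fp, fn)


# One (tp, fp, fn) tuple per category: a frequent category and a rare one.
tables = [(8, 2, 2), (1, 1, 3)]
macro_p, macro_r = macro_average(tables)
micro_p, micro_r = micro_average(tables)
```

Macro-averaging weights every category equally, while micro-averaging weights every decision equally, so the two can diverge when category frequencies are skewed. Which one is appropriate depends on how the output will be used.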


Human Language Technology | 1990

Representation quality in text classification: an introduction and experiment

David D. Lewis

The way in which text is represented has a strong impact on the performance of text classification (retrieval and categorization) systems. We discuss the operation of text classification systems, introduce a theoretical model of how text representation impacts their performance, and describe how the performance of text classification systems is evaluated. We then present the results of an experiment on improving text representation quality, as well as an analysis of the results and the directions they suggest for future research.


MUC3 '91 Proceedings of the 3rd conference on Message understanding | 1991

Data extraction as text categorization: an experiment with the MUC-3 corpus

David D. Lewis

The data extraction systems studied in the MUC-3 evaluation perform a variety of subtasks in filling out templates. Some of these tasks are quite complex, and seem to require a system to represent the structure of a text in some detail to perform the task successfully. Capturing reference relations between slot fillers, distinguishing between historic and recent events, and many other subtasks appear to have this character.


Archive | 1994

A comparison of two learning algorithms for text categorization

David D. Lewis; Marc Ringuette


Archive | 1991

Representation and learning in information retrieval

David D. Lewis


Computational Linguistics | 1993

Evaluating message understanding systems: an analysis of the third message understanding conference (MUC-3)

Nancy Chinchor; David D. Lewis; Lynette Hirschman


International Conference on Machine Learning | 1991

Learning in intelligent information retrieval

David D. Lewis


MUC4 '92 Proceedings of the 4th conference on Message understanding | 1992

Text filtering in MUC-3 and MUC-4

David D. Lewis; Richard M. Tong

Collaboration


Explore David D. Lewis's collaborations.

Top Co-Authors

Nancy Chinchor

Science Applications International Corporation
