David D. Lewis
University of Chicago
Publication
Featured research published by David D. Lewis.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1992
David D. Lewis
Syntactic phrase indexing and term clustering have been widely explored as text representation techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing languages on a text categorization task, enabling us to study their properties in isolation from query interpretation issues. We show that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing. We also present results suggesting that traditional term clustering methods are unlikely to provide significantly improved text representations. An improved probabilistic text categorization method is also presented.
Human Language Technology | 1992
David D. Lewis
The effect of selecting varying numbers and kinds of features for use in predicting category membership was investigated on the Reuters and MUC-3 text categorization data sets. Good categorization performance was achieved using a statistical classifier and a proportional assignment strategy. The optimal feature set size for word-based indexing was found to be surprisingly low (10 to 15 features) despite the large training sets. The extraction of new text features by syntactic analysis and feature clustering was investigated on the Reuters data set. Syntactic indexing phrases, clusters of these phrases, and clusters of words were all found to provide less effective representations than individual words.
Human Language Technology | 1991
David D. Lewis
While certain standard procedures are widely used for evaluating text retrieval systems and algorithms, the same is not true for text categorization. Omission of important data from reports is common and methods of measuring effectiveness vary widely. This has made judging the relative merits of techniques for text categorization difficult and has disguised important research issues. In this paper I discuss a variety of ways of evaluating the effectiveness of text categorization systems, drawing both on reported categorization experiments and on methods used in evaluating query-driven retrieval. I also consider the extent to which the same evaluation methods may be used with systems for text extraction, a more complex task. In evaluating either kind of system, the purpose for which the output is to be used is crucial in choosing appropriate evaluation methods.
Human Language Technology | 1990
David D. Lewis
The way in which text is represented has a strong impact on the performance of text classification (retrieval and categorization) systems. We discuss the operation of text classification systems, introduce a theoretical model of how text representation impacts their performance, and describe how the performance of text classification systems is evaluated. We then present the results of an experiment on improving text representation quality, as well as an analysis of the results and the directions they suggest for future research.
MUC3 '91 Proceedings of the 3rd conference on Message understanding | 1991
David D. Lewis
The data extraction systems studied in the MUC-3 evaluation perform a variety of subtasks in filling out templates. Some of these tasks are quite complex, and seem to require a system to represent the structure of a text in some detail to perform the task successfully. Capturing reference relations between slot fillers, distinguishing between historic and recent events, and many other subtasks appear to have this character.
Archive | 1994
David D. Lewis; Marc Ringuette
Archive | 1991
David D. Lewis
Computational Linguistics | 1993
Nancy Chinchor; David D. Lewis; Lynette Hirschman
International Conference on Machine Learning | 1991
David D. Lewis
MUC4 '92 Proceedings of the 4th conference on Message understanding | 1992
David D. Lewis; Richard M. Tong