Mark Kantrowitz
Carnegie Mellon University
Publication
Featured research published by Mark Kantrowitz.
international acm sigir conference on research and development in information retrieval | 1999
Jade Goldstein; Mark Kantrowitz; Vibhu O. Mittal; Jaime G. Carbonell
Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents our analysis of news-article summaries generated by sentence selection. Sentences are ranked for potential inclusion in the summary using a weighted combination of statistical and linguistic features. The statistical features were adapted from standard IR methods. The potential linguistic ones were derived from an analysis of news-wire summaries. To evaluate these features we use a normalized version of precision-recall curves, with a baseline of random sentence selection, as well as analyze the properties of such a baseline. We illustrate our discussions with empirical results showing the importance of corpus-dependent baseline summarization standards, compression ratios and carefully crafted long queries.
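The ranking mechanism described above, scoring each sentence by a weighted combination of features and selecting the top-ranked sentences, can be sketched roughly as follows. The specific features and weights here are illustrative assumptions, not the paper's actual feature set.

```python
# Minimal sketch of sentence-selection summarization: score each sentence by a
# weighted sum of simple feature values, then take the top-ranked sentences.
# The features and weights below are illustrative, not the authors' own.
from collections import Counter


def score_sentences(sentences, weights):
    """Rank sentences by a weighted combination of feature scores."""
    # Statistical feature: term-frequency salience over the whole document.
    doc_tf = Counter(w.lower() for s in sentences for w in s.split())
    scored = []
    for position, sent in enumerate(sentences):
        words = [w.lower() for w in sent.split()]
        features = {
            # IR-style salience, normalized by sentence length.
            "tf_salience": sum(doc_tf[w] for w in words) / max(len(words), 1),
            # Positional cue: lead sentences in news articles tend to matter more.
            "position": 1.0 / (1 + position),
        }
        score = sum(weights[name] * value for name, value in features.items())
        scored.append((score, sent))
    return sorted(scored, reverse=True)


sentences = [
    "The company announced record profits this quarter.",
    "Analysts had expected weaker results.",
    "The weather in the region was unusually mild.",
]
ranking = score_sentences(sentences, {"tf_salience": 0.6, "position": 0.4})
summary = [sent for _, sent in ranking[:1]]  # compression ratio: one sentence
print(summary)
```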
north american chapter of the association for computational linguistics | 2000
Jade Goldstein; Vibhu O. Mittal; Jaime G. Carbonell; Mark Kantrowitz
This paper discusses a text extraction approach to multi-document summarization that builds on single-document summarization methods by using additional, available information about the document set as a whole and the relationships between the documents. Multi-document summarization differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical in the formation of useful summaries. Our approach addresses these issues by using domain-independent techniques based mainly on fast, statistical processing, a metric for reducing redundancy and maximizing diversity in the selected passages, and a modular framework to allow easy parameterization for different genres, corpora characteristics and user requirements.
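One common way to realize a metric that "reduces redundancy and maximizes diversity" is a greedy, maximal-marginal-relevance-style selection; the sketch below assumes that formulation, and the similarity function and trade-off parameter are illustrative assumptions rather than the paper's actual method.

```python
# Rough sketch of redundancy-reduced passage selection in the spirit of
# maximal marginal relevance (MMR): greedily pick passages that are relevant
# to the query but dissimilar to passages already selected.
def word_overlap(a, b):
    """Crude similarity: Jaccard overlap of word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)


def select_passages(query, passages, k, lam=0.7):
    """Greedily choose k passages, balancing relevance against redundancy."""
    selected = []
    remaining = list(passages)
    while remaining and len(selected) < k:
        def mmr_score(p):
            relevance = word_overlap(query, p)
            redundancy = max((word_overlap(p, s) for s in selected), default=0.0)
            # lam trades off relevance to the query against novelty.
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```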
natural language generation | 1992
Mark Kantrowitz; Joseph Bates
Many existing natural language generation systems can be characterized according to their modularization as either pipelined or interleaved. In these separated systems, the generator is divided into several modules (e.g., planning and realization), with control and information passing between the modules during the generation process. This paper proposes a third type of generator, which we call integrated, that unifies the modules into a single mechanism. The mechanism uses a small set of orthogonal basic operations to produce planned and grammatical language output. Integrated systems are conceptually attractive and may support generation of pragmatic effects more effectively than other systems. After discussing the advantages of the integrated approach, we summarize GLINDA, an integrated generator currently under development at Carnegie Mellon. GLINDA is the generator used for narration and intercharacter communication in the Oz Interactive Fiction and Virtual Reality Project.
international acm sigir conference on research and development in information retrieval | 2000
Mark Kantrowitz; Behrang Mohit; Vibhu O. Mittal
High precision IR is often hard for a variety of reasons; one of these is the large number of morphological variants for any given term. To address some of the issues arising from a mismatch between different word forms used in the queries and the relevant documents, researchers have long proposed the use of various stemming algorithms to reduce terms to a common base form. Stemming, in general, has not been an unmitigated success in improving IR. This poster argues that stemming can help in certain contexts and that an empirical investigation of the relationship between stemming performance and retrieval performance can be valuable. We extend previously reported work on stemming and IR (e.g., [2,4,5]) by using a novel, dictionary-based “perfect” stemmer, which can be parametrized for different accuracy and coverage levels. This allows us to measure changes in IR performance for any given change in stemming performance on a given data set. To place this work in context, we discuss an empirical evaluation of stemming accuracy for three stemming algorithms – including the widely used Porter algorithm [9]. Section 2 briefly discusses the three variants of stemming, and presents experimental evidence for their relative coverage and accuracy. Section 3 discusses the use of these stemmers for IR and presents some of our findings. Finally, Section 4 concludes with a discussion of our results and possible future directions.
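As a toy illustration of the idea of a coverage-parametrized dictionary stemmer (an assumption for illustration only, not the poster's implementation), one can build a stemmer that consults only a sampled fraction of its dictionary, so that lowering coverage leaves more word forms unstemmed and lets one observe how retrieval quality changes as stemming degrades.

```python
# Toy coverage-parametrized dictionary stemmer (illustrative assumption).
# Lower coverage means fewer dictionary entries are consulted, so more word
# forms pass through unstemmed.
import random


def make_stemmer(stem_dict, coverage=1.0, seed=0):
    """Return a stemmer that only knows a `coverage` fraction of the dictionary."""
    rng = random.Random(seed)
    kept = {w: s for w, s in stem_dict.items() if rng.random() < coverage}

    def stem(word):
        # Out-of-dictionary forms fall through unchanged.
        return kept.get(word.lower(), word.lower())

    return stem


stem_dict = {"running": "run", "ran": "run", "mice": "mouse", "queries": "query"}
full = make_stemmer(stem_dict, coverage=1.0)
partial = make_stemmer(stem_dict, coverage=0.5)
print(full("running"), partial("running"))
```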
Intelligence\/sigart Bulletin | 1994
Mark Kantrowitz
This book is an updated version of Dale's 1988 PhD thesis at the University of Edinburgh's Centre for Cognitive Science. It will be of interest to all natural language generation researchers because it serves as a practical guide to the theory and methods of generating referring expressions. The key contributions of this work are the ontology and the algorithms for generating referring expressions.
national conference on artificial intelligence | 1999
Vibhu O. Mittal; Mark Kantrowitz; Jade Goldstein; Jaime G. Carbonell
Archive | 2000
Michael J. Witbrock; Mark Kantrowitz; Vibhu Mittal
Archive | 1999
Michele Banko; Vibhu O. Mittal; Mark Kantrowitz; Jade Goldstein
Archive | 1993
Mark Kantrowitz