Vibhu O. Mittal
Jordan University of Science and Technology
Publications
Featured research published by Vibhu O. Mittal.
international acm sigir conference on research and development in information retrieval | 1999
Jade Goldstein; Mark Kantrowitz; Vibhu O. Mittal; Jaime G. Carbonell
Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents our analysis of news-article summaries generated by sentence selection. Sentences are ranked for potential inclusion in the summary using a weighted combination of statistical and linguistic features. The statistical features were adapted from standard IR methods. The potential linguistic ones were derived from an analysis of news-wire summaries. To evaluate these features we use a normalized version of precision-recall curves, with a baseline of random sentence selection, and analyze the properties of such a baseline. We illustrate our discussions with empirical results showing the importance of corpus-dependent baseline summarization standards, compression ratios and carefully crafted long queries.
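The ranking scheme the abstract describes can be sketched as a weighted feature combination. This is a minimal illustration, not the paper's actual feature set or weights: it combines one statistical feature (average term frequency, an IR-style cue) with one linguistic/positional feature (news articles tend to front-load important sentences).

```python
from collections import Counter

def score_sentences(sentences, weights=(1.0, 1.0)):
    """Rank sentences by a weighted sum of a statistical feature
    (average term frequency) and a positional cue (earlier sentences
    in news text tend to matter more). Illustrative weights only."""
    w_tf, w_pos = weights
    tokenized = [s.lower().split() for s in sentences]
    tf = Counter(t for toks in tokenized for t in toks)  # corpus-wide term counts
    scores = []
    for i, toks in enumerate(tokenized):
        stat = sum(tf[t] for t in toks) / max(len(toks), 1)  # statistical feature
        pos = 1.0 / (i + 1)                                  # linguistic/position feature
        scores.append((w_tf * stat + w_pos * pos, i))
    return [sentences[i] for _, i in sorted(scores, reverse=True)]
```

A real system would add more features (cue phrases, query overlap) and tune the weights against a corpus of reference summaries.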
north american chapter of the association for computational linguistics | 2000
Jade Goldstein; Vibhu O. Mittal; Jaime G. Carbonell; Mark Kantrowitz
This paper discusses a text extraction approach to multi-document summarization that builds on single-document summarization methods by using additional, available information about the document set as a whole and the relationships between the documents. Multi-document summarization differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical in the formation of useful summaries. Our approach addresses these issues by using domain-independent techniques based mainly on fast, statistical processing, a metric for reducing redundancy and maximizing diversity in the selected passages, and a modular framework to allow easy parameterization for different genres, corpora characteristics and user requirements.
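The redundancy/diversity metric referred to here is Maximal Marginal Relevance (MMR), which this line of work introduced. A minimal sketch, with the similarity functions left abstract as caller-supplied callables: each step greedily picks the passage that best trades off query relevance against similarity to passages already chosen.

```python
def mmr_select(candidates, query_sim, pair_sim, k=3, lam=0.7):
    """Maximal Marginal Relevance: greedily select passages that are
    relevant to the query (query_sim) yet dissimilar to those already
    selected (pair_sim). lam=1 is pure relevance; lam=0 pure diversity."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(c):
            redundancy = max((pair_sim(c, s) for s in selected), default=0.0)
            return lam * query_sim(c) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a multi-document setting, `pair_sim` is what suppresses near-duplicate passages drawn from different articles covering the same event.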
international acm sigir conference on research and development in information retrieval | 2000
Adam L. Berger; Rich Caruana; David A. Cohn; Dayne Freitag; Vibhu O. Mittal
This paper investigates whether a machine can automatically learn the task of finding, within a large collection of candidate responses, the answers to questions. The learning process consists of inspecting a collection of answered questions and characterizing the relation between question and answer with a statistical model. For the purpose of learning this relation, we propose two sources of data: Usenet FAQ documents and customer service call-center dialogues from a large retail company. We will show that the task of “answer-finding” differs from both document retrieval and traditional question-answering, presenting challenges different from those found in these problems. The central aim of this work is to discover, through theoretical and empirical investigation, those statistical techniques best suited to the answer-finding problem.
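One of the baselines such work is measured against is plain IR-style matching: rank candidate answers by tf-idf cosine similarity to the question. The sketch below shows that baseline only (not the paper's learned models); the point of answer-finding is precisely that questions and answers often share few terms, which is where this baseline breaks down.

```python
import math
from collections import Counter

def tfidf_rank(question, answers):
    """IR baseline for answer-finding: rank candidate answers by
    tf-idf cosine similarity to the question text."""
    docs = [a.lower().split() for a in answers]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * math.log((n + 1) / (df[t] + 1)) for t in tf}

    def cos(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    qv = vec(question.lower().split())
    return sorted(answers, key=lambda a: cos(qv, vec(a.lower().split())), reverse=True)
```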
meeting of the association for computational linguistics | 2000
Michele Banko; Vibhu O. Mittal; Michael J. Witbrock
Extractive summarization techniques cannot generate document summaries shorter than a single sentence, something that is often required. An ideal summarization system would understand each document and generate an appropriate summary directly from the results of that understanding. A more practical approach to this problem results in the use of an approximation: viewing summarization as a problem analogous to statistical machine translation. The issue then becomes one of generating a target document in a more concise language from a source document in a more verbose language. This paper presents results on experiments using this approach, in which statistical models of the term selection and term ordering are jointly applied to produce summaries in a style learned from a training corpus.
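The translation analogy can be made concrete with a toy sketch: a term-selection step picks likely headline words from the document, and a term-ordering step scores arrangements with a bigram language model learned from a corpus. This drastically simplifies the paper's statistical models (frequency stands in for a learned selection model, and all orderings are enumerated), but shows the joint selection-plus-ordering structure.

```python
from collections import Counter
from itertools import permutations

def headline(doc_tokens, bigram_counts, length=3):
    """Summarization-as-translation, toy version: select `length` frequent
    words (term selection), then pick the ordering with the best bigram
    score from a training corpus (term ordering). Illustrative only."""
    picked = [w for w, _ in Counter(doc_tokens).most_common(length)]

    def lm_score(seq):
        return sum(bigram_counts.get((a, b), 0) for a, b in zip(seq, seq[1:]))

    return max(permutations(picked), key=lm_score)
```

Enumerating permutations is exponential; the actual approach uses a search procedure over the models rather than brute force.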
international acm sigir conference on research and development in information retrieval | 2000
Adam L. Berger; Vibhu O. Mittal
We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has focused on the task of summarizing news articles, web pages are quite different in both structure and content. Instead of coherent text with a well-defined discourse structure, they are more often likely to be a chaotic jumble of phrases, links, graphics and formatting commands. Such text provides little foothold for extractive summarization techniques, which attempt to generate a summary of a document by excerpting a contiguous, coherent span of text from it. This paper builds upon recent work in non-extractive summarization, producing the gist of a web page by “translating” it into a more concise representation rather than attempting to extract a text span verbatim. OCELOT uses probabilistic models to guide it in selecting and ordering words into a gist. This paper describes a technique for learning these models automatically from a collection of human-summarized web pages.
meeting of the association for computational linguistics | 2000
Adam L. Berger; Vibhu O. Mittal
This paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have, but as a proxy, we use a collection of FAQ (frequently-asked question) documents. Taking a learning approach enables a principled, quantitative evaluation of the proposed system, and the results of some initial experiments---on a collection of Usenet FAQs and on a FAQ-like set of customer-submitted questions to several large retail companies---suggest the plausibility of learning for summarization.
computational intelligence | 2000
Shumeet Baluja; Vibhu O. Mittal; Rahul Sukthankar
This paper describes a machine learning approach to building an efficient and accurate name spotting system. Finding names in free text is an important task in many text‐based applications. Most previous approaches were based on hand‐crafted modules encoding language and genre‐specific knowledge. These approaches had at least two shortcomings: They required large amounts of time and expertise to develop and were not easily portable to new languages and genres. This paper describes an extensible system that automatically combines weak evidence from different, easily available sources: parts‐of‐speech tags, dictionaries, and surface‐level syntactic information such as capitalization and punctuation. Individually, each piece of evidence is insufficient for robust name detection. However, the combination of evidence, through standard machine learning techniques, yields a system that achieves performance equivalent to the best existing hand‐crafted approaches.
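The combination of weak evidence sources can be illustrated with a tiny scorer over three of the cues the abstract names: capitalization, dictionary membership, and a surface positional cue. The weights below are made up for illustration; the paper's contribution is that such weights are learned with standard machine learning rather than hand-set.

```python
def name_score(token, prev_token, name_dictionary, weights=None):
    """Combine weak evidence for name spotting: capitalization, a
    dictionary of known names, and a not-at-sentence-start cue.
    Hand-set weights here are purely illustrative; the real system
    learns the combination from data."""
    weights = weights or {"cap": 0.4, "dict": 0.4, "not_sentence_start": 0.2}
    evidence = {
        "cap": token[:1].isupper(),
        "dict": token.lower() in name_dictionary,
        "not_sentence_start": prev_token not in (None, ".", "!", "?"),
    }
    return sum(w for cue, w in weights.items() if evidence[cue])
```

Individually each cue misfires (every sentence-initial word is capitalized; dictionaries are incomplete), which is exactly why the combination outperforms any single source.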
international acm sigir conference on research and development in information retrieval | 1999
Michael J. Witbrock; Vibhu O. Mittal
Abstract Using current extractive summarization techniques, it is impossible to produce a coherent document summary shorter than a single sentence, or to produce a summary that conforms to particular stylistic constraints. Ideally, one would prefer to understand the document, and to generate an appropriate summary directly from the results of that understanding. Absent a comprehensive natural language understanding system, an approximation must be used. This paper presents an alternative statistical model of a summarization process, which jointly applies statistical models of the term selection and term ordering process to produce brief coherent summaries in a style learned from a training corpus.
international acm sigir conference on research and development in information retrieval | 2000
Mark Kantrowitz; Behrang Mohit; Vibhu O. Mittal
High precision IR is often hard for a variety of reasons; one of these is the large number of morphological variants for any given term. To address some of the issues arising from a mismatch between different word forms used in the queries and the relevant documents, researchers have long proposed the use of various stemming algorithms to reduce terms to a common base form. Stemming, in general, has not been an unmitigated success in improving IR. This poster argues that stemming can help in certain contexts and that an empirical investigation of the relationship between stemming performance and retrieval performance can be valuable. We extend previously reported work on stemming and IR (e.g., [2,4,5]) by using a novel, dictionary based “perfect” stemmer, which can be parameterized for different accuracy and coverage levels. This allows us to measure changes in IR performance for any given change in stemming performance on a given data set. To place this work in context, we discuss an empirical evaluation of stemming accuracy for three stemming algorithms – including the widely used Porter algorithm [9]. Section 2 briefly discusses the three variants of stemming, and presents experimental evidence for their relative coverage and accuracy. Section 3 discusses the use of these stemmers for IR and presents some of our findings. Finally, Section 4 concludes with a discussion of our results and possible future directions.
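The parameterized "perfect" stemmer idea can be sketched as follows, under the assumption that coverage is modeled as the probability of applying a known word-to-stem mapping; the paper's own parameterization may differ. Dialing `coverage` between 0 and 1 lets one plot retrieval quality against stemming quality.

```python
import random

def make_dictionary_stemmer(stem_table, coverage=1.0, seed=0):
    """Dictionary-based "perfect" stemmer, parameterized by coverage:
    with probability (1 - coverage) a known word is left unstemmed.
    `stem_table` maps inflected forms to base forms; the coverage knob
    is an assumption made for this sketch."""
    rng = random.Random(seed)  # seeded for reproducible experiments

    def stem(word):
        if word in stem_table and rng.random() < coverage:
            return stem_table[word]
        return word

    return stem
```

Running the same IR test collection through stemmers built at several coverage levels gives the stemming-performance vs. retrieval-performance curve the poster describes.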
Expert Systems With Applications | 1995
Vibhu O. Mittal; Cécile Paris
Abstract Explanations for expert systems are best provided in context, and, recently, many systems have used some notion of context in different ways in their explanation module. For example, some explanation systems take into account a user model. Others generate an explanation depending on the preceding and current discourse. In this article, we bring together these different notions of context as elements of a global picture that might be taken into account by an explanation module, depending on the needs of the application and the user. We characterize each of these elements, describe the constraints they place on communication, and present examples to illustrate the points being made. We discuss the implications of these different aspects of context on the design of explanation facilities. Finally, we describe, and illustrate with examples, an implemented intention-based planning framework for explanation that can take into account the different aspects of context discussed above.