Publication


Featured research published by Rebecca J. Passonneau.


North American Chapter of the Association for Computational Linguistics | 2004

Evaluating Content Selection in Summarization: The Pyramid Method

Ani Nenkova; Rebecca J. Passonneau

We present an empirically grounded method for evaluating content selection in summarization. It incorporates the idea that no single best model summary for a collection of documents exists. Our method quantifies the relative importance of facts to be conveyed. We argue that it is reliable, predictive and diagnostic, and thus improves considerably on the human evaluation method currently used in the Document Understanding Conference.


ACM Transactions on Speech and Language Processing | 2007

The Pyramid Method: Incorporating human content selection variation in summarization evaluation

Ani Nenkova; Rebecca J. Passonneau; Kathleen R. McKeown

Human variation in content selection in summarization has given rise to some fundamental research questions: How can one incorporate the observed variation in suitable evaluation measures? How can such measures reflect the fact that summaries conveying different content can be equally good and informative? In this article, we address these very questions by proposing a method for analysis of multiple human abstracts into semantic content units. Such analysis allows us not only to quantify human variation in content selection, but also to assign an empirical importance weight to different content units. It serves as the basis for an evaluation method, the Pyramid Method, that incorporates the observed variation and is predictive of different equally informative summaries. We discuss the reliability of content unit annotation, the properties of Pyramid scores, and their correlation with other evaluation methods.
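
As a point of reference for the scoring described above, the sketch below computes the original pyramid score for a toy example: an SCU's weight is the number of model summaries expressing it, and a peer summary's score is its total SCU weight divided by the best total achievable with the same number of SCUs. This is a minimal illustration, not the authors' implementation; the SCU sets are invented.

from collections import Counter

def pyramid_score(peer_scus, model_summaries):
    """Original pyramid score: total weight of the peer's SCUs over the best
    total achievable with the same number of SCUs.

    peer_scus        -- set of SCU labels annotated in the peer summary
    model_summaries  -- list of sets of SCU labels, one per human model summary
    """
    # An SCU's weight is the number of model summaries that express it.
    weights = Counter()
    for model in model_summaries:
        weights.update(model)

    observed = sum(weights[scu] for scu in peer_scus)

    # The ideal summary with the same SCU count takes the heaviest SCUs first.
    top = sorted(weights.values(), reverse=True)[:len(peer_scus)]
    max_possible = sum(top)

    return observed / max_possible if max_possible else 0.0

# Toy example: three model summaries, a peer expressing two SCUs.
models = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
print(pyramid_score({"a", "d"}, models))  # (3 + 1) / (3 + 2) = 0.8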


Meeting of the Association for Computational Linguistics | 1995

Combining Multiple Knowledge Sources for Discourse Segmentation

Diane J. Litman; Rebecca J. Passonneau

We predict discourse segment boundaries from linguistic features of utterances, using a corpus of spoken narratives as data. We present two methods for developing segmentation algorithms from training data: hand tuning and machine learning. When multiple types of features are used, results approach human performance on an independent test set (both methods), and using cross-validation (machine learning).
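
The machine-learning route can be pictured as training a classifier over per-utterance features. The sketch below is a toy stand-in, not the original system (which used C4.5 over prosodic, cue-phrase, and referential noun-phrase features); the feature values and labels are invented.

from sklearn.tree import DecisionTreeClassifier

# Each utterance: [pause_before, cue_word_present, no_coreferring_np]
X = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
y = [1, 0, 1, 0, 1, 0]   # 1 = a discourse segment boundary precedes the utterance

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict([[1, 0, 0]]))  # predict whether a new utterance starts a segment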


Language Resources and Evaluation | 2006

Measuring Agreement on Set-valued Items (MASI) for Semantic and Pragmatic Annotation

Rebecca J. Passonneau

Annotation projects dealing with complex semantic or pragmatic phenomena face the dilemma of creating annotation schemes that oversimplify the phenomena, or that capture distinctions conventional reliability metrics cannot measure adequately. The solution to the dilemma is to develop metrics that quantify the decisions that annotators are asked to make. This paper discusses MASI, a distance metric for comparing sets, and illustrates its use in quantifying the reliability of a specific dataset. Annotations of Summary Content Units (SCUs) generate models referred to as pyramids which can be used to evaluate unseen human summaries or machine summaries. The paper presents reliability results for five pairs of pyramids created for document sets from the 2003 Document Understanding Conference (DUC). The annotators worked independently of each other. Differences between application of MASI to pyramid annotation and its previous application to co-reference annotation are discussed. In addition, it is argued that a paradigmatic reliability study should relate measures of inter-annotator agreement to independent assessments, such as significance tests of the annotated variables with respect to other phenomena. In effect, what counts as sufficiently reliable inter-annotator agreement depends on the use to which the annotated data will be put.
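
For concreteness, a minimal sketch of the set comparison MASI is built on, following the published definition: the Jaccard overlap of two sets scaled by a monotonicity weight that penalizes non-subset differences more heavily. One minus this value can then serve as the distance plugged into weighted agreement coefficients; this is an illustrative reimplementation, not the paper's code.

def masi(a, b):
    """Jaccard overlap of sets a and b scaled by a monotonicity weight.
    1 - masi(a, b) can be used as a distance in weighted agreement metrics."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    jaccard = len(a & b) / len(a | b)
    if a == b:
        m = 1.0          # identical sets
    elif a <= b or b <= a:
        m = 2 / 3        # one set is a proper subset of the other
    elif a & b:
        m = 1 / 3        # overlapping, but neither contains the other
    else:
        m = 0.0          # disjoint sets
    return jaccard * m

print(masi({"x", "y"}, {"x", "y", "z"}))  # 2/3 * 2/3 ~ 0.44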


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

Machine Learning for the New York City Power Grid

Cynthia Rudin; David L. Waltz; Roger N. Anderson; Albert Boulanger; Ansaf Salleb-Aouissi; Maggie Chow; Haimonti Dutta; Philip Gross; Bert Huang; Steve Ierome; Delfina Isaac; Arthur Kressner; Rebecca J. Passonneau; Axinia Radeva; Leon Wu

Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce (1) feeder failure rankings, (2) cable, joint, terminator, and transformer rankings, (3) feeder Mean Time Between Failure (MTBF) estimates, and (4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time, incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of results via cross-validation and blind tests. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City's electrical grid.
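
The supervised-ranking idea can be illustrated with a toy example: fit a model to historical component features and failure labels, then sort current components by predicted risk. The features, data, and feeder names below are invented, and the sketch omits the MTBF estimation and business interface layers the paper describes.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical feeder records: [age_years, past_outages, peak_load_pct]
X_hist = np.array([[35, 4, 95], [10, 0, 60], [22, 1, 80], [40, 6, 99], [5, 0, 50]])
y_hist = np.array([1, 0, 0, 1, 0])        # 1 = failed within the following year

model = LogisticRegression().fit(X_hist, y_hist)

feeders = ["F-101", "F-102", "F-103"]
X_now = np.array([[30, 3, 90], [12, 0, 65], [38, 5, 97]])
risk = model.predict_proba(X_now)[:, 1]

# Maintenance priority list: highest predicted risk first.
for name, r in sorted(zip(feeders, risk), key=lambda t: -t[1]):
    print(f"{name}: {r:.2f}")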


Language Resources and Evaluation | 2004

Computing Reliability for Coreference Annotation

Rebecca J. Passonneau

Coreference annotation is annotation of language corpora to indicate which expressions have been used to co-specify the same discourse entity. When annotations of the same data are collected from two or more coders, the reliability of the data may need to be quantified. Two obstacles have stood in the way of applying reliability metrics: incommensurate units across annotations, and lack of a convenient representation of the coding values. Given N coders and M coding units, reliability is computed from an N-by-M matrix that records the value assigned to unit M_j by coder N_k. The solution I present accommodates a wide range of coding choices for the annotator, while preserving the same units across codings. As a consequence, it permits a straightforward application of reliability measurement. In addition, in coreference annotation, disagreements can be complete or partial, so I incorporate a distance metric to scale disagreements. This method has also been applied to a quite distinct coding task, namely semantic annotation of summaries.
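
A hedged illustration of the representation described above: each cell of the coder-by-unit matrix holds the set of markables a coder grouped with that unit, and a set-based distance gives partial credit for overlapping choices. The example assumes NLTK's agreement module and its MASI distance as a convenient stand-in for the paper's own metric; the markables and codings are invented.

from nltk.metrics.agreement import AnnotationTask
from nltk.metrics.distance import masi_distance

# (coder, unit, value) triples: the value is the set of markables the coder
# placed in the same coreference chain as the unit.
data = [
    ("coder1", "np3", frozenset({"np1", "np3"})),
    ("coder2", "np3", frozenset({"np1", "np2", "np3"})),
    ("coder1", "np4", frozenset({"np4"})),
    ("coder2", "np4", frozenset({"np4"})),
]

task = AnnotationTask(data=data, distance=masi_distance)
print(task.alpha())  # Krippendorff's alpha with partial credit for overlapping sets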


Archive | 2005

Applying the Pyramid Method in DUC 2005

Kathleen R. McKeown; Rebecca J. Passonneau; Ani Nenkova; Sergey Sigelman

In DUC 2005, the pyramid method for content evaluation was used for the first time in a cross-site evaluation. We discuss the method used in creating pyramid models and performing peer annotation. Analysis of score averages for the peers indicates that the best systems score half as well as humans, and that systems can be grouped into better and worse performers. There were few significant differences among systems. High score correlations between sets from different annotators, and good inter-annotator agreement, indicate that participants can perform annotation reliably. We found that a modified pyramid score gave good results and would simplify peer annotation in the future.


Meeting of the Association for Computational Linguistics | 1993

Temporal Centering

Megumi Kameyama; Rebecca J. Passonneau; Massimo Poesio

We present a semantic and pragmatic account of the anaphoric properties of past and perfect that improves on previous work by integrating discourse structure, aspectual type, surface structure and commonsense knowledge. A novel aspect of our account is that we distinguish between two kinds of temporal intervals in the interpretation of temporal operators --- discourse reference intervals and event intervals. This distinction makes it possible to develop an analogy between centering and temporal centering, which operates on discourse reference intervals. Our temporal property-sharing principle is a defeasible inference rule on the logical form. Along with lexical and causal reasoning, it plays a role in incrementally resolving underspecified aspects of the event structure representation of an utterance against the current context.


International Joint Conference on Natural Language Processing | 2015

Abstractive Multi-Document Summarization via Phrase Selection and Merging

Lidong Bing; Piji Li; Yi Liao; Wai Lam; Weiwei Guo; Rebecca J. Passonneau

We propose an abstraction-based multi-document summarization framework that can construct new sentences by exploring more fine-grained syntactic units than sentences, namely, noun/verb phrases. Different from existing abstraction-based approaches, our method first constructs a pool of concepts and facts represented by phrases from the input documents. Then new sentences are generated by selecting and merging informative phrases to maximize the salience of phrases and meanwhile satisfy the sentence construction constraints. We employ integer linear optimization for conducting phrase selection and merging simultaneously in order to achieve the global optimal solution for a summary. Experimental results on the benchmark data set TAC 2011 show that our framework outperforms the state-of-the-art models under the automated pyramid evaluation metric, and achieves reasonably good results in manual linguistic quality evaluation.
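
A greatly simplified sketch of the ILP component: binary selection variables over candidate phrases, a salience-maximizing objective, and a length budget standing in for the sentence construction constraints (the actual model also handles phrase merging). The phrases, salience scores, and budget are invented; the example assumes the PuLP library.

from pulp import LpProblem, LpMaximize, LpVariable, lpSum

phrases = {              # candidate noun/verb phrases with salience scores
    "the quake struck": 3.0,
    "off the coast": 1.5,
    "killing dozens": 2.5,
    "rescue teams arrived": 2.0,
}
length = {p: len(p.split()) for p in phrases}
budget = 8               # word budget standing in for length constraints

prob = LpProblem("phrase_selection", LpMaximize)
x = {p: LpVariable(f"x{i}", cat="Binary") for i, p in enumerate(phrases)}

prob += lpSum(phrases[p] * x[p] for p in phrases)           # maximize total salience
prob += lpSum(length[p] * x[p] for p in phrases) <= budget  # stay within the budget

prob.solve()
print([p for p in phrases if x[p].value() == 1])             # selected phrases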


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Do Summaries Help? A Task-Based Evaluation of Multi-Document Summarization

Julia Hirschberg; Kathleen R. McKeown; Rebecca J. Passonneau; David K. Elson; Ani Nenkova

We describe a task-based evaluation to determine whether multi-document summaries measurably improve user performance when using online news browsing systems for directed research. We evaluated the multi-document summaries generated by Newsblaster, a robust news browsing system that clusters online news articles and summarizes multiple articles on each event. Four groups of subjects were asked to perform the same time-restricted fact-gathering tasks, reading news under different conditions: no summaries at all, single sentence summaries drawn from one of the articles, Newsblaster multi-document summaries, and human summaries. Our results show that, in comparison to source documents only, the quality of reports assembled using Newsblaster summaries was significantly better and user satisfaction was higher with both Newsblaster and human summaries.

Collaboration


Dive into Rebecca J. Passonneau's collaborations.

Top Co-Authors

Susan L. Epstein

City University of New York

Tiziana Ligorio

City University of New York

Ani Nenkova

University of Pennsylvania
