Chris Brockett | Researchain

Publication

Featured research published by Chris Brockett.

international conference on computational linguistics | 2004

Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources

Bill Dolan; Chris Quirk; Chris Brockett

We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluster. We evaluate both datasets using a word alignment algorithm and a metric borrowed from machine translation. Results show that edit distance data is cleaner and more easily aligned than the heuristic data, with an overall alignment error rate (AER) of 11.58% on a similarly extracted test set. On test data extracted by the heuristic strategy, however, performance of the two training sets is similar, with AERs of 13.2% and 14.7% respectively. Analysis of 100 pairs of sentences from each set reveals that the edit distance data lacks many of the complex lexical and syntactic alternations that characterize monolingual paraphrase. The summary sentences, while less readily alignable, retain more of the non-trivial alternations that are of greatest interest for learning paraphrase relationships.
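
As an illustration of the first technique, the following is a minimal sketch of pairing sentences by string edit distance. It assumes a word-level Levenshtein distance over lowercased, whitespace-split tokens; the function names and the `max_distance` threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def word_edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance between two sentences."""
    x, y = a.lower().split(), b.lower().split()
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        curr = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            curr.append(min(prev[j] + 1,          # delete a word from x
                            curr[j - 1] + 1,      # insert a word from y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidate_pairs(sentences: List[str],
                    max_distance: int = 3) -> List[Tuple[str, str, int]]:
    """Return sentence pairs that differ, but by at most max_distance edits.

    max_distance is an illustrative threshold, not a value from the paper.
    """
    pairs = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            d = word_edit_distance(sentences[i], sentences[j])
            if 0 < d <= max_distance:  # skip identical sentences
                pairs.append((sentences[i], sentences[j], d))
    return pairs
```

A low threshold keeps only near-identical pairs, which is consistent with the abstract's observation that edit distance data is clean but lacks the more complex alternations.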

north american chapter of the association for computational linguistics | 2015

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses

Alessandro Sordoni; Michel Galley; Michael Auli; Chris Brockett; Yangfeng Ji; Margaret Mitchell; Jian-Yun Nie; Jianfeng Gao; Bill Dolan

We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.

Information Processing and Management | 2007

Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion

Lucy Vanderwende; Hisami Suzuki; Chris Brockett; Ani Nenkova

In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.

meeting of the association for computational linguistics | 2006

Correcting ESL Errors Using Phrasal SMT Techniques

Chris Brockett; William B. Dolan; Michael Gamon

This paper presents a pilot study of the use of phrasal Statistical Machine Translation (SMT) techniques to identify and correct writing errors made by learners of English as a Second Language (ESL). Using examples of mass noun errors found in the Chinese Learner Error Corpus (CLEC) to guide creation of an engineered training set, we show that application of the SMT paradigm can capture errors not well addressed by widely used proofing tools designed for native speakers. Our system was able to correct 61.81% of mistakes in a set of naturally occurring examples of mass noun errors found on the World Wide Web, suggesting that efforts to collect alignable corpora of pre- and post-editing ESL writing samples can enable the development of SMT-based writing assistance tools capable of repairing many of the complex syntactic and lexical problems found in the writing of ESL learners.

meeting of the association for computational linguistics | 2016

A Persona-Based Neural Conversation Model

Jiwei Li; Michel Galley; Chris Brockett; Georgios P. Spithourakis; Jianfeng Gao; Bill Dolan

We present persona-based models for handling the issue of speaker consistency in neural response generation. A speaker model encodes personas in distributed embeddings that capture individual characteristics such as background information and speaking style. A dyadic speaker-addressee model captures properties of interactions between two interlocutors. Our models yield qualitative performance improvements in both perplexity and BLEU scores over baseline sequence-to-sequence models, with similar gains in speaker consistency as measured by human judges.

meeting of the association for computational linguistics | 2001

A Machine Learning Approach to the Automatic Evaluation of Machine Translation

Simon Corston-Oliver; Michael Gamon; Chris Brockett

We present a machine learning approach to evaluating the well-formedness of output of a machine translation system, using classifiers that learn to distinguish human reference translations from machine translations. This approach can be used to evaluate an MT system, tracking improvements over time; to aid in the kind of failure analysis that can help guide system development; and to select among alternative output strings. The method presented is fully automated and independent of source language, target language and domain.

international joint conference on natural language processing | 2015

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets

Michel Galley; Chris Brockett; Alessandro Sordoni; Yangfeng Ji; Michael Auli; Chris Quirk; Margaret Mitchell; Jianfeng Gao; Bill Dolan

We introduce Discriminative BLEU (∆BLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [−1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses, ∆BLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in terms of both Spearman’s ρ and Kendall’s τ.
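
To make the idea of weighting references concrete, the following is a simplified sketch of a rating-weighted n-gram precision. It is an assumption-laden illustration of the general approach described above, not the paper's exact formulation: it omits the brevity penalty and the combination across n-gram orders, and the function names and data layout are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_precision(hypothesis: List[str],
                       references: List[Tuple[List[str], float]],
                       n: int = 1) -> float:
    """Rating-weighted n-gram precision (simplified, single n-gram order).

    references is a list of (tokens, weight) pairs, where weight is a
    human quality rating in [-1, +1]. Matching a poorly rated reference
    lowers the score instead of raising it.
    """
    hyp = ngrams(hypothesis, n)
    ref_counts = [(ngrams(tokens, n), w) for tokens, w in references]
    max_w = max(w for _, w in references)
    numer = 0.0
    denom = 0.0
    for gram, count in hyp.items():
        # Clipped count against each reference containing the n-gram,
        # scaled by that reference's rating; keep the best-scoring one.
        matches = [w * min(count, rc[gram]) for rc, w in ref_counts if gram in rc]
        if matches:
            numer += max(matches)
        denom += max_w * count
    # Guard against empty hypotheses or all non-positive ratings.
    return numer / denom if denom > 0 else 0.0
```

For example, with references `[("i am fine thanks".split(), 1.0), ("i am bad".split(), -0.5)]`, the hypothesis `"i am bad".split()` is penalized for the unigram it shares only with the negatively rated reference.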

workshop on innovative use of nlp for building educational applications | 2009

User Input and Interactions on Microsoft Research ESL Assistant

Claudia Leacock; Michael Gamon; Chris Brockett

ESL Assistant is a prototype web-based writing-assistance tool that is being developed for English Language Learners. The system focuses on types of errors that are typically made by non-native writers of American English. A freely-available prototype was deployed in June 2008. User data from this system are manually evaluated to identify writing domain and measure system accuracy. Combining the user log data with the evaluated rewrite suggestions enables us to determine how effectively English language learners are using the system, across rule types and across writing domains. We find that repeat users typically make informed choices and can distinguish correct suggestions from incorrect.

international conference on computational linguistics | 2002

English-Japanese example-based machine translation using abstract linguistic representations

Chris Brockett; Takako Aikawa; Anthony Aue; Arul Menezes; Chris Quirk; Hisami Suzuki

This presentation describes an example-based English-Japanese machine translation system in which an abstract linguistic representation layer is used to extract and store bilingual translation knowledge, transfer patterns between languages, and generate output strings. Abstraction permits structural neutralizations that facilitate learning of translation examples across languages with radically different surface structure characteristics, and allows MT development to proceed within a largely language-independent NLP architecture. Comparative evaluation indicates that after training in a domain the English-Japanese system is statistically indistinguishable from a non-customized commercially available MT system in the same domain.

international conference on computational linguistics | 2000

Using a broad-coverage parser for word-breaking in Japanese

Hisami Suzuki; Chris Brockett; Gary Kacmarcik

We describe a method of word segmentation in Japanese in which a broad-coverage parser selects the best word sequence while producing a syntactic analysis. This technique is substantially different from traditional statistics- or heuristics-based models which attempt to select the best word sequence before handing it to the syntactic component. By breaking up the task of finding the best word sequence into the identification of words (in the word-breaking component) and the selection of the best sequence (a by-product of parsing), we have been able to simplify the task of each component and achieve high accuracy over a wide variety of data. Word-breaking accuracy of our system is currently around 97-98%.
