Bill Dolan | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Bill Dolan is active.

Explore More

Publication

Featured researches published by Bill Dolan.

meeting of the association for computational linguistics | 2007

The Third PASCAL Recognizing Textual Entailment Challenge

Danilo Giampiccolo; Bernardo Magnini; Ido Dagan; Bill Dolan

This paper presents the Third PASCAL Recognising Textual Entailment Challenge (RTE-3), providing an overview of the dataset creating methodology and the submitted systems. In creating this years dataset, a number of longer texts were introduced to make the challenge more oriented to realistic scenarios. Additionally, a pool of resources was offered so that the participants could share common tools. A pilot task was also set up, aimed at differentiating unknown entailments from identified contradictions and providing justifications for overall system decisions. 26 participants submitted 44 runs, using different approaches and generally presenting new entailment models and achieving higher scores than in the previous challenges.

international conference on computational linguistics | 2004

Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources

Bill Dolan; Chris Quirk; Chris Brockett

We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluster. We evaluate both datasets using a word alignment algorithm and a metric borrowed from machine translation. Results show that edit distance data is cleaner and more easily-aligned than the heuristic data, with an overall alignment error rate (AER) of 11.58% on a similarly-extracted test set. On test data extracted by the heuristic strategy, however, performance of the two training sets is similar, with AERs of 13.2% and 14.7% respectively. Analysis of 100 pairs of sentences from each set reveals that the edit distance data lacks many of the complex lexical and syntactic alternations that characterize monolingual paraphrase. The summary sentences, while less readily alignable, retain more of the non-trivial alternations that are of greatest interest learning paraphrase relationships.

north american chapter of the association for computational linguistics | 2015

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses

Alessandro Sordoni; Michel Galley; Michael Auli; Chris Brockett; Yangfeng Ji; Margaret Mitchell; Jian-Yun Nie; Jianfeng Gao; Bill Dolan

We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.

Natural Language Engineering | 2009

Recognizing textual entailment: Rational, evaluation and approaches

Ido Dagan; Bill Dolan; Bernardo Magnini; Dan Roth

The goal of identifying textual entailment – whether one piece of text can be plausibly inferred from another – has emerged in recent years as a generic core problem in natural language understanding. Work in this area has been largely driven by the PASCAL Recognizing Textual Entailment (RTE) challenges, which are a series of annual competitive meetings. The current work exhibits strong ties to some earlier lines of research, particularly automatic acquisition of paraphrases and lexical semantic relationships and unsupervised inference in applications such as question answering, information extraction and summarization. It has also opened the way to newer lines of research on more involved inference methods, on knowledge representations needed to support this natural language understanding challenge and on the use of learning methods in this context. RTE has fostered an active and growing community of researchers focused on the problem of applied entailment. This special issue of the JNLE provides an opportunity to showcase some of the most important work in this emerging area.

meeting of the association for computational linguistics | 2016

A Persona-Based Neural Conversation Model

Jiwei Li; Michel Galley; Chris Brockett; Georgios P. Spithourakis; Jianfeng Gao; Bill Dolan

We present persona-based models for handling the issue of speaker consistency in neural response generation. A speaker model encodes personas in distributed embeddings that capture individual characteristics such as background information and speaking style. A dyadic speaker-addressee model captures properties of interactions between two interlocutors. Our models yield qualitative performance improvements in both perplexity and BLEU scores over baseline sequence-to-sequence models, with similar gains in speaker consistency as measured by human judges.

north american chapter of the association for computational linguistics | 2015

SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)

Wei Xu; Chris Callison-Burch; Bill Dolan

In this shared task, we present evaluations on two related tasks Paraphrase Identification (PI) and Semantic Textual Similarity (SS) systems for the Twitter data. Given a pair of sentences, participants are asked to produce a binary yes/no judgement or a graded score to measure their semantic equivalence. The task features a newly constructed Twitter Paraphrase Corpus that contains 18,762 sentence pairs. A total of 19 teams participated, submitting 36 runs to the PI task and 26 runs to the SS task. The evaluation shows encouraging results and open challenges for future research. The best systems scored a F1-measure of 0.674 for the PI task and a Pearson correlation of 0.619 for the SS task respectively, comparing to a strong baseline using logistic regression model of 0.589 F1 and 0.511 Pearson; while the best SS systems can often reach >0.80 Pearson on well-formed text. This shared task also provides insights into the relation between the PI and SS tasks and suggests the importance to bringing these two research areas together. We make all the data, baseline systems and evaluation scripts publicly available. 1

international joint conference on natural language processing | 2015

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets

Michel Galley; Chris Brockett; Alessandro Sordoni; Yangfeng Ji; Michael Auli; Chris Quirk; Margaret Mitchell; Jianfeng Gao; Bill Dolan

We introduce Discriminative BLEU (∆BLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [−1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses, ∆BLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in terms of both Spearman’s ρ and Kendall’s τ .

Natural Language Engineering | 2010

Recognizing textual entailment: Rational, evaluation and approaches – Erratum

Ido Dagan; Bill Dolan; Bernardo Magnini; Dan Roth

Due to publisher error, this article was omitted from the printed issue of Natural Language Engineering volume 15 issue 4. It is published online in the correct volume ( journals.cambridge.org/nle ) and also printed here in volume 16 issue 1. Sincere apologies are extended to the authors for this error.

human factors in computing systems | 2018

Emotional Dialogue Generation using Image-Grounded Language Models

Bernd Huber; Daniel McDuff; Chris Brockett; Michel Galley; Bill Dolan

Computer-based conversational agents are becoming ubiquitous. However, for these systems to be engaging and valuable to the user, they must be able to express emotion, in addition to providing informative responses. Humans rely on much more than language during conversations; visual information is key to providing context. We present the first example of an image-grounded conversational agent using visual sentiment, facial expression and scene features. We show that key qualities of the generated dialogue can be manipulated by the features used for training the agent. We evaluate our model on a large and very challenging real-world dataset of conversations from social media (Twitter). The image-grounding leads to significantly more informative, emotional and specific responses, and the exact qualities can be tuned depending on the image features used. Furthermore, our model improves the objective quality of dialogue responses when evaluated on standard natural language metrics.

north american chapter of the association for computational linguistics | 2010