Publication


Featured research published by Ellie Pavlick.


international joint conference on natural language processing | 2015

PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification

Ellie Pavlick; Pushpendre Rastogi; Juri Ganitkevitch; Benjamin Van Durme; Chris Callison-Burch

We present a new release of the Paraphrase Database. PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0’s heuristic rankings. Each paraphrase pair in the database now also includes fine-grained entailment relations, word embedding similarities, and style annotations.


international joint conference on natural language processing | 2015

Adding Semantics to Data-Driven Paraphrasing

Ellie Pavlick; Johannes Bos; Malvina Nissim; Charley Beller; Benjamin Van Durme; Chris Callison-Burch

We add an interpretable semantics to the paraphrase database (PPDB). To date, the relationship between phrase pairs in the database has been weakly defined as approximately equivalent. We show that these pairs represent a variety of relations, including directed entailment (little girl/girl) and exclusion (nobody/someone). We automatically assign semantic entailment relations to entries in PPDB using features derived from past work on discovering inference rules from text and semantic taxonomy induction. We demonstrate that our model assigns these relations with high accuracy. In a downstream RTE task, our labels rival relations from WordNet and improve the coverage of a proof-based RTE system by 17%.


north american chapter of the association for computational linguistics | 2015

Inducing Lexical Style Properties for Paraphrase and Genre Differentiation.

Ellie Pavlick; Ani Nenkova

We present an intuitive and effective method for inducing style scores on words and phrases. We exploit signal in a phrase’s rate of occurrence across stylistically contrasting corpora, making our method simple to implement and efficient to scale. We show strong results both intrinsically, by correlation with human judgements, and extrinsically, in applications to genre analysis and paraphrasing.
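The idea of scoring a word by its rate of occurrence across stylistically contrasting corpora can be sketched as follows. This is a minimal illustration, not the authors' implementation: the log-ratio formula, the add-one smoothing, whitespace tokenization, and the name `style_scores` are all assumptions made for the example, and the abstract does not specify any of them.

```python
import math
from collections import Counter


def style_scores(corpus_a, corpus_b, smoothing=1.0):
    """Score each word by the log-ratio of its smoothed relative frequency
    in corpus_a versus corpus_b (hypothetical scoring rule).

    Positive scores mean the word is relatively more frequent in corpus_a,
    negative scores mean it is relatively more frequent in corpus_b.
    """
    # Naive lowercase whitespace tokenization, assumed for simplicity.
    counts_a = Counter(w for sent in corpus_a for w in sent.lower().split())
    counts_b = Counter(w for sent in corpus_b for w in sent.lower().split())
    vocab = set(counts_a) | set(counts_b)

    # Additive smoothing keeps words unseen in one corpus from producing log(0).
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)

    return {
        w: math.log((counts_a[w] + smoothing) / total_a)
        - math.log((counts_b[w] + smoothing) / total_b)
        for w in vocab
    }


# Toy corpora standing in for a formal and a casual text collection.
formal = ["we kindly request your assistance", "please request assistance"]
casual = ["gimme a hand please", "gimme help"]
scores = style_scores(formal, casual)
```

In this toy setup, "request" receives a positive score (more typical of the first corpus) and "gimme" a negative one. Because the method only needs counts, it can be computed in a single pass over each corpus.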


international joint conference on natural language processing | 2015

FrameNet+: Fast Paraphrastic Tripling of FrameNet

Ellie Pavlick; Travis Wolfe; Pushpendre Rastogi; Chris Callison-Burch; Mark Dredze; Benjamin Van Durme

We increase the lexical coverage of FrameNet through automatic paraphrasing. We use crowdsourcing to manually filter out bad paraphrases in order to ensure a high-precision resource. Our expanded FrameNet contains an additional 22K lexical units, a 3-fold increase over the current FrameNet, and achieves 40% better coverage when evaluated in a practical setting on New York Times data.


meeting of the association for computational linguistics | 2014

Are Two Heads Better than One? Crowdsourced Translation via a Two-Step Collaboration of Non-Professional Translators and Editors

Rui Yan; Mingkun Gao; Ellie Pavlick; Chris Callison-Burch

Crowdsourcing is a viable mechanism for creating training data for machine translation. It provides a low-cost, fast-turnaround way of processing large volumes of data. However, when compared to professional translation, naive collection of translations from non-professionals yields low-quality results. Careful quality control is necessary for crowdsourcing to work well. In this paper, we examine the challenges of a two-step collaboration process with translation and post-editing by non-professionals. We develop graph-based ranking models that automatically select the best output from multiple redundant versions of translations and edits, and improve translation quality, bringing it closer to that of professionals.


meeting of the association for computational linguistics | 2016

Simple PPDB: A Paraphrase Database for Simplification

Ellie Pavlick; Chris Callison-Burch

We release the Simple Paraphrase Database, a subset of the Paraphrase Database (PPDB) adapted for the task of text simplification. We train a supervised model to associate simplification scores with each phrase pair, producing rankings competitive with state-of-the-art lexical simplification models. Our new simplification database contains 4.5 million paraphrase rules, making it the largest available resource for lexical simplification.


meeting of the association for computational linguistics | 2016

Most "babies" are "little" and most "problems" are "huge": Compositional Entailment in Adjective-Nouns

Ellie Pavlick; Chris Callison-Burch

We examine adjective-noun (AN) composition in the task of recognizing textual entailment (RTE). We analyze behavior of ANs in large corpora and show that, despite conventional wisdom, adjectives do not always restrict the denotation of the nouns they modify. We use natural logic to characterize the variety of entailment relations that can result from AN composition. Predicting these relations depends on context and on commonsense knowledge, making AN composition especially challenging for current RTE systems. We demonstrate the inability of current state-of-the-art systems to handle AN composition in a simplified RTE task which involves the insertion of only a single word.


empirical methods in natural language processing | 2016

The Gun Violence Database: A new task and data set for NLP.

Ellie Pavlick; Heng Ji; Xiaoman Pan; Chris Callison-Burch

We argue that NLP researchers are especially well-positioned to contribute to the national discussion about gun violence. Reasoning about the causes and outcomes of gun violence is typically dominated by politics and emotion, and data-driven research on the topic is stymied by a shortage of data and a lack of federal funding. However, data abounds in the form of unstructured text from news articles across the country. This is an ideal application of NLP technologies, such as relation extraction, coreference resolution, and event detection. We introduce a new and growing dataset, the Gun Violence Database, in order to facilitate the adaptation of current NLP technologies to the domain of gun violence, thus enabling better social science research on this important and under-resourced problem.


conference on computer supported cooperative work | 2014

Crowdsourcing for grammatical error correction

Ellie Pavlick; Rui Yan; Chris Callison-Burch

We discuss the problem of grammatical error correction, which has gained attention for its usefulness both in the development of tools for learners of foreign languages and as a component of statistical machine translation systems. We believe the task of suggesting grammar and style corrections in writing is well suited to a crowdsourcing solution but is currently hindered by the difficulty of automatic quality control. In this proposal, we motivate the problem of grammatical error correction and outline the challenges of ensuring quality in a setting where traditional methods of aggregation (e.g. majority vote) fail to produce the desired results. We then propose a design for quality control and present preliminary results indicating the potential of crowd workers to provide a scalable solution.
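To illustrate why majority vote struggles in this setting, the sketch below applies a plain majority-vote aggregator to two kinds of crowd responses. It is an illustrative example only, not the quality-control design proposed in the paper; the function name `majority_vote` and the sample responses are invented for the demonstration.

```python
from collections import Counter


def majority_vote(responses):
    """Return the most common response and the fraction of workers who gave it."""
    if not responses:
        raise ValueError("need at least one response")
    winner, count = Counter(responses).most_common(1)[0]
    return winner, count / len(responses)


# Closed label set: workers choose from a fixed list, so votes can coincide.
label, label_agreement = majority_vote(["error", "error", "ok"])

# Open-ended corrections: each worker may write a different, equally valid
# rewrite, so no string gathers a real majority (made-up example sentences).
corrections = [
    "She has gone to the store.",
    "She went to the store.",
    "She is going to the store.",
]
correction, correction_agreement = majority_vote(corrections)
```

With fixed labels, two of three workers agree and the vote is informative. With free-text corrections, every response is distinct, the agreement rate is one in three, and the "winner" is simply whichever string was counted first.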


north american chapter of the association for computational linguistics | 2015

Crowdsourcing for NLP

Chris Callison-Burch; Lyle H. Ungar; Ellie Pavlick

Applying crowdsourcing to scientific problems is a hot research area, with over 10,000 publications in the past five years. Platforms such as Amazon's Mechanical Turk and CrowdFlower provide researchers with easy access to large numbers of workers. The crowd's vast supply of inexpensive, intelligent labor allows people to attack problems that were previously impractical and gives potential for detailed scientific inquiry of social, psychological, economic, and linguistic phenomena via massive sample sizes of human-annotated data. We introduce crowdsourcing and describe how it is being used in both industry and academia. Crowdsourcing is valuable to computational linguists both (a) as a source of labeled training data for use in machine learning and (b) as a means of collecting computational social science data that link language use to underlying beliefs and behavior. We present case studies for both categories: (a) collecting labeled data for use in natural language processing tasks such as word sense disambiguation and machine translation and (b) collecting experimental data in the context of psychology, e.g. finding how word use varies with age, sex, personality, health, and happiness. We will also cover tools and techniques for crowdsourcing. Effectively collecting crowdsourced data requires careful attention to the collection process: selecting appropriately qualified workers, giving clear instructions that are understandable to non-experts, and performing quality control on the results to eliminate spammers who complete tasks randomly or carelessly in order to collect the small financial reward. We will introduce different crowdsourcing platforms, review privacy and institutional review board issues, and provide rules of thumb for cost and time estimates. Crowdsourced data also has a particular structure that raises issues in statistical analysis; we describe some of the key methods to address these issues.
No prior exposure to the area is required.

Collaboration


Dive into Ellie Pavlick's collaborations.

Top Co-Authors

Charley Beller

Johns Hopkins University

Anne Cocos

University of Pennsylvania

Heng Ji

Rensselaer Polytechnic Institute

Quanze Chen

University of Pennsylvania
