Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Annie Louis is active.

Publication


Featured research published by Annie Louis.


Empirical Methods in Natural Language Processing | 2009

Automatically Evaluating Content Selection in Summarization without Human Models

Annie Louis; Ani Nenkova

We present a fully automatic method for content selection evaluation in summarization that does not require the creation of human model summaries. Our work capitalizes on the assumption that the distribution of words in the input and an informative summary of that input should be similar to each other. Results on a large scale evaluation from the Text Analysis Conference show that input-summary comparisons are very effective for the evaluation of content selection. Our automatic methods rank participating systems similarly to manual model-based pyramid evaluation and to manual human judgments of responsiveness. The best feature, Jensen-Shannon divergence, leads to a correlation as high as 0.88 with manual pyramid and 0.73 with responsiveness evaluations.
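The input-summary comparison behind this result can be sketched as a Jensen-Shannon divergence between unigram word distributions. This is an illustrative reconstruction, not the authors' released code; the whitespace tokenization and the absence of smoothing are simplifying assumptions:

```python
import math
from collections import Counter

def word_dist(text):
    """Unigram probability distribution over whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl(p, q):
    """KL divergence D(p || q); assumes q covers p's support."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items())

def js_divergence(text_a, text_b):
    """Jensen-Shannon divergence between word distributions.
    0 = identical distributions, 1 = disjoint vocabularies (log base 2)."""
    p, q = word_dist(text_a), word_dist(text_b)
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Lower divergence means the summary's word distribution tracks the input's, which is the property the evaluation exploits.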


Computational Linguistics | 2013

Automatically assessing machine summary content without a gold standard

Annie Louis; Ani Nenkova

The most widely adopted approaches for evaluation of summary content follow some protocol for comparing a summary with gold-standard human summaries, which are traditionally called model summaries. This evaluation paradigm falls short when human summaries are not available and becomes less accurate when only a single model is available. We propose three novel evaluation techniques. Two of them are model-free and do not rely on a gold standard for the assessment. The third technique improves standard automatic evaluations by expanding the set of available model summaries with chosen system summaries. We show that quantifying the similarity between the source text and its summary with appropriately chosen measures produces summary scores which replicate human assessments accurately. We also explore ways of increasing evaluation quality when only one human model summary is available as a gold standard. We introduce pseudomodels, which are system summaries deemed to contain good content according to automatic evaluation. Combining the pseudomodels with the single human model to form the gold standard leads to higher correlations with human judgments compared to using only the one available model. Finally, we explore the feasibility of another measure: similarity between a system summary and the pool of all other system summaries for the same input. This method of comparison with the consensus of systems produces impressively accurate rankings of system summaries, achieving correlation with human rankings above 0.9.
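The comparison-with-consensus measure can be sketched as scoring each system summary against the pooled words of all other system summaries. The cosine similarity over raw token counts below is an assumption for illustration (the paper evaluates its own choice of similarity measures), and `rank_by_consensus` is a hypothetical helper name:

```python
import math
from collections import Counter

def cosine(c1, c2):
    """Cosine similarity between two token count vectors."""
    dot = sum(c1[w] * c2.get(w, 0) for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def consensus_score(summary, other_summaries):
    """Similarity of one system summary to the pool of all others."""
    pool = Counter()
    for s in other_summaries:
        pool.update(s.lower().split())
    return cosine(Counter(summary.lower().split()), pool)

def rank_by_consensus(system_summaries):
    """Rank summary indices: each summary is scored against the pool
    of the remaining summaries, highest consensus first."""
    scores = {}
    for i, s in enumerate(system_summaries):
        others = system_summaries[:i] + system_summaries[i + 1:]
        scores[i] = consensus_score(s, others)
    return sorted(scores, key=scores.get, reverse=True)
```

A summary far from the consensus of the other systems ranks last, mirroring the intuition that outlier content is usually poor content.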


Meeting of the Association for Computational Linguistics | 2009

Performance Confidence Estimation for Automatic Summarization

Annie Louis; Ani Nenkova

We address the task of automatically predicting if summarization system performance will be good or bad based on features derived directly from either single- or multi-document inputs. Our labelled corpus for the task is composed of data from large scale evaluations completed over the span of several years. The variation of data between years allows for a comprehensive analysis of the robustness of features, but poses a challenge for building a combined corpus which can be used for training and testing. Still, we find that the problem can be mitigated by appropriately normalizing for differences within each year. We examine different formulations of the classification task which considerably influence performance. The best results are 84% prediction accuracy for single- and 74% for multi-document summarization.


Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics | 2017

LSDSem 2017 Shared Task: The Story Cloze Test

Nasrin Mostafazadeh; Michael Roth; Annie Louis; Nathanael Chambers; James F. Allen

The LSDSem’17 shared task is the Story Cloze Test, a new evaluation for story understanding and script learning. This test provides a system with a four-sentence story and two possible endings, and the system must choose the correct ending to the story. Successful narrative understanding (getting closer to human performance of 100%) requires systems to link various levels of semantics to commonsense knowledge. A total of eight systems participated in the shared task, with a variety of approaches including end-to-end neural networks, feature-based regression models, and rule-based methods. The highest performing system achieves an accuracy of 75.2%, a substantial improvement over the previous state-of-the-art.
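The task format can be illustrated with a minimal sketch. The story below is invented for illustration and is not drawn from the actual test set; `accuracy` mirrors the standard evaluation of choosing the correct ending:

```python
# One Story Cloze Test instance: a four-sentence context, two candidate
# endings, and the index of the correct ending. (Toy example, not a
# real corpus item.)
story = {
    "context": [
        "Maya planted tomato seeds in her garden.",
        "She watered them every morning.",
        "Weeks later, small green shoots appeared.",
        "By summer the plants were heavy with fruit.",
    ],
    "endings": [
        "Maya picked the ripe tomatoes for a salad.",
        "Maya threw the seeds in the trash.",
    ],
    "label": 0,
}

def accuracy(predictions, instances):
    """Fraction of instances where the system chose the correct ending."""
    correct = sum(p == inst["label"] for p, inst in zip(predictions, instances))
    return correct / len(instances)
```

A system sees the context and both endings and must output 0 or 1; accuracy over the test set is the reported metric (75.2% for the best shared-task system, against human performance of 100%).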


Archive | 2009

Structural Features for Predicting the Linguistic Quality of Text

Ani Nenkova; Jieun Chae; Annie Louis; Emily Pitler

Sentence structure is considered to be an important component of the overall linguistic quality of text. Yet few empirical studies have sought to characterize how and to what extent structural features determine fluency and linguistic quality. We report the results of experiments on the predictive power of syntactic phrasing statistics and other structural features for these aspects of text. Manual assessments of sentence fluency for machine translation evaluation and text quality for summarization evaluation are used as the gold standard. We find that many structural features related to phrase length are weakly but significantly correlated with fluency, and that classifiers based on the entire suite of structural features can achieve high accuracy in pairwise comparison of sentence fluency and in distinguishing machine translations from human translations. We also test the hypothesis that the learned models capture general fluency properties applicable to human-authored text. The results from our experiments do not support the hypothesis. At the same time, structural features and models based on them prove to be robust for automatic evaluation of the linguistic quality of multidocument summaries.


Conference of the European Chapter of the Association for Computational Linguistics | 2014

Structured and Unstructured Cache Models for SMT Domain Adaptation

Annie Louis; Bonnie Webber

We present a French-to-English translation system for Wikipedia biography articles. We use training data from out-of-domain corpora and adapt the system for biographies. We propose two forms of domain adaptation. The first biases the system towards words likely in biographies and encourages repetition of words across the document. Since biographies in Wikipedia follow a regular structure, our second model exploits this structure as a sequence of topic segments, where each segment discusses a narrower subtopic of the biography domain. In this structured model, the system is encouraged to use words likely in the current segment’s topic rather than in biographies as a whole. We implement both systems using cache-based translation techniques. We show that a system trained on Europarl and news can be adapted for biographies with a 0.5 BLEU score improvement using our models. Further, the structure-aware model outperforms the system which treats the entire document as a single segment.


Empirical Methods in Natural Language Processing | 2015

Conversation Trees: A Grammar Model for Topic Structure in Forums

Annie Louis; Shay B. Cohen

Online forum discussions proceed differently from face-to-face conversations and any single thread on an online forum contains posts on different subtopics. This work aims to characterize the content of a forum thread as a conversation tree of topics. We present models that jointly perform two tasks: segment a thread into subparts, and assign a topic to each part. Our core idea is a definition of topic structure using probabilistic grammars. By leveraging the flexibility of two grammar formalisms, Context-Free Grammars and Linear Context-Free Rewriting Systems, our models create desirable structures for forum threads: our topic segmentation is hierarchical, links non-adjacent segments on the same topic, and jointly labels the topic during segmentation. We show that our models outperform a number of tree generation baselines.


Computational Intelligence and Games | 2017

Beyond playing to win: Diversifying heuristics for GVGAI

Cristina Guerrero-Romero; Annie Louis; Diego Perez-Liebana

General Video Game Playing (GVGP) algorithms are usually focused on winning and maximizing score, but combining different objectives is a direction that has not yet been deeply investigated. This paper presents the results obtained when five GVGP agents play a set of games using heuristics with different objectives: maximizing winning, maximizing exploration, maximizing the discovery of the different elements present in the game (and interactions with them), and maximizing the acquisition of knowledge in order to accurately estimate the outcome of each possible interaction. The results show that the performance of the agents changes depending on the heuristic used. Thus, making use of several agents with different goals (and their pertinent heuristics) could be a feasible approach in GVGP, allowing different behaviors in response to the diverse situations presented in the games.


Conference of the European Chapter of the Association for Computational Linguistics | 2014

Verbose, Laconic or Just Right: A Simple Computational Model of Content Appropriateness under Length Constraints

Annie Louis; Ani Nenkova

Length constraints impose implicit requirements on the type of content that can be included in a text. Here we propose the first model to computationally assess if a text deviates from these requirements. Specifically, our model predicts the appropriate length for texts based on content types present in a snippet of constant length. We consider a range of features to approximate content type, including syntactic phrasing, constituent compression probability, presence of named entities, sentence specificity and intersentence continuity. Weights for these features are learned using a corpus of summaries written by experts and on high quality journalistic writing. During test time, the difference between actual and predicted length allows us to quantify text verbosity. We use data from manual evaluation of summarization systems to assess the verbosity scores produced by our model. We show that the automatic verbosity scores are significantly negatively correlated with manual content quality scores given to the summaries.
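The setup can be sketched as features extracted from a constant-length snippet plus a verbosity score computed from the length prediction. The three features below are crude hypothetical stand-ins for the paper's richer feature set (syntactic phrasing, compression probability, named entities, specificity, continuity):

```python
import re

def snippet_features(snippet):
    """Crude proxies for content type in a fixed-size text snippet.
    (Hypothetical features, not the paper's actual feature set.)"""
    tokens = snippet.split()
    sentences = [s for s in re.split(r"[.!?]+", snippet) if s.strip()]
    n = max(len(tokens), 1)
    return [
        len(tokens) / max(len(sentences), 1),        # average sentence length
        sum(t[0].isupper() for t in tokens) / n,     # capitalized-token rate (entity proxy)
        sum(len(t) for t in tokens) / n,             # average word length
    ]

def verbosity(actual_length, predicted_length):
    """Positive = verbose (longer than the content warrants),
    negative = laconic, near zero = just right."""
    return actual_length - predicted_length
```

A regression model learned over such features would supply `predicted_length`; the signed difference from the text's actual length is then the verbosity score.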


Empirical Methods in Natural Language Processing | 2015

Recovering discourse relations: Varying influence of discourse adverbials

Hannah Rohde; Anna Dickinson; Christopher Clark; Annie Louis; Bonnie Webber

Discourse relations are a bridge between sentence-level semantics and discourse-level semantics. They can be signalled explicitly with discourse connectives or conveyed implicitly, to be inferred by a comprehender. The same discourse units can be related in more than one way, signalled by multiple connectives. But multiple connectives aren’t necessary: Multiple relations can be conveyed even when only one connective is explicit. This paper describes the initial phase in a larger experimental study aimed at answering two questions: (1) Given an explicit discourse adverbial, what discourse relation(s) do naive subjects take to be operative, and (2) Can this be predicted on the basis of the explicit adverbial alone, or does it depend instead on other factors?

Collaboration


Dive into Annie Louis's collaborations.

Top Co-Authors

Ani Nenkova (University of Pennsylvania)
Hannah Rohde (University of Edinburgh)
Emily Pitler (University of Pennsylvania)
Nathan Schneider (Carnegie Mellon University)
Michael Roth (University of Edinburgh)
Aravind K. Joshi (University of Pennsylvania)
Jieun Chae (University of Pennsylvania)