Ann Bies
University of Pennsylvania
Publications
Featured research published by Ann Bies.
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006 | 2006
Olga Babko-Malaya; Ann Bies; Ann Taylor; Szu-ting Yi; Martha Palmer; Mitch Marcus; Seth Kulick; Libin Shen
The PropBank primarily adds semantic role labels to the syntactic constituents in the parsed trees of the Treebank. The goal is for automatic semantic role labeling to be able to use the domain of locality of a predicate in order to find its arguments. In principle, this is exactly what is wanted, but in practice the PropBank annotators often make choices that do not actually conform to the Treebank parses. As a result, the syntactic features extracted by automatic semantic role labeling systems are often inconsistent and contradictory. This paper discusses in detail the types of mismatches between the syntactic bracketing and the semantic role labeling that can be found, and our plans for reconciling them.
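As a concrete illustration of the mismatch problem, here is a minimal sketch (hypothetical code, not the authors' tooling) that checks whether a PropBank-style argument span corresponds to a single constituent in a Treebank parse, using NLTK's Tree class; spans that cross a bracketing boundary would surface as the kind of inconsistency described above.

```python
# Hypothetical sketch: test whether an argument span (token offsets)
# lines up with exactly one bracketed constituent in a parse tree.
from nltk import Tree

def constituent_spans(tree):
    """Collect (start, end) token spans for every constituent in the tree."""
    spans = set()
    def walk(node, start):
        if isinstance(node, str):          # leaf: one token wide
            return start + 1
        end = start
        for child in node:
            end = walk(child, end)
        spans.add((start, end))
        return end
    walk(tree, 0)
    return spans

def arg_matches_constituent(tree, span):
    """True if the argument span corresponds to a single bracketed node."""
    return span in constituent_spans(tree)

parse = Tree.fromstring("(S (NP (DT The) (NN goal)) (VP (VBZ is) (ADJP (JJ clear))))")
print(arg_matches_constituent(parse, (0, 2)))  # NP "The goal" -> True
print(arg_matches_constituent(parse, (1, 3)))  # "goal is" crosses brackets -> False
```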
Workshop on EVENTS: Definition, Detection, Coreference, and Representation | 2015
Zhiyi Song; Ann Bies; Stephanie M. Strassel; Tom Riese; Justin Mott; Joe Ellis; Jonathan Wright; Seth Kulick; Neville Ryant; Xiaoyi Ma
We describe the evolution of the Entities, Relations and Events (ERE) annotation task, created to support research and technology development within the DARPA DEFT program. We begin by describing the specification for Light ERE annotation, including the motivation for the task within the context of DEFT. We then discuss the transition from Light ERE to the more complex Rich ERE specification, which enables more comprehensive treatment of phenomena of interest to DEFT.
Workshop on EVENTS: Definition, Detection, Coreference, and Representation | 2015
Teruko Mitamura; Yukari Yamakawa; Susan E. Holm; Zhiyi Song; Ann Bies; Seth Kulick; Stephanie M. Strassel
This paper describes the processes and issues of annotating event nuggets based on DEFT ERE Annotation Guidelines v1.3 and TAC KBP Event Detection Annotation Guidelines 1.7. Newswire and discussion forum documents were annotated using the Brat Rapid Annotation Tool (brat). One of the challenges arising from human annotation of documents is annotators' disagreement about how events should be tagged. We propose using Event Nuggets to help meet the definitions of the specific types/subtypes which are part of this project. We present case studies of several examples of event annotation issues, including discontinuous multi-word events representing single events. Annotation statistics and a consistency analysis are provided to characterize inter-annotator agreement, considering single-term events and multi-word events, both continuous and discontinuous. The consistency analysis is conducted using a scorer to compare first-pass annotated files against adjudicated files.
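The exact-match comparison of first-pass files against adjudicated files could look roughly like the sketch below, in which each nugget is a set of token indices so that discontinuous multi-word events are handled the same way as continuous ones; the data and function names are illustrative, not the project's actual scorer.

```python
# Toy sketch of exact-match consistency scoring for event nuggets.
# Each nugget is a frozenset of token indices, so discontinuous
# multi-word events need no special casing.
def nugget_f1(first_pass, adjudicated):
    """Exact-match P/R/F1 between two sets of (possibly discontinuous) nuggets."""
    matched = len(first_pass & adjudicated)
    precision = matched / len(first_pass) if first_pass else 0.0
    recall = matched / len(adjudicated) if adjudicated else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if matched else 0.0
    return precision, recall, f1

annotator = {frozenset({3}), frozenset({7, 9})}     # {7, 9} is discontinuous
gold      = {frozenset({3}), frozenset({7, 8, 9})}  # adjudicated file
print(nugget_f1(annotator, gold))                   # (0.5, 0.5, 0.5)
```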
Empirical Methods in Natural Language Processing | 2014
Ann Bies; Zhiyi Song; Mohamed Maamouri; Stephen Grimes; Haejoong Lee; Jonathan Wright; Stephanie M. Strassel; Nizar Habash; Ramy Eskander; Owen Rambow
This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic script corpus of SMS/chat data. The language used in social media differs in many ways from other written genres: its vocabulary is informal, with intentional deviations from standard orthography such as repeated letters for emphasis; typos and nonstandard abbreviations are common; and nonlinguistic content is written out, such as laughter, sound representations, and emoticons. This situation is exacerbated in the case of Arabic social media for two reasons. First, Arabic dialects, commonly used in social media, are quite different from Modern Standard Arabic phonologically, morphologically and lexically, and, most importantly, they lack standard orthographies. Second, Arabic speakers in social media, as well as in discussion forums, SMS messaging and online chat, often use a non-standard romanization called Arabizi. In the context of natural language processing of social media Arabic, transliterating from Arabizi of various dialects to Arabic script is a necessary step, since many of the existing state-of-the-art resources for Arabic dialect processing expect Arabic script input. The corpus described in this paper is expected to support Arabic NLP by providing this resource.
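To make the transliteration step concrete, here is a deliberately naive sketch of a fixed-table Arabizi-to-Arabic-script mapping. Real transliteration is context-sensitive, dialect-dependent, and ambiguous; a parallel corpus like the one described exists precisely because a static character table like this is insufficient. The mapping and example are invented for illustration.

```python
# Naive, illustrative Arabizi -> Arabic script mapping (not the corpus tooling).
ARABIZI_MAP = {
    "sh": "ش", "gh": "غ", "kh": "خ", "th": "ث",
    "2": "ء", "3": "ع", "5": "خ", "7": "ح", "9": "ق",
    "a": "ا", "b": "ب", "d": "د", "h": "ه", "k": "ك",
    "l": "ل", "m": "م", "n": "ن", "r": "ر", "s": "س",
    "t": "ت", "w": "و", "y": "ي",
}

def naive_transliterate(word):
    """Greedy longest-match lookup; unknown characters pass through unchanged."""
    out, i = [], 0
    while i < len(word):
        for length in (2, 1):                 # try digraphs before single letters
            chunk = word[i:i + length]
            if chunk in ARABIZI_MAP:
                out.append(ARABIZI_MAP[chunk])
                i += length
                break
        else:
            out.append(word[i])
            i += 1
    return "".join(out)

print(naive_transliterate("mar7aba"))  # a common greeting; note the extra
                                       # vowel letters a real system must resolve
```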
Meeting of the Association for Computational Linguistics | 2005
Ann Bies; Seth Kulick; Mark A. Mandel
We describe a parallel annotation approach for PubMed abstracts. It includes both entity/relation annotation and a treebank containing syntactic structure, with a goal of mapping entities to constituents in the treebank. Crucial to this approach is a modification of the Penn Treebank guidelines and the characterization of entities as relation components, which allows the integration of the entity annotation with the syntactic structure while retaining the capacity to annotate and extract more complex events.
International Conference on Web Engineering | 2015
Mariona Taulé; M. Antònia Martí; Ann Bies; Montserrat Nofre; Aina Garí; Zhiyi Song; Stephanie M. Strassel; Joe Ellis
This paper presents the Latin American Spanish Discussion Forum Treebank (LAS-DisFo). This corpus consists of 50,291 words and 2,846 sentences that are part-of-speech tagged, lemmatized and syntactically annotated with constituents and functions. We describe how it was built, the methodology followed for its annotation, and the annotation scheme and criteria applied for dealing with the most problematic phenomena commonly encountered in this kind of informal, unedited web text. This is the first available Latin American Spanish corpus of non-standard language that has been morphologically and syntactically annotated. It is a valuable linguistic resource that can be used for the training and evaluation of parsers and PoS taggers.
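As a rough sketch of how a constituency-annotated resource like this can feed a PoS-tagger or parser pipeline, the snippet below reads bracketed trees and extracts (word, tag) training pairs with NLTK. The Penn-style bracketing and tag labels shown are generic placeholders; the corpus's actual file format and tagset may differ.

```python
# Illustrative sketch: extract (word, tag) pairs from bracketed parse trees.
from nltk import Tree

raw_trees = [
    "(S (NP (DT la) (NN propuesta)) (VP (VBZ funciona)))",
]

for line in raw_trees:
    tree = Tree.fromstring(line)
    # tree.pos() pairs each leaf with its preterminal label
    print(tree.pos())  # [('la', 'DT'), ('propuesta', 'NN'), ('funciona', 'VBZ')]
```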
North American Chapter of the Association for Computational Linguistics | 2016
Ann Bies; Zhiyi Song; Jeremy Getman; Joe Ellis; Justin Mott; Stephanie M. Strassel; Martha Palmer; Teruko Mitamura; Marjorie Freedman; Heng Ji; Tim O'Gorman
This paper discusses and compares event representations across a variety of types of event annotation: Rich Entities, Relations, and Events (Rich ERE), Light Entities, Relations, and Events (Light ERE), Event Nugget (EN), Event Argument Extraction (EAE), Richer Event Descriptions (RED), and Event-Event Relations (EER). Comparisons of the event representations are presented, along with a comparison of data annotated according to each representation. An event annotation experiment is also discussed, in which the same set of sample data was annotated under all of these representations so that actual annotation can be compared across the approaches as directly as possible. We walk through a brief example to illustrate the various annotation approaches and to show the intersections among the annotated data sets.
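One plausible way to compute such intersections, sketched below under invented data: normalize each scheme's event mentions to character spans and measure pairwise overlap. The scheme names come from the paper; the spans themselves are made up for the example.

```python
# Illustrative sketch: pairwise overlap of event spans across annotation schemes.
from itertools import combinations

annotations = {
    "RichERE":     {(10, 18), (40, 47), (88, 95)},
    "EventNugget": {(10, 18), (40, 47)},
    "RED":         {(10, 18), (88, 95), (120, 126)},
}

for a, b in combinations(annotations, 2):
    shared = annotations[a] & annotations[b]
    print(f"{a} & {b}: {len(shared)} shared event span(s)")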
Workshop on EVENTS: Definition, Detection, Coreference, and Representation | 2014
Seth Kulick; Ann Bies; Justin Mott
This paper describes a system for inter-annotator agreement analysis of ERE annotation, focusing on entity mentions and on how higher-order annotations such as events depend on those entity mentions. The goal of this approach is to provide both (1) quantitative scores for the various levels of annotation and (2) information about the types of annotation inconsistencies that may exist. While primarily designed for inter-annotator agreement, it can also be considered a system for evaluating ERE annotation.
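A minimal sketch of the two-level dependency described above, under assumed data structures (not the paper's actual system): event-level agreement is only credited once the underlying entity mentions have been aligned between annotators.

```python
# Hypothetical sketch: entity mentions must align before events can agree.
def align_mentions(mentions_a, mentions_b):
    """Exact-span alignment of entity mentions; returns the shared spans."""
    return mentions_a & mentions_b

def event_agreement(events_a, events_b, aligned):
    """Count events whose argument mentions all survived mention alignment."""
    def grounded(events):
        return {ev for ev in events if set(ev[1]) <= aligned}
    return len(grounded(events_a) & grounded(events_b))

mentions_a = {(0, 5), (12, 20)}
mentions_b = {(0, 5), (12, 19)}            # second mention disagrees on span
events_a = {("attack", ((0, 5),))}         # (type, argument-mention spans)
events_b = {("attack", ((0, 5),))}
aligned = align_mentions(mentions_a, mentions_b)
print(event_agreement(events_a, events_b, aligned))  # 1: the argument aligned
```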
Meeting of the Association for Computational Linguistics | 2014
Seth Kulick; Ann Bies; Justin Mott; Anthony S. Kroch; Beatrice Santorini; Mark Liberman
This paper introduces a new technique for phrase-structure parser analysis, categorizing possible treebank structures by integrating regular expressions into derivation trees. We analyze the performance of the Berkeley parser on OntoNotes WSJ and the English Web Treebank. This provides some insight into the evalb scores and the problem of domain adaptation with the web data. We also analyze a "test-on-train" dataset, showing a wide variance in how the parser generalizes from different structures in the training material.
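To convey the flavor of regex-based structural categorization, here is a rough sketch that buckets parser output by matching patterns against flattened bracketings. The patterns and category names are invented for illustration; the paper applies regular expressions within derivation trees rather than to raw strings.

```python
# Illustrative sketch: bucket treebank structures by regex over bracketings.
import re
from collections import Counter

CATEGORIES = {
    "coordination":  re.compile(r"\(CC\b"),
    "sbar_clause":   re.compile(r"\(SBAR\b"),
    "pp_attachment": re.compile(r"\(PP\b"),
}

def categorize(bracketing):
    """Return every category whose pattern matches this tree's bracketing."""
    return [name for name, pat in CATEGORIES.items() if pat.search(bracketing)]

trees = [
    "(S (NP (NN parser)) (VP (VBZ fails) (PP (IN on) (NP (NN web) (NN data)))))",
    "(S (NP (NN it)) (VP (VBZ works) (CC and) (VBZ generalizes)))",
]
print(Counter(cat for t in trees for cat in categorize(t)))
```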
North American Chapter of the Association for Computational Linguistics | 2016
Zhiyi Song; Ann Bies; Stephanie M. Strassel; Joe Ellis; Teruko Mitamura; Hoa Trang Dang; Yukari Yamakawa; Susan E. Holm
In this paper, we describe the event nugget annotation created in support of the pilot Event Nugget Detection evaluation in 2014 and of the open Event Nugget Detection and Coreference evaluation in 2015, one of the Knowledge Base Population tracks within the NIST Text Analysis Conference. We present the data volume annotated for both training and evaluation in the 2015 evaluation, as well as the changes to annotation in 2015 compared to 2014. We also analyze the annotation for the 2015 evaluation as an example of the annotation challenges and consistency, and identify the event types and subtypes that are most difficult for human annotators. Finally, we discuss annotation issues that we need to take into consideration in the future.