Zhiyi Song | Researchain

Publication

Featured research published by Zhiyi Song.

workshop on events definition detection coreference and representation | 2015

From Light to Rich ERE: Annotation of Entities, Relations, and Events

Zhiyi Song; Ann Bies; Stephanie M. Strassel; Tom Riese; Justin Mott; Joe Ellis; Jonathan Wright; Seth Kulick; Neville Ryant; Xiaoyi Ma

We describe the evolution of the Entities, Relations and Events (ERE) annotation task, created to support research and technology development within the DARPA DEFT program. We begin by describing the specification for Light ERE annotation, including the motivation for the task within the context of DEFT. We discuss the transition from Light ERE to a more complex Rich ERE specification, enabling more comprehensive treatment of phenomena of interest to DEFT.

workshop on events definition detection coreference and representation | 2014

A Comparison of the Events and Relations Across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards

Jacqueline Aguilar; Charley Beller; Paul McNamee; Benjamin Van Durme; Stephanie M. Strassel; Zhiyi Song; Joe Ellis

The resurgence of effort within computational semantics has led to increased interest in various types of relation extraction and semantic parsing. While various manually annotated resources exist for enabling this work, these materials have been developed with different standards and goals in mind. In an effort to develop better general understanding across these resources, we provide a summary overview of the standards underlying ACE, ERE, TAC-KBP Slot-filling, and FrameNet.

1 Overview

ACE and ERE are comprehensive annotation standards that aim to consistently annotate Entities, Events, and Relations within a variety of documents. The ACE (Automatic Content Extraction) standard was developed by NIST in 1999 and has evolved over time to support different evaluation cycles, the last evaluation having occurred in 2008. The ERE (Entities, Relations, Events) standard was created under the DARPA DEFT program as a lighter-weight version of ACE with the goal of making annotation easier and more consistent across annotators. ERE attempts to achieve this goal by consolidating some of the annotation type distinctions that were found to be the most problematic in ACE, as well as removing some more complex annotation features. This paper provides an overview of the relationship between these two standards and compares them to the more restricted standard of the TAC-KBP slot-filling task and the more expansive standard of FrameNet. Sections 3 and 4 examine Relations and Events in the ACE/ERE standards, section 5 looks at TAC-KBP slot-filling, and section 6 compares FrameNet to the other standards.

workshop on events definition detection coreference and representation | 2015

Event Nugget Annotation: Processes and Issues

Teruko Mitamura; Yukari Yamakawa; Susan E Holm; Zhiyi Song; Ann Bies; Seth Kulick; Stephanie M. Strassel

This paper describes the processes and issues of annotating event nuggets based on DEFT ERE Annotation Guidelines v1.3 and TAC KBP Event Detection Annotation Guidelines 1.7. Using Brat Rapid Annotation Tool (brat), newswire and discussion forum documents were annotated. One of the challenges arising from human annotation of documents is disagreement among annotators about how to tag events. We propose using Event Nuggets to help meet the definitions of the specific type/subtypes which are part of this project. We present case studies of several examples of event annotation issues, including discontinuous multi-word events representing single events. Annotation statistics and consistency analysis are provided to characterize the inter-annotator agreement, considering single term events and multi-word events which are both continuous and discontinuous. Consistency analysis is conducted using a scorer to compare first pass annotated files against adjudicated files.

empirical methods in natural language processing | 2014

Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus

Ann Bies; Zhiyi Song; Mohamed Maamouri; Stephen Grimes; Haejoong Lee; Jonathan Wright; Stephanie M. Strassel; Nizar Habash; Ramy Eskander; Owen Rambow

This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic script corpus of SMS/Chat data. The language used in social media differs in many ways from other written genres: its vocabulary is informal with intentional deviations from standard orthography such as repeated letters for emphasis; typos and nonstandard abbreviations are common; and nonlinguistic content is written out, such as laughter, sound representations, and emoticons. This situation is exacerbated in the case of Arabic social media for two reasons. First, Arabic dialects, commonly used in social media, are quite different from Modern Standard Arabic phonologically, morphologically and lexically, and most importantly, they lack standard orthographies. Second, Arabic speakers in social media as well as discussion forums, SMS messaging and online chat often use a non-standard romanization called Arabizi. In the context of natural language processing of social media Arabic, transliterating from Arabizi of various dialects to Arabic script is a necessary step, since many of the existing state-of-the-art resources for Arabic dialect processing expect Arabic script input. The corpus described in this paper is expected to support Arabic NLP by providing this resource.

international conference on web engineering | 2015

Spanish Treebank Annotation of Informal Non-standard Web Text

Mariona Taulé; M. Antònia Martí; Ann Bies; Montserrat Nofre; Aina Garí; Zhiyi Song; Stephanie M. Strassel; Joe Ellis

This paper presents the Latin American Spanish Discussion Forum Treebank (LAS-DisFo). This corpus consists of 50,291 words and 2,846 sentences that are part-of-speech tagged, lemmatized and syntactically annotated with constituents and functions. We describe how it was built and the methodology followed for its annotation, the annotation scheme and criteria applied for dealing with the most problematic phenomena commonly encountered in this kind of informal unedited web text. This is the first available Latin American Spanish corpus of non-standard language that has been morphologically and syntactically annotated. It is a valuable linguistic resource that can be used for the training and evaluation of parsers and PoS taggers.

north american chapter of the association for computational linguistics | 2016

A Comparison of Event Representations in DEFT

Ann Bies; Zhiyi Song; Jeremy Getman; Joe Ellis; Justin Mott; Stephanie M. Strassel; Martha Palmer; Teruko Mitamura; Marjorie Freedman; Heng Ji; Tim O'Gorman

This paper will discuss and compare event representations across a variety of types of event annotation: Rich Entities, Relations, and Events (Rich ERE), Light Entities, Relations, and Events (Light ERE), Event Nugget (EN), Event Argument Extraction (EAE), Richer Event Descriptions (RED), and Event-Event Relations (EER). Comparisons of event representations are presented, along with a comparison of data annotated according to each event representation. An event annotation experiment is also discussed, including annotation for all of these representations on the same set of sample data, with the purpose of being able to compare actual annotation across all of these approaches as directly as possible. We walk through a brief example to illustrate the various annotation approaches, and to show the intersections among the various annotated data sets.

north american chapter of the association for computational linguistics | 2016

Event Nugget and Event Coreference Annotation

Zhiyi Song; Ann Bies; Stephanie M. Strassel; Joe Ellis; Teruko Mitamura; Hoa Trang Dang; Yukari Yamakawa; Susan E Holm

In this paper, we describe the event nugget annotation created in support of the pilot Event Nugget Detection evaluation in 2014 and in support of the Event Nugget Detection and Coreference open evaluation in 2015, which was one of the Knowledge Base Population tracks within the NIST Text Analysis Conference. We present the data volume annotated for both training and evaluation data for the 2015 evaluation as well as changes to annotation in 2015 as compared to that of 2014. We also analyze the annotation for the 2015 evaluation as an example to show the annotation challenges and consistency, and identify the event types and subtypes that are most difficult for human annotators. Finally, we discuss annotation issues that we need to take into consideration in the future.

Theory and Applications of Categories | 2015