Publication


Featured research published by Stephanie M. Strassel.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Building an information retrieval test collection for spontaneous conversational speech

Douglas W. Oard; Dagobert Soergel; David S. Doermann; Xiaoli Huang; G. Craig Murray; Jianqiang Wang; Bhuvana Ramabhadran; Martin Franz; Samuel Gustman; James Mayfield; Liliya Kharevych; Stephanie M. Strassel

Test collections model use cases in ways that facilitate evaluation of information retrieval systems. This paper describes the use of search-guided relevance assessment to create a test collection for retrieval of spontaneous conversational speech. Approximately 10,000 thematically coherent segments were manually identified in 625 hours of oral history interviews with 246 individuals. Automatic speech recognition results, manually prepared summaries, controlled vocabulary indexing, and name authority control are available for every segment. Those features were leveraged by a team of four relevance assessors to identify topically relevant segments for 28 topics developed from actual user requests. Search-guided assessment yielded sufficient inter-annotator agreement to support formative evaluation during system development. Baseline results for ranked retrieval are presented to illustrate use of the collection.


Topic Detection and Tracking | 2002

Corpora for topic detection and tracking

Christopher Cieri; Stephanie M. Strassel; David Graff; Nii Martey; Kara Rennert; Mark Liberman

The TDT corpora, developed to support the DARPA-sponsored program in Topic Detection and Tracking, combine data collected over a nine-month period from 8 English and 3 Chinese sources. The published corpora contain audio, reference text including written news text and transcripts of the broadcast audio, boundary tables segmenting the broadcasts into stories, and relevance tables resulting from millions of human judgments. Sections of the corpora have undergone topic-story, first story and story link annotation. Both the TDT-2 and TDT-3 text corpora and the accompanying broadcast audio are now available from the Linguistic Data Consortium. This paper describes the raw material collected for the corpora, the annotation of that material to prepare it for research use, and the formats in which it is distributed. Special attention is paid to the quality control measures developed for these data sets.


Workshop on EVENTS: Definition, Detection, Coreference, and Representation | 2015

From Light to Rich ERE: Annotation of Entities, Relations, and Events

Zhiyi Song; Ann Bies; Stephanie M. Strassel; Tom Riese; Justin Mott; Joe Ellis; Jonathan Wright; Seth Kulick; Neville Ryant; Xiaoyi Ma

We describe the evolution of the Entities, Relations and Events (ERE) annotation task, created to support research and technology development within the DARPA DEFT program. We begin by describing the specification for Light ERE annotation, including the motivation for the task within the context of DEFT. We discuss the transition from Light ERE to a more complex Rich ERE specification, enabling more comprehensive treatment of phenomena of interest to DEFT.


Workshop on EVENTS: Definition, Detection, Coreference, and Representation | 2014

A Comparison of the Events and Relations Across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards

Jacqueline Aguilar; Charley Beller; Paul McNamee; Benjamin Van Durme; Stephanie M. Strassel; Zhiyi Song; Joe Ellis

The resurgence of effort within computational semantics has led to increased interest in various types of relation extraction and semantic parsing. While various manually annotated resources exist for enabling this work, these materials have been developed with different standards and goals in mind. In an effort to develop better general understanding across these resources, we provide a summary overview of the standards underlying ACE, ERE, TAC-KBP Slot-filling, and FrameNet. ACE and ERE are comprehensive annotation standards that aim to consistently annotate Entities, Events, and Relations within a variety of documents. The ACE (Automatic Content Extraction) standard was developed by NIST in 1999 and has evolved over time to support different evaluation cycles, the last evaluation having occurred in 2008. The ERE (Entities, Relations, Events) standard was created under the DARPA DEFT program as a lighter-weight version of ACE with the goal of making annotation easier and more consistent across annotators. ERE attempts to achieve this goal by consolidating some of the annotation type distinctions that were found to be the most problematic in ACE, as well as removing some more complex annotation features. This paper provides an overview of the relationship between these two standards and compares them to the more restricted standard of the TAC-KBP slot-filling task and the more expansive standard of FrameNet. Sections 3 and 4 examine Relations and Events in the ACE/ERE standards, section 5 looks at TAC-KBP slot-filling, and section 6 compares FrameNet to the other standards.


Workshop on EVENTS: Definition, Detection, Coreference, and Representation | 2015

Event Nugget Annotation: Processes and Issues

Teruko Mitamura; Yukari Yamakawa; Susan E Holm; Zhiyi Song; Ann Bies; Seth Kulick; Stephanie M. Strassel

This paper describes the processes and issues of annotating event nuggets based on DEFT ERE Annotation Guidelines v1.3 and TAC KBP Event Detection Annotation Guidelines 1.7. Using Brat Rapid Annotation Tool (brat), newswire and discussion forum documents were annotated. One of the challenges arising from human annotation of documents is disagreement among annotators about how to tag events. We propose using Event Nuggets to help meet the definitions of the specific type/subtypes which are part of this project. We present case studies of several examples of event annotation issues, including discontinuous multi-word events representing single events. Annotation statistics and consistency analysis are provided to characterize inter-annotator agreement, considering single-term events and multi-word events, both continuous and discontinuous. Consistency analysis is conducted using a scorer to compare first pass annotated files against adjudicated files.


Empirical Methods in Natural Language Processing | 2014

Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus

Ann Bies; Zhiyi Song; Mohamed Maamouri; Stephen Grimes; Haejoong Lee; Jonathan Wright; Stephanie M. Strassel; Nizar Habash; Ramy Eskander; Owen Rambow

This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic script corpus of SMS/Chat data. The language used in social media differs in many ways from other written genres: its vocabulary is informal with intentional deviations from standard orthography such as repeated letters for emphasis; typos and nonstandard abbreviations are common; and nonlinguistic content is written out, such as laughter, sound representations, and emoticons. This situation is exacerbated in the case of Arabic social media for two reasons. First, Arabic dialects, commonly used in social media, are quite different from Modern Standard Arabic phonologically, morphologically and lexically, and most importantly, they lack standard orthographies. Second, Arabic speakers in social media as well as discussion forums, SMS messaging and online chat often use a non-standard romanization called Arabizi. In the context of natural language processing of social media Arabic, transliterating from Arabizi of various dialects to Arabic script is a necessary step, since many of the existing state-of-the-art resources for Arabic dialect processing expect Arabic script input. The corpus described in this paper is expected to support Arabic NLP by providing this resource.


Meeting of the Association for Computational Linguistics | 2003

Multilingual Resources for Entity Extraction

Stephanie M. Strassel; Alexis Mitchell

Progress in human language technology requires increasing amounts of data and annotation in a growing variety of languages. Research in Named Entity extraction is no exception. The Linguistic Data Consortium is creating annotated corpora to support information extraction in English, Chinese, Arabic, and other languages for a variety of US Government-sponsored programs. This paper covers the scope of annotation and research tasks within these programs, describes some of the challenges of multilingual corpus development for entity extraction, and concludes with a description of the corpora developed to support this research.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016

The BOLT IR Test Collections of Multilingual Passage Retrieval from Discussion Forums

Ian Soboroff; Kira Griffitt; Stephanie M. Strassel

This paper describes a new test collection for passage retrieval from multilingual, informal text. The task being modeled is that of a monolingual English-speaking user who wishes to search discussion forum text in a foreign language. The system retrieves relevant short passages of text and presents them to the user, translated into English. The test collection contains more than 2 billion words of discussion thread text, 250 queries representing complex informational search needs, and manual relevance judgments of forum post passages, pooled from real systems. This information retrieval test collection is the first to combine multilingual search, passage retrieval, and informal online genre text.


International Conference on Machine Learning | 2005

Linguistic resources for meeting speech recognition

Meghan Lammie Glenn; Stephanie M. Strassel

This paper describes efforts by the University of Pennsylvania's Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation. In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts including rich markup were created for all two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe infrastructure development including transcription tools to support our efforts.


Joint Conference on Lexical and Computational Semantics | 2015

A New Dataset and Evaluation for Belief/Factuality

Vinodkumar Prabhakaran; Tomas By; Julia Hirschberg; Owen Rambow; Samira Shaikh; Tomek Strzalkowski; Jennifer Tracey; Michael Arrigo; Rupayan Basu; Micah Clark; Adam Dalton; Mona T. Diab; Louise Guthrie; Anna Prokofieva; Stephanie M. Strassel; Gregory Werner; Yorick Wilks; Janyce Wiebe

The terms “belief” and “factuality” both refer to the intention of the writer to present the propositional content of an utterance as firmly believed by the writer, not firmly believed, or having some other status. This paper presents an ongoing annotation effort and an associated evaluation.

Collaboration


Dive into Stephanie M. Strassel's collaborations.

Top Co-Authors

Zhiyi Song, University of Pennsylvania
Ann Bies, University of Pennsylvania
Joe Ellis, University of Pennsylvania
Christopher Cieri, University of Pennsylvania
Jonathan Wright, University of Pennsylvania
Xuansong Li, University of Pennsylvania
Kazuaki Maeda, University of Pennsylvania
Haejoong Lee, University of Pennsylvania
David Graff, University of Pennsylvania
Kira Griffitt, University of Pennsylvania