Haejoong Lee
University of Pennsylvania
Publications
Featured research published by Haejoong Lee.
International Conference on Data Engineering | 2006
Steven Bird; Yi Chen; Susan B. Davidson; Haejoong Lee; Yifeng Zheng
Linguistic research and natural language processing employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for linguistic data and queries. However, several important expressive features required for linguistic queries are missing or hard to express in XPath. In this paper, we motivate and illustrate these features with a variety of linguistic queries. Then we propose extensions to XPath to support linguistic queries, and design an efficient query engine based on a novel labeling scheme. Experiments demonstrate that our language is not only sufficiently expressive for linguistic trees but also efficient for practical usage.
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages | 2014
Steven Bird; Florian Hanke; Oliver Adams; Haejoong Lee
Proliferating smartphones and mobile software offer linguists a scalable, networked recording device. This paper describes Aikuma, a mobile app that is designed to put the key language documentation tasks of recording, respeaking, and translating in the hands of a speech community. After motivating the approach we describe the system and briefly report on its use in field tests.
Meeting of the Association for Computational Linguistics | 2001
Steven Bird; Kazuaki Maeda; Xiaoyi Ma; Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete open-source software infrastructure supporting the rapid development of tools for transcribing and annotating time-series data. This general-purpose infrastructure uses annotation graphs as the underlying model, and allows developers to quickly create special-purpose annotation tools using common components. An application programming interface, an I/O library, and graphical user interfaces are described. Our experience has shown us that it is a straightforward task to create new special-purpose annotation tools based on this general-purpose infrastructure.
Empirical Methods in Natural Language Processing | 2014
Ann Bies; Zhiyi Song; Mohamed Maamouri; Stephen Grimes; Haejoong Lee; Jonathan Wright; Stephanie M. Strassel; Nizar Habash; Ramy Eskander; Owen Rambow
This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic script corpus of SMS/Chat data. The language used in social media expresses many differences from other written genres: its vocabulary is informal with intentional deviations from standard orthography such as repeated letters for emphasis; typos and nonstandard abbreviations are common; and nonlinguistic content is written out, such as laughter, sound representations, and emoticons. This situation is exacerbated in the case of Arabic social media for two reasons. First, Arabic dialects, commonly used in social media, are quite different from Modern Standard Arabic phonologically, morphologically and lexically, and most importantly, they lack standard orthographies. Second, Arabic speakers in social media as well as discussion forums, SMS messaging and online chat often use a non-standard romanization called Arabizi. In the context of natural language processing of social media Arabic, transliterating from Arabizi of various dialects to Arabic script is a necessary step, since many of the existing state-of-the-art resources for Arabic dialect processing expect Arabic script input. The corpus described in this paper is expected to support Arabic NLP by providing this resource.
International Conference on Human Language Technology Research | 2001
Kazuaki Maeda; Steven Bird; Xiaoyi Ma; Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete software infrastructure supporting the rapid development of tools for transcribing and annotating time-series data. This general-purpose infrastructure uses annotation graphs as the underlying model, and allows developers to quickly create special-purpose annotation tools using common components. An application programming interface, an I/O library, and graphical user interfaces are described. Our experience has shown us that it is a straightforward task to create new special-purpose annotation tools based on this general-purpose infrastructure.
PLANX | 2005
Steven Bird; Yi Chen; Susan B. Davidson; Haejoong Lee; Yifeng Zheng
Language Resources and Evaluation | 2012
Stephanie M. Strassel; Amanda Morris; Jonathan G. Fiscus; Christopher Caruso; Haejoong Lee; Paul Over; James Fiumara; Barbara L. Shaw; Brian Antonishek; Martial Michel
Language Resources and Evaluation | 2002
Kazuaki Maeda; Steven Bird; Xiaoyi Ma; Haejoong Lee
Language Resources and Evaluation | 2002
Xiaoyi Ma; Haejoong Lee; Steven Bird; Kazuaki Maeda
Language Resources and Evaluation | 2002
Steven Bird; Kazuaki Maeda; Xiaoyi Ma; Haejoong Lee; Beth Randall; Salim Zayat