Haejoong Lee | Researchain

Publication

Featured research published by Haejoong Lee.

International Conference on Data Engineering | 2006

Designing and Evaluating an XPath Dialect for Linguistic Queries

Steven Bird; Yi Chen; Susan B. Davidson; Haejoong Lee; Yifeng Zheng

Linguistic research and natural language processing employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for linguistic data and queries. However, several important expressive features required for linguistic queries are missing or hard to express in XPath. In this paper, we motivate and illustrate these features with a variety of linguistic queries. Then we propose extensions to XPath to support linguistic queries, and design an efficient query engine based on a novel labeling scheme. Experiments demonstrate that our language is not only sufficiently expressive for linguistic trees but also efficient for practical usage.

Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages | 2014

Aikuma: A Mobile App for Collaborative Language Documentation

Steven Bird; Florian Hanke; Oliver Adams; Haejoong Lee

Proliferating smartphones and mobile software offer linguists a scalable, networked recording device. This paper describes Aikuma, a mobile app that is designed to put the key language documentation tasks of recording, respeaking, and translating in the hands of a speech community. After motivating the approach we describe the system and briefly report on its use in field tests.

Meeting of the Association for Computational Linguistics | 2001

Annotation tools based on the annotation graph API

Steven Bird; Kazuaki Maeda; Xiaoyi Ma; Haejoong Lee

Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete open-source software infrastructure supporting the rapid development of tools for transcribing and annotating time-series data. This general-purpose infrastructure uses annotation graphs as the underlying model, and allows developers to quickly create special-purpose annotation tools using common components. An application programming interface, an I/O library, and graphical user interfaces are described. Our experience has shown us that it is a straightforward task to create new special-purpose annotation tools based on this general-purpose infrastructure.

Empirical Methods in Natural Language Processing | 2014

Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus

Ann Bies; Zhiyi Song; Mohamed Maamouri; Stephen Grimes; Haejoong Lee; Jonathan Wright; Stephanie M. Strassel; Nizar Habash; Ramy Eskander; Owen Rambow

This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic script corpus of SMS/Chat data. The language used in social media differs in many ways from other written genres: its vocabulary is informal with intentional deviations from standard orthography such as repeated letters for emphasis; typos and nonstandard abbreviations are common; and nonlinguistic content is written out, such as laughter, sound representations, and emoticons. This situation is exacerbated in the case of Arabic social media for two reasons. First, Arabic dialects, commonly used in social media, are quite different from Modern Standard Arabic phonologically, morphologically and lexically, and most importantly, they lack standard orthographies. Second, Arabic speakers in social media as well as discussion forums, SMS messaging and online chat often use a non-standard romanization called Arabizi. In the context of natural language processing of social media Arabic, transliterating from Arabizi of various dialects to Arabic script is a necessary step, since many of the existing state-of-the-art resources for Arabic dialect processing expect Arabic script input. The corpus described in this paper is expected to support Arabic NLP by providing this resource.

International Conference on Human Language Technology Research | 2001

The annotation graph toolkit: software components for building linguistic annotation tools

Kazuaki Maeda; Steven Bird; Xiaoyi Ma; Haejoong Lee

Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete software infrastructure supporting the rapid development of tools for transcribing and annotating time-series data. This general-purpose infrastructure uses annotation graphs as the underlying model, and allows developers to quickly create special-purpose annotation tools using common components. An application programming interface, an I/O library, and graphical user interfaces are described. Our experience has shown us that it is a straightforward task to create new special-purpose annotation tools based on this general-purpose infrastructure.

PLANX | 2005

Extending XPath to support linguistic queries

Steven Bird; Yi Chen; Susan B. Davidson; Haejoong Lee; Yifeng Zheng

Language Resources and Evaluation | 2012

Creating HAVIC: Heterogeneous Audio Visual Internet Collection

Stephanie M. Strassel; Amanda Morris; Jonathan G. Fiscus; Christopher Caruso; Haejoong Lee; Paul Over; James Fiumara; Barbara L. Shaw; Brian Antonishek; Martial Michel

Language Resources and Evaluation | 2002