Michael Wayne Goodman
University of Washington
Publication
Featured research published by Michael Wayne Goodman.
language resources and evaluation | 2015
Michael Wayne Goodman; Joshua Crowgey; Fei Xia; Emily M. Bender
This paper presents Xigt, an extensible storage format for interlinear glossed text (IGT). We review design desiderata for such a format based on our own use cases as well as general best practices, and then explore existing representations of IGT through the lens of those desiderata. We give an overview of the data model and XML serialization of Xigt, and then describe its application to the use case of representing a large, noisy, heterogeneous set of IGT.
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages | 2014
Emily M. Bender; Joshua Crowgey; Michael Wayne Goodman; Fei Xia
We present a case study of the methodology of using information extracted from interlinear glossed text (IGT) to create actual working HPSG grammar fragments using the Grammar Matrix, focusing on one language: Chintang. Though the results are barely measurable in terms of coverage over running text, they nonetheless provide a proof of concept. Our experience report reflects on the ways in which this task is non-trivial and on mismatches between the assumptions of the methodology and the realities of IGT as produced in a large-scale field project.
meeting of the association for computational linguistics | 2009
Michael Wayne Goodman; Francis Bond
We demonstrate that the bidirectionality of deep grammars, allowing them to generate as well as parse sentences, can be used to automatically and effectively identify errors in the grammars. The system is tested on two implemented HPSG grammars: Jacy for Japanese, and the ERG for English. Using this system, we were able to increase generation coverage in Jacy by 18 percentage points (from 45% to 63%) with only four weeks of grammar development.
language resources and evaluation | 2016
Fei Xia; William D. Lewis; Michael Wayne Goodman; Glenn Slayden; Ryan Georgi; Joshua Crowgey; Emily M. Bender
The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swath of the world’s languages. In many cases this involves bootstrapping the learning process with enriched or partially enriched resources. We propose that Interlinear Glossed Text (IGT), a very common form of annotated data used in the field of linguistics, has great potential for bootstrapping NLP tools for resource-poor languages. Although IGT is generally very richly annotated, and can be enriched even further (e.g., through structural projection), much of the content is not easily consumable by machines since it remains “trapped” in linguistic scholarly documents and in human readable form. In this paper, we describe the expansion of the ODIN resource—a database containing many thousands of instances of IGT for over a thousand languages. We enrich the original IGT data by adding word alignment and syntactic structure. To make the data in ODIN more readily consumable by tool developers and NLP researchers, we adopt and extend a new XML format for IGT, called Xigt. We also develop two packages for manipulating IGT data: one, INTENT, enriches raw IGT automatically, and the other, XigtEdit, is a graphical IGT editor.
conference on intelligent text processing and computational linguistics | 2015
Fei Xia; Michael Wayne Goodman; Ryan Georgi; Glenn Slayden; William D. Lewis
The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swathe of the world’s languages. In many cases this involves bootstrapping the learning process with enriched or partially enriched resources. One promising line of research involves the use of Interlinear Glossed Text (IGT), a very common form of annotated data used in the field of linguistics. Although IGT is generally very richly annotated, and can be enriched even further (e.g., through structural projection), much of the content is not easily consumable by machines since it remains “trapped” in linguistic scholarly documents and in human readable form. In this paper, we introduce several tools that make IGT more accessible and consumable by NLP researchers.
meeting of the association for computational linguistics | 2016
Ryan Georgi; Michael Wayne Goodman; Fei Xia
The current release of the ODIN (Online Database of Interlinear Text) database contains over 150,000 linguistic examples, from nearly 1,500 languages, extracted from PDFs found on the web, representing a significant source of data for language research, particularly for low-resource languages. Errors introduced during PDF-to-text conversion or poorly formatted examples can make the task of automatically analyzing the data more difficult, so we aim to clean and normalize the examples in order to maximize accuracy during analysis. In this paper we describe a system that allows users to automatically and manually correct errors in the source data in order to get the best possible analysis of the data. We also describe a RESTful service for managing collections of linguistic examples on the web. All software is distributed under an open-source license.
meeting of the association for computational linguistics | 2010
Emily M. Bender; Scott Drellishak; Antske Fokkens; Michael Wayne Goodman; Daniel P. Mills; Laurie Poulson; Safiyyah Saleem
sighum workshop on language technology for cultural heritage social sciences and humanities | 2013
Emily M. Bender; Michael Wayne Goodman; Joshua Crowgey; Fei Xia
language resources and evaluation | 2016
Ann A. Copestake; Guy Emerson; Michael Wayne Goodman; Matic Horvat; Alexander Kuhnle; Ewa Muszynska
Archive | 2013
Michael Wayne Goodman