Michael Wayne Goodman

Publication

Featured research published by Michael Wayne Goodman.

Language Resources and Evaluation | 2015

Xigt: extensible interlinear glossed text for natural language processing

Michael Wayne Goodman; Joshua Crowgey; Fei Xia; Emily M. Bender

This paper presents Xigt, an extensible storage format for interlinear glossed text (IGT). We review design desiderata for such a format based on our own use cases as well as general best practices, and then explore existing representations of IGT through the lens of those desiderata. We give an overview of the data model and XML serialization of Xigt, and then describe its application to the use case of representing a large, noisy, heterogeneous set of IGT.

Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages | 2014

Learning Grammar Specifications from IGT: A Case Study of Chintang

Emily M. Bender; Joshua Crowgey; Michael Wayne Goodman; Fei Xia

We present a case study of the methodology of using information extracted from interlinear glossed text (IGT) to create actual working HPSG grammar fragments using the Grammar Matrix, focusing on one language: Chintang. Though the results are barely measurable in terms of coverage over running text, they nonetheless provide a proof of concept. Our experience report reflects on the ways in which this task is non-trivial and on mismatches between the assumptions of the methodology and the realities of IGT as produced in a large-scale field project.

Meeting of the Association for Computational Linguistics | 2009

Using Generation for Grammar Analysis and Error Detection

Michael Wayne Goodman; Francis Bond

We demonstrate that the bidirectionality of deep grammars, allowing them to generate as well as parse sentences, can be used to automatically and effectively identify errors in the grammars. The system is tested on two implemented HPSG grammars: Jacy for Japanese, and the ERG for English. Using this system, we were able to increase generation coverage in Jacy by 18% (45% to 63%) with only four weeks of grammar development.

Language Resources and Evaluation | 2016

Enriching a massively multilingual database of interlinear glossed text

Fei Xia; William D. Lewis; Michael Wayne Goodman; Glenn Slayden; Ryan Georgi; Joshua Crowgey; Emily M. Bender

The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swath of the world’s languages. In many cases this involves bootstrapping the learning process with enriched or partially enriched resources. We propose that Interlinear Glossed Text (IGT), a very common form of annotated data used in the field of linguistics, has great potential for bootstrapping NLP tools for resource-poor languages. Although IGT is generally very richly annotated, and can be enriched even further (e.g., through structural projection), much of the content is not easily consumable by machines since it remains “trapped” in linguistic scholarly documents and in human readable form. In this paper, we describe the expansion of the ODIN resource—a database containing many thousands of instances of IGT for over a thousand languages. We enrich the original IGT data by adding word alignment and syntactic structure. To make the data in ODIN more readily consumable by tool developers and NLP researchers, we adopt and extend a new XML format for IGT, called Xigt. We also develop two packages for manipulating IGT data: one, INTENT, enriches raw IGT automatically, and the other, XigtEdit, is a graphical IGT editor.

Conference on Intelligent Text Processing and Computational Linguistics | 2015

Enriching, Editing, and Representing Interlinear Glossed Text

Fei Xia; Michael Wayne Goodman; Ryan Georgi; Glenn Slayden; William D. Lewis

Meeting of the Association for Computational Linguistics | 2016

A Web-framework for ODIN Annotation.

Ryan Georgi; Michael Wayne Goodman; Fei Xia

The current release of the ODIN (Online Database of Interlinear Text) database contains over 150,000 linguistic examples, from nearly 1,500 languages, extracted from PDFs found on the web, representing a significant source of data for language research, particularly for low-resource languages. Errors introduced during PDF-to-text conversion or poorly formatted examples can make the task of automatically analyzing the data more difficult, so we aim to clean and normalize the examples in order to maximize accuracy during analysis. In this paper we describe a system that allows users to automatically and manually correct errors in the source data in order to get the best possible analysis of the data. We also describe a RESTful service for managing collections of linguistic examples on the web. All software is distributed under an open-source license.

Meeting of the Association for Computational Linguistics | 2010