Christian Monson | Researchain

Publication

Featured research published by Christian Monson.

Cross-Language Evaluation Forum | 2008

ParaMor: Finding Paradigms across Morphology

Christian Monson; Jaime G. Carbonell; Alon Lavie; Lori S. Levin

ParaMor automatically learns morphological paradigms from unlabelled text, and uses them to annotate word forms with morpheme boundaries. ParaMor competed in the English and German tracks of Morpho Challenge 2007 (Kurimo et al., 2008). In English, ParaMor's balanced precision and recall outperform at F1 an already sophisticated baseline induction algorithm, Morfessor (Creutz, 2006). In German, ParaMor suffers from low morpheme recall. But combining ParaMor's analyses with analyses from Morfessor results in a set of analyses that outperform either algorithm alone, and that place first in F1 among all algorithms submitted to Morpho Challenge 2007. Categories and Subject Descriptors: I.2 [Artificial Intelligence]: I.2.7 Natural Language Processing.
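
The abstract above compares systems by F1, the harmonic mean of precision and recall. As a minimal sketch of that standard formula (the function name f1_score is an illustrative choice, not taken from the paper or from the Morpho Challenge tooling):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero, to avoid dividing by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system with high precision but low recall (the situation described for German) scores a lower F1 than a system whose precision and recall are balanced at a similar average.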

Conference of the Association for Machine Translation in the Americas | 2002

Automatic Rule Learning for Resource-Limited MT

Jaime G. Carbonell; Katharina Probst; Erik Peterson; Christian Monson; Alon Lavie; Ralf D. Brown; Lori S. Levin

Machine Translation of minority languages presents unique challenges, including the paucity of bilingual training data and the unavailability of linguistically-trained speakers. This paper focuses on a machine learning approach to transfer-based MT, where data in the form of translations and lexical alignments are elicited from bilingual speakers, and a seeded version-space learning algorithm formulates and refines transfer rules. A rule-generalization lattice is defined based on LFG-style f-structures, permitting generalization operators in the search for the most general rules consistent with the elicited data. The paper presents these methods and illustrates examples.

Meeting of the Association for Computational Linguistics | 2007

ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis

Christian Monson; Jaime G. Carbonell; Alon Lavie; Lori S. Levin

Paradigms provide an inherent organizational structure to natural language morphology. ParaMor, our minimally supervised morphology induction algorithm, retrusses the word forms of raw text corpora back onto their paradigmatic skeletons, performing on par with state-of-the-art minimally supervised morphology induction algorithms at morphological analysis of English and German. ParaMor consists of two phases. Our algorithm first constructs sets of affixes closely mimicking the paradigms of a language. And with these structures in hand, ParaMor then annotates word forms with morpheme boundaries. To set ParaMor's few free parameters we analyze a training corpus of Spanish. Without adjusting parameters, we induce the morphological structure of English and German. Adopting the evaluation methodology of Morpho Challenge 2007 (Kurimo et al., 2007), we compare ParaMor's morphological analyses with Morfessor (Creutz, 2006), a modern minimally supervised morphology induction system. ParaMor consistently achieves competitive F1 measures.
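
The first phase described above, building sets of affixes that mimic paradigms, can be illustrated with a toy sketch. This is not ParaMor's actual search procedure: it simply splits every word at each candidate position and groups stems that occur with exactly the same set of suffixes. The function name candidate_suffix_sets and the max_suffix_len parameter are assumptions made for illustration.

```python
from collections import defaultdict


def candidate_suffix_sets(words, max_suffix_len=2):
    """Group candidate stems by the exact set of candidate suffixes seen with them.

    Toy illustration only: every split of a word into stem + suffix
    (suffix length 0..max_suffix_len, non-empty stem) counts as a candidate.
    """
    stem_to_suffixes = defaultdict(set)
    for word in set(words):
        first_cut = max(1, len(word) - max_suffix_len)
        for cut in range(first_cut, len(word) + 1):
            stem_to_suffixes[word[:cut]].add(word[cut:])

    # Stems sharing an identical suffix set form one candidate paradigm.
    groups = defaultdict(set)
    for stem, suffixes in stem_to_suffixes.items():
        if len(suffixes) > 1:  # a single suffix carries no paradigm evidence
            groups[frozenset(suffixes)].add(stem)
    return dict(groups)


spanish_toy = ["hablo", "hablas", "habla", "canto", "cantas", "canta"]
paradigms = candidate_suffix_sets(spanish_toy)
# The suffix set {"o", "as", "a"} is shared by the stems "habl" and "cant".
```

On this toy Spanish vocabulary the grouping also yields spurious candidates such as the suffix set {"lo", "la"} on the stem "hab", which is why a real system needs further filtering before segmenting words.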

Cross-Language Evaluation Forum | 2008

ParaMor and Morpho Challenge 2008

Christian Monson; Jaime G. Carbonell; Alon Lavie; Lori S. Levin

We summarize the strong performance of ParaMor, an unsupervised morphology induction system, at Morpho Challenge 2008. When ParaMor's morphological analyses, which specialize in identifying inflectional morphology, are added to the analyses from the general-purpose unsupervised morphology induction system, Morfessor, the combined system identifies the morphemes of all five Morpho Challenge languages at recall scores higher than those of any other system which competed in the Challenge. These strong recall scores lead to F1 values for morpheme identification as high as or higher than those of any competing system for all the competition languages but English.
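
One way to see why adding one system's analyses to another's tends to raise recall is a per-word union of proposed morphemes. The sketch below is an assumption about how such a combination could be represented, not a description of the actual Morpho Challenge submission format; the function name combine_analyses and the toy analyses are hypothetical.

```python
def combine_analyses(analyses_a, analyses_b):
    """Per-word union of the morphemes proposed by two systems.

    Each argument maps a word to a list of morpheme labels. The union can
    only add morphemes for a word, so morpheme recall cannot go down,
    although precision may.
    """
    combined = {}
    for word in analyses_a.keys() | analyses_b.keys():
        morphemes = set(analyses_a.get(word, [])) | set(analyses_b.get(word, []))
        combined[word] = sorted(morphemes)
    return combined


# Hypothetical toy analyses, not real system output.
system_a = {"walked": ["walk", "+ed"]}
system_b = {"walked": ["walked"], "cats": ["cat", "+s"]}
merged = combine_analyses(system_a, system_b)
```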

Meeting of the Association for Computational Linguistics | 2004

A framework for unsupervised natural language morphology induction

Christian Monson

This paper presents a framework for unsupervised natural language morphology induction wherein candidate suffixes are grouped into candidate inflection classes, which are then arranged in a lattice structure. With similar candidate inflection classes placed near one another in the lattice, I propose this structure is an ideal search space in which to isolate the true inflection classes of a language. This paper discusses and motivates possible search strategies over the inflection class lattice structure.
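
The lattice idea above can be sketched on a toy vocabulary. In this illustration, which is an assumption about one natural way to realize such a lattice rather than the paper's exact construction, a node is a set of candidate suffixes, its stems are those that form a word with every suffix in the set, and its more general neighbors are obtained by dropping one suffix. The names stems_for and more_general are hypothetical.

```python
def stems_for(suffix_set, vocab):
    """Non-empty stems that form a vocabulary word with every suffix in suffix_set."""
    suffixes = list(suffix_set)
    first = suffixes[0]
    candidates = {w[: len(w) - len(first)] for w in vocab if w.endswith(first)}
    return {s for s in candidates if s and all(s + suf in vocab for suf in suffixes)}


def more_general(suffix_set):
    """Lattice neighbors one step more general: drop a single suffix."""
    if len(suffix_set) <= 1:
        return []
    return [suffix_set - {s} for s in suffix_set]


vocab = {"walk", "walks", "walked", "talk", "talks", "jump", "jumped"}
node = frozenset({"", "s", "ed"})
node_stems = stems_for(node, vocab)
parent_stems = [stems_for(p, vocab) for p in more_general(node)]
```

Dropping a suffix can only keep or enlarge the stem set, so similar candidate inflection classes sit next to each other, and a search can move between specific nodes (many suffixes, few stems) and general nodes (few suffixes, many stems).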

Meeting of the Association for Computational Linguistics | 2008

Evaluating an Agglutinative Segmentation Model for ParaMor

Christian Monson; Alon Lavie; Jaime G. Carbonell; Lori S. Levin

This paper describes and evaluates a modification to the segmentation model used in the unsupervised morphology induction system, ParaMor. Our improved segmentation model permits multiple morpheme boundaries in a single word. To prepare ParaMor to effectively apply the new agglutinative segmentation model, two heuristics improve ParaMor's precision. These precision-enhancing heuristics are adaptations of those used in other unsupervised morphology induction systems, including work by Hafer and Weiss (1974) and Goldsmith (2006). By reformulating the segmentation model used in ParaMor, we significantly improve ParaMor's performance in all language tracks and in both the linguistic evaluation and the task-based information retrieval (IR) evaluation of the peer-operated competition Morpho Challenge 2007. ParaMor's improved morpheme recall in the linguistic evaluations of German, Finnish, and Turkish is higher than that of any system which competed in the Challenge. In the three languages of the IR evaluation, our enhanced ParaMor significantly outperforms, at average precision over newswire queries, a morphologically naive baseline, scoring just behind the leading system from Morpho Challenge 2007 in English and ahead of the first place system in German.
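
The difference between a single-boundary and an agglutinative segmentation can be shown with a small helper that cuts a word at any number of boundary positions. This is only an illustration of the output format, not ParaMor's segmentation model; the function name segment and the boundary positions are supplied by hand here.

```python
def segment(word, boundaries):
    """Split word at every valid boundary position (0 < position < len(word))."""
    cuts = sorted({b for b in boundaries if 0 < b < len(word)})
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


# Finnish "taloissani" ("in my houses"): talo + i + ssa + ni
single = segment("taloissani", [4])            # one boundary only
agglutinative = segment("taloissani", [4, 5, 8])  # several boundaries in one word
```

With only one boundary allowed, the suffix sequence "issani" stays fused into a single unit, while the multi-boundary model can expose each morpheme separately, which is what matters for agglutinative languages such as Finnish and Turkish.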

Cross-Language Evaluation Forum | 2009

Morphological analysis by multiple sequence alignment

Tzvetan Tchoukalov; Christian Monson; Brian Roark

In biological sequence processing, Multiple Sequence Alignment (MSA) techniques capture information about long-distance dependencies and the three-dimensional structure of protein and nucleotide sequences without resorting to polynomial complexity context-free models. But MSA techniques have rarely been used in natural language (NL) processing, and never for NL morphology induction. Our MetaMorph algorithm is a first attempt at leveraging MSA techniques to induce NL morphology in an unsupervised fashion. Given a text corpus in any language, MetaMorph sequentially aligns words of the corpus to form an MSA and then segments the MSA to produce morphological analyses. Over corpora that contain millions of unique word types, MetaMorph identifies morphemes at an F1 below state-of-the-art performance. But when restricted to smaller sets of orthographically related words, MetaMorph outperforms the state-of-the-art ParaMor-Morfessor Union morphology induction system. Tested on 5,000 orthographically similar Hungarian word types, MetaMorph reaches 54.1% and ParaMor-Morfessor just 41.9%. Hence, we conclude that MSA is a promising algorithm for unsupervised morphology induction. Future research directions are discussed.
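
A multiple sequence alignment is commonly built up from pairwise alignments. As a building-block illustration, the sketch below aligns two words character by character with the classic Needleman-Wunsch global alignment. It is not MetaMorph's alignment procedure, and the scoring values (match, mismatch, gap) are arbitrary assumptions.

```python
def align_pair(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment of two strings; '-' marks a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


row_a, row_b = align_pair("walked", "walk")
```

In the aligned rows, the columns where one word has characters and the other has gaps are natural candidates for a morpheme boundary, which hints at how segmenting an alignment can yield morphological analyses.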

Cross-Language Evaluation Forum | 2009

Simulating morphological analyzers with stochastic taggers for confidence estimation

Christian Monson; Kristy Hollingshead; Brian Roark

We propose a method for providing stochastic confidence estimates for rule-based and black-box natural language (NL) processing systems. Our method does not require labeled training data: We simply train stochastic models on the output of the original NL systems. Numeric confidence estimates enable both minimum Bayes risk-style optimization and principled system combination for these knowledge-based and black-box systems. In our specific experiments, we enrich ParaMor, a rule-based system for unsupervised morphology induction, with probabilistic segmentation confidences by training a statistical natural language tagger to simulate ParaMor's morphological segmentations. By adjusting the numeric threshold above which the simulator proposes morpheme boundaries, we improve F1 of morpheme identification on a Hungarian corpus by 5.9% absolute. With numeric confidences in hand, we also combine ParaMor's segmentation decisions with those of a second (black-box) unsupervised morphology induction system, Morfessor. Our joint ParaMor-Morfessor system enhances F1 performance by a further 3.4% absolute, ultimately moving F1 from 41.4% to 50.7%.
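
The thresholding step described above can be sketched as follows: given a confidence for a boundary after each character, only positions whose confidence exceeds the threshold are proposed. The confidence values below are invented for illustration and boundaries_above is a hypothetical name. The last lines check the arithmetic reported in the abstract.

```python
def boundaries_above(confidences, threshold):
    """Positions (1-based, counted in characters) whose boundary confidence exceeds threshold."""
    return [pos for pos, conf in enumerate(confidences, start=1) if conf > threshold]


# Invented confidences for a boundary after characters 1..5 of some word.
toy_confidences = [0.1, 0.2, 0.9, 0.4, 0.6]
strict = boundaries_above(toy_confidences, 0.5)
lenient = boundaries_above(toy_confidences, 0.3)

# Reported gains: 41.4% baseline F1, +5.9 from thresholding, +3.4 from combination.
final_f1 = round(41.4 + 5.9 + 3.4, 1)
```

Lowering the threshold proposes more boundaries, trading precision for recall, so sweeping it is a direct way to look for the value that gives the best F1.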

Archive | 2008