Alexander M. Fraser | Researchain

Publication

Featured research published by Alexander M. Fraser.

Computational Linguistics | 2007

Measuring Word Alignment Quality for Statistical Machine Translation

Alexander M. Fraser; Daniel Marcu

Automatic word alignment plays a critical role in statistical machine translation. Unfortunately, the relationship between alignment quality and statistical machine translation performance has not been well understood. In the recent literature, the alignment task has frequently been decoupled from the translation task and assumptions have been made about measuring alignment quality for machine translation which, it turns out, are not justified. In particular, none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance. This paper explains this state of affairs and presents steps towards measuring alignment quality in a way which is predictive of statistical machine translation performance.
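
For reference, AER as it is commonly defined in the alignment literature compares a hypothesized link set A against gold "sure" links S and "possible" links P (with S a subset of P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The sketch below is illustrative only. The function name, the handling of empty inputs, and the toy links are assumptions, not taken from the paper.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Return AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    """
    a = set(hypothesis)
    s = set(sure)
    # By convention every sure link also counts as a possible link.
    p = set(possible) | s
    if not a and not s:
        # Assumption: two empty link sets are treated as a perfect match.
        return 0.0
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy example (hypothetical links, not data from the paper).
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 1)}
hypothesis_links = {(0, 0), (2, 1), (2, 2)}
aer = alignment_error_rate(hypothesis_links, sure_links, possible_links)
```

Because possible links are rewarded when proposed but not penalized when missed, AER treats precision and recall errors differently, which is one reason it need not track translation quality.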

Meeting of the Association for Computational Linguistics | 2006

Semi-Supervised Training for Statistical Word Alignment

Alexander M. Fraser; Daniel Marcu

We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002

Empirical studies in strategies for Arabic retrieval

Jinxi Xu; Alexander M. Fraser; Ralph M. Weischedel

This work evaluates a few search strategies for Arabic monolingual and cross-lingual retrieval, using the TREC Arabic corpus as the test-bed. The release by NIST in 2001 of an Arabic corpus of nearly 400k documents with both monolingual and cross-lingual queries and relevance judgments has been a new enabler for empirical studies. Experimental results show that spelling normalization and stemming can significantly improve Arabic monolingual retrieval. Character tri-grams from stems improved retrieval modestly on the test corpus, but the improvement is not statistically significant. To further improve retrieval, we propose a novel thesaurus-based technique. Different from existing approaches to thesaurus-based retrieval, ours formulates word synonyms as probabilistic term translations that can be automatically derived from a parallel corpus. Retrieval results show that the thesaurus can significantly improve Arabic monolingual retrieval. For cross-lingual retrieval (CLIR), we found that spelling normalization and stemming have little impact.
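
As a small illustration of the "character tri-grams from stems" idea, the sketch below slides a window of three characters over a stem that has already been produced by some stemmer. The function name, the fallback for stems shorter than three characters, and the transliterated example are assumptions made for illustration, not details from the paper.

```python
def char_trigrams(stem):
    """Return the overlapping character trigrams of a stem.

    Assumption: a stem shorter than three characters is kept whole
    so that it still yields one index term.
    """
    if len(stem) < 3:
        return [stem]
    return [stem[i:i + 3] for i in range(len(stem) - 2)]


# Hypothetical transliterated stem, used only to show the windowing.
trigrams = char_trigrams("maktab")
```

Indexing these trigrams alongside or instead of the stems lets partially matching word forms share index terms.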

Workshop on Statistical Machine Translation | 2009

Experiments in Morphosyntactic Processing for Translating to and from German

Alexander M. Fraser

We describe two shared task systems and associated experiments. The German to English system used reordering rules applied to parses and morphological splitting and stemming. The English to German system used an additional translation step which recreated compound words and generated morphological inflection.

Empirical Methods in Natural Language Processing | 2015

Joint Lemmatization and Morphological Tagging with Lemming

Thomas Müller; Ryan Cotterell; Alexander M. Fraser; Hinrich Schütze

We present LEMMING, a modular loglinear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features. It is trainable on corpora annotated with gold standard tags and lemmata and does not rely on morphological dictionaries or analyzers. LEMMING sets the new state of the art in token-based statistical lemmatization on six languages; e.g., for Czech lemmatization, we reduce the error by 60%, from 4.05 to 1.58. We also give empirical evidence that jointly modeling morphological tags and lemmata is mutually beneficial.

Conference of the European Chapter of the Association for Computational Linguistics | 2014

How to Produce Unseen Teddy Bears: Improved Morphological Processing of Compounds in SMT

Fabienne Cap; Alexander M. Fraser; Marion Weller; Aoife Cahill

Compounding in morphologically rich languages is a highly productive process which often causes SMT approaches to fail because of unseen words. We present an approach for translation into a compounding language that splits compounds into simple words for training and, due to an underspecified representation, allows for free merging of simple words into compounds after translation. In contrast to previous approaches, we use features projected from the source language to predict compound mergings. We integrate our approach into end-to-end SMT and show that many compounds matching the reference translation are produced which did not appear in the training data. Additional manual evaluations support the usefulness of generalizing compound formation in SMT.

Meeting of the Association for Computational Linguistics | 2005

ISI's Participation in the Romanian-English Alignment Task

Alexander M. Fraser; Daniel Marcu

We discuss results on the shared task of Romanian-English word alignment. The baseline technique is that of symmetrizing two word alignments automatically generated using IBM Model 4. A simple vocabulary reduction technique results in an improvement in performance. We also report on a new alignment model and a new training algorithm based on alternating maximization of likelihood with minimization of error rate.
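
The abstract does not say which symmetrization heuristic the baseline uses. As a minimal sketch of the general idea, the code below combines a source-to-target alignment with a target-to-source alignment by taking their intersection (high precision) and their union (high recall). The function name, the link representation, and the toy data are assumptions for illustration.

```python
def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two directional alignments.

    src_to_tgt holds (source_index, target_index) links.
    tgt_to_src holds (target_index, source_index) links.
    Returns (intersection, union), both as (source_index, target_index) sets.
    """
    forward = set(src_to_tgt)
    # Flip the reverse-direction links so both sets use the same orientation.
    backward = {(s, t) for (t, s) in tgt_to_src}
    return forward & backward, forward | backward


# Toy links (hypothetical, not output of an actual IBM Model 4 run).
forward_links = {(0, 0), (1, 1), (2, 1)}
backward_links = {(0, 0), (1, 1), (2, 2)}
intersection, union = symmetrize(forward_links, backward_links)
```

Heuristics used in practice typically start from the intersection and selectively add links from the union.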

Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014) | 2014

Distinguishing Degrees of Compositionality in Compound Splitting for Statistical Machine Translation

Marion Weller; Fabienne Cap; Stefan Müller; Sabine Schulte im Walde; Alexander M. Fraser

The paper presents an approach to morphological compound splitting that takes the degree of compositionality into account. We apply our approach to German noun compounds and particle verbs within a German-English SMT system, and study the effect of splitting only compositional compounds, as opposed to aggressive splitting. A qualitative study explores the translational behaviour of non-compositional compounds.

Computational Linguistics | 2013

Knowledge sources for constituent parsing of German, a morphologically rich and less-configurational language

Alexander M. Fraser; Helmut Schmid; Richárd Farkas; Renjing Wang; Hinrich Schütze

We study constituent parsing of German, a morphologically rich and less-configurational language. We use a probabilistic context-free grammar treebank grammar that has been adapted to the morphologically rich properties of German by markovization and special features added to its productions. We evaluate the impact of adding lexical knowledge. Then we examine both monolingual and bilingual approaches to parse reranking. Our reranking parser is the new state of the art in constituency parsing of the TIGER Treebank. We perform an analysis, concluding with lessons learned, which apply to parsing other morphologically rich and less-configurational languages.

Meeting of the Association for Computational Linguistics | 2009

Rich Bitext Projection Features for Parse Reranking

Alexander M. Fraser; Renjing Wang; Hinrich Schütze

Many different types of features have been shown to improve accuracy in parse reranking. A class of features that thus far has not been considered is based on a projection of the syntactic structure of a translation of the text to be parsed. The intuition for using this type of bitext projection feature is that ambiguous structures in one language often correspond to unambiguous structures in another. We show that reranking based on bitext projection features increases parsing accuracy significantly.
