Philipp Koehn | Researchain

Publication

Featured research published by Philipp Koehn.

North American Chapter of the Association for Computational Linguistics | 2003

Statistical phrase-based translation

Philipp Koehn; Franz Josef Och; Daniel Marcu

We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to better understand and explain why phrase-based models outperform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models do not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.
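Lexical weighting scores a phrase pair by how well its individual words translate each other. A minimal sketch, following the usual statement of the lexical weight p_w(f | e, a): multiply, over the foreign words, the average word-translation probability w(f_i | e_j) across the English words each foreign word is aligned to, scoring unaligned foreign words against NULL. The function name, the `w` table keyed by `(f_word, e_word)` pairs, and the `(i, j)` alignment format are hypothetical choices for illustration, not the paper's code.

```python
from collections import defaultdict

NULL = "NULL"  # placeholder English token for unaligned foreign words (assumption)

def lexical_weight(f_phrase, e_phrase, alignment, w):
    """Lexical weight p_w(f | e, a) of a phrase pair.

    f_phrase, e_phrase: lists of tokens.
    alignment: set of (i, j) pairs linking f_phrase[i] to e_phrase[j].
    w: dict mapping (f_word, e_word) -> word translation probability w(f | e).
    """
    links = defaultdict(list)
    for i, j in alignment:
        links[i].append(j)
    score = 1.0
    for i, f in enumerate(f_phrase):
        if links[i]:
            # average w(f_i | e_j) over all English words aligned to f_i
            score *= sum(w.get((f, e_phrase[j]), 0.0) for j in links[i]) / len(links[i])
        else:
            # unaligned foreign word: score it against the NULL token
            score *= w.get((f, NULL), 0.0)
    return score
```

For example, with `w = {("das", "the"): 0.5, ("haus", "house"): 0.8}` and a one-to-one alignment, the pair "das haus" / "the house" gets a lexical weight of 0.5 × 0.8 = 0.4.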

Conference of the Association for Machine Translation in the Americas | 2004

Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models

Philipp Koehn

We describe Pharaoh, a freely available decoder for phrase-based statistical machine translation models. The decoder is the implementation of an efficient dynamic programming search algorithm with lattice generation and XML markup for external components.
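To illustrate the general kind of search a beam search decoder performs, here is a toy version that keeps one stack of partial hypotheses per number of source words translated and prunes each stack to a fixed beam size before expanding it. This is a hypothetical, simplified sketch, not Pharaoh's implementation or interface: it translates monotonically (no reordering), stores a bigram language model in a plain dict, and omits future-cost estimation and hypothesis recombination, which a practical phrase-based decoder needs.

```python
import heapq

def monotone_beam_decode(source, phrase_table, bigram_lm, beam_size=5, unk_logprob=-10.0):
    """Toy monotone phrase-based beam search (no reordering).

    source: list of source tokens.
    phrase_table: dict mapping a source phrase (tuple of tokens) to a list of
        (target phrase tuple, translation log-probability) options.
    bigram_lm: dict mapping (previous word, word) to a log-probability;
        bigrams missing from it are scored with unk_logprob.
    Returns the highest-scoring target token list, or None if no sequence of
    known phrases covers the whole source sentence.
    """
    # stacks[i] holds hypotheses that have translated source[:i];
    # a hypothesis is (log score, last target word, target tokens so far)
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0].append((0.0, "<s>", []))
    for i in range(len(source)):
        # histogram pruning: expand only the beam_size best hypotheses in this stack
        for score, last, output in heapq.nlargest(beam_size, stacks[i], key=lambda h: h[0]):
            # extend the hypothesis with every known phrase starting at position i
            for j in range(i + 1, len(source) + 1):
                for target, tm_logprob in phrase_table.get(tuple(source[i:j]), []):
                    lm_logprob, prev = 0.0, last
                    for word in target:
                        lm_logprob += bigram_lm.get((prev, word), unk_logprob)
                        prev = word
                    stacks[j].append((score + tm_logprob + lm_logprob, prev, output + list(target)))
    # add the end-of-sentence language model score before picking the winner
    final = [(s + bigram_lm.get((last, "</s>"), unk_logprob), out) for s, last, out in stacks[-1]]
    if not final:
        return None
    return max(final, key=lambda h: h[0])[1]
```

Organizing stacks by the number of source words covered lets hypotheses of comparable progress compete with each other, which is what makes per-stack pruning reasonable.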

Meeting of the Association for Computational Linguistics | 2005

Clause Restructuring for Statistical Machine Translation

Michael Collins; Philipp Koehn; Ivona Kučerová

We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target-language word order than the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing an improvement from 25.2% Bleu score for a baseline system to 26.8% Bleu score for the system with reordering, a statistically significant improvement.

Workshop on Statistical Machine Translation | 2007

(Meta-) Evaluation of Machine Translation

Chris Callison-Burch; Cameron S. Fordyce; Philipp Koehn; Christof Monz; Josh Schroeder

This paper evaluates the translation quality of machine translation systems for 8 language pairs: translating French, German, Spanish, and Czech to English and back. We carried out an extensive human evaluation which allowed us not only to rank the different MT systems, but also to perform higher-level analysis of the evaluation process. We measured timing and intra- and inter-annotator agreement for three types of subjective evaluation. We measured the correlation of automatic evaluation metrics with human judgments. This meta-evaluation reveals surprising facts about the most commonly used methodologies.
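Agreement and correlation statistics of the kind mentioned here take only a few lines to compute. The sketch below uses Cohen's kappa, which corrects the observed agreement P(A) for chance agreement P(E) via K = (P(A) - P(E)) / (1 - P(E)), and Spearman's rank correlation for comparing a metric's system scores with human scores. Both are standard textbook formulations assumed for illustration; the paper's exact chance model and tie handling may differ, and the function names are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # P(A): observed proportion of items on which the annotators agree
    p_a = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # P(E): agreement expected by chance from each annotator's label distribution
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators always use the same single label
    return (p_a - p_e) / (1 - p_e)

def spearman_rho(xs, ys):
    """Spearman's rank correlation, assuming no tied values within xs or ys."""
    n = len(xs)
    if n != len(ys) or n < 2 or len(set(xs)) < n or len(set(ys)) < n:
        raise ValueError("need two equal-length sequences of at least 2 distinct values")
    rank_x = {v: r for r, v in enumerate(sorted(xs))}
    rank_y = {v: r for r, v in enumerate(sorted(ys))}
    # sum of squared rank differences, plugged into 1 - 6 * sum(d^2) / (n (n^2 - 1))
    d_squared = sum((rank_x[x] - rank_y[y]) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

For instance, if a metric ranks three systems in exactly the reverse of the human ranking, `spearman_rho` returns -1.0.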

Workshop on Statistical Machine Translation | 2007

Experiments in Domain Adaptation for Statistical Machine Translation

Philipp Koehn; Josh Schroeder

The special challenge of the WMT 2007 shared task was domain adaptation. We took this opportunity to experiment with various ways of adapting a statistical machine translation system to a special domain (here: news commentary), when most of the training data is from a different domain (here: European Parliament speeches). This paper also gives a description of the submission of the University of Edinburgh to the shared task.

Conference of the European Chapter of the Association for Computational Linguistics | 2003

Empirical methods for compound splitting

Philipp Koehn; Kevin Knight

Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on performance of statistical MT systems. Results show accuracy of 99.1% and performance gains for MT of 0.039 BLEU on a German-English noun phrase translation task.
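A frequency-based splitting heuristic of this kind can be sketched as follows: among all ways of splitting a word into parts that occur in a monolingual corpus (including leaving it unsplit), choose the one whose parts have the highest geometric mean frequency. This is a minimal sketch, assuming a `counts` dict of lowercase corpus frequencies, a minimum part length of 3, and the German filler letters "s" and "es" between parts; these parameters and the function name are illustrative choices, not necessarily the paper's exact settings.

```python
import math

def best_split(word, counts, min_len=3, fillers=("", "s", "es")):
    """Return the split of `word` whose parts have the highest geometric mean frequency.

    The unsplit word is itself a candidate, so a word is only split when its
    parts are, on average, more frequent than the whole word.
    """
    word = word.lower()

    def splits(start):
        # yield every way to cover word[start:] with known parts of length >= min_len,
        # each part optionally followed by a filler letter sequence
        if start == len(word):
            yield []
            return
        for end in range(start + min_len, len(word) + 1):
            part = word[start:end]
            if counts.get(part, 0) <= 0:
                continue
            for filler in fillers:
                nxt = end + len(filler)
                if word[end:nxt] != filler:
                    continue
                if filler and nxt == len(word):
                    continue  # fillers may only join two parts, not end the word
                for rest in splits(nxt):
                    yield [part] + rest

    best, best_score = [word], float(counts.get(word, 0))
    for parts in splits(0):
        # geometric mean of part frequencies, computed in log space
        score = math.exp(sum(math.log(counts[p]) for p in parts) / len(parts))
        if score > best_score:
            best, best_score = parts, score
    return best
```

With hypothetical counts `{"arbeit": 100, "amt": 80}`, `best_split("Arbeitsamt", ...)` yields `["arbeit", "amt"]`, using "s" as the filler between the parts.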

Language and Technology Conference | 2006

Improved Statistical Machine Translation Using Paraphrases

Chris Callison-Burch; Philipp Koehn; Miles Osborne

Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
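One way to obtain paraphrases from parallel data is pivoting: a phrase e1 is translated into pivot-language phrases f, which are translated back, giving p(e2 | e1) = Σ_f p(f | e1) p(e2 | f). A minimal sketch, assuming both phrase translation tables are nested dicts of probabilities; the function name and the toy values are hypothetical.

```python
from collections import defaultdict

def paraphrase_probs(phrase, p_f_given_e, p_e_given_f):
    """Paraphrase probabilities p(e2 | e1) obtained by pivoting through foreign phrases.

    p_f_given_e: dict e1 -> {f: p(f | e1)}
    p_e_given_f: dict f -> {e2: p(e2 | f)}
    """
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(phrase, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != phrase:  # a phrase is not counted as its own paraphrase
                scores[e2] += p_f * p_e2
    return dict(scores)
```

Summing over all pivot phrases means a candidate reached through several different translations accumulates probability from each of them.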

Meeting of the Association for Computational Linguistics | 2002

Learning a Translation Lexicon from Monolingual Corpora

Philipp Koehn; Kevin Knight

This paper presents work on the task of constructing a word-level translation lexicon purely from unrelated monolingual corpora. We combine various clues such as cognates, similar context, preservation of word similarity, and word frequency. Experimental results for the construction of a German-English noun lexicon are reported. A noun translation accuracy of 39%, scored against a parallel test corpus, was achieved.
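One of the clues listed, cognates, is often scored by spelling similarity. As an illustration, assuming the longest common subsequence ratio (LCSR) as the similarity measure (a common choice in this area, not necessarily the one used in the paper), a candidate word pair can be scored like this; the function name is hypothetical.

```python
def lcsr(a, b):
    """Longest common subsequence ratio: LCS length divided by the longer word's length."""
    if not a or not b:
        return 0.0
    # dynamic programming over rows: prev[j] = LCS length of the prefix of a seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))
```

For example, "friend" and "freund" share the subsequence "frend", so their LCSR is 5/6.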

Workshop on Statistical Machine Translation | 2006

Manual and Automatic Evaluation of Machine Translation between European Languages

Philipp Koehn; Christof Monz

We evaluated the performance of machine translation systems that participated in a shared task on six European language pairs: translating French, German, and Spanish texts to English and back. Evaluation was done automatically using the Bleu score and manually on fluency and adequacy.
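For reference, a BLEU score combines clipped n-gram precisions (usually up to 4-grams) with a brevity penalty for output shorter than the reference. The sketch below computes corpus-level BLEU with one tokenized reference per sentence, uniform n-gram weights, and no smoothing; it is a simplified illustration under those assumptions, not the scoring script used in the shared task.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence (uniform weights, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # clipped ("modified") matches: an n-gram counts at most as often as in the reference
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # brevity penalty exp(1 - r/c) applies when the output is not longer than the reference
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Accumulating matches and totals over the whole corpus before taking the geometric mean is what distinguishes corpus-level BLEU from averaging per-sentence scores.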

Workshop on Statistical Machine Translation | 2014

Findings of the 2014 Workshop on Statistical Machine Translation

Ondrej Bojar; Christian Buck; Christian Federmann; Barry Haddow; Philipp Koehn; Johannes Leveling; Christof Monz; Pavel Pecina; Matt Post; Herve Saint-Amand; Radu Soricut; Lucia Specia; Aleš Tamchyna

This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams submitting 57 entries.
