Gregor Leusch
RWTH Aachen University
Publications
Featured research published by Gregor Leusch.
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Gregor Leusch; Rafael E. Banchs; Nicola Bertoldi; Daniel Déchelotte; Marcello Federico; Muntsin Kolss; Young-Suk Lee; José B. Mariño; Matthias Paulik; Salim Roukos; Holger Schwenk; Hermann Ney
This paper describes an approach for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The consensus translation is computed by weighted majority voting on a confusion network, similarly to the well-established ROVER approach of Fiscus for combining speech recognition hypotheses. To create the confusion network, pairwise word alignments of the original MT hypotheses are learned using an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole corpus of automatic translations rather than a single sentence is taken into account in order to achieve high alignment quality. The confusion network is rescored with a special language model, and the consensus translation is extracted as the best path. The proposed system combination approach was evaluated in the framework of the TC-STAR speech translation project. Up to six state-of-the-art statistical phrase-based translation systems from different project partners were combined in the experiments. Significant improvements in translation quality from Spanish to English and from English to Spanish in comparison with the best of the individual MT systems were achieved under official evaluation conditions.
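The voting step in the abstract above can be illustrated with a minimal sketch. It assumes the confusion network has already been built, so the alignment with reordering and the language-model rescoring described in the paper are not shown. The names `consensus`, `network`, and `weights`, as well as the example data, are hypothetical.

```python
from collections import defaultdict

EPS = ""  # empty-word entry: this system contributes no word in the slot


def consensus(network, weights):
    """Pick the word with the highest total system weight in every slot.

    network: list of slots; each slot holds one word per system (EPS = none).
    weights: one weight per system (assumed values, not taken from the paper).
    """
    output = []
    for slot in network:
        votes = defaultdict(float)
        for system_index, word in enumerate(slot):
            votes[word] += weights[system_index]
        # Highest vote wins; ties are broken alphabetically for determinism.
        best = min(votes, key=lambda w: (-votes[w], w))
        if best != EPS:
            output.append(best)
    return output


# Three hypothetical systems, already aligned into four slots.
network = [
    ["the", "the", "a"],
    ["cat", "cat", "dog"],
    ["sat", EPS, "sat"],
    [EPS, "down", EPS],
]
weights = [0.5, 0.3, 0.2]
result = consensus(network, weights)
print(" ".join(result))
```

With these assumed weights, the word "down" is dropped because the empty entry collects more weight in the last slot.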
Workshop on Statistical Machine Translation | 2009
Gregor Leusch; Hermann Ney
RWTH participated in the System Combination task of the Fourth Workshop on Statistical Machine Translation (WMT 2009). Hypotheses from 9 German→English MT systems were combined into a consensus translation. This consensus translation scored 2.1% better in BLEU and 2.3% better in TER (absolute) than the best single system. In addition, cross-lingual output from 10 French, German, and Spanish→English systems was combined into a consensus translation, which gave an improvement of 2.0% in BLEU and 3.5% in TER (absolute) over the best single system.
Workshop on Statistical Machine Translation | 2007
David Vilar; Gregor Leusch; Hermann Ney; Rafael E. Banchs
We introduce a novel evaluation scheme for the human evaluation of different machine translation systems. Our method is based on direct comparison of two sentences at a time by human judges. These binary judgments are then used to decide between all possible rankings of the systems. The advantages of this new method are the lower dependency on extensive evaluation guidelines, and a tighter focus on a typical evaluation task, namely the ranking of systems. Furthermore, we argue that machine translation evaluations should be regarded as statistical processes, both for human and automatic evaluation. We show how confidence ranges for state-of-the-art evaluation measures such as WER and TER can be computed accurately and efficiently without having to resort to Monte Carlo estimates. We give an example of our new evaluation scheme, as well as a comparison with classical automatic and human evaluation on data from a recent international evaluation campaign.
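One way to picture how binary judgments can decide between all possible rankings is the sketch below. The abstract does not state the decision criterion, so the rule used here (pick the ranking that agrees with the most judgments) is an assumption, and `best_ranking` and the example judgments are hypothetical. Enumerating every permutation is only practical for a handful of systems.

```python
from itertools import permutations


def best_ranking(systems, judgments):
    """Return the ranking that agrees with the most pairwise judgments.

    judgments: list of (preferred_system, other_system) pairs from judges.
    The agreement count is an assumed criterion, not the paper's method.
    """
    def agreement(ranking):
        position = {name: i for i, name in enumerate(ranking)}
        return sum(
            1 for preferred, other in judgments
            if position[preferred] < position[other]
        )

    # Sorting first makes the result deterministic if two rankings tie.
    return max(permutations(sorted(systems)), key=agreement)


judgments = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "C"), ("A", "C")]
ranking = best_ranking({"A", "B", "C"}, judgments)
print(ranking)
```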
Empirical Methods in Natural Language Processing | 2008
Gregor Leusch; Hermann Ney
Confusion networks are a simple representation of multiple speech recognition or translation hypotheses in a machine translation system. A typical operation on a confusion network is to find the path which minimizes or maximizes a certain evaluation metric. In this article, we show that this problem is generally NP-hard for the popular BLEU metric, as well as for smaller variants of BLEU. This also holds for more complex representations like generic word graphs. In addition, we give an efficient polynomial-time algorithm to calculate unigram BLEU on confusion networks, but show that even small generalizations of this data structure render the problem NP-hard again. Since finding the optimal solution is thus not always feasible, we introduce an approximating algorithm based on a multi-stack decoder, which finds a (not necessarily optimal) solution for n-gram BLEU in polynomial time.
Machine Translation | 2009
Gregor Leusch; Hermann Ney
We present two evaluation measures for Machine Translation (MT), which are defined as error rates extended by block moves. In contrast to TER, these measures are constrained in a way that allows for an exact calculation in polynomial time. We then investigate three methods to estimate the standard error of error rates, and compare them to bootstrap estimates. We assess the correlation of our proposed measures with human judgment using data from the National Institute of Standards and Technology (NIST) 2008 MetricsMATR workshop.
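The bootstrap baseline mentioned in the abstract above can be sketched as follows. This is a generic bootstrap over sentences, not the paper's three estimation methods, and it assumes per-sentence error counts and reference lengths are already available. The function names, the number of replicas, and the example counts are hypothetical.

```python
import random
import statistics


def error_rate(counts):
    """Corpus-level error rate: total errors divided by total reference words."""
    errors = sum(e for e, _ in counts)
    reference_words = sum(n for _, n in counts)
    return errors / reference_words


def bootstrap_standard_error(counts, replicas=1000, seed=0):
    """Estimate the standard error by resampling sentences with replacement.

    counts: list of (errors, reference_length) pairs, one per sentence.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    rates = []
    for _ in range(replicas):
        sample = [rng.choice(counts) for _ in counts]
        rates.append(error_rate(sample))
    return statistics.stdev(rates)


# Hypothetical per-sentence counts: (edit errors, reference length).
counts = [(2, 10), (0, 8), (5, 12), (1, 9)]
rate = error_rate(counts)
standard_error = bootstrap_standard_error(counts)
print(f"error rate {rate:.3f}, bootstrap standard error {standard_error:.3f}")
```

Resampling whole sentences rather than individual words keeps the dependence between errors within a sentence intact, which is why the counts are stored per sentence.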
Language Resources and Evaluation | 2000
Sonja Nießen; Franz Josef Och; Gregor Leusch; Hermann Ney
Conference of the European Chapter of the Association for Computational Linguistics | 2006
Gregor Leusch; Nicola Ueffing; Hermann Ney
IWSLT | 2005
Gregor Leusch; Oliver Bender; Hermann Ney
Workshop on Statistical Machine Translation | 2010
Matthias Huck; Joern Wuebker; Christoph Schmidt; Markus Freitag; Stephan Peitz; Daniel Stein; Arnaud Dagnelies; Saab Mansour; Gregor Leusch; Hermann Ney
Meeting of the Association for Computational Linguistics | 2013
Ahmed El Kholy; Nizar Habash; Gregor Leusch; Hassan Sawaf