Eunah Cho
Karlsruhe Institute of Technology
Publications
Featured research published by Eunah Cho.
Workshop on Statistical Machine Translation | 2014
Markus Freitag; Stephan Peitz; Joern Wuebker; Hermann Ney; Matthias Huck; Rico Sennrich; Nadir Durrani; Maria Nadejde; Philip Williams; Philipp Koehn; Teresa Herrmann; Eunah Cho; Alex Waibel
This paper describes one of the collaborative efforts within EU-BRIDGE to further advance the state of the art in machine translation between two European language pairs, German→English and English→German. Three research institutes involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the shared translation task of the evaluation campaign at the ACL 2014 Eighth Workshop on Statistical Machine Translation (WMT 2014). We combined up to nine different machine translation engines via system combination. RWTH Aachen University, the University of Edinburgh, and Karlsruhe Institute of Technology developed several individual systems which serve as system combination input. We devoted special attention to building syntax-based systems and combining them with the phrase-based ones. The joint setups yield empirical gains of up to 1.6 points in BLEU and 1.0 points in TER on the WMT newstest2013 test set compared to the best single systems.
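The paper's combination scheme itself is more elaborate than can be shown here; purely as an illustration of the consensus idea behind system combination, the sketch below picks, from several systems' outputs, the hypothesis that agrees most with the others under a smoothed sentence-level BLEU. The toy outputs and all names are illustrative, not from the paper.

```python
from collections import Counter
import math

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    # Smoothed sentence-level BLEU, used here as a similarity measure.
    hyp_t, ref_t = hyp.split(), ref.split()
    if not hyp_t or not ref_t:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp_t, n), ngram_counts(ref_t, n)
        overlap = sum((h & r).values())
        log_prec += math.log((overlap + 1) / (max(sum(h.values()), 1) + 1))
    brevity = min(1.0, math.exp(1 - len(ref_t) / len(hyp_t)))
    return brevity * math.exp(log_prec / max_n)

def consensus_index(system_outputs):
    # MBR-style consensus: the hypothesis most similar to all others wins.
    def agreement(i):
        return sum(sentence_bleu(system_outputs[i], system_outputs[j])
                   for j in range(len(system_outputs)) if j != i)
    return max(range(len(system_outputs)), key=agreement)

outputs = ["the committee approved the proposal",
           "the committee has approved the proposal",
           "committee approve proposal"]
print(outputs[consensus_index(outputs)])
```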
Conference of the International Speech Communication Association | 2016
Jan Niehues; Thai Son Nguyen; Eunah Cho; Thanh-Le Ha; Kevin Kilgour; Markus Müller; Matthias Sperber; Sebastian Stüker; Alex Waibel
Latency is one of the main challenges in the task of simultaneous spoken language translation. While significant improvements in recent years have led to high-quality automatic translations, their usefulness in real-time settings is still severely limited due to the large delay between the input speech and the delivered translation. In this paper, we present a novel scheme which drastically reduces the latency of a large-scale speech translation system. Within this scheme, the transcribed text and its translation can be updated when more context is available, even after they have been presented to the user. This allows us to display an initial transcript and its translation with very low latency; if necessary, both are later updated to better, more accurate versions until eventually the final versions are displayed. Using this framework, we are able to reduce the latency of the source-language transcript by half. For the translation, an average delay of 3.3 s was achieved, which is more than twice as fast as our initial system.
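The paper's stability logic is more involved; as a minimal sketch of the display-then-update idea, assume the recognizer emits a growing sequence of full hypotheses. Words that consecutive hypotheses agree on are committed, while the unstable suffix is shown immediately but may still be revised. The update stream below is invented for illustration.

```python
def stable_prefix_len(prev_words, curr_words):
    # Words that consecutive hypotheses agree on are unlikely to change again.
    n = 0
    for a, b in zip(prev_words, curr_words):
        if a != b:
            break
        n += 1
    return n

# Hypothetical stream of incrementally updated ASR hypotheses.
updates = ["hello", "hello word", "hello world", "hello world how are you"]

prev = []
for hyp in updates:
    curr = hyp.split()
    n = stable_prefix_len(prev, curr)
    committed, tentative = curr[:n], curr[n:]
    # The tentative part is displayed right away (low latency) but may be
    # overwritten by the next update; the committed part is final.
    print(f"committed: {' '.join(committed):<16} tentative: {' '.join(tentative)}")
    prev = curr
```

Note how the early guess "word" is displayed immediately and then silently revised to "world" once more context arrives, which is exactly the trade the paper describes.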
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers | 2016
Jan Niehues; Thanh-Le Ha; Eunah Cho; Alex Waibel
Neural network language and translation models have recently shown great potential in improving the performance of phrase-based machine translation. At the same time, word representations using different word factors have been used in many state-of-the-art machine translation systems in order to support better translation quality. In this work, we combine these two ideas: by representing words in neural network language models using different factors, we were able to improve the models themselves as well as their impact on overall machine translation performance. This is especially helpful for morphologically rich languages due to their large vocabulary size. Furthermore, it is easy to add additional knowledge, such as source-side information, to the model. Using this model, we improved the translation quality of a state-of-the-art phrase-based machine translation system by 0.7 BLEU points. We performed experiments on three language pairs for the news translation task of the WMT 2016 evaluation.
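No code accompanies the abstract here; as a rough sketch of the factored input representation (in PyTorch, with invented vocabulary sizes and dimensions), each word is encoded by concatenating embeddings of its factors, e.g. surface form, POS tag, and lemma:

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Represent a word by concatenating embeddings of its factors
    (e.g. surface form, POS tag, lemma), as in factored NN language models."""
    def __init__(self, factor_vocab_sizes, factor_dims):
        super().__init__()
        self.tables = nn.ModuleList(
            nn.Embedding(v, d) for v, d in zip(factor_vocab_sizes, factor_dims))

    def forward(self, factor_ids):
        # factor_ids: (batch, n_factors) -> (batch, sum(factor_dims))
        return torch.cat([table(factor_ids[:, i])
                          for i, table in enumerate(self.tables)], dim=-1)

# Toy setup: 10k surface forms, 50 POS tags, 5k lemmas.
emb = FactoredEmbedding([10000, 50, 5000], [128, 16, 64])
ids = torch.tensor([[42, 7, 311]])   # one word as (word, pos, lemma) ids
print(emb(ids).shape)                # torch.Size([1, 208])
```

The concatenated vector then feeds the language model's hidden layers; because rare surface forms share POS and lemma embeddings with frequent ones, this is particularly useful for morphologically rich languages.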
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Thanh-Le Ha; Eunah Cho; Jan Niehues; Mohammed Mediani; Matthias Sperber; Alexandre Allauzen; Alex Waibel
We present our experiments in the scope of the news translation task of WMT 2018 in the English→German direction. The core of our systems is the encoder-decoder-based neural machine translation model using the transformer architecture, which we enhanced with a deeper architecture. By using techniques to limit memory consumption, we were able to train models that are four times larger on one GPU and to improve performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus, improving the effectiveness of the corpus by a further 0.5 BLEU points.
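The abstract does not say which memory-limiting techniques were used; one standard option with the stated effect is gradient accumulation, sketched below in PyTorch as an illustrative example rather than the authors' implementation (gradient checkpointing via torch.utils.checkpoint is another common choice).

```python
import torch

def train_step(model, optimizer, loss_fn, micro_batches):
    """One optimizer step accumulated over several micro-batches.
    Only one micro-batch's activations live in memory at a time, so a
    larger model fits on a single GPU for the same effective batch size."""
    optimizer.zero_grad()
    for src, tgt in micro_batches:
        loss = loss_fn(model(src), tgt) / len(micro_batches)  # average over micro-batches
        loss.backward()  # gradients accumulate into the parameters' .grad fields
    optimizer.step()
```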
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Eunah Cho; Jan Niehues; Alex Waibel
Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems make a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluency probability for each word; the SMT decoder then skips potentially disfluent words based on these probabilities. Using the suggested scheme, the translation scores of both the manual transcripts and the ASR output improve by around 0.35 BLEU points compared to the CRF hard-decision system.
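The decoder integration itself is not reproduced here; assuming per-word disfluency probabilities from the CRF are already available, the sketch below shows the kind of soft-decision input such a scheme hands to the decoder: each word carries a keep score and a skip score instead of being hard-deleted at a threshold.

```python
import math

def soft_disfluency_input(words, disfluency_probs):
    """Attach per-word keep/skip log-scores that a decoder can weigh
    against its translation and language models (simplified illustration)."""
    arcs = []
    for w, p in zip(words, disfluency_probs):
        p = min(max(p, 1e-6), 1 - 1e-6)        # clamp to avoid log(0)
        arcs.append({"word": w,
                     "keep": math.log(1 - p),  # score for translating the word
                     "skip": math.log(p)})     # score for skipping it
    return arcs

words = ["I", "uh", "I", "mean", "the", "the", "meeting"]
probs = [0.05, 0.97, 0.60, 0.55, 0.10, 0.80, 0.02]
for arc in soft_disfluency_input(words, probs):
    print(f"{arc['word']:>8}  keep={arc['keep']:6.2f}  skip={arc['skip']:6.2f}")
```

Borderline words like the second "I" (p = 0.6) illustrate why a hard 0.5 threshold is risky: the soft scheme lets the translation model overrule a weak disfluency signal.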
Workshop on Statistical Machine Translation | 2015
Thanh-Le Ha; Quoc-Khanh Do; Eunah Cho; Jan Niehues; Alexandre Allauzen; François Yvon; Alex Waibel
This paper presents the joint submission of KIT and LIMSI to the English→German translation task of WMT 2015. In this year's submission, we integrated a neural network-based translation model into a phrase-based translation system by rescoring the n-best lists. Since computational complexity is one of the main issues for continuous-space models, we compared two techniques to reduce the computation cost: models using a structured output layer and models trained with noise-contrastive estimation. Furthermore, we evaluated a new method to obtain the best log-linear combination in the rescoring phase. Using these techniques, we were able to improve the BLEU score of the baseline phrase-based system by 1.4 BLEU points.
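As an illustration of the rescoring step (the feature values and weights below are invented, and the paper's method for optimizing the log-linear weights is not shown), each n-best hypothesis carries a feature vector and the weighted sum decides the winner:

```python
import numpy as np

def rescore_nbest(nbest, weights):
    """Log-linear rescoring: each hypothesis carries a feature vector
    (e.g. phrase-based model score, neural model score, length penalty);
    the hypothesis maximizing the weighted feature sum wins."""
    scores = [np.dot(weights, feats) for _, feats in nbest]
    return nbest[int(np.argmax(scores))][0]

# Toy 3-best list: (hypothesis, [smt_score, neural_score, length_penalty])
nbest = [
    ("das Treffen wurde abgesagt",   [-10.2, -42.0, -4.0]),
    ("die Sitzung wurde abgesagt",   [-10.8, -39.5, -4.0]),
    ("das Treffen wurde gestrichen", [-11.0, -44.1, -4.0]),
]
weights = np.array([1.0, 0.3, 0.5])
print(rescore_nbest(nbest, weights))   # -> "die Sitzung wurde abgesagt"
```

Note how the neural score flips the ranking: the phrase-based system's top hypothesis loses to the second-best one once the neural model's preference is weighted in, which is exactly what rescoring is for.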
Meeting of the Association for Computational Linguistics | 2017
Jan Niehues; Eunah Cho; Thanh-Le Ha; Alex Waibel
In this paper, we offer an in-depth analysis of the modeling and search performance of neural machine translation (NMT) systems. We address the question of whether a more complex search algorithm is necessary, and whether more complex models, which might only be applicable during rescoring, are promising. By separating the search space and the modeling using n-best list reranking, we analyze the influence of both parts of an NMT system independently. By comparing differently performing NMT systems, we show that the better translation is already in the search space of the lower-performing systems. These results indicate that the current search algorithms are sufficient for NMT systems. Furthermore, we show that even a relatively small n-best list of 50 hypotheses already contains notably better translations.
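A minimal sketch of the oracle analysis behind this claim: pick the hypothesis in an n-best list that is closest to a reference, then compare that oracle across systems. A simple unigram F1 stands in for the paper's metric here, and the data is invented for illustration.

```python
def unigram_f1(hyp, ref):
    # Simple stand-in similarity; the paper's analysis uses BLEU.
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    p, q = overlap / len(h), overlap / len(r)
    return 2 * p * q / (p + q) if p + q else 0.0

def oracle_hypothesis(nbest, reference, metric=unigram_f1):
    """Best hypothesis the n-best list contains with respect to the
    reference. If a weaker system's oracle matches a stronger system's
    1-best, the better translation was already in its search space."""
    return max(nbest, key=lambda hyp: metric(hyp, reference))

nbest = ["the cat sat on mat",
         "a cat sat on the mat",
         "the cat is on the mat"]
print(oracle_hypothesis(nbest, "the cat sat on the mat"))
```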
Workshop on Statistical Machine Translation | 2011
Teresa Herrmann; Mohammed Mediani; Eunah Cho; Thanh-Le Ha; Jan Niehues; Isabel Slawik; Yuqi Zhang; Alex Waibel
IWSLT | 2012
Eunah Cho; Jan Niehues; Alex Waibel
IWSLT | 2011
Mohammed Mediani; Eunah Cho; Jan Niehues; Teresa Herrmann; Alex Waibel