Thanh-Le Ha
Karlsruhe Institute of Technology
Publications
Featured research published by Thanh-Le Ha.
Conference of the International Speech Communication Association | 2016
Jan Niehues; Thai Son Nguyen; Eunah Cho; Thanh-Le Ha; Kevin Kilgour; Markus Müller; Matthias Sperber; Sebastian Stüker; Alex Waibel
Latency is one of the main challenges in the task of simultaneous spoken language translation. While significant improvements in recent years have led to high-quality automatic translations, their usefulness in real-time settings is still severely limited due to the large delay between the input speech and the delivered translation. In this paper, we present a novel scheme which drastically reduces the latency of a large-scale speech translation system. Within this scheme, the transcribed text and its translation can be updated when more context is available, even after they have been presented to the user. This allows us to display an initial transcript and its translation to the user with very low latency. If necessary, both transcript and translation can later be updated to better, more accurate versions until eventually the final versions are displayed. Using this framework, we are able to cut the latency of the source-language transcript in half. For the translation, an average delay of 3.3s was achieved, which is more than twice as fast as our initial system.
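A minimal sketch of the display-then-update scheme described in this abstract, assuming hypothetical stand-ins (transcribe_prefix, translate) for the real ASR and MT components; it only illustrates the idea of showing an early hypothesis and revising it as more context arrives.

```python
# Toy simulation of low-latency display with later updates.
# `transcribe_prefix` and `translate` are hypothetical placeholders,
# not components of the actual KIT system.

def transcribe_prefix(audio_chunks):
    """Hypothetical incremental ASR: after each new chunk, yields a
    (possibly revised) transcript of everything heard so far."""
    heard = []
    for chunk in audio_chunks:
        heard.append(chunk)
        yield " ".join(heard)

def translate(text):
    """Hypothetical MT stub; a real system calls a full translation decoder."""
    return text.upper()

def low_latency_display(audio_chunks):
    """Show an initial transcript/translation immediately, then update both
    in place whenever more source context becomes available."""
    view = {"transcript": "", "translation": ""}
    for partial in transcribe_prefix(audio_chunks):
        view["transcript"] = partial              # earlier words may be revised
        view["translation"] = translate(partial)  # retranslate with more context
        print(view)                               # refresh the user's display

low_latency_display(["guten", "morgen", "alle", "zusammen"])
```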
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers | 2016
Jan Niehues; Thanh-Le Ha; Eunah Cho; Alex Waibel
Neural network language and translation models have recently shown great potential in improving the performance of phrase-based machine translation. At the same time, word representations using different word factors have been used in many state-of-the-art machine translation systems in order to support better translation quality. In this work, we combine these two ideas: by representing words in neural network language models using different factors, we were able to improve the models themselves as well as their impact on the overall machine translation performance. This is especially helpful for morphologically rich languages due to their large vocabulary size. Furthermore, it is easy to add additional knowledge, such as source-side information, to the model. Using this model, we improved the translation quality of a state-of-the-art phrase-based machine translation system by 0.7 BLEU points. We performed experiments on three language pairs for the news translation task of the WMT 2016 evaluation.
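As an illustration of the factored word representations discussed above, here is a minimal PyTorch sketch in which each word is represented by the concatenation of embeddings of its factors (e.g. surface form, POS tag, lemma). The vocabulary sizes and embedding dimensions are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Represents a word by concatenating the embeddings of its factors."""
    def __init__(self, factor_vocab_sizes, factor_dims):
        super().__init__()
        self.tables = nn.ModuleList(
            nn.Embedding(v, d) for v, d in zip(factor_vocab_sizes, factor_dims)
        )

    def forward(self, factor_ids):
        # factor_ids: (batch, n_factors) -- one id per factor per word
        parts = [emb(factor_ids[:, i]) for i, emb in enumerate(self.tables)]
        return torch.cat(parts, dim=-1)   # concatenated word representation

# Assumed toy setup: surface (10k), POS (50), and lemma (8k) factors.
emb = FactoredEmbedding([10000, 50, 8000], [64, 8, 32])
ids = torch.tensor([[42, 7, 99]])         # one word's factor ids
print(emb(ids).shape)                     # torch.Size([1, 104])
```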
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Thanh-Le Ha; Eunah Cho; Jan Niehues; Mohammed Mediani; Matthias Sperber; Alexandre Allauzen; Alex Waibel
We present our experiments in the scope of the news translation task at WMT 2018 for the English→German direction. The core of our systems is an encoder-decoder-based neural machine translation model using the Transformer architecture. We enhanced the model with a deeper architecture. By using techniques to limit the memory consumption, we were able to train models that are 4 times larger on one GPU and improve the performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus. This improved the effectiveness of the corpus by 0.5 BLEU points.
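The abstract does not specify which memory-limiting techniques were used; gradient checkpointing is one standard way to fit deeper models on a single GPU, and the PyTorch sketch below shows it purely as an assumed illustration, not as the paper's confirmed method.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedStack(nn.Module):
    """A deep encoder stack whose layer activations are recomputed during
    the backward pass instead of being stored, trading extra compute for a
    much smaller memory footprint (letting larger models fit on one GPU)."""
    def __init__(self, d_model=512, n_layers=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=8) for _ in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            # Recompute this layer's activations in backward instead of caching.
            x = checkpoint(layer, x, use_reentrant=False)
        return x

x = torch.randn(10, 2, 512, requires_grad=True)   # (seq_len, batch, d_model)
out = CheckpointedStack()(x)
out.sum().backward()                               # activations recomputed here
```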
Workshop on Statistical Machine Translation | 2014
Thanh-Le Ha; Quoc-Khanh Do; Eunah Cho; Jan Niehues; Alexandre Allauzen; François Yvon; Alex Waibel
This paper presents the joint submission of KIT and LIMSI to the English-to-German translation task of WMT 2015. In this year's submission, we integrated a neural network-based translation model into a phrase-based translation system by rescoring the n-best lists. Since computational complexity is one of the main issues for continuous-space models, we compared two techniques to reduce the computation cost: models using a structured output layer and models trained with noise contrastive estimation. Furthermore, we evaluated a new method to obtain the best log-linear combination in the rescoring phase. Using these techniques, we were able to improve the BLEU score of the baseline phrase-based system by 1.4 BLEU points.
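A minimal sketch of the rescoring step described above: each n-best hypothesis carries log-scores from several models (including a neural translation model), and a log-linear combination with tuned weights selects the best one. The feature names, scores, and weights below are made up for illustration; in practice the weights are tuned on held-out data.

```python
# Log-linear rescoring of an n-best list (illustrative data only).

def rescore(nbest, weights):
    """nbest: list of (hypothesis, {feature: log_score}) pairs.
    Returns the hypothesis maximizing the weighted sum of log scores."""
    def combined(entry):
        _, feats = entry
        return sum(weights[name] * score for name, score in feats.items())
    return max(nbest, key=combined)[0]

nbest = [
    ("das ist ein test",  {"phrase_tm": -4.1, "lm": -7.2, "neural_tm": -3.0}),
    ("dies ist ein test", {"phrase_tm": -4.5, "lm": -6.8, "neural_tm": -2.4}),
]
weights = {"phrase_tm": 1.0, "lm": 0.6, "neural_tm": 0.8}
print(rescore(nbest, weights))
```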
Meeting of the Association for Computational Linguistics | 2017
Jan Niehues; Eunah Cho; Thanh-Le Ha; Alex Waibel
In this paper, we offer an in-depth analysis of modeling and search performance in neural machine translation (NMT). We address the question of whether a more complex search algorithm is necessary. Furthermore, we investigate whether more complex models, which might only be applicable during rescoring, are promising. By separating the search space from the modeling using n-best list reranking, we analyze the influence of both parts of an NMT system independently. By comparing differently performing NMT systems, we show that the better translation is already in the search space of the weaker translation systems. These results indicate that the current search algorithms are sufficient for NMT systems. Furthermore, we show that even a relatively small n-best list of 50 hypotheses already contains notably better translations.
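A sketch of the oracle-style analysis behind this result: every hypothesis in an n-best list is scored against the reference to see whether a better translation already exists in the search space, independent of the model's own ranking. A trivial unigram F1 stands in for a proper MT metric here only to keep the example dependency-free; the actual analysis uses standard MT evaluation metrics.

```python
from collections import Counter

def unigram_f1(hyp, ref):
    """Crude sentence-level overlap score (a stand-in for BLEU)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

def oracle(nbest, ref):
    """Best reachable hypothesis in the list -- an upper bound showing what
    the search space already contains, regardless of model scores."""
    return max(nbest, key=lambda hyp: unigram_f1(hyp, ref))

nbest = ["this is a test", "that was a test", "this is the test"]
print(oracle(nbest, "this is a test"))
```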
Workshop on Statistical Machine Translation | 2011
Teresa Herrmann; Mohammed Mediani; Eunah Cho; Thanh-Le Ha; Jan Niehues; Isabel Slawik; Yuqi Zhang; Alex Waibel
International Conference on Computational Linguistics | 2016
Jan Niehues; Eunah Cho; Thanh-Le Ha; Alex Waibel
arXiv: Computation and Language | 2016
Thanh-Le Ha; Jan Niehues; Alex Waibel
IWSLT | 2012
Mohammed Mediani; Yuqi Zhang; Thanh-Le Ha; Jan Niehues; Eunah Cho; Teresa Herrmann; Rainer Kärgel; Alex Waibel
Archive | 2013
Markus Freitag; Stephan Peitz; Joern Wuebker; Hermann Ney; Nadir Durrani; Matthias Huck; Philipp Koehn; Thanh-Le Ha; Jan Niehues; Mohammed Mediani; Teresa Herrmann; Alex Waibel; Nicola Bertoldi; Mauro Cettolo; Marcello Federico