Publication


Featured research published by Daniel Stein.


International Conference on Natural Language Processing | 2006

Statistical machine translation of German compound words

Maja Popović; Daniel Stein; Hermann Ney

German compound words pose special problems to statistical machine translation systems: the occurrence of each of the components in the training data is not sufficient for successful translation. Even if the compound itself has been seen during training, the system may not be capable of translating it properly into two or more words. If German is the target language, the system might generate only separated components or may not be capable of choosing the correct compound. In this work, we investigate and compare different strategies for the treatment of German compound words in statistical machine translation systems. For translation from German, we compare linguistic-based and corpus-based compound splitting. For translation into German, we investigate splitting and rejoining German compounds, as well as joining English potential components. Additionally, we investigate word alignments enhanced with knowledge about the splitting points of German compounds. The translation quality is consistently improved by all methods for both translation directions.
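As an illustration of what corpus-based compound splitting can look like, here is a minimal sketch of a common frequency-based heuristic: pick the split whose parts have the highest geometric mean of corpus frequencies. It is not taken from the paper and may differ from the method evaluated there. The function name `split_compound`, the toy counts, the restriction to binary splits, and the small set of linking elements are all assumptions made for the example.

```python
from math import prod

# Hypothetical toy word counts; a real system would collect these
# from the German side of the training corpus.
TOY_COUNTS = {
    "aktionsplan": 2,
    "aktion": 50,
    "plan": 100,
    "akt": 10,
}


def split_compound(word, counts, min_len=3, fillers=("", "s", "es")):
    """Return the binary split of `word` with the highest geometric mean
    of part frequencies, or [word] if no split beats the unsplit word.

    `fillers` lists linking elements that may be dropped from the end of
    the first part (a simplification of German linking morphology).
    """
    word = word.lower()

    def score(parts):
        # Geometric mean of the corpus counts; unseen parts count as 0.
        return prod(counts.get(p, 0) for p in parts) ** (1.0 / len(parts))

    best_parts = [word]
    best_score = score(best_parts)
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        for filler in fillers:
            if filler and not head.endswith(filler):
                continue
            stem = head[: len(head) - len(filler)] if filler else head
            if len(stem) < min_len:
                continue
            candidate = [stem, tail]
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_parts, best_score = candidate, candidate_score
    return best_parts


print(split_compound("Aktionsplan", TOY_COUNTS))
```

With the toy counts above, "Aktionsplan" is split into "aktion" and "plan" because the linking "s" is dropped and both parts are far more frequent than the full compound. A word with no better-scoring split is returned unchanged.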


Gesture-Based Human-Computer Interaction and Simulation | 2009

Enhancing a Sign Language Translation System with Vision-Based Features

Philippe Dreuw; Daniel Stein; Hermann Ney

In automatic sign language translation, one of the main problems is the usage of spatial information in sign language and its proper representation and translation, e.g. the handling of spatial reference points in the signing space. Such locations are encoded at static points in signing space as spatial references for motion events. We present a new approach starting from a large vocabulary speech recognition system which is able to recognize sentences of continuous sign language speaker independently. The manual features obtained from the tracking are passed to the statistical machine translation system to improve its accuracy. On a publicly available benchmark database, we achieve a competitive recognition performance and can similarly improve the translation performance by integrating the tracking features.


Machine Translation | 2012

Analysis, preparation, and optimization of statistical sign language machine translation

Daniel Stein; Christoph Schmidt; Hermann Ney

Sign languages represent an interesting niche for statistical machine translation that is typically hampered by the scarceness of suitable data, and most papers in this area apply only a few, well-known techniques that are not adapted to small-sized corpora. In this article, we analyze existing data collections and emphasize their quality and usability for statistical machine translation. We also offer findings in the proper preprocessing of a sign language corpus, by introducing sentence end markers, splitting compound words and handling parallel communication channels. Then, we focus on optimization procedures that are tailored to scarce resources, such as scaling factor optimization, alignment optimization and system combination. All methods are evaluated on two of the largest sign language corpora available.


Machine Translation | 2012

Jane: an advanced freely available hierarchical machine translation toolkit

David Vilar; Daniel Stein; Matthias Huck; Hermann Ney

In this article we will describe the design and implementation of Jane, an efficient hierarchical phrase-based (HPB) toolkit developed at RWTH Aachen University. The system has been used by RWTH at several international evaluation campaigns, including the WMT and NIST evaluations, and is now freely available for non-commercial application. We will go through the main features of Jane, which include, among others, support for different search strategies, different language model formats, support for syntax-based enhancements to the HPB machine translation paradigm, string-to-dependency translation, extended lexicon models, different methods for minimum-error-rate training and distributed operation on a computer cluster. Special attention has been paid to the efficiency of the decoder, clean code and quality assurance through unit and regression testing. Results on current machine translation tasks are reported, which show that the system is able to obtain state-of-the-art performance.


The Prague Bulletin of Mathematical Linguistics | 2011

A Guide to Jane, an Open Source Hierarchical Translation Toolkit

Daniel Stein; David Vilar; Stephan Peitz; Markus Freitag; Matthias Huck; Hermann Ney

Jane is RWTH's hierarchical phrase-based translation toolkit. It includes tools for phrase extraction, translation and scaling factor optimization, with efficient and documented programs of which large parts can be parallelized. The decoder features syntactic enhancements, reorderings, triplet models, discriminative word lexica, and support for a variety of language model formats. In this article, we will review the main features of Jane and explain the overall architecture. We will also indicate where and how new models can be included.


Workshop on Statistical Machine Translation | 2009

The RWTH Machine Translation System for WMT 2009

Maja Popović; David Vilar; Daniel Stein; Hermann Ney

RWTH participated in the shared translation task of the Fourth Workshop on Statistical Machine Translation (WMT 2009) with the German-English, French-English and Spanish-English language pairs in each translation direction. The submissions were generated using a phrase-based and a hierarchical statistical machine translation system with appropriate morpho-syntactic enhancements. POS-based reorderings of the source language for the phrase-based systems and splitting of German compounds for both systems were applied. For some tasks, a system combination was used to generate a final hypothesis. An additional English hypothesis was produced by combining all three final systems for translation into English.


Proceedings of the Sixth Workshop on Vision and Language | 2017

Human Evaluation of Multi-modal Neural Machine Translation: A Case-Study on E-Commerce Listing Titles.

Iacer Calixto; Daniel Stein; Sheila Castilho; Andy Way

In this paper, we study how humans perceive the use of images as an additional knowledge source to machine-translate user-generated product listings in an e-commerce company. We conduct a human evaluation where we assess how a multi-modal neural machine translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attention-based NMT and a phrase-based statistical machine translation (PBSMT) model. We evaluate translations obtained with different systems and also discuss the data set of user-generated product listings, which in our case comprises both product listings and associated images. We found that humans preferred translations obtained with a PBSMT system to both text-only and multi-modal NMT over 56% of the time. Nonetheless, human evaluators ranked translations from a multi-modal NMT model as better than those of a text-only NMT over 88% of the time, which suggests that images do help NMT in this use-case.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Gradient-free decoding parameter optimization on automatic speech recognition

Thach Le Nguyen; Daniel Stein; Michael Stadtschnitzer

Finding the optimal decoding parameters in speech recognition is often done manually in a rather tedious manner, although automatic gradient-free optimization techniques have been shown to perform quite well for this task. While there have been recent scientific contributions in this field, no thorough comparison of possible methods, in terms of convergence speed and performance, has been undertaken. In this paper, we conduct a series of experiments with three decoding paradigms and four different optimization techniques found in recent literature, both on unconstrained and time-constrained decoder optimization. We offer our findings on the German Difficult Speech Corpus and on the LinkedTV test sets.


Workshop on Statistical Machine Translation | 2010

Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models

David Vilar; Daniel Stein; Matthias Huck; Hermann Ney


Technology and Disability | 2008

Spoken language processing techniques for sign language recognition and translation

Philippe Dreuw; Daniel Stein; Thomas Deselaers; David Rybach; Morteza Zahedi; Jan Bungeroth; Hermann Ney

Collaboration


Dive into Daniel Stein's collaborations.

Top Co-Authors

Hermann Ney

RWTH Aachen University

David Vilar

RWTH Aachen University

Andy Way

Dublin City University

E.A. Ormel

Radboud University Nijmegen

Onno Crasborn

Radboud University Nijmegen
