Sebastian Stüker
Karlsruhe Institute of Technology
Publications
Featured research published by Sebastian Stüker.
international conference on acoustics, speech, and signal processing | 2003
Sebastian Stüker; Tanja Schultz; Florian Metze; Alex Waibel
Speech recognition systems based on or aided by articulatory features, such as place and manner of articulation, have been shown to be useful under varying circumstances. Recognizers based on features better compensate for channel and noise variability. We show that it is also possible to compensate for inter-language variability using articulatory feature detectors. We come to the conclusion that articulatory features can be recognized across languages and that using detectors from many languages can improve the classification accuracy of the feature detectors on a single language. We further demonstrate how those multilingual and cross-lingual detectors can support an HMM-based recognizer and thereby significantly reduce the word error rate by up to 12.3% relative. We expect that with the use of multilingual articulatory features it is possible to support the rapid deployment of recognition systems for new target languages.
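The support of an HMM recognizer by feature detectors described above is commonly realized as a log-linear combination of score streams. The sketch below illustrates that idea only; the stream weights and scores are hypothetical toy values, not those used in the paper.

```python
def combined_score(hmm_log_prob, feature_log_probs, weights):
    """Log-linearly combine an HMM acoustic log-score with the
    log-scores of several articulatory feature detector streams.
    The weights would normally be tuned on held-out data."""
    score = hmm_log_prob
    for lp, w in zip(feature_log_probs, weights):
        score += w * lp
    return score

# Toy example: one HMM score plus two feature streams,
# e.g. place- and manner-of-articulation detectors.
s = combined_score(-12.0, [-1.5, -2.0], [0.5, 0.3])
```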
ieee automatic speech recognition and understanding workshop | 2005
M. Paulik; Sebastian Stüker; Christian Fügen; Tanja Schultz; Thomas Schaaf; Alex Waibel
Nowadays, official documents have to be made available in many languages, for example in the EU with its 20 official languages. The need for effective tools to aid the multitude of human translators in their work is therefore readily apparent. An ASR system that enables the human translator to speak his translation in an unrestricted manner, instead of typing it, constitutes such a tool. In this work we improve the recognition performance of such an ASR system on the target language of the human translator by taking advantage of either a written or a spoken source language representation. To do so, machine translation techniques are used to translate between the languages involved, and the ASR systems are then biased towards the knowledge gained. We present an iterative approach for ASR improvement and outperform our baseline system by a relative word error rate reduction of 35.8%/29.9% in the case of a written/spoken source language representation. Further, we show how multiple target languages, as provided for example by different simultaneous translators during European Parliament debates, can be incorporated into our system design to improve all involved ASR systems.
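One simple way to bias an ASR system towards an MT-derived source representation, as described above, is to rescore its n-best list with an MT score. The following is a minimal sketch under that assumption; the interpolation weight and the fallback score are hypothetical values.

```python
def rescore_nbest(nbest, mt_bias, lam=0.3):
    """Rescore ASR n-best hypotheses with a bias score derived from
    a machine translation of the source document.
    nbest:   list of (hypothesis, asr_log_prob) pairs
    mt_bias: dict mapping a hypothesis to its MT log-probability
    lam:     interpolation weight between ASR and MT scores
    Returns the hypothesis with the best combined score."""
    rescored = [(h, (1 - lam) * s + lam * mt_bias.get(h, -100.0))
                for h, s in nbest]
    return max(rescored, key=lambda x: x[1])[0]
```

In an iterative setup such as the one in the paper, the output of the improved ASR system could in turn be fed back to the MT component to refine the bias scores.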
international conference on acoustics, speech, and signal processing | 2006
Christian Fügen; Muntsin Kolss; Dietmar Bernreuther; Matthias Paulik; Sebastian Stüker; Stephan Vogel; Alex Waibel
For years, speech translation has focused on the recognition and translation of discourse in limited domains, such as hotel reservations or scheduling tasks. Only recently have research projects been started to tackle the problem of open-domain speech recognition and translation of complex tasks such as lectures and speeches. In this paper we present the ongoing work at our laboratory on open-domain speech translation of lectures and parliamentary speeches. Starting from a translation system for European Parliament plenary sessions and a lecture speech recognition system, we show how both components perform in unison on speech translation of lectures.
international conference on machine learning | 2006
Christian Fügen; Shajith Ikbal; Florian Kraft; Kenichi Kumatani; Kornel Laskowski; John W. McDonough; Mari Ostendorf; Sebastian Stüker; Matthias Wölfel
This paper describes the 2006 lecture and conference meeting speech-to-text system developed at the Interactive Systems Laboratories (ISL) for the individual head-mounted microphone (IHM), single distant microphone (SDM), and multiple distant microphone (MDM) conditions, which was evaluated in the RT-06S Rich Transcription Meeting Evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the principal differences between our current system and those submitted in previous years, namely improved acoustic and language models, cross adaptation between systems with different front-ends and phoneme sets, and the use of various automatic speech segmentation algorithms.
ieee automatic speech recognition and understanding workshop | 2003
Christian Fügen; Sebastian Stüker; Hagen Soltau; Florian Metze; Tanja Schultz
We introduce techniques for building a multilingual speech recognizer. More specifically, we present a new language model method that allows for the combination of several monolingual language models into one multilingual language model. Furthermore, we extend our techniques to the concept of grammars. All linguistic knowledge sources share one common interface to the search engine. As a consequence, new language model types can be easily integrated into our Ibis decoder. Based on a multilingual acoustic model, we compare multilingual statistical n-gram language models with multilingual grammars. Results are given in terms of recognition performance as well as resource requirements. They show that: (a) n-gram LMs can be easily combined at the meta level without major loss in performance; (b) grammars are well suited to modeling multilinguality; (c) language switches can be significantly reduced by using the introduced techniques; (d) the resource overhead for handling multiple languages in one language model is acceptable; (e) language identification can be done implicitly during decoding.
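Combining monolingual LMs at the meta level while suppressing spurious language switches, as the abstract describes, can be sketched as follows. The probabilities and the switch penalty here are hypothetical toy values; a real system would operate on full n-gram models rather than unigram tables.

```python
import math

def multilingual_logprob(word, lang, prev_lang, mono_lms,
                         switch_penalty=math.log(0.01)):
    """Meta-level combination of monolingual LMs: a word is scored
    by the LM of its language, and a penalty is applied whenever
    the language differs from that of the previous word. This is
    what makes language switches during decoding less likely."""
    lp = math.log(mono_lms[lang].get(word, 1e-6))
    if prev_lang is not None and lang != prev_lang:
        lp += switch_penalty
    return lp
```

Because the language of each scored word is known, language identification falls out of decoding implicitly, as noted in point (e) above.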
international conference on acoustics, speech, and signal processing | 2008
Sebastian Stüker
Automatic speech recognition (ASR) systems have been developed for only a very limited number of the estimated 7,000 languages in the world. In order to avoid the emergence of a digital divide between languages for which ASR systems exist and those without one, it is necessary to be able to rapidly create ASR systems for new languages in a cost-efficient way. Grapheme-based systems, which eliminate the costly need for a pronunciation dictionary, have been shown to work for a variety of languages. They are thus well suited for porting ASR systems to new languages. This paper studies the use of multilingual grapheme-based models for rapidly bootstrapping acoustic models in new languages. The cross-language performance of a standard, multilingual (ML) acoustic model on a new language is improved by introducing a new, modified version of polyphone decision tree specialization that improves the performance of the ML models by up to 15.5% relative.
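The core idea of a grapheme-based system is that each word is "pronounced" as its letter sequence, so no hand-crafted phoneme dictionary is needed. A deliberately simplified sketch, which ignores digits, punctuation, and script-specific normalization that a real system would have to handle:

```python
def grapheme_lexicon(words):
    """Build a grapheme-based pronunciation dictionary: map every
    word to the list of its lower-cased letters, which then serve
    as the acoustic modeling units instead of phonemes."""
    return {w: list(w.lower()) for w in words}

lexicon = grapheme_lexicon(["Hallo", "Welt"])
```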
ieee international conference on high performance computing data and analytics | 2011
Sebastian Stüker; Kevin Kilgour; Jan Niehues
Our laboratory has used the HP XC4000, the high performance computer of the federal state Baden-Württemberg, in order to participate in the second Quaero evaluation for automatic speech recognition (ASR) and machine translation (MT). State-of-the-art automatic speech recognition and machine translation systems use stochastic models which are trained on large amounts of training data using techniques from the field of machine learning. Using these techniques, the systems search for the most likely speech recognition or translation hypothesis, respectively.
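The search for the most likely hypothesis mentioned above is, in its simplest form, a maximum a posteriori decision over scored hypotheses. A minimal sketch, with a hypothetical language model weight:

```python
def map_decode(hypotheses, acoustic_score, lm_score, lm_weight=0.7):
    """MAP decoding sketch: return the hypothesis maximising the
    combined acoustic and (scaled) language model log-score. Real
    decoders search a lattice rather than an explicit list."""
    return max(hypotheses,
               key=lambda h: acoustic_score(h) + lm_weight * lm_score(h))
```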
international conference on acoustics, speech, and signal processing | 2014
Quoc Bao Nguyen; Jonas Gehring; Markus Müller; Sebastian Stüker; Alex Waibel
In this work, we propose a deep bottleneck feature (DBNF) architecture that is able to leverage data from multiple languages. We also show that tonal features are helpful for non-tonal languages. Evaluations are performed on a low-resource conversational telephone speech transcription task in Bengali, while additional data for DBNF training is provided in Assamese, Pashto, Tagalog, Turkish, and Vietnamese. We obtain relative reductions of up to 17.3% and 9.4% WER over monolingual GMMs and DBNFs, respectively.
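A deep bottleneck feature network consists of several wide hidden layers, a narrow bottleneck layer whose activations are extracted as features for a GMM system, and further classification layers that are discarded after training. The layer sizes below are illustrative placeholders, not the configuration used in the paper.

```python
def dbnf_layer_sizes(input_dim=43 * 11, hidden=1200, bottleneck=42,
                     n_hidden=4, n_targets=3000):
    """Layer layout of a deep bottleneck feature (DBNF) network:
    stacked input frames -> wide hidden layers -> narrow bottleneck
    -> one more hidden layer -> context-dependent state targets.
    In multilingual training, the layers up to the bottleneck can
    be shared across languages."""
    return [input_dim] + [hidden] * n_hidden + [bottleneck, hidden, n_targets]
```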
international conference on acoustics, speech, and signal processing | 2007
Sebastian Stüker; Matthias Paulik; Muntsin Kolss; Christian Fügen; Alex Waibel
In this paper we describe our work in coupling automatic speech recognition (ASR) and machine translation (MT) in a speech translation enhanced automatic speech recognition (STE-ASR) framework for transcribing and translating European parliament speeches. We demonstrate the influence of the quality of the ASR component on the MT performance, by comparing a series of WERs with the corresponding automatic translation scores. By porting an STE-ASR framework to the task at hand, we show how the word errors for transcribing English and Spanish speeches can be lowered by 3.0% and 4.8% relative, respectively.
international conference on acoustics, speech, and signal processing | 2004
Alex Waibel; Tanja Schultz; Stephan Vogel; Christian Fügen; Matthias Honal; Muntsin Kolss; Jürgen Reichert; Sebastian Stüker
Speech translation has made significant advances in recent years. We believe that we can overcome today's limits of language- and domain-portable conversational speech translation systems by relying more radically on learning approaches and by the use of multiple layers of reduction and transformation to extract the desired content in another language. Therefore, we cascade stochastic source-channel models that extract an underlying message from a corrupt observed output. The three models effectively translate: (1) speech to word lattices (automatic speech recognition, ASR); (2) ill-formed fragments of word strings into a compact well-formed sentence (Clean); (3) sentences in one language to sentences in another (machine translation, MT). We present results of our research efforts towards rapid language portability of all these components. The results on translation suggest that MT systems can be successfully constructed for any language pair by cascading multiple MT systems via English. Moreover, end-to-end performance can be improved if the interlingua language is enriched with additional linguistic information that can be derived automatically and monolingually in a data-driven fashion.
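The three-stage cascade described above (ASR, Clean, MT) can be sketched as a simple function composition; the stage implementations here are placeholders standing in for full stochastic models.

```python
def cascade(speech, asr, clean, mt):
    """Cascade the three source-channel stages: speech is decoded
    into a word lattice (asr), the lattice is compacted into a
    well-formed sentence (clean), and the sentence is translated
    into the target language (mt)."""
    lattice = asr(speech)
    sentence = clean(lattice)
    return mt(sentence)

# Toy stand-ins for the three stages, for illustration only.
result = cascade("hello world",
                 asr=lambda s: s.split(),
                 clean=lambda lat: " ".join(lat),
                 mt=lambda sent: sent.upper())
```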