Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sebastian Stüker is active.

Publication


Featured research published by Sebastian Stüker.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Multilingual articulatory features

Sebastian Stüker; Tanja Schultz; Florian Metze; Alex Waibel

Speech recognition systems based on or aided by articulatory features, such as place and manner of articulation, have been shown to be useful under varying circumstances. Recognizers based on features compensate better for channel and noise variability. We show that it is also possible to compensate for inter-language variability using articulatory feature detectors. We come to the conclusion that articulatory features can be recognized across languages and that using detectors from many languages can improve the classification accuracy of the feature detectors on a single language. We further demonstrate how these multilingual and cross-lingual detectors can support an HMM-based recognizer and thereby significantly reduce the word error rate, by up to 12.3% relative. We expect that the use of multilingual articulatory features can support the rapid deployment of recognition systems for new target languages.
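One common way such detectors support an HMM recognizer is log-linear stream combination of scores. The sketch below is an illustration of that idea only, not the authors' exact implementation; all scores and stream weights are hypothetical.

```python
def combined_acoustic_score(hmm_log_prob, feature_log_probs, weights):
    """Log-linear stream combination: weight the HMM state score
    together with articulatory-feature detector scores."""
    score = weights[0] * hmm_log_prob
    for w, lp in zip(weights[1:], feature_log_probs):
        score += w * lp
    return score

# Hypothetical numbers: one HMM state score plus two detector streams
# (say, "voiced" and "fricative"); the stream weights sum to 1.
print(combined_acoustic_score(-42.7, [-3.1, -5.6], [0.7, 0.2, 0.1]))
```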


IEEE Automatic Speech Recognition and Understanding Workshop | 2005

Speech translation enhanced automatic speech recognition

M. Paulik; Sebastian Stüker; Christian Fügen; Tanja Schultz; Thomas Schaaf; Alex Waibel

Nowadays, official documents have to be made available in many languages, as for example in the EU with its 20 official languages. The need for effective tools to aid the multitude of human translators in their work is therefore easily apparent. An ASR system that enables the human translator to speak his translation in an unrestricted manner, instead of typing it, constitutes such a tool. In this work we improve the recognition performance of such an ASR system on the target language of the human translator by taking advantage of either a written or a spoken source-language representation. To do so, machine translation techniques are used to translate between the different languages, and the involved ASR systems are then biased towards the knowledge gained. We present an iterative approach for ASR improvement and outperform our baseline system by a relative word error rate reduction of 35.8%/29.9% in the case of a written/spoken source-language representation. Further, we show how multiple target languages, as for example provided by different simultaneous translators during European Parliament debates, can be incorporated into our system design to improve all involved ASR systems.
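A minimal sketch of the biasing idea on a toy vocabulary: interpolate the baseline LM's word probabilities with a unigram distribution estimated from the MT output of the source sentence. The real system biases full n-gram models inside the decoder; `lam` and all counts here are illustrative.

```python
def bias_lm(p_lm, mt_counts, lam=0.3):
    """Interpolate baseline LM word probabilities with a unigram
    distribution estimated from the MT output of the source sentence."""
    total = sum(mt_counts.values())
    return {w: (1.0 - lam) * p + lam * mt_counts.get(w, 0) / total
            for w, p in p_lm.items()}

# Toy example: the MT output makes "house" more likely in the next pass.
p_lm = {"the": 0.5, "house": 0.2, "mouse": 0.3}
print(bias_lm(p_lm, {"the": 1, "house": 2}))
```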


International Conference on Acoustics, Speech, and Signal Processing | 2006

Open Domain Speech Recognition and Translation: Lectures and Speeches

Christian Fügen; Muntsin Kolss; Dietmar Bernreuther; Matthias Paulik; Sebastian Stüker; Stephan Vogel; Alex Waibel

For years, speech translation has focused on the recognition and translation of discourses in limited domains, such as hotel reservations or scheduling tasks. Only recently have research projects been started to tackle the problem of open-domain speech recognition and translation of complex tasks such as lectures and speeches. In this paper we present the ongoing work at our laboratory on open-domain speech translation of lectures and parliamentary speeches. Starting from a translation system for European Parliament plenary sessions and a lecture speech recognition system, we show how both components perform in unison on speech translation of lectures.


International Conference on Machine Learning | 2006

The ISL RT-06S speech-to-text system

Christian Fügen; Shajith Ikbal; Florian Kraft; Kenichi Kumatani; Kornel Laskowski; John W. McDonough; Mari Ostendorf; Sebastian Stüker; Matthias Wölfel

This paper describes the 2006 lecture and conference meeting speech-to-text system developed at the Interactive Systems Laboratories (ISL) for the individual head-mounted microphone (IHM), single distant microphone (SDM), and multiple distant microphone (MDM) conditions, which was evaluated in the RT-06S Rich Transcription Meeting Evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the principal differences between our current system and those submitted in previous years, namely improved acoustic and language models, cross adaptation between systems with different front-ends and phoneme sets, and the use of various automatic speech segmentation algorithms.


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Efficient handling of multilingual language models

Christian Fügen; Sebastian Stüker; Hagen Soltau; Florian Metze; Tanja Schultz

We introduce techniques for building a multilingual speech recognizer. More specifically, we present a new language model method that allows for the combination of several monolingual language models into one multilingual language model. Furthermore, we extend our techniques to the concept of grammars. All linguistic knowledge sources share one common interface to the search engine. As a consequence, new language model types can be easily integrated into our Ibis decoder. Based on a multilingual acoustic model, we compare multilingual statistical n-gram language models with multilingual grammars. Results are given in terms of recognition performance as well as resource requirements. They show that: (a) n-gram LMs can be easily combined at the meta level without major loss in performance; (b) grammars are very well suited to modeling multilinguality; (c) language switches can be significantly reduced by using the introduced techniques; (d) the resource overhead for handling multiple languages in one language model is acceptable; (e) language identification can be done implicitly during decoding.
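One way to picture the meta-level combination, the reduced language switching, and the implicit language identification at once: score each word under every monolingual LM and charge a penalty whenever the best-scoring language changes. This is a simplified sketch with toy unigram LMs, not the Ibis decoder's actual interface; the penalty value is hypothetical.

```python
import math

def multilingual_logprob(word, prev_lang, lms, switch_penalty=8.0):
    """Score `word` under every monolingual LM, adding a penalty in
    negative-log space when the language changes. Returning the winning
    language gives implicit language identification for free."""
    best = (float("inf"), None)
    for lang, lm in lms.items():
        cost = -math.log(lm.get(word, 1e-9))
        if prev_lang is not None and lang != prev_lang:
            cost += switch_penalty
        best = min(best, (cost, lang))
    return best  # (cost, language of best-scoring LM)

lms = {"en": {"the": 0.5, "haus": 1e-6}, "de": {"das": 0.4, "haus": 0.2}}
print(multilingual_logprob("haus", "en", lms))  # switch only if the LM gain wins
```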


International Conference on Acoustics, Speech, and Signal Processing | 2008

Modified polyphone decision tree specialization for porting multilingual grapheme-based ASR systems to new languages

Sebastian Stüker

Automatic speech recognition (ASR) systems have been developed for only a very limited number of the estimated 7,000 languages in the world. In order to avoid the emergence of a digital divide between languages for which ASR systems exist and those without one, it is necessary to be able to rapidly create ASR systems for new languages in a cost-efficient way. Grapheme-based systems, which eliminate the costly need for a pronunciation dictionary, have been shown to work for a variety of languages and are therefore well suited for porting ASR systems to new languages. This paper studies the use of multilingual grapheme-based models for rapidly bootstrapping acoustic models in new languages. The cross-language performance of a standard multilingual (ML) acoustic model on a new language is improved by introducing a new, modified version of polyphone decision tree specialization that improves the performance of the ML models by up to 15.5% relative.
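For context, polyphone decision tree specialization re-grows the multilingual tree where target-language data supports further splits. The usual split criterion, stated here under the standard single-Gaussian-per-node assumption (the paper's modification changes when and where splitting continues, which this sketch does not capture), is the log-likelihood gain of a question $q$ splitting node $S$ into $S_y$ and $S_n$:

$$\Delta L_q = L(S_y) + L(S_n) - L(S), \qquad L(S) = -\tfrac{N_S}{2}\bigl(\log\lvert\Sigma_S\rvert + d\log(2\pi) + d\bigr)$$

where $N_S$ is the state-occupancy count of node $S$, $\Sigma_S$ its sample covariance, and $d$ the feature dimension; splitting continues only while $\Delta L_q$ and the occupancy counts exceed thresholds estimated on the target-language data.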


IEEE International Conference on High Performance Computing, Data and Analytics | 2011

Quaero Speech-to-Text and Text Translation Evaluation Systems

Sebastian Stüker; Kevin Kilgour; Jan Niehues

Our laboratory has used the HP XC4000, the high-performance computer of the federal state of Baden-Württemberg, to participate in the second Quaero evaluation for automatic speech recognition (ASR) and machine translation (MT). State-of-the-art ASR and MT systems use stochastic models that are trained on large amounts of training data with techniques from the field of machine learning. Using these models, the systems search for the most likely speech recognition hypothesis and translation hypothesis, respectively.
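The search mentioned above is the standard maximum a posteriori decision rule; writing it once for each task (notation introduced here for illustration):

$$\hat{W} = \arg\max_{W} P(X \mid W)\,P(W), \qquad \hat{e} = \arg\max_{e} P(f \mid e)\,P(e)$$

where $X$ is the acoustic signal and $W$ a word sequence (ASR), and $f$, $e$ are source and target sentences (MT).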


International Conference on Acoustics, Speech, and Signal Processing | 2014

Multilingual shifting deep bottleneck features for low-resource ASR

Quoc Bao Nguyen; Jonas Gehring; Markus Müller; Sebastian Stüker; Alex Waibel

In this work, we propose a deep bottleneck feature (DBNF) architecture that is able to leverage data from multiple languages. We also show that tonal features are helpful for non-tonal languages. Evaluations are performed on a low-resource conversational telephone speech transcription task in Bengali, while additional data for DBNF training is provided in Assamese, Pashto, Tagalog, Turkish, and Vietnamese. We obtain relative WER reductions of up to 17.3% and 9.4% over monolingual GMMs and DBNFs, respectively.
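A compact PyTorch sketch of a DBNF extractor: wide hidden layers feed a narrow linear bottleneck whose activations serve as features for a GMM/HMM system, while the classification layers above it are discarded after training. Layer widths, activations, and the 42-dimensional bottleneck are illustrative, not the paper's configuration; one common multilingual recipe shares such an encoder across languages while swapping language-specific output layers.

```python
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Minimal DBNF extractor sketch (illustrative sizes only)."""
    def __init__(self, n_in=440, n_hidden=1200, n_bn=42, n_out=2000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_bn),            # linear bottleneck
        )
        self.classifier = nn.Sequential(          # discarded after training
            nn.Sigmoid(),
            nn.Linear(n_bn, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_out),           # e.g. tied-state targets
        )

    def forward(self, x):
        bn = self.encoder(x)                      # the DBNF features
        return self.classifier(bn), bn

logits, dbnf = BottleneckDNN()(torch.randn(8, 440))  # batch of stacked frames
print(dbnf.shape)                                    # torch.Size([8, 42])
```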


International Conference on Acoustics, Speech, and Signal Processing | 2007

Speech Translation Enhanced ASR for European Parliament Speeches - On the Influence of ASR Performance on Speech Translation

Sebastian Stüker; Matthias Paulik; Muntsin Kolss; Christian Fügen; Alex Waibel

In this paper we describe our work in coupling automatic speech recognition (ASR) and machine translation (MT) in a speech translation enhanced automatic speech recognition (STE-ASR) framework for transcribing and translating European Parliament speeches. We demonstrate the influence of the quality of the ASR component on the MT performance by comparing a series of WERs with the corresponding automatic translation scores. By porting an STE-ASR framework to the task at hand, we show how the word error rates for transcribing English and Spanish speeches can be lowered by 3.0% and 4.8% relative, respectively.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Towards language portability in statistical speech translation

Alex Waibel; Tanja Schultz; Stephan Vogel; Christian Fügen; Matthias Honal; Muntsin Kolss; Jürgen Reichert; Sebastian Stüker

Speech translation has made significant advances over the last years. We believe that we can overcome today's limits of language- and domain-portable conversational speech translation systems by relying more radically on learning approaches and by using multiple layers of reduction and transformation to extract the desired content in another language. Therefore, we cascade stochastic source-channel models that extract an underlying message from a corrupted observed output. The three models effectively translate: (1) speech to word lattices (automatic speech recognition, ASR); (2) ill-formed fragments of word strings into a compact, well-formed sentence (Clean); (3) sentences in one language to sentences in another (machine translation, MT). We present results of our research efforts towards rapid language portability of all these components. The results on translation suggest that MT systems can be successfully constructed for any language pair by cascading multiple MT systems via English. Moreover, end-to-end performance can be improved if the interlingua language is enriched with additional linguistic information that can be derived automatically and monolingually in a data-driven fashion.
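The three-stage cascade can be written compactly as chained source-channel decisions (simplified here to pass single best hypotheses rather than word lattices; notation introduced for illustration):

$$\hat{W} = \arg\max_{W} P(X \mid W)\,P(W), \quad \hat{C} = \arg\max_{C} P(\hat{W} \mid C)\,P(C), \quad \hat{T} = \arg\max_{T} P(\hat{C} \mid T)\,P(T)$$

where $X$ is the speech signal, $W$ the raw recognized word string, $C$ its cleaned well-formed version, and $T$ the translation.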

Collaboration


Dive into Sebastian Stüker's collaborations.

Top Co-Authors


Alex Waibel

Karlsruhe Institute of Technology


Markus Müller

Karlsruhe Institute of Technology


Kevin Kilgour

Karlsruhe Institute of Technology


Christian Fügen

Karlsruhe Institute of Technology


Jan Niehues

Karlsruhe Institute of Technology


Florian Kraft

Karlsruhe Institute of Technology


Matthias Sperber

Karlsruhe Institute of Technology


Matthias Wölfel

Carnegie Mellon University


Muntsin Kolss

Karlsruhe Institute of Technology
