Stephan Kanthak | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Stephan Kanthak is active.

Explore More

Publication

Featured researches published by Stephan Kanthak.

meeting of the association for computational linguistics | 2005

Novel Reordering Approaches in Phrase-Based Statistical Machine Translation

Stephan Kanthak; David Vilar; Richard Zens; Hermann Ney

This paper presents novel approaches to reordering in phrase-based statistical machine translation. We perform consistent reordering of source sentences in training and estimate a statistical translation model. Using this model, we follow a phrase-based monotonic machine translation approach, for which we develop an efficient and flexible reordering framework that allows to easily introduce different reordering constraints. In translation, we apply source sentence reordering on word level and use a reordering automaton as input. We show how to compute reordering automata on-demand using IBM or ITG constraints, and also introduce two new types of reordering constraints. We further add weights to the reordering automata. We present detailed experimental results and show that reordering significantly improves translation quality.

international conference on acoustics, speech, and signal processing | 2002

Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition

Stephan Kanthak; Hermann Ney

In this paper we propose to use a decision tree based on graphemic acoustic sub-word units together with phonetic questions. We also show that automatic question generation can be used to completely eliminate any manual effort.

IEEE Transactions on Speech and Audio Processing | 2002

Speaker adaptive modeling by vocal tract normalization

Lutz Welling; Hermann Ney; Stephan Kanthak

This paper presents methods for speaker adaptive modeling using vocal tract normalization (VTN) along with experimental tests on three databases. We propose a new training method for VTN: By using single-density acoustic models per HMM state for selecting the scale factor of the frequency axis, we avoid the problem that a mixture-density tends to learn the scale factors of the training speakers and thus cannot be used for selecting the scale factor. We show that using single Gaussian densities for selecting the scale factor in training results in lower error rates than using mixture densities. For the recognition phase, we propose an improvement of the well-known two-pass strategy: by using a non-normalized acoustic model for the first recognition pass instead of a normalized model, lower error rates are obtained. In recognition tests, this method is compared with a fast variant of VTN. The two-pass strategy is an efficient method, but it is suboptimal because the scale factor and the word sequence are determined sequentially. We found that for telephone digit string recognition this suboptimality reduces the VTN gain in recognition performance by 30% relative. In summary, on the German spontaneous speech task Verbmobil, the WSJ task and the German telephone digit string corpus SieTill, the proposed methods for VTN reduce the error rates significantly.

international conference on acoustics speech and signal processing | 1999

Improved methods for vocal tract normalization

Lutz Welling; Stephan Kanthak; Hermann Ney

This paper presents improved methods for vocal tract normalization (VTN) along with experimental tests on three databases. We propose a new method for VTN in training: by using acoustic models with single Gaussian densities per state for selecting the normalization scales the need for the models to learn the normalization scales of the training speakers is avoided. We show that using single Gaussian densities for selecting the normalization scales in training results in lower error rates than using mixture densities. For VTN in recognition, we propose an improvement of the well-known multiple-pass strategy: by using an unnormalized acoustic model for the first recognition pass instead of a normalized model lower error rates are obtained. In recognition tests, this method is compared with a fast variant of VTN. The multiple-pass strategy is an efficient method but it is suboptimal because the normalization scale and the word sequence are determined sequentially. We found that for telephone digit string recognition this suboptimality reduces the VTN gain in recognition performance by 30% relative. On the German spontaneous scheduling task Verbmobil, the WSJ task and the German telephone digit string corpus SieTill the proposed methods for VTN reduce the error rates significantly.

international conference on acoustics, speech, and signal processing | 2005

Cross domain automatic transcription on the TC-STAR EPPS corpus

Christian Gollan; Maximilian Bisani; Stephan Kanthak; Ralf Schlüter; Hermann Ney

This paper describes the ongoing development of the British English European Parliament Plenary Session corpus. This corpus will be part of the speech-to-speech translation evaluation infrastructure of the European TC-STAR project. Furthermore, we present first recognition results on the English speech recordings. The transcription system has been derived from an older speech recognition system built for the North-American broadcast news task. We report on the measures taken for rapid cross-domain porting and present encouraging results.

international conference on acoustics, speech, and signal processing | 2000

Using SIMD instructions for fast likelihood calculation in LVCSR

Stephan Kanthak; Kai Schütz; Hermann Ney

Most modern processor architectures provide SIMD (single instruction multiple data) instructions to speed up algorithms based on vector or matrix operations. This paper describes the use of SIMD instructions to calculate Gaussian or Laplacian densities in a large vocabulary speech recognition system. We present a simple, robust method based on scalar quantization of the mean and observation vector components without any loss in recognition performance while speeding up the whole systems runtime by a factor of 3. Combining the approach with vector space partitioning techniques accelerated the overall system by a factor of over 7. The experiments show that the approach can be also applied to Viterbi training without any loss of accuracy. All experiments were conducted on a German, 10,000-word, spontaneous speech task using two architectures, namely Intel Pentium III and SUN UltraSPARC.

meeting of the association for computational linguistics | 2004

FSA: An Efficient and Flexible C++ Toolkit for Finite State Automata Using On-Demand Computation

Stephan Kanthak; Hermann Ney

In this paper we present the RWTH FSA toolkit --- an efficient implementation of algorithms for creating and manipulating weighted finite-state automata. The toolkit has been designed using the principle of on-demand computation and offers a large range of widely used algorithms. To prove the superior efficiency of the toolkit, we compare the implementation to that of other publically available toolkits. We also show that on-demand computations help to reduce memory requirements significantly without any loss in speed. To increase its flexibility, the RWTH FSA toolkit supports high-level interfaces to the programming language Python as well as a command-line tool for interactive manipulation of FSAs. Furthermore, we show how to utilize the toolkit to rapidly build a fast and accurate statistical machine translation system. Future extensibility of the toolkit is ensured as it will be publically available as open source software.

international conference on acoustics, speech, and signal processing | 2000

Recent improvements of the RWTH large vocabulary speech recognition system on spontaneous speech

Achim Sixtus; Sirko Molau; Stephan Kanthak; Ralf Schlüter; Hermann Ney

The paper presents recent improvements of the RWTH large vocabulary continuous speech recognition system (LVCSR). In particular, we report on the integration of across-word models into the first recognition pass, and describe better algorithms for fast vocal tract normalization (VTN). We focus both on improvements in word error rate and how to speed up the recognizer with only minimal loss of recognition accuracy. Implementation details and experimental results are given for the VerbMobil task, a German spontaneous speech corpus. The 25.0% word error rate (WER) of our within-word baseline system was reduced to 21.4% with VTN and across-word models. Decreasing the real-time factor (RTF) by up to 85% resulted in only a small degradation in recognition performance of 2% relative on average.

international conference on acoustics, speech, and signal processing | 2006

Integrating Speech Recognition and Machine Translation: Where do We Stand?

Stephan Kanthak; Hermann Ney

This paper describes state-of-the-art interfaces between speech recognition and machine translation. We modify two different machine translation systems to effectively process dense speech recognition lattices. In addition, we describe how to fully integrate speech translation with machine translation based on weighted finite-state transducers. With a thorough set of experiments, we show that both the acoustic model scores and the source language model positively and significantly affect the translation quality. We have found consistent improvements on three different corpora compared with translations of single best recognition results

Archive | 2000

Fast Search for Large Vocabulary Speech Recognition

Stephan Kanthak; Achim Sixtus; Sirko Molau; Ralf Schlüter; Hermann Ney

In this article we describe methods for improving the RWTH German speech recognizer used within the Verbmobil project. In particular, we present acceleration methods for the search based on both within-word and across-word phoneme models. We also study incremental methods to reduce the response time of the online speech recognizer. Finally, we present experimental off-line results for the three Verbmobil scenarios. We report on word error rates and real-time factors for both speaker independent and speaker dependent recognition.

Explore More