Publication


Featured research published by Günther Ruske.


international conference on acoustics, speech, and signal processing | 1998

Estimating the speaking rate by vowel detection

Thilo Pfau; Günther Ruske

We present a new feature-based method for estimating the speaking rate by detecting vowels in continuous speech. The features used are the modified loudness and the zero-crossing rate, which are both calculated in the standard preprocessing unit of our speech recognition system. As vowels in general correspond to syllable nuclei, the feature-based vowel rate is comparable to an estimate of the lexically-based syllable rate. The vowel detector presented is tested on the spontaneously spoken German Verbmobil task and is evaluated using manually transcribed data. The lowest vowel error rate (including insertions) on the defined test set is 22.72% on average over all vowels. Additionally, correlation coefficients between our estimates and reference rates are calculated. These coefficients reach up to 0.796 and are therefore comparable to those for lexically-based measures (like the phone rate) on other tasks. The accuracy is sufficient to use our measurement for speaking rate adaptation.
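As a rough illustration of the idea in this abstract, the following Python sketch computes a frame-wise zero-crossing rate and a simple energy measure, marks frames with high energy and low zero-crossing rate as vowel-like, and counts vowel-like segments per second. Plain frame energy stands in for the modified loudness, and the frame length, thresholds, and function names are assumptions chosen for illustration; this is not the authors' detector.

```python
import math

def frame_features(signal, frame_len):
    """Return (energy, zero_crossing_rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A crossing is a sign change between two neighbouring samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats

def count_vowel_segments(feats, min_energy, max_zcr):
    """Count runs of consecutive frames with high energy and low ZCR."""
    count = 0
    in_vowel = False
    for energy, zcr in feats:
        vowel_like = energy >= min_energy and zcr <= max_zcr
        if vowel_like and not in_vowel:
            count += 1
        in_vowel = vowel_like
    return count

def vowel_rate(signal, sample_rate, frame_len=160, min_energy=0.05, max_zcr=0.2):
    """Vowel-like segments per second (thresholds are illustrative guesses)."""
    feats = frame_features(signal, frame_len)
    duration = len(signal) / sample_rate
    return count_vowel_segments(feats, min_energy, max_zcr) / duration

def tone(freq, n, sample_rate, amp):
    """Synthetic stand-in for a voiced sound: a pure sine tone."""
    return [amp * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

# Synthetic test signal: tone, silence, tone, silence (0.2 s each at 8 kHz).
sr = 8000
silence = [0.0] * 1600
signal = tone(200, 1600, sr, 0.8) + silence + tone(200, 1600, sr, 0.8) + silence
print(vowel_rate(signal, sr))
```

On real speech, the thresholds would have to be tuned on transcribed data, and the resulting rate could then be compared with a reference rate through a correlation coefficient, as the abstract describes.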


international conference on acoustics, speech, and signal processing | 1978

An approach to speech recognition using syllabic decision units

Günther Ruske; Thomas Schotola

Described is an automatic speech recognition system which uses parts of syllables as decision units, i.e. the syllable nuclei, the initial consonant clusters preceding the nuclei and the final consonant clusters following the nuclei. This segmentation includes monosyllabic words as well as polysyllabic words. The segmentation into these units is achieved by evaluating a modified loudness function which is generated by a special loudness analyzer. Application of a time normalization procedure to the time-varying patterns of the consonant clusters yields feature vectors of a constant length. Classification experiments using a test set of 3000 utterances of initial and final consonant clusters have been performed. Although the segmentation of the proposed units is much easier than that of single phonemes, recognition scores are equivalent. An automatic speech recognition system has been constructed using a vocabulary consisting of the names of 230 German cities.
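The time normalization step mentioned in this abstract can be pictured with a small Python sketch that resamples a variable-length sequence of frames to a fixed number of frames. The abstract does not say which normalization procedure was used; linear interpolation and the names `time_normalize` and `flatten` are assumptions made for illustration.

```python
def time_normalize(pattern, target_len):
    """Linearly resample a sequence of feature frames to target_len frames."""
    if not pattern:
        raise ValueError("pattern must contain at least one frame")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    n = len(pattern)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(pattern[0]) for _ in range(target_len)]
    out = []
    for k in range(target_len):
        # Fractional position of output frame k on the input time axis.
        pos = k * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(pattern[lo], pattern[hi])])
    return out

def flatten(frames):
    """Concatenate the frames into one constant-length feature vector."""
    return [value for frame in frames for value in frame]

short = [[0.0, 10.0], [4.0, 30.0]]
long_ = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
vec_a = flatten(time_normalize(short, 3))
vec_b = flatten(time_normalize(long_, 3))
print(len(vec_a), len(vec_b))
```

Both patterns end up with the same vector length regardless of their original duration, which is what a classifier with a fixed input size needs.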


international conference on acoustics, speech, and signal processing | 2000

On-line speaking rate estimation using Gaussian mixture models

Robert Faltlhauser; Thilo Pfau; Günther Ruske

Gaussian mixture models (GMM) are a widespread tool in applications like speaker identification or verification. In contrast to hidden Markov models (HMM), Gaussian mixture models are designed to model the general properties of an underlying acoustic source. In our paper we extend the application of GMMs to the assessment of speaking rate. Directly trained on the acoustic data, they can be either applied directly to estimate the speech rate category or, with the help of a mapping function, they can provide a continuous measure for the speaking rate. The mapping function can be realized by means of a neural net. First experiments showed a correlation coefficient of 0.66 between the lexical phoneme rate and our estimation based on speech rate dependent spectral variation. Moreover, our approach can be used simultaneously for high accuracy on-line gender detection.
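A minimal Python sketch of the category decision described in this abstract: one diagonal-covariance GMM per speech rate category, with the category chosen by the highest average frame log-likelihood. The two toy models, their parameters, and the labels "slow" and "fast" are hypothetical; real models would be trained on acoustic data, and the neural-net mapping to a continuous rate is not shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """gmm is a list of (weight, mean, variance) components."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(frames, gmms):
    """Return the label whose GMM gives the highest mean frame log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
        for label, gmm in gmms.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical, hand-set models standing in for trained ones.
gmms = {
    "slow": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 0.0], [1.0, 1.0])],
    "fast": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 4.0], [1.0, 1.0])],
}
label, scores = classify([[4.4, 4.1], [4.8, 3.9]], gmms)
print(label)
```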


conference of the international speech communication association | 1995

Discriminative training for continuous speech recognition

Wolfgang Reichl; Günther Ruske

Discriminative training techniques for Hidden-Markov Models were recently proposed and successfully applied for automatic speech recognition. In this paper a discussion of the Minimum Classification Error and the Maximum Mutual Information objective is presented. An extended reestimation formula is used for the HMM parameter update for both objective functions. The discriminative training methods were utilized in speaker independent phoneme recognition experiments and improved the phoneme recognition rates for both discriminative training techniques.
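For orientation, the two objectives named in this abstract can be written down for a single training token as in the Python sketch below. The abstract gives no formulas, so these are the commonly used textbook forms: the MMI criterion as the log posterior of the correct class, and the MCE loss as a sigmoid of a misclassification measure. The smoothing constants `eta` and `gamma` and all function names are assumptions, and the extended reestimation formula itself is not shown.

```python
import math

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def mmi_objective(scores, correct):
    """Log posterior of the correct class; scores are joint log scores."""
    return scores[correct] - logsumexp(scores)

def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed classification error for one token.

    d > 0 means the competing classes outscore the correct one.
    """
    rivals = [s for i, s in enumerate(scores) if i != correct]
    anti = (logsumexp([eta * s for s in rivals]) - math.log(len(rivals))) / eta
    d = anti - scores[correct]
    return 1.0 / (1.0 + math.exp(-gamma * d))

scores = [-10.0, -12.0, -15.0]  # hypothetical log scores for three phonemes
print(mmi_objective(scores, 0), mce_loss(scores, 0))
```

Training raises the MMI objective or lowers the MCE loss over the whole training set; both penalize a model that scores competing phonemes close to the correct one.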


joint pattern recognition symposium | 2008

Novel VQ Designs for Discrete HMM On-Line Handwritten Whiteboard Note Recognition

Joachim Schenk; Stefan Schwärzler; Günther Ruske; Gerhard Rigoll

In this work we propose two novel vector quantization (VQ) designs for discrete HMM-based on-line handwriting recognition of whiteboard notes. Both VQ designs represent the binary pressure information without any loss. The new designs are necessary because standard k-means VQ systems cannot quantize this binary feature adequately, as is shown in this paper. Our experiments show that the new systems provide a relative improvement of r = 1.8% in recognition accuracy on a character-level benchmark and r = 3.3% on a word-level benchmark compared to a standard k-means VQ system. Additionally, our system is compared with a state-of-the-art continuous HMM-based system and proven to be competitive, yielding a slight relative improvement of r = 0.6%.
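One way to keep a binary feature lossless in a discrete symbol stream is to let the pressure bit select the codebook, so the bit can always be read back from the symbol index. The Python sketch below shows that idea only; it is an assumption about how such a design could look, not a description of the two designs in the paper, and the centroids and names are hypothetical.

```python
def nearest(vec, codebook):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )

def quantize(features, pressure, codebook_up, codebook_down):
    """Map continuous features plus a pressure bit to one discrete symbol.

    Symbols 0..len(codebook_up)-1 are pen-up, the rest are pen-down.
    """
    if pressure == 1:
        return len(codebook_up) + nearest(features, codebook_down)
    return nearest(features, codebook_up)

def pressure_of(symbol, n_up):
    """Recover the pressure bit exactly from the symbol index."""
    return 1 if symbol >= n_up else 0

# Hypothetical centroids; in practice each codebook would come from k-means
# run separately on pen-up and pen-down frames.
codebook_up = [[0.0, 0.0], [1.0, 1.0]]
codebook_down = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]

symbol = quantize([1.9, 2.2], 1, codebook_up, codebook_down)
print(symbol, pressure_of(symbol, len(codebook_up)))
```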


international conference natural language processing | 2003

A one-stage decoder for interpretation of natural speech

Matthias Thomae; Tibor Fábián; Robert Lieb; Günther Ruske

Current speech understanding systems are typically designed as multistage systems, although this theoretically gives rise to errors due to early decisions. We present a framework that offers the chance of reducing these errors by an integrated system which directly computes a semantic tree representation from the input speech signal through a token passing based one-stage decoder, called ODINS. In order to limit the complexity of ODINS, we represent all a-priori knowledge consistently by a generalized uniform knowledge model based on a hierarchy of probabilistic transition networks, which also can be n-grams. Our framework includes a method to evaluate the system output using an edit distance based tree matching algorithm. First experiments quantify and confirm the theoretical advantage of the one-stage strategy over a corresponding two-stage approach.


Archive | 2000

Robust Recognition of Spontaneous Speech

Udo Haiber; Helmut Mangold; Thilo Pfau; Peter Regel-Brietzmann; Günther Ruske; Volker Schleß

This contribution describes the challenges and the progress which have been made in Verbmobil concerning robustness of speech recognition for various types of adverse conditions, like channel distortion, environmental noise and various speaker and speaking conditions. For the channel and noise problem, classical approaches like cepstral bias normalization and spectral subtraction methods have been improved, as well as new methods like parallel model combination. One major result is that an intelligent combination of various methods achieves the best results. Considerable progress has also been made in research on unsupervised speaker adaptation. Several different main approaches are presented to improve robustness against variations of speaking rate, speaking style and speaker characteristics. The methods described include new estimation of the parameters for vocal tract length normalization, features and codebook transformation methods using ML algorithms, and pronunciation adaptation of the words in the lexicon.
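To make the channel compensation idea concrete, here is a minimal Python sketch of plain cepstral mean subtraction, the classical baseline behind cepstral bias normalization. A fixed linear channel adds a constant offset to every cepstral frame, so removing the per-utterance mean cancels it. The improved variants developed in Verbmobil are not reproduced here, and the function name and toy data are assumptions.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean from every cepstral frame."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]

# Toy cepstral frames and the same frames seen through a hypothetical channel
# that adds a constant bias to each coefficient.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
bias = [0.5, -1.5]
distorted = [[c + b for c, b in zip(frame, bias)] for frame in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_distorted = cepstral_mean_normalize(distorted)
print(norm_clean)
print(norm_distorted)
```

After normalization the clean and distorted utterances give the same features up to rounding, which is why the method helps against channel mismatch but not against additive noise.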


joint pattern recognition symposium | 2008

Switching Linear Dynamic Models for Noise Robust In-Car Speech Recognition

Björn W. Schuller; Martin Wöllmer; Tobias Moosmayr; Günther Ruske; Gerhard Rigoll

Performance of speech recognition systems strongly degrades in the presence of background noise, like the driving noise in the interior of a car. We compare two different Kalman filtering approaches which attempt to improve noise robustness: Switching Linear Dynamic Models (SLDM) and Autoregressive Switching Linear Dynamical Systems (AR-SLDS). Unlike previous works, which are restricted to white noise, we evaluate the modeling concepts in a noisy speech recognition task where colored noise produced by different driving conditions and car types is also taken into account. Thereby we demonstrate that speech enhancement based on Kalman filtering prevails over all standard de-noising techniques considered herein, such as Wiener filtering, Histogram Equalization, and Unsupervised Spectral Subtraction.
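The switching models in this abstract build on the ordinary Kalman filter. As background only, the Python sketch below runs a scalar Kalman filter with a random-walk state model over noisy observations of a constant value. It is a far simpler relative of SLDM and AR-SLDS, and the noise variances, initial values, and names are assumptions chosen for the toy example.

```python
def kalman_filter_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed in noise.

    q: process noise variance, r: measurement noise variance.
    Returns the filtered state estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed to stay put, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Constant true value 1.0 observed with an alternating +/-0.2 disturbance.
measurements = [1.2, 0.8] * 25
estimates = kalman_filter_1d(measurements, q=1e-4, r=0.04)
print(estimates[-1])
```

The filtered track settles much closer to the true value than the individual measurements do; a switching model additionally chooses among several such linear dynamics over time.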


international conference on multimedia and expo | 2009

A multi-agent framework for a hybrid dialog management system

Stefan Schwärzler; Joachim Schenk; Günther Ruske; Frank Wallhoff

The importance of dialog management systems has increased in recent years. Dialog systems are created for domain-specific applications, so a high demand for a flexible dialog system framework arises. There are two basic approaches for dialog management systems: a rule-based approach and a statistical approach. In this paper, we combine both methods and form a hybrid dialog management system in a scalable agent-based framework. To decide the next dialog step, two independent systems are used: the Java Rule Engine (JESS) as expert system for rule-based solutions, and the Partially Observable Markov Decision Process (POMDP) as model-based solution for more complex dialog sequences. Using a speech recognizer and text-to-speech systems, the human can be guided through a dialog with approximately ten steps.


conference of the international speech communication association | 1995

Neural networks for nonlinear discriminant analysis in continuous speech recognition

Wolfgang Reichl; S. Harengel; Franz Wolfertstetter; Günther Ruske

In this paper neural networks for Nonlinear Discriminant Analysis in continuous speech recognition are presented. Multilayer Perceptrons are used to estimate a-posteriori probabilities for Hidden-Markov Model states, which are the optimal discriminant features for the separation of the HMM states. The a-posteriori probabilities are transformed by a principal component analysis to calculate the new features for semicontinuous HMMs, which are trained by the known Maximum-Likelihood training. The nonlinear discriminant transformation is used in speaker-independent phoneme recognition experiments and compared to the standard Linear Discriminant Analysis technique.

Collaboration


Dive into Günther Ruske's collaborations.

Top Co-Authors

Thilo Pfau

University of California
