Publication


Featured research published by Volker Leutnant.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition

Volker Leutnant; Alexander Krueger

In this contribution we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a by-product of the inference of the a posteriori probability density function of the clean speech feature vectors. Furthermore, the computational effort and the memory requirements are reduced by using a recursive formulation of the observation model. The performance of the proposed algorithms is first experimentally studied on a connected digits recognition task with artificially created noisy reverberant data. It is shown that the use of the time-variant observation error model leads to a significant error rate reduction at low signal-to-noise ratios compared to a time-invariant model. Further experiments were conducted on a 5000-word task recorded in a reverberant and noisy environment. A significant word error rate reduction was obtained, demonstrating the effectiveness of the approach on real-world data.


IEEE Transactions on Audio, Speech, and Language Processing | 2014

A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech

Volker Leutnant; Alexander Krueger

In this contribution we present a theoretical and experimental investigation into the effects of reverberation and noise on features in the logarithmic mel power spectral domain, an intermediate stage in the computation of the mel frequency cepstral coefficients, prevalent in automatic speech recognition (ASR). Gaining insight into the complex interaction between clean speech, noise, and noisy reverberant speech features is essential for any ASR system to be robust against noise and reverberation present in distant microphone input signals. The findings are gathered in a probabilistic formulation of an observation model which may be used in model-based feature compensation schemes. The proposed observation model extends previous models in three major directions: First, the contribution of additive background noise to the observation error is explicitly taken into account. Second, an energy compensation constant is introduced which ensures an unbiased estimate of the reverberant speech features, and, third, a recursive variant of the observation model is developed resulting in reduced computational complexity when used in model-based feature compensation. The experimental section is used to evaluate the accuracy of the model and to describe how its parameters can be determined from test data.
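The abstract above describes the logarithmic mel power spectral domain as an intermediate stage in the computation of mel frequency cepstral coefficients. As a minimal sketch of the step that follows that stage, the snippet below applies an unnormalized DCT-II to a log mel power vector. The function name `dct_ii` and the four-band flat spectrum are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def dct_ii(values):
    """Unnormalized DCT-II: c_k = sum_i v_i * cos(pi * k * (i + 0.5) / M)."""
    m = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / m) for i, v in enumerate(values))
        for k in range(m)
    ]

# Hypothetical flat mel power spectrum with four bands (illustrative values only).
mel_power = [4.0, 4.0, 4.0, 4.0]

# Logarithmic mel power spectral coefficients: the domain the observation model lives in.
log_mel = [math.log(p) for p in mel_power]

# Cepstral coefficients obtained from the log mel vector.
cepstrum = dct_ii(log_mel)
```

For a flat spectrum, all energy ends up in the zeroth coefficient, which is one reason the cepstral domain is a compact representation of the spectral envelope.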


Ambient Intelligence | 2007

Amigo Context Management Service with Applications in Ambient Communication Scenarios

Joerg Schmalenstroeer; Volker Leutnant

The Amigo Context Management Service (CMS) provides an open infrastructure for the exchange of contextual information between context sources and context clients. Whereas context sources supply context information retrieved from sensors or services within the networked home environment, context clients use that information to become context-aware.


International Conference on Acoustics, Speech, and Signal Processing | 2013

GMM-based significance decoding

Ahmed Hussen Abdelaziz; Steffen Zeiler; Dorothea Kolossa; Volker Leutnant

The accuracy of automatic speech recognition systems in noisy and reverberant environments can be improved notably by exploiting the uncertainty of the estimated speech features using so-called uncertainty-of-observation techniques. In this paper, we introduce a new Bayesian decision rule that can serve as a mathematical framework from which both known and new uncertainty-of-observation techniques can be either derived or approximated. The new decision rule in its direct form leads to the new significance decoding approach for Gaussian mixture models, which results in better performance compared to standard uncertainty-of-observation techniques in different additive and convolutive noise scenarios.
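For orientation, one widely used uncertainty-of-observation technique is uncertainty decoding, in which the variance of the feature estimate is added to the variance of each Gaussian before the likelihood is evaluated. The sketch below shows that baseline for scalar features. It does not implement the significance decoding rule proposed in the paper, and the function names and all parameter values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_likelihood_ud(x_hat, var_x, weights, means, variances):
    """GMM likelihood of the feature estimate x_hat under uncertainty decoding.

    var_x is the variance (uncertainty) of the estimate; it is added to the
    variance of every mixture component. With var_x = 0 this reduces to the
    ordinary GMM likelihood.
    """
    return sum(
        w * gaussian_pdf(x_hat, m, v + var_x)
        for w, m, v in zip(weights, means, variances)
    )

# Hypothetical two-component acoustic model for one scalar feature.
weights, means, variances = [0.6, 0.4], [0.0, 2.0], [1.0, 0.5]

# An estimate far from both means: treated as exact, then as uncertain.
plain = gmm_likelihood_ud(5.0, 0.0, weights, means, variances)
compensated = gmm_likelihood_ud(5.0, 3.0, weights, means, variances)
```

Here `compensated` exceeds `plain`: an unreliable estimate is penalized less for lying far from the model means, which is the intended effect of exploiting the feature uncertainty.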


International Conference on Signal Processing | 2012

A statistical observation model for noisy reverberant speech features and its application to robust ASR

Volker Leutnant; Alexander Krueger

In this work, an observation model for the joint compensation of noise and reverberation in the logarithmic mel power spectral density domain is considered. It relates the features of the noisy reverberant speech to those of the non-reverberant speech and the noise. In contrast to enhancement of features only corrupted by reverberation (reverberant features), enhancement of noisy reverberant features requires a more sophisticated model for the error introduced by the proposed observation model. In a first consideration, it will be shown that this error is highly dependent on the instantaneous ratio of the power of reverberant speech to the power of the noise and, moreover, sensitive to the phase between reverberant speech and noise in the short-time discrete Fourier domain. Afterwards, a statistically motivated approach will be presented allowing for the model of the observation error to be inferred from the error model previously used for the reverberation only case. Finally, the developed observation error model will be utilized in a Bayesian feature enhancement scheme, leading to improvements in word accuracy on the AURORA5 database.


Robust Speech Recognition of Uncertain or Missing Data | 2011

Conditional Bayesian Estimation Employing a Phase-Sensitive Observation Model for Noise Robust Speech Recognition

Volker Leutnant

In this contribution, conditional Bayesian estimation employing a phase-sensitive observation model for noise robust speech recognition will be studied. After a review of speech recognition under the presence of corrupted features, termed uncertainty decoding, the estimation of the posterior distribution of the uncorrupted (clean) feature vector will be shown to be a key element of noise robust speech recognition. The estimation process will be based on three major components: an a priori model of the unobservable data, an observation model relating the unobservable data to the corrupted observation and an inference algorithm, finally allowing for a computationally tractable solution. Special stress will be laid on a detailed derivation of the phase-sensitive observation model and the required moments of the phase factor distribution. Thereby, it will not only be proven analytically that the phase factor distribution is non-Gaussian but also that all central moments can (approximately) be computed solely based on the used mel filter bank, finally rendering the moments independent of noise type and signal-to-noise ratio. The phase-sensitive observation model will then be incorporated into a model-based feature enhancement scheme and recognition experiments will be carried out on the Aurora 2 and Aurora 4 databases. The importance of incorporating phase factor information into the enhancement scheme is pointed out by all recognition results. Application of the proposed scheme under the derived uncertainty decoding framework further leads to significant improvements in both recognition tasks, eventually reaching the performance achieved with the ETSI advanced front-end.
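To illustrate what "phase-sensitive" means here, the sketch below checks the relation for a single short-time Fourier bin, where the phase factor is simply the cosine of the phase difference between speech and noise. This is a simplified assumption: the chapter works with mel filter bank outputs, where the phase factor becomes a weighted average over the bins of a filter. The function names and the example values are hypothetical.

```python
import cmath
import math

def log_power_phase_sensitive(x, n, alpha):
    """Noisy log-power from clean log-power x, noise log-power n, phase factor alpha.

    Follows from |X + N|^2 = |X|^2 + |N|^2 + 2 * alpha * |X| * |N|.
    """
    return x + math.log(1.0 + math.exp(n - x) + 2.0 * alpha * math.exp(0.5 * (n - x)))

# Hypothetical complex spectral values of speech and noise in one bin.
X = cmath.rect(2.0, 0.3)   # magnitude 2.0, phase 0.3 rad
N = cmath.rect(0.5, 1.7)   # magnitude 0.5, phase 1.7 rad

x = math.log(abs(X) ** 2)  # clean speech log-power
n = math.log(abs(N) ** 2)  # noise log-power

# Single-bin phase factor: cosine of the phase difference.
alpha = math.cos(cmath.phase(X) - cmath.phase(N))

y_model = log_power_phase_sensitive(x, n, alpha)     # phase-sensitive model
y_direct = math.log(abs(X + N) ** 2)                 # computed from the sum itself
y_no_phase = log_power_phase_sensitive(x, n, 0.0)    # phase term neglected
```

With the phase factor included, `y_model` matches `y_direct` up to rounding, whereas setting the phase factor to zero leaves a residual error that depends on the phase difference.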


EURASIP Journal on Advances in Signal Processing | 2016

A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research

Keisuke Kinoshita; Marc Delcroix; Sharon Gannot; Emanuel A. P. Habets; Walter Kellermann; Volker Leutnant; Roland Maas; Tomohiro Nakatani; Bhiksha Raj; Armin Sehr; Takuya Yoshioka


Conference of the International Speech Communication Association | 2009

An analytic derivation of a phase-sensitive observation model for noise robust speech recognition

Volker Leutnant


Conference of the International Speech Communication Association | 2012

Bayesian Feature Enhancement for ASR of Noisy Reverberant Real-World Data

Alexander Krueger; Oliver Walter; Volker Leutnant


Conference of the International Speech Communication Association | 2009

Fusing audio and video information for online speaker diarization

Joerg Schmalenstroeer; Martin Kelling; Volker Leutnant

Collaboration


Dive into Volker Leutnant's collaborations.

Top Co-Authors

Armin Sehr

University of Erlangen-Nuremberg

Emanuel A. P. Habets

University of Erlangen-Nuremberg

Roland Maas

University of Erlangen-Nuremberg
