Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Roland Maas is active.

Publication


Featured research published by Roland Maas.


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2013

The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech

Keisuke Kinoshita; Marc Delcroix; Takuya Yoshioka; Tomohiro Nakatani; Armin Sehr; Walter Kellermann; Roland Maas

Recently, substantial progress has been made in the field of reverberant speech signal processing, including both single- and multichannel dereverberation techniques, and automatic speech recognition (ASR) techniques robust to reverberation. To evaluate state-of-the-art algorithms and obtain new insights regarding potential future research directions, we propose a common evaluation framework including datasets, tasks, and evaluation metrics for both speech enhancement and ASR techniques. The proposed framework will be used as a common basis for the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. This paper describes the rationale behind the challenge, and provides a detailed description of the evaluation framework and benchmark results.


IEEE Signal Processing Magazine | 2012

Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition

Takuya Yoshioka; Armin Sehr; Marc Delcroix; Keisuke Kinoshita; Roland Maas; Tomohiro Nakatani; Walter Kellermann

Speech recognition technology has left the research laboratory and is increasingly coming into practical use, enabling a wide spectrum of innovative and exciting voice-driven applications that are radically changing our way of accessing digital services and information. Most of today's applications still require a microphone located near the talker. However, almost all of these applications would benefit from distant-talking speech capturing, where talkers are able to speak at some distance from the microphones without the encumbrance of handheld or body-worn equipment [1]. For example, applications such as meeting speech recognition, automatic annotation of consumer-generated videos, speech-to-speech translation in teleconferencing, and hands-free interfaces for controlling consumer products, like interactive TV, will greatly benefit from distant-talking operation. Furthermore, for a number of unexplored but important applications, distant microphones are a prerequisite. This means that distant-talking speech recognition technology is essential for extending the availability of speech recognizers as well as enhancing the convenience of existing speech recognition applications.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Reverberation Model-Based Decoding in the Logmelspec Domain for Robust Distant-Talking Speech Recognition

Armin Sehr; Roland Maas; Walter Kellermann

The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in “Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain” (A. Sehr et al., in Proc. Interspeech, 2006, pp. 769-772) for melspectral features, is extended to logarithmic melspectral (logmelspec) features in this contribution. Thus, the favorable properties of REMOS, including its high flexibility with respect to changing reverberation conditions, become available in the more competitive logmelspec domain. Based on a combined acoustic model consisting of a hidden Markov model (HMM) network and a reverberation model (RM), REMOS determines clean-speech and reverberation estimates during recognition. Therefore, in each iteration of a modified Viterbi algorithm, an inner optimization operation maximizes the joint density of the current HMM output and the RM output subject to the constraint that their combination is equal to the current reverberant observation. Since the combination operation in the logmelspec domain is nonlinear, numerical methods appear necessary for solving the constrained inner optimization problem. A novel reformulation of the constraint, which allows for an efficient solution by nonlinear optimization algorithms, is derived in this paper so that a practicable implementation of REMOS for logmelspec features becomes possible. An in-depth analysis of this REMOS implementation investigates the statistical properties of its reverberation estimates and thus derives possibilities for further improving the performance of REMOS. Connected digit recognition experiments show that the proposed REMOS version in the logmelspec domain significantly outperforms the melspec version.
While the proposed RMs with parameters estimated by straightforward training for a given room are robust to a mismatch of the speaker-microphone distance, their performance significantly decreases if they are used in a room with substantially different conditions. However, by training multi-style RMs with data from several rooms, good performance can be achieved across different rooms.
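The constrained inner optimization described above can be illustrated for a single logmelspec bin. The sketch below is a hypothetical toy version: Gaussian densities stand in for the HMM and RM output densities, and a brute-force grid search stands in for the paper's nonlinear optimizer; all names and parameter values are illustrative, not from the paper.

```python
import math

def remos_inner(y, mu_s, var_s, mu_r, var_r, n_grid=2000):
    """Toy 1-D sketch of the constrained inner optimization in REMOS.

    For one logmelspec bin, maximize the joint Gaussian log-density of the
    clean-speech value s and the reverberation value r subject to the
    nonlinear combination constraint log(exp(s) + exp(r)) == y.
    Substituting r = log(exp(y) - exp(s)) removes the constraint and
    leaves a 1-D search over s < y, done here by grid search.
    """
    best_s, best_val = None, -math.inf
    for k in range(1, n_grid):
        s = y - 20.0 * k / n_grid                    # scan s over (y - 20, y)
        r = math.log(math.exp(y) - math.exp(s))      # enforce the constraint
        val = (-(s - mu_s) ** 2 / (2.0 * var_s)      # log N(s; mu_s, var_s)
               - (r - mu_r) ** 2 / (2.0 * var_r))    # + log N(r; mu_r, var_r)
        if val > best_val:
            best_s, best_val = s, val
    return best_s, math.log(math.exp(y) - math.exp(best_s))
```

By construction, the returned pair always satisfies the combination constraint exactly, which is the point of the reformulation.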


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

On the application of reverberation suppression to robust speech recognition

Roland Maas; Emanuel A. P. Habets; Armin Sehr; Walter Kellermann

In this paper, we study the effect of the design parameters of a single-channel reverberation suppression algorithm on reverberation-robust speech recognition. At the same time, reverberation compensation at the speech recognizer is investigated. The analysis reveals that it is highly beneficial to attenuate only the reverberation tail after approximately 50 ms while coping with the early reflections and residual late reverberation by training the recognizer on moderately reverberant data. It will be shown that the overall system at its optimum configuration yields a very promising recognition performance even in strongly reverberant environments. Since the reverberation suppression algorithm is evidenced to significantly reduce the dependency on the training data, it allows for a very efficient training of acoustic models that are suitable for a wide range of reverberation conditions. Finally, experiments with an “ideal” reverberation suppression algorithm are carried out to cross-check the inferred guidelines.
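The idea of attenuating only the late tail can be sketched with a Lebart-style spectral-subtraction gain. This is not the paper's algorithm; it is a minimal per-bin illustration assuming Polack's exponential-decay room model, with illustrative T60, delay, and floor values.

```python
import math

def late_reverb_gain(obs_psd, past_psd, t60=0.6, delay=0.05, floor=0.1):
    """Spectral gain that attenuates only the late reverberation tail.

    The late-reverb power in the current bin is predicted from the power
    observed `delay` seconds earlier, scaled by the exponential energy
    decay exp(-13.8 * delay / t60) of Polack's statistical room model;
    early reflections within `delay` (here 50 ms) are left untouched.
    """
    decay = math.exp(-13.8 * delay / t60)   # energy decay over `delay` seconds
    late_psd = decay * past_psd             # predicted late-reverberation power
    gain = 1.0 - late_psd / obs_psd         # spectral-subtraction gain
    return max(gain, floor)                 # flooring limits musical noise
```

Applying this gain per time-frequency bin suppresses the tail while the recognizer, trained on moderately reverberant data, absorbs the early reflections.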


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Spatial diffuseness features for DNN-based speech recognition in noisy and reverberant environments

Andreas Schwarz; Christian Huemmer; Roland Maas; Walter Kellermann

We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, both compared to logmelspec features extracted from noisy signals, and features enhanced by spectral subtraction.
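A simple way to see how diffuseness can be read off a coherence estimate is the following sketch. It assumes a mix of one fully coherent source and an ideal diffuse field between two omnidirectional microphones; the exact estimator of the paper differs, and all parameter values here are illustrative.

```python
import math

def diffuse_coherence(freq, mic_dist, c=343.0):
    """Spatial coherence of an ideal diffuse field between two
    omnidirectional microphones: sinc(2 * f * d / c)."""
    x = 2.0 * freq * mic_dist / c
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def diffuseness(coh_meas, coh_diff, eps=1e-9):
    """Per-bin diffuseness from a measured (real-valued) coherence estimate.

    Model: coh_meas = (cdr + coh_diff) / (cdr + 1), where the coherent
    source has coherence 1 and cdr is the coherent-to-diffuse ratio.
    Solving for cdr gives the diffuseness 1 / (1 + cdr) in [0, 1].
    """
    coh_meas = min(max(coh_meas, coh_diff), 1.0 - eps)   # clamp to valid range
    cdr = max((coh_diff - coh_meas) / (coh_meas - 1.0), 0.0)
    return 1.0 / (1.0 + cdr)
```

A bin whose coherence matches the diffuse-field value maps to diffuseness 1, a fully coherent bin to roughly 0, giving the DNN a cue about how noise-dominated each bin is.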


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

The elitist particle filter based on evolutionary strategies as novel approach for nonlinear acoustic echo cancellation

Christian Huemmer; Christian Hofmann; Roland Maas; Andreas Schwarz; Walter Kellermann

In this article, we introduce a novel approach for nonlinear acoustic echo cancellation based on a combination of particle filtering and evolutionary strategies. The nonlinear echo path is modeled as a state vector with non-Gaussian probability distribution, and its relations to the observed signals and near-end interferences are captured by nonlinear functions. To estimate the probability distribution of the state vector and the model parameters, we apply the numerical sampling method of particle filtering, where each set of particles represents different realizations of the nonlinear echo path. While the classical particle-filter approach is unsuitable for system identification with large search spaces, we introduce a modified particle filter to select elitist particles based on long-term fitness measures and to create new particles based on the approximated probability distribution of the state vector. The validity of the novel approach is experimentally verified with real recordings for a nonlinear echo path stemming from a commercial smartphone.
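The elitist-selection idea can be sketched on a toy nonlinear identification problem. Everything below is a hypothetical stand-in: a memoryless cubic model replaces the smartphone echo path, and a simple evolutionary loop replaces the full particle-filter state-space machinery.

```python
import random

def elitist_particle_search(xs, ys, iters=40, n=150, n_elite=15, sigma=0.5, seed=1):
    """Toy sketch of elitist particle selection for nonlinear identification.

    Each particle is a candidate nonlinear echo path (a, b) for the
    hypothetical memoryless model y = a*x + b*x**3. Particles are ranked
    by mean squared error over the whole record (a long-term fitness
    measure), the n_elite best survive unchanged, and the population is
    refilled by Gaussian mutations around randomly chosen elites with a
    shrinking step, as in an evolutionary strategy.
    """
    rng = random.Random(seed)

    def fitness(p):
        a, b = p
        return sum((y - (a * x + b * x ** 3)) ** 2 for x, y in zip(xs, ys)) / len(xs)

    parts = [(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(n)]
    for _ in range(iters):
        parts.sort(key=fitness)
        elites = parts[:n_elite]
        parts = elites[:]                    # elites survive unchanged
        while len(parts) < n:
            a, b = rng.choice(elites)
            parts.append((a + rng.gauss(0.0, sigma), b + rng.gauss(0.0, sigma)))
        sigma *= 0.8                         # anneal the mutation step
    return min(parts, key=fitness)
```

Because the elites survive each generation unchanged, the best fitness found never degrades, which is what makes the scheme usable in large search spaces.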


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Frame-wise HMM adaptation using state-dependent reverberation estimates

Armin Sehr; Roland Maas; Walter Kellermann

A novel frame-wise model adaptation approach for reverberation-robust distant-talking speech recognition is proposed. It adjusts the means of static cepstral features to capture the statistics of reverberant feature vector sequences obtained from distant-talking speech recordings. The means of the HMMs are adapted during decoding using a state-dependent estimate of the late reverberation determined by joint use of a feature-domain reverberation model and optimum partial state sequences. Since the parameters of the HMMs and the reverberation model can be estimated completely independently, the approach is very flexible with respect to changing acoustic environments. Due to the frame-wise model adaptation, some of the HMM limitations are relieved, and recognition results surpassing those of matched reverberant training are obtained at the cost of a moderately increased decoding complexity.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

A new uncertainty decoding scheme for DNN-HMM hybrid systems with multichannel speech enhancement

Christian Huemmer; Andreas Schwarz; Roland Maas; Hendrik Barfuss; Ramón Fernández Astudillo; Walter Kellermann

Uncertainty decoding combines a probabilistic feature description with the acoustic model of a speech recognition system. For DNN-HMM hybrid systems, this can be realized by averaging the DNN outputs produced by a finite set of feature samples (drawn from an estimated probability distribution). In this article, we employ this sampling approach in combination with a multi-microphone speech enhancement system. We propose a new strategy for generating feature samples from multichannel signals, based on modeling the spatial coherence estimates between different microphone pairs as realizations of a latent random variable. From each coherence estimate, a spectral enhancement gain is computed and an enhanced feature vector is obtained, thus producing a finite set of feature samples whose respective DNN outputs are then averaged. In the experimental part, this new uncertainty decoding strategy is shown to consistently improve the recognition accuracy of a DNN-HMM hybrid system for the 8-channel REVERB Challenge task.
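The sampling-and-averaging step can be sketched as follows. Note the simplification: the paper draws samples via latent spatial-coherence realizations, whereas this sketch assumes a plain Gaussian feature distribution, and `dnn` is any callable mapping a feature vector to a posterior vector.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sampled_posteriors(dnn, mean, std, n_samples=100, seed=0):
    """Sampling-based uncertainty decoding sketch: instead of scoring only
    the point estimate `mean`, draw feature samples from an assumed
    Gaussian (mean, std) and average the DNN state posteriors."""
    rng = random.Random(seed)
    acc = None
    for _ in range(n_samples):
        x = [m + rng.gauss(0.0, s) for m, s in zip(mean, std)]
        p = dnn(x)
        acc = p if acc is None else [a + v for a, v in zip(acc, p)]
    return [a / n_samples for a in acc]
```

Averaging posteriors rather than features lets the nonlinearity of the DNN act on each sample before the uncertainty is marginalized out.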


IEEE Signal Processing Letters | 2015

The NLMS Algorithm with Time-Variant Optimum Stepsize Derived from a Bayesian Network Perspective

Christian Huemmer; Roland Maas; Walter Kellermann

In this letter, we derive a new stepsize adaptation for the normalized least mean square algorithm (NLMS) by describing the task of linear acoustic echo cancellation from a Bayesian network perspective. Similar to the well-known Kalman filter equations, we model the acoustic wave propagation from the loudspeaker to the microphone by a latent state vector and define a linear observation equation (to model the relation between the state vector and the observation) as well as a linear process equation (to model the temporal progress of the state vector). Based on additional assumptions on the statistics of the random variables in the observation and process equations, we apply the expectation-maximization (EM) algorithm to derive an NLMS-like filter adaptation. By exploiting the conditional independence rules for Bayesian networks, we reveal that the resulting EM-NLMS algorithm has a stepsize update equivalent to the optimal-stepsize calculation proposed by Yamamoto and Kitayama in 1982, which has been adopted in many textbooks. As the main difference, the instantaneous stepsize value is estimated in the M step of the EM algorithm (instead of being approximated by artificially extending the acoustic echo path). The EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals.


Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) | 2011

Multi-style training of HMMs with stereo data for reverberation-robust speech recognition

Armin Sehr; Christian Hofmann; Roland Maas; Walter Kellermann

A novel training algorithm using data pairs of clean and reverberant feature vectors for estimating robust Hidden Markov Models (HMMs), introduced in [1] for matched training, is employed in this paper for multi-style training. The multi-style HMMs are derived from well-trained clean-speech HMMs by aligning the clean data to the clean-speech HMM and using the resulting state-frame alignment to estimate the Gaussian mixture densities from the reverberant data of several different rooms. Thus, the temporal alignment is fixed for all reverberation conditions contained in the multi-style training set so that the model mismatch between the different rooms is reduced. Therefore, this training approach is particularly suitable for multi-style training. Multi-style HMMs trained by the proposed approach and adapted to the current room condition using maximum likelihood linear regression significantly outperform the corresponding adapted multi-style HMMs trained by the conventional Baum-Welch algorithm. In strongly reverberant rooms, the proposed adapted multi-style HMMs even outperform Baum-Welch HMMs trained on matched data.
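The core of the stereo-data idea, reusing one clean alignment to pool parallel reverberant frames from several rooms, can be sketched minimally. Single per-state means stand in for the paper's Gaussian mixture densities; the function and variable names are hypothetical.

```python
def multistyle_state_means(clean_align, reverb_sets):
    """Sketch of multi-style estimation with a fixed clean alignment.

    clean_align[n] is the HMM state assigned to frame n by aligning the
    clean utterance; reverb_sets holds the parallel reverberant feature
    sequences, one per room. All rooms share the same temporal
    segmentation, so per-state statistics are pooled across rooms.
    """
    sums, counts = {}, {}
    for reverb in reverb_sets:                       # one sequence per room
        for state, frame in zip(clean_align, reverb):
            acc = sums.setdefault(state, [0.0] * len(frame))
            for i, v in enumerate(frame):
                acc[i] += v
            counts[state] = counts.get(state, 0) + 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}
```

Fixing the alignment on clean data is what removes the room-dependent segmentation differences that ordinary Baum-Welch re-estimation would introduce.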

Collaboration


Dive into Roland Maas's collaborations.

Top Co-Authors

Walter Kellermann, University of Erlangen-Nuremberg
Armin Sehr, University of Erlangen-Nuremberg
Christian Huemmer, University of Erlangen-Nuremberg
Christian Hofmann, University of Erlangen-Nuremberg
Andreas Schwarz, University of Erlangen-Nuremberg
Keisuke Kinoshita, Nippon Telegraph and Telephone
Marc Delcroix, Nippon Telegraph and Telephone
Tomohiro Nakatani, Nippon Telegraph and Telephone
Ariya Rastrow, Johns Hopkins University