Publication


Featured research published by Marc René Schädler.


Journal of the Acoustical Society of America | 2012

Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition

Marc René Schädler; Bernd T. Meyer; Birger Kollmeier

In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a feature extraction scheme is proposed that takes spectro-temporal modulation frequencies (MF) into account. This physiologically inspired approach uses a two-dimensional filter bank based on Gabor filters, which limits the redundant information between feature components, and also results in physically interpretable features. Robustness against extrinsic variation (different types of additive noise) and intrinsic variability (arising from changes in speaking rate, effort, and style) is quantified in a series of recognition experiments. The results are compared to reference ASR systems using Mel-frequency cepstral coefficients (MFCCs), MFCCs with cepstral mean subtraction (CMS) and RASTA-PLP features, respectively. Gabor features are shown to be more robust against extrinsic variation than the baseline systems without CMS, with relative improvements of 28% and 16% for two training conditions (using only clean training samples or a mixture of noisy and clean utterances, respectively). When used in a state-of-the-art system, improvements of 14% are observed when spectro-temporal features are concatenated with MFCCs, indicating the complementarity of those feature types. An analysis of the importance of specific MF shows that temporal MF up to 25 Hz and spectral MF up to 0.25 cycles/channel are beneficial for ASR.
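
A minimal sketch of the kind of two-dimensional Gabor filter described above, applied to a log-Mel spectrogram. The window sizes, modulation frequencies, and the placeholder spectrogram are illustrative assumptions, not the published GBFB parameters.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_2d(omega_t, omega_k, size_t=69, size_k=23):
    """Complex 2D Gabor filter: a separable Hann envelope times a plane-wave
    carrier with temporal (omega_t) and spectral (omega_k) modulation
    frequencies given in radians per frame / per channel."""
    t = np.arange(size_t) - size_t // 2          # time axis (frames)
    k = np.arange(size_k) - size_k // 2          # frequency axis (channels)
    envelope = np.outer(np.hanning(size_k), np.hanning(size_t))
    carrier = np.exp(1j * (omega_k * k[:, None] + omega_t * t[None, :]))
    return envelope * carrier

# Apply the real part of one filter to a (channels x frames) log-Mel
# spectrogram; the spectrogram here is a random placeholder.
log_mel = np.random.randn(23, 200)
g = gabor_2d(omega_t=2 * np.pi * 0.10, omega_k=2 * np.pi * 0.10)
feature_map = convolve2d(log_mel, np.real(g), mode='same')
```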


International Journal of Audiology | 2015

Matrix sentence intelligibility prediction using an automatic speech recognition system.

Marc René Schädler; Anna Warzybok; Sabine Hochmuth; Birger Kollmeier

Objective: The feasibility of predicting the outcome of the German matrix sentence test for different types of stationary background noise using an automatic speech recognition (ASR) system was studied. Design: Speech reception thresholds (SRT) of 50% intelligibility were predicted in seven noise conditions. The ASR system used Mel-frequency cepstral coefficients as a front-end and employed whole-word hidden Markov models on the back-end side. The ASR system was trained and tested with noisy matrix sentences on a broad range of signal-to-noise ratios. Study sample: The ASR-based predictions were compared to data from the literature (Hochmuth et al., 2015) obtained with 10 native German listeners with normal hearing and to predictions of the speech intelligibility index (SII). Results: The ASR-based predictions showed a high and significant correlation (R² = 0.95, p < 0.001) with the empirical data across different noise conditions, outperforming the SII-based predictions, which showed no correlation with the empirical data (R² = 0.00, p = 0.987). Conclusions: The SRTs for the German matrix test for listeners with normal hearing in different stationary noise conditions could well be predicted based on the acoustical properties of the speech and noise signals. Only minimal assumptions were made about human speech processing beyond those already incorporated in a reference-free, ordinary ASR system.
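
The SRT of 50% intelligibility can be read off a psychometric function fitted to recognition rates measured over a range of SNRs. The sketch below illustrates that step with hypothetical recognition rates; it is not the paper's own fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, srt, slope):
    """Logistic word recognition rate as a function of SNR; 50% at snr == srt."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - srt)))

# Hypothetical recognition rates of the trained ASR system at a grid of SNRs
snrs = np.array([-21.0, -18.0, -15.0, -12.0, -9.0, -6.0, -3.0, 0.0])
rates = np.array([0.05, 0.10, 0.22, 0.45, 0.70, 0.88, 0.95, 0.98])

(srt_50, slope), _ = curve_fit(psychometric, snrs, rates, p0=[-10.0, 0.5])
print(f"Predicted SRT (50% intelligibility): {srt_50:.1f} dB SNR")
```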


Journal of the Acoustical Society of America | 2015

Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition

Marc René Schädler; Birger Kollmeier

To test if simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional-Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional-Gabor filter bank and a temporal one-dimensional-Gabor filter bank. A feature set that is extracted with these separate spectral and temporal modulation filter banks was introduced, the separate Gabor filter bank (SGBFB) features, and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and are not required to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement of the word error rate by 12.8%. Additionally, the real-time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance.
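
A sketch of the separable idea, assuming illustrative filter sizes and modulation frequencies: two cascaded one-dimensional convolutions, one along the frequency axis and one along the time axis, replace the single two-dimensional convolution of the GBFB front-end.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gabor_1d(omega, size):
    """1D complex Gabor filter: Hann envelope times a sinusoidal carrier."""
    n = np.arange(size) - size // 2
    return np.hanning(size) * np.exp(1j * omega * n)

log_mel = np.random.randn(23, 200)                   # channels x frames, placeholder

spectral = np.real(gabor_1d(2 * np.pi * 0.10, 23))   # spectral modulation filter
temporal = np.real(gabor_1d(2 * np.pi * 0.10, 69))   # temporal modulation filter

# Two cascaded 1D convolutions realize a separable spectro-temporal filter,
# replacing the 2D convolution of the GBFB front-end.
tmp = convolve1d(log_mel, spectral, axis=0, mode='nearest')
sgbfb_map = convolve1d(tmp, temporal, axis=1, mode='nearest')
```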


Journal of the Acoustical Society of America | 2016

A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception

Marc René Schädler; Anna Warzybok; Stephan D. Ewert; Birger Kollmeier

A framework for simulating auditory discrimination experiments, based on an approach from Schädler, Warzybok, Hochmuth, and Kollmeier [(2015). Int. J. Audiol. 54, 100-107], which was originally designed to predict speech recognition thresholds, is extended to also predict psychoacoustic thresholds. The proposed framework is used to assess the suitability of different auditory-inspired feature sets for a range of auditory discrimination experiments that included psychoacoustic as well as speech recognition experiments in noise. The considered experiments were 2 kHz tone-in-broadband-noise simultaneous masking depending on the tone length, spectral masking with simultaneously presented tone signals and narrow-band noise maskers, and German Matrix sentence test reception threshold in stationary and modulated noise. The employed feature sets included spectro-temporal Gabor filter bank features, Mel-frequency cepstral coefficients, logarithmically scaled Mel-spectrograms, and the internal representation of the Perception Model from Dau, Kollmeier, and Kohlrausch [(1997). J. Acoust. Soc. Am. 102(5), 2892-2905]. The proposed framework was successfully employed to simulate all experiments with a common parameter set and to obtain objective thresholds with fewer assumptions than traditional modeling approaches. Depending on the feature set, the simulated reference-free thresholds were found to agree with, and hence to predict, empirical data from the literature. Across-frequency processing was found to be crucial for accurately modeling the lower speech reception thresholds observed in modulated noise conditions compared with stationary noise conditions.
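
The sketch below illustrates, with purely synthetic recognition rates, how a reference-free threshold could be read off simulated psychometric curves by taking the lowest SRT over all training SNRs; the SNR grid and the target rate are assumptions for illustration, not the published FADE configuration.

```python
import numpy as np

# Synthetic recognition rates over (training SNR, test SNR) combinations:
# one monotone psychometric curve per training SNR.
train_snrs = np.arange(-24.0, 7.0, 3.0)
test_snrs = np.arange(-24.0, 7.0, 3.0)
rates = np.array([1.0 / (1.0 + np.exp(-0.5 * (test_snrs - (s + 6.0))))
                  for s in train_snrs])

def srt_at(rates_row, test_snrs, target=0.5):
    """Interpolate the test SNR at which the target recognition rate is
    reached (rates_row must increase with the test SNR)."""
    return np.interp(target, rates_row, test_snrs)

# The objective, reference-free outcome: the lowest SRT over all training SNRs.
threshold = min(srt_at(row, test_snrs) for row in rates)
print(f"Simulated threshold: {threshold:.1f} dB SNR")
```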


Trends in Hearing | 2016

Sentence Recognition Prediction for Hearing-Impaired Listeners in Stationary and Fluctuation Noise With FADE: Empowering the Attenuation and Distortion Concept by Plomp With a Quantitative Processing Model

Birger Kollmeier; Marc René Schädler; Anna Warzybok; Bernd T. Meyer; Thomas Brand

To characterize the individual patient’s hearing impairment as obtained with the matrix sentence recognition test, a simulation Framework for Auditory Discrimination Experiments (FADE) is extended here using the Attenuation and Distortion (A+D) approach by Plomp as a blueprint for setting the individual processing parameters. FADE has been shown to predict the outcome of both speech recognition tests and psychoacoustic experiments based on simulations using an automatic speech recognition system requiring only a few assumptions. It builds on the closed-set matrix sentence recognition test, which is advantageous for testing individual speech recognition in a way comparable across languages. Individual predictions of speech recognition thresholds in stationary and in fluctuating noise were derived using the audiogram and an estimate of the internal level uncertainty for modeling the individual Plomp curves fitted to the data with the Attenuation (A-) and Distortion (D-) parameters of the Plomp approach. The “typical” audiogram shapes from Bisgaard et al. with or without a “typical” level uncertainty and the individual data were used for individual predictions. As a result, the individualization of the level uncertainty was found to be more important than the exact shape of the individual audiogram to accurately model the outcome of the German Matrix test in stationary or fluctuating noise for listeners with hearing impairment. The prediction accuracy of the individualized approach also outperforms the (modified) Speech Intelligibility Index approach which is based on the individual threshold data only.
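
One common formulation of Plomp's attenuation-and-distortion idea is sketched below as a worked example; the exact equation and the reference values are assumptions for illustration, not the parameterization used in the paper.

```python
import numpy as np

def plomp_srt(noise_level, A, D, srt_quiet_normal=20.0, snr_normal=-7.0):
    """Assumed formulation of Plomp's A+D model: the SRT in quiet is elevated
    by A + D, the SRT in sufficiently loud noise by D only, and the two
    branches are combined by power addition (all values in dB)."""
    quiet_term = srt_quiet_normal + A + D
    noise_term = noise_level + snr_normal + D
    return 10.0 * np.log10(10.0 ** (quiet_term / 10.0) + 10.0 ** (noise_term / 10.0))

# Attenuation-dominated vs. distortion-dominated hearing loss in 65 dB noise
for A, D in [(30.0, 0.0), (0.0, 10.0)]:
    print(f"A={A:4.1f} dB, D={D:4.1f} dB -> SRT: {plomp_srt(65.0, A, D):.1f} dB")
```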


Conference of the International Speech Communication Association | 2016

Why do ASR Systems Despite Neural Nets Still Depend on Robust Features.

Angel Mario Castro Martinez; Marc René Schädler

To what extent can neural nets learn the traditional signal processing stages of current robust ASR front-ends? Will neural nets replace the classical, often auditory-inspired feature extraction in the near future? To answer these questions, a DNN-based ASR system was trained and tested on the Aurora4 robust ASR task using various (intermediate) processing stages. Additionally, the training set was divided into several fractions to reveal the amount of data needed to compensate for a missing processing step on the input signal or for missing prior knowledge about the auditory system. The DNN system was able to learn from ordinary spectrogram representations, outperforming MFCCs with 75% of the training set and performing almost as well as log-Mel-spectrograms with the full set; on the other hand, it was unable to match the robustness of auditory-based Gabor features, which outperformed every other representation even when only 40% of the training data was used. The study concludes that even with deep learning approaches, current ASR systems still benefit from suitable feature extraction.
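
The experimental design (same DNN back-end, different front-ends, shrinking training fractions) can be illustrated with the skeleton below; the features are random placeholders and the classifier is a generic scikit-learn MLP, not the Aurora4 setup described above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 2000, 500, 10
frontends = {"mfcc": 13, "log_mel": 40, "gabor": 311}   # illustrative dimensionalities

for name, dim in frontends.items():
    # Random placeholders stand in for features computed on the same utterances.
    X_train = rng.normal(size=(n_train, dim))
    y_train = rng.integers(0, n_classes, n_train)
    X_test = rng.normal(size=(n_test, dim))
    y_test = rng.integers(0, n_classes, n_test)
    for fraction in (0.4, 0.75, 1.0):                   # reduced training sets
        n = int(fraction * n_train)
        clf = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=50)
        clf.fit(X_train[:n], y_train[:n])
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"{name:8s} {int(fraction * 100):3d}% of data: accuracy {acc:.2f}")
```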


Conference of the International Speech Communication Association | 2016

Microscopic Multilingual Matrix Test Predictions Using an ASR-Based Speech Recognition Model.

Marc René Schädler; David Hülsmeier; Anna Warzybok; Sabine Hochmuth; Birger Kollmeier

In an attempt to predict the outcomes of matrix sentence tests in different languages and various noise conditions for native listeners, the simulation framework for auditory discrimination experiments (FADE) and the extended Speech Intelligibility Index (eSII) are employed. FADE uses an automatic speech recognition system to simulate recognition experiments and reports the highest achievable performance as the outcome, which showed good predictions for the German matrix test in noise. The eSII is based on the short-time analysis of weighted signal-to-noise ratios in different frequency bands. In contrast to many other approaches, including the eSII, FADE uses no empirical reference. In this work, the FADE approach is evaluated for predictions of the German, Polish, Russian, and Spanish matrix test in stationary and fluctuating noise conditions. The FADE-based predictions yield a high correlation (Pearson's R = 0.94) with the empirical data and a root-mean-square (RMS) prediction error of 1.9 dB, outperforming the eSII-based predictions (R = 0.78, RMS = 4.2 dB). FADE can also predict the data of subgroups with only stationary or only fluctuating noises, while the eSII cannot. The FADE-based predictions seem to generalize over different languages and noise conditions.
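
The two reported figures of merit can be computed as below; the SRT values are hypothetical stand-ins for the predicted and empirical data.

```python
import numpy as np

# Hypothetical predicted vs. empirical SRTs (dB SNR) across test conditions
predicted = np.array([-7.1, -6.3, -9.8, -12.4, -16.0, -8.9])
empirical = np.array([-7.0, -6.8, -9.1, -13.0, -15.2, -9.5])

r = np.corrcoef(predicted, empirical)[0, 1]             # Pearson's R
rmse = np.sqrt(np.mean((predicted - empirical) ** 2))   # RMS prediction error in dB
print(f"R = {r:.2f}, RMS error = {rmse:.1f} dB")
```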


Trends in Hearing | 2018

Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms

Marc René Schädler; Anna Warzybok; Birger Kollmeier

The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than −20 dB could not be predicted.
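
A minimal sketch of a "better ear" stage, under the assumption that for each frequency channel the ear with the higher long-term SNR is selected; the stage actually used in the paper may differ.

```python
import numpy as np

def better_ear(left_spec, right_spec, left_noise, right_noise):
    """Assumed better-ear stage: for each frequency channel, keep the ear with
    the higher long-term SNR. Inputs are magnitude spectrograms of shape
    (channels, frames)."""
    snr_left = left_spec.mean(axis=1) / (left_noise.mean(axis=1) + 1e-12)
    snr_right = right_spec.mean(axis=1) / (right_noise.mean(axis=1) + 1e-12)
    take_left = (snr_left >= snr_right)[:, None]
    return np.where(take_left, left_spec, right_spec)
```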


Journal of the Acoustical Society of America | 2017

Objective evaluation of binaural noise-reduction algorithms for the hearing-impaired in complex acoustic scenes

Marc René Schädler; Anna Warzybok; Birger Kollmeier

The simulation framework for auditory discrimination experiments (FADE) was used to predict the benefit in speech reception thresholds (SRT) with the German matrix sentence test when using a range of single- and multi-channel noise-reduction algorithms in complex acoustic conditions. FADE uses a simple robust automatic speech recognizer to predict SRTs from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Here, it was extended with a simple binaural stage and individualized by taking into account the audiogram. Empirical data from the literature was used to evaluate the model in terms of predicted SRTs and benefits in SRT when using eight different noise-reduction algorithms. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for normal-hearing listeners and predicted the corresponding benefits in SRT with a root-mean-square prediction error of 0.6 dB. In contrast to the surprisingly high p...


Conference of the International Speech Communication Association | 2011

Comparing Different Flavors of Spectro-Temporal Features for ASR

Bernd T. Meyer; Suman V. Ravuri; Marc René Schädler; Nelson Morgan

Collaboration


Dive into Marc René Schädler's collaboration.

Top Co-Authors
Simon Doclo

University of Oldenburg

Thomas Brand

University of Oldenburg
