Publication


Featured research published by Christian Huemmer.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Spatial diffuseness features for DNN-based speech recognition in noisy and reverberant environments

Andreas Schwarz; Christian Huemmer; Roland Maas; Walter Kellermann

We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, both compared to logmelspec features extracted from noisy signals, and features enhanced by spectral subtraction.


International Conference on Acoustics, Speech, and Signal Processing | 2014

The elitist particle filter based on evolutionary strategies as novel approach for nonlinear acoustic echo cancellation

Christian Huemmer; Christian Hofmann; Roland Maas; Andreas Schwarz; Walter Kellermann

In this article, we introduce a novel approach for nonlinear acoustic echo cancellation based on a combination of particle filtering and evolutionary strategies. The nonlinear echo path is modeled as a state vector with non-Gaussian probability distribution, and the relation to the observed signals and near-end interferences is captured by nonlinear functions. To estimate the probability distribution of the state vector and the model parameters, we apply the numerical sampling method of particle filtering, where each set of particles represents different realizations of the nonlinear echo path. While the classical particle-filter approach is unsuitable for system identification with large search spaces, we introduce a modified particle filter to select elitist particles based on long-term fitness measures and to create new particles based on the approximated probability distribution of the state vector. The validity of the novel approach is experimentally verified with real recordings for a nonlinear echo path stemming from a commercial smartphone.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Significance-aware Hammerstein group models for nonlinear acoustic echo cancellation

Christian Hofmann; Christian Huemmer; Walter Kellermann

In this work, a novel approach for nonlinear acoustic echo cancellation is proposed. The main innovative idea of the proposed method is to model only the small region of the echo path around the direct path by a group of parallel Hammerstein models, to estimate a nonlinear preprocessor by correlations between the linear kernels of the Hammerstein submodels, and to describe the remaining echo path by a simple Hammerstein model with the preprocessor determined in the aforementioned way. While the computational complexity of such a system increases only slightly in comparison to a linear echo canceller, experiments with speech recordings from a smartphone in different environments confirm a significantly increased echo cancellation performance.


Workshop on Applications of Signal Processing to Audio and Acoustics | 2015

HRTF-based robust least-squares frequency-invariant beamforming

Hendrik Barfuss; Christian Huemmer; Gleni Lamani; Andreas Schwarz; Walter Kellermann

In this work, a Head-Related Transfer Function (HRTF)-based Robust Least-Squares Frequency-Invariant (RLSFI) beamformer design is proposed. The HRTF-based RLSFI beamformer accounts for the influence of a robot's head on the sound field. The performance of the new HRTF-based RLSFI beamformer is evaluated using signal-based measures and word error rates for an off-the-shelf speech recognizer, and is compared to the performance of the original free-field RLSFI beamformer design. The experimental results confirm the efficacy of the proposed HRTF-based beamformer design for robot audition.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Wave-domain loudspeaker signal decorrelation for system identification in multichannel audio reproduction scenarios

Martin Schneider; Christian Huemmer; Walter Kellermann

For applications like acoustic echo cancellation (AEC) or listening room equalization (LRE), a loudspeaker-enclosure-microphone system (LEMS) must be identified. When using a large number of reproduction channels, as, e.g., for wave field synthesis (WFS) or Higher-Order Ambisonics (HOA), the strong correlation of the loudspeaker signals will hamper a unique identification. A state-of-the-art remedy against this so-called nonuniqueness problem is a decorrelation of the loudspeaker signals, which facilitates a unique identification. However, most of the known approaches are not suitable for acoustic wave field reproduction schemes, as they would distort the reproduced wave field in an uncontrolled manner or degrade the audio quality. In this contribution, we propose a wave-domain time-varying filtering of the loudspeaker signals, so that the reproduced wave field is rotated within a perceptually acceptable range, while preserving its shape.


International Conference on Acoustics, Speech, and Signal Processing | 2016

A new uncertainty decoding scheme for DNN-HMM hybrid systems with multichannel speech enhancement

Christian Huemmer; Andreas Schwarz; Roland Maas; Hendrik Barfuss; Ramón Fernández Astudillo; Walter Kellermann

Uncertainty decoding combines a probabilistic feature description with the acoustic model of a speech recognition system. For DNN-HMM hybrid systems, this can be realized by averaging the DNN outputs produced by a finite set of feature samples (drawn from an estimated probability distribution). In this article, we employ this sampling approach in combination with a multi-microphone speech enhancement system. We propose a new strategy for generating feature samples from multichannel signals, based on modeling the spatial coherence estimates between different microphone pairs as realizations of a latent random variable. From each coherence estimate, a spectral enhancement gain is computed and an enhanced feature vector is obtained, thus producing a finite set of feature samples, of which we average the respective DNN outputs. In the experimental part, this new uncertainty decoding strategy is shown to consistently improve the recognition accuracy of a DNN-HMM hybrid system for the 8-channel REVERB Challenge task.
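
The averaging step described in this abstract can be sketched as follows. This is a minimal illustration only: `toy_acoustic_model` is a hypothetical stand-in for a trained DNN (a fixed linear layer followed by a softmax), and the feature samples are made-up numbers rather than outputs of a coherence-based enhancement system.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def toy_acoustic_model(feature):
    """Hypothetical stand-in for a DNN: fixed linear layer plus softmax."""
    weights = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]  # assumed values
    logits = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    return softmax(logits)


def averaged_posteriors(feature_samples, model):
    """Average the model outputs over a finite set of feature samples."""
    outputs = [model(sample) for sample in feature_samples]
    count = len(outputs)
    num_classes = len(outputs[0])
    return [sum(out[k] for out in outputs) / count for k in range(num_classes)]


# Three assumed feature samples for one frame (e.g. one per microphone pair).
samples = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.3]]
posterior = averaged_posteriors(samples, toy_acoustic_model)
```

Because each individual output is a probability vector, their average is again a valid probability vector and can be passed to the HMM decoder in place of a single DNN output.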


IEEE Signal Processing Letters | 2015

The NLMS Algorithm with Time-Variant Optimum Stepsize Derived from a Bayesian Network Perspective

Christian Huemmer; Roland Maas; Walter Kellermann

In this letter, we derive a new stepsize adaptation for the normalized least mean square (NLMS) algorithm by describing the task of linear acoustic echo cancellation from a Bayesian network perspective. Similar to the well-known Kalman filter equations, we model the acoustic wave propagation from the loudspeaker to the microphone by a latent state vector and define a linear observation equation (to model the relation between the state vector and the observation) as well as a linear process equation (to model the temporal progress of the state vector). Based on additional assumptions on the statistics of the random variables in the observation and process equations, we apply the expectation-maximization (EM) algorithm to derive an NLMS-like filter adaptation. By exploiting the conditional independence rules for Bayesian networks, we reveal that the resulting EM-NLMS algorithm has a stepsize update equivalent to the optimal-stepsize calculation proposed by Yamamoto and Kitayama in 1982, which has been adopted in many textbooks. The main difference is that the instantaneous stepsize value is estimated in the M step of the EM algorithm (instead of being approximated by artificially extending the acoustic echo path). The EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals.
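
For orientation, the textbook NLMS update that the letter builds on can be sketched as below. This is the standard algorithm with a fixed stepsize `mu`, not the EM-NLMS stepsize estimation derived in the letter; the echo path `h_true`, the signal length, and the parameter values are assumptions chosen for a noise-free toy example.

```python
import random


def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Standard NLMS system identification with a fixed stepsize mu."""
    w = [0.0] * num_taps    # filter estimate
    buf = [0.0] * num_taps  # input buffer, most recent sample first
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(wi * xi for wi, xi in zip(w, buf))  # echo estimate
        e_n = d_n - y_n                               # error signal
        norm = sum(xi * xi for xi in buf) + eps       # input energy
        w = [wi + mu * e_n * xi / norm for wi, xi in zip(w, buf)]
    return w


random.seed(0)
h_true = [0.5, -0.3, 0.1]  # assumed toy echo path
x = [random.gauss(0.0, 1.0) for _ in range(2000)]  # white-noise input
d = [
    sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
    for n in range(len(x))
]
w_hat = nlms_identify(x, d, num_taps=3)
```

In this noise-free setting the estimate `w_hat` converges to `h_true`. In practice the choice of `mu` trades convergence speed against robustness to near-end noise, which is the motivation for a time-variant stepsize.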


European Signal Processing Conference | 2016

Efficient nonlinear acoustic echo cancellation by partitioned-block Significance-Aware Hammerstein Group Models

Christian Hofmann; Michael Guenther; Christian Huemmer; Walter Kellermann

A powerful and efficient model for nonlinear echo paths of hands-free communication systems is given by the recently proposed Significance-Aware Hammerstein Group Model (SA-HGM). Such a model learns memoryless loudspeaker nonlinearities on a small temporal support of the echo path (preferably the direct-sound region) and extrapolates the nonlinearities for the entire echo path afterwards. In this contribution, an efficient frequency-domain realization of the significance-aware concept for nonlinear acoustic echo cancellation is proposed. The proposed method exploits the benefits of partitioned-block frequency-domain adaptive filtering and will therefore be referred to as Partitioned-Block Significance-Aware Hammerstein Group Model (PBSA-HGM). This allows a long nonlinear echo path to be modeled efficiently by a linear partitioned-block frequency-domain adaptive filter after a parametric memoryless nonlinear preprocessor, the parameters of which are estimated via a nonlinear Hammerstein Group Model (HGM) with the short temporal support of a single block only.


IEEE Global Conference on Signal and Information Processing | 2014

The significance-aware EPFES to estimate a memoryless preprocessor for nonlinear acoustic echo cancellation

Christian Huemmer; Christian Hofmann; Roland Maas; Walter Kellermann

In this article, we introduce a novel approach for estimating the coefficients of a memoryless preprocessor for nonlinear acoustic echo cancellation (NL-AEC) using particle filtering. The acoustic echo path is modeled by a nonlinear-linear cascade of a memoryless preprocessor (to model the loudspeaker nonlinearities) preceding a linear finite impulse response filter (estimated by the normalized least mean square algorithm). For identifying the loudspeaker signal distortions, we follow the concept of significance-aware filtering by modeling the time-variant coefficients of the memoryless preprocessor and the direct-path part of the room impulse response vector as one state vector with non-Gaussian probability distribution. Due to the nonlinear relation between the state vector and the observation, we propose a computationally-efficient realization of the recently published elitist particle filter based on evolutionary strategies (EPFES), which evaluates realizations of the state vector based on long-term fitness measures. The experimental validation comprises predefined loudspeaker signal distortions as well as real recordings stemming from a commercial smartphone. In comparison to the well-known Hammerstein group model for NL-AEC, the computational complexity is reduced and the achievable system identification is improved for both scenarios.
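
The nonlinear-linear cascade described in this abstract can be illustrated with a small forward model. This is a sketch under assumptions: the abstract does not state the form of the preprocessor, so an odd-order polynomial is used here as one common choice, and all coefficient values are made up. Only the forward echo model is shown, not the EPFES or NLMS estimation.

```python
def memoryless_preprocessor(sample, coeffs):
    """Assumed polynomial nonlinearity: sum over p of coeffs[p] * sample**(p + 1)."""
    return sum(a * sample ** (p + 1) for p, a in enumerate(coeffs))


def hammerstein_echo(loudspeaker_signal, coeffs, fir):
    """Memoryless preprocessor followed by a linear FIR filter."""
    distorted = [memoryless_preprocessor(s, coeffs) for s in loudspeaker_signal]
    echo = []
    for n in range(len(distorted)):
        # Linear convolution with the FIR coefficients (zero initial state).
        echo.append(
            sum(h * distorted[n - k] for k, h in enumerate(fir) if n - k >= 0)
        )
    return echo


# Assumed toy values: mild cubic distortion and a two-tap impulse response.
echo = hammerstein_echo([1.0, 0.5, -1.0], coeffs=[1.0, 0.0, -0.1], fir=[0.5, 0.25])
```

Since the preprocessor has no memory, only a few coefficients describe the loudspeaker distortion, while the FIR filter carries the full length of the room impulse response.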


Computer Speech & Language | 2017

Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments

Hendrik Barfuss; Christian Huemmer; Andreas Schwarz; Walter Kellermann

Speech recognition in adverse real-world environments is highly affected by reverberation and nonstationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to remove diffuse interference components from the latter, is an effective means to further improve the recognition accuracy of modern deep learning speech recognition systems. To this end, the recently updated 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) baseline speech recognition system is extended by a coherence-based postfilter and the postfilter's impact on the word error rates is investigated for the noisy environments provided by CHiME-3. To determine the time- and frequency-dependent postfilter gains, we use a Direction-of-Arrival (DOA)-dependent and a DOA-independent estimator of the coherent-to-diffuse power ratio as an approximation of the short-time signal-to-noise ratio. Our experiments show that incorporating coherence-based postfiltering into the CHiME-3 baseline speech recognition system leads to a significant reduction of the word error rate scores for the noisy and reverberant environments provided as part of CHiME-3.
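
To illustrate how a coherent-to-diffuse power ratio (CDR) estimate can be turned into a postfilter gain when it is treated as a short-time signal-to-noise ratio, the sketch below uses a Wiener-type gain with a lower bound. The gain rule, the floor value, and the CDR values are assumptions for illustration; the abstract does not specify which gain function the article uses.

```python
def postfilter_gain(cdr, gain_floor=0.1):
    """Wiener-type gain that treats the CDR as an SNR estimate (assumed rule)."""
    cdr = max(cdr, 0.0)  # guard against negative estimates
    return max(cdr / (cdr + 1.0), gain_floor)


# Assumed CDR estimates for three time-frequency bins (linear scale).
cdr_per_bin = [0.0, 1.0, 9.0]
gains = [postfilter_gain(c) for c in cdr_per_bin]
```

Bins dominated by diffuse components (low CDR) are attenuated down to the floor, while bins dominated by the coherent target signal (high CDR) pass almost unchanged.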

Collaboration


Dive into Christian Huemmer's collaboration.

Top Co-Authors

Walter Kellermann

University of Erlangen-Nuremberg

Roland Maas

University of Erlangen-Nuremberg

Christian Hofmann

University of Erlangen-Nuremberg

Andreas Schwarz

University of Erlangen-Nuremberg

Hendrik Barfuss

University of Erlangen-Nuremberg

Marc Delcroix

Nippon Telegraph and Telephone

Tomohiro Nakatani

Nippon Telegraph and Telephone

Keisuke Kinoshita

Nippon Telegraph and Telephone
