Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hendrik Barfuss is active.

Publications


Featured research published by Hendrik Barfuss.


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2013

Geometrically Constrained TRINICON-based relative transfer function estimation in underdetermined scenarios

Klaus Reindl; Shmulik Markovich-Golan; Hendrik Barfuss; Sharon Gannot; Walter Kellermann

Speech extraction in a reverberant enclosure using a linearly-constrained minimum variance (LCMV) beamformer usually requires reliable estimates of the relative transfer functions (RTFs) of the desired source to all microphones. In this contribution, a geometrically constrained (GC)-TRINICON concept for RTF estimation is proposed. This approach is applicable in challenging multiple-speaker scenarios and in underdetermined situations, where the number of simultaneously active sources exceeds the number of available microphone signals. As its most practically relevant and distinctive feature, this concept does not require any voice-activity-based control mechanism; it only requires coarse reference information on the target direction of arrival (DoA). The proposed GC-TRINICON method is compared to a recently proposed subspace method for RTF estimation relying on voice-activity control. Experimental results confirm the effectiveness of GC-TRINICON in realistic conditions.
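The role of the RTFs in such a beamformer can be sketched per frequency bin with the textbook MVDR (single-constraint LCMV) solution. This is a generic construction, not the paper's TRINICON-based estimator, and the RTF vector and noise covariance below are toy values:

```python
import numpy as np

def mvdr_weights(rtf, noise_cov):
    """MVDR (single-constraint LCMV) weights for one frequency bin:
    distortionless response toward the RTF of the desired source,
    minimum output power otherwise."""
    num = np.linalg.solve(noise_cov, rtf)
    return num / (rtf.conj() @ num)

# Toy values: 3 microphones, spatially white noise, arbitrary RTF.
rtf = np.array([1.0, 0.8 + 0.2j, 0.6 - 0.3j])
w = mvdr_weights(rtf, np.eye(3))
# The distortionless constraint w^H a = 1 holds by construction.
```

This makes clear why the beamformer stands or falls with the RTF estimates: a biased RTF shifts the distortionless constraint away from the true source.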


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2015

HRTF-based robust least-squares frequency-invariant beamforming

Hendrik Barfuss; Christian Huemmer; Gleni Lamani; Andreas Schwarz; Walter Kellermann

In this work, a Head-Related Transfer Function (HRTF)-based Robust Least-Squares Frequency-Invariant (RLSFI) beamformer design is proposed. The HRTF-based RLSFI beamformer accounts for the influence of a robot's head on the sound field. The performance of the new HRTF-based RLSFI beamformer is evaluated using signal-based measures and word error rates for an off-the-shelf speech recognizer, and is compared to the performance of the original free-field RLSFI beamformer design. The experimental results confirm the efficacy of the proposed HRTF-based beamformer design for robot audition.
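The least-squares design principle can be sketched as follows. This free-field sketch uses plain steering vectors where the paper substitutes measured HRTFs, all array parameters are invented toy values, and the Tikhonov term is a common way to trade fit accuracy against white-noise gain (robustness):

```python
import numpy as np

def ls_beamformer(steer, desired, reg=1e-2):
    """Regularized least-squares fit of the beamformer response over a
    grid of candidate directions to a desired pattern; the Tikhonov
    term limits the white-noise gain, i.e., makes the design robust."""
    G = steer.conj().T @ steer + reg * np.eye(steer.shape[1])
    return np.linalg.solve(G, steer.conj().T @ desired)

# Toy free-field example: 4-mic linear array, one frequency bin,
# desired frequency-invariant beam toward broadside (0 degrees).
f, c, d = 2000.0, 343.0, 0.05
mics = np.arange(4) * d
angles = np.deg2rad(np.arange(-90, 91, 5))
steer = np.exp(-2j * np.pi * f / c * np.outer(np.sin(angles), mics))
desired = (np.abs(angles) < np.deg2rad(10)).astype(float)
w = ls_beamformer(steer, desired)
response = np.abs(steer @ w)   # resulting beampattern magnitude
```

Repeating this fit per frequency bin against the same desired pattern is what makes the overall design frequency-invariant; replacing `steer` with HRTF vectors is what turns it into the head-aware variant.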


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Minimum mutual information-based linearly constrained broadband signal extraction

Klaus Reindl; Stefan Meier; Hendrik Barfuss; Walter Kellermann

In this contribution, the problem of broadband acoustic signal extraction is treated as a specific source separation problem, where the desired signal components are to be separated from all remaining undesired components. For this, we exploit the generic TRIple-N Independent component analysis for CONvolutive mixtures (TRINICON) framework. The TRINICON optimization criterion is complemented with linear constraints leading to the Linearly Constrained Minimum Mutual Information (LCMMI) criterion for desired signal extraction. A general linearly constrained update rule for iterative filter optimization is derived, which can efficiently be realized in a novel Minimum Mutual Information (MMI)-Generalized Sidelobe Canceler (GSC). The general treatment of the signal extraction problem using an MMI criterion provides several advantages: Firstly, new insights into the signal extraction problem can be derived by establishing links to both the original GSC and the Multichannel Wiener Filter (MWF). Secondly, by exploiting fundamental properties characteristic for speech and audio signals, complicated and often unreliable Voice Activity Detection (VAD)-based control mechanisms become unnecessary. Thirdly, the overall realization requires only prior information of the desired source position. An evaluation of the MMI-GSC for the double-talk situation with two concurrently active speech sources under reverberant and noisy conditions demonstrates the effectiveness of this novel approach.
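The generalized sidelobe canceler structure itself can be illustrated with a deliberately simple two-microphone toy: a difference blocking matrix and a single-tap LMS interference canceler. The MMI optimization and adaptation control of the paper are not modeled, and all signals are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scenario: the desired source arrives identically at both
# microphones (already time-aligned); the interferer arrives with
# different gains at the two microphones.
n = 4000
s = rng.standard_normal(n)           # desired signal
v = rng.standard_normal(n)           # interferer
x1 = s + 0.9 * v
x2 = s + 0.3 * v

fbf = 0.5 * (x1 + x2)                # fixed beamformer: s + 0.6 * v
blocked = x1 - x2                    # blocking matrix output: 0.6 * v

# Single-tap LMS interference canceler: subtracts the interference that
# leaks through the fixed beamformer (optimal coefficient h = 1 here).
h, mu = 0.0, 0.01
out = np.empty(n)
for t in range(n):
    y = fbf[t] - h * blocked[t]
    out[t] = y
    h += mu * y * blocked[t]
```

The blocking matrix removes the desired signal from the adaptation path, which is why the canceler can adapt continuously; in the paper, the informed MMI criterion takes over the role that this idealized blocking matrix plays here.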


IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

A new uncertainty decoding scheme for DNN-HMM hybrid systems with multichannel speech enhancement

Christian Huemmer; Andreas Schwarz; Roland Maas; Hendrik Barfuss; Ramón Fernández Astudillo; Walter Kellermann

Uncertainty decoding combines a probabilistic feature description with the acoustic model of a speech recognition system. For DNN-HMM hybrid systems, this can be realized by averaging the DNN outputs produced by a finite set of feature samples (drawn from an estimated probability distribution). In this article, we employ this sampling approach in combination with a multi-microphone speech enhancement system. We propose a new strategy for generating feature samples from multichannel signals, based on modeling the spatial coherence estimates between different microphone pairs as realizations of a latent random variable. From each coherence estimate, a spectral enhancement gain is computed and an enhanced feature vector is obtained, thus producing a finite set of feature samples, of which we average the respective DNN outputs. In the experimental part, this new uncertainty decoding strategy is shown to consistently improve the recognition accuracy of a DNN-HMM hybrid system for the 8-channel REVERB Challenge task.
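The sampling idea can be sketched with a stand-in one-layer "DNN". All shapes and the Gaussian feature uncertainty below are invented for illustration; in the actual system the samples are derived from spatial-coherence estimates rather than an assumed Gaussian:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dnn(features, W, b):
    """Stand-in acoustic model: one linear layer plus softmax."""
    return softmax(features @ W + b)

n_feat, n_states, n_samples = 10, 5, 64
W = rng.standard_normal((n_feat, n_states))
b = rng.standard_normal(n_states)

# Draw feature samples from the estimated feature distribution and
# average the DNN output posteriors over the samples, instead of
# propagating only the (enhanced) mean feature vector.
mean = rng.standard_normal(n_feat)            # enhanced feature vector
std = 0.5                                     # assumed feature uncertainty
samples = mean + std * rng.standard_normal((n_samples, n_feat))
posterior = dnn(samples, W, b).mean(axis=0)   # averaged DNN outputs
```

Because the softmax is nonlinear, the averaged posterior generally differs from the posterior of the mean feature, which is exactly what lets the decoder account for feature uncertainty.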


IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Enhanced robot audition by dynamic acoustic sensing in moving humanoids

Vladimir Tourbabin; Hendrik Barfuss; Boaz Rafaely; Walter Kellermann

Auditory systems of humanoid robots usually acquire the surrounding sound field by means of microphone arrays. These arrays can undergo motion related to the robot's activity. The conventional approach to dealing with this motion is to stop the robot during sound acquisition. This approach avoids changing the positions of the microphones during the acquisition and reduces the robot's ego-noise. However, stopping the robot can interfere with the naturalness of its behaviour. Moreover, the potential performance improvement due to motion of the sound acquiring system cannot be attained. This potential is analysed in the current paper. The analysis considers two different types of motion: (i) rotation of the robot's head and (ii) limb gestures. The study presented here combines both theoretical and numerical simulation approaches. The results show that rotation of the head improves the high-frequency performance of the microphone array positioned on the head of the robot. This is complemented by the limb gestures, which improve the low-frequency performance of the array positioned on the torso and limbs of the robot.


Computer Speech & Language | 2017

Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments

Hendrik Barfuss; Christian Huemmer; Andreas Schwarz; Walter Kellermann

Speech recognition in adverse real-world environments is highly affected by reverberation and nonstationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to remove diffuse interference components from the latter, is an effective means to further improve the recognition accuracy of modern deep learning speech recognition systems. To this end, the recently updated 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) baseline speech recognition system is extended by a coherence-based postfilter, and the postfilter's impact on the word error rates is investigated for the noisy environments provided by CHiME-3. To determine the time- and frequency-dependent postfilter gains, we use a Direction-of-Arrival (DOA)-dependent and a DOA-independent estimator of the coherent-to-diffuse power ratio as an approximation of the short-time signal-to-noise ratio. Our experiments show that incorporating coherence-based postfiltering into the CHiME-3 baseline speech recognition system leads to a significant reduction of the word error rate scores for the noisy and reverberant environments provided as part of CHiME-3.
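The postfilter principle can be sketched as follows. The closed-form CDR estimator below is a simplified broadside-only variant with invented array values, not one of the exact estimators used in the article:

```python
import numpy as np

def postfilter_gain(cdr, g_min=0.1):
    """Wiener-like spectral gain: the coherent-to-diffuse power ratio
    serves as a short-time SNR proxy; flooring at g_min limits speech
    distortion."""
    return np.maximum(cdr / (cdr + 1.0), g_min)

def cdr_broadside(gamma_x, gamma_n):
    """Simplified DOA-dependent CDR estimate for a broadside source
    (coherent part has real coherence 1): solve
    gamma_x = (CDR + gamma_n) / (CDR + 1) for CDR."""
    gx = np.minimum(np.real(gamma_x), 1.0 - 1e-6)
    return np.clip((gamma_n - gx) / (gx - 1.0), 0.0, None)

# Toy check: diffuse-field coherence of a mic pair (spacing d) at
# frequency f, mixed with a coherent broadside source at CDR = 2.
f, c, d = 1000.0, 343.0, 0.08
gamma_n = np.sinc(2.0 * f * d / c)            # diffuse (sinc) coherence
gamma_x = (2.0 + gamma_n) / 3.0               # mixture coherence
```

The key property is that the coherence of the mixture moves from the diffuse sinc model toward 1 as the coherent (speech) component dominates, so the measured coherence can be inverted into an SNR-like quantity per time-frequency bin.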


Hands-free Speech Communications and Microphone Arrays (HSCMA) | 2017

HRTF-based two-dimensional robust least-squares frequency-invariant beamformer design for robot audition

Hendrik Barfuss; Michael Buerger; Jasper Podschus; Walter Kellermann

In this work, we propose a two-dimensional Head-Related Transfer Function (HRTF)-based robust beamformer design for robot audition, which allows for explicit control of the beamformer response for the entire three-dimensional sound field surrounding a humanoid robot. We evaluate the proposed method by means of both signal-independent and signal-dependent measures in a robot audition scenario. Our results confirm the effectiveness of the proposed two-dimensional HRTF-based beamformer design, compared to our previously published one-dimensional HRTF-based beamformer design, which was carried out for a fixed elevation angle only.


international workshop on acoustic signal enhancement | 2016

HRTF-based robust least-squares frequency-invariant polynomial beamforming

Hendrik Barfuss; Marcel Mueglich; Walter Kellermann

In this work, we propose a robust Head-Related Transfer Function (HRTF)-based polynomial beamformer design which accounts for the influence of a humanoid robot's head on the sound field. In addition, it allows for a flexible steering of our previously proposed robust HRTF-based beamformer design. We evaluate the HRTF-based polynomial beamformer design and compare it to the original HRTF-based beamformer design by means of signal-independent measures as well as word error rates of an off-the-shelf speech recognition system. Our results confirm the effectiveness of the polynomial beamformer design, which makes it a promising approach to robust beamforming for robot audition.
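The polynomial steering idea can be sketched as follows. Delay-and-sum weights stand in for the robust HRTF-based prototype designs, and all array values are invented:

```python
import numpy as np

f, c, d = 1000.0, 343.0, 0.05
mics = np.arange(3) * d

def proto_weights(D):
    """Delay-and-sum prototype weights for steering parameter
    D = sin(angle); stand-in for the robust HRTF-based designs."""
    return np.exp(2j * np.pi * f / c * mics * D) / mics.size

# Polynomial beamforming: w(D) = sum_p D**p * w_p, fitted to a few
# prototype designs, so the beam is steered continuously by simply
# re-evaluating a polynomial in D instead of redesigning the weights.
proto_D = np.array([-0.6, -0.2, 0.2, 0.6])
V = np.vander(proto_D, 4, increasing=True)          # degree-3 polynomial
protos = np.array([proto_weights(D) for D in proto_D])
coeffs = np.linalg.solve(V, protos)                 # (degree+1, n_mics)

def poly_weights(D):
    return np.polynomial.polynomial.polyval(D, coeffs)
```

The payoff is flexible steering at negligible runtime cost: the expensive robust design is done once for a few prototype directions, and intermediate look directions are obtained by evaluating the polynomial.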


4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) | 2014

Efficient training of acoustic models for reverberation-robust medium-vocabulary automatic speech recognition

Armin Sehr; Hendrik Barfuss; Christian Hofmann; Roland Maas; Walter Kellermann

A recently proposed concept for training reverberation-robust acoustic models for automatic speech recognition using pairs of clean and reverberant data is extended from word models to tied-state triphone models in this paper. The key idea of the concept, termed ICEWIND, is to use the clean data for the temporal alignment and the reverberant data for the estimation of the emission densities. Experiments with the 5000-word Wall Street Journal corpus confirm the benefits of ICEWIND with tied-state triphones: While the training time is reduced by more than 90%, the word accuracy is improved at the same time, both for room-specific and multi-style hidden Markov models. Since the acoustic models trained with ICEWIND need less Gaussian components for the emission densities to achieve comparable recognition rates as Baum-Welch acoustic models, ICEWIND also allows for a reduced decoding complexity.
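The core ICEWIND idea can be illustrated in miniature with per-state Gaussian means. The data, alignment, and distortion model below are all invented toy values:

```python
import numpy as np

rng = np.random.default_rng(2)

# ICEWIND in miniature: take the state alignment from the clean signal
# (where alignment is reliable), then estimate each state's emission
# density from the time-aligned reverberant features.
n = 300
alignment = np.repeat([0, 1, 2], n)       # per-frame state, from clean data
state_means = np.array([-2.0, 0.0, 2.0])
clean = state_means[alignment] + 0.1 * rng.standard_normal(3 * n)
reverb = clean + 0.5 * rng.standard_normal(3 * n)   # distorted copy

# Per-state emission means estimated on reverberant features using the
# clean-data alignment (a Baum-Welch baseline would instead have to
# re-align on the harder reverberant data).
est_means = np.array([reverb[alignment == k].mean() for k in range(3)])
```

Fixing the alignment is also why training is so much cheaper: only the density-estimation step runs on the reverberant data, with no iterative re-alignment.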


Archive | 2018

Informed Spatial Filtering Based on Constrained Independent Component Analysis

Hendrik Barfuss; Klaus Reindl; Walter Kellermann

In this work, we present a linearly constrained signal extraction algorithm which is based on a Minimum Mutual Information (MMI) criterion and exploits the three fundamental properties of speech and audio signals: nonstationarity, nonwhiteness, and nongaussianity. Hence, the proposed method is very well suited for signal processing of nonstationary nongaussian broadband signals like speech. Furthermore, from the linearly constrained MMI approach, we derive an efficient realization in a Generalized Sidelobe Canceler (GSC) structure. To estimate the relative transfer functions between the microphones, which are needed for the set of linear constraints, we use an informed time-domain independent component analysis algorithm, which exploits some coarse direction-of-arrival information of the target source. As a decisive advantage, this simplifies the otherwise challenging control mechanism for simultaneous adaptation of the GSC's blocking matrix and interference and noise canceler coefficients. Finally, we establish relations between the proposed method and other well-known multichannel linear filter approaches for signal extraction based on second-order statistics, and demonstrate the effectiveness of the proposed signal extraction method in a multispeaker scenario.

Collaboration


Dive into Hendrik Barfuss's collaborations.

Top Co-Authors

Walter Kellermann (University of Erlangen-Nuremberg)
Andreas Schwarz (University of Erlangen-Nuremberg)
Christian Huemmer (University of Erlangen-Nuremberg)
Klaus Reindl (University of Erlangen-Nuremberg)
Michael Buerger (University of Erlangen-Nuremberg)
Roland Maas (University of Erlangen-Nuremberg)
Stefan Meier (University of Erlangen-Nuremberg)
Boaz Rafaely (Ben-Gurion University of the Negev)
Alexander Schmidt (University of Erlangen-Nuremberg)
Armin Sehr (University of Erlangen-Nuremberg)