
Publication


Featured research published by Hendrik Kayser.


International Workshop on Acoustic Signal Enhancement | 2014

A discriminative learning approach to probabilistic acoustic source localization

Hendrik Kayser; Jörn Anemüller

Sound source localization algorithms commonly include assessment of inter-sensor (generalized) correlation functions to obtain direction-of-arrival estimates. Here, we present a classification-based method for source localization that uses discriminative support-vector-machine learning of correlation patterns that are indicative of source presence or absence. Subsequent probabilistic modeling generates a map of sound source presence probability in given directions. Being data-driven, the method adapts during training to characteristics of the sensor setup, such as convolution effects in non-free-field situations, and to the acoustic properties of the target signal. Experimental evaluation was conducted with algorithm training in anechoic single-talker scenarios and test data from several reverberant multi-talker situations with diffuse and real-recorded background noise. Results demonstrate that the method successfully generalizes from training to test conditions. The improvement over the best of five investigated state-of-the-art angular-spectrum-based reference methods was on average about 45% in terms of relative F-measure-related error reduction.
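For illustration, the inter-sensor correlation functions this line of work builds on are typically computed with GCC-PHAT. A minimal NumPy sketch (the signal length and toy delay are illustrative, not taken from the paper):

```python
import numpy as np

def gcc_phat(x, y, n_fft=1024):
    """Generalized cross-correlation with phase transform (PHAT).

    The peak of the returned function (centered at n_fft // 2)
    indicates the delay of y relative to x, in samples.
    """
    X = np.fft.rfft(x, n=n_fft)
    Y = np.fft.rfft(y, n=n_fft)
    cross = np.conj(X) * Y
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n_fft)
    return np.fft.fftshift(cc)

# Toy check: y is x circularly delayed by 5 samples
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.roll(x, 5)
cc = gcc_phat(x, y)
delay = int(np.argmax(cc)) - len(cc) // 2   # → 5
```

The PHAT weighting discards magnitude information, which is what makes the resulting correlation patterns robust enough to serve as classifier input features.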


Trends in Hearing | 2015

A Binaural Steering Beamformer System for Enhancing a Moving Speech Source

Kamil Adiloglu; Hendrik Kayser; Regina M. Baumgärtel; Sanja Rennebeck; Mathias Dietz; Volker Hohmann

In many daily life communication situations, several sound sources are simultaneously active. While normal-hearing listeners can easily distinguish the target sound source from interfering sound sources—as long as target and interferers are spatially or spectrally separated—and concentrate on the target, hearing-impaired listeners and cochlear implant users have difficulties in making such a distinction. In this article, we propose a binaural approach composed of a spatial filter controlled by a direction-of-arrival estimator to track and enhance a moving target sound. This approach was implemented on a real-time signal processing platform enabling experiments with test subjects in situ. To evaluate the proposed method, a data set of sound signals with a single moving sound source in an anechoic diffuse noise environment was generated using virtual acoustics. The proposed steering method was compared with a fixed (nonsteering) method that enhances sound from the frontal direction in an objective evaluation and subjective experiments using this database. In both cases, the obtained results indicated a significant improvement in speech intelligibility and quality compared with the unprocessed signal. Furthermore, the proposed method outperformed the nonsteering method.


System Analysis and Modeling | 2014

Estimation of inter-channel phase differences using non-negative matrix factorization

Hendrik Kayser; Jörn Anemüller; Kamil Adiloglu

Estimation of non-linearities in phase differences between two or more channels of an audio recording leads to more precise spatial information in audio signal enhancement applications. In this work, we propose estimating these non-linearities in multi-channel, multi-source audio mixtures in reverberant environments. For this task, we compute short-term cross-correlation functions between the channels and extract the non-linear inter-channel phase differences as well as a measure of activation for each source. This is done by decomposing the cross-correlation matrix with a non-negative matrix factorization method. Our evaluation shows that the estimated inter-channel phase differences capture the non-linearities, and that the estimated activations reflect the time instances at which the sources are active. In audio source separation experiments, the proposed method outperforms a state-of-the-art approach based on linear phase differences by 30% in terms of relative improvement.
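The decomposition step can be illustrated with generic multiplicative-update NMF (Lee-Seung updates, Euclidean cost). This is a textbook sketch, not the authors' exact factorization of the cross-correlation matrix; the matrix sizes and rank are toy values:

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Factor a non-negative matrix V ≈ W @ H using Lee-Seung
    multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    for _ in range(n_iter):
        # Multiplicative updates keep W and H element-wise non-negative
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy example: V is exactly rank-2 and non-negative by construction
rng = np.random.default_rng(1)
V = rng.random((40, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the paper's setting, the columns of one factor would correspond to per-source phase-difference patterns and the other factor to their time activations.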


Detection and Identification of Rare Audiovisual Cues | 2012

Audio Classification and Localization for Incongruent Event Detection

Jörg-Hendrik Bach; Hendrik Kayser; Jörn Anemüller

A method is presented that detects unexpected acoustic events, i.e., occurrences of acoustic objects that do not belong to any of the learned classes but nevertheless appear to constitute meaningful acoustic events. Building on the framework of [Weinshall et al.], general and specific acoustic classifiers are implemented and combined to detect events on which they respond in an incongruous way, indicating an unexpected event. Subsequent identification of events is performed by estimating the source direction, for which a novel classification-based approach is outlined. Performance, evaluated as a function of signal-to-noise ratio (SNR) and type of unexpected event, indicates decent performance at SNRs better than 5 dB.


Conference of the International Speech Communication Association | 2016

Assessing Speech Quality in Speech-Aware Hearing Aids Based on Phoneme Posteriorgrams

Constantin Spille; Hendrik Kayser; Hynek Hermansky; Bernd T. Meyer

Current behind-the-ear hearing aids (HA) can perform spatial filtering to enhance localized sound sources; however, they often lack processing strategies that are tailored to spoken language. Hence, without feedback about the speech quality achieved by the system, spatial filtering potentially remains unused in the case of a conservative enhancement strategy, or can even be detrimental to the speech intelligibility of the output signal. In this paper we apply phoneme posteriorgrams, obtained from HA signals processed with deep neural networks, to measure the quality of speech representations in spatial scenes. The inverse entropy of the phoneme probabilities is proposed as a measure of whether the current hearing aid parameters are optimal for the given acoustic condition. We investigate how varying noise levels and wrong estimates of the to-be-enhanced direction affect this measure in anechoic and reverberant conditions, and show that it remains reliable as each parameter is varied. Experiments show that entropy as a function of the beam angle has a distinct minimum at the speaker's true position and its immediate vicinity. Thus, it can be used to determine the beam angle that optimizes the speech representation. Further, variations of the SNR cause a consistent offset of the entropy.
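The underlying quantity, per-frame entropy of a phoneme posteriorgram, is straightforward to sketch. The phoneme count and the toy posteriors below are illustrative, not from the paper:

```python
import numpy as np

def mean_frame_entropy(posteriorgram):
    """Average per-frame entropy (in bits) of a phoneme posteriorgram.

    posteriorgram: array of shape (frames, phonemes), rows summing to 1.
    Low entropy means confident phoneme decisions, i.e., a good
    speech representation; high entropy means uncertainty.
    """
    p = np.clip(posteriorgram, 1e-12, 1.0)
    ent = -np.sum(p * np.log2(p), axis=1)
    return float(ent.mean())

# Confident frames (one dominant phoneme) vs. maximally uncertain frames
confident = np.full((10, 40), 0.05 / 39)
confident[:, 0] = 0.95
uniform = np.full((10, 40), 1.0 / 40)
```

Scanning this measure over candidate beam angles and picking the entropy minimum is the feedback loop the paper proposes for steering the spatial filter.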


Journal of the Acoustical Society of America | 2015

Robust auditory localization using probabilistic inference and coherence-based weighting of interaural cues

Hendrik Kayser; Volker Hohmann; Stephan D. Ewert; Birger Kollmeier; Jörn Anemüller

Robust sound source localization is performed by the human auditory system even in challenging acoustic conditions and in previously unencountered, complex scenarios. Here a computational binaural localization model is proposed that possesses mechanisms for handling corrupted or unreliable localization cues and for generalizing across different acoustic situations. Central to the model is the use of interaural coherence, measured as the interaural vector strength (IVS), to dynamically weight the importance of observed interaural phase differences (IPDs) and interaural level differences (ILDs) in frequency bands up to 1.4 kHz. This is accomplished through formulation of a probabilistic model in which the ILD and IPD distributions pertaining to a specific source location depend on the observed interaural coherence. Bayesian computation of the direction-of-arrival probability map naturally leads to coherence-weighted integration of location cues across frequency and time. Results confirm the model's validity through statistical analyses of interaural parameter values. Simulated localization experiments show that even data points with low reliability (i.e., low IVS) can be exploited to enhance localization performance. A temporal integration length of at least 200 ms is required to gain a benefit; this is in accordance with previous psychoacoustic findings on temporal integration of spatial cues in the human auditory system.
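A vector-strength coherence cue of this kind can be sketched as the resultant length of the observed IPD samples on the unit circle; the toy IPD distributions below are illustrative, not the model's actual front end:

```python
import numpy as np

def interaural_vector_strength(ipd_frames):
    """Resultant length of IPD samples on the unit circle.

    Returns a value in [0, 1]: 1 for a perfectly stable IPD
    (coherent, reliable cue), near 0 for random IPDs (diffuse
    field, unreliable cue).
    """
    return float(np.abs(np.mean(np.exp(1j * ipd_frames))))

rng = np.random.default_rng(0)
coherent = 0.3 + 0.05 * rng.standard_normal(1000)  # stable IPD around 0.3 rad
diffuse = rng.uniform(-np.pi, np.pi, 1000)         # random IPDs
```

In the model, this scalar then controls how strongly each time-frequency observation contributes to the direction-of-arrival probability map.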


International Conference on Acoustics, Speech, and Signal Processing | 2017

Predicting error rates for unknown data in automatic speech recognition

Bernd T. Meyer; Sri Harish Reddy Mallidi; Hendrik Kayser; Hynek Hermansky

In this paper we investigate methods to predict word error rates in automatic speech recognition in the presence of unknown noise types that have not been seen during training. The performance measures operate on phoneme posteriorgrams obtained from neural nets. We compare average frame-wise entropy as a baseline measure to the mean temporal distance (M-Measure) and to the number of phonetic events. The latter is obtained by learning typical phoneme activations from clean training data, which are later applied as phoneme-specific matched filters to posteriorgrams (MaP); when the filtered output exceeds a threshold, we register this as a phonetic event. For test sets using 10 unknown noise types and a wide range of signal-to-noise ratios, we find M-Measure and MaP to produce predictions twice as accurate as the baseline measure. When excluding noise types that contain speech segments, a prediction error of 3.1% is achieved, compared to 15.0% for the baseline measure.
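The M-Measure compares posterior vectors across increasing time spans; a sketch using symmetric KL divergence as the distance (the distance function, span range, and toy posteriors are illustrative choices, not the paper's exact configuration):

```python
import numpy as np

def m_measure(posteriors, max_span=10):
    """Mean temporal distance between posterior vectors.

    Averages the symmetric KL divergence between posterior vectors
    separated by 1..max_span frames. Larger values indicate richer,
    speech-like temporal dynamics of the posteriorgram.
    """
    p = np.clip(posteriors, 1e-12, 1.0)
    logp = np.log(p)
    dists = []
    for d in range(1, max_span + 1):
        a, b = p[:-d], p[d:]
        la, lb = logp[:-d], logp[d:]
        # Symmetric KL: sum over classes of (a - b) * (log a - log b)
        skl = np.sum((a - b) * (la - lb), axis=1)
        dists.append(skl.mean())
    return float(np.mean(dists))

# Alternating (dynamic) posteriors vs. a frozen posterior sequence
dynamic = np.tile(np.array([[0.9, 0.05, 0.05],
                            [0.05, 0.9, 0.05]]), (20, 1))
static = np.tile(np.array([[0.9, 0.05, 0.05]]), (40, 1))
```

A posteriorgram that collapses to a constant vector (as noise-dominated input tends to produce) scores near zero, which is what makes the measure predictive of recognition errors.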


Spoken Language Technology Workshop | 2016

Performance monitoring for automatic speech recognition in noisy multi-channel environments

Bernd T. Meyer; Sri Harish Reddy Mallidi; Angel Mario Castro Martinez; Guillermo Payá-Vayá; Hendrik Kayser; Hynek Hermansky

In many applications of machine listening it is useful to know how well an automatic speech recognition system will do before the actual recognition is performed. In this study we investigate different performance measures with the aim of predicting word error rates (WERs) in spatial acoustic scenes in which the type of noise, the signal-to-noise ratio, the parameters for spatial filtering, and the amount of reverberation are varied. All measures under consideration are based on phoneme posteriorgrams obtained from a deep neural net. While frame-wise entropy exhibits only moderate predictive power for factors other than additive noise, we found the mean temporal distance between posterior vectors (M-Measure) as well as matched phoneme filters (MaP) to exhibit excellent correlations with WER across all conditions. Since our results were obtained with simulated behind-the-ear hearing aid signals, we discuss possible applications for speech-aware hearing devices.


Conference of the International Speech Communication Association | 2016

Probabilistic Spatial Filter Estimation for Signal Enhancement in Multi-Channel Automatic Speech Recognition

Hendrik Kayser; Niko Moritz; Jörn Anemüller

Speech recognition in multi-channel environments requires target speaker localization, multi-channel signal enhancement, and robust speech recognition. Here we propose a system that addresses these problems: localization is performed with a recently introduced probabilistic method based on support-vector-machine learning of GCC-PHAT weights, which estimates a spatial source probability map. The main contribution of the present work is a probabilistic approach to (re-)estimation of location-specific steering vectors, based on weighting observed inter-channel phase differences with the spatial source probability map derived in the localization step. Subsequent speech recognition is carried out with a DNN-HMM system using amplitude modulation filter bank (AMFB) acoustic features, which are robust to spectral distortions introduced during spatial filtering. The system was evaluated on the CHiME-3 multi-channel ASR dataset. Recognition was carried out with and without probabilistic steering vector re-estimation, using MVDR and delay-and-sum beamforming, respectively. Results indicate that on real-world evaluation data the system attains a relative improvement of 31.98% over the baseline and of 21.44% over a modified baseline. We note that this improvement is achieved without exploiting oracle knowledge about speech/non-speech intervals for noise covariance estimation (which is, however, assumed for baseline processing).
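The MVDR beamformer used in the comparison has the closed form w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance and d the steering vector. A minimal sketch (the microphone count, covariance, and steering vector are toy values, not estimated from data as in the paper):

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR beamformer weights: w = R⁻¹d / (dᴴ R⁻¹ d).

    Minimizes output noise power subject to a distortionless
    response toward the steering direction (wᴴd = 1).
    """
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Toy 4-mic example with spatially white noise (identity covariance)
n_mics = 4
d = np.exp(-2j * np.pi * 0.1 * np.arange(n_mics))  # illustrative steering vector
R = np.eye(n_mics)
w = mvdr_weights(R, d)
```

The paper's contribution sits upstream of this formula: the steering vector d is re-estimated probabilistically from the spatial source probability map rather than taken from a fixed array geometry.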


Detection and Identification of Rare Audiovisual Cues | 2012

Incongruence Detection in Audio-Visual Processing

Michal Havlena; Jan Heller; Hendrik Kayser; Jörg-Hendrik Bach; Jörn Anemüller; Tomas Pajdla

The recently introduced theory of incongruence allows for detection of unexpected events in observations via disagreement of classifiers on specific and general levels of a classifier hierarchy which encodes the understanding a machine currently has of the world. We present an application of this theory, a hierarchy of classifiers describing an audio-visual speaker detector, and show successful incongruence detection on sequences acquired by a static as well as by a moving AWEAR 2.0 device using the presented classifier hierarchy.

Collaboration


Dive into Hendrik Kayser's collaborations.

Top Co-Authors


Michal Havlena

Czech Technical University in Prague


Tomas Pajdla

Czech Technical University in Prague
