Jürgen T. Geiger
Huawei
Publication
Featured research published by Jürgen T. Geiger.
european signal processing conference | 2015
Jürgen T. Geiger; Karim Helwani
Acoustic event detection in surveillance scenarios is an important but difficult problem. Realistic systems struggle with noisy recording conditions. In this work, we propose to use Gabor filterbank features to detect target events in different noisy background scenes. These features capture spectro-temporal modulation frequencies in the signal, which makes them well suited to the detection of non-stationary sound events. A single-class detector is constructed for each of the different target events. In a hierarchical framework, the separate detectors are combined into a multi-class detector. Experiments are performed using a database of four different target sounds and four background scenarios. On average, the proposed features outperform conventional features at all tested noise levels, in terms of both detection and classification performance.
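The core idea of spectro-temporal Gabor features can be illustrated with a minimal numpy sketch (not the paper's implementation; filter sizes and modulation rates below are illustrative): a complex 2D Gabor filter is correlated with a spectrogram patch, and the response is large only when the patch carries the filter's modulation pattern.

```python
import numpy as np

def gabor_filter_2d(n_freq, n_time, omega_f, omega_t, sigma_f, sigma_t):
    """Complex 2D Gabor filter: a Gaussian envelope modulated by a plane
    wave whose (omega_f, omega_t) rates select one spectral and one
    temporal modulation frequency."""
    f = np.arange(n_freq) - n_freq // 2
    t = np.arange(n_time) - n_time // 2
    F, T = np.meshgrid(f, t, indexing="ij")
    envelope = np.exp(-F**2 / (2 * sigma_f**2) - T**2 / (2 * sigma_t**2))
    carrier = np.exp(1j * (omega_f * F + omega_t * T))
    return envelope * carrier

# Filter tuned to a purely temporal modulation of 0.5 rad/frame
g = gabor_filter_2d(25, 25, 0.0, 0.5, 4.0, 4.0)

# Matched-filter response of a spectrogram patch: large for a patch that
# carries the filter's modulation, small for a stationary (flat) patch.
t = np.arange(25) - 12
patch_mod = np.tile(np.cos(0.5 * t), (25, 1))   # temporally modulated
patch_flat = np.ones((25, 25))                  # stationary background
resp_mod = np.abs(np.sum(patch_mod * np.conj(g)))
resp_flat = np.abs(np.sum(patch_flat * np.conj(g)))
```

A filterbank of such filters at different modulation rates, applied to a log-mel spectrogram, yields the kind of feature set the abstract describes.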
ACM Transactions on Intelligent Systems and Technology | 2018
Zixing Zhang; Jürgen T. Geiger; Jouni Pohjalainen; Amr El-Desoky Mousa; Wenyu Jin; Björn W. Schuller
Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition but still remains an important challenge. Data-driven supervised approaches, especially those based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks. Along the way, we discuss the pros and cons of these approaches and present their experimental results on benchmark databases. We expect that this overview can facilitate the development of robust speech recognition systems for noisy acoustic environments.
international conference on acoustics, speech, and signal processing | 2017
Kainan Chen; Jürgen T. Geiger; Walter Kellermann
Most multichannel sound source Direction of Arrival (DOA) estimation algorithms suffer from spatial aliasing. The phase differences between a pair of microphones are wrapped beyond the spatial aliasing frequency. A common solution is to adjust the distance between the microphones to obtain a suitable aliasing frequency, and to use only the frequency band below the aliasing frequency for localization. With correct phase unwrapping, a broader frequency band can be utilized for localization. In this paper, we investigate a phase unwrapping method that solves the spatial aliasing problem for scenarios with a single source and high-level diffuse background noise (around 0 dB SNR). The aliasing frequency is estimated from the signal and is used to unwrap a phase difference vector. Pre- and post-processing steps are applied to increase the robustness. Our experiments with a large number of simulated and real signals demonstrate the robustness of our method in noise.
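The wrapping effect and why unwrapping widens the usable band can be sketched with an idealized, noise-free numpy example (illustrative only, not the paper's estimator): above the aliasing frequency the observed phase difference jumps by 2π, but after unwrapping, the full band contributes to a TDOA (and hence DOA) estimate.

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def wrapped_phase_diff(freqs, mic_dist, doa_deg):
    """Ideal inter-microphone phase difference for a far-field source,
    wrapped to (-pi, pi] as it would be observed from a cross-spectrum."""
    tau = mic_dist * np.cos(np.deg2rad(doa_deg)) / C  # TDOA in seconds
    return np.angle(np.exp(2j * np.pi * freqs * tau))  # wrapping

freqs = np.linspace(100.0, 8000.0, 512)
d = 0.08  # 8 cm spacing: endfire aliasing frequency c/(2d) is ~2.1 kHz
phi_wrapped = wrapped_phase_diff(freqs, d, 30.0)

# Unwrap along frequency, then estimate the TDOA from the linear slope
# and convert it back to a DOA; without unwrapping, only bins below the
# aliasing frequency could be used this way.
phi_unwrapped = np.unwrap(phi_wrapped)
tau_est = np.polyfit(freqs, phi_unwrapped, 1)[0] / (2.0 * np.pi)
doa_est = np.rad2deg(np.arccos(np.clip(tau_est * C / d, -1.0, 1.0)))
```

In the noisy, real-signal setting of the paper, this naive frequency-continuity unwrapping fails, which is what motivates estimating the aliasing frequency from the signal and adding pre- and post-processing.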
international conference on acoustics, speech, and signal processing | 2016
Kainan Chen; Jürgen T. Geiger; Karim Helwani; Mohammad Javad Taghizadeh
Methods are available for simultaneous localization of multiple (unknown) audio sources using microphone arrays. Typical algorithms aim at localizing all active sources. They moreover require that the number of sources is known and is less than or equal to the number of microphones. This constraint cannot be satisfied in many real-life situations and noisy environments. We present an algorithm for localizing an audio source with known statistics in a multi-source environment. The proposed method circumvents these problems by using a phase-preserving signal extraction method on the input signal. A binary mask is estimated and used to retain only the information of the target source in the original microphone signals. The masked signals are fed to a modified version of a conventional localization algorithm, which now localizes only the target source. Experimental results obtained from real recordings show that the proposed method can successfully detect and localize the target source.
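The phase-preserving masking step can be sketched as follows (a minimal numpy illustration, assuming per-bin target and interferer power estimates are available; it is not the paper's mask estimator): a single real-valued binary mask is applied to every channel's STFT, so the inter-channel phase of the kept bins, which the downstream localizer relies on, is untouched.

```python
import numpy as np

def binary_mask_extract(mic_stft, target_psd, interferer_psd, thresh=1.0):
    """Zero out time-frequency bins where the target does not dominate.
    The same real-valued mask is applied to every microphone channel, so
    the original phases of the kept bins are preserved for localization."""
    snr = target_psd / (interferer_psd + 1e-12)
    mask = (snr > thresh).astype(float)   # binary mask, shape (freq, time)
    return mic_stft * mask                # broadcast over the channel axis

# Toy example: 2 channels, 3 frequency bins, 2 frames
rng = np.random.default_rng(0)
stft = rng.standard_normal((2, 3, 2)) + 1j * rng.standard_normal((2, 3, 2))
target = np.array([[4.0, 4.0],    # bin 0: target dominates in both frames
                   [0.1, 0.1],    # bin 1: interferer dominates
                   [4.0, 0.1]])   # bin 2: mixed
noise = np.ones((3, 2))
masked = binary_mask_extract(stft, target, noise)
```

Feeding such masked multichannel signals to a conventional localizer is what lets it lock onto the target source alone.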
international conference on acoustics, speech, and signal processing | 2017
Milos Markovic; Jürgen T. Geiger
We present a system for acoustic scene classification, the task of classifying an environment based on audio recordings. First, we describe a strong low-complexity baseline system using a compact feature set. Second, this system is improved with a novel class of audio features that exploits knowledge of sound behaviour within the scene: reverberation. This information is complementary to commonly used features for acoustic scene classification, such as spectral or cepstral components. To extract the new features, temporal peaks in the audio signal are detected, and the decay after each peak reveals information about the reverberation properties. For the detected decays, statistics are extracted and summarized over time and over frequency bands. The combination of the novel features with features used in state-of-the-art algorithms for acoustic scene classification increases the classification accuracy, as our results obtained with a large in-house database and the DCASE 2016 database demonstrate.
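The peak-and-decay idea can be sketched in a few lines of numpy (an illustrative simplification of the abstract's description, operating on a single log-magnitude envelope; the paper works per frequency band): detect local maxima, fit a line to the dB trajectory after each peak, and summarize the slopes.

```python
import numpy as np

def decay_features(env_db, peak_thresh_db=-20.0, decay_len=20):
    """Detect local maxima in a log-magnitude envelope and fit a line to
    the dB decay following each peak; statistics of the slopes (dB/frame)
    summarize the reverberation of the scene."""
    slopes = []
    for t in range(1, len(env_db) - decay_len):
        is_peak = env_db[t] > env_db[t - 1] and env_db[t] >= env_db[t + 1]
        if is_peak and env_db[t] > peak_thresh_db:
            seg = env_db[t:t + decay_len]
            slopes.append(np.polyfit(np.arange(decay_len), seg, 1)[0])
    if not slopes:
        return np.zeros(2)
    return np.array([np.mean(slopes), np.std(slopes)])

# Synthetic envelope: noise floor at -60 dB with one peak at frame 10
# that decays linearly at -2 dB/frame for 30 frames.
env = np.full(100, -60.0)
for k in range(31):
    env[10 + k] = -2.0 * k
feats = decay_features(env, peak_thresh_db=-20.0, decay_len=20)
```

A reverberant room yields shallow decay slopes, a dry one steep slopes, which is why such statistics complement spectral features for scene classification.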
european signal processing conference | 2015
Jürgen T. Geiger; Peter Grosche; Yesenia Lacouture Parodi
Studies show that many people have difficulty understanding movie dialogue when watching TV, especially hard-of-hearing listeners or those in adverse listening environments. In order to overcome this problem, we propose an efficient method to enhance the speech component of a stereo signal. The method is designed with low computational complexity in mind and consists of first extracting a center channel from the stereo signal. Novel methods for speech enhancement and voice activity detection are proposed which exploit the stereo information. A speech enhancement filter is estimated based on the relationship between the extracted center channel and all other channels. Subjective and objective evaluations show that this method can successfully enhance the intelligibility of the dialogue without negatively affecting the overall sound quality.
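The center-channel extraction step can be illustrated with a minimal numpy sketch (one plausible coherence-based gain, not the paper's exact method): per STFT bin, content that is correlated and in phase across the left and right channels, which is where dialogue is typically panned, passes with a gain near 1, while out-of-phase side content is suppressed.

```python
import numpy as np

def extract_center(L, R):
    """Per-bin center-channel extraction for a stereo STFT. The gain
    2*Re(L*conj(R)) / (|L|^2 + |R|^2) is ~1 for identical in-phase
    signals and <= 0 (clipped to 0) for out-of-phase side content."""
    num = 2.0 * np.real(L * np.conj(R))
    den = np.abs(L) ** 2 + np.abs(R) ** 2 + 1e-12
    gain = np.clip(num / den, 0.0, 1.0)
    return gain * 0.5 * (L + R)

# Toy STFT bins: identical channels (center content) vs. polarity-inverted
# channels (pure side content).
rng = np.random.default_rng(1)
speech = rng.standard_normal(64) + 1j * rng.standard_normal(64)
center_in_phase = extract_center(speech, speech)    # L == R
center_out_phase = extract_center(speech, -speech)  # L == -R
```

An enhancement filter estimated from this extracted center against the remaining channels, as the abstract describes, can then boost dialogue relative to music and effects.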
international conference on acoustics, speech, and signal processing | 2013
Felix Weninger; Jürgen T. Geiger; Martin Wöllmer; Björn W. Schuller; Gerhard Rigoll
Proceedings CHiME 2013 | 2013
Jürgen T. Geiger; Felix Weninger; Antti Hurmalainen; Jort F. Gemmeke; Martin Wöllmer; Björn W. Schuller; Gerhard Rigoll; Tuomas Virtanen
european signal processing conference | 2012
Jürgen T. Geiger; Ravichander Vipperla; Nicholas W. D. Evans; Björn W. Schuller; Gerhard Rigoll
workshop on applications of signal processing to audio and acoustics | 2017
Kainan Chen; Jürgen T. Geiger; Wenyu Jin; Mohammad Javad Taghizadeh; Walter Kellermann