Timo Gerkmann
University of Oldenburg
Publications
Featured research published by Timo Gerkmann.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Timo Gerkmann; Richard C. Hendriks
Recently, it has been proposed to estimate the noise power spectral density by means of minimum mean-square error (MMSE) optimal estimation. We show that the resulting estimator can be interpreted as a voice activity detector (VAD)-based noise power estimator, where the noise power is updated only when speech absence is signaled, followed by a required bias compensation. We show that the bias compensation is unnecessary when we replace the VAD by a soft speech presence probability (SPP) with fixed priors. Choosing fixed priors also has the benefit of decoupling the noise power estimator from subsequent steps in a speech enhancement framework, such as the estimation of the speech power and the estimation of the clean speech. We show that the proposed SPP approach maintains the quick noise tracking performance of the bias-compensated MMSE-based approach while exhibiting less overestimation of the spectral noise power and an even lower computational complexity.
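The soft update described in this abstract can be sketched in a few lines. This is a minimal sketch, assuming a complex Gaussian signal model; the fixed a priori SNR of 15 dB and the fixed prior of 0.5 are typical choices for this scheme, and the function name is ours:

```python
import numpy as np

def spp_noise_update(noisy_power, noise_psd, xi_h1_db=15.0, p_h1=0.5):
    """One frame of SPP-based noise PSD tracking (sketch).

    noisy_power: |Y(k)|^2 per frequency bin
    noise_psd:   current noise PSD estimate per bin
    Uses a fixed a priori SNR under speech presence (xi_h1) and a
    fixed prior P(H1), as advocated in the abstract above.
    """
    xi_h1 = 10.0 ** (xi_h1_db / 10.0)       # fixed a priori SNR (linear)
    gamma = noisy_power / noise_psd         # a posteriori SNR
    # Posterior speech presence probability under a Gaussian model
    post = 1.0 / (1.0 + (1.0 - p_h1) / p_h1 * (1.0 + xi_h1)
                  * np.exp(-gamma * xi_h1 / (1.0 + xi_h1)))
    # Soft update: no hard VAD decision, no bias compensation needed
    new_noise_psd = post * noise_psd + (1.0 - post) * noisy_power
    return new_noise_psd, post
```

Because the update is a convex combination weighted by the posterior, the estimate keeps tracking the noise during speech absence without a hard VAD decision or a subsequent bias compensation.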
International Conference on Acoustics, Speech, and Signal Processing | 2008
Colin Breithaupt; Timo Gerkmann; Rainer Martin
While state-of-the-art approaches obtain an estimate of the a priori SNR by adaptively smoothing its maximum likelihood estimate in the frequency domain, we selectively smooth the maximum likelihood estimate in the cepstral domain. In the cepstral domain the noisy speech signal is decomposed into coefficients related mainly to the speech envelope, the excitation, and noise. As coefficients that represent speech can be robustly identified in the cepstral domain, we can apply little smoothing to speech coefficients and strong smoothing to noise coefficients. Thus, speech components are preserved and musical noise is suppressed. In speech enhancement experiments we obtain consistent improvements over the well-known decision-directed approach.
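The selective smoothing can be illustrated as follows. This is an illustrative sketch, not the paper's implementation: the quefrency split `q_env` and the smoothing constants are assumptions, and practical systems additionally protect the pitch quefrency.

```python
import numpy as np

def cepstral_smooth_snr(xi_ml, alpha_env=0.2, alpha_noise=0.9, q_env=8):
    """Selectively smooth ML a priori SNR estimates in the cepstral domain.

    xi_ml: (frames, bins) one-sided ML SNR estimates, linear scale.
    Low quefrencies (speech envelope) get little smoothing (alpha_env);
    high quefrencies (mostly noise) get strong smoothing (alpha_noise).
    """
    frames, K = xi_ml.shape
    N = 2 * K - 2                                     # full symmetric spectrum
    q = np.minimum(np.arange(N), N - np.arange(N))    # quefrency index
    alpha = np.where(q < q_env, alpha_env, alpha_noise)
    cep_prev = np.zeros(N)
    out = np.empty_like(xi_ml)
    for l in range(frames):
        log_spec = np.log(np.maximum(xi_ml[l], 1e-10))
        full = np.concatenate([log_spec, log_spec[-2:0:-1]])  # even extension
        cep = np.fft.ifft(full).real                  # cepstral coefficients
        # Recursive temporal smoothing, stronger at noise quefrencies
        cep_prev = alpha * cep_prev + (1.0 - alpha) * cep
        out[l] = np.exp(np.fft.fft(cep_prev).real[:K])
    return out
```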
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Timo Gerkmann; Colin Breithaupt; Rainer Martin
In this paper, we present an improved estimator for the speech presence probability at each time-frequency point in the short-time Fourier transform domain. In contrast to existing approaches, this estimator does not rely on an adaptively estimated and thus signal-dependent a priori signal-to-noise ratio estimate. It therefore decouples the estimation of the speech presence probability from the estimation of the clean speech spectral coefficients in a speech enhancement task. Using both a fixed a priori signal-to-noise ratio and a fixed prior probability of speech presence, the proposed a posteriori speech presence probability estimator achieves probabilities close to zero for speech absence and probabilities close to one for speech presence. While state-of-the-art speech presence probability estimators use adaptive prior probabilities and signal-to-noise ratio estimates, we argue that these quantities should reflect true a priori information that shall not depend on the observed signal. We present a detection theoretic framework for determining the fixed a priori signal-to-noise ratio. The proposed estimator is conceptually simple and yields a better tradeoff between speech distortion and noise leakage than state-of-the-art estimators.
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Martin Krawczyk; Timo Gerkmann
The enhancement of speech which is corrupted by noise is commonly performed in the short-time discrete Fourier transform domain. In case only a single microphone signal is available, typically only the spectral amplitude is modified. However, it has recently been shown that an improved spectral phase can also be utilized for speech enhancement, e.g., for phase-sensitive amplitude estimation. In this paper, we therefore present a method to reconstruct the spectral phase of voiced speech from only the fundamental frequency and the noisy observation. The importance of the spectral phase is highlighted and we elaborate on the reason why noise reduction can be achieved by modifications of the spectral phase. We show that, when the noisy phase is enhanced using the proposed phase reconstruction, instrumental measures predict an increase of speech quality over a range of signal-to-noise ratios, even without explicit amplitude enhancement.
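The core idea, propagating the phase of voiced speech along time using the fundamental frequency, can be sketched as below. This is a simplified illustration of temporal phase propagation only; the paper's band assignment and across-frequency propagation are omitted, and the bin-to-harmonic mapping here is an assumption.

```python
import numpy as np

def reconstruct_voiced_phase(f0_per_frame, fs, hop, n_fft):
    """Sketch: advance each STFT bin's phase by its nearest harmonic's
    instantaneous frequency, frame by frame.

    f0_per_frame: fundamental frequency per frame in Hz (0 = unvoiced).
    """
    num_frames = len(f0_per_frame)
    K = n_fft // 2 + 1
    bin_freqs = np.arange(K) * fs / n_fft
    phase = np.zeros((num_frames, K))
    prev = np.zeros(K)
    for l, f0 in enumerate(f0_per_frame):
        if f0 > 0:                                        # voiced frame
            h = np.maximum(np.round(bin_freqs / f0), 1.0)  # harmonic number
            f_harm = h * f0                                # harmonic frequency
            prev = prev + 2.0 * np.pi * f_harm * hop / fs  # phase advance
        phase[l] = np.angle(np.exp(1j * prev))             # wrap to (-pi, pi]
    return phase
```

For a stationary f0 whose harmonics advance by integer multiples of 2*pi per hop, the reconstructed phase stays constant, as expected for a periodic signal sampled synchronously.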
Workshop on Applications of Signal Processing to Audio and Acoustics | 2011
Timo Gerkmann; Richard C. Hendriks
In this paper, we analyze the minimum mean square error (MMSE) based spectral noise power estimator [1] and present an improvement. We will show that the MMSE based spectral noise power estimate is only updated when the a posteriori signal-to-noise ratio (SNR) is lower than one. This threshold on the a posteriori SNR can be interpreted as a voice activity detector (VAD). We propose in this work to replace the hard decision of the VAD by a soft speech presence probability (SPP). We show that by doing so, the proposed estimator does not require the bias correction and safety-net required by the MMSE estimator presented in [1]. At the same time, the proposed estimator maintains the quick noise tracking capability that is characteristic of the MMSE noise tracker, results in less noise power overestimation, and is computationally less expensive.
IEEE Signal Processing Letters | 2007
Colin Breithaupt; Timo Gerkmann; Rainer Martin
Many speech enhancement algorithms that modify short-term spectral magnitudes of the noisy signal by means of adaptive spectral gain functions are plagued by annoying spectral outliers. In this letter, we propose cepstral smoothing as a solution to this problem. We show that cepstral smoothing can effectively prevent spectral peaks of short duration that may be perceived as musical noise. At the same time, cepstral smoothing preserves speech onsets, plosives, and quasi-stationary narrowband structures like voiced speech. The proposed recursive temporal smoothing is applied to higher cepstral coefficients only, excluding those representing the pitch information. As the higher cepstral coefficients describe the finer spectral structure of the Fourier spectrum, smoothing them along time prevents single coefficients of the filter function from changing excessively and independently of their neighboring bins, thus suppressing musical noise. The proposed cepstral smoothing technique is very effective in nonstationary noise.
IEEE Signal Processing Magazine | 2015
Timo Gerkmann; Martin Krawczyk-Becker; Jonathan Le Roux
With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and also more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones expect their devices to work robustly anywhere and at any time. This holds in particular for challenging noisy environments like a cafeteria, a restaurant, a subway, a factory, or in traffic. One way to make assisted listening devices robust to noise is to apply speech enhancement algorithms. To improve the corrupted speech, spatial diversity can be exploited by a constructive combination of microphone signals (so-called beamforming), and by exploiting the different spectro-temporal properties of speech and noise. Here, we focus on single-channel speech enhancement algorithms which rely on spectro-temporal properties. On the one hand, these algorithms can be employed when the miniaturization of devices only allows for using a single microphone. On the other hand, when multiple microphones are available, single-channel algorithms can be employed as a postprocessor at the output of a beamformer. To exploit the short-term stationarity of natural sounds, many of these approaches process the signal in a time-frequency representation, most frequently the short-time discrete Fourier transform (STFT) domain. In this domain, the coefficients of the signal are complex-valued, and can therefore be represented by their absolute value (referred to in the literature both as STFT magnitude and STFT amplitude) and their phase. While the modeling and processing of the STFT magnitude has been the center of interest in the past three decades, phase has been largely ignored.
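The magnitude/phase decomposition of the STFT that this overview is built on can be computed in a few lines. A minimal sketch with a Hann window; the frame length and hop size are illustrative choices, not values from the article:

```python
import numpy as np

def stft_polar(x, n_fft=512, hop=128):
    """Decompose a signal into STFT magnitude and phase."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    spec = np.fft.rfft(np.array(frames), axis=1)   # complex STFT coefficients
    return np.abs(spec), np.angle(spec)            # magnitude and phase
```

Magnitude-domain enhancement modifies only the first return value and reuses the noisy phase for synthesis; phase-aware methods, the focus here, also process the second.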
Synthesis Lectures on Speech and Audio Processing | 2013
Richard C. Hendriks; Timo Gerkmann; Jesper Jensen
As speech processing devices like mobile phones, voice controlled devices, and hearing aids have increased in popularity, people expect them to work anywhere and at any time without user intervention. However, the presence of acoustical disturbances limits the use of these applications, degrades their performance, or causes the user difficulties in understanding the conversation or appreciating the device. A common way to reduce the effects of such disturbances is through the use of single-microphone noise reduction algorithms for speech enhancement. The field of single-microphone noise reduction for speech enhancement comprises a history of more than 30 years of research. In this survey, we wish to demonstrate the significant advances that have been made during the last decade in the field of discrete Fourier transform domain-based single-channel noise reduction for speech enhancement. Furthermore, our goal is to provide a concise description of a state-of-the-art speech enhancement system, and demonstrate the relative importance of the various building blocks of such a system. This allows the non-expert DSP practitioner to judge the relevance of each building block and to implement a close-to-optimal enhancement system for the particular application at hand. Table of Contents: Introduction / Single Channel Speech Enhancement: General Principles / DFT-Based Speech Enhancement Methods: Signal Model and Notation / Speech DFT Estimators / Speech Presence Probability Estimation / Noise PSD Estimation / Speech PSD Estimation / Performance Evaluation Methods / Simulation Experiments with Single-Channel Enhancement Systems / Future Directions
IEEE Signal Processing Letters | 2013
Timo Gerkmann; Martin Krawczyk
In this letter, we derive a minimum mean squared error (MMSE) optimal estimator for clean speech spectral amplitudes, which we apply in single channel speech enhancement. As opposed to state-of-the-art estimators, the optimal estimator is derived for a given clean speech spectral phase. We show that the phase contains additional information that can be exploited to distinguish outliers in the noise from the target signal. With the proposed technique, incorporating the phase can potentially improve the PESQ-MOS by 0.5 in babble noise as compared to state-of-the-art amplitude estimators. In a blind setup we achieve a PESQ improvement of around 0.25 in voiced speech.
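Why a known clean phase helps can be seen from a toy calculation. The sketch below is NOT the MMSE estimator derived in the letter; it merely projects the noisy coefficient onto the known clean-phase direction, which rejects noise components orthogonal to that direction:

```python
import numpy as np

def phase_aware_amplitude(Y, clean_phase):
    """Toy phase-aware amplitude estimate: project the noisy STFT
    coefficient Y onto the clean-phase direction and clip at zero.
    (Illustration only, not the paper's MMSE estimator.)
    """
    proj = np.real(Y * np.exp(-1j * clean_phase))
    return np.maximum(proj, 0.0)
```

A noise outlier that is orthogonal to the clean-phase direction is removed entirely by the projection, which is the intuition behind using the phase to distinguish noise outliers from the target signal.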
IEEE Transactions on Signal Processing | 2009
Timo Gerkmann; Rainer Martin
In this paper, we derive the signal power bias that arises when spectral amplitudes are smoothed by reducing their variance in the cepstral domain (often referred to as cepstral smoothing) and develop a power bias compensation method. We show that if chi-distributed spectral amplitudes are smoothed in the cepstral domain, the resulting smoothed spectral amplitudes are also approximately chi-distributed but with more degrees of freedom and less signal power. The key finding for the proposed power bias compensation method is that the degrees of freedom of chi-distributed spectral amplitudes are directly related to their average cepstral variance. Furthermore, this work gives new insights into the statistics of the cepstral coefficients derived from chi-distributed spectral amplitudes using tapered spectral analysis windows. We derive explicit expressions for the variance and covariance of correlated chi-distributed spectral amplitudes and the resulting cepstral coefficients, parameterized by the degrees of freedom. The results in this work allow for a cepstral smoothing of spectral quantities without affecting their signal power. As we assume the parameterized chi-distribution for the spectral amplitudes, the results hold for Gaussian, super-Gaussian, and sub-Gaussian distributed complex spectral coefficients. The proposed bias compensation method is computationally inexpensive and shown to work very well for white and colored signals, as well as for rectangular and tapered spectral analysis windows.
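The power bias itself is easy to reproduce numerically. The sketch below is a crude stand-in, not cepstral smoothing proper: it recursively smooths log-periodogram values along time, which, like cepstral smoothing, reduces the variance of the log-spectral values and thereby biases the recovered power low (by roughly exp(-Euler gamma), about 0.56, for two degrees of freedom). This is the effect the proposed method compensates.

```python
import numpy as np

rng = np.random.default_rng(0)
frames, nbins = 20000, 32

# Chi-distributed spectral amplitudes with 2 degrees of freedom
# (magnitudes of complex Gaussian coefficients), scaled so E[A^2] = 1.
coefs = (rng.standard_normal((frames, nbins)) +
         1j * rng.standard_normal((frames, nbins))) / np.sqrt(2.0)
A = np.abs(coefs)
power_before = float(np.mean(A ** 2))          # close to 1.0

# Recursive smoothing of the log-periodogram along time (stand-in for
# cepstral smoothing: both shrink the variance of log-spectral values).
alpha = 0.99
prev = np.log(A[0] ** 2)
acc, count = 0.0, 0
for l in range(frames):
    prev = alpha * prev + (1.0 - alpha) * np.log(A[l] ** 2)
    if l >= frames // 2:                       # skip the initial transient
        acc += float(np.mean(np.exp(prev)))
        count += 1
power_after = acc / count                      # biased low, near exp(-0.5772)
```

By Jensen's inequality, averaging in the log domain and exponentiating afterwards underestimates the power; the paper relates the size of this bias to the degrees of freedom and the cepstral variance, and compensates it.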