Colin Breithaupt
Ruhr University Bochum
Publications
Featured research published by Colin Breithaupt.
International Conference on Acoustics, Speech, and Signal Processing | 2008
Colin Breithaupt; Timo Gerkmann; Rainer Martin
While state-of-the-art approaches obtain an estimate of the a priori SNR by adaptively smoothing its maximum likelihood estimate in the frequency domain, we selectively smooth the maximum likelihood estimate in the cepstral domain. In the cepstral domain, the noisy speech signal is decomposed into coefficients related mainly to the speech envelope, the excitation, and noise. Because the coefficients that represent speech can be robustly identified in the cepstral domain, we can apply little smoothing to speech coefficients and strong smoothing to noise coefficients. Thus, speech components are preserved and musical noise is suppressed. In speech enhancement experiments we obtain consistent improvements over the well-known decision-directed approach.
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Timo Gerkmann; Colin Breithaupt; Rainer Martin
In this paper, we present an improved estimator for the speech presence probability at each time-frequency point in the short-time Fourier transform domain. In contrast to existing approaches, this estimator does not rely on an adaptively estimated and thus signal-dependent a priori signal-to-noise ratio estimate. It therefore decouples the estimation of the speech presence probability from the estimation of the clean speech spectral coefficients in a speech enhancement task. Using both a fixed a priori signal-to-noise ratio and a fixed prior probability of speech presence, the proposed a posteriori speech presence probability estimator achieves probabilities close to zero for speech absence and probabilities close to one for speech presence. While state-of-the-art speech presence probability estimators use adaptive prior probabilities and signal-to-noise ratio estimates, we argue that these quantities should reflect true a priori information that should not depend on the observed signal. We present a detection-theoretic framework for determining the fixed a priori signal-to-noise ratio. The proposed estimator is conceptually simple and yields a better tradeoff between speech distortion and noise leakage than state-of-the-art estimators.
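The idea of a speech presence probability computed from fixed priors can be illustrated with a minimal sketch. It assumes the common complex Gaussian model for the noisy spectral coefficient under both hypotheses, so the posterior follows from Bayes' rule applied to two Gaussians with different variances. The function name, the default a priori SNR of 15 dB, and the prior probability of 0.5 are assumptions chosen for illustration, not values taken from the abstract.

```python
import math


def speech_presence_probability(a_posteriori_snr, fixed_xi_db=15.0, prior_h1=0.5):
    """Posterior probability of speech presence for one time-frequency point.

    a_posteriori_snr: |Y|^2 / noise power (linear scale).
    fixed_xi_db:      fixed a priori SNR under speech presence (assumed value).
    prior_h1:         fixed prior probability of speech presence (assumed value).
    """
    xi = 10.0 ** (fixed_xi_db / 10.0)           # dB -> linear
    prior_ratio = (1.0 - prior_h1) / prior_h1   # P(H0) / P(H1)
    # Likelihood ratio of two zero-mean complex Gaussians whose variances
    # are sigma_N^2 (speech absent) and sigma_N^2 * (1 + xi) (speech present).
    exponent = -a_posteriori_snr * xi / (1.0 + xi)
    return 1.0 / (1.0 + prior_ratio * (1.0 + xi) * math.exp(exponent))
```

Because `xi` and `prior_h1` are constants, the result depends on the observation only through `a_posteriori_snr`: values near 1 (noise only) give a probability close to zero, and large values give a probability close to one.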
IEEE Signal Processing Letters | 2007
Colin Breithaupt; Timo Gerkmann; Rainer Martin
Many speech enhancement algorithms that modify short-term spectral magnitudes of the noisy signal by means of adaptive spectral gain functions are plagued by annoying spectral outliers. In this letter, we propose cepstral smoothing as a solution to this problem. We show that cepstral smoothing can effectively prevent spectral peaks of short duration that may be perceived as musical noise. At the same time, cepstral smoothing preserves speech onsets, plosives, and quasi-stationary narrowband structures like voiced speech. The proposed recursive temporal smoothing is applied to higher cepstral coefficients only, excluding those representing the pitch information. As the higher cepstral coefficients describe the finer spectral structure of the Fourier spectrum, smoothing them along time prevents single coefficients of the filter function from changing excessively and independently of their neighboring bins, thus suppressing musical noise. The proposed cepstral smoothing technique is very effective in nonstationary noise.
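A minimal sketch of recursive temporal smoothing in the cepstral domain is given here. It assumes that the cepstrum is taken from the logarithm of a symmetric, full-length gain vector, that low quefrencies and an optional pitch bin are left unsmoothed, and that all other coefficients share one smoothing constant. The function names, the floor, and the constants are hypothetical; a naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT; adequate for short illustrative frames."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def smoothing_constants(n, low_cutoff, pitch_bin=None, beta_high=0.8):
    """Per-quefrency smoothing constants (assumed values).

    Coefficients below low_cutoff (spectral envelope) and around pitch_bin
    are not smoothed; all remaining coefficients get beta_high.
    """
    betas = []
    for q in range(n):
        qs = min(q, n - q)  # the real cepstrum is symmetric in q
        near_pitch = pitch_bin is not None and abs(qs - pitch_bin) <= 1
        betas.append(0.0 if qs < low_cutoff or near_pitch else beta_high)
    return betas


def cepstral_smooth_gain(gain, prev_cepstrum, betas, floor=1e-3):
    """Smooth one frame of a spectral gain recursively over time.

    gain must be symmetric (gain[k] == gain[n - k]) so its cepstrum is real.
    Returns the smoothed gain and the smoothed cepstrum for the next frame.
    """
    log_gain = [math.log(max(g, floor)) for g in gain]
    cepstrum = [c.real for c in dft(log_gain, inverse=True)]
    smoothed = [b * p + (1.0 - b) * c
                for b, p, c in zip(betas, prev_cepstrum, cepstrum)]
    log_smoothed = [v.real for v in dft(smoothed)]
    return [math.exp(v) for v in log_smoothed], smoothed
```

In this sketch, an isolated peak in a single frequency bin spreads over many high cepstral coefficients, so the recursive averaging attenuates it, while the unsmoothed low-quefrency and pitch coefficients follow the current frame immediately.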
International Conference on Acoustics, Speech, and Signal Processing | 2008
Colin Breithaupt; Martin Krawczyk; Rainer Martin
The enhancement of short-term spectra of noisy speech can be achieved by statistical estimation of the clean speech spectral components. We present a minimum mean-square error estimator of the clean speech spectral magnitude that uses both a parametric compression function in the estimation error criterion and a parametric prior distribution for the statistical model of the clean speech magnitude. The novel parametric estimator includes many known magnitude estimators as special cases and, additionally, affords estimators that combine the beneficial properties of different known solutions. The new estimator is evaluated in terms of segmental SNR, speech distortion, and noise suppression.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Colin Breithaupt; Rainer Martin
Because of their many applications and their relative ease of implementation, single-channel speech enhancement algorithms have received much attention. As a consequence, a vast number of publications on estimation procedures and their implementation in noise reduction systems exist. However, there has been little systematic research on the theoretical performance of such estimators. In this paper, we provide a systematic analysis of the performance of noise reduction algorithms in low signal-to-noise ratio (SNR) and transient conditions, where we consider approaches using the well-known decision-directed SNR estimator. We show that the smoothing properties of the decision-directed SNR estimator in low SNR conditions can be analytically described and that the limits of noise reduction for widely used spectral speech estimators based on the decision-directed approach can be predicted. We also illustrate that achieving both a good preservation of speech onsets in transient conditions on the one hand and the suppression of musical noise on the other can be especially problematic when the decision-directed SNR estimation is used.
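For orientation, the decision-directed a priori SNR estimator can be sketched for a single frequency bin as follows. The sketch assumes a Wiener gain as the spectral speech estimator, a weighting factor of 0.98, and a lower limit of -25 dB on the estimate. These choices and the function name are illustrative assumptions, not settings taken from the paper.

```python
def decision_directed_snr(a_posteriori_snrs, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Track the a priori SNR of one frequency bin over successive frames.

    a_posteriori_snrs: sequence of |Y|^2 / noise power per frame (linear).
    alpha:             weight of the previous clean speech estimate (assumed).
    xi_min:            lower limit on the a priori SNR (assumed).
    """
    xi_track = []
    prev_speech_to_noise = 0.0  # |A_hat(l-1)|^2 / noise power
    for gamma in a_posteriori_snrs:
        # Weighted sum of the previous frame's estimate and the
        # maximum likelihood estimate max(gamma - 1, 0) of this frame.
        xi = alpha * prev_speech_to_noise + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)  # Wiener gain, used here as the speech estimator
        prev_speech_to_noise = gain * gain * gamma
        xi_track.append(xi)
    return xi_track
```

Feeding the function a sequence that jumps from 1 to 100 shows the behaviour the abstract refers to: in the first frame after the jump the estimate is dominated by the small `(1 - alpha)` term, and it only rises to a large value one frame later, which is what delays speech onsets.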
International Conference on Acoustics, Speech, and Signal Processing | 2008
Nilesh Madhu; Colin Breithaupt; Rainer Martin
This contribution details the development of a mask-based post-processor to improve the interference suppression in speech signals separated using linear deconvolution algorithms like independent component analysis (ICA). The design of the proposed post-filter is in two stages: in the first stage, use is made of the disjointness of the separated signals in the time-frequency domain to obtain binary masks to suppress cross-talk that generally remains after separation. In the next stage, a novel smoothing of the masks is proposed that preserves the speech structure of the target source while eliminating the random peaks in the time-frequency plane that lead to fluctuating background noise. The result is an enhanced signal with reduced cross-talk and no musical noise.
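The first stage can be pictured with a small sketch for two separated signals. It assumes that a time-frequency point is assigned to whichever output has the larger magnitude there, optionally by a margin. The function name, the `margin` parameter, and the decision rule itself are illustrative assumptions, and the mask smoothing of the second stage is not shown.

```python
def binary_masks(mag_a, mag_b, margin=1.0):
    """Binary time-frequency masks for two separated signals.

    mag_a, mag_b: magnitude spectrograms as lists of frames, each frame a
                  list of per-bin magnitudes, with identical shapes.
    margin:       factor by which one output must dominate (assumed).
    """
    mask_a = [[1.0 if a > margin * b else 0.0 for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(mag_a, mag_b)]
    mask_b = [[1.0 if b > margin * a else 0.0 for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(mag_a, mag_b)]
    return mask_a, mask_b
```

Multiplying each separated spectrogram by its own mask zeroes the points where the other source dominates. Such hard decisions produce isolated peaks in the time-frequency plane, which is the problem the second stage of the post-filter addresses.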
International Conference on Acoustics, Speech, and Signal Processing | 2003
Colin Breithaupt; Rainer Martin
We present two minimum mean square error (MMSE) frequency domain estimators of the squared magnitude of a clean speech signal that is degraded by additive noise. These estimators are derived under the assumption that the DFT (discrete Fourier transform) coefficients of the clean speech are best modelled by the Gamma probability distribution function (PDF) instead of the common Gaussian PDF. The perturbing noise is modelled by the Gaussian PDF in one case and by the Laplacian PDF in the other. The estimators are used as noise reduction filters in the experimental evaluation. We give a comparison with a previously derived estimator which uses the Gaussian PDF as the PDF for speech and noise coefficients.
Archive | 2003
Rainer Martin; Colin Breithaupt
Archive | 2008
Timo Gerkmann; Colin Breithaupt; Rainer Martin
Voice Communication (SprachKommunikation), 2008 ITG Conference on | 2011
Colin Breithaupt; Rainer Martin