Martin Krawczyk-Becker
University of Hamburg
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Martin Krawczyk-Becker.
IEEE Signal Processing Magazine | 2015
Timo Gerkmann; Martin Krawczyk-Becker; Jonathan Le Roux
With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and also more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones, expect their devices to work robustly anywhere and at any time. This holds in particular for challenging noisy environments like a cafeteria, a restaurant, a subway, a factory, or in traffic. One way to making assisted listening devices robust to noise is to apply speech enhancement algorithms. To improve the corrupted speech, spatial diversity can be exploited by a constructive combination of microphone signals (so-called beamforming), and by exploiting the different spectro?temporal properties of speech and noise. Here, we focus on single-channel speech enhancement algorithms which rely on spectrotemporal properties. On the one hand, these algorithms can be employed when the miniaturization of devices only allows for using a single microphone. On the other hand, when multiple microphones are available, single-channel algorithms can be employed as a postprocessor at the output of a beamformer. To exploit the short-term stationary properties of natural sounds, many of these approaches process the signal in a time-frequency representation, most frequently the short-time discrete Fourier transform (STFT) domain. In this domain, the coefficients of the signal are complex-valued, and can therefore be represented by their absolute value (referred to in the literature both as STFT magnitude and STFT amplitude) and their phase. While the modeling and processing of the STFT magnitude has been the center of interest in the past three decades, phase has been largely ignored.
Trends in hearing | 2015
Regina M. Baumgärtel; Martin Krawczyk-Becker; Daniel Marquardt; Christoph Völker; Hongmei Hu; Tobias Herzke; Graham Coleman; Kamil Adiloglu; Stephan M. A. Ernst; Timo Gerkmann; Simon Doclo; Birger Kollmeier; Volker Hohmann; Mathias Dietz
In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios.
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Martin Krawczyk-Becker; Timo Gerkmann
Among the most commonly used single-channel approaches for the enhancement of noise corrupted speech are Bayesian estimators of clean speech coefficients in the short-time Fourier transform domain. However, the vast majority of these approaches effectively only modifies the spectral amplitude and does not consider any information about the clean speech spectral phase. More recently, clean speech estimators that can utilize prior phase information have been proposed and shown to lead to improvements over the traditional, phase-blind approaches. In this work, we revisit phase-aware estimators of clean speech amplitudes and complex coefficients. To complete the existing set of estimators, we first derive a novel amplitude estimator given uncertain prior phase information. Second, we derive a closed-form solution for complex coefficients when the prior phase information is completely uncertain or not available. We put the novel estimators into the context of existing estimators and discuss their advantages and disadvantages.
Trends in hearing | 2015
Regina M. Baumgärtel; Hongmei Hu; Martin Krawczyk-Becker; Daniel Marquardt; Tobias Herzke; Graham Coleman; Kamil Adiloglu; Katrin Bomke; Karsten Plotz; Timo Gerkmann; Simon Doclo; Birger Kollmeier; Volker Hohmann; Mathias Dietz
Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users.
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Martin Krawczyk-Becker; Timo Gerkmann
Conventional statistical clean speech estimators, like the Wiener filter, are frequently used for the spectro-temporal enhancement of noise corrupted speech. Most of these approaches estimate the clean speech independently for each time-frequency point, neglecting the structure of the underlying speech sound. In this work, we derive a statistical estimator that explicitly takes into account information about the characteristic structure of voiced speech by means of a harmonic signal model. To this end, we also present a way to estimate a harmonic model-based clean speech representation and the corresponding error variance directly in the short-time Fourier transform domain. The resulting estimator is optimal in the minimum-mean-squared error sense and can conveniently be formulated in terms of a multichannel Wiener filter. The proposed estimator outperforms several reference algorithms in terms of speech quality and intelligibility as predicted by instrumental measures.
international conference on acoustics, speech, and signal processing | 2015
Martin Krawczyk-Becker; Dörte Fischer; Timo Gerkmann
For the enhancement of speech degraded by noise, accurate estimation of the noise power spectral density (PSD) is indispensable, especially if only a single microphone signal is available. Fast and accurate tracking of the noise PSD is particularly challenging in highly non-stationary noise types, since the distinction between speech and noise components becomes more difficult. Short-time discrete Fourier transform (STFT) based noise PSD estimation algorithms which employ estimates of the speech presence probability (SPP) with fixed priors have been shown to yield good tracking performance even in adverse noise conditions. In this paper, we compare two methods to incorporate spectro-temporal correlations to improve the tracking performance. The first method smoothes the noisy observation over time and frequency before computing the SPP, while the second is based on a Hidden Markov Model (HMM) of the speech presence and absence states. We show that the proposed modifications lead to improved noise PSD estimators which are less sensitive to spectral outliers of the noise and track changes in the noise PSD more quickly than the reference method. Further, when employed in a common speech enhancement setup, the proposed estimators achieve an increased noise reduction while keeping speech distortions at a comparable level.
Journal of the Acoustical Society of America | 2016
Martin Krawczyk-Becker; Timo Gerkmann
For the enhancement of single-channel speech corrupted by acoustic noise, recently short-time Fourier transform domain clean speech estimators were proposed that incorporate prior information about the clean speech spectral phase. Instrumental measures predict quality improvements for the phase-aware estimators over their conventional phase-blind counterparts. In this letter, these predictions are verified by means of listening experiments. The phase-aware amplitude estimator on average achieves a stronger noise reduction and is significantly preferred over its phase-blind counterpart in a pairwise comparison even if the clean spectral phase is estimated blindly on the noisy signal.
workshop on applications of signal processing to audio and acoustics | 2015
Martin Krawczyk-Becker; Timo Gerkmann
For the reduction of additive acoustic noise, various methods and clean speech estimators are available, with specific strengths and weaknesses. In order to combine the strengths of two such approaches, we derive a minimum mean squared error (MMSE)-optimal estimator of the clean speech given two independent initial clean speech estimates. As an example we present a specific combination that results in a weighted mixture of the Wiener filter and a simple, low-cost harmonic speech model. The proposed estimator benefits from the additional information provided by the harmonic model, leading to a better protection of harmonic components of voiced speech as compared to the traditional Wiener filter. Instrumental measures predict improvements in speech quality and speech intelligibility for the proposed combination over each individual estimator.
international conference on latent variable analysis and signal separation | 2018
Martin Krawczyk-Becker; Timo Gerkmann
In recent years, there has been a renaissance of research on the role of the spectral phase in single-channel speech enhancement. One of the recent proposals is to not only estimate the clean speech phase but also use this phase estimate as an additional source of information to facilitate the estimation of the clean speech magnitude. To assess the potential benefit of such approaches, in this paper we systematically explore in which situations additional information about the clean speech phase is most valuable. For this, we compare the performance of phase-aware and phase-blind clean speech estimators in different noise scenarios, i.e. at different signal to noise ratios (SNRs) and for noise sources with different degrees of stationarity. Interestingly, the results indicate that the greatest benefits can be achieved in situations where conventional magnitude-only speech enhancement is most challenging, namely in highly non-stationary noises at low SNRs.
arXiv: Sound | 2017
Huy Phan; Martin Krawczyk-Becker; Timo Gerkmann; Alfred Mertins