Network

Richard C. Hendriks's latest external collaborations at the country level.

Hotspot

Research topics in which Richard C. Hendriks is active.

Publication


Featured research published by Richard C. Hendriks.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech

Cees H. Taal; Richard C. Hendriks; Richard Heusdens; Jesper Jensen

In the development process of noise-reduction algorithms, an objective machine-driven intelligibility measure which shows high correlation with speech intelligibility is of great interest. Besides reducing time and costs compared to real listening experiments, an objective intelligibility measure could also help provide answers on how to improve the intelligibility of noisy unprocessed speech. In this paper, a short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) of three different listening experiments. In general, STOI showed better correlation with speech intelligibility compared to five other reference objective intelligibility models. In contrast to other conventional intelligibility models which tend to rely on global statistics across entire sentences, STOI is based on shorter time segments (386 ms). Experiments indeed show that it is beneficial to take segment lengths of this order into account. In addition, a free Matlab implementation is provided.
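
To make the short-time correlation idea concrete, below is a minimal, hypothetical Python sketch of the core computation: correlating short (roughly 400 ms) segments of clean and degraded one-third-octave-band envelopes. It is not the authors' implementation; the band analysis, clipping, and normalization steps of the published measure are omitted, and the segment length and frame rate are assumptions.

```python
import numpy as np


def short_time_correlation(clean_env, proc_env, seg_len=30):
    """Average linear correlation over non-overlapping envelope segments.

    `clean_env` and `proc_env` are arrays of shape (bands, frames) holding
    one-third-octave-band temporal envelopes (assumed to be precomputed).
    With frames roughly 13 ms apart, seg_len=30 corresponds to ~400 ms.
    """
    scores = []
    for start in range(0, clean_env.shape[1] - seg_len + 1, seg_len):
        x = clean_env[:, start:start + seg_len]
        y = proc_env[:, start:start + seg_len]
        x = x - x.mean(axis=1, keepdims=True)   # zero-mean per band
        y = y - y.mean(axis=1, keepdims=True)
        num = np.sum(x * y, axis=1)
        den = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1) + 1e-12
        scores.append(np.mean(num / den))        # average over bands
    return float(np.mean(scores))                # average over segments
```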


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay

Timo Gerkmann; Richard C. Hendriks

Recently, it has been proposed to estimate the noise power spectral density by means of minimum mean-square error (MMSE) optimal estimation. We show that the resulting estimator can be interpreted as a voice activity detector (VAD)-based noise power estimator, where the noise power is updated only when speech absence is signaled, compensated with a required bias compensation. We show that the bias compensation is unnecessary when we replace the VAD by a soft speech presence probability (SPP) with fixed priors. Choosing fixed priors also has the benefit of decoupling the noise power estimator from subsequent steps in a speech enhancement framework, such as the estimation of the speech power and the estimation of the clean speech. We show that the proposed speech presence probability (SPP) approach maintains the quick noise tracking performance of the bias compensated minimum mean-square error (MMSE)-based approach while exhibiting less overestimation of the spectral noise power and an even lower computational complexity.
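
A minimal sketch of the kind of soft-decision update described above, assuming a fixed a priori SNR of 15 dB under speech presence and equal priors; the smoothing constant and the safeguards of the published estimator are simplified, illustrative choices.

```python
import numpy as np

XI_H1 = 10 ** (15 / 10)   # fixed a priori SNR under speech presence (assumed 15 dB)
ALPHA = 0.8               # recursive smoothing constant (assumed)


def update_noise_power(noisy_periodogram, noise_power_prev):
    """One-bin, one-frame soft update of the spectral noise power estimate."""
    post_snr = noisy_periodogram / max(noise_power_prev, 1e-12)
    # Posterior speech presence probability under a complex Gaussian model
    # with equal prior probabilities of speech presence and absence.
    spp = 1.0 / (1.0 + (1.0 + XI_H1) * np.exp(-post_snr * XI_H1 / (1.0 + XI_H1)))
    # Weigh the noisy periodogram by the speech-absence probability.
    noise_periodogram_est = (1.0 - spp) * noisy_periodogram + spp * noise_power_prev
    return ALPHA * noise_power_prev + (1.0 - ALPHA) * noise_periodogram_est
```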


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors

Jan S. Erkelens; Richard C. Hendriks; Richard Heusdens; Jesper Jensen

This paper considers techniques for single-channel speech enhancement based on the discrete Fourier transform (DFT). Specifically, we derive minimum mean-square error (MMSE) estimators of speech DFT coefficient magnitudes as well as of complex-valued DFT coefficients based on two classes of generalized gamma distributions, under an additive Gaussian noise assumption. The resulting generalized DFT magnitude estimator has as a special case the existing scheme based on a Rayleigh speech prior, while the complex DFT estimators generalize existing schemes based on Gaussian, Laplacian, and Gamma speech priors. Extensive simulation experiments with speech signals degraded by various additive noise sources verify that significant improvements are possible with the more recent estimators based on super-Gaussian priors. The increase in perceptual evaluation of speech quality (PESQ) over the noisy signals is about 0.5 points for street noise and about 1 point for white noise, nearly independent of input signal-to-noise ratio (SNR). The assumptions made for deriving the complex DFT estimators are less accurate than those for the magnitude estimators, leading to a higher maximum achievable speech quality with the magnitude estimators.
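
As an illustration, the Rayleigh-prior special case mentioned above is the classical MMSE short-time spectral amplitude gain, which can be written down compactly; a hedged sketch follows. The generalized-gamma estimators derived in the paper involve confluent hypergeometric functions and are not reproduced here.

```python
import numpy as np
from scipy.special import i0e, i1e  # exponentially scaled modified Bessel functions


def mmse_stsa_gain(xi, gamma):
    """Classical MMSE short-time spectral amplitude gain (Rayleigh speech prior).

    `xi` is the a priori SNR and `gamma` the a posteriori SNR of a DFT bin;
    the enhanced magnitude is this gain times the noisy magnitude. The scaled
    Bessel functions i0e/i1e absorb the exp(-v/2) factor of the textbook formula.
    """
    v = xi / (1.0 + xi) * gamma
    return (np.sqrt(np.pi) / 2.0) * (np.sqrt(v) / gamma) * \
        ((1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0))
```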


International Conference on Acoustics, Speech, and Signal Processing | 2010

MMSE based noise PSD tracking with low complexity

Richard C. Hendriks; Richard Heusdens; Jesper Jensen

Most speech enhancement algorithms heavily depend on the noise power spectral density (PSD). Because this quantity is unknown in practice, estimation from the noisy data is necessary. We present a low-complexity method for noise PSD estimation. The algorithm is based on a minimum mean-squared error estimator of the noise magnitude-squared DFT coefficients. Compared to minimum-statistics-based noise tracking, segmental SNR and PESQ are improved for non-stationary noise sources by 1 dB and 0.25 MOS points, respectively. Compared to recently published algorithms, similarly good noise tracking performance is obtained, but at a computational complexity roughly a factor of 40 lower.
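
A hedged sketch of the central quantity, the MMSE estimate of the noise periodogram in one DFT bin under complex Gaussian speech and noise assumptions; the bias compensation and smoothing of the published tracker are omitted, and the a priori SNR is assumed to be given.

```python
def mmse_noise_periodogram(noisy_periodogram, noise_psd_prev, xi):
    """E[|N|^2 | Y] in one DFT bin for a priori SNR `xi`.

    Assumes complex Gaussian speech and noise coefficients; `noise_psd_prev`
    is the noise PSD estimate carried over from the previous frame.
    """
    h = 1.0 / (1.0 + xi)          # gain mapping the noisy coefficient to E[N | Y]
    return h ** 2 * noisy_periodogram + xi / (1.0 + xi) * noise_psd_prev
```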


International Conference on Acoustics, Speech, and Signal Processing | 2010

A short-time objective intelligibility measure for time-frequency weighted noisy speech

Cees H. Taal; Richard C. Hendriks; Richard Heusdens; Jesper Jensen

Existing objective speech-intelligibility measures are suitable for several types of degradation; however, they turn out to be less appropriate for methods in which noisy speech is processed by a time-frequency (TF) weighting, e.g., noise reduction and speech separation. In this paper, we present an objective intelligibility measure which shows high correlation (rho = 0.95) with the intelligibility of both noisy and TF-weighted noisy speech. The proposed method performs significantly better than three other, more sophisticated, objective measures. Furthermore, it is based on an intermediate intelligibility measure for short-time (approximately 400 ms) TF regions and uses a simple DFT-based TF decomposition. In addition, a free Matlab implementation is provided.


Workshop on Applications of Signal Processing to Audio and Acoustics | 2011

Noise power estimation based on the probability of speech presence

Timo Gerkmann; Richard C. Hendriks

In this paper, we analyze the minimum mean square error (MMSE) based spectral noise power estimator [1] and present an improvement. We show that the MMSE-based spectral noise power estimate is only updated when the a posteriori signal-to-noise ratio (SNR) is lower than one. This threshold on the a posteriori SNR can be interpreted as a voice activity detector (VAD). In this work, we propose to replace the hard decision of the VAD by a soft speech presence probability (SPP). We show that, by doing so, the proposed estimator does not require the bias correction and safety-net needed by the MMSE estimator presented in [1]. At the same time, the proposed estimator maintains the quick noise tracking capability characteristic of the MMSE noise tracker, results in less noise power overestimation, and is computationally less expensive.
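
For contrast with the soft update sketched for the 2012 journal version above, the hard, VAD-like behaviour analyzed here boils down to an update that only fires when the a posteriori SNR is below a threshold (one, according to the abstract). A minimal, illustrative snippet:

```python
def vad_like_update(noisy_periodogram, noise_power_prev, threshold=1.0):
    """Update the noise power only when the a posteriori SNR is below `threshold`."""
    post_snr = noisy_periodogram / max(noise_power_prev, 1e-12)
    return noisy_periodogram if post_snr < threshold else noise_power_prev
```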


Synthesis Lectures on Speech and Audio Processing | 2013

DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement: A Survey of the State of the Art

Richard C. Hendriks; Timo Gerkmann; Jesper Jensen

As speech processing devices like mobile phones, voice-controlled devices, and hearing aids have increased in popularity, people expect them to work anywhere and at any time without user intervention. However, the presence of acoustical disturbances limits the use of these applications, degrades their performance, or makes it difficult for the user to understand the conversation or to benefit from the device. A common way to reduce the effects of such disturbances is through the use of single-microphone noise reduction algorithms for speech enhancement. The field of single-microphone noise reduction for speech enhancement comprises a history of more than 30 years of research. In this survey, we wish to demonstrate the significant advances that have been made during the last decade in the field of discrete Fourier transform domain-based single-channel noise reduction for speech enhancement. Furthermore, our goal is to provide a concise description of a state-of-the-art speech enhancement system, and to demonstrate the relative importance of the various building blocks of such a system. This allows the non-expert DSP practitioner to judge the relevance of each building block and to implement a close-to-optimal enhancement system for the particular application at hand. Table of Contents: Introduction / Single Channel Speech Enhancement: General Principles / DFT-Based Speech Enhancement Methods: Signal Model and Notation / Speech DFT Estimators / Speech Presence Probability Estimation / Noise PSD Estimation / Speech PSD Estimation / Performance Evaluation Methods / Simulation Experiments with Single-Channel Enhancement Systems / Future Directions
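
To make the structure of such a system concrete, below is a minimal, illustrative sketch of a DFT-domain single-microphone enhancement chain: STFT analysis, a crude noise PSD estimate, a decision-directed a priori SNR estimate, a Wiener gain, and overlap-add synthesis. The noise PSD is taken from the first few frames, which assumes they are speech-free; a real system would use one of the noise trackers discussed on this page. All parameter values are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft


def enhance(noisy, fs, nperseg=512, alpha=0.98, noise_frames=10):
    """Wiener-gain enhancement of a single-channel signal `noisy` sampled at `fs`."""
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg)
    # Crude noise PSD estimate from the first frames (assumed speech-free).
    noise_psd = np.mean(np.abs(Y[:, :noise_frames]) ** 2, axis=1)

    S_hat = np.zeros_like(Y)
    prev_clean_psd = noise_psd.copy()
    for n in range(Y.shape[1]):
        gamma = np.abs(Y[:, n]) ** 2 / np.maximum(noise_psd, 1e-12)
        # Decision-directed a priori SNR estimate.
        xi = alpha * prev_clean_psd / np.maximum(noise_psd, 1e-12) \
            + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
        gain = xi / (1.0 + xi)                       # Wiener gain
        S_hat[:, n] = gain * Y[:, n]
        prev_clean_psd = np.abs(S_hat[:, n]) ** 2
    _, enhanced = istft(S_hat, fs=fs, nperseg=nperseg)
    return enhanced
```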


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Noise Tracking Using DFT Domain Subspace Decompositions

Richard C. Hendriks; Jesper Jensen; Richard Heusdens

All discrete Fourier transform (DFT) domain-based speech enhancement gain functions rely on knowledge of the noise power spectral density (PSD). Since the noise PSD is unknown in advance, estimation from the noisy speech signal is necessary. An overestimation of the noise PSD will lead to a loss in speech quality, while an underestimation will lead to an unnecessarily high level of residual noise. We present a novel approach for noise tracking, which updates the noise PSD for each DFT coefficient in the presence of both speech and noise. The method is based on the eigenvalue decomposition of correlation matrices that are constructed from time series of noisy DFT coefficients. The presented method is well suited to tracking gradually changing noise types. In comparison to state-of-the-art noise tracking algorithms, the proposed method reduces the estimation error between the estimated and the true noise PSD. In combination with an enhancement system, the proposed method improves the segmental SNR by several decibels for gradually changing noise types. Listening experiments show that the proposed system is preferred over a state-of-the-art noise tracking algorithm.
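
A loose, hypothetical sketch of the idea: per DFT bin, a correlation matrix is built from a short time series of noisy DFT coefficients and the noise power is read off the noise subspace (the smallest eigenvalues). The vector dimension, the assumed rank of the speech subspace, and the absence of any bias handling are illustrative simplifications, not the settings of the published algorithm.

```python
import numpy as np


def noise_psd_from_subspace(dft_series, speech_rank=1, frame_dim=8):
    """Noise power estimate for one DFT bin from a 1-D complex time series."""
    # Stack delayed vectors of successive noisy DFT coefficients.
    frames = np.array([dft_series[i:i + frame_dim]
                       for i in range(len(dft_series) - frame_dim + 1)])
    corr = frames.conj().T @ frames / frames.shape[0]   # sample correlation matrix
    eigvals = np.linalg.eigvalsh(corr)                   # real, ascending order
    # Average the eigenvalues attributed to the noise subspace.
    return float(np.mean(eigvals[:frame_dim - speech_rank]))
```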


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Noise Correlation Matrix Estimation for Multi-Microphone Speech Enhancement

Richard C. Hendriks; Timo Gerkmann

For multi-channel noise reduction algorithms like the minimum variance distortionless response (MVDR) beamformer or the multi-channel Wiener filter, an estimate of the noise correlation matrix is needed. For its estimation, it is often proposed in the literature to use a voice activity detector (VAD). However, with a VAD the estimated matrix can only be updated during speech absence. As a result, during speech presence the noise correlation matrix estimate does not follow changing noise fields with sufficient accuracy. This effect is aggravated by the fact that voice activity detection in nonstationary noise is a rather difficult task, so false alarms are likely to occur. In this paper, we present and analyze an algorithm that estimates the noise correlation matrix without using a VAD. The algorithm is based on measuring the correlation between the noisy input and a noise reference, which can be obtained, e.g., by steering a null towards the target source. When applied in combination with an MVDR beamformer, the proposed noise correlation matrix estimate is shown to result in a more accurate beamformer response, a larger signal-to-noise ratio improvement, and higher instrumentally predicted speech intelligibility compared to competing approaches such as the generalized sidelobe canceler, a VAD-based MVDR beamformer, and an MVDR beamformer based on the noisy correlation matrix.
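
For context, a short sketch of how an estimated noise correlation matrix is typically used: the classical MVDR beamformer weights for a steering vector d and noise correlation matrix Rn. The small diagonal loading is an illustrative safeguard; the matrix estimation method of the paper itself is not reproduced here.

```python
import numpy as np


def mvdr_weights(Rn, d, loading=1e-6):
    """MVDR weights w = Rn^{-1} d / (d^H Rn^{-1} d), with diagonal loading."""
    dim = Rn.shape[0]
    Rn_loaded = Rn + loading * (np.trace(Rn).real / dim) * np.eye(dim)
    Rn_inv_d = np.linalg.solve(Rn_loaded, d)
    return Rn_inv_d / (d.conj() @ Rn_inv_d)
```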


International Conference on Acoustics, Speech, and Signal Processing | 2012

A speech preprocessing strategy for intelligibility improvement in noise based on a perceptual distortion measure

Cees H. Taal; Richard C. Hendriks; Richard Heusdens

A speech pre-processing algorithm is presented to improve speech intelligibility in noise for the near-end listener. The algorithm improves intelligibility by optimally redistributing the speech energy over time and frequency according to a perceptual distortion measure based on a spectro-temporal auditory model. In contrast to spectral-only models, short-time information is taken into account. As a consequence, the algorithm is more sensitive to transient regions, which therefore receive more amplification than stationary vowels. It is known from the literature that changing the vowel-transient energy ratio is beneficial for improving speech intelligibility in noise. Objective intelligibility predictions show that the proposed method yields higher speech intelligibility in noise than two reference methods, without modifying the global speech energy.
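
As a toy illustration of the global-energy constraint mentioned above, the snippet below rescales an arbitrary set of per-time-frequency amplification weights (standing in for the perceptually optimized redistribution, which is not reproduced here) so that the total speech energy is left unchanged.

```python
import numpy as np


def energy_preserving_redistribution(speech_tf_energy, weights):
    """Rescale amplitude `weights` so the total (weighted) energy is unchanged.

    `speech_tf_energy` holds the per-time-frequency speech energies and
    `weights` the proposed per-unit amplitude gains.
    """
    scale = np.sqrt(np.sum(speech_tf_energy) /
                    np.sum(weights ** 2 * speech_tf_energy))
    return weights * scale
```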

Collaboration


An overview of Richard C. Hendriks's collaborations.

Top Co-Authors

Richard Heusdens (Delft University of Technology)
Cees H. Taal (Delft University of Technology)
W. Bastiaan Kleijn (Victoria University of Wellington)
Yuan Zeng (Delft University of Technology)
Ulrik Kjems (Technical University of Denmark)
Andreas I. Koutrouvelis (Delft University of Technology)
Jan S. Erkelens (Delft University of Technology)
Joao B. Crespo (Delft University of Technology)