Publication


Featured research published by H. Van Hamme.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Robust speech recognition using cepstral domain missing data techniques and noisy masks

H. Van Hamme

Missing data techniques (MDT) have been shown to be an effective method for curing the performance degradation of HMM-based speech recognition systems operating on noisy signals. However, a major drawback of the approach is that MDT requires that the acoustic model be expressed as a mixture of diagonal Gaussians in the log-spectral domain, whereas a higher accuracy can be obtained with Gaussian mixtures in the cepstral domain. The paper describes a recognizer based on the recently described cepstral-domain MDT approach using missing data masks computed from the noisy signal. It exploits a novel decision criterion that integrates harmonicity with signal-to-noise ratio and which makes minimal assumptions on the noise. The system is shown to exhibit a recognition accuracy that is comparable to the ETSI advanced front-end reference.


IEEE Signal Processing Letters | 2008

Discovering Phone Patterns in Spoken Utterances by Non-Negative Matrix Factorization

Veronique Stouten; Kris Demuynck; H. Van Hamme

We present a technique to automatically discover the (word-sized) phone patterns that are present in speech utterances. These patterns are learnt from a set of phone lattices generated from the utterances. Just like children acquiring language, our system does not have prior information on what the meaningful patterns are. By applying the non-negative matrix factorization algorithm to a fixed-length, high-dimensional vector representation of the speech utterances, a decomposition in terms of additive units is obtained. We illustrate that these units correspond to words in the case of a small-vocabulary task. Our result also raises questions about whether explicit segmentation and clustering are needed in an unsupervised learning context.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Robust speech recognition using missing data techniques in the prospect domain and fuzzy masks

M. van Segbroeck; H. Van Hamme

Missing data theory (MDT) has been applied to handle the problem of noise-robust speech recognition. Conventional MDT systems require acoustic models that are expressed in the log-spectral rather than in the cepstral domain, which leads to a loss in accuracy. Therefore, we have already introduced an MDT technique that can be applied in any feature domain that is a linear transform of log-spectra. This MDT system requires hard decisions about the reliability of each spectral component. When the mask is computed from noisy data, misclassification errors are hardly avoidable and the recognition rate degrades significantly. The risk of misclassification can be reduced by estimating a probability that the component is reliable, i.e., a fuzzy mask. In this paper, we extend our MDT system to this probabilistic decision framework. Experiments on the Aurora2 database demonstrate a further increase in recognition accuracy, especially at low SNRs.
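
The difference between a hard and a fuzzy mask can be illustrated with a minimal sketch. The logistic mapping from a local SNR estimate to a reliability probability, and the names `hard_mask`, `fuzzy_mask`, `threshold_db` and `slope`, are assumptions chosen for illustration; the abstract does not state how the paper estimates the probabilities.

```python
import math

def hard_mask(snr_db, threshold_db=0.0):
    """Binary reliability: 1 if the local SNR estimate exceeds the threshold."""
    return [[1 if s > threshold_db else 0 for s in frame] for frame in snr_db]

def fuzzy_mask(snr_db, threshold_db=0.0, slope=0.5):
    """Soft reliability in (0, 1): a logistic function of the local SNR.

    The logistic shape is an assumed mapping, not the paper's estimator.
    """
    return [[1.0 / (1.0 + math.exp(-slope * (s - threshold_db))) for s in frame]
            for frame in snr_db]

# Hypothetical local SNR estimates in dB: 2 frames x 3 spectral components.
snr = [[-20.0, 0.5, 15.0], [-0.5, 3.0, 30.0]]

hard = hard_mask(snr)
soft = fuzzy_mask(snr)
for h_row, s_row in zip(hard, soft):
    print(h_row, [round(p, 3) for p in s_row])
```

Components with an SNR close to the threshold (0.5 dB and -0.5 dB here) receive opposite hard labels but nearly equal soft values, which is the situation in which a hard misclassification would be most costly.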


International Conference on Acoustics, Speech, and Signal Processing | 2004

Joint removal of additive and convolutional noise with model-based feature enhancement

Veronique Stouten; H. Van Hamme; Patrick Wambacq

In this paper we describe how we successfully extended the model-based feature enhancement (MBFE) algorithm to jointly remove additive and convolutional noise from corrupted speech. Although a model of the clean speech can incorporate prior knowledge into the feature enhancement process, this model no longer yields an accurate fit if a different microphone is used. To cure the resulting performance degradation, we merge a new iterative EM algorithm that estimates the channel with the MBFE algorithm that removes nonstationary additive noise. In the latter, the parameters of a shifted clean speech HMM and a noise HMM are first combined by a vector Taylor series approximation, and then the state-conditional MMSE estimates of the clean speech are calculated. Recognition experiments confirmed the superior performance on the Aurora4 recognition task. Compared to the Advanced Front-End standard, an average relative reduction in WER of 12% and 2.8% was obtained for clean and multi-condition training, respectively.


International Conference on Acoustics, Speech, and Signal Processing | 2005

Effect of phase-sensitive environment model and higher order VTS on noisy speech feature enhancement [speech recognition applications]

Veronique Stouten; H. Van Hamme; Patrick Wambacq

Model-based techniques for robust speech recognition often require the statistics of noisy speech. In this paper, we propose two modifications to obtain more accurate versions of the statistics of the combined HMM (starting from a clean speech and a noise model). Usually, the phase difference between speech and noise is neglected in the acoustic environment model. However, we show how a phase-sensitive environment model can be efficiently integrated in the context of multi-stream model-based feature enhancement and gives rise to more accurate covariance matrices for the noisy speech. Also, by expanding the vector Taylor series up to the second order term, an improved noisy speech mean can be obtained. Finally, we explain how the front-end clean speech model itself can be improved by a preprocessing of the training data. Recognition results on the Aurora4 database illustrate the effect on the noise robustness for each of these modifications.
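
The effect of the second-order Taylor term on the noisy speech mean can be sketched for a single log-spectral component. This is a simplified illustration under stated assumptions: it uses the common phase-insensitive model y = log(e^x + e^n) with independent scalar Gaussian speech x and noise n, not the phase-sensitive, cepstral-domain formulation of the paper. The function names are hypothetical.

```python
import math

def log_add(x, n):
    """Phase-insensitive environment model: y = log(exp(x) + exp(n))."""
    m = max(x, n)
    return m + math.log(math.exp(x - m) + math.exp(n - m))

def vts_noisy_mean(mu_x, var_x, mu_n, var_n, order=1):
    """Mean of y from a Taylor expansion around (mu_x, mu_n).

    The first-order term has zero expectation, so order 1 returns
    the model evaluated at the means. Order 2 adds half the trace of
    the Hessian weighted by the variances (x and n independent).
    """
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x))  # dy/dx at the expansion point
    mean = log_add(mu_x, mu_n)
    if order >= 2:
        # d2y/dx2 = d2y/dn2 = g * (1 - g); the cross term drops out.
        mean += 0.5 * g * (1.0 - g) * (var_x + var_n)
    return mean

def vts_noisy_var(mu_x, var_x, mu_n, var_n):
    """First-order approximation of the variance of y."""
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x))
    return g * g * var_x + (1.0 - g) * (1.0 - g) * var_n

print(vts_noisy_mean(0.0, 1.0, 0.0, 1.0, order=1))
print(vts_noisy_mean(0.0, 1.0, 0.0, 1.0, order=2))
```

The correction is largest where speech and noise have similar energy (g near 0.5) and vanishes where one of them dominates.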


2012 5th European DSP Education and Research Conference (EDERC) | 2012

Time-domain generalized cross correlation phase transform sound source localization for small microphone arrays

B. Van den Broeck; Alexander Bertrand; Peter Karsmakers; Bart Vanrumste; H. Van Hamme; Marc Moonen

Due to progress in hardware and software, applications based on sound enhancement are gaining popularity. However, such applications are often still limited by hardware cost, energy, and real-time constraints, which bound the available complexity. One task that often accompanies (multichannel) sound enhancement is the localization of the sound source. This paper focuses on implementing an accurate Sound Source Localizer (SSL) that estimates the position of a sound source on a digital signal processor using as few CPU resources as possible. One of the least complex algorithms for SSL is a simple correlation, implemented in the frequency domain for efficiency and combined with a frequency-bin weighting for robustness; together these are called Generalized Cross Correlation (GCC). One popular weighting, the PHAse Transform (GCC-PHAT), is considered here. The paper explains that, for small microphone arrays, this frequency-domain implementation is inferior to its time-domain alternative in terms of algorithmic complexity. Therefore, a time-domain PHAT equivalent is described. Both implementations are compared in terms of complexity (clock cycles needed on a Texas Instruments C5515 DSP) and obtained results, showing a complexity gain of a factor of 146 with hardly any loss in localization accuracy.
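
For reference, the conventional frequency-domain GCC-PHAT that the abstract uses as its baseline can be sketched as follows. This is not the paper's time-domain equivalent, whose details the abstract does not give. The sketch assumes circular correlation over one frame and uses a naive DFT to stay dependency-free; a practical implementation would use an FFT. The names `dft` and `gcc_phat_delay` and the test signal are illustrative.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def gcc_phat_delay(ref, sig, max_lag):
    """Estimate by how many samples `sig` lags `ref` (circular correlation)."""
    n = len(ref)
    X, Y = dft(ref), dft(sig)
    cross = [y * x.conjugate() for x, y in zip(X, Y)]
    # PHAT weighting: discard the magnitude and keep only the phase of each bin.
    phat = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    cc = [v.real for v in dft(phat, inverse=True)]
    # Search the correlation peak over the physically possible lags only.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])

# Hypothetical frame from microphone 1, and a copy delayed by 3 samples.
ref = [0.3, -1.2, 0.8, 2.1, -0.5, 0.0, 1.7, -2.2,
       0.9, 0.4, -1.1, 0.6, 1.3, -0.7, 0.2, -1.6]
sig = ref[-3:] + ref[:-3]

print(gcc_phat_delay(ref, sig, max_lag=5))  # prints 3
```

For a small array the set of possible lags is short, which is why restricting the search to `max_lag` matters for complexity.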


International Conference on Acoustics, Speech, and Signal Processing | 2006

Handling Time-Derivative Features in a Missing Data Framework for Robust Automatic Speech Recognition

H. Van Hamme

We present a novel approach to handling dynamic (time derivative or delta) features for automatic speech recognition using an HMM/GMM architecture and based on missing data techniques for noise robustness. The static and the dynamic features are imputed in the observations based on an acoustic model expressed in a domain that is a linear transform of the log-spectra and taking bounds into account. The reliability masks of the dynamic features are ternary. We describe a method for computing oracle masks for dynamic features. We also propose a simple method to derive dynamic masks from the reliability mask of the static features. We find that using bounds in the imputation is advantageous, both for oracle masks and for masks derived from the noisy observations.
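
As background, a delta feature is commonly computed as a regression over neighbouring static frames, so each delta depends on several static components at once and its reliability cannot be read off a single static mask entry. The sketch below shows this standard regression formula only. It is an assumption that the paper uses this exact variant, and the mask derivation itself is not reproduced here. The name `delta_features` and the edge handling (repeating the first and last frame) are illustrative choices.

```python
def delta_features(static, window=2):
    """Regression-based deltas over +/- `window` frames.

    delta[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..window.
    Frames beyond the edges are replaced by the first or last frame.
    """
    n_frames = len(static)
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    deltas = []
    for t in range(n_frames):
        frame = []
        for d in range(len(static[t])):
            num = sum(
                k * (static[min(t + k, n_frames - 1)][d] - static[max(t - k, 0)][d])
                for k in range(1, window + 1)
            )
            frame.append(num / denom)
        deltas.append(frame)
    return deltas

# A linear ramp has a slope of 1 per frame away from the edges.
ramp = [[float(t)] for t in range(6)]
print(delta_features(ramp, window=2))
```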


International Conference on Acoustics, Speech, and Signal Processing | 2006

Application of Minimum Statistics and Minima Controlled Recursive Averaging Methods to Estimate a Cepstral Noise Model for Robust ASR

Veronique Stouten; H. Van Hamme; Patrick Wambacq

Many compensation techniques, both in the model and feature domain, require an estimate of the noise statistics to compensate for the clean speech degradation in adverse environments. We explore how two spectral noise estimation approaches can be applied in the context of model-based feature enhancement. The minimum statistics (MS) method and the improved minima controlled recursive averaging (IMCRA) method are used to estimate the noise power spectrum based only on the noisy speech. The noise mean and variance estimates are nonlinearly transformed to the cepstral domain and used in the Gaussian noise model of MBFE. We show that the resulting system achieves an accuracy on the Aurora2 task that is comparable to MBFE with prior knowledge of the noise. Finally, this performance can be significantly improved when the MS or IMCRA noise mean is reestimated based on a clean speech model.
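
The core idea behind minimum-statistics noise tracking can be sketched for one frequency bin: smooth the noisy power over time and take the minimum over a sliding window, on the assumption that speech is absent in at least one frame of every window. This is a deliberately simplified sketch. It omits the bias compensation and time-varying smoothing of the full method, as well as the transformation to the cepstral domain described in the abstract. The name `min_tracking_noise` and the parameter values are illustrative.

```python
def min_tracking_noise(power, alpha=0.8, window=10):
    """Track the noise floor of one frequency bin.

    power:  noisy power values, one per frame
    alpha:  recursive smoothing constant (closer to 1 means smoother)
    window: number of past frames searched for the minimum
    """
    smoothed, estimate = [], []
    p = power[0]
    for y in power:
        p = alpha * p + (1.0 - alpha) * y       # first-order recursive smoothing
        smoothed.append(p)
        estimate.append(min(smoothed[-window:]))  # minimum over the recent past
    return estimate

# Hypothetical bin: noise floor of 1.0 with a short speech burst of power 50.
power = [1.0] * 5 + [50.0] * 3 + [1.0] * 5
noise_floor = min_tracking_noise(power, alpha=0.8, window=10)
print([round(v, 3) for v in noise_floor])
```

The window has to be longer than the longest expected speech burst; otherwise the minimum is taken over speech frames only and the noise estimate is too high.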


Neurocomputing | 2011

Modelling vocabulary acquisition, adaptation and generalization in infants using adaptive Bayesian PLSA

Joris Driesen; H. Van Hamme

During the early stages of language acquisition, young infants face the task of learning a basic vocabulary without the aid of prior linguistic knowledge. Attempts have been made to model this complex behaviour computationally, using a variety of machine learning algorithms, among others non-negative matrix factorization (NMF). In this paper, we replace NMF in a vocabulary learning setting with a conceptually similar algorithm, probabilistic latent semantic analysis (PLSA), which can learn word representations incrementally by Bayesian updating. We further show that this learning framework is capable of modelling certain cognitive behaviours, e.g. forgetting, in a simple way.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Unsupervised learning of auditory filter banks using non-negative matrix factorisation

Alexander Bertrand; Kris Demuynck; Veronique Stouten; H. Van Hamme

Non-negative matrix factorisation (NMF) is an unsupervised learning technique that decomposes a non-negative data matrix into a product of two lower-rank non-negative matrices. The non-negativity constraint results in a parts-based and often sparse representation of the data. We use NMF to factorise a matrix with spectral slices of continuous speech to automatically find a feature set for speech recognition. The resulting decomposition yields a filter bank design with remarkable similarities to perceptually motivated designs, supporting the hypothesis that human hearing and speech production are well matched to each other. We point out that the divergence cost criterion used by NMF is linearly dependent on energy, which may influence the design. We argue, however, that this does not significantly affect the interpretation of our results. Furthermore, we compare our filter bank with several hearing models found in the literature. Evaluating the filter bank for speech recognition shows that the same recognition performance is achieved as with classical mel-based features.
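
A minimal sketch of NMF with a divergence cost is given below, assuming the generalised Kullback-Leibler divergence and the standard multiplicative update rules; the abstract does not specify the exact variant or initialisation used in the paper. The data matrix `V` is a small made-up example, not speech data, and the helper names are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kl_divergence(V, R, eps=1e-12):
    """Generalised KL divergence D(V || R), summed over all entries."""
    return sum(v * math.log((v + eps) / (r + eps)) - v + r
               for v_row, r_row in zip(V, R) for v, r in zip(v_row, r_row))

def nmf_kl(V, rank, iters=200, seed=0, eps=1e-12):
    """Factorise V (m x n) into W (m x rank) and H (rank x n), all non-negative."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # Multiplicative update of H with W fixed.
        R = matmul(W, H)
        Q = [[V[i][j] / (R[i][j] + eps) for j in range(n)] for i in range(m)]
        H = [[H[a][j] * sum(W[i][a] * Q[i][j] for i in range(m))
              / (sum(W[i][a] for i in range(m)) + eps)
              for j in range(n)] for a in range(rank)]
        # Multiplicative update of W with H fixed.
        R = matmul(W, H)
        Q = [[V[i][j] / (R[i][j] + eps) for j in range(n)] for i in range(m)]
        W = [[W[i][a] * sum(H[a][j] * Q[i][j] for j in range(n))
              / (sum(H[a][j] for j in range(n)) + eps)
              for a in range(rank)] for i in range(m)]
    return W, H

# Made-up non-negative matrix of rank 2.
V = [[1.0, 2.0, 0.0], [2.0, 5.0, 4.0], [0.0, 3.0, 12.0], [1.0, 3.0, 4.0]]
W, H = nmf_kl(V, rank=2)
print(kl_divergence(V, matmul(W, H)))
```

Scaling both `V` and its approximation by a constant scales this divergence by the same constant, which is the linear dependence on energy mentioned in the abstract: high-energy columns weigh more heavily in the cost.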

Collaboration


Dive into H. Van Hamme's collaboration.

Top Co-Authors

Veronique Stouten
Katholieke Universiteit Leuven

Joris Driesen
Katholieke Universiteit Leuven

Kris Demuynck
Katholieke Universiteit Leuven

Kris Hermus
Katholieke Universiteit Leuven

Patrick Wambacq
Katholieke Universiteit Leuven

Alexander Bertrand
Katholieke Universiteit Leuven

Jacques Duchateau
Katholieke Universiteit Leuven

Jort F. Gemmeke
Katholieke Universiteit Leuven

M. van Segbroeck
Katholieke Universiteit Leuven