Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hugo Van hamme is active.

Publication


Featured research published by Hugo Van hamme.


EURASIP Journal on Advances in Signal Processing | 2007

A review of signal subspace speech enhancement and its application to noise robust speech recognition

Kris Hermus; Patrick Wambacq; Hugo Van hamme

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, and on the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
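The core decomposition behind subspace filtering can be illustrated with a toy numpy sketch: build a Hankel matrix from the noisy frame, project it onto the dominant singular subspace, and average anti-diagonals back to a signal. The frame length, Hankel window `L`, and rank are made-up illustration values; the paper's actual estimators and noise-correlation handling are more elaborate.

```python
import numpy as np

def subspace_enhance(y, rank, L=32):
    """Toy signal-subspace enhancement: form a Hankel matrix from the
    noisy frame, keep only the `rank` largest singular components, and
    average along anti-diagonals to recover a time signal."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    H = np.array([y[i:i + L] for i in range(K)])   # K x L Hankel matrix
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    s[rank:] = 0.0                                 # project onto signal subspace
    Hr = U @ np.diag(s) @ Vt
    # Average the anti-diagonals to restore a 1-D signal
    out = np.zeros(N)
    cnt = np.zeros(N)
    for i in range(K):
        out[i:i + L] += Hr[i]
        cnt[i:i + L] += 1
    return out / cnt
```

A single real sinusoid spans an (exactly) rank-2 Hankel matrix, so rank-2 truncation leaves it untouched while suppressing most of any added broadband noise.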


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Accent recognition using i-vector, Gaussian Mean Supervector and Gaussian posterior probability supervector for spontaneous telephone speech

Mohamad Hasan Bahari; Rahim Saeidi; Hugo Van hamme; David A. van Leeuwen

In this paper, three utterance modelling approaches, namely Gaussian Mean Supervector (GMS), i-vector and Gaussian Posterior Probability Supervector (GPPS), are applied to the accent recognition problem. For each utterance modelling method, three different classifiers, namely the Support Vector Machine (SVM), the Naive Bayesian Classifier (NBC) and the Sparse Representation Classifier (SRC), are employed to find suitable matches between the utterance modelling schemes and the classifiers. The evaluation database is formed from English utterances of speakers whose native languages are Russian, Hindi, American English, Thai, Vietnamese and Cantonese. These utterances are drawn from the National Institute of Standards and Technology (NIST) 2008 Speaker Recognition Evaluation (SRE) database. The results show that GPPS and i-vector are more effective than GMS in this accent recognition task. It is also concluded that, among the employed classifiers, the best matches for i-vector and GPPS are SVM and SRC, respectively.


Speech Communication | 2006

Model-based feature enhancement with uncertainty decoding for noise robust ASR

Veronique Stouten; Hugo Van hamme; Patrick Wambacq

In this paper, several techniques are proposed to incorporate the uncertainty of the clean speech estimate in the decoding process of the backend recogniser in the context of model-based feature enhancement (MBFE) for noise robust speech recognition. Usually, the Gaussians in the acoustic space are sampled in a single point estimate, which means that the backend recogniser considers its input as a noise-free utterance. However, in this way the variance of the estimator is neglected. To solve this problem, it has already been argued that the acoustic space should be evaluated in a probability density function, e.g. a Gaussian observation pdf. We illustrate that this Gaussian observation pdf can be replaced by a computationally more tractable discrete pdf, consisting of a weighted sum of delta functions. We also show how improved posterior state probabilities can be obtained by calculating their maximum likelihood estimates or by using the pdf of clean speech conditioned on both the noisy speech and the backend Gaussian. Another simple and efficient technique is to replace these posterior probabilities by M Kronecker deltas, which results in M front-end feature vector candidates, and to take the maximum over their backend scores. Experimental results are given for the Aurora2 and Aurora4 databases to compare the proposed techniques. A significant decrease of the word error rate of the resulting speech recognition system is obtained.
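The idea of replacing a point estimate by a Gaussian observation pdf has a well-known closed form in the single-Gaussian case: integrating a Gaussian observation pdf against a Gaussian acoustic-model density simply adds the two variances. A minimal 1-D stdlib sketch of that textbook identity (all numeric values are hypothetical, and this is not the paper's full MBFE pipeline):

```python
import math

def gauss(x, mean, var):
    """Evaluate a 1-D Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def point_estimate_score(x_hat, mean, var):
    """Conventional decoding: treat the enhanced feature as noise-free."""
    return gauss(x_hat, mean, var)

def uncertainty_decode_score(x_hat, var_e, mean, var):
    """Expected likelihood of a backend Gaussian N(mean, var) under a
    Gaussian observation pdf N(x_hat, var_e): the variances add."""
    return gauss(x_hat, mean, var + var_e)
```

With estimator variance `var_e = 0` the two scores coincide; a larger `var_e` flattens the likelihood, de-weighting unreliable enhanced features in the backend.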


Speech Communication | 2009

Automatic voice onset time estimation from reassignment spectra

Veronique Stouten; Hugo Van hamme

We describe an algorithm to automatically estimate the voice onset time (VOT) of plosives. The VOT is the time delay between the burst onset and the start of periodicity when the plosive is followed by a voiced sound. Since the VOT is affected by factors like place of articulation and voicing, it can be used for inference of these factors. The algorithm uses the reassignment spectrum of the speech signal, a high-resolution time-frequency representation which simplifies the detection of the acoustic events in a plosive. The performance of our algorithm is evaluated on a subset of the TIMIT database by comparison with manual VOT measurements. On average, the difference is smaller than 10 ms for 76.1% and smaller than 20 ms for 91.4% of the plosive segments. We also provide analysis statistics of the VOT of /b/, /d/, /g/, /p/, /t/ and /k/ and experimentally verify some sources of variability. Finally, to illustrate possible applications, we integrate the automatic VOT estimates as an additional feature in an HMM-based speech recognition system and show a small but statistically significant improvement in phone recognition rate.


Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2013

An exemplar-based NMF approach to audio event detection

Jort F. Gemmeke; Lode Vuegen; Peter Karsmakers; Bart Vanrumste; Hugo Van hamme

We present a novel, exemplar-based method for audio event detection based on non-negative matrix factorisation. Building on recent work in noise robust automatic speech recognition, we model events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events. The weights of activated atoms in an observation serve directly as evidence for the underlying event classes. The atoms in the dictionary span multiple frames and are created by extracting all possible fixed-length exemplars from the training data. To combat data scarcity of small training datasets, we propose to artificially augment the amount of training data by linear time warping in the feature domain at multiple rates. The method is evaluated on the Office Live and Office Synthetic datasets released by the AASP Challenge on Detection and Classification of Acoustic Scenes and Events.
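The activation step of such an exemplar-based system can be sketched with plain multiplicative NMF updates: hold the exemplar dictionary fixed and estimate non-negative activations whose weights serve as evidence for event classes. This is a generic Euclidean-cost sketch with made-up matrix sizes, not the paper's exact factorisation setup.

```python
import numpy as np

def nmf_activations(V, W, n_iter=500, eps=1e-12):
    """Estimate non-negative activations H so that V ~ W @ H, with the
    exemplar dictionary W held fixed. Uses the classic multiplicative
    update for the Euclidean cost; entries of H stay non-negative by
    construction."""
    H = np.full((W.shape[1], V.shape[1]), 0.5)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H
```

In an event detector, one would sum the activation weights of the atoms belonging to each event class to score that class for the observation.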


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Advances in Missing Feature Techniques for Robust Large-Vocabulary Continuous Speech Recognition

Maarten Van Segbroeck; Hugo Van hamme

Missing feature theory (MFT) has demonstrated great potential for improving noise robustness in speech recognition. MFT has mostly been applied in the log-spectral domain, since this is also the representation in which the masks have a simple formulation. However, with diagonally structured covariance matrices in the log-spectral domain, recognition performance can only be maintained at the cost of drastically increasing the number of Gaussians. In this paper, MFT is applied to static and dynamic features in any feature domain that is a linear transform of log-spectra. A crucial part of MFT systems is the computation of reliability masks from noisy data. The proposed system operates either on binary masks, where hard decisions are made about the reliability of the data, or on fuzzy masks, which use a soft decision criterion. For real-life deployments, compensation for convolutional noise is also required. Channel compensation in speech recognition typically involves estimating an additive shift in the log-spectral or cepstral domain. To deal with the fact that some features are considered unreliable, a maximum-likelihood estimation technique is integrated in the back-end recognition process of the MFT system to estimate the channel. Hence, the resulting MFT-based recognizer can deal with both additive and convolutional noise and shows promising results on the Aurora4 large-vocabulary database.
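The binary-mask case can be sketched as marginalisation: with a diagonal-covariance Gaussian, scoring only the dimensions the mask marks reliable is exactly integrating the unreliable dimensions out. A toy sketch (real masks would come from a noise estimate, and real systems sum over many mixture components):

```python
import numpy as np

def masked_loglik(x, mean, var, mask):
    """Log-likelihood of a diagonal-covariance Gaussian, marginalising
    out the feature dimensions flagged unreliable (mask == 0). For a
    diagonal covariance this reduces to a sum over reliable dimensions."""
    m = np.asarray(mask).astype(bool)
    d = x[m] - mean[m]
    return -0.5 * np.sum(d * d / var[m] + np.log(2 * np.pi * var[m]))
```

Dropping a dimension that the noise has pushed far from the model mean raises the score, which is precisely why marginalising unreliable features helps in noise.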


2011 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications (BIOMS) | 2011

Speaker age estimation and gender detection based on supervised Non-Negative Matrix Factorization

Mohamad Hasan Bahari; Hugo Van hamme

In many criminal cases, evidence may take the form of telephone conversations or tape recordings. Law enforcement agencies are therefore interested in accurate methods to profile characteristics of a speaker from recorded voice patterns, which facilitates the identification of a criminal. This paper proposes a new approach for speaker gender detection and age estimation based on a hybrid architecture of Weighted Supervised Non-Negative Matrix Factorization (WSNMF) and a General Regression Neural Network (GRNN). Evaluation results on a corpus of read and spontaneous speech in Dutch confirm the effectiveness of the proposed scheme.
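The GRNN stage of such a hybrid is essentially Nadaraya-Watson kernel regression: the predicted age is a Gaussian-kernel-weighted average of the training targets. A minimal sketch with hypothetical data and bandwidth (the actual system feeds WSNMF-derived representations into the GRNN):

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma=1.0):
    """General Regression Neural Network prediction: a kernel-weighted
    average of training targets, using a Gaussian kernel of bandwidth
    sigma on squared Euclidean distances."""
    d2 = np.sum((X_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.sum(w * y_train) / np.sum(w)
```

With a small bandwidth the prediction collapses to the nearest training target; with a large bandwidth it approaches the global mean, so sigma trades off bias against variance.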


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Unseen noise estimation using separable deep auto encoder for speech enhancement

Meng Sun; Xiongwei Zhang; Hugo Van hamme; Thomas Fang Zheng

Unseen noise estimation is a key yet challenging step in making a speech enhancement algorithm work in adverse environments. In the worst case, the only prior knowledge about the encountered noise is that it is different from the speech involved. Therefore, by subtracting the components which cannot be adequately represented by a well-defined speech model, the noise can be estimated and removed. Given the good performance of deep learning in signal representation, a deep auto encoder (DAE) is employed in this work for accurately modeling the clean speech spectrum. In the subsequent stage of speech enhancement, an extra DAE is introduced to represent the residual part obtained by subtracting the estimated clean speech spectrum (obtained with the pre-trained DAE) from the noisy speech spectrum. By adjusting the estimated clean speech spectrum and the unknown parameters of the noise DAE, one can reach a stationary point that minimizes the total reconstruction error of the noisy speech spectrum. The enhanced speech signal is then obtained by transforming the estimated clean speech spectrum back into the time domain. The proposed technique is called a separable deep auto encoder (SDAE). Given the under-determined nature of the above optimization problem, the clean speech reconstruction is confined to the convex hull spanned by a pre-trained speech dictionary. New learning algorithms are investigated to respect the non-negativity of the parameters in the SDAE. Experimental results on TIMIT with 20 noise types at various noise levels demonstrate the superiority of the proposed method over conventional baselines.


Engineering Applications of Artificial Intelligence | 2014

Speaker age estimation using i-vectors

Mohamad Hasan Bahari; Mitchell McLaren; Hugo Van hamme; David A. van Leeuwen

In this paper, a new approach for age estimation from speech signals based on i-vectors is proposed. In this method, each utterance is modeled by its corresponding i-vector. Then, a Within-Class Covariance Normalization technique is used for session variability compensation. Finally, least squares support vector regression (LSSVR) is applied to estimate the age of speakers. The proposed method is trained and tested on telephone conversations from the National Institute of Standards and Technology (NIST) 2010 and 2008 speaker recognition evaluation databases. Evaluation results show that the proposed method yields significantly lower mean absolute error and a higher Pearson correlation coefficient between chronological and estimated speaker age compared to conventional schemes. The obtained relative improvements in mean absolute error and correlation coefficient over our best baseline system are around 5% and 2%, respectively. Finally, the effects of two major factors influencing the proposed age estimation system, namely utterance length and spoken language, are analyzed.

Highlights:
- A new approach for age estimation from speech signals based on i-vectors is proposed.
- Utterances are modeled using the i-vector framework.
- Within-class covariance normalization is used for session variability compensation.
- Least squares support vector regression is applied to estimate the age of speakers.
- The proposed method significantly improves on conventional schemes.
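The within-class covariance normalization step can be sketched as follows: estimate the average within-class covariance W of the vectors and apply the Cholesky factor of its inverse, so that the transformed within-class scatter becomes the identity. The data below are made-up stand-ins for i-vectors, purely for illustration.

```python
import numpy as np

def wccn_transform(X, labels):
    """Within-class covariance normalization: estimate the average
    within-class covariance W of the rows of X and return
    B = cholesky(inv(W)). Projecting x -> B.T @ x whitens the
    within-class scatter (transformed within-class covariance = I)."""
    classes = np.unique(labels)
    d = X.shape[1]
    W = np.zeros((d, d))
    for c in classes:
        W += np.cov(X[labels == c], rowvar=False, bias=True)
    W /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(W))
```

After this normalization, directions dominated by session variability are shrunk, which is why a downstream regressor (LSSVR in the paper) sees a cleaner age signal.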


Speech Communication | 2009

Developing a reading tutor: Design and evaluation of dedicated speech recognition and synthesis modules

Jacques Duchateau; Yuk On Kong; Leen Cleuren; Lukas Latacz; Jan Roelens; Abdurrahman Samir; Kris Demuynck; Pol Ghesquière; Werner Verhelst; Hugo Van hamme

When a child learns to read, the learning process can be enhanced by significant reading practice with individual support from a tutor. But in reality, the availability of teachers or clinicians is limited, so the additional use of a fully automated reading tutor would be beneficial for the child. This paper discusses our efforts to develop an automated reading tutor for Dutch. First, the dedicated speech recognition and synthesis modules in the reading tutor are described. Then, three diagnostic and remedial reading tutor tools are evaluated in practice and improved based on these evaluations: (1) automatic assessment of a child's reading level, (2) oral feedback to a child at the phoneme, syllable or word level, and (3) tracking where a child is reading, for automated screen advancement or for direct feedback to the child. In general, the presented tools work in a satisfactory way, including for children with known reading disabilities.

Collaboration


Dive into Hugo Van hamme's collaboration.

Top Co-Authors

Jort F. Gemmeke (Katholieke Universiteit Leuven)
Kris Demuynck (Katholieke Universiteit Leuven)
Patrick Wambacq (Katholieke Universiteit Leuven)
Peter Karsmakers (Katholieke Universiteit Leuven)
Emre Yilmaz (Katholieke Universiteit Leuven)
Joris Pelemans (Katholieke Universiteit Leuven)
Mohamad Hasan Bahari (Katholieke Universiteit Leuven)
Jacques Duchateau (Katholieke Universiteit Leuven)
Bert Van Den Broeck (Katholieke Universiteit Leuven)