Kris Hermus
Katholieke Universiteit Leuven
Publication
Featured research published by Kris Hermus.
EURASIP Journal on Advances in Signal Processing | 2007
Kris Hermus; Patrick Wambacq; Hugo Van hamme
The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, and given the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
Signal Processing | 2005
Kris Hermus; Werner Verhelst; Philippe Lemmerling; Patrick Wambacq; Sabine Van Huffel
This paper presents the derivation of a new perceptual model that represents speech and audio signals by a sum of exponentially damped sinusoids. Compared to a traditional sinusoidal model, the exponential sinusoidal model (ESM) is better suited to model transient segments that are readily found in audio signals. Total least squares (TLS) algorithms are applied for the automatic extraction of the modeling parameters in the ESM, i.e. the amplitude, phase, frequency and damping factors of a user-defined number of damped sinusoids. In order to turn the SNR optimization criterion of these TLS algorithms into a perceptual modeling strategy, we use the psychoacoustic model of MPEG-1 Layer 1 in a subband TLS-ESM scheme. This allows us to model each subband signal in accordance with its perceptual relevance, thereby lowering the number of required modeling components for a given modeling quality. Simulations and listening tests confirm that perceptual ESM achieves the same perceived quality as plain ESM while using substantially fewer components, and provide support for applying the new model in the fields of parametric audio processing and coding.
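As a rough illustration of the signal model named in this abstract, the sketch below synthesises a sum of exponentially damped sinusoids from the four parameters listed there (amplitude, phase, frequency, damping). The function name `synthesize_esm`, the tuple layout, the units, and the sign convention for damping are assumptions made for illustration. The TLS parameter estimation and the perceptual subband scheme are not shown.

```python
import math

def synthesize_esm(components, num_samples):
    """Return num_samples of a sum of exponentially damped sinusoids.

    components: list of (amplitude, phase, frequency, damping) tuples.
    Assumed conventions: frequency in cycles per sample, phase in radians,
    damping >= 0 in 1/samples (larger means faster decay).
    """
    signal = []
    for n in range(num_samples):
        value = 0.0
        for amplitude, phase, frequency, damping in components:
            envelope = amplitude * math.exp(-damping * n)
            value += envelope * math.cos(2.0 * math.pi * frequency * n + phase)
        signal.append(value)
    return signal

# Hypothetical example: one decaying transient plus one undamped partial.
example = synthesize_esm([(1.0, 0.0, 0.05, 0.02), (0.5, 0.0, 0.12, 0.0)], 200)
```

Setting the damping of every component to zero reduces this to a traditional sinusoidal model, which is why the damped form can be seen as a superset that handles transients more compactly.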
International Conference on Acoustics, Speech, and Signal Processing | 2004
Kris Hermus; Patrick Wambacq
Subspace filtering is an extensively studied technique that has been proven very effective in the area of speech enhancement to improve the speech intelligibility. In this paper, we review different subspace estimation techniques (minimum variance, least squares, singular value adaptation, time domain constrained and spectral domain constrained) in a modified singular value decomposition (SVD) framework, and investigate their capability to improve the noise robustness of speech recognisers. An extensive set of recognition experiments with the Resource Management (RM) database showed that significant reductions in WER can be obtained, both for the white noise and the coloured noise case. Unlike for speech enhancement approaches, we found that no truncation of the noisy signal subspace should be done to optimise the recognition accuracy.
IEEE Signal Processing Letters | 2007
Kris Hermus; H. Van Hamme; S. Irhimeh
We present a new algorithm for the estimation of the voicing cut-off frequency (VCO), i.e., the frequency that separates the harmonic low-frequency part from the aperiodic high-frequency part in voiced speech. The VCO is estimated as the frequency for which the sum of the harmonicity scores of all pitch harmonics below that frequency is maximized. The algorithm is combined with a powerful dynamic programming approach to track the VCO estimates over time. Remarkably accurate and smooth VCO contours are obtained, despite the simplicity of the algorithm. Applications include, among others, (sinusoidal) speech modeling, coding, and synthesis, as well as harmonic speech analysis for, e.g., automatic speech recognition.
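The selection rule in this abstract can be sketched as a running sum over per-harmonic scores. The sketch assumes that each pitch harmonic already has a signed harmonicity score, positive for harmonic-looking partials and negative for noise-like ones. How those scores are computed is not stated here, so the `scores` input, the function name `estimate_vco`, the fallback to 0.0, and the example values are all hypothetical. The dynamic programming tracking over time is not shown.

```python
def estimate_vco(scores, f0):
    """Pick the cut-off as the harmonic frequency that maximises the cumulative score.

    scores[k] is the (assumed signed) harmonicity score of harmonic k+1 of pitch f0 in Hz.
    Returns 0.0 when no cumulative sum is positive (assumed fallback for a frame
    that looks aperiodic from the first harmonic on).
    """
    best_sum = 0.0
    best_harmonic = 0
    running = 0.0
    for k, score in enumerate(scores, start=1):
        running += score
        if running > best_sum:
            best_sum = running
            best_harmonic = k
    return best_harmonic * f0

# Hypothetical scores for the first eight harmonics of a 120 Hz voice.
vco = estimate_vco([0.9, 0.8, 0.7, 0.4, -0.2, 0.1, -0.6, -0.8], f0=120.0)
```

With these made-up scores the cumulative sum peaks at the fourth harmonic, so the isolated positive score at the sixth harmonic does not pull the estimate upward.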
International Conference on Acoustics, Speech, and Signal Processing | 2008
Kris Hermus; Laurent Girin; H. Van Hamme; S. Irhimeh
We present a new algorithm for the automatic estimation of the voicing cut-off frequency (VCO), i.e., the frequency that separates the periodic low-frequency part from the aperiodic high-frequency part in voiced segments of natural speech. Starting from the power spectrum of a two-pitch-period speech frame, we define the VCO to be located at the frequency for which the sum of the periodic energy in the spectral band below that frequency and the aperiodic energy in the band above it is maximised. By formulating the problem in terms of a score function, we are able to apply a dynamic-programming-based smoothing technique. Remarkably smooth and accurate VCO contours were obtained, despite the simplicity of the proposed algorithm. In a formal evaluation, the algorithm compares favourably to two existing VCO estimation techniques.
International Conference on Acoustics, Speech, and Signal Processing | 2002
Kris Hermus; Werner Verhelst; Patrick Wambacq
While a traditional sinusoidal model is capable of representing audio segments, a sum of exponentially damped sinusoids is more efficient at modeling the transient segments that are readily found in audio signals. In this paper, Total Least Squares (TLS) algorithms are applied to automatically extract the modeling parameters in the Exponential Sinusoidal Model (ESM). In order to turn the SNR optimization criterion of these TLS algorithms into a perceptual modeling strategy, we incorporate the psycho-acoustic model of MPEG-1 Layer 1 into a subband TLS-ESM scheme. This allows us to model each subband in accordance with its perceptual relevance. Informal listening tests confirm that perceptual ESM achieves the same perceived quality as plain ESM while using substantially fewer components.
Abstractbook Third international workshop on "TLS and errors-in-variables modeling" | 2002
Werner Verhelst; Kris Hermus; Philippe Lemmerling; Patrick Wambacq; S. Van Huffel
We demonstrate that damped sinusoidal modeling can be used to improve the modeling accuracy of current perceptual audio coders. We show that the model parameter estimation can be performed with TLS algorithms, and that a subband modeling approach results in TLS problems that are computationally much more tractable than the fullband approach. Experimental results indicate that subband TLS modeling can be effectively controlled using perceptual criteria.
International Conference on Acoustics, Speech, and Signal Processing | 2007
Kris Hermus; H. Van Hamme; Werner Verhelst; S. Irhimeh; J. De Moortel
We present a harmonic-plus-noise modelling (HNM) strategy in the context of corpus-based text-to-speech (TTS) synthesis, in which whole speech phonemes are modelled in their integrity, contrary to the traditional frame-based approach. The pitch and amplitude trajectories of each phoneme are modelled with a low-order DCT expansion. The parameter analysis algorithm is to a large extent aided and guided by the pitch contours, and by the phonetic annotation and segmentation information that is available in any TTS system. The major advantages of our model are: few parameter interpolation points during synthesis (one per phoneme), flexible time and pitch modifications, and a reduction in the number of model parameters which is favourable for low bit rate coding in TTS for embedded applications. Listening tests on TTS sentences have shown that very natural speech can be obtained, despite the compactness of the signal representation.
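To make the idea of a low-order DCT expansion of a phoneme-level trajectory more concrete, here is a minimal sketch. It assumes a DCT-II with a length-independent scaling, which is one common choice and not necessarily the one used in the paper. The function names `dct_coefficients` and `reconstruct` and the example pitch values are hypothetical.

```python
import math

def dct_coefficients(trajectory, order):
    """First `order` DCT-II coefficients of a trajectory.

    Assumed scaling: c_0 is the mean, c_k (k > 0) carries a factor 2/N, so the
    expansion can be evaluated on any number of output points.
    """
    n = len(trajectory)
    coeffs = []
    for k in range(order):
        weight = 1.0 if k == 0 else 2.0
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(trajectory))
        coeffs.append(weight * total / n)
    return coeffs

def reconstruct(coeffs, num_points):
    """Evaluate the truncated DCT expansion on num_points samples."""
    return [
        sum(c * math.cos(math.pi * (i + 0.5) * k / num_points)
            for k, c in enumerate(coeffs))
        for i in range(num_points)
    ]

# Hypothetical pitch contour (Hz) of one phoneme, compressed to 3 coefficients.
pitch = [118.0, 121.0, 125.0, 128.0, 129.0, 127.0, 123.0, 119.0]
compact = dct_coefficients(pitch, 3)
stretched = reconstruct(compact, 12)  # same contour shape on a longer time axis
```

Because the coefficients describe the contour shape rather than individual frames, evaluating them on a different number of points gives a simple form of time modification, and scaling them changes the pitch level.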
Conference of the International Speech Communication Association | 1999
Kris Hermus; Ioannis Dologlou; Patrick Wambacq; Dirk Van Compernolle
Conference of the International Speech Communication Association | 2000
Kris Hermus; Werner Verhelst; Patrick Wambacq