Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Werner Verhelst is active.

Publication


Featured research published by Werner Verhelst.


International Conference on Acoustics, Speech, and Signal Processing | 1993

An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech

Werner Verhelst; Marc Roelands

A concept of waveform similarity for tackling the problem of time-scale modification of speech is proposed. It is worked out in the context of short-time Fourier transform representations. The resulting WSOLA (waveform-similarity-based synchronized overlap-add) algorithm produces high-quality speech output, is algorithmically and computationally efficient and robust, and allows for online processing with arbitrary time-scaling factors that may be specified in a time-varying fashion and can be chosen over a wide continuous range of values.
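
To illustrate the core step, here is a minimal NumPy sketch of a WSOLA-style time-scale modifier: for each output frame, a candidate input frame is searched within a tolerance region around its nominal position so that it is maximally similar (by cross-correlation) to the natural continuation of the previously copied frame, and the winner is overlap-added with a window. Frame length, tolerance, and windowing are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def wsola(x, alpha, frame_len=1024, tolerance=256):
    # Time-scale x by factor alpha (>1 stretches) with a WSOLA-style
    # similarity search; a minimal sketch, not the original implementation.
    hop = frame_len // 2                      # synthesis hop, 50% overlap
    win = np.hanning(frame_len)
    n_out = int(len(x) * alpha)
    y = np.zeros(n_out + frame_len)
    norm = np.zeros_like(y)                   # running window sum

    prev_cont = x[:frame_len]                 # expected natural continuation
    for out_pos in range(0, n_out, hop):
        nominal = int(out_pos / alpha)        # nominal analysis position
        lo = max(0, nominal - tolerance)
        hi = min(len(x) - frame_len, nominal + tolerance)
        if hi <= lo:
            break
        # cross-correlate all candidate frames with the expected
        # continuation and pick the most similar one (the
        # "waveform similarity" step)
        scores = np.correlate(x[lo:hi + frame_len], prev_cont, mode="valid")
        best = lo + int(np.argmax(scores))
        y[out_pos:out_pos + frame_len] += x[best:best + frame_len] * win
        norm[out_pos:out_pos + frame_len] += win
        # the natural continuation of the frame just used, one hop later
        nxt = min(best + hop, len(x) - frame_len)
        prev_cont = x[nxt:nxt + frame_len]
    norm[norm < 1e-8] = 1.0                   # avoid division by zero
    return (y / norm)[:n_out]
```

For example, `wsola(x, 1.5)` would stretch a signal to 1.5 times its original duration; the factor can also be varied over time by recomputing the nominal position per frame.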


International Conference on Acoustics, Speech, and Signal Processing | 2003

On psychoacoustic noise shaping for audio requantization

Dreten De Koning; Werner Verhelst

Signal requantization to reduce the word length of an audio stream introduces distortions. Noise shaping can be applied in combination with a psychoacoustic model to make requantization distortions minimally audible. The psychoacoustically optimal noise shaping curve depends on the time-varying characteristics of the input signal, so the noise shaping filter coefficients must be computed and updated regularly. In this paper, we present a least squares theory for optimal noise shaping of audio signals. It provides a shorter and more straightforward proof of known properties and, in contrast with the standard theory, shows how noise shaping filters that attain the theoretical optimum can be designed in practice.
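
The error-feedback structure at the heart of noise-shaped requantization can be sketched as follows. Here the feedback filter h is fixed and supplied by the caller, whereas the paper's contribution is precisely how to design and regularly update psychoacoustically optimal coefficients; this is a minimal sketch under that simplifying assumption.

```python
import numpy as np

def requantize_noise_shaped(x, bits, h):
    # Requantize x (floats in [-1, 1)) to the given word length with
    # error-feedback noise shaping; h holds the feedback filter
    # coefficients h_1..h_K. The psychoacoustic filter design from the
    # paper is not reproduced here.
    h = np.asarray(h, dtype=float)
    q = 2.0 ** (1 - bits)                 # quantization step
    err = np.zeros(len(h))                # past quantization errors
    y = np.empty_like(x)
    for n in range(len(x)):
        v = x[n] - np.dot(h, err)         # subtract shaped past error
        y[n] = np.round(v / q) * q        # requantize to the shorter word
        err = np.roll(err, 1)
        err[0] = y[n] - v                 # newest error enters the feedback
    return y
```

With `h = [1.0]` this reduces to classical first-order shaping, which pushes the requantization noise toward high frequencies.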


International Conference on Acoustics, Speech, and Signal Processing | 1996

Voice conversion using partitions of spectral feature space

Werner Verhelst; Johan Mertens

Given a set of utterances that have been spoken by two speakers, the problem of voice conversion consists of constructing a transformation that can be applied to any utterance of the first speaker and makes the result appear to have been spoken by the second speaker. Conversion rules for the vocal tract spectral characteristics, which are usually trained on individual subsets of a partition of the spectral feature space, still do not reach the desired level of accuracy. The paper presents an analysis of the problem.
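
The partition-based training scheme the abstract refers to can be illustrated with a hypothetical sketch: the source speaker's feature space is partitioned with k-means, and each cell learns its own (here deliberately simple, mean-offset) conversion rule from time-aligned source/target frames. The feature type, the alignment, and the per-cell rule are all illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_partition_map(X_src, X_tgt, n_cells=64):
    # X_src, X_tgt: (frames, dims) arrays of time-aligned spectral
    # features for the two speakers (hypothetical data; alignment,
    # e.g. by DTW, is assumed to have been done beforehand).
    km = KMeans(n_clusters=n_cells, n_init=10).fit(X_src)
    labels = km.labels_
    # per-cell conversion rule: the mean offset toward the target speaker
    offsets = np.stack([
        (X_tgt[labels == c] - X_src[labels == c]).mean(axis=0)
        if np.any(labels == c) else np.zeros(X_src.shape[1])
        for c in range(n_cells)
    ])
    return km, offsets

def convert(km, offsets, X):
    # Assign each frame of a new utterance to a cell of the partition
    # and apply that cell's conversion rule.
    return X + offsets[km.predict(X)]
```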


Database and Expert Systems Applications | 2008

A Speech/Music/Silence/Garbage Classifier for Searching and Indexing Broadcast News Material

Yorgos Patsis; Werner Verhelst

An audio classifier that can distinguish between speech, music, silence and garbage has been developed. The classifier was trained and tested on broadcast news material provided by VRT (Flemish Radio and Television Network). Several feature sets and machine learning algorithms were tested, offering a choice between speed and performance for a target system. The audio classifier is part of a larger system that, together with visual data, can retrieve information from news broadcasts: speech can be converted to text and the speaker can be recognized; music can be further used for genre classification, jingle recognition or copyright infringement detection; silence is recognized and used to provide cues on topic changes or speaker turns. Everything that is not classified as speech, music or silence is labeled garbage. Garbage classes can be further used for background categorization, giving information on the environment in which someone speaks (an anchor in the studio or a reporter in the street).
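
A minimal sketch of such a frame-level classifier, using a deliberately small hand-crafted feature set and a random forest; the paper evaluated several richer feature sets and learners, none of which is reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def frame_features(x, sr, frame_len=1024, hop=512):
    # Per-frame energy, zero-crossing rate, and spectral centroid --
    # an illustrative feature set, not the paper's.
    feats = []
    for i in range(0, len(x) - frame_len, hop):
        f = x[i:i + frame_len]
        energy = float(np.mean(f ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(f))) > 0))
        mag = np.abs(np.fft.rfft(f))
        freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
        centroid = float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
        feats.append([energy, zcr, centroid])
    return np.asarray(feats)

def train(clips, sr):
    # clips: hypothetical list of (signal, label) pairs with labels in
    # {"speech", "music", "silence", "garbage"}.
    Xs, ys = [], []
    for x, lbl in clips:
        f = frame_features(x, sr)
        Xs.append(f)
        ys.extend([lbl] * len(f))
    return RandomForestClassifier(n_estimators=100).fit(np.vstack(Xs), ys)
```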


Proceedings of the 7th International Conference on Methods and Techniques in Behavioral Research | 2010

Automatic recognition of lower facial action units

Isabel Gonzalez; Hichem Sahli; Werner Verhelst

The face is an important source of information in multimodal communication. Facial expressions are generated by contractions of facial muscles, which lead to subtle changes in the area of the eyelids, eyebrows, nose, lips and skin texture, often revealed by wrinkles and bulges. To measure these subtle changes, Ekman et al. [5] developed the Facial Action Coding System (FACS). FACS is a human-observer-based system designed to detect subtle changes in facial features, and describes facial expressions by action units (AUs). We present a technique to automatically recognize lower facial Action Units, independently of one another. Even though we do not explicitly take AU combinations into account, which makes the classification process harder, an average F1 score of 94.83% is achieved.


International Conference on Acoustics, Speech, and Signal Processing | 1988

On short-time cepstra of voiced speech

Werner Verhelst; O. Steenhaut

A brief review of the basics of cepstral deconvolution is followed by a refined model for short-time cepstra of voiced speech. The model provides a better understanding of the nature of short-time cepstra and of the heuristics of deconvolution. It brings to light a number of properties which are important for the successful application of short-time cepstral deconvolution to voiced speech. It allows a set of rules to be derived which can be used for the optimization of cepstral deconvolution systems.
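
For reference, short-time cepstral deconvolution in its textbook form: the real cepstrum of a windowed frame is computed, and low-time liftering splits it into a vocal-tract (envelope) part and an excitation part. The cutoff below is an illustrative choice, not the rule set derived in the paper.

```python
import numpy as np

def short_time_cepstrum(frame):
    # Real cepstrum of a windowed speech frame (even frame length assumed):
    # inverse FFT of the log-magnitude spectrum.
    spec = np.fft.rfft(frame * np.hamming(len(frame)))
    return np.fft.irfft(np.log(np.abs(spec) + 1e-12))

def deconvolve(frame, cutoff=32):
    # Low-time liftering: quefrencies below `cutoff` samples form the
    # vocal-tract (envelope) part; the remainder, including the pitch
    # peak near 1/F0, forms the excitation part.
    c = short_time_cepstrum(frame)
    lifter = np.zeros_like(c)
    lifter[:cutoff] = 1.0
    lifter[-cutoff + 1:] = 1.0        # keep the mirrored negative quefrencies
    envelope_c = c * lifter           # rfft of this gives the smoothed
    excitation_c = c * (1.0 - lifter) # log-magnitude envelope
    return envelope_c, excitation_c
```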


International Conference on Image and Graphics | 2009

A Visual Silence Detector Constraining Speech Source Separation

Isabel Gonzalez; Ilse Ravyse; Henk Brouckxon; Werner Verhelst; Dongmei Jiang; Hichem Sahli

We propose an audiovisual source separation algorithm for speech signals. We first extract the time segments with low activity of the mouth region from synchronous video recordings. An automatically selected optimal classifier then detects silent intervals within these instants of low visual mouth activity. The source separation problem is subsequently formulated and solved for the entire signal duration. Our approach was tested on two challenging speech corpora with two speakers and two microphones: in the first corpus, separate source signals were mixed in a simulated room, while the second corpus contains recorded conversations. The results are promising on both corpora: with the visual silence detector, the performance of the source separation algorithm, measured by the signal-to-interference ratio, increases.


International Conference on Image and Graphics | 2009

Video Realistic Mouth Animation Based on an Audio Visual DBN Model with Articulatory Features and Constrained Asynchrony

Dongmei Jiang; Peizhen Liu; Ilse Ravyse; Hichem Sahli; Werner Verhelst

This paper presents a mouth animation construction method based on DBN models with articulatory features (AF_AVDBN), in which the articulatory features of the lips, tongue, and glottis/velum can be asynchronous within a maximum asynchrony constraint, describing the speech production process more faithfully. Given an audio input and the trained AF_AVDBN models, the optimal visual feature learning algorithm is derived from the Maximum Likelihood Estimation criterion. The learned visual features are then used to construct the mouth images for the input speech. Objective and subjective evaluations on the mouth animations of 110 speech sentences show that the visual features learned from the AF_AVDBN models track the real visual features very closely, and that the constructed mouth images closely resemble the real ones.


International Conference on Machine Learning and Cybernetics | 2003

A multi-stream bimodal continuous speech recognition system using datasieve based features

Lei Xie; Ilse Ravyse; Dongmei Jiang; Rongchun Zhao; Hichem Sahli; Werner Verhelst; J. Cornelis

This paper presents an audiovisual bimodal continuous speech recognition system. The visual feature extraction of the mouth movements uses the number of granules obtained by applying a datasieve. Multi-stream HMMs are introduced for combining the audio and visual modalities using time-synchronous audiovisual features. Experimental results show that the recognition system presented in this paper is suitable for continuous speech recognition tasks in noisy environments, and that the datasieve-based visual features outperform conventional DCT and DWT features.
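
In a multi-stream HMM, the per-state observation log-likelihoods of the audio and visual streams are combined with exponent weights. A minimal sketch of that fusion step; the weight value is illustrative and is typically tuned to the acoustic noise level.

```python
import numpy as np

def multistream_loglik(log_b_audio, log_b_video, w_audio=0.7):
    # Exponent-weighted combination of the two streams' observation
    # log-likelihoods; the result replaces the single-stream log b(o)
    # inside the usual Viterbi/forward recursions.
    return w_audio * log_b_audio + (1.0 - w_audio) * log_b_video
```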


International Conference on Digital Signal Processing | 2013

Acoustic localization enhanced with phase information from modified STFT magnitude

Georgios Athanasopoulos; Tomas Dekens; Werner Verhelst

In this paper, we study the effect of noise and reverberation on the phase spectrum and its impact on acoustic localization. In contrast with existing Generalized Cross Correlation weighting functions, which take only the signals' magnitude into account, we introduce a novel fused approach that also makes use of the phase information. In this approach, the phase spectra of the noise-corrupted microphone array signals are restored by an estimate of the sound source's phase information obtained from the noise-suppressed STFT magnitude. The proposed approach can be readily combined with any existing weighting function or Time Delay Estimation algorithm. Experimental results demonstrate the potential of this approach and show that it can improve acoustic localization in jointly noisy and reverberant conditions.
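
For context, the classical magnitude-based baseline the paper contrasts with is GCC-PHAT time-delay estimation, sketched below; the paper's contribution, restoring the phase spectrum from a noise-suppressed STFT magnitude before this step, is not reproduced here.

```python
import numpy as np

def gcc_tdoa(x1, x2, sr, max_tau=None):
    # Time-delay estimation between two microphone channels with the
    # classical GCC-PHAT weighting (phase transform: the cross-power
    # spectrum is whitened so that only phase information remains).
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12          # PHAT: keep only the phase
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2 if max_tau is None else int(sr * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / sr
    return tau                              # delay between channels, seconds
```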

Collaboration


Dive into Werner Verhelst's collaborations.

Top Co-Authors

Hichem Sahli (Vrije Universiteit Brussel)
Lukas Latacz (Vrije Universiteit Brussel)
Wesley Mattheyses (Vrije Universiteit Brussel)
Tomas Dekens (Vrije Universiteit Brussel)
Henk Brouckxon (Vrije Universiteit Brussel)
Ilse Ravyse (VU University Amsterdam)
Dongmei Jiang (Northwestern Polytechnical University)
Valentin Enescu (Vrije Universiteit Brussel)