
Publication


Featured research published by Christian Wellekens.


Speech Communication | 2000

DISTBIC: a speaker-based segmentation for audio data indexing

Perrine Delacourt; Christian Wellekens

In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It aims to extract homogeneous segments containing the longest possible utterances produced by a single speaker. In our context, no prior knowledge of the speaker or speech signal characteristics is assumed (neither a speaker model nor a speech model). However, we assume that people do not speak simultaneously and that there are no real-time constraints. We review existing techniques and propose a new segmentation method that combines two different segmentation techniques. This method, called DISTBIC, is organized into two passes: first the most likely speaker turns are detected, and then they are validated or discarded. The advantage of our algorithm is its efficiency in detecting speaker turns even when they are close to one another (i.e., separated by only a few seconds).
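The validation pass of a BIC-based method like DISTBIC keeps a candidate speaker turn only if modelling the two adjacent segments with separate Gaussians beats a single Gaussian by more than the BIC penalty. A minimal one-dimensional sketch (the feature values, the split point and the penalty weight `lam` are illustrative, not taken from the paper):

```python
import math

def gauss_loglik(x):
    """Log-likelihood of samples x under their own ML Gaussian fit."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def delta_bic(x, t, lam=1.0):
    """Positive value favours a speaker change at index t (1-D features).
    The penalty counts the extra (mean, variance) parameters of the
    two-model fit against the single-model fit."""
    n = len(x)
    penalty = lam * 0.5 * 2 * math.log(n)  # 2 extra parameters
    return gauss_loglik(x[:t]) + gauss_loglik(x[t:]) - gauss_loglik(x) - penalty

# Toy data: two "speakers" with clearly different feature means.
seg = [0.1, 0.2, 0.0, 0.15, 0.1, 5.0, 5.2, 4.9, 5.1, 5.05]
print(delta_bic(seg, 5) > 0)  # a change at t=5 is accepted
```

On homogeneous data the likelihood gain from splitting stays below the penalty, so the candidate turn is discarded.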


international conference on acoustics, speech, and signal processing | 2000

A speaker tracking system based on speaker turn detection for NIST evaluation

Jean-François Bonastre; Perrine Delacourt; Corinne Fredouille; Teva Merlin; Christian Wellekens

A speaker tracking system (STS) is built by using a speaker change detector followed by a speaker verification system. The aim of the STS is to find, in a conversation between several persons (some of them already enrolled and others totally unknown), target speakers chosen from a set of enrolled users. In a first step, speech is segmented into homogeneous segments containing only one speaker, without any use of a priori knowledge about the speakers. Then, each resulting segment is tested for belonging to one of the target speakers. The system has been used in a NIST evaluation test with satisfactory results.


international conference on acoustics, speech, and signal processing | 2005

On desensitizing the Mel-cepstrum to spurious spectral components for robust speech recognition

Vivek Tyagi; Christian Wellekens

It is well known that the peaks in the spectrum of a log Mel-filter bank are important cues in characterizing speech sounds. However, low-energy perturbations in the power spectrum may become numerically significant after the log compression. We show that even if the spectral peaks are kept constant, low-energy perturbations in the power spectrum can create huge variations in the cepstral coefficients. We show, both analytically and experimentally, that exponentiating the log Mel-filter bank spectrum before the cepstrum computation can significantly reduce the sensitivity of the cepstra to spurious low-energy perturbations. The Mel-cepstrum modulation spectrum (Tyagi, V. et al., Proc. IEEE ASRU, 2003) is computed from the processed cepstra, which results in further noise robustness of the composite feature vector. In experiments with speech signals, it is shown that features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly to MFCC and RASTA-PLP features.
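The core operation described here, exponentiating the log Mel spectrum before the cepstrum is computed, amounts to replacing the log compression with a root compression, since exp(gamma * log e) = e**gamma. A minimal sketch; the filter-bank energies, the exponent `gamma` and the use of a plain DCT-II are illustrative choices, not the paper's exact settings:

```python
import math

def dct2(x):
    """DCT-II, the usual transform from compressed spectrum to cepstrum."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(x)) for k in range(n)]

def log_cepstrum(fbank):
    """Conventional cepstrum: log compression, then DCT."""
    return dct2([math.log(e) for e in fbank])

def root_cepstrum(fbank, gamma=0.1):
    """Exponentiated log spectrum: e**gamma = exp(gamma * log e)."""
    return dct2([e ** gamma for e in fbank])

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Strong spectral peaks plus one low-energy component that gets perturbed:
clean = [100.0, 80.0, 60.0, 40.0, 1.0]
noisy = clean[:]
noisy[-1] = 0.01  # spurious low-energy perturbation only

print(dist(log_cepstrum(clean), log_cepstrum(noisy)))    # large shift
print(dist(root_cepstrum(clean), root_cepstrum(noisy)))  # much smaller shift
```

The log of a near-zero energy swings wildly (log 1 = 0 vs. log 0.01 = -4.6), while the root-compressed value barely moves, which is the desensitizing effect the abstract describes.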


international conference on acoustics, speech, and signal processing | 1985

Speaker dependent connected speech recognition via phonetic Markov models

Yves G. Kamp; Christian Wellekens

In this paper, a method for speaker-dependent connected speech recognition based on phonemic units is described. In this recognition system, each phoneme is characterized by a very simple 3-state hidden Markov model (HMM), which is trained on connected speech by a Viterbi algorithm. Each state has associated with it a continuous (Gaussian) or discrete probability density function (pdf). With the phonemic models so obtained, recognition is then performed either directly at the word level (by reconstructing reference words from the models of their constituent phonemes) or via phonemic labelling. Good results are obtained both with a German ten-digit vocabulary (20 phonemes) and with a French 80-word vocabulary (36 phonemes).
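A 3-state left-to-right phoneme model of the kind described here is decoded with the Viterbi algorithm. A toy sketch with discrete emissions; the transition and emission probabilities are invented for illustration, not taken from the paper:

```python
import math

# Toy 3-state left-to-right phoneme HMM with a discrete 2-symbol alphabet.
trans = {0: {0: 0.6, 1: 0.4}, 1: {1: 0.6, 2: 0.4}, 2: {2: 1.0}}
emit = [{'a': 0.7, 'b': 0.3}, {'a': 0.2, 'b': 0.8}, {'a': 0.6, 'b': 0.4}]

def viterbi(obs):
    """Best state path through the left-to-right model (log domain)."""
    v = [{0: (math.log(emit[0][obs[0]]), [0])}]  # must enter at state 0
    for o in obs[1:]:
        layer = {}
        for s, (score, path) in v[-1].items():
            for nxt, p in trans[s].items():
                cand = score + math.log(p) + math.log(emit[nxt][o])
                if nxt not in layer or cand > layer[nxt][0]:
                    layer[nxt] = (cand, path + [nxt])
        v.append(layer)
    return max(v[-1].values(), key=lambda t: t[0])[1]

print(viterbi(['a', 'b', 'b', 'a']))  # -> [0, 1, 1, 2]
```

In Viterbi training, each frame is assigned to a state by such a best path, and the state pdfs are re-estimated from their assigned frames.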


international conference on acoustics, speech, and signal processing | 1996

Keyword spotting for video soundtrack indexing

Philippe Gelin; Christian Wellekens

The amount of available video information is increasing dramatically due to the development of multimedia applications. As a consequence, content-based retrieval tools are urgently needed for fast and easy access not only to multimedia databases but also to movies and recorded video news. In particular, queries may rely on off-line indexing. Keyword spotting on video soundtracks can be of great help in this indexing process and could, in the future, be combined with pattern or event recognition on the visual information. Specific constraints for this application are identified and a solution based on phonemic lattices is proposed. The word spotter achieves indexing on open vocabularies uttered by any speaker. It is fast enough for practical applications and does not require much additional stored information.


international conference on acoustics, speech, and signal processing | 1984

Connected digit recognition using vector quantization

Christian Wellekens; Hermann Ney

The principles of classification applied to the representation of the words in a vocabulary lead to the clustering of the acoustic vectors into prototype vectors. For a small number of prototypes, recognition scores comparable to those observed with unclustered vocabularies are obtained with a highly reduced computation time. Two different forms (deterministic and stochastic) of the single-level recognition method for concatenated words are described and the improvements obtained by vector quantization are demonstrated. The use of prototypes in the training phase of the finite stochastic automata representing vocabulary words is also described.
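Vector quantization replaces each acoustic vector by the index of its nearest prototype, so the recognizer only ever compares against a small codebook. A minimal sketch with a hand-picked 2-D codebook (the vectors and prototypes are invented for illustration):

```python
def quantize(vec, codebook):
    """Index of the nearest prototype (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, proto))
             for proto in codebook]
    return dists.index(min(dists))

# Toy 2-D codebook of 3 prototypes. Each frame collapses to an index,
# which is what makes the recognition pass so cheap.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 1.9), (1.1, 0.8)]
print([quantize(f, codebook) for f in frames])  # -> [0, 1, 2, 1]
```

In practice the codebook itself is obtained by clustering the training vectors (e.g. k-means style iteration), after which distances to the few prototypes replace distances to every reference frame.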


international conference on multimedia computing and systems | 1999

Audio data indexing: Use of second-order statistics for speaker-based segmentation

Perrine Delacourt; Christian Wellekens

The content-based indexing task considered in this paper consists in recognizing, from their voices, the speakers involved in a conversation. A new approach to speaker-based segmentation, which is the first necessary step for this indexing task, is described. Our study is done under the assumptions that no prior information on the speakers is available, that the number of speakers is unknown and that people do not speak simultaneously. Audio data indexing is commonly divided into two parts: the audio data is first segmented with respect to speaker utterances, and then the resulting segments associated with a given speaker are merged together. In this work, we focus on the first part and propose a new segmentation method based on second-order statistics. The practical significance of this study is illustrated by applying the new technique to real data to show its efficiency.


international conference on acoustics, speech, and signal processing | 2005

Variational Bayesian adaptation for speaker clustering

Fabio Valente; Christian Wellekens

In this paper we explore the use of variational Bayesian (VB) learning for adaptation in a speaker clustering framework. Variational learning offers the interesting property of performing model learning and model selection at the same time. We compare VB learning with a classical MAP/BIC approach (MAP for training, BIC for model selection). Results on the NIST BN-96 HUB4 database show that VB learning can outperform the classical MAP/BIC method.


international conference on acoustics, speech, and signal processing | 2004

Variational Bayesian feature selection for Gaussian mixture models

Fabio Valente; Christian Wellekens

In this paper we show that the feature selection problem can be formulated as a model selection problem. A Bayesian framework for feature selection in unsupervised learning based on Gaussian mixture models is applied to speech recognition. In the original formulation (Figueiredo, 2002) a minimum message length criterion is used for model selection; we propose a new model selection technique based on variational Bayesian learning that shows higher robustness to the amount of training data. Results on speech data from the TIMIT database show a high efficiency in determining feature saliency.


international conference on spoken language processing | 1996

Keyword spotting enhancement for video soundtrack indexing

Philippe Gelin; Christian Wellekens

Multimedia databases contain an increasing number of videos that are not easily accessed semantically. Among the useful indices that can be extracted from the soundtrack, the presence of a keyword at some place plays a prominent role. This paper deals with the specificities of such a keyword spotter and the enhancements brought to our previous technique (1996) based on frame labelling. To be useful, such a keyword spotter has to be speaker-independent. Moreover, it has to be able to detect any word from an open vocabulary, which directly implies the use of a phonemic representation of the word. These constraints usually lead to an excessively time-consuming tool. The division of the indexing process into two parts (the first performed off-line, the second at query time) allows a faster response.
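The two-part division (an off-line pass that reduces the soundtrack to a phonemic representation, and a cheap query-time search over that representation) can be sketched as follows. The phoneme labels and the naive substring matcher below are illustrative stand-ins for the paper's frame labelling and lattice search:

```python
# Off-line pass (done once): the soundtrack is reduced to a time-stamped
# phoneme sequence. These labels are invented for illustration.
indexed = [('k', 0.0), ('a', 0.1), ('t', 0.2), ('s', 0.3),
           ('a', 0.4), ('t', 0.5), ('o', 0.6), ('n', 0.7)]

def spot(keyword_phonemes, index):
    """Query-time pass: start times where the phoneme string occurs.
    Any word can be queried, since only its phonemic form is needed."""
    phones = [p for p, _ in index]
    k = len(keyword_phonemes)
    return [index[i][1] for i in range(len(phones) - k + 1)
            if phones[i:i + k] == keyword_phonemes]

print(spot(['a', 't'], indexed))  # -> [0.1, 0.4]
```

Because the expensive acoustic decoding happens once off-line, each query costs only a symbolic search, which is the source of the faster response.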

Collaboration


Dive into Christian Wellekens's collaborations.

Top Co-Authors

Fabio Valente
Idiap Research Institute

Vivek Tyagi
Idiap Research Institute