Christophe Ris
Faculté polytechnique de Mons
Publications
Featured research published by Christophe Ris.
Speech Communication | 2007
M. Benzeghiba; R. De Mori; Olivier Deroo; Stéphane Dupont; T. Erbes; D. Jouvet; L. Fissore; Pietro Laface; Alfred Mertins; Christophe Ris; R. Rose; V. Tyagi; C. Wellekens
Major progress is being recorded regularly on both the technology and exploitation of automatic speech recognition (ASR) and spoken language systems. However, there are still technological barriers to flexible solutions and user satisfaction under some circumstances. This is related to several factors, such as sensitivity to the environment (background noise) or the weak representation of grammatical and semantic knowledge. Current research also emphasizes deficiencies in dealing with the variation naturally present in speech. For instance, the lack of robustness to foreign accents precludes use by specific populations. Some applications, such as directory assistance, particularly stress the core recognition technology due to the very large active vocabulary (application perplexity). Many factors affect the realization of speech: regional, sociolinguistic, or related to the environment or the speaker herself. These create a wide range of variations that may not be modeled correctly (speaker, gender, speaking rate, vocal effort, regional accent, speaking style, non-stationarity, etc.), especially when resources for system training are scarce. This paper outlines current advances related to these topics.
Speech Communication | 2000
Christophe Ris; Stéphane Dupont
In this paper, we assess and compare four methods for the local estimation of noise spectra: energy clustering, Hirsch histograms, the weighted-average method, and low-energy envelope tracking. Moreover, for these four approaches we introduce the harmonic filtering strategy, a new pre-processing technique expected to better track fast modulations of the noise energy. The periodicity of speech is used to update the noise level estimate during voiced parts of speech, without explicit detection of voiced portions. Our evaluation is performed with six different kinds of noise (both artificial and real) added to clean speech. The best noise level estimation method is then applied to noise-robust speech recognition based on techniques requiring a dynamic estimate of the noise spectrum, namely spectral subtraction and missing-data compensation.
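The two consumers of the noise estimate mentioned above can be sketched compactly. The following is a minimal illustration, not the paper's implementation: `estimate_noise_low_energy` is a simplified stand-in for low-energy envelope tracking (it averages the lowest-energy frames per frequency bin), and the `fraction` and `floor` parameters are illustrative assumptions.

```python
import numpy as np

def estimate_noise_low_energy(power_frames, fraction=0.1):
    """Simplified low-energy envelope idea: average the lowest-energy
    frames as the noise power spectrum estimate. `fraction` is an
    illustrative choice, not a value from the paper."""
    n_keep = max(1, int(fraction * power_frames.shape[0]))
    energies = power_frames.sum(axis=1)          # per-frame energy
    idx = np.argsort(energies)[:n_keep]          # quietest frames
    return power_frames[idx].mean(axis=0)

def spectral_subtraction(power_frames, noise_psd, floor=0.01):
    """Subtract the noise PSD from each frame, flooring the result at a
    small fraction of the noise level to avoid negative power."""
    cleaned = power_frames - noise_psd
    return np.maximum(cleaned, floor * noise_psd)
```

In a real front end the power frames would come from a short-time Fourier transform, and the noise estimate would be updated dynamically rather than computed once.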
Speech Communication | 2003
Fabrice Malfrère; Olivier Deroo; Thierry Dutoit; Christophe Ris
In this paper we compare two methods for automatically phonetically labeling a continuous speech database, as usually required for designing a speech recognition or speech synthesis system. The first method is based on temporal alignment of speech on a synthetic speech pattern; the second uses either continuous density hidden Markov models (HMMs) or a hybrid HMM/ANN (artificial neural network) system in forced alignment mode. Both systems have been evaluated on read utterances not part of the training set of the HMM systems, and compared to manual segmentation. This study outlines the advantages and drawbacks of both methods. The synthetic speech system has the great advantage that no training stage (hence no large labeled database) is needed, while HMM systems easily handle multiple phonetic transcriptions (phonetic lattices). We derive a method for the automatic creation of large phonetically labeled speech databases, based on using the synthetic speech segmentation tool to bootstrap the training process of either an HMM or a hybrid HMM/ANN system. Such segmentation tools are a key point for the development of improved multilingual speech synthesis and recognition systems.
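The first method rests on aligning natural speech to a synthetic reference with known phone boundaries. A minimal dynamic time warping sketch of such an alignment is shown below; it uses a Euclidean local distance and is an illustration of the general technique, not the paper's exact procedure.

```python
import numpy as np

def dtw_align(ref, test):
    """Minimal dynamic time warping between two feature sequences,
    returning the total cost and the warping path. Illustrative sketch:
    Euclidean local distance, symmetric step pattern."""
    n, m = len(ref), len(test)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref[i - 1] - test[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # backtrack to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]
```

Once natural frames are mapped to synthetic frames, the known phone boundaries of the synthetic pattern can be projected back onto the natural utterance.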
ieee automatic speech recognition and understanding workshop | 2005
Stéphane Dupont; Christophe Ris; Olivier Deroo; Sébastien Poitoux
The paper proposes a solution that advances the genericity of ASR technology across tasks and languages. A non-linear discriminant model is built from multilingual, multi-task speech material in order to classify the acoustic signal into language-independent phonetic units. Instead of using this model for direct HMM state likelihood estimation, it operates as a first stage producing discriminant features that can be used in cascade with a traditional task- and language-specific ASR system. This first-stage structure is expected to achieve a strong modeling of the cross-language variability of speech, which can better handle pronunciation variations due, for instance, to regional and non-native accents. Moreover, the flexibility of this architecture still allows the development of small task- or language-dedicated ASR systems as a second stage, possibly with small amounts of data. The benefit of this architecture is demonstrated through a fine-grained analysis of modeling performance at the phoneme level and on two different isolated word recognition tasks featuring accent variability.
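The cascade described above is in the spirit of tandem feature extraction: phone posteriors from the discriminant first stage are post-processed into features for a conventional second-stage recognizer. The sketch below shows the common log-compression plus PCA decorrelation recipe; it is an assumed illustration of the general idea, and the function and parameter names are not from the paper.

```python
import numpy as np

def tandem_features(posteriors, eps=1e-8, n_keep=None):
    """Turn per-frame phone posteriors into features for a second-stage
    recognizer: log-compress (to make distributions more Gaussian-like),
    center, then decorrelate with PCA. Illustrative sketch."""
    logp = np.log(posteriors + eps)
    logp -= logp.mean(axis=0)                 # center each dimension
    cov = np.cov(logp, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # leading components first
    if n_keep is None:
        n_keep = posteriors.shape[1]
    return logp @ eigvecs[:, order[:n_keep]]
```

The second-stage system then treats these decorrelated features like any other acoustic front end, which is what makes the first stage reusable across tasks and languages.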
international conference on acoustics, speech, and signal processing | 2006
M. Benzeghiba; R. De Mori; Olivier Deroo; Stéphane Dupont; T. Erbes; D. Jouvet; L. Fissore; Pietro Laface; Alfred Mertins; Christophe Ris; R. Rose; V. Tyagi; C. Wellekens
This paper briefly reviews the state of the art on sources of speech variability in automatic speech recognition systems. It focuses on variations within the speech signal that make the ASR task difficult. The variations detailed in the paper are intrinsic to speech and affect the different levels of the ASR processing chain. For each source of speech variation, the paper summarizes the current knowledge and highlights specific feature extraction or modeling weaknesses and current trends.
international conference on acoustics, speech, and signal processing | 1996
Vincent Fontaine; Christophe Ris; Henri Leich
We present and compare two different hybrid HMM/MLP approaches. The first uses MLPs as labelers coupled with a discrete HMM, while the second takes advantage of the ability of MLPs trained as classifiers to estimate a posteriori probabilities. Both approaches bring noticeable improvement over classical methods, since they rid the system of some restrictive hypotheses inherent in pure HMM design (no time correlation between successive acoustic vectors, hypotheses on the probability distributions, etc.). Our experiments were designed to provide fair comparisons: we used a standard environment, namely standard software and standard databases with common training and test sets.
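The second approach exploits a standard property of hybrid HMM/ANN systems: an MLP trained as a frame classifier estimates state posteriors P(state | x), and dividing by the class priors P(state) yields, by Bayes' rule, likelihoods scaled by a per-frame constant 1/p(x) that does not affect Viterbi decoding. A minimal sketch of that conversion, with illustrative function names:

```python
import numpy as np

def class_priors(labels, n_classes):
    """Estimate state priors from frame-level training labels."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts / counts.sum()

def scaled_likelihoods(posteriors, priors, eps=1e-10):
    """Divide MLP posteriors P(state | x) by the priors P(state).
    By Bayes' rule this gives p(x | state) / p(x): a likelihood scaled
    by a per-frame constant, harmless for Viterbi decoding."""
    return posteriors / (priors + eps)
```

These scaled likelihoods simply replace the Gaussian-mixture state likelihoods in an otherwise unchanged HMM decoder.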
international conference on acoustics, speech, and signal processing | 1995
Christophe Ris; Vincent Fontaine; Henri Leich
This paper presents a new pre-processing method developed with the objective of representing the relevant information of a signal with a minimum number of parameters. The originality of this work is to propose a new efficient pre-processing algorithm producing acoustic vectors at a variable frame rate. The length of the speech frames is no longer fixed a priori to a constant value but results from a study of the stationarity of the signal. Both segmentation and signal analysis are based on Malvar wavelets, since the orthogonality properties of this transform are the key to comparing measures made on frames of different lengths. Some speech recognition results based on this pre-processing are presented.
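The core idea of variable frame lengths can be illustrated with a toy segmenter that grows a frame while the signal stays quasi-stationary and splits it when it does not. This is only a stand-in for the paper's Malvar-wavelet stationarity analysis: here stationarity is judged crudely by the relative change in short-term energy between half-frames, and all thresholds are illustrative.

```python
import numpy as np

def variable_frames(signal, min_len=64, max_len=512, threshold=0.5):
    """Toy variable-frame segmentation: double the frame length while
    the two halves have similar energy, split otherwise. Returns the
    list of frame boundary sample indices."""
    bounds, start = [0], 0
    n = len(signal)
    while start < n:
        length = min_len
        while start + 2 * length <= n and length < max_len:
            a = np.mean(signal[start:start + length] ** 2)
            b = np.mean(signal[start + length:start + 2 * length] ** 2)
            if abs(a - b) / (max(a, b) + 1e-12) > threshold:
                break                      # non-stationary: stop growing
            length *= 2
        end = min(start + length, n)
        bounds.append(end)
        start = end
    return bounds
```

Long frames thus cover stationary stretches with few parameters, while transients get short frames; the Malvar transform's orthogonality is what makes measures on these unequal frames comparable in the actual method.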
Acoustics Research Letters Online-arlo | 2005
Erhan Mengusoglu; Christophe Ris
In this paper, a new acoustic confidence measure for automatic speech recognition hypotheses is proposed and compared to approaches from the literature. The approach takes into account prior information on the acoustic model performance specific to each phoneme. The new method is tested on two types of recognition errors: out-of-vocabulary words and errors due to additive noise. An efficient way to interpret the raw confidence measure as a prior probability of correctness is also proposed.
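Two ingredients of such a scheme can be sketched generically: weighting scores by per-phoneme model performance, and mapping a raw score to a probability of correctness. Both functions below are hypothetical illustrations, not the paper's formulas; the combination rule and parameter names are assumptions.

```python
import math

def phone_weighted_score(frame_scores, phone_accuracy):
    """Weight per-frame scores by a per-phoneme accuracy prior, echoing
    the idea of using prior information on acoustic-model performance.
    `frame_scores` is a list of (phone, score) pairs; the averaging
    rule here is hypothetical."""
    return sum(s * phone_accuracy.get(ph, 1.0)
               for ph, s in frame_scores) / len(frame_scores)

def calibrated_confidence(raw_score, a=1.0, b=0.0):
    """Logistic calibration mapping a raw confidence score to a
    probability of correctness; a and b would be fitted on held-out
    data. An illustrative stand-in for the interpretation step."""
    return 1.0 / (1.0 + math.exp(-(a * raw_score + b)))
```

A calibrated probability, unlike a raw score, can be thresholded consistently across tasks, e.g. to reject likely out-of-vocabulary words.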
international conference on acoustics, speech, and signal processing | 1996
Vincent Fontaine; Christophe Ris; Henri Leich
This paper presents a new supervised vector quantization algorithm based on codebook mapping and maximization of mutual information (MMI). Classical VQ algorithms are usually based on the minimization of a distortion criterion, so the phonetic classification of the acoustic vectors is not taken into account during codebook design. Moreover, the regions defined by classical VQ algorithms are generally limited to Voronoi regions. We show how the MMI mapping can design more complex class regions while taking into account the phonetic information associated with the input vectors. Recognition experiments conducted on an isolated word recognition task show that the MMI mapping outperforms classical VQ algorithms and is more robust to test conditions.
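For contrast, the classical distortion-minimizing baseline the paper improves on looks like k-means vector quantization, which by construction partitions the space into Voronoi cells and ignores phonetic labels entirely. The sketch below is that baseline, not the paper's MMI algorithm; initializing the codebook from the first k vectors and using a fixed iteration count are simplifications.

```python
import numpy as np

def kmeans_vq(vectors, k, iters=20):
    """Classical unsupervised VQ via k-means: minimizes distortion and
    yields Voronoi regions. Sketch with simplified initialization."""
    codebook = vectors[:k].astype(float).copy()
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        # assign each vector to its nearest codeword (its Voronoi cell)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the centroid of its cell
        for c in range(k):
            if np.any(labels == c):
                codebook[c] = vectors[labels == c].mean(axis=0)
    return codebook, labels
```

Nothing in this loop sees class labels, which is exactly the limitation the supervised MMI mapping addresses.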
Archive | 1996
Stéphane Dupont; Christophe Ris