Publication


Featured research published by Fabrice Lefèvre.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages

Stefan Hahn; Marco Dinarelli; Christian Raymond; Fabrice Lefèvre; Patrick Lehnen; R. De Mori; Alessandro Moschitti; Hermann Ney; Giuseppe Riccardi

One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts out of a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods like Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), or Support Vector Machines (SVMs) as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRFs) or Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and with different complexity. The French MEDIA corpus has already been exploited during an evaluation campaign and so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. In addition to single systems, methods for system combination are investigated. The best performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate (CER) of 12.6% was achieved. Here, in addition to attribute names, attribute values have been extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the CER drops to 12.0%.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Conversational telephone speech recognition

Jean-Luc Gauvain; Lori Lamel; Holger Schwenk; Gilles Adda; Langzhou Chen; Fabrice Lefèvre

This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news (BN) transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-the-art performance on conversational speech. Some major changes on the acoustic side include the use of speaker normalization (VTLN), the need to cope with channel variability, and the need for efficient speaker adaptation and better pronunciation modeling. On the linguistic side, the primary challenge is to cope with the limited amount of language model training data. To address this issue, we use a data selection technique and a smoothing technique based on a neural network language model. At the decoding level, lattice rescoring and minimum word error decoding are applied. On the development data, the improvements yield an overall word error rate of 24.9%, whereas the original BN transcription system had a word error rate of about 50% on the same data.
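For readers unfamiliar with the metric quoted above, word error rate is conventionally computed as the minimum number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the paper; the function name and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Tokenization by whitespace is an assumption for this sketch; real
    scoring tools also normalize text before alignment.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.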


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system

Spyridon Matsoukas; Jean-Luc Gauvain; Gilles Adda; Thomas Colthurst; Chia-Lin Kao; Owen Kimball; Lori Lamel; Fabrice Lefèvre; Jeff Z. Ma; John Makhoul; Long Nguyen; Rohit Prasad; Richard M. Schwartz; Holger Schwenk; Bing Xiang

This paper describes the progress made in the transcription of broadcast news (BN) and conversational telephone speech (CTS) within the combined BBN/LIMSI system from May 2002 to September 2004. During that period, BBN and LIMSI collaborated in an effort to produce significant reductions in the word error rate (WER), as directed by the aggressive goals of the Effective, Affordable, Reusable, Speech-to-text [Defense Advanced Research Projects Agency (DARPA) EARS] program. The paper focuses on general modeling techniques that led to recognition accuracy improvements, as well as engineering approaches that enabled efficient use of large amounts of training data and fast decoding architectures. Special attention is given to efforts to integrate components of the BBN and LIMSI systems, discussing the tradeoff between speed and accuracy for various system combination strategies. Results on the EARS progress test sets show that the combined BBN/LIMSI system achieved relative WER reductions of 47% and 51% on the BN and CTS domains, respectively.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Combination of stochastic understanding and machine translation systems for language portability of dialogue systems

Bassam Jabaian; Laurent Besacier; Fabrice Lefèvre

In this paper, several approaches for language portability of dialogue systems are investigated with a focus on the spoken language understanding (SLU) component. We show that the use of statistical machine translation (SMT) can greatly reduce the time and cost of porting an existing system from a source to a target language. Using automatically translated training data, we study phrase-based machine translation as an alternative to conditional random fields for conceptual decoding to compensate for the loss of a precise concept-word alignment. In addition, two ways to increase SLU robustness to translation errors (smeared training data and translation post-editing) are shown to improve performance when test data are translated then decoded in the source language. Overall, the combination of all these approaches reduces the concept error rate even further. Experiments were carried out on the French MEDIA dialogue corpus with a subset manually translated into Italian.


Computer Speech & Language | 2003

Non-parametric probability estimation for HMM-based automatic speech recognition

Fabrice Lefèvre

During the last decade, the most significant advances in the field of continuous speech recognition (CSR) have arisen from the use of hidden Markov models (HMM) for acoustic modeling. These models address one of the major issues for CSR: simultaneous modeling of temporal and frequency distortions in the speech signal. In the HMM, the temporal dimension is managed through an oriented states graph, each state accounting for the local frequency distortions through a probability density function. In this study, improvement of the HMM performance is expected from the introduction of a very effective non-parametric probability density function estimate: the k-nearest neighbors (k-nn) estimate. First, experiments on a short-term speech spectrum identification task are performed to compare the k-nn estimate and the widespread estimate based on mixtures of Gaussian functions. Then adaptations implied by the integration of the k-nn estimate in an HMM-based recognition system are developed. An optimal training protocol is obtained based on the introduction of the membership coefficients in the HMM parameters. The membership coefficients measure the degree of association between a reference acoustic vector and an HMM state. The training procedure uses the expectation-maximization (EM) algorithm applied to the membership coefficient estimation. Its convergence is shown according to the maximum likelihood criterion. This study leads to the development of a baseline k-nn/HMM recognition system which is evaluated on the TIMIT speech database. Further improvements of the k-nn/HMM system are finally sought through the introduction of temporal information into the representation space (delta coefficients) and the adaptation of the references (mainly, gender modeling and contextual modeling).
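As background for the k-nn estimate named in this abstract, the textbook non-parametric form sets the density at a point to k / (N · V), where N is the number of reference vectors and V is the volume of the smallest ball around the point that contains its k nearest references. The sketch below implements only that generic estimator under a Euclidean distance; it is not the paper's k-nn/HMM system, and the function name and the handling of a zero radius are assumptions.

```python
import math
from typing import Sequence


def knn_density(x: Sequence[float],
                references: Sequence[Sequence[float]],
                k: int) -> float:
    """Textbook k-NN density estimate k / (N * V) with Euclidean distance."""
    n = len(references)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of references")

    dim = len(x)
    # Distance from x to its k-th nearest reference vector.
    radius = sorted(math.dist(x, ref) for ref in references)[k - 1]
    if radius == 0.0:
        # Assumption for this sketch: report an unbounded density when the
        # k-th neighbour coincides with x.
        return math.inf

    # Volume of the unit ball in `dim` dimensions, scaled to the radius.
    unit_ball = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
    volume = unit_ball * radius ** dim
    return k / (n * volume)


if __name__ == "__main__":
    refs = [[0.0], [1.0], [2.0], [3.0]]
    # Radius 0.5 in one dimension gives an interval of length 1, so 2 / (4 * 1).
    print(knn_density([1.5], refs, k=2))
```

Unlike a Gaussian mixture, this estimate has no parameters to fit beyond k, but every evaluation requires a search over the stored reference vectors.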


Computer Speech & Language | 2015

Reinforcement-learning based dialogue system for human-robot interactions with socially-inspired rewards

Emmanuel Ferreira; Fabrice Lefèvre

Highlights: We integrate user appraisals in a POMDP-based dialogue manager procedure. We employ additional socially-inspired rewards in a RL setup to guide the learning. A unified framework for speeding up the policy optimisation and user adaptation. We consider a potential-based reward shaping with a sample efficient RL algorithm. Evaluated using both user simulator (information retrieval) and user trials (HRI).

This paper investigates some conditions under which polarized user appraisals gathered throughout the course of a vocal interaction between a machine and a human can be integrated in a reinforcement learning-based dialogue manager. More specifically, we discuss how this information can be cast into socially-inspired rewards for speeding up the policy optimisation for both efficient task completion and user adaptation in an online learning setting. For this purpose, a potential-based reward shaping method is combined with a sample efficient reinforcement learning algorithm to offer a principled framework to cope with these potentially noisy interim rewards. The proposed scheme will greatly facilitate system development by allowing the designer to teach the system through explicit positive/negative feedback given as hints about task progress, in the early stage of training. At a later stage, the approach will be used as a way to ease the adaptation of the dialogue policy to specific user profiles. Experiments carried out using a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS), support our claims in two configurations: firstly, with a user simulator in the tourist information domain (and thus simulated appraisals), and secondly, in the context of human-robot dialogue with real user trials.
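Potential-based reward shaping, in its standard formulation, adds a term F(s, s') = γ·Φ(s') − Φ(s) to the environment reward, where Φ is a potential over states and γ is the discount factor. The sketch below shows only this generic mechanism and the telescoping property that makes it safe to use; how the paper derives a potential from user appraisals is not reproduced here, and the function names and the example potential values are hypothetical.

```python
from typing import Sequence


def shaping_term(phi_s: float, phi_next: float, gamma: float) -> float:
    """Standard potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s


def shaped_reward(reward: float, phi_s: float, phi_next: float,
                  gamma: float) -> float:
    """Environment reward plus the shaping term for one transition."""
    return reward + shaping_term(phi_s, phi_next, gamma)


def discounted_shaping_sum(potentials: Sequence[float], gamma: float) -> float:
    """Discounted sum of shaping terms along one trajectory of potentials."""
    return sum(
        gamma ** t * shaping_term(potentials[t], potentials[t + 1], gamma)
        for t in range(len(potentials) - 1)
    )


if __name__ == "__main__":
    gamma = 0.9
    # Hypothetical potentials for the states visited in one dialogue.
    potentials = [0.0, 1.0, -1.0, 2.0]
    total = discounted_shaping_sum(potentials, gamma)
    # The sum telescopes to gamma**T * Phi(s_T) - Phi(s_0).
    expected = gamma ** 3 * potentials[-1] - potentials[0]
    print(abs(total - expected) < 1e-12)
```

The telescoping sum depends only on the first and last potentials, which is why shaping of this form changes how quickly a policy is learned without changing which policy is optimal.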


IEEE Automatic Speech Recognition and Understanding Workshop | 2001

Investigating stochastic speech understanding

Hélène Bonneau-Maynard; Fabrice Lefèvre

The need for human expertise in the development of a speech understanding system can be greatly reduced by the use of stochastic techniques. However, corpus-based techniques require the annotation of large amounts of training data. Manual semantic annotation of such corpora is tedious, expensive, and subject to inconsistencies. This work investigates the influence of the training corpus size on the performance of the understanding module. The use of automatically annotated data is also investigated as a means to increase the corpus size at a very low cost. First, a stochastic speech understanding model developed using data collected with the LIMSI ARISE dialog system is presented. Its performance is shown to be comparable to that of the rule-based caseframe grammar currently used in the system. In a second step, two ways of reducing the development cost are pursued: (1) reducing the amount of manually annotated data used to train the stochastic models and (2) using automatically annotated data in the training process.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Comparison and Combination of Lightly Supervised Approaches for Language Portability of a Spoken Language Understanding System

Bassam Jabaian; Laurent Besacier; Fabrice Lefèvre

Portability of a spoken dialogue system (SDS) to a new domain or a new language is a hot topic as it may imply gains in time and cost for building new SDSs. In this paper, we investigate several fast and efficient approaches for language portability of the spoken language understanding (SLU) module of a dialogue system. We show that the use of statistical machine translation (SMT) can reduce the time and the cost of porting a system from a source to a target language. For conceptual decoding, a state-of-the-art module based on conditional random fields (CRF) is used and a new approach based on phrase-based statistical machine translation (PB-SMT) is also evaluated. The experimental results show the efficiency of the proposed methods for fast and low-cost SLU language portability. In addition, we propose two methods to increase SLU robustness to translation errors. Overall, it is shown that the combination of all these approaches can further reduce the concept error rate. While most of the experiments in this paper deal with portability from French to Italian (given the availability of the French MEDIA corpus and its subset manually translated into Italian), a validation of our methodology is finally proposed for Arabic.


Spoken Language Technology Workshop | 2006

A DBN-based multi-level stochastic spoken language understanding system

Fabrice Lefèvre

In recent years, efforts have been made to automatically identify opinions, emotions and sentiments in text. The problem considered in this paper is the analysis of messages uttered by the users of a telephone service in response to a recorded message that asks if a problem they had was satisfactorily solved. Very often in these cases, subjective information is combined with factual information. The purpose of this type of opinion analysis is the detection of time variations of user satisfaction indices. Even if precision or recall is not very high because messages are ambiguous or ASR systems have made many word recognition errors, system strategies are acceptable if they detect the same trend in user satisfaction as indicated by human interpreters of the messages. In this paper, a system for this type of opinion analysis is proposed for a telephone service survey task.


IEEE Automatic Speech Recognition and Understanding Workshop | 2005

A 2+1-level stochastic understanding model

Hélène Bonneau-Maynard; Fabrice Lefèvre

In this paper, an extension of the 2-level stochastic understanding system is presented. An additional stochastic level is introduced in the system as the attribute value normalization module. In order to improve the model trainability, the conceptual decoding and value normalization steps are decoupled, leading to a 2+1-level system. The proposed approach is evaluated on the French MEDIA task (tourist information and hotel booking). This new 10k-utterance corpus is segmentally annotated, allowing for direct training of the 2-level conceptual models. Further developments of the system (modality propagation and hierarchical recomposition) are also investigated. On the whole, the proposed improvements achieve a 24% relative reduction of the understanding error rate, from 37.6% to 28.8%.

Collaboration


Dive into Fabrice Lefèvre's collaborations.

Top Co-Authors

Jean-Luc Gauvain
Centre national de la recherche scientifique

Stéphane Huet
Université de Montréal

Lori Lamel
Centre national de la recherche scientifique

Laurent Besacier
Centre national de la recherche scientifique