Jean-Luc Gauvain
Université Paris-Saclay
Publications
Featured research published by Jean-Luc Gauvain.
Procedia Computer Science | 2016
Rasa Lileikytė; Arseniy Gorin; Lori Lamel; Jean-Luc Gauvain; Thiago Fraga-Silva
This paper reports on experimental work to build a speech transcription system for Lithuanian broadcast data, relying on unsupervised and semi-supervised training methods as well as on other low-knowledge methods to compensate for missing resources. Unsupervised acoustic model training is investigated using 360 hours of untranscribed speech data. A graphemic pronunciation approach is used to simplify pronunciation model generation and therefore ease language model adaptation for the system users. Discriminative training on top of semi-supervised training is also investigated, as well as various types of acoustic features and their combinations. Experimental results are provided for each development step, along with contrastive results comparing various options. Using the best system configuration, a word error rate of 18.3% is obtained on a set of development data from the Quaero program.
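As an informal illustration of the graphemic pronunciation idea mentioned in this abstract, the Python sketch below maps each word to its letter sequence so that no hand-built phonetic dictionary is needed. The sample words and the digraph list are assumptions for illustration, not taken from the paper.

# Minimal sketch of a graphemic pronunciation lexicon: each word is
# "pronounced" as its grapheme sequence. The digraph set below is an
# assumed example of multi-letter units one might keep intact.
LITHUANIAN_DIGRAPHS = ("ch", "dz")  # hypothetical multi-letter units

def graphemic_pronunciation(word: str) -> list[str]:
    """Split a word into grapheme units, keeping assumed digraphs intact."""
    units, i = [], 0
    lowered = word.lower()
    while i < len(lowered):
        for dg in LITHUANIAN_DIGRAPHS:
            if lowered.startswith(dg, i):
                units.append(dg)
                i += len(dg)
                break
        else:
            units.append(lowered[i])
            i += 1
    return units

if __name__ == "__main__":
    for w in ["labas", "vakaras", "Lietuva"]:
        print(w, " ".join(graphemic_pronunciation(w)))

A lexicon built this way trades some phonetic precision for the ability to add new words automatically, which is why it eases vocabulary and language model adaptation.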
Computer Speech & Language | 2018
Rasa Lileikytė; Lori Lamel; Jean-Luc Gauvain; Arseniy Gorin
The research presented in this paper addresses conversational telephone speech recognition and keyword spotting for the Lithuanian language. Lithuanian can be considered a low e-resourced language, as little transcribed audio data and, more generally, only limited linguistic resources are available electronically. Part of this research explores the impact of reducing the amount of linguistic knowledge and manual supervision when developing the transcription system. Since designing a pronunciation dictionary requires language-specific expertise, the need for manual supervision was assessed by comparing phonemic and graphemic units for acoustic modeling. Although the Lithuanian language is generally described in the linguistic literature with 56 phonemes, under low-resourced conditions some phonemes may not be sufficiently observed to be modeled. Therefore different phoneme inventories were explored to assess the effects of explicitly modeling diphthongs, affricates and soft consonants. The impact of using Web data for language modeling and additional untranscribed audio data for semi-supervised training was also measured. Out-of-vocabulary (OOV) keywords are a well-known challenge for keyword search. While word-based keyword search is quite effective for in-vocabulary words, OOV keywords are largely undetected. Morpheme-based subword units are compared with character n-gram-based units for their capacity to detect OOV keywords. Experimental results are reported for two training conditions defined in the IARPA Babel program: the full language pack and the very limited language pack, for which, respectively, 40 h and 3 h of transcribed training data are available. For both conditions, grapheme-based and phoneme-based models are shown to obtain comparable transcription and keyword spotting results. The use of Web texts for language modeling is shown to significantly improve both speech recognition and keyword spotting performance. Combining full-word and subword units leads to the best keyword spotting results.
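The sketch below illustrates, on assumed toy data, how an OOV keyword can be backed off to character n-gram subword units while in-vocabulary keywords are still searched as full words. The vocabulary, keywords and n-gram order are hypothetical; the paper compares such character n-gram units against morpheme-based ones.

# Illustrative sketch: represent OOV keywords with character n-grams so a
# subword index can still be queried. Not the authors' exact unit inventory.
def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams with word-boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def keyword_query(keyword: str, vocabulary: set[str]) -> list[str]:
    """Use the full word if in-vocabulary, otherwise back off to subwords."""
    return [keyword] if keyword in vocabulary else char_ngrams(keyword)

if __name__ == "__main__":
    vocab = {"labas", "rytas"}  # toy recognizer vocabulary
    for kw in ["labas", "nepriklausomybe"]:
        print(kw, "->", keyword_query(kw, vocab))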
International Conference on Acoustics, Speech, and Signal Processing | 2017
Guangpu Huang; Thiago Fraga da Silva; Lori Lamel; Jean-Luc Gauvain; Arseniy Gorin; Antoine Laurent; Rasa Lileikyte; Abdel Messouadi
This paper reports on investigations using two techniques for language model text data augmentation for low-resourced automatic speech recognition and keyword search. Low-resourced languages are characterized by limited training materials, which typically results in high out-of-vocabulary (OOV) rates and poor language model estimates. One technique makes use of recurrent neural networks (RNNs) using word or subword units. Word-based RNNs keep the same system vocabulary, so they cannot reduce the OOV rate, whereas subword units can reduce the OOV rate but generate many false combinations. A complementary technique is based on automatic machine translation, which requires parallel texts and is able to add words to the vocabulary. These methods were assessed on 10 languages in the context of the Babel program and the NIST OpenKWS evaluation. Although improvements vary across languages with both methods, small gains were generally observed in terms of word error rate reduction and improved keyword search performance.
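To make the RNN-based text augmentation idea concrete, here is a minimal sketch, assuming PyTorch and a toy character-level corpus, that trains a small LSTM language model and samples new text that could be added to the language model training set. The corpus, model size and sampling scheme are illustrative assumptions, not the systems described in the paper.

# Minimal sketch: train a tiny character-level LSTM on available text,
# then sample new sequences as extra language-model training material.
import torch
import torch.nn as nn

corpus = "labas rytas labas vakaras laba diena "
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

class CharLSTM(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 32)
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

model = CharLSTM(len(chars))
optim = torch.optim.Adam(model.parameters(), lr=1e-2)
data = torch.tensor([stoi[c] for c in corpus]).unsqueeze(0)

for _ in range(200):  # brief training on the toy corpus
    logits, _ = model(data[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, len(chars)), data[:, 1:].reshape(-1))
    optim.zero_grad(); loss.backward(); optim.step()

# Sample augmentation text character by character.
idx = torch.tensor([[stoi["l"]]])
state, generated = None, "l"
with torch.no_grad():
    for _ in range(80):
        logits, state = model(idx, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        idx = torch.multinomial(probs, 1).view(1, 1)
        generated += itos[idx.item()]
print(generated)

A word-level variant of the same loop keeps the recognizer vocabulary fixed, which is why it cannot reduce the OOV rate, whereas character or subword sampling can produce unseen word forms.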
International Conference on Acoustics, Speech, and Signal Processing | 2017
Rasa Lileikyte; Thiago Fraga-Silva; Lori Lamel; Jean-Luc Gauvain; Antoine Laurent; Guangpu Huang
In this paper we aim to enhance keyword search for conversational telephone speech under low-resourced conditions. Two techniques to improve the detection of out-of-vocabulary keywords are assessed in this study: using extra text resources to augment the lexicon and language model, and using subword units for keyword search. Two approaches for data augmentation are explored to extend the limited amount of transcribed conversational speech: using conversational-like Web data and texts generated by recurrent neural networks. Contrastive comparisons of subword-based systems are performed to evaluate the benefits of multiple subword decodings versus a single decoding. Keyword search results are reported for all the techniques, though only some improve performance. Results are reported for the Mongolian and Igbo languages using data from the 2016 Babel program.
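The sketch below shows one plausible way to combine hit lists from a word-based and a subword-based keyword search system, keeping the higher score where detections overlap. The hit-list format, overlap rule and score merging are assumptions for illustration, not the exact combination used by the authors.

# Hedged sketch: merge word-system and subword-system keyword detections.
from dataclasses import dataclass

@dataclass
class Hit:
    keyword: str
    start: float   # seconds
    end: float     # seconds
    score: float   # detection confidence in [0, 1]

def overlaps(a: Hit, b: Hit) -> bool:
    return a.keyword == b.keyword and a.start < b.end and b.start < a.end

def merge_hits(word_hits: list[Hit], subword_hits: list[Hit]) -> list[Hit]:
    """Keep every word-system hit; add subword hits covering new regions,
    and take the maximum score where both systems fire on the same region."""
    merged = list(word_hits)
    for sh in subword_hits:
        for i, mh in enumerate(merged):
            if overlaps(mh, sh):
                merged[i] = Hit(mh.keyword, mh.start, mh.end,
                                max(mh.score, sh.score))
                break
        else:
            merged.append(sh)
    return merged

if __name__ == "__main__":
    word_sys = [Hit("sain", 12.3, 12.8, 0.71)]
    subword_sys = [Hit("sain", 12.4, 12.9, 0.55), Hit("sain", 40.1, 40.6, 0.48)]
    for h in merge_hits(word_sys, subword_sys):
        print(h)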
Archive | 1993
Lori Faith Lamel; Jean-Luc Gauvain; B. Prouts; C. Bouhier
Archive | 2000
Lori Lamel; Jean-Luc Gauvain; Gilles Adda
Archive | 1997
Jean-Luc Gauvain; Gilles Adda; Lori Lamel; Martine Adda-Decker
Archive | 1997
Jean-Luc Gauvain; Lori Lamel; Gilles Adda; Michele Jardin
Archive | 2009
Jun Luo; Lori Lamel; Jean-Luc Gauvain
Archive | 2001
Langzhou Chen; Jean-Luc Gauvain; Lori Lamel; Gilles Adda; Martine Adda