Viet Bac Le
Centre national de la recherche scientifique
Publications
Featured research published by Viet Bac Le.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2005
Viet Bac Le; Laurent Besacier
This paper presents our first steps in fast acoustic modeling for a new target language. Both knowledge-based and data-driven methods were used to obtain phone mapping tables between a source language (French) and a target language (Vietnamese). While acoustic models borrowed directly from the source language did not perform very well, we show that using a small amount of adaptation data in the target language (one or two hours) leads to very acceptable automatic speech recognition (ASR) performance. Our best continuous Vietnamese recognition system, adapted with only two hours of Vietnamese data, obtains a word accuracy of 63.9% on one hour of Vietnamese dialog speech.
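To illustrate the knowledge-based side of this approach, here is a minimal sketch of borrowing source-language acoustic models through an IPA-style phone mapping table; the mapping entries and data structures are hypothetical, not the ones used in the paper.

```python
# Illustrative sketch of knowledge-based phone mapping for acoustic model
# bootstrapping. The mapping entries below are hypothetical examples.

# IPA-based mapping from target-language (Vietnamese) phones to the closest
# source-language (French) phones.
phone_map = {
    "a": "a",      # shared open vowel
    "b": "b",      # shared voiced bilabial stop
    "ɲ": "ɲ",      # palatal nasal exists in both languages
    "ɗ": "d",      # implosive /ɗ/ has no French counterpart -> nearest stop
}

def borrow_acoustic_models(target_phones, source_models, phone_map):
    """Initialize target-language models by copying the mapped source models.
    The borrowed models would then be adapted (e.g. MAP/MLLR) on the small
    amount of target-language speech available."""
    target_models = {}
    for phone in target_phones:
        source_phone = phone_map.get(phone)
        if source_phone is None:
            raise KeyError(f"no mapping defined for target phone {phone!r}")
        target_models[phone] = source_models[source_phone].copy()
    return target_models
```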
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011
Lori Lamel; Jean-Luc Gauvain; Viet Bac Le; Ilya Oparin; Sha Meng
This paper describes recent advances at LIMSI in Mandarin Chinese speech-to-text transcription. A number of novel approaches were introduced in the different system components. The acoustic models are trained on over 1600 hours of audio data from a range of sources and include pitch and MLP features. N-gram and neural network language models are trained on very large corpora of over 3 billion words of text, and LM adaptation was explored at different levels: per show, per snippet, or per speaker cluster. Character-based consensus decoding was found to outperform word-based consensus decoding for Mandarin. The improved system reduces the relative character error rate (CER) by about 10% on previous GALE development and evaluation data sets, obtaining a CER of 9.2% on the P4 broadcast news and broadcast conversation evaluation data.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006
Viet Bac Le; Laurent Besacier; Tanja Schultz
This paper addresses in particular the use of acoustic-phonetic unit similarities for porting context-dependent acoustic models to new languages. Since the IPA-based method is limited to the construction of a source/target phoneme mapping table, a method for estimating the similarity between two phonemes is proposed in this paper. Based on these phoneme similarities, estimation methods for polyphone similarity and clustered polyphonic model similarity are investigated. For a new language, a polyphonic decision tree is first built with a small amount of speech data. Then, clustered models in the target language are duplicated from the nearest clustered models in the source language and adapted to the target language with limited data. Results obtained from the experiments demonstrate the feasibility of these methods.
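As an illustration of how a data-driven phoneme similarity might be estimated, the sketch below derives similarity scores from phone confusion counts; this is an assumed approximation, not the estimator defined in the paper.

```python
from collections import Counter, defaultdict

def confusion_based_similarity(alignments):
    """Estimate phoneme similarity scores from phone-level alignments.

    `alignments` is assumed to be a list of (target_phone, recognized_source_phone)
    pairs obtained by decoding target-language speech with source-language
    phone models. The similarity of a target phone t to a source phone s is
    approximated here by the relative frequency P(s | t).
    """
    counts = defaultdict(Counter)
    for target_phone, source_phone in alignments:
        counts[target_phone][source_phone] += 1

    similarity = {}
    for target_phone, c in counts.items():
        total = sum(c.values())
        similarity[target_phone] = {s: n / total for s, n in c.items()}
    return similarity

def nearest_source_phone(similarity, target_phone):
    """Pick the source phone with the highest estimated similarity."""
    scores = similarity[target_phone]
    return max(scores, key=scores.get)
```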
Workshop on Statistical Machine Translation | 2009
Thi Ngoc Diep Do; Viet Bac Le; Brigitte Bigi; Laurent Besacier; Eric Castelli
This paper presents our first attempt at constructing a Vietnamese-French statistical machine translation system. Since Vietnamese is an under-resourced language, we concentrate on building a large Vietnamese-French parallel corpus. A document alignment method based on publication date, special words, and sentence alignment results is proposed. The paper also presents an application of the obtained parallel corpus to the construction of a Vietnamese-French statistical machine translation system, where the use of different units for Vietnamese (syllables, words, or their combinations) is discussed.
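A toy sketch of the document alignment idea follows, scoring candidate document pairs by publication-date proximity and the overlap of their special words; the weights, the Jaccard overlap, and the field names are illustrative assumptions (the original method also exploits sentence alignment results).

```python
from datetime import date

def alignment_score(doc_fr, doc_vi, max_day_gap=2):
    """Toy scoring function for pairing comparable documents.

    Each document is assumed to be a dict with a 'date' (datetime.date) and a
    set of 'special_words' (numbers, proper names, ...).
    """
    day_gap = abs((doc_fr["date"] - doc_vi["date"]).days)
    if day_gap > max_day_gap:
        return 0.0
    date_score = 1.0 - day_gap / (max_day_gap + 1)

    a, b = doc_fr["special_words"], doc_vi["special_words"]
    overlap = len(a & b) / len(a | b) if (a | b) else 0.0

    # Equal weighting of the two cues is an arbitrary choice for this sketch.
    return 0.5 * date_score + 0.5 * overlap

doc_fr = {"date": date(2008, 5, 12), "special_words": {"Hanoi", "2008", "ASEAN"}}
doc_vi = {"date": date(2008, 5, 13), "special_words": {"Hanoi", "2008", "APEC"}}
print(alignment_score(doc_fr, doc_vi))  # pairs scoring above a threshold would be kept
```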
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010
Viet Bac Le; Lori Lamel; Jean-Luc Gauvain
It has become common practice to adapt acoustic models to specific conditions (gender, accent, bandwidth) in order to improve the performance of speech-to-text (STT) transcription systems. With the growing interest in the use of discriminative features produced by a multi-layer perceptron (MLP) in such systems, the question arises of whether it is necessary to specialize the MLP to particular conditions, and if so, how to incorporate the condition-specific MLP features into the system. This paper explores three approaches (adaptation, full training, and feature merging) to using condition-specific MLP features in a state-of-the-art BN STT system for French. The third approach without condition-specific adaptation was found to outperform the original models with condition-specific adaptation, and to perform almost as well as full training of multiple condition-specific HMMs.
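As a rough illustration of the feature-merging idea, the sketch below combines the outputs of condition-specific MLPs weighted by estimated condition posteriors; this weighted merge is only an assumption about how such features could be combined, not the paper's exact scheme.

```python
import numpy as np

def merge_condition_features(frame, mlps, condition_posteriors):
    """Merge features produced by condition-specific MLPs into a single vector.

    `mlps` maps a condition name (e.g. 'male', 'female') to a callable that
    returns an MLP feature vector for the input frame; `condition_posteriors`
    gives the estimated probability of each condition for the current segment.
    """
    merged = None
    for condition, mlp in mlps.items():
        weight = condition_posteriors.get(condition, 0.0)
        features = np.asarray(mlp(frame))
        merged = weight * features if merged is None else merged + weight * features
    return merged
```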
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008
Viet Bac Le; Sopheap Seng; Laurent Besacier; Brigitte Bigi
This paper presents the benefit of using multiple lexical units in the post-processing stage of an ASR system. Since the use of sub-word units can reduce the high out-of-vocabulary rate and mitigate the lack of text resources for statistical language modeling, we propose several methods to decompose, normalize, and combine word and sub-word lattices generated by different ASR systems. Using a sub-word information table, every word in a lattice can be decomposed into sub-word units. The decomposed lattices can then be combined into a common lattice from which a confusion network is generated. This lattice combination scheme yields an absolute syllable error rate reduction of about 1.4% over the sentence MAP baseline on a Vietnamese ASR task, and the proposed method also outperforms N-best list combination and voting.
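The sketch below illustrates the decomposition step: each word arc of a lattice is expanded into a chain of syllable arcs using a sub-word information table. The lattice representation and the uniform score splitting are simplifying assumptions, not the actual lattice format used in the paper.

```python
def decompose_lattice(word_lattice, subword_table):
    """Decompose every word arc of a lattice into sub-word (syllable) arcs.

    `word_lattice` is assumed to be a list of arcs
    (start_node, end_node, word, score); `subword_table` maps a word to its
    list of syllables. Each word arc is replaced by a chain of syllable arcs,
    with the word score split uniformly; intermediate nodes get fresh ids.
    """
    next_node = max(max(s, e) for s, e, _, _ in word_lattice) + 1
    syllable_lattice = []
    for start, end, word, score in word_lattice:
        syllables = subword_table.get(word, [word])
        share = score / len(syllables)
        nodes = [start] + list(range(next_node, next_node + len(syllables) - 1)) + [end]
        next_node += len(syllables) - 1
        for i, syl in enumerate(syllables):
            syllable_lattice.append((nodes[i], nodes[i + 1], syl, share))
    return syllable_lattice
```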
Conference of the International Speech Communication Association (Interspeech) | 2016
Gregory Gelly; Jean-Luc Gauvain; Viet Bac Le; Abdelkhalek Messaoudi
This paper describes the design of an acoustic language recognition system based on BLSTM that can discriminate closely related languages and dialects of the same language. We introduce a Divide-and-Conquer (D&C) method to quickly and successfully train an RNN-based multi-language classifier. Experiments compare this approach to the straightforward training of the same RNN, as well as to two widely used LID techniques: a phonotactic system using DNN acoustic models and an i-vector system. Results are reported on two different data sets: the 14 languages of NIST LRE07 and the 20 closely related languages and dialects of NIST OpenLRE15. In addition to reporting the NIST Cavg metric which served as the primary metric for the LRE07 and OpenLRE15 evaluations, the EER and LER are provided. When used with BLSTM, the D&C training scheme significantly outperformed the classical training method for multi-class RNNs. On the OpenLRE15 data set, this method also outperforms classical LID techniques and combines very well with a phonotactic system.
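The following outline is one possible reading of the Divide-and-Conquer training scheme, recursively training classifiers on subsets of the languages before merging and fine-tuning; the actual RNN training and merging steps are left as placeholder callables and are not taken from the paper.

```python
def divide_and_conquer_train(languages, train_subclassifier, merge_and_finetune):
    """Schematic outline of a Divide-and-Conquer training scheme.

    The language set is recursively split into two halves; a classifier is
    trained on each half, and the two are then merged and fine-tuned on the
    full set. `train_subclassifier` and `merge_and_finetune` are placeholders
    for the actual RNN training and parameter-merging steps.
    """
    if len(languages) <= 2:
        return train_subclassifier(languages)
    mid = len(languages) // 2
    left = divide_and_conquer_train(languages[:mid], train_subclassifier, merge_and_finetune)
    right = divide_and_conquer_train(languages[mid:], train_subclassifier, merge_and_finetune)
    return merge_and_finetune(left, right, languages)
```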
Multimedia Tools and Applications | 2010
Georges Quénot; Tien-Ping Tan; Viet Bac Le; Stéphane Ayache; Laurent Besacier; Philippe Mulhem
We present in this paper an approach based on the use of the International Phonetic Alphabet (IPA) for content-based indexing and retrieval of multilingual audiovisual documents. The approach works even if the languages of the document are unknown. It has been validated in the context of the “Star Challenge” search engine competition organized by the Agency for Science, Technology and Research (A*STAR) of Singapore. Our approach includes the building of an IPA-based multilingual acoustic model and a dynamic programming based method for searching document segments by “IPA string spotting”. Dynamic programming allows the query string to be retrieved in the document string even with a significant transcription error rate at the phone level. The methods we developed ranked first and third on the monolingual (English) search task, fifth on the multilingual search task, and first on the multimodal (audio and image) search task.
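The dynamic programming search can be pictured as approximate substring matching over phone sequences; the sketch below uses unit edit costs, whereas a real system would presumably weight them by phone confusability.

```python
def spot_query(query_phones, doc_phones, max_cost):
    """Approximate 'IPA string spotting' by dynamic programming.

    Returns the end positions in `doc_phones` where `query_phones` matches
    with at most `max_cost` phone-level edit operations, allowing the match
    to start anywhere in the document (approximate substring search).
    """
    m = len(query_phones)
    prev = list(range(m + 1))        # column for the empty document prefix
    hits = []
    for j, d in enumerate(doc_phones, start=1):
        cur = [0] * (m + 1)          # cost 0 at row 0: a match may start here
        for i in range(1, m + 1):
            cur[i] = min(prev[i - 1] + (query_phones[i - 1] != d),  # match/substitution
                         prev[i] + 1,                               # extra phone in document
                         cur[i - 1] + 1)                            # missing phone in document
        if cur[m] <= max_cost:
            hits.append((j, cur[m]))  # (end position, edit cost)
        prev = cur
    return hits
```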
Odyssey 2016 | 2016
Gregory Gelly; Jean-Luc Gauvain; Lori Lamel; Antoine Laurent; Viet Bac Le; Abdel Messaoudi
This paper describes our development work to design a language recognition system that can discriminate closely related languages and dialects of the same language. The work was a joint effort by LIMSI and Vocapia Research in preparation for the NIST 2015 Language Recognition Evaluation (LRE). The language recognition system results from a fusion of four core classifiers: a phonotactic component using DNN acoustic models, two purely acoustic components using an RNN model and an i-vector model, and a lexical component. Each component generates language posterior probabilities optimized to maximize the LID NCE, making their combination simple and robust. The motivation for using multiple components representing different speech knowledge is that some dialect distinctions may not be manifest at the acoustic level. We report experiments on the NIST LRE15 data and provide an analysis of the results and some post-evaluation contrasts. The 2015 LRE task focused on the identification of 20 languages clustered in 6 groups (Arabic, Chinese, English, French, Slavic, and Iberian) of similar languages. Results are reported using the NIST Cavg metric, which served as the primary metric for the OpenLRE15 evaluation. Results are also reported for the EER and the LER.
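As a simple illustration of combining the components, the sketch below fuses per-system language posteriors with a weighted log-linear rule; the actual fusion in the paper is optimized to maximize the LID NCE, so this is only an assumed stand-in.

```python
import math

def fuse_language_posteriors(posteriors_per_system, weights):
    """Log-linear fusion of language posteriors from several classifiers.

    `posteriors_per_system` is a list of dicts {language: P(language | x)},
    one per component (phonotactic, RNN, i-vector, lexical); `weights` holds
    one fusion weight per component.
    """
    languages = posteriors_per_system[0].keys()
    fused = {}
    for lang in languages:
        log_score = sum(w * math.log(max(p[lang], 1e-10))
                        for w, p in zip(weights, posteriors_per_system))
        fused[lang] = math.exp(log_score)
    total = sum(fused.values())
    return {lang: score / total for lang, score in fused.items()}
```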
Conference of the International Speech Communication Association (Interspeech) | 2007
Viet Bac Le; Odile Mella; Dominique Fohr