Antoine Laurent
Vocapia Research
Publications
Featured research published by Antoine Laurent.
international conference on acoustics, speech, and signal processing | 2011
Antoine Laurent; Sylvain Meignier; Teva Merlin; Paul Deléglise
Large vocabulary automatic speech recognition (ASR) technologies perform well in known and controlled contexts. In less controlled conditions, however, human review is often necessary to check and correct the output of such systems in order to ensure that it is understandable. We propose a method for computer-assisted transcription of speech based on automatic reordering of confusion networks, evaluated in terms of KSR (Keystroke Saving Rate) and WSR (Word Stroke Ratio). The method significantly reduces the number of actions needed to correct ASR output: WSR computed before and after each network reordering shows a gain of about 17.7% (3.4 points).
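A minimal sketch of the reordering idea, assuming a toy confusion-network representation (a list of slots, each holding (word, posterior) pairs); the metric below is a simplified reading of WSR as the fraction of slots a corrector must fix, not the paper's exact implementation:

```python
# Hypothetical data structures; not the authors' code.

def reorder_slot(candidates):
    """Sort a slot's (word, posterior) pairs by descending posterior."""
    return sorted(candidates, key=lambda wp: wp[1], reverse=True)

def wsr(confusion_network, reference):
    """Simplified WSR: slots whose top candidate is wrong, per reference word."""
    strokes = sum(
        1
        for slot, ref_word in zip(confusion_network, reference)
        if reorder_slot(slot)[0][0] != ref_word
    )
    return strokes / len(reference)

cn = [
    [("the", 0.7), ("a", 0.3)],
    [("cat", 0.4), ("hat", 0.6)],  # top candidate wrong: one corrective stroke
    [("sat", 0.9), ("sad", 0.1)],
]
print(wsr(cn, ["the", "cat", "sat"]))  # -> 0.333...
```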
international conference on acoustics, speech, and signal processing | 2012
Elie Khoury; Antoine Laurent; Sylvain Meignier; Simon Petitrenaud
In this paper, we consider the issue of speaker identification within audio recordings of broadcast news. The speaker identity information is extracted from both transcript-based and acoustic-based speaker identification systems. This information is combined in the belief functions framework, which provides a coherent knowledge representation of the problem. The Kuhn-Munkres algorithm is used to solve the assignment problem between speaker identities and speaker clusters. Experiments carried out on French broadcast news from the French evaluation campaign ESTER show the efficiency of the proposed combination method.
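The assignment step maps one-to-one between speaker clusters and candidate identities; a minimal sketch using SciPy's Kuhn-Munkres implementation, with made-up scores standing in for the belief-function combination of acoustic and transcript evidence:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Kuhn-Munkres

# Made-up scores: rows are speaker clusters, columns are candidate identities.
identities = ["N. Sarkozy", "S. Royal", "F. Bayrou"]
scores = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.2, 0.5],
])

rows, cols = linear_sum_assignment(-scores)  # negate to maximize total score
for cluster, ident in zip(rows, cols):
    print(f"cluster {cluster} -> {identities[ident]}")
```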
international conference on acoustics, speech, and signal processing | 2009
Antoine Laurent; Teva Merlin; Sylvain Meignier; Yannick Estève; Paul Deléglise
This paper focuses on an approach to enhancing automatic phonetic transcription of proper nouns by using an iterative filter to retain only the most relevant part of a large set of phonetic variants, obtained by combining rule-based generation with extraction from actual audio signals. Using this technique, we were able to reduce the error rate affecting proper nouns during automatic speech transcription of the ESTER corpus of French broadcast news. The role of the filtering was to ensure that the new phonetic variants of proper nouns would not induce new errors in the transcription of the rest of the words.
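A minimal sketch of the iterative-filtering idea (not the authors' code): greedily drop any variant whose removal does not hurt a scoring function. Here `score` is a toy stub standing in for running the recognizer and measuring the proper-noun error rate:

```python
def iterative_filter(variants, score):
    """Keep removing variants as long as removal does not lower the score."""
    kept = set(variants)
    improved = True
    while improved:
        improved = False
        for v in sorted(kept):
            trial = kept - {v}
            if trial and score(trial) >= score(kept):
                kept = trial
                improved = True
    return kept

# Toy stub: pretend only variants in `useful` help recognition.
useful = {"s a r k o z i", "s a R k o z i"}
score = lambda s: len(s & useful) - 0.1 * len(s - useful)
variants = useful | {"s a r k o s", "z a r k o z i"}
print(iterative_filter(variants, score))  # only the useful variants survive
```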
conference of the international speech communication association | 2016
Arseniy Gorin; Rasa Lileikyte; Guangpu Huang; Lori Lamel; Jean-Luc Gauvain; Antoine Laurent
This research extends our earlier work on using machine translation (MT) and word-based recurrent neural networks to augment language model training data for keyword search in conversational Cantonese speech. MT-based data augmentation is applied to two language pairs: English-Lithuanian and English-Amharic. Using filtered N-best MT hypotheses for language modeling is found to perform better than using only the 1-best translation. Target language texts collected from the Web and filtered to select conversational-like data are used in several ways. In addition to using Web data to train the language model of the speech recognizer, we investigate using this data to improve the language model and phrase table of the MT system in order to obtain better translations of the English data. Finally, generating text data with a character-based recurrent neural network is investigated. This approach allows new word forms to be produced, providing a way to reduce the out-of-vocabulary rate and thereby improve keyword spotting performance. We study how these different methods of language model data augmentation impact speech-to-text and keyword spotting performance for the Lithuanian and Amharic languages. The best results are obtained by combining all of the explored methods.
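One step above, filtering translated hypotheses before adding them to the language model training pool, can be illustrated by scoring each sentence under an in-domain model and keeping the low cross-entropy ones. A minimal sketch with a smoothed unigram model standing in for the real conversational LM (data and threshold are made up):

```python
import math
from collections import Counter

in_domain = "yes well I think so you know".split()
counts = Counter(in_domain)
total = sum(counts.values())

def cross_entropy(sentence, alpha=0.1):
    """Per-word cross-entropy under an additively smoothed unigram model."""
    vocab = len(counts) + 1
    logp = sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab))
        for w in sentence.split()
    )
    return -logp / len(sentence.split())

nbest = ["well I think so", "the committee adjourned sine die"]
print([s for s in nbest if cross_entropy(s) < 2.5])
# -> only the conversational-like hypothesis survives
```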
ieee automatic speech recognition and understanding workshop | 2015
Thiago Fraga-Silva; Antoine Laurent; Jean-Luc Gauvain; Lori Lamel; Viet Bac Le; Abdelkhalek Messaoudi
This paper extends recent research on training data selection for speech transcription and keyword spotting system development. Selection techniques were explored in the context of the IARPA-Babel Active Learning (AL) task for 6 languages. Different selection criteria were considered with the goal of improving over a system built using a pre-defined 3-hour training data set. Four variants of the entropy-based criterion were explored: words, triphones, phones, as well as the HMM-state criterion previously introduced in [4]. The influence of the number of HMM-states was assessed, as was the use of automatic versus manual reference transcripts. The combination of selection criteria was investigated, and a novel multi-stage selection method is proposed. This method was also assessed using larger data sets than were permitted in the Babel AL task. Results are reported for the 6 languages. The multi-stage selection was also applied to the surprise language (Swahili) in the NIST OpenKWS 2015 evaluation.
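A minimal sketch of an entropy-based selection criterion (illustrative, not the exact Babel AL recipe): greedily add the utterance that most increases the entropy of the unit distribution of the selected pool; the units here stand in for words, triphones, phones, or HMM-states:

```python
import math
from collections import Counter

def entropy(counter):
    """Shannon entropy (in nats) of a unit-count distribution."""
    n = sum(counter.values())
    return -sum(c / n * math.log(c / n) for c in counter.values())

def select(utterances, budget):
    pool, chosen = Counter(), []
    for _ in range(budget):
        best = max(utterances, key=lambda u: entropy(pool + Counter(u)))
        utterances.remove(best)
        pool += Counter(best)
        chosen.append(best)
    return chosen

# Toy utterances given as unit sequences.
utts = [["a", "a", "a"], ["a", "b"], ["b", "c", "d"], ["a", "b", "a"]]
print(select(utts, 2))  # favors utterances that cover new units
```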
Computer Speech & Language | 2014
Antoine Laurent; Sylvain Meignier; Paul Deléglise
Accurate phonetic transcription of proper nouns can be an important resource for commercial applications that embed speech technologies, such as audio indexing and vocal phone directory lookup. However, an accurate phonetic transcription is more difficult to obtain for proper nouns than for regular words. Indeed, the phonetic transcription of a proper noun depends on both the origin of the speaker pronouncing it and the origin of the proper noun itself. This work proposes a method for extracting phonetic transcriptions of proper nouns from actual utterances of those proper nouns, thus yielding transcriptions based on practical use instead of mere pronunciation rules. The proposed method consists of a process that first extracts phonetic transcriptions, and then iteratively filters them. In order to initialize the process, an alignment dictionary is used to detect word boundaries. A rule-based grapheme-to-phoneme generator (LIA_PHON), a knowledge-based approach (JSM), and a statistical machine translation based system were evaluated for this alignment. As a result, compared to our reference dictionary (BDLEX supplemented by LIA_PHON for missing words) on the ESTER 1 French broadcast news corpus, we were able to significantly decrease the Word Error Rate (WER) on speech segments containing proper nouns, without negatively affecting the WER on the rest of the corpus.
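For flavor, a tiny, hypothetical rule-based grapheme-to-phoneme sketch in the spirit of the rule-based generator mentioned above (LIA_PHON is a full French G2P system; these few rules are purely illustrative):

```python
# Illustrative rules only; ordered so longer graphemes match first.
RULES = [("eau", "o"), ("ou", "u"), ("ch", "S"), ("e", "@")]

def g2p(word):
    phones, i = [], 0
    word = word.lower()
    while i < len(word):
        for graph, phone in RULES:
            if word.startswith(graph, i):
                phones.append(phone)
                i += len(graph)
                break
        else:
            phones.append(word[i])  # fall back to the letter itself
            i += 1
    return " ".join(phones)

print(g2p("Chateau"))  # -> 'S a t o'
```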
international conference on acoustics, speech, and signal processing | 2016
Antoine Laurent; Thiago Fraga-Silva; Lori Lamel; Jean-Luc Gauvain
In this paper we investigate various techniques for building effective speech-to-text (STT) and keyword search (KWS) systems for low-resource conversational speech. Subword decoding and graphemic mappings were assessed for detecting out-of-vocabulary keywords. To deal with the limited amount of transcribed data, semi-supervised training and data selection methods were investigated. Robust acoustic features produced via data augmentation were evaluated for acoustic modeling. For language modeling, automatically retrieved conversational-like Web data was used, as well as neural network based models. We report STT improvements with all of these techniques, but interestingly only some of them improve KWS performance. Results are reported for the Swahili language in the context of the 2015 OpenKWS evaluation.
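A minimal sketch of one ingredient above: segmenting an out-of-vocabulary keyword into in-vocabulary subword units (greedy longest match) so it can be searched in a subword decoding output; the unit inventory is made up:

```python
UNITS = {"m", "a", "ji", "ra", "fu", "maji", "safa", "ri"}

def segment(keyword):
    """Greedy longest-match segmentation into known subword units."""
    out, i = [], 0
    while i < len(keyword):
        for j in range(len(keyword), i, -1):  # try the longest span first
            if keyword[i:j] in UNITS:
                out.append(keyword[i:j])
                i = j
                break
        else:
            raise ValueError(f"no unit covers {keyword[i:]!r}")
    return out

print(segment("majisafari"))  # -> ['maji', 'safa', 'ri']
```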
Odyssey 2016 | 2016
Gregory Gelly; Jean-Luc Gauvain; Lori Lamel; Antoine Laurent; Viet Bac Le; Abdel Messaoudi
This paper describes our development work to design a language recognition system that can discriminate closely related languages and dialects of the same language. The work was a joint effort by LIMSI and Vocapia Research in preparation for the NIST 2015 Language Recognition Evaluation (LRE). The language recognition system results from a fusion of four core classifiers: a phonotactic component using DNN acoustic models, two purely acoustic components using an RNN model and an i-vector model, and a lexical component. Each component generates language posterior probabilities optimized to maximize the LID NCE, making their combination simple and robust. The motivation for using multiple components representing different speech knowledge is that some dialect distinctions may not be manifest at the acoustic level. We report experiments on the NIST LRE15 data and provide an analysis of the results and some post-evaluation contrasts. The 2015 LRE task focused on the identification of 20 languages clustered in 6 groups (Arabic, Chinese, English, French, Slavic and Iberian) of similar languages. Results are reported using the NIST Cavg metric, which served as the primary metric for the OpenLRE15 evaluation. Results are also reported for the EER and the LER.
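A minimal sketch of the fusion idea: each core classifier outputs per-language posteriors, and a weighted log-linear combination yields the final scores. Weights and posteriors here are arbitrary, not the trained fusion from the paper:

```python
import math

def fuse(posteriors_per_system, weights):
    """Log-linear fusion of per-language posteriors, renormalized."""
    langs = posteriors_per_system[0].keys()
    scores = {
        lang: sum(w * math.log(p[lang])
                  for w, p in zip(weights, posteriors_per_system))
        for lang in langs
    }
    z = sum(math.exp(s) for s in scores.values())
    return {lang: math.exp(s) / z for lang, s in scores.items()}

phonotactic = {"fra": 0.6, "fr-ca": 0.4}
acoustic    = {"fra": 0.3, "fr-ca": 0.7}
print(fuse([phonotactic, acoustic], weights=[0.5, 0.5]))
```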
XXXIIe Journées d'Etudes sur la Parole (JEP 2018) | 2018
Salima Mdhaffar; Antoine Laurent; Yannick Estève
In recent years, the use of neural networks has become essential in many domains, notably in natural language processing. The work presented in this article concerns their use in automatic speech recognition. We present the results obtained by recurrent neural networks (RNNs) of different kinds (LSTM, GRU, GRU-Highway) on data from the MGB 3 evaluation campaign. The data of this campaign, which is not yet finished, consist of recordings of highly diverse programmes from the British television channel BBC. Our experiments compare the results of the different RNNs and show how, by combining recurrent neural networks with classical N-gram language models that model sentences in both reading directions, the performance of a speech recognition system can be improved very significantly.
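A minimal sketch of the combination idea, assuming an N-best rescoring setup: interpolate log-probabilities from a forward model, a backward model (scored on the reversed sentence), and an RNN, then rerank. The scorers below are stubs and the weights illustrative:

```python
def rescore(nbest, lm_fwd, lm_bwd, rnn, w=(0.4, 0.3, 0.3)):
    """Pick the hypothesis with the best interpolated log-probability."""
    def total(sent):
        words = sent.split()
        return (w[0] * lm_fwd(words)
                + w[1] * lm_bwd(list(reversed(words)))
                + w[2] * rnn(words))
    return max(nbest, key=total)

# Stub scorers: each word costs 1.5 nats, so shorter hypotheses win here.
lm_fwd = lm_bwd = rnn = lambda words: -1.5 * len(words)
print(rescore(["the cat sat", "the cat sat down"], lm_fwd, lm_bwd, rnn))
```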
international conference on acoustics, speech, and signal processing | 2017
Guangpu Huang; Thiago Fraga da Silva; Lori Lamel; Jean-Luc Gauvain; Arseniy Gorin; Antoine Laurent; Rasa Lileikyte; Abdel Messaoudi
This paper reports on investigations of two techniques for language model text data augmentation for low-resourced automatic speech recognition and keyword search. Low-resourced languages are characterized by limited training materials, which typically results in high out-of-vocabulary (OOV) rates and poor language model estimates. One technique makes use of recurrent neural networks (RNNs) using word or subword units. Word-based RNNs keep the same system vocabulary, so they cannot reduce the OOV rate, whereas subword units can reduce the OOV rate but generate many false combinations. A complementary technique is based on automatic machine translation, which requires parallel texts and is able to add words to the vocabulary. These methods were assessed on 10 languages in the context of the Babel program and the NIST OpenKWS evaluation. Although improvements vary across languages, small gains were generally observed with both methods in terms of word error rate reduction and improved keyword search performance.
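A minimal sketch of the generation idea: sample new text character by character so that unseen word forms can appear, which is what lowers the OOV rate. A character n-gram model stands in here for the recurrent network; the corpus is a made-up scrap of Swahili:

```python
import random
from collections import Counter, defaultdict

corpus = "kesho tutaenda sokoni kununua matunda na mboga "
order = 3

# Count which character follows each 3-character context.
model = defaultdict(Counter)
for i in range(len(corpus) - order):
    model[corpus[i:i + order]][corpus[i + order]] += 1

random.seed(0)
ctx = out = corpus[:order]
for _ in range(40):
    nxt_dist = model.get(ctx)
    if not nxt_dist:
        break
    out += random.choices(list(nxt_dist), weights=nxt_dist.values())[0]
    ctx = out[-order:]
print(out)  # may contain word forms absent from the corpus
```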