
Publication


Featured research published by Sahar Ghannay.


European Signal Processing Conference | 2015

Word embeddings combination and neural networks for robustness in ASR error detection

Sahar Ghannay; Yannick Estève; Nathalie Camelin

This study focuses on error detection in Automatic Speech Recognition (ASR) output. We propose to build a confidence classifier based on a neural network architecture, which is in charge of attributing a label (error or correct) to each word within an ASR hypothesis. This classifier uses word embeddings as inputs, in addition to ASR confidence-based, lexical and syntactic features. We propose to evaluate the impact of three different kinds of word embeddings on this error detection approach, and we present a solution to combine these three types of word embeddings in order to take advantage of their complementarity. In our experiments, different approaches are evaluated on the automatic transcriptions generated by two different ASR systems applied to the ETAPE corpus (French broadcast news). Experimental results show that the proposed neural architectures achieve a CER reduction of between 4% and 5.8% in error detection, depending on the test dataset, in comparison with a state-of-the-art CRF approach.
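The abstract describes a per-word binary classifier fed with a word embedding plus ASR confidence-based, lexical and syntactic features. Below is a minimal sketch of such a classifier in PyTorch; the dimensions, layer sizes and feature set are illustrative assumptions, not the authors' configuration.

```python
# A minimal sketch (not the authors' implementation): a feed-forward
# classifier that labels each word of an ASR hypothesis as correct (0)
# or erroneous (1) from its word embedding concatenated with
# confidence-based, lexical and syntactic features.
import torch
import torch.nn as nn

EMB_DIM = 200      # assumed word-embedding size
FEAT_DIM = 12      # assumed number of confidence/lexical/syntactic features

classifier = nn.Sequential(
    nn.Linear(EMB_DIM + FEAT_DIM, 128),
    nn.Tanh(),
    nn.Linear(128, 2),          # two classes: correct / error
)

def detect_errors(embeddings: torch.Tensor, features: torch.Tensor) -> torch.Tensor:
    """Return one error/correct label per word of the hypothesis."""
    x = torch.cat([embeddings, features], dim=-1)   # (n_words, EMB_DIM + FEAT_DIM)
    logits = classifier(x)
    return logits.argmax(dim=-1)                    # 1 = predicted error

# Toy usage: 5 hypothesis words with random inputs.
words = torch.randn(5, EMB_DIM)
feats = torch.randn(5, FEAT_DIM)
print(detect_errors(words, feats))
```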


SLSP 2015: Proceedings of the Third International Conference on Statistical Language and Speech Processing, Volume 9449 | 2015

Combining Continuous Word Representation and Prosodic Features for ASR Error Prediction

Sahar Ghannay; Yannick Estève; Nathalie Camelin; Camille Dutrey; Fabian Santiago; Martine Adda-Decker

Recent advances in continuous word representations have been successfully used in several natural language processing tasks. This paper focuses on error prediction in Automatic Speech Recognition (ASR) outputs and proposes to investigate the use of continuous word representations (word embeddings) within a neural network architecture. The main contribution of this paper is about word embedding combination: several combination approaches are proposed in order to take advantage of their complementarity. The use of prosodic features, in addition to classical syntactic ones, is evaluated. Experiments are made on automatic transcriptions generated by the LIUM ASR system applied to the ETAPE corpus. They show that the proposed neural architecture, using an effective continuous word representation combination and prosodic features as additional features, significantly outperforms a state-of-the-art approach based on the use of Conditional Random Fields. Finally, the proposed system produces a well calibrated confidence measure, evaluated in terms of Normalized Cross Entropy.
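The confidence measure is evaluated with Normalized Cross Entropy (NCE). As a reference point, here is a small sketch of NCE as it is commonly defined for ASR confidence scores (the gain in cross-entropy over the prior probability of a word being correct); the toy scores below are made up.

```python
# Sketch of Normalized Cross Entropy (NCE) for confidence measures:
# the relative reduction in cross-entropy that the confidence scores
# bring over the prior probability of a word being correct.
import math

def normalized_cross_entropy(confidences, labels):
    """confidences: P(correct) per word; labels: 1 if the word is correct."""
    n = len(labels)
    p_c = sum(labels) / n                       # prior probability of "correct"
    h_max = -n * (p_c * math.log2(p_c) + (1 - p_c) * math.log2(1 - p_c))
    h_conf = -sum(math.log2(c) if y == 1 else math.log2(1 - c)
                  for c, y in zip(confidences, labels))
    return (h_max - h_conf) / h_max             # 1.0 = perfect, <= 0 = no better than prior

# Toy usage: well calibrated scores yield a positive NCE.
print(normalized_cross_entropy([0.9, 0.8, 0.2, 0.95], [1, 1, 0, 1]))
```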


Conference of the International Speech Communication Association (Interspeech) | 2016

Acoustic Word Embeddings for ASR Error Detection

Sahar Ghannay; Yannick Estève; Nathalie Camelin; Paul Deléglise

This paper focuses on error detection in Automatic Speech Recognition (ASR) outputs. A neural network architecture is proposed, which is well suited to handle continuous word representations, like word embeddings. In a previous study, the authors explored the use of linguistic word embeddings, and more particularly their combination. In this new study, the use of acoustic word embeddings is explored. Acoustic word embeddings offer the opportunity of an a priori acoustic representation of words that can be compared, in terms of similarity, to an embedded representation of the audio signal. First, we propose an approach to evaluate the intrinsic performance of acoustic word embeddings in comparison to orthographic representations in order to capture discriminative phonetic information. Since the French language is targeted in the experiments, a particular focus is made on homophone words. Then, the use of acoustic word embeddings is evaluated for ASR error detection. The proposed approach reaches a classification error rate (CER) of 7.94%, while the previous state-of-the-art CRF-based approach gets a CER of 8.56% on the outputs of the ASR system that won the ETAPE evaluation campaign on speech recognition of French broadcast news.
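The key idea is that an a priori acoustic embedding of a word can be compared, by similarity, with an embedding of an audio segment. A minimal illustration with cosine similarity and random vectors of an assumed dimension:

```python
# Illustration only (assumed dimension, random vectors): comparing an
# a priori acoustic word embedding with the embedding of an audio
# segment to check whether the segment could be that word.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

dim = 100                                   # assumed embedding size
word_embedding = np.random.randn(dim)       # a priori acoustic embedding of a word
segment_embedding = np.random.randn(dim)    # embedding of the decoded audio segment

score = cosine(word_embedding, segment_embedding)
print(f"acoustic similarity: {score:.3f}")  # high score -> segment sounds like the word
```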


Workshop on Evaluating Vector Space Representations for NLP | 2016

Evaluation of acoustic word embeddings

Sahar Ghannay; Yannick Estève; Nathalie Camelin; Paul Deléglise

Recently, researchers in speech recognition have started to reconsider using whole words as the basic modeling unit, instead of phonetic units. These systems rely on a function that embeds arbitrary- or fixed-dimensional speech segments into a vector in a fixed-dimensional space, named an acoustic word embedding. Thus, speech segments of words that sound similar will be projected into a close area of a continuous space. This paper focuses on the evaluation of acoustic word embeddings. We propose two approaches to evaluate the intrinsic performance of acoustic word embeddings in comparison to orthographic representations, in order to evaluate whether they capture discriminative phonetic information. Since the French language is targeted in the experiments, a particular focus is made on homophone words.
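One way to probe whether such embeddings capture phonetic information, in the spirit of the homophone focus above, is to check that homophone pairs score higher than unrelated pairs. A small sketch under that assumption (random vectors stand in for trained embeddings, and the word pairs are illustrative, not the authors' exact protocol):

```python
# Sketch of an intrinsic check: do homophone pairs get a higher cosine
# similarity than non-homophone pairs under a given set of embeddings?
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def homophone_separation(embeddings, homophone_pairs, other_pairs):
    """embeddings: dict word -> vector; pairs: lists of (word_a, word_b)."""
    sim = lambda pairs: np.mean([cosine(embeddings[a], embeddings[b]) for a, b in pairs])
    return sim(homophone_pairs) - sim(other_pairs)   # larger gap = more phonetic information

# Toy usage with random vectors (real embeddings would come from a trained model).
rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(50) for w in ["vert", "verre", "vers", "table"]}
print(homophone_separation(emb, [("vert", "verre"), ("vert", "vers")],
                           [("vert", "table"), ("verre", "table")]))
```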


arXiv: Computation and Language | 2018

TED-LIUM 3: Twice as Much Data and Corpus Repartition for Experiments on Speaker Adaptation

François Hernandez; Vincent Nguyen; Sahar Ghannay; Natalia A. Tomashenko; Yannick Estève

In this paper, we present the TED-LIUM release 3 corpus (available at https://lium.univ-lemans.fr/ted-lium3/), dedicated to speech recognition in English, which more than doubles the data available for training acoustic models in comparison with TED-LIUM 2. We present recent developments on Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM corpus, from 2012 and 2014. We demonstrate that passing from 207 to 452 hours of transcribed speech training data is more useful for end-to-end ASR systems than for state-of-the-art HMM-based ones. This is the case even though the HMM-based ASR system still outperforms the end-to-end ASR system when the audio training data reaches 452 hours, with Word Error Rates (WER) of 6.7% and 13.7%, respectively. Finally, we propose two repartitions of the TED-LIUM release 3 corpus: the legacy repartition, which is the same as in release 2, and a new repartition calibrated and designed for experiments on speaker adaptation. As with the first two releases, the TED-LIUM 3 corpus will be freely available to the research community.
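The reported figures use the standard Word Error Rate, i.e. (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch of that metric (illustrative only, not the scoring pipeline used in the paper):

```python
# Standard WER via word-level edit distance between reference and hypothesis.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
```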


XXXIIe Journées d'Etudes sur la Parole (JEP 2018) | 2018

Simulating ASR errors for training SLU systems (Simulation d'erreurs de reconnaissance automatique dans un cadre de compréhension de la parole)

Edwin Simonnet; Sahar Ghannay; Nathalie Camelin; Yannick Estève

This paper presents an approach to simulate automatic speech recognition (ASR) errors from manual transcriptions and shows how it can be used to improve the performance of spoken language understanding (SLU) systems. The proposed method is based on the use of both acoustic and linguistic word embeddings in order to define a similarity measure between words, dedicated to predicting ASR confusions. Indeed, we assume that words which are acoustically and linguistically close are the ones confused by an ASR system. Experiments were carried out on the French MEDIA corpus, focused on hotel reservation. They show that this approach significantly improves SLU system performance, with a relative reduction of 21.2% of the concept/value error rate (CVER), particularly when the SLU system is based on a neural approach (reduction of 22.4% of CVER). A comparison with a naive noising approach shows that the proposed noising approach is particularly relevant. Keywords: spoken language understanding, data augmentation, noising, automatic speech recognition, errors.
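The noising idea can be sketched as follows: with some probability, a word of the manual transcription is replaced by its nearest neighbour in a combined acoustic + linguistic embedding space, so the substitution mimics a plausible ASR confusion. The data structures, error rate and vocabulary below are illustrative assumptions, not the MEDIA setup:

```python
# Simplified noising sketch: replace some words by their nearest
# neighbour in a combined acoustic + linguistic embedding space.
import numpy as np

def simulate_asr_errors(words, embeddings, error_rate=0.15, rng=None):
    """embeddings: dict word -> combined acoustic+linguistic vector."""
    rng = rng or np.random.default_rng()
    vocab = list(embeddings)
    noisy = []
    for w in words:
        if w in embeddings and rng.random() < error_rate:
            # pick the closest other word as the simulated confusion
            sims = [(v, embeddings[w] @ embeddings[v] /
                     (np.linalg.norm(embeddings[w]) * np.linalg.norm(embeddings[v])))
                    for v in vocab if v != w]
            noisy.append(max(sims, key=lambda t: t[1])[0])
        else:
            noisy.append(w)
    return noisy

# Toy usage with random vectors and a forced error rate.
rng = np.random.default_rng(1)
emb = {w: rng.standard_normal(20) for w in ["hôtel", "autel", "chambre", "double"]}
print(simulate_asr_errors(["je", "veux", "un", "hôtel"], emb, error_rate=1.0, rng=rng))
```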


International Conference on Statistical Language and Speech Processing | 2017

Enriching confusion networks for post-processing

Sahar Ghannay; Yannick Estève; Nathalie Camelin

This paper proposes a new approach for a posteriori enrichment of automatic speech recognition (ASR) confusion networks (CNs). CNs are usually needed to decrease the word error rate and to compute confidence measures, but they are also used in many ways to improve post-processing of ASR outputs. For instance, they can be helpfully used to propose alternative word hypotheses when ASR outputs are corrected by a human during post-editing. However, CN bins do not have a fixed length and sometimes contain only one or two word hypotheses: in this case the number of alternatives for correcting a misrecognized word is very low, reducing the chance of helping the human annotator.
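To make the structure concrete, here is a minimal sketch of a confusion network as a list of bins (each bin a list of word/posterior alternatives) and of a generic enrichment step that pads short bins with extra candidates; the proposer function and the homophone table are hypothetical placeholders, not the paper's method:

```python
# Sketch: a confusion network as a list of bins, each bin a list of
# (word, posterior) alternatives; short bins are padded with candidates
# proposed by some similarity function over the vocabulary.
from typing import Callable, List, Tuple

Bin = List[Tuple[str, float]]

def enrich_confusion_network(cn: List[Bin],
                             propose: Callable[[str], List[str]],
                             min_alternatives: int = 3) -> List[Bin]:
    enriched = []
    for bin_ in cn:
        words = [w for w, _ in bin_]
        extra = []
        for candidate in propose(words[0]):          # candidates for the best hypothesis
            if candidate not in words and len(bin_) + len(extra) < min_alternatives:
                extra.append((candidate, 0.0))       # added with a null posterior
        enriched.append(bin_ + extra)
    return enriched

# Toy usage: a naive proposer based on a hand-written homophone table.
homophones = {"vert": ["verre", "vers"], "mer": ["mère", "maire"]}
cn = [[("la", 0.9)], [("mer", 0.6), ("mère", 0.3)]]
print(enrich_confusion_network(cn, lambda w: homophones.get(w, [])))
```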


Language Resources and Evaluation | 2016

Word Embedding Evaluation and Combination

Sahar Ghannay; Benoit Favre; Yannick Estève; Nathalie Camelin


Conference of the International Speech Communication Association (Interspeech) | 2017

ASR error management for improving spoken language understanding

Edwin Simonnet; Sahar Ghannay; Nathalie Camelin; Yannick Estève; Renato De Mori


Workshop on Errors by Humans and Machines in Multimedia, Multimodal and Multilingual Data Processing (ERRARE 2015) | 2014

Which ASR errors are hard to detect?

Sahar Ghannay; Nathalie Camelin; Yannick Estève

Collaboration

Top co-author: Benoit Favre, Aix-Marseille University.