Pascal Nocera
University of Avignon
Publications
Featured research published by Pascal Nocera.
Text, Speech and Dialogue | 2007
Georges Linarès; Pascal Nocera; Dominique Massonié; Driss Matrouf
The LIA has developed a speech recognition toolkit providing most of the components required by speech-to-text systems. This toolbox was used to build a Broadcast News (BN) transcription system that was involved in the ESTER evaluation campaign ([1]), on both unconstrained and real-time transcription tasks. In this paper, we describe the techniques we used to reach real time, starting from our baseline 10xRT system. We focus on aspects of the A* search algorithm that are critical for both efficiency and accuracy. Then, we evaluate the impact of the different system components (lexicon, language models and acoustic models) on the trade-off between efficiency and accuracy. Experiments are carried out in the framework of the ESTER evaluation campaign. Our results show that the real-time system performs about 5.6% absolute WER worse than the standard 10xRT system, with an absolute Word Error Rate (WER) of about 26.8%.
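The abstract reports results as absolute Word Error Rate. As a reminder of the metric (not code from the paper), WER is the word-level edit distance between reference and hypothesis, divided by the reference length; a minimal sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("a b c d", "a x c")` counts one substitution and one deletion over four reference words, giving 0.5.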
Empirical Methods in Natural Language Processing | 2005
Benoit Favre; Frédéric Béchet; Pascal Nocera
Traditional approaches to Information Extraction (IE) from speech input simply apply text-based methods to the output of an Automatic Speech Recognition (ASR) system. While this is satisfactory on low Word Error Rate (WER) transcripts, we believe that a tighter integration of the IE and ASR modules can increase IE performance in more difficult conditions. More specifically, this paper focuses on the robust extraction of Named Entities from speech input where a temporal mismatch between training and test corpora occurs. We describe a Named Entity Recognition (NER) system, developed within the French Rich Broadcast News Transcription program ESTER, which is specifically optimized to process ASR transcripts and can be integrated into the search process of the ASR modules. Finally, we show how metadata can be collected to adapt the NER and ASR models to new conditions, and how they can be used in a task of Named Entity indexing of spoken archives.
International Conference on Acoustics, Speech, and Signal Processing | 2008
Stanislas Oger; Georges Linarès; Frédéric Béchet; Pascal Nocera
Most Web-based methods for lexicon augmentation consist in capturing global semantic features of the targeted domain in order to collect relevant documents from the Web. We suggest that the local context of an out-of-vocabulary (OOV) word contains relevant information about it. With this information, we propose to use the Web to build locally-augmented lexicons which are used in a final local decoding pass. Our experiments confirm the relevance of the Web for OOV word retrieval. Different methods are proposed to retrieve the hypothesis words. Finally, we present the integration of new words into the transcription process based on part-of-speech models. This technique allows 7.6% of the significant OOV words to be recovered, and the accuracy of the system is improved.
International Conference on Acoustics, Speech, and Signal Processing | 2004
Christophe Lévy; Georges Linarès; Pascal Nocera; Jean-François Bonastre
We present several methods able to fit speech recognition system requirements to cellular phone resources. The proposed techniques are evaluated on a digit recognition task using both French and English corpora. We investigate three aspects of speech processing in particular: acoustic parameterization, recognition algorithms, and acoustic modeling. Several parameterization algorithms (LPCC, MFCC and PLP) are compared to the linear predictive coding (LPC) included in the GSM norm. The MFCC and PLP parameterization algorithms perform significantly better than the others. Moreover, the feature vector size can be reduced to 6 PLP coefficients, allowing memory and computation requirements to be decreased without a significant loss of performance. In order to achieve good performance with reasonable resource needs, we develop several methods to embed a classical HMM-based speech recognition system in a cellular phone. We first propose an automatic on-line building of a phonetic lexicon, which allows a minimal but unlimited lexicon. Then we reduce the HMM complexity by decreasing the number of (Gaussian) components per state. Finally, we evaluate our propositions by comparing dynamic time warping (DTW) with our HMM system, in the cellular phone context, for clean conditions. The experiments show that our HMM system outperforms DTW for speaker-independent tasks and allows more practical applications for the cellular-phone user interface.
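The DTW baseline mentioned above aligns two feature sequences by warping the time axis. A minimal sketch of the classic algorithm on 1-D feature sequences (an illustration, not the paper's implementation, which operates on multi-dimensional acoustic vectors):

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]
```

Because the warping path can repeat frames, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0: the duplicated frame is absorbed by the alignment, which is exactly why DTW tolerates speaking-rate variation.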
International Conference on Communications | 2008
Hong Quang Nguyen; Pascal Nocera; Eric Castelli; T. Van Loan
This paper presents our study on context-independent tone recognition of Vietnamese continuous speech. Each of the six Vietnamese tones is represented by a hidden Markov model (HMM), and we used VNSPEECHCORPUS to learn these models in terms of fundamental frequency (F0) and short-time energy. We focus on evaluating the influence of different factors on tone recognition. The experimental results show that the best method to learn F0 and energy is to use a logarithmic transformation function followed by normalization with the mean and mean deviation. In addition, we show that using 8 tone forms and discriminating between male and female speakers increases the accuracy of the Vietnamese tone recognition system.
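The feature normalization the abstract finds most effective can be sketched in a few lines: log-transform the F0 contour, then center by the mean and scale by the mean (absolute) deviation. This is an illustration of the idea, assuming per-utterance statistics:

```python
import math

def normalize_f0(f0_values):
    """Log-transform F0 values, then normalize by the mean and the
    mean absolute deviation of the log-F0 contour (a sketch of the
    normalization scheme described in the abstract)."""
    logs = [math.log(v) for v in f0_values]
    mean = sum(logs) / len(logs)
    mad = sum(abs(x - mean) for x in logs) / len(logs)
    return [(x - mean) / mad for x in logs]
```

The log transform turns multiplicative pitch differences between speakers into additive offsets, which the mean subtraction then removes; e.g. `normalize_f0([100, 200])` yields `[-1.0, 1.0]`.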
2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies | 2008
Hong Quang Nguyen; Pascal Nocera; Eric Castelli; Van Loan Trinh
This paper presents our study on the use of tone information in a large-vocabulary Vietnamese continuous speech recognition system. Firstly, a new tone recognition module using hidden Markov models is presented. Then, a new methodology for integrating this module into the Speeral system is given. The experiments were carried out on VNSpeechCorpus. The results show that the direct use of the tone score in the Speeral system increases the performance of the system, e.g., a 28.6% relative reduction in word error rate.
International Conference on Acoustics, Speech, and Signal Processing | 1997
Georges Linarès; Pascal Nocera; Henri Meloni
We describe a new neural architecture for the unsupervised learning of a classification of mixed transient signals. The method is based on neural techniques for blind source separation and on subspace methods. The feedforward neural network dynamically builds and refreshes an acoustic event classification by detecting novelties and by creating and deleting classes. A self-organization process rotates the class prototypes in order to minimize the statistical dependence of class activities. Simulated multi-dimensional signals and mixed acoustic signals in a real noisy environment were used to test our model. The results on the classification and detection properties of the model are encouraging, despite poor modeling of structured sounds.
Spoken Language Technology Workshop | 2012
Mohamed Bouallegue; Emmanuel Ferreira; Driss Matrouf; Georges Linarès; Maria Goudi; Pascal Nocera
This paper explores a novel method for context-dependent models in automatic speech recognition (ASR), in the context of under-resourced languages. We present a simple way to realize a state-tying approach, based on a new vectorial representation of the HMM states. This representation is a vector with a low number of parameters, obtained with the Subspace Gaussian Mixture Model (SGMM) paradigm. The proposed method requires neither phonetic knowledge nor a large amount of data, which are the major obstacles to acoustic modeling for under-resourced languages. This paper shows how this representation can be obtained and used for tying states. Our experiments on Vietnamese show that this approach achieves a stable gain compared to the classical approach based on decision trees. Furthermore, the method appears to be portable to other languages, as shown in a preliminary study conducted on Berber.
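Once each HMM state is summarized by a low-dimensional vector, tying reduces to clustering those vectors so that nearby states share parameters. A hypothetical sketch of that step (a greedy distance-threshold clustering chosen for illustration; the paper does not specify this particular clustering):

```python
import math

def tie_states(state_vectors, threshold):
    """Greedy state tying: assign each state vector to the nearest existing
    cluster whose centroid lies within `threshold` (Euclidean distance),
    otherwise start a new cluster. States in one cluster would then share
    their acoustic parameters."""
    centroids, members = [], []
    for idx, vec in enumerate(state_vectors):
        best, best_d = None, threshold
        for c, centroid in enumerate(centroids):
            d = math.dist(vec, centroid)
            if d < best_d:
                best, best_d = c, d
        if best is None:
            centroids.append(list(vec))
            members.append([idx])
        else:
            members[best].append(idx)
            # update the centroid as the running mean of its members
            k = len(members[best])
            centroids[best] = [(c * (k - 1) + v) / k
                               for c, v in zip(centroids[best], vec)]
    return members
```

Unlike decision-tree tying, this needs no phonetic question set: the grouping falls out of distances between the state vectors themselves, which matches the abstract's motivation for under-resourced languages.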
Telecommunications and Signal Processing | 2007
Christophe Lévy; Georges Linarès; Pascal Nocera; Jean-François Bonastre
Speech recognition applications are known to require a substantial amount of resources in terms of training data, memory and computing power. However, the targeted context of this work, embedded mobile phone speech recognition, only allows a few KB of memory, a few MIPS, and usually a small amount of training data. In order to meet these resource constraints, we propose an approach based on an HMM system using a GMM-based state-independent acoustic model. A transformation is computed and applied to the global GMM in order to obtain each of the HMM state-dependent probability density functions. This strategy stores only the transformation function parameters for each state and decreases the amount of computing power needed for likelihood computation. The proposed approach is evaluated on a digit recognition task using the French corpus BDSON. Our method achieves a Digit Error Rate (DER) of 2.1% while respecting the resource constraints. Compared to a standard HMM with comparable resources, our approach achieves a relative DER decrease of about 52%.
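The storage saving comes from keeping one shared GMM plus a small transform per state instead of a full GMM per state. A toy sketch of the idea, assuming a 1-D affine transform `(a, b)` applied to every mean of the global mixture (the paper's actual transformation family is richer than this illustration):

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def state_loglik(x, global_gmm, transform):
    """Score a frame x against a state-dependent GMM obtained by applying
    an affine transform (a, b) to every mean of the shared global GMM
    [(weight, mean, var), ...]. Only (a, b) is stored per state."""
    a, b = transform
    logs = [math.log(w) + gaussian_loglik(x, a * m + b, v)
            for w, m, v in global_gmm]
    # log-sum-exp over the mixture components
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

Each state now costs two parameters here instead of one full set of weights, means and variances, which is the memory trade-off the abstract describes.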
International Conference on Acoustics, Speech, and Signal Processing | 2003
Olivier Bellot; Driss Matrouf; Pascal Nocera; Georges Linarès; Jean-François Bonastre
The aim of speaker adaptation techniques is to enhance speaker-independent acoustic models to bring their recognition accuracy as close as possible to that obtained with speaker-dependent models. Recently, a technique based on a hierarchical structure and the maximum a posteriori criterion was proposed (SMAP) (Shinoda, K. and Lee, C.-H., Proc. IEEE ICASSP, 1998). As in SMAP, we assume that the acoustic model parameters are organized in a tree containing all the Gaussian distributions. Each node in that tree represents a cluster of Gaussian distributions sharing a common affine transformation representing the mismatch between training and test conditions. To estimate this affine transformation, we propose a new technique based on Gaussian merging and standard MAP adaptation. This new technique is very fast and allows good unsupervised adaptation of both means and variances, even with a small amount of adaptation data. This adaptation strategy has shown a significant performance improvement in a large-vocabulary speech recognition task, both alone and combined with MLLR (maximum likelihood linear regression) adaptation.
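The two building blocks named in the abstract, merging Gaussians and MAP adaptation, have standard closed forms. A 1-D sketch of both (an illustration under textbook formulas, not the paper's exact estimator):

```python
def merge_gaussians(w1, m1, v1, w2, m2, v2):
    """Moment-matched merge of two weighted 1-D Gaussians into a single
    Gaussian with the same total weight, mean and variance."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # match the second moment of the weighted pair
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return w, m, v

def map_mean(prior_mean, data_mean, n, tau):
    """Standard MAP update of a Gaussian mean: interpolate the prior mean
    and the sample mean, weighted by the adaptation data count n and the
    prior weight tau. With little data the prior dominates."""
    return (tau * prior_mean + n * data_mean) / (tau + n)
```

Merging the Gaussians of a tree node gives a single distribution per cluster to adapt, and the MAP interpolation explains the robustness with small adaptation sets: as `n` shrinks, the update falls back to the speaker-independent prior.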