Georges Linarès | Researchain

Publication

Featured research published by Georges Linarès.

text speech and dialogue | 2007

The LIA speech recognition system: from 10xRT to 1xRT

Georges Linarès; Pascal Nocera; Dominique Massonié; Driss Matrouf

The LIA developed a speech recognition toolkit providing most of the components required by speech-to-text systems. This toolbox was used to build a Broadcast News (BN) transcription system that took part in the ESTER evaluation campaign ([1]), on the unconstrained transcription and real-time transcription tasks. In this paper, we describe the techniques we used to reach real time, starting from our baseline 10xRT system. We focus on some aspects of the A* search algorithm which are critical for both efficiency and accuracy. Then, we evaluate the impact of the different system components (lexicon, language models and acoustic models) on the trade-off between efficiency and accuracy. Experiments are carried out in the framework of the ESTER evaluation campaign. Our results show that the real-time system performs about 5.6% absolute WER (Word Error Rate) worse than the standard 10xRT system, with an absolute WER of about 26.8%.
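The two metrics this abstract relies on can be illustrated with a minimal sketch. It assumes the usual definitions: WER as the word-level edit distance (substitutions, deletions, insertions) divided by the reference length, and the real-time factor ("xRT") as processing time divided by audio duration. The function names are hypothetical and are not part of the LIA toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """A value of 10.0 corresponds to '10xRT', 1.0 to real time."""
    return processing_seconds / audio_seconds
```

Under these definitions, a decoder that needs 600 seconds to transcribe 60 seconds of audio runs at 10xRT.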

IEEE Transactions on Audio, Speech, and Language Processing | 2013

Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

Benjamin Lecouteux; Georges Linarès; Yannick Estève; Guillaume Gravier

Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach where outputs of secondary systems are integrated in the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined with others by a primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain an improvement of 14.5% relative word error rate over the best single system, as opposed to 6.7% with a ROVER combination. An in-depth analysis of the DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm. The application of DDA to both A* and beam-search-based decoders yields similar performance.

international conference on acoustics, speech, and signal processing | 2008

Generalized driven decoding for speech recognition system combination

Benjamin Lecouteux; Georges Linarès; Yannick Estève; Guillaume Gravier

The driven decoding algorithm (DDA) was initially an integrated approach for the combination of 2 speech recognition (ASR) systems. It consists in guiding the search algorithm of a primary ASR system by the one-best hypothesis of an auxiliary system. In this paper, we generalize DDA to confusion-network driven decoding and we propose new combination schemes for multiple system combination. Since previous experiments involved 2 ASR systems on broadcast news data, the proposed extended DDA is evaluated using 3 ASR systems from different labs. Results show that generalized DDA significantly outperforms the ROVER method: we obtain a 15.7% relative word error rate improvement with respect to the best single system, as opposed to 8.5% with the ROVER combination.

Pattern Recognition Letters | 2014

Feature selection using Principal Component Analysis for massive retweet detection

Mohamed Morchid; Richard Dufour; Pierre-Michel Bousquet; Georges Linarès; Juan-Manuel Torres-Moreno

Social networks have become a major actor in massive information propagation. In the context of the Twitter platform, its popularity is due in part to the capability of relaying messages (i.e. tweets) posted by users. This particular mechanism, called retweet, allows users to massively share tweets they consider potentially interesting for others. In this paper, we propose to study the behavior of tweets that have been massively retweeted in a short period of time. We first analyze specific tweet features through a Principal Component Analysis (PCA) to better understand the behavior of highly forwarded tweets as opposed to those retweeted only a few times. Finally, we propose to automatically detect the massively retweeted messages. The qualitative study is used to select the features allowing the best classification performance. We show that selecting only the most correlated features leads to the best classification accuracy (F-measure of 65.7%), with a gain of about 2.4 points in comparison to the use of the complete set of features.
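As a rough illustration of how a PCA can be used to rank features, the sketch below standardizes each feature column, builds the correlation matrix, extracts the first principal component by power iteration, and orders the features by the absolute value of their loading. This is only one common heuristic and is an assumption here: the abstract does not state the exact selection procedure, and the function names and generic numeric columns are hypothetical.

```python
import math


def first_component_loadings(rows, iterations=500):
    """Loadings of the first principal component of the correlation matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    stds = []
    for k in range(d):
        var = sum((r[k] - means[k]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant column: avoid dividing by 0
    # Standardize so that every feature has zero mean and unit variance.
    z = [[(r[k] - means[k]) / stds[k] for k in range(d)] for r in rows]
    corr = [[sum(row[a] * row[b] for row in z) / n for b in range(d)]
            for a in range(d)]
    # Power iteration from a non-uniform start vector.
    v = [1.0 + k for k in range(d)]
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def rank_features(rows):
    """Feature indices sorted by decreasing absolute loading on the first component."""
    loadings = first_component_loadings(rows)
    return sorted(range(len(loadings)), key=lambda k: abs(loadings[k]), reverse=True)
```

Keeping only the top-ranked indices returned by `rank_features` would then give a reduced feature set to pass to a classifier.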

international conference on acoustics, speech, and signal processing | 2014

Improving dialogue classification using a topic space representation and a Gaussian classifier based on the decision rule

Mohamed Morchid; Richard Dufour; Pierre-Michel Bousquet; Mohamed Bouallegue; Georges Linarès; Renato De Mori

In this paper, we study the impact of dialogue representations and classification methods in the task of theme identification of telephone conversation services having highly imperfect automatic transcriptions. Two dialogue representations are first compared: the classical Term Frequency-Inverse Document Frequency with Gini purity criteria (TF-IDF-Gini) method and the Latent Dirichlet Allocation (LDA) approach. We then propose to study an original classification method that takes advantage of the LDA topic space representation, highlighted as the best dialogue representation. To do so, two assumptions about topic representation led us to choose a Gaussian process (GP) based method. This approach is compared with a Support Vector Machine (SVM) classification method. Results show that the GP approach is a better solution to deal with the multiple theme complexity of a dialogue, whatever the conditions studied (manual or automatic transcriptions). We finally discuss the impact of the topic space reduction on the classification accuracy.

international conference on acoustics, speech, and signal processing | 2008

On-demand new word learning using world wide web

Stanislas Oger; Georges Linarès; Frédéric Béchet; Pascal Nocera

Most Web-based methods for lexicon augmentation consist in capturing global semantic features of the targeted domain in order to collect relevant documents from the Web. We suggest that the local context of the out-of-vocabulary (OOV) words contains relevant information on the OOV words. With this information, we propose to use the Web to build locally-augmented lexicons which are used in a final local decoding pass. Our experiments confirm the relevance of the Web for OOV word retrieval. Different methods are proposed to retrieve the hypothesis words. Finally, we present the integration of new words in the transcription process based on part-of-speech models. This technique recovers 7.6% of the significant OOV words and improves the accuracy of the system.

international conference on acoustics, speech, and signal processing | 2007

System Combination by Driven Decoding

Benjamin Lecouteux; Georges Linarès; Yannick Estève; Julie Mauclair

The combination of automatic speech recognition (ASR) systems generally relies on a posteriori merging of system outputs or on cross-adaptation. In this paper, we propose an integrated approach where the search of a primary system is driven by the outputs of a secondary one. This method drives the primary system search using the one-best hypotheses and the word posteriors gathered from the secondary system. Experiments are carried out within the experimental framework of the ESTER evaluation campaign (S. Galliano et al. 2005). Results show that the driven decoding algorithm significantly outperforms the two single ASR systems (-8% of relative WER, -1.7% absolute). Finally, we investigate the interactions between driven decoding and cross-adaptations. The best cross-adaptation strategy in combination with the driven decoding process yields a final absolute gain of about 1.9% WER.
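The difference between the relative and absolute WER figures quoted here can be made concrete with a small sketch. The helper names are hypothetical, and the baseline of 21.25% is an illustrative value chosen so that the two reported gains agree (1.7 / 0.08 = 21.25). It is an inference from the two numbers, assuming both refer to the same baseline, and not a figure reported in the abstract.

```python
def absolute_gain(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER points (e.g. 21.25% -> 19.55% is 1.7 points)."""
    return baseline_wer - new_wer


def relative_gain(baseline_wer: float, new_wer: float) -> float:
    """Reduction expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (assumption, not taken from the paper).
baseline = 21.25
improved = 19.55
print(f"absolute: {absolute_gain(baseline, improved):.2f} points")
print(f"relative: {relative_gain(baseline, improved):.1%}")
```

The same absolute gain corresponds to a larger relative gain when the baseline WER is lower, which is why both figures are usually reported together.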

international conference on acoustics, speech, and signal processing | 2004

Reducing computational and memory cost for cellular phone embedded speech recognition system

Christophe Lévy; Georges Linarès; Pascal Nocera; Jean-François Bonastre

We present several methods able to fit speech recognition system requirements to cellular phone resources. The proposed techniques are evaluated on a digit recognition task using both French and English corpora. We investigate three aspects of speech processing in particular: acoustic parameterization, recognition algorithms, and acoustic modeling. Several parameterization algorithms (LPCC, MFCC and PLP) are compared to the linear predictive coding (LPC) included in the GSM norm. The MFCC and PLP parameterization algorithms perform significantly better than the others. Moreover, feature vector size can be reduced to 6 PLP coefficients, allowing memory and computation resources to be decreased without a significant loss of performance. In order to achieve good performance with reasonable resource needs, we develop several methods to embed a classical HMM-based speech recognition system in a cellular phone. We first propose an automatic on-line building of a phonetic lexicon which allows a minimal but unlimited lexicon. Then we reduce the HMM complexity by decreasing the number of (Gaussian) components per state. Finally, we evaluate our propositions by comparing dynamic time warping (DTW) with our HMM system, in the cellular phone context, for clean conditions. The experiments show that our HMM system outperforms DTW for speaker independent tasks and allows more practical applications for the cellular-phone user interface.
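For reference, the dynamic time warping baseline mentioned in this abstract can be sketched as follows. This is the textbook DTW recurrence and not the authors' implementation. As a simplifying assumption, the sequences hold scalar values and the local distance defaults to an absolute difference; in a speech setting each element would be a feature vector (for example PLP coefficients) compared with a vector distance. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Cumulative cost of the best monotonic alignment between a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

In a template-based recognizer, an utterance would be assigned the label of the stored template with the smallest `dtw_distance`, which requires keeping one or more templates per word.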

international conference on acoustics, speech, and signal processing | 2013

Person name recognition in ASR outputs using continuous context models

Benjamin Bigot; Grégory Senay; Georges Linarès; Corinne Fredouille; Richard Dufour

The detection and characterization, in audiovisual documents, of speech utterances where person names are pronounced is an important cue for spoken content analysis. This paper tackles the problem of retrieving spoken person names in the 1-Best ASR outputs of broadcast TV shows. Our assumption is that a person name is a latent variable produced by the lexical context it appears in. Thereby, a spoken name could be derived from ASR outputs even if it has not been proposed by the speech recognition system. A new context modelling approach is proposed in order to capture lexical and structural information surrounding a spoken name. The fundamental hypothesis of this study has been validated on broadcast TV documents available in the context of the REPERE challenge.

international conference on acoustics, speech, and signal processing | 2008

Frame-based acoustic feature integration for speech understanding

Loïc Barrault; Christophe Servan; Driss Matrouf; Georges Linarès; R. De Mori

With the purpose of improving spoken language understanding (SLU) performance, a combination of different acoustic speech recognition (ASR) systems is proposed. State a posteriori probabilities obtained with systems using different acoustic feature sets are combined with log-linear interpolation. In order to perform a coherent combination of these probabilities, acoustic models must have the same topology (i.e. same set of states). For this purpose, a fast and efficient twin model training protocol is proposed. By a wise choice of acoustic feature sets and log-linear interpolation of their likelihood ratios, a substantial concept error rate (CER) reduction has been observed on the test part of the French MEDIA corpus.
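The log-linear interpolation of state posteriors described in this abstract can be sketched as below, assuming that all systems share the same ordered set of states (as the abstract requires) and that the combined score of a state is the weighted sum of log posteriors, renormalized over states. The function name, the flooring constant and the weights are assumptions made for illustration; the abstract does not give the weights, and the combination of likelihood ratios it also mentions is not covered here.

```python
import math


def log_linear_combine(posteriors, weights, floor=1e-12):
    """Combine per-state posteriors p_k(s) as p(s) proportional to prod_k p_k(s)**w_k."""
    if len(posteriors) != len(weights):
        raise ValueError("one weight is required per system")
    n_states = len(posteriors[0])
    if any(len(p) != n_states for p in posteriors):
        raise ValueError("all systems must share the same set of states")
    # Weighted sum of log posteriors; the floor avoids log(0).
    log_scores = [
        sum(w * math.log(max(p[s], floor)) for p, w in zip(posteriors, weights))
        for s in range(n_states)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalized = [math.exp(x - top) for x in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

With equal weights, a state that both systems consider likely keeps a high combined posterior, while a state favoured by only one system is penalized.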
