Mickael Rouvier
University of Avignon
Publications
Featured research published by Mickael Rouvier.
North American Chapter of the Association for Computational Linguistics | 2016
Mickael Rouvier; Benoit Favre
This paper describes the system developed at LIF for the SemEval-2016 evaluation campaign. The goal of Task 4.A was to identify sentiment polarity in tweets. The system extends the state-of-the-art Convolutional Neural Network (CNN) approach. We initialize the input representations with embeddings trained on different units: lexical, part-of-speech, and sentiment embeddings. Neural networks for each input space are trained separately, and the representations extracted from their hidden layers are then concatenated as input to a fusion neural network. The system ranked 2nd at SemEval-2016 and obtained an average F1 of 63.0%.
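To make the fusion step concrete, here is a minimal PyTorch sketch of the idea described above: hidden-layer representations from separately trained per-embedding networks are concatenated and fed to a small fusion network. All dimensions, layer sizes, and names are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of late fusion over per-embedding networks
# (illustrative dimensions; not the paper's exact architecture).
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, hidden_dims=(128, 128, 128), n_classes=3):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(sum(hidden_dims), 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, hidden_reps):
        # hidden_reps: hidden-layer outputs of the lexical, part-of-speech,
        # and sentiment networks, each trained separately beforehand
        return self.fc(torch.cat(hidden_reps, dim=-1))

net = FusionNet()
reps = [torch.randn(4, 128) for _ in range(3)]  # a batch of 4 tweets
logits = net(reps)                              # (4, 3) polarity scores
```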
IEEE Transactions on Audio, Speech, and Language Processing | 2015
Mickael Rouvier; Stanislas Oger; Georges Linarès; Driss Matrouf; Bernard Merialdo; Yingbo Li
This paper presents investigations into the automatic identification of video genre by audio channel analysis. Genre refers to editorial styles such as commercials, movies, or sports. We propose and evaluate methods based on both low- and high-level descriptors, in the cepstral or time domains, as well as on the global structure of the document and its linguistic content. The proposed features are then combined and their complementarity is evaluated. On a database composed of single-story web videos, the best audio-only system achieves a Classification Error Rate (CER) of 9%. Finally, we evaluate the complementarity of the proposed audio features with video features classically used for Video Genre Identification (VGI). Results demonstrate the complementarity of the two modalities for genre recognition, with the final audio-video system reaching a CER of 6%.
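One simple way to realize the audio-video complementarity mentioned above is score-level fusion, interpolating the per-genre posteriors of the two modalities. This is a hypothetical sketch: the genres, probabilities, and interpolation weight below are made up for illustration.

```python
# Hypothetical score-level fusion of audio and video genre posteriors.
import numpy as np

def fuse_scores(p_audio, p_video, alpha=0.5):
    """Linearly interpolate per-genre posteriors; alpha tuned on dev data."""
    fused = alpha * p_audio + (1.0 - alpha) * p_video
    return fused / fused.sum()

genres = ["commercial", "movie", "sport", "news"]
p_audio = np.array([0.6, 0.1, 0.2, 0.1])  # made-up audio posteriors
p_video = np.array([0.2, 0.5, 0.2, 0.1])  # made-up video posteriors
print(genres[int(np.argmax(fuse_scores(p_audio, p_video)))])  # "commercial"
```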
International Conference on Acoustics, Speech, and Signal Processing | 2010
Mickael Rouvier; Georges Linarès; Driss Matrouf
Video genre identification methods are frequently based on image or motion analysis, which are relatively time-consuming processes. Since such approaches are only tractable in batch processing, as-soon-as-possible identification requires faster methods. In this paper, we investigate the use of audio-only methods for on-the-fly video classification. We propose to use several acoustic feature streams and evaluate various combination schemes at the frame level or at the score level. Results are compared to those obtained by humans as a function of listening duration. Although the system based on model combination slightly outperforms humans on very early detection, the latter remain significantly more accurate on long sessions.
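The two combination schemes named in the abstract can be sketched as follows; the streams and dimensions are placeholders, and the real systems operate on acoustic models rather than raw arrays.

```python
# Illustrative frame-level vs. score-level combination of feature streams.
import numpy as np

def frame_level(streams):
    """Concatenate per-frame features from several acoustic streams."""
    return np.concatenate(streams, axis=1)   # (n_frames, sum of dims)

def score_level(stream_scores, weights):
    """Weighted sum of per-stream classification scores."""
    return sum(w * s for w, s in zip(weights, stream_scores))

cepstral = np.random.randn(100, 13)  # e.g. a cepstral stream
temporal = np.random.randn(100, 4)   # e.g. time-domain descriptors
fused = frame_level([cepstral, temporal])  # (100, 17) combined frames
```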
Text, Speech and Dialogue | 2013
Fethi Bougares; Paul Deléglise; Yannick Estève; Mickael Rouvier
In this paper, we report on the LIUM participation in the ETAPE [1] (Évaluations en Traitement Automatique de la Parole) evaluation campaign, on the rich transcription task of the French track. After describing the ETAPE goals and guidelines, we present our ASR system, which ranked first in the ETAPE evaluation campaign. Two ASR systems were used for our participation in ETAPE 2011: in addition to the LIUM ASR system based on the CMU Sphinx project, we used an open-source ASR system based on the RASR toolkit. We evaluate the gains obtained with various acoustic modeling and adaptation techniques for each of the two systems, as well as with various system combination techniques. Combining the two ASR systems yields a significant WER reduction, from 23.6% for the best single ASR system to 22.6% for the combination.
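System combination for ASR is often done ROVER-style: align the word hypotheses of the systems and vote per slot. The sketch below assumes the hypotheses are already aligned (real systems align confusion networks) and is not the specific combination technique evaluated in the paper.

```python
# ROVER-style voting over pre-aligned word hypotheses (illustration only).
from collections import Counter

def rover_vote(aligned_hyps):
    combined = []
    for slot in zip(*aligned_hyps):
        word, _ = Counter(slot).most_common(1)[0]
        if word != "<eps>":  # skip empty slots in the alignment
            combined.append(word)
    return combined

hyp_a = ["bonjour", "à", "tous", "<eps>"]
hyp_b = ["bonjour", "a", "tous", "les"]
hyp_c = ["bonjour", "à", "<eps>", "les"]
print(rover_vote([hyp_a, hyp_b, hyp_c]))  # ['bonjour', 'à', 'tous', 'les']
```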
ACM Multimedia | 2011
Yingbo Li; Bernard Merialdo; Mickael Rouvier; Georges Linarès
Many algorithms currently exist for video summarization; however, most of them exploit only visual information. In this paper, we propose two approaches for constructing summaries using both video and text. The first approach focuses on static summaries, where the summary is a set of selected keyframes and keywords to be displayed in a fixed area. The second approach addresses dynamic summaries, where video segments are selected based on both their visual and textual content to compose a new video sequence of predefined duration. Our approaches rely on an existing summarization algorithm, Video Maximal Marginal Relevance (Video-MMR), and on our extension of it, Text Video Maximal Marginal Relevance (TV-MMR). We describe these approaches in detail and present experimental results.
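The core of Video-MMR is a greedy Maximal Marginal Relevance loop: at each step, pick the keyframe most representative of the whole video and least redundant with what is already selected. The sketch below implements generic MMR over cosine similarities; the feature dimensions and trade-off weight are illustrative, not the paper's settings.

```python
# Generic greedy MMR selection over cosine similarities (illustrative).
import numpy as np

def mmr_select(features, k, lam=0.7):
    sim = features @ features.T  # rows assumed L2-normalized
    selected, candidates = [], list(range(len(features)))
    while len(selected) < k and candidates:
        def score(i):
            relevance = sim[i, candidates].mean()           # covers the video
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy  # MMR trade-off
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

frames = np.random.randn(50, 64)  # 50 keyframe descriptors
frames /= np.linalg.norm(frames, axis=1, keepdims=True)
print(mmr_select(frames, k=5))
```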
International Conference on Acoustics, Speech, and Signal Processing | 2010
Stanislas Oger; Mickael Rouvier; Georges Linarès
In this paper, we present a new method for video genre identification based on linguistic content analysis. The approach relies on the analysis of the most frequent words in video transcriptions produced by an automatic speech recognition system. Experiments are conducted on a corpus composed of cartoons, movies, news, commercials, documentaries, sports, and music. On this 7-genre identification task, the proposed transcription-based method obtains up to 80% correct identification. This rate increases to 95% when the proposed linguistic-level features are combined with low-level acoustic features.
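In the spirit of the frequent-word analysis described above, a bag-of-words classifier over ASR transcripts can serve as a baseline sketch. The toy transcripts and the Naive Bayes model below are assumptions for illustration, not the paper's exact model.

```python
# Toy bag-of-words genre classifier over ASR transcripts (illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

transcripts = [
    "goal penalty referee second half",       # sport
    "stock market government election vote",  # news
    "buy now limited offer call today",       # commercial
]
labels = ["sport", "news", "commercial"]

vec = CountVectorizer()
X = vec.fit_transform(transcripts)            # word-count features
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["the referee gave a penalty"])))  # ['sport']
```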
EURASIP Journal on Audio, Speech, and Music Processing | 2010
Mickael Rouvier; Georges Linarès; Benjamin Lecouteux
Spoken utterance retrieval has been widely studied in recent decades, with the purpose of indexing large audio databases or detecting keywords in continuous speech streams. While the indexing of closed corpora can be performed by a batch process, on-line spotting systems have to detect the targeted spoken utterances synchronously. We propose a two-level architecture for on-the-fly term spotting. The first level performs a fast detection of the speech segments that probably contain the targeted utterance. The second level refines the detection on the selected segments, using a speech recognizer based on a query-driven decoding algorithm. Experiments are conducted on both broadcast and spontaneous speech corpora, and we investigate the impact of the level of spontaneity on system performance. Results show that our method remains effective even when recognition rates are significantly degraded by disfluencies.
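The two-level architecture can be sketched as a simple cascade: a cheap first pass rejects most segments, and the costly query-driven decoding runs only on the survivors. Both detectors below are hypothetical stand-ins, not the paper's components.

```python
# Two-pass spotting cascade with stand-in detectors (illustration only).
def spot(segments, fast_score, refine, threshold=0.3):
    hits = []
    for seg in segments:
        if fast_score(seg) < threshold:
            continue  # level 1: fast rejection of unlikely segments
        if refine(seg):
            hits.append(seg)  # level 2: costly query-driven confirmation
    return hits

segs = ["seg1", "seg2", "seg3"]
print(spot(segs, fast_score=lambda s: 0.9 if s == "seg2" else 0.1,
           refine=lambda s: True))  # ['seg2']
```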
IEEE Automatic Speech Recognition and Understanding Workshop | 2011
Mickael Rouvier; Mohamed Bouallegue; Driss Matrouf; Georges Linarès
In this paper, we propose a new feature normalization based on Factor Analysis (FA) to address acoustic variability in Automatic Speech Recognition (ASR). The FA paradigm was previously used in ASR to model the useful information: the HMM-state-dependent acoustic information. Here, we propose instead to use the FA paradigm to model the useless information (speaker or channel variability) in order to remove it from the acoustic feature frames. The transformed training frames are then used to train new HMM models with the standard training algorithm, and the same transformation is applied to the test data before decoding. With this approach we obtain, on French broadcast news, an absolute WER reduction of 1.3%.
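A hedged numpy sketch of the normalization idea: model the nuisance (speaker/channel) variability in a low-rank subspace U, estimate the per-utterance factor, and subtract its contribution from every frame. U, the frames, and the least-squares point estimate are all simplifying assumptions; the paper works within the full FA framework.

```python
# Sketch: subtract an estimated low-rank nuisance component per utterance.
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((39, 5))         # nuisance subspace (39-dim features)
frames = rng.standard_normal((200, 39))  # one utterance's feature frames

offset = frames.mean(axis=0)             # utterance-level shift
w, *_ = np.linalg.lstsq(U, offset, rcond=None)  # point estimate of the factor
clean = frames - U @ w                    # remove estimated nuisance component
```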
Computer Speech & Language | 2011
Driss Matrouf; Florian Verdet; Mickael Rouvier; Jean-François Bonastre; Georges Linarès
Audio pattern classification represents a particular statistical classification task and includes, for example, speaker recognition, language recognition, emotion recognition, speech recognition and, recently, video genre classification. The features used in all these tasks are generally based on a short-term cepstral representation. The cepstral vectors contain both useful information and nuisance variability, which are difficult to separate in this domain. Recently, in the context of GMM-based recognizers, a novel approach using a Factor Analysis (FA) paradigm has been proposed for decomposing the target model into a useful information component and a session variability component. This approach is called Joint Factor Analysis (JFA), since it jointly models the nuisance variability and the useful information using the FA statistical method. The JFA approach has also been combined with Support Vector Machines, known for their discriminative power. In this article, we successfully apply this paradigm to three automatic audio processing applications: speaker verification, language recognition, and video genre classification. This is done by applying the same process and using the same free software toolkit. We show that this approach allows a relative error reduction of over 50% in all the aforementioned audio processing tasks.
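For reference, the JFA decomposition of a target GMM supervector is conventionally written as follows (standard notation from the JFA literature; the article's symbols may differ):

```latex
% Standard JFA supervector decomposition:
%   M:  session-dependent supervector of the target model
%   m:  speaker/session-independent (UBM) supervector
%   Vy: useful component in the eigenvoice subspace
%   Ux: session/nuisance component in the eigenchannel subspace
%   Dz: speaker-specific diagonal residual
\[
  M = m + Vy + Ux + Dz
\]
```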
Spoken Language Technology Workshop | 2008
Mickael Rouvier; Georges Linarès; Benjamin Lecouteux
This paper addresses the problem of on-the-fly term spotting in continuous speech streams. We propose a two-level architecture in which recall and accuracy are sequentially optimized. The first level uses a cascade of phonetic filters to select the speech segments that probably contain the targeted terms. The second level performs a request-driven decoding of the selected speech segments. The results show good performance of the proposed system on broadcast news data: the best configuration reaches an F-measure of about 94% while respecting the on-the-fly processing constraint.
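A toy version of the first-level phonetic filtering: a segment passes when the query's phone string occurs within a small edit distance of some window of the segment's phone decoding. Real systems work on lattices; the phone strings, window scheme, and threshold below are illustrative assumptions.

```python
# Toy phonetic filter: sliding-window edit-distance match over phone strings.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def phonetic_filter(segment_phones, query_phones, max_dist=1):
    n = len(query_phones)
    return any(
        edit_distance(segment_phones[i:i + n], query_phones) <= max_dist
        for i in range(max(1, len(segment_phones) - n + 1))
    )

segment = "sil b o ng z u r sil".split()  # noisy phone decoding
query = "b o n z u r".split()             # query phone string
print(phonetic_filter(segment, query))    # True: one substitution away
```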