Mickael Rouvier | Researchain

Publication

Featured research published by Mickael Rouvier.

North American Chapter of the Association for Computational Linguistics | 2016

SENSEI-LIF at SemEval-2016 Task 4: Polarity embedding fusion for robust sentiment analysis

Mickael Rouvier; Benoit Favre

This paper describes the system developed at LIF for the SemEval-2016 evaluation campaign. The goal of Task 4.A was to identify sentiment polarity in tweets. The system extends the state-of-the-art Convolutional Neural Network (CNN) approach. We initialize the input representations with embeddings trained on different units: lexical, part-of-speech, and sentiment embeddings. Neural networks for each input space are trained separately, and the representations extracted from their hidden layers are then concatenated as input to a fusion neural network. The system ranked 2nd at SemEval-2016 and obtained an average F1 of 63.0%.
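The fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hidden-layer values, the layer sizes, the three-class output, and the single linear-plus-softmax layer standing in for the fusion network are all assumptions made for the example.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(hidden_reps, weights, bias):
    """Concatenate per-network hidden representations, then apply one
    linear layer followed by softmax (a stand-in for the fusion network)."""
    fused = [x for rep in hidden_reps for x in rep]
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

random.seed(0)
HIDDEN = 4   # assumed hidden size of each separately trained network
CLASSES = 3  # assumed classes: negative, neutral, positive

# Hypothetical hidden-layer outputs for one tweet, one per embedding type.
h_lexical = [random.gauss(0, 1) for _ in range(HIDDEN)]
h_pos = [random.gauss(0, 1) for _ in range(HIDDEN)]
h_sentiment = [random.gauss(0, 1) for _ in range(HIDDEN)]

# Randomly initialised fusion weights: one row per class.
weights = [[random.gauss(0, 0.1) for _ in range(3 * HIDDEN)]
           for _ in range(CLASSES)]
bias = [0.0] * CLASSES

probs = fuse([h_lexical, h_pos, h_sentiment], weights, bias)
```

In this sketch `probs` is a probability distribution over the assumed polarity classes; in practice the fusion weights would be trained rather than drawn at random.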

IEEE Transactions on Audio, Speech, and Language Processing | 2015

Audio-based video genre identification

Mickael Rouvier; Stanislas Oger; Georges Linarès; Driss Matrouf; Bernard Merialdo; Yingbo Li

This paper investigates the automatic identification of video genre by audio channel analysis. Genre refers to editorial styles such as commercials, movies, and sports. We propose and evaluate several methods based on both low- and high-level descriptors, in the cepstral or time domains, as well as on the analysis of the global structure of the document and its linguistic content. The proposed features are then combined and their complementarity is evaluated. On a database composed of single-story web videos, the best audio-only system achieves a Classification Error Rate (CER) of 9%. Finally, we evaluate the complementarity of the proposed audio features and the video features classically used for Video Genre Identification (VGI). Results demonstrate the complementarity of the modalities for genre recognition, with the final audio-video system reaching 6% CER.

International Conference on Acoustics, Speech, and Signal Processing | 2010

On-the-fly video genre classification by combination of audio features

Mickael Rouvier; Georges Linarès; Driss Matrouf

Video genre identification methods are frequently based on image or motion analysis, which are relatively time-consuming processes. Such approaches are tractable in batch processing, but as-soon-as-possible identification requires faster methods. In this paper, we investigate the use of audio-only methods for on-the-fly video classification. We propose to use several acoustic feature streams and we evaluate various combination schemes at the frame or at the score level. Results are compared to those obtained by humans, according to the listening duration. Although the system based on model combination slightly outperforms humans on very early detection, the latter remain significantly more accurate on long sessions.

Text, Speech and Dialogue | 2013

LIUM ASR System for ETAPE French Evaluation Campaign: Experiments on System Combination Using Open-Source Recognizers

Fethi Bougares; Paul Deléglise; Yannick Estève; Mickael Rouvier

In this paper, we report the LIUM participation in the ETAPE [1] (Evaluations en Traitement Automatique de la Parole) evaluation campaign, in the rich transcription task of the French track. After describing the ETAPE goals and guidelines, we present our ASR system, which ranked first in the ETAPE evaluation campaign. Two ASR systems were used for our participation in ETAPE 2011. In addition to the LIUM ASR system based on the CMU Sphinx project, we used an additional open-source ASR system based on the RASR toolkit. We evaluate the gain obtained with various acoustic modeling and adaptation techniques for each of the two systems, as well as with various system combination techniques. The combination of two different ASR systems allows a significant WER reduction, from 23.6% for the best single ASR system to 22.6% for the combination.
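For readers unfamiliar with the metric, the sketch below shows how a word error rate (WER) is conventionally computed as a word-level edit distance divided by the reference length, and what the reported figures amount to in relative terms. The function names and the toy French sentences are illustrative assumptions; this is not the scoring tool used in the campaign.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(baseline, improved):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline

# Toy example: one substitution and one deletion over six reference words.
wer = word_error_rate("le chat est sur la table", "le chat et sur table")

# Figures quoted in the abstract: 23.6% (best single system) vs 22.6%.
gain = relative_reduction(23.6, 22.6)
```

With the abstract's figures, the 1.0-point absolute reduction corresponds to a relative reduction of roughly 4.2%.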

ACM Multimedia | 2011

Static and dynamic video summaries

Yingbo Li; Bernard Merialdo; Mickael Rouvier; Georges Linarès

Many algorithms currently exist for video summarization; however, most of them only represent visual information. In this paper, we propose two approaches for constructing the summary using both video and text. The first approach focuses on static summaries, where the summary is a set of selected keyframes and keywords, to be displayed in a fixed area. The second approach addresses dynamic summaries, where video segments are selected based on both their visual and textual content to compose a new video sequence of predefined duration. Our approaches rely on an existing summarization algorithm, Video Maximal Marginal Relevance (Video-MMR), and on its extension, Text Video Maximal Marginal Relevance (TV-MMR), which we proposed. We describe the details of those approaches and present experimental results.
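To give an idea of the selection principle behind Maximal Marginal Relevance, here is a generic greedy MMR sketch. It is not the Video-MMR or TV-MMR algorithm from the paper: the cosine similarity, the definition of relevance as mean similarity to all other items, the trade-off weight `lam`, and the toy feature vectors are all assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(features, k, lam=0.7):
    """Greedily pick k items, trading relevance against redundancy
    with the items already selected."""
    n = len(features)
    # Assumed relevance: mean similarity to every other item.
    relevance = [
        sum(cosine(features[i], features[j]) for j in range(n) if j != i)
        / max(n - 1, 1)
        for i in range(n)
    ]
    selected = []
    while len(selected) < min(k, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # Redundancy: highest similarity to anything already chosen.
            redundancy = max(
                (cosine(features[i], features[j]) for j in selected),
                default=0.0,
            )
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

# Hypothetical keyframe features forming two visually similar pairs.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
summary = mmr_select(frames, k=2)
```

On this toy input the redundancy penalty steers the second pick away from the near-duplicate of the first, so the two selected frames come from different pairs.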

International Conference on Acoustics, Speech, and Signal Processing | 2010

Transcription-based video genre classification

Stanislas Oger; Mickael Rouvier; Georges Linarès

In this paper, we present a new method for video genre identification based on linguistic content analysis. This approach relies on the analysis of the most frequent words in the video transcriptions provided by an automatic speech recognition system. Experiments are conducted on a corpus composed of cartoons, movies, news, commercials, documentaries, sports and music. On this 7-genre identification task, the proposed transcription-based method achieves up to 80% correct identification. Finally, this rate is increased to 95% by combining the proposed linguistic-level features with low-level acoustic features.
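One simple way to turn "most frequent words" into classifier input is sketched below. The vocabulary size, the whitespace tokenisation, the relative-frequency weighting, and the toy transcripts are assumptions for illustration; the paper's actual feature extraction and classifier are not specified here.

```python
from collections import Counter

def build_vocabulary(transcripts, size):
    """Return the `size` most frequent words across all transcripts."""
    counts = Counter(
        word for text in transcripts for word in text.lower().split()
    )
    return [word for word, _ in counts.most_common(size)]

def frequency_features(transcript, vocabulary):
    """Relative frequency of each vocabulary word in one transcript."""
    words = transcript.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty transcripts
    return [counts[word] / total for word in vocabulary]

# Hypothetical ASR outputs for two videos.
transcripts = ["the goal the match", "the news today"]
vocabulary = build_vocabulary(transcripts, size=2)
features = frequency_features(transcripts[0], vocabulary)
```

Each video is thus mapped to a fixed-length vector that any standard classifier could consume, alone or alongside acoustic features.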

EURASIP Journal on Audio, Speech, and Music Processing | 2010

Query-Driven Strategy for On-the-Fly Term Spotting in Spontaneous Speech

Mickael Rouvier; Georges Linarès; Benjamin Lecouteux

Spoken utterance retrieval has been widely studied over the last decades, with the purpose of indexing large audio databases or detecting keywords in continuous speech streams. While the indexing of closed corpora can be performed via a batch process, on-line spotting systems have to detect the targeted spoken utterances synchronously. We propose a two-level architecture for on-the-fly term spotting. The first level performs a fast detection of the speech segments that probably contain the targeted utterance. The second level refines the detection on the selected segments by using a speech recognizer based on a query-driven decoding algorithm. Experiments are conducted on both broadcast and spontaneous speech corpora. We investigate the impact of the spontaneity level on system performance. Results show that our method remains effective even if the recognition rates are significantly degraded by disfluencies.

IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Factor analysis based session variability compensation for Automatic Speech Recognition

Mickael Rouvier; Mohamed Bouallegue; Driss Matrouf; Georges Linarès

In this paper we propose a new feature normalization based on Factor Analysis (FA) for the problem of acoustic variability in Automatic Speech Recognition (ASR). The FA paradigm was previously used in the field of ASR to model the useful information: the HMM state-dependent acoustic information. Here, we propose to use the FA paradigm to model the useless information (speaker or channel variability) in order to remove it from the acoustic data frames. The transformed training data frames are then used to train new HMM models using the standard training algorithm. The transformation is also applied to the test data before the decoding process. With this approach we obtain, on French broadcast news, an absolute WER reduction of 1.3%.

Computer Speech & Language | 2011

Modeling nuisance variabilities with factor analysis for GMM-based audio pattern classification

Driss Matrouf; Florian Verdet; Mickael Rouvier; Jean-François Bonastre; Georges Linarès

Audio pattern classification represents a particular statistical classification task and includes, for example, speaker recognition, language recognition, emotion recognition, speech recognition and, recently, video genre classification. The feature being used in all these tasks is generally based on a short-term cepstral representation. The cepstral vectors contain at the same time useful information and nuisance variability, which are difficult to separate in this domain. Recently, in the context of GMM-based recognizers, a novel approach using a Factor Analysis (FA) paradigm has been proposed for decomposing the target model into a useful information component and a session variability component. This approach is called Joint Factor Analysis (JFA), since it models jointly the nuisance variability and the useful information, using the FA statistical method. The JFA approach has even been combined with Support Vector Machines, known for their discriminative power. In this article, we successfully apply this paradigm to three automatic audio processing applications: speaker verification, language recognition and video genre classification. This is done by applying the same process and using the same free software toolkit. We will show that this approach allows for a relative error reduction of over 50% in all the aforementioned audio processing tasks.

Spoken Language Technology Workshop | 2008

On-the-fly term spotting by phonetic filtering and request-driven decoding

Mickael Rouvier; Georges Linarès; Benjamin Lecouteux

This paper addresses the problem of on-the-fly term spotting in continuous speech streams. We propose a two-level architecture in which recall and accuracy are sequentially optimized. The first level uses a cascade of phonetic filters to select the speech segments that probably contain the targeted terms. The second level performs a request-driven decoding of the selected speech segments. The results show good performance of the proposed system on broadcast news data: the best configuration reaches an F-measure of about 94% while respecting the on-the-fly processing constraint.
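The general shape of such a two-level cascade, a cheap high-recall filter followed by a costlier precise check, can be sketched as below, together with the standard F-measure formula. Everything here is a toy stand-in: segments are plain text rather than audio, `letter_filter` replaces the phonetic filters, and `fake_decode` replaces the request-driven decoder. None of these names come from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def spot_term(segments, term, cheap_filter, decode):
    """Level 1 discards unlikely segments quickly; level 2 runs the
    expensive decoder only on the survivors."""
    hits = []
    for index, segment in enumerate(segments):
        if not cheap_filter(segment, term):
            continue  # rejected without paying the decoding cost
        if term in decode(segment).split():
            hits.append(index)
    return hits

def letter_filter(segment, term):
    """Toy high-recall filter: every letter of the term occurs somewhere."""
    return set(term) <= set(segment)

decoded = []  # records which segments reached the costly second level

def fake_decode(segment):
    """Stand-in for a speech recognizer: the 'audio' is already text."""
    decoded.append(segment)
    return segment

segments = ["the weather report", "hate rewards", "sports news"]
hits = spot_term(segments, "weather", letter_filter, fake_decode)
score = f_measure(1.0, 0.5)
```

In this toy run the filter lets two of the three segments through, and the second level then rejects the false alarm, which mirrors the idea of optimizing recall first and accuracy second.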
