
Publication


Featured research published by Stanislas Oger.


International Conference on Acoustics, Speech, and Signal Processing | 2008

On-demand new word learning using world wide web

Stanislas Oger; Georges Linarès; Frédéric Béchet; Pascal Nocera

Most Web-based methods for lexicon augmentation capture global semantic features of the targeted domain in order to collect relevant documents from the Web. We suggest that the local context of out-of-vocabulary (OOV) words contains relevant information about those words. Using this information, we propose to use the Web to build locally augmented lexicons, which are used in a final local decoding pass. Our experiments confirm the relevance of the Web for OOV word retrieval. Different methods are proposed to retrieve the hypothesis words. Finally, we present the integration of new words into the transcription process based on part-of-speech models. This technique recovers 7.6% of the significant OOV words and improves the accuracy of the system.


Computer Speech & Language | 2012

Integrating imperfect transcripts into speech recognition systems for building high-quality corpora

Benjamin Lecouteux; Georges Linarès; Stanislas Oger

The training of state-of-the-art automatic speech recognition (ASR) systems requires very large, relevant training corpora. The cost of such databases is high and remains a major limitation for the development of speech-enabled applications in particular contexts (e.g. low-density languages or specialized domains). On the other hand, a large amount of data can be found in news prompts, movie subtitles, scripts, etc. Using such data as a training corpus could provide a low-cost solution to the acoustic model estimation problem. Unfortunately, prior transcripts are seldom exact with respect to the content of the speech signal, and they lack temporal information. This paper tackles the issue of prompt-based speech corpora improvement by addressing the problems mentioned above. We propose a method that locates accurate transcript segments in speech signals and automatically corrects errors or fills in missing transcript text around these segments. This method relies on a new decoding strategy in which the search algorithm is driven by the imperfect transcription of the input utterances. The experiments are conducted on the French language, using the ESTER database and a set of recordings (and associated prompts) from RTBF (Radio Television Belge Francophone). The results demonstrate the effectiveness of the proposed approach in terms of both error correction and text-to-speech alignment.
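As a rough illustration of what "locating accurate transcript segments" can mean, the sketch below aligns an imperfect prompt with an ASR hypothesis after the fact, using Python's standard `difflib`. This is not the transcript-driven decoding strategy described in the abstract; it is a simpler post-hoc alignment, and the function name `matching_segments`, the `min_len` threshold and the example sentences are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def matching_segments(prompt_words, asr_words, min_len=2):
    """Return (prompt_index, asr_index, words) for each run of words
    shared by the prompt and the ASR output, keeping runs of at least
    min_len words. Hypothetical helper, not the paper's algorithm."""
    matcher = SequenceMatcher(a=prompt_words, b=asr_words, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        # The last block returned is a zero-length sentinel; the size test drops it.
        if block.size >= min_len:
            words = prompt_words[block.a:block.a + block.size]
            segments.append((block.a, block.b, words))
    return segments

# Toy data (invented): the prompt contains "for", the ASR output inserts "and".
prompt = "good evening here is the news for today".split()
asr = "good evening and here is the news today".split()
segments = matching_segments(prompt, asr)
```

Words that fall between two reliable segments are the places where the prompt and the audio disagree, which is where a correction step would focus.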


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Audio-based video genre identification

Mickael Rouvier; Stanislas Oger; Georges Linarès; Driss Matrouf; Bernard Merialdo; Yingbo Li

This paper investigates the automatic identification of video genre by audio channel analysis. Genre refers to editorial styles such as commercials, movies, sports, etc. We propose and evaluate several methods based on both low-level and high-level descriptors, in the cepstral or time domains, as well as on analysis of the global structure of the document and its linguistic content. The proposed features are then combined and their complementarity is evaluated. On a database composed of single-story web videos, the best audio-only system achieves a 9% Classification Error Rate (CER). Finally, we evaluate the complementarity of the proposed audio features and the video features that are classically used for Video Genre Identification (VGI). The results demonstrate the complementarity of the modalities for genre recognition, with the final audio-video system reaching 6% CER.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Transcription-based video genre classification

Stanislas Oger; Mickael Rouvier; Georges Linarès

In this paper, we present a new method for video genre identification based on linguistic content analysis. This approach relies on the analysis of the most frequent words in the video transcriptions provided by an automatic speech recognition system. Experiments are conducted on a corpus composed of cartoons, movies, news, commercials, documentaries, sports and music. On this 7-genre identification task, the proposed transcription-based method obtains up to 80% correct identification. Finally, this rate is increased to 95% by combining the proposed linguistic-level features with low-level acoustic features.
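The idea of describing a transcript by its most frequent words can be sketched as a simple feature extractor. The abstract does not specify the tokenization, the vocabulary size or the classifier, so the whitespace tokenizer, the helper names `top_words` and `frequency_vector`, and the toy transcripts below are assumptions for illustration only.

```python
from collections import Counter

def top_words(transcripts, k):
    """Collect the k most frequent words over a set of transcripts."""
    counts = Counter()
    for text in transcripts:
        counts.update(text.lower().split())
    return [word for word, _ in counts.most_common(k)]

def frequency_vector(text, vocabulary):
    """Relative frequency of each vocabulary word in one transcript."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty transcript
    return [counts[word] / total for word in vocabulary]

# Toy transcripts (invented); real input would be ASR output per video.
transcripts = ["the goal the match the team", "the news tonight the weather"]
vocab = top_words(transcripts, 2)
vec = frequency_vector("the match the goal", vocab)
```

Vectors of this kind could then be passed to any standard classifier, one vector per video.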


Computer Speech & Language | 2014

Web-based Possibilistic Language Models for Automatic Speech Recognition

Stanislas Oger; Georges Linarès

This paper describes a new kind of language model based on possibility theory. The purpose of these new models is to make better use of the data available on the Web for language modeling. These models aim to integrate information about impossible word sequences. We address the two main problems of using this kind of model: how to estimate the measures for word sequences and how to integrate this kind of model into the ASR system. We propose a word-sequence possibilistic measure and a practical estimation method based on word-sequence statistics, which is particularly suited to estimation from Web data. We develop several strategies and formulations for using these models in a classical automatic speech recognition engine, which relies on a probabilistic modeling of the speech recognition process. This work is evaluated on two typical usage scenarios: broadcast news transcription with very large training sets, and transcription of medical videos in a specialized domain with only very limited training data. The results show that the possibilistic models provide a significantly lower word error rate on the specialized domain task, where classical n-gram models fail due to the lack of training material. For broadcast news, the probabilistic models remain better than the possibilistic ones. However, a log-linear combination of the two kinds of models outperforms all the models used individually, which indicates that possibilistic models bring information that is not modeled by probabilistic ones.
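A log-linear combination of two model scores can be sketched as a weighted sum of their logarithms. The abstract does not give the exact formulation used in the paper, so the single interpolation weight, the flooring of zero scores, the function names and the toy numbers below are all assumptions.

```python
import math

def log_linear_score(prob, poss, weight=0.5, floor=1e-10):
    """Weighted sum of the log probability and the log possibility.
    Both inputs are floored so that a zero score (for example an
    'impossible' word sequence) gives a large penalty instead of -inf."""
    p = max(prob, floor)
    q = max(poss, floor)
    return weight * math.log(p) + (1.0 - weight) * math.log(q)

def rerank(hypotheses, weight=0.5):
    """Sort (text, probability, possibility) triples, best score first."""
    return sorted(
        hypotheses,
        key=lambda h: log_linear_score(h[1], h[2], weight),
        reverse=True,
    )

# Toy hypotheses with invented scores, for illustration only.
hyps = [("recognize speech", 0.02, 0.9), ("wreck a nice beach", 0.03, 0.1)]
best = rerank(hyps)[0][0]
```

With `weight=1.0` the ranking depends on the probabilistic score alone, so comparing the two settings shows what the second model contributes.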


Biomedical Engineering and Informatics | 2011

Audio indexing on a medical video database: The AVISON project

Grégory Senay; Stanislas Oger; Raphael Rubino; Georges Linarès; Thomas Parent

This paper presents an overview of our research conducted in the context of the AVISON project, which aims to develop a platform for indexing surgery videos of the Institute of Research Against Digestive Cancer (IRCAD). The platform is intended to provide user-friendly, query-based access to the video database of the IRCAD institute, which is dedicated to the training of international surgeons. A text-based indexing system is used to query the videos; the textual content is obtained with an automatic speech recognition system. The paper presents the new approaches that we proposed for dealing with these highly specialised data automatically: obtaining a low-cost training corpus, automatically adapting the automatic speech recognition system, allowing multilingual querying of videos and, finally, filtering documents that could affect the database quality due to transcription errors.


SPECOM'2009 | 2009

Using the World Wide Web for Learning New Words in Continuous Speech Recognition Tasks: Two Case Studies

Stanislas Oger; Vladimir Popescu; Georges Linarès


Conference of the International Speech Communication Association | 2009

Probabilistic and possibilistic language models based on the world wide web.

Stanislas Oger; Vladimir Popescu; Georges Linarès


Language Resources and Evaluation | 2010

Transcriber driving strategies for transcription aid system

Grégory Senay; Georges Linarès; Benjamin Lecouteux; Stanislas Oger


Conference of the International Speech Communication Association | 2010

Combination of Probabilistic and Possibilistic Language Models

Stanislas Oger; Vladimir Popescu; Georges Linarès

Collaboration


Dive into Stanislas Oger's collaboration.

Top Co-Authors


Vladimir Popescu

Grenoble Institute of Technology
