Olivier Galibert
Centre national de la recherche scientifique
Publication
Featured research published by Olivier Galibert.
international conference on acoustics, speech, and signal processing | 2003
Vincent M. Stanford; John S. Garofolo; Olivier Galibert; Martial Michel; Christophe Laprun
Pervasive computing devices, sensors, and networks provide the infrastructure for context-aware smart meeting rooms that sense ongoing human activities and respond to them. This requires advances in areas including networking, distributed computing, sensor data acquisition, signal processing, speech recognition, human identification, and natural language processing. Open interoperability and metrology standards for the sensor and recognition technologies can aid R&D programs in making these advances. The NIST (National Institute of Standards and Technology) Smart Space and Meeting Room projects are developing tools for data formats, transport, distributed processing, and metadata. We are using them to create annotated multimodal research corpora and measurement algorithms for smart meeting rooms, which we are making available to the research and development community.
content based multimedia indexing | 2012
Juliette Kahn; Olivier Galibert; Ludovic Quintard; Matthieu Carré; Aude Giraudel; Philippe Joly
The REPERE Challenge aims to support research on people recognition in multimodal conditions. To assess technology progress, annual evaluation campaigns will be organized from 2012 to 2014. In this context, the REPERE corpus, a French video corpus with multimodal annotation, has been developed. The systems have to answer the following questions: Who is speaking? Who is present in the video? What names are cited? What names are displayed? The challenge is to combine the various sources of information coming from the speech and the images.
workshop on statistical machine translation | 2008
Daniel Déchelotte; Gilles Adda; Alexandre Allauzen; Hélène Bonneau-Maynard; Olivier Galibert; Jean-Luc Gauvain; Philippe Langlais; François Yvon
This paper describes our statistical machine translation systems based on the Moses toolkit for the WMT08 shared task. We address the Europarl and News conditions for the following language pairs: English with French, German and Spanish. For Europarl, n-best rescoring is performed using an enhanced n-gram or a neural network language model; for the News condition, language models incorporate extra training data. We also report unconvincing results from experiments with factored models.
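The n-best rescoring mentioned above can be sketched as a log-linear interpolation of the decoder score with an external language-model score. The function names, toy data, and weight below are illustrative assumptions, not the LIMSI implementation:

```python
def rescore_nbest(nbest, lm_logprob, lm_weight=0.5):
    """Rerank an n-best list by adding a weighted external language-model
    log-probability to each hypothesis's decoder score (toy sketch)."""
    rescored = [(hyp, score + lm_weight * lm_logprob(hyp)) for hyp, score in nbest]
    # Return the hypothesis with the best combined score.
    return max(rescored, key=lambda item: item[1])[0]

# Toy n-best list: (hypothesis, decoder log-score).
nbest = [("the house is blue", -2.0), ("the house are blue", -1.8)]
# Toy language model that penalizes the agreement error.
lm = lambda hyp: -1.0 if " are " in hyp else -0.2

print(rescore_nbest(nbest, lm))  # → the house is blue
```

With the weight at 0.5, the grammatical hypothesis scores -2.0 + 0.5 * (-0.2) = -2.1, beating -1.8 + 0.5 * (-1.0) = -2.3, so rescoring overturns the decoder's first choice. In practice the interpolation weight is tuned on development data.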
cross language evaluation forum | 2008
Sophie Rosset; Olivier Galibert; Gilles Adda; Eric Bilinski
In this paper, we present two different question-answering systems on speech transcripts which participated in the QAst 2007 evaluation. These two systems are based on a complete, multi-level analysis of both queries and documents. The first system uses handcrafted rules for the selection of small text fragments (snippets) and for answer extraction. The second one replaces the handcrafting with an automatically generated research descriptor. A score based on those descriptors is used to select documents and snippets. The extraction and scoring of candidate answers are based on proximity measurements within the research descriptor elements and a number of secondary factors. Evaluation accuracy ranges from 17% to 39% depending on the task.
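The proximity-based candidate scoring can be illustrated with a toy function. The names, window size, and inverse-distance weighting below are hypothetical simplifications, not the systems' actual measure:

```python
def proximity_score(candidate_pos, element_positions, window=10):
    """Toy proximity score: each descriptor element found in the snippet
    contributes more the closer it occurs to the candidate answer, within
    a fixed word window (hypothetical simplification)."""
    score = 0.0
    for pos in element_positions:
        dist = abs(candidate_pos - pos)
        if 0 < dist <= window:
            score += 1.0 / dist  # closer elements weigh more
    return score

# Candidate answer at word 12; descriptor elements at words 10, 14, and 40.
print(proximity_score(12, [10, 14, 40]))  # → 1.0 (the element at 40 is out of window)
```

Candidates would then be ranked by this score combined with the secondary factors the abstract mentions (which are not detailed here).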
international conference on acoustics, speech, and signal processing | 2007
Lori Lamel; Jean-Luc Gauvain; Gilles Adda; Claude Barras; Eric Bilinski; Olivier Galibert; Agusti Pujol; Holger Schwenk; Xuan Zhu
This paper describes the speech recognizers developed to transcribe European Parliament Plenary Sessions (EPPS) in English and Spanish in the 2nd TC-STAR Evaluation Campaign. The speech recognizers are state-of-the-art systems using multiple decoding passes with models (lexicon, acoustic models, language models) trained for the different transcription tasks. Compared to the LIMSI TC-STAR 2005 EPPS systems, relative word error rate reductions of about 30% have been achieved on the 2006 development data. The word error rates with the LIMSI systems on the 2006 EPPS evaluation data are 8.2% for English and 7.8% for Spanish. Experiments with cross-site adaptation and system combination are also described.
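The word error rates quoted above follow the standard definition: the word-level edit distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the reference length. A minimal sketch of that computation (evaluation campaigns typically use dedicated scoring tools such as NIST's sclite, which also handle alignment reports and normalization):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference length,
    computed with a one-row dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))  # row 0: j insertions to reach hyp[:j]
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i  # prev holds d[i-1][j-1]
        for j in range(1, len(hyp) + 1):
            cur = d[j]  # save d[i-1][j] before overwriting
            d[j] = min(d[j] + 1,                              # deletion
                       d[j - 1] + 1,                          # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))     # substitution or match
            prev = cur
    return d[len(hyp)] / len(ref)

print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))  # → 0.167
```

At this scale an 8.2% WER means roughly one word in twelve is wrong, which is why the 30% relative reduction reported here matters for downstream use of the transcripts.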
document analysis systems | 2014
Sylvie Brunessaux; Patrick Giroux; Bruno Grilheres; Mathieu Manta; Maylis Bodin; Khalid Choukri; Olivier Galibert; Juliette Kahn
This paper presents the achievements of an experimental project called Maurdor (Moyens AUtomatisés de Reconnaissance de Documents ecRits - Automatic Processing of Digital Documents), funded by the French DGA, which aims at improving processing technologies for handwritten and typewritten documents in French, English and Arabic. The first part describes the context and objectives of the project. The second part deals with the challenge of creating a realistic corpus of 10,000 annotated documents to support the efficient development and evaluation of processing modules. The third part presents the organisation, metric definition and results of the Maurdor international evaluation campaign. The last part presents the Maurdor demonstrator from a functional and technical perspective.
international conference on acoustics, speech, and signal processing | 2004
Lori Lamel; Jean-Luc Gauvain; Gilles Adda; Martine Adda-Decker; L. Canseco; Langzhou Chen; Olivier Galibert; Abdelkhalek Messaoudi; Holger Schwenk
The paper summarizes recent work underway at LIMSI on speech-to-text transcription in multiple languages. The research has been oriented towards the processing of broadcast audio and conversational speech for information access. Broadcast news transcription systems have been developed for seven languages, and it is planned to address several other languages in the near term. Research on conversational speech has mainly focused on the English language, with some initial work on French, Arabic and Spanish. Automatic processing must take into account the characteristics of the audio data, such as the need to deal with a continuous data stream, the specificities of the language, and the use of an imperfect word transcription for accessing the information content. Our experience thus far indicates that, at today's word error rates, the techniques used in one language can be successfully ported to other languages, and most of the language specificities concern lexical and pronunciation modeling.
Natural Language Engineering | 2009
B.W. van Schooten; H.J.A. op den Akker; Sophie Rosset; Olivier Galibert; Aurélien Max; Gabriel Illouz
One of the basic topics of question answering (QA) dialogue systems is how follow-up questions should be interpreted by a QA system. In this paper, we shall discuss our experience with the IMIX and Ritel systems, for both of which a follow-up question handling scheme has been developed and corpora have been collected. These two systems are each other's opposites in many respects: IMIX is multimodal, non-factoid, black-box QA, while Ritel is speech-based, factoid, keyword-based QA. Nevertheless, we will show that they are quite comparable, and that it is fruitful to examine the similarities and differences. We shall look at how the systems are composed, and how real, non-expert users interact with them. We shall also provide comparisons with systems from the literature where possible, and indicate where open issues lie and in what areas existing systems may be improved. We conclude that most systems have a common architecture with a set of common subtasks, in particular detecting follow-up questions and finding referents for them. We characterise these tasks using the typical techniques used for performing them, and data from our corpora. We also identify a special type of follow-up question, the discourse question, which is asked when the user is trying to understand an answer, and propose some basic methods for handling it.
cross language evaluation forum | 2009
Guillaume Bernard; Sophie Rosset; Olivier Galibert; Gilles Adda; Eric Bilinski
We present in this paper the three LIMSI question-answering systems on speech transcripts which participated in the QAst 2009 evaluation. These systems are based on a complete, multi-level analysis of both queries and documents. They use an automatically generated research descriptor, and a score based on those descriptors is used to select documents and snippets. Three different methods are tried to extract and score candidate answers; in particular, we present a ranking method based on tree transformations. We participated in all the tasks and submitted 30 runs (for 24 sub-tasks). The evaluation results range from 27% to 36% accuracy on manual transcripts, depending on the task, and from 20% to 29% on automatic transcripts.
cross language evaluation forum | 2008
Sophie Rosset; Olivier Galibert; Guillaume Bernard; Eric Bilinski; Gilles Adda
In this paper, we present the LIMSI question-answering system which participated in the Question Answering on Speech Transcripts 2008 evaluation. This system is based on a complete, multi-level analysis of both queries and documents. It uses an automatically generated research descriptor, and a score based on those descriptors is used to select documents and snippets. The extraction and scoring of candidate answers are based on proximity measurements within the research descriptor elements and a number of secondary factors. We participated in all the subtasks and submitted 18 runs (for 16 sub-tasks). The evaluation results range from 31% to 45% accuracy on manual transcripts, depending on the task, and from 16% to 41% on automatic transcripts.