
Publication


Featured research published by Christian Raymond.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages

Stefan Hahn; Marco Dinarelli; Christian Raymond; Fabrice Lefèvre; Patrick Lehnen; R. De Mori; Alessandro Moschitti; Hermann Ney; Giuseppe Riccardi

One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts out of a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods like Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), or Support Vector Machines (SVMs) as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRFs) or Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and of different complexity. The French MEDIA corpus has already been exploited during an evaluation campaign, so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. In addition to single systems, methods for system combination are investigated. The best performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate of 12.6% could be achieved. Here, in addition to attribute names, attribute values have been extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the concept error rate (CER) drops to 12.0%.
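The concept tagging task described above is a sequence-labeling problem: each word receives a concept tag, and decoding finds the highest-scoring tag sequence. As a rough illustration (not the paper's actual systems), the Viterbi decoding step shared by MEMM- and CRF-style taggers can be sketched as follows; the tags, words, and all scores below are invented for illustration, where real systems learn them from annotated data.

```python
# Toy sketch of concept tagging as sequence labeling: the Viterbi decoding
# step shared by MEMM/CRF-style taggers. All scores are hand-set placeholders.

def viterbi(words, tags, emit, trans):
    """Return the highest-scoring tag sequence for `words`."""
    # best[t] = (score of the best path ending in tag t, that path)
    best = {t: (emit.get((words[0], t), -10.0), [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                (best[p][0] + trans.get((p, t), -10.0), best[p][1])
                for p in tags
            )
            new[t] = (score + emit.get((w, t), -10.0), path + [t])
        best = new
    return max(best.values())[1]

# Invented toy model: "O" marks words outside any concept.
tags = ["O", "CITY", "DATE"]
emit = {("to", "O"): 0.0, ("paris", "CITY"): 0.0, ("monday", "DATE"): 0.0}
trans = {("O", "O"): 0.0, ("O", "CITY"): 0.0, ("O", "DATE"): 0.0,
         ("CITY", "O"): 0.0, ("CITY", "DATE"): 0.0, ("DATE", "O"): 0.0}

print(viterbi(["to", "paris", "monday"], tags, emit, trans))
```

The same decoder serves generative and discriminative taggers alike; what differs across the six approaches compared in the paper is how the emission and transition scores are modeled and trained.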


Speech Communication | 2006

On the use of finite state transducers for semantic interpretation

Christian Raymond; Frédéric Béchet; Renato De Mori; Géraldine Damnati

A spoken language understanding (SLU) system is described. It generates hypotheses of conceptual constituents with a translation process. This process is performed by finite state transducers (FSTs) which accept word patterns from a lattice of word hypotheses generated by an automatic speech recognition (ASR) system. FSTs operate in parallel and may share word hypotheses at their input. Semantic hypotheses are obtained by composition of compatible translations under the control of composition rules. Interpretation hypotheses are scored by the sum of the posterior probabilities of the paths in the lattice of word hypotheses supporting the interpretation. A compact structured n-best list of interpretations is obtained and used by the SLU interpretation strategy.
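As a toy illustration of concept transducers accepting word patterns (not the actual FSTs used in the paper), the sketch below scans a word sequence with hand-written patterns and emits overlapping concept hypotheses that can share words, mirroring the parallel FSTs described above. The concepts, patterns, and city list are all invented.

```python
# Minimal word-pattern matcher in the spirit of concept FSTs: each pattern
# plays the role of a small transducer that accepts a word span and emits
# a (concept, start, end) hypothesis. Concepts and patterns are invented.

PATTERNS = {
    "DEPARTURE_CITY": [["leaving", "from", "<city>"], ["from", "<city>"]],
    "ARRIVAL_CITY": [["going", "to", "<city>"], ["to", "<city>"]],
}
CITIES = {"paris", "lyon", "marseille"}

def match(pattern, words):
    """True if `words` instantiates `pattern` (with <city> as a slot)."""
    if len(pattern) != len(words):
        return False
    return all(w in CITIES if p == "<city>" else p == w
               for p, w in zip(pattern, words))

def tag(words):
    """Run all patterns 'in parallel' over the sequence, collecting hypotheses."""
    hyps = []
    for concept, pats in PATTERNS.items():
        for pat in pats:
            for i in range(len(words) - len(pat) + 1):
                if match(pat, words[i:i + len(pat)]):
                    hyps.append((concept, i, i + len(pat)))
    return hyps

print(tag("a flight leaving from paris to lyon".split()))
```

Note that overlapping hypotheses over the same span survive; in the paper, composition rules and lattice posterior scores then select compatible combinations, which this sketch omits.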


International Conference on Acoustics, Speech, and Signal Processing | 2010

On the use of machine translation for spoken language understanding portability

Christophe Servan; Nathalie Camelin; Christian Raymond; Frédéric Béchet; Renato De Mori

Across-language portability of a spoken language understanding (SLU) system concerns the possibility of reusing, with moderate effort, knowledge and data acquired for one language in a new language. The approach proposed in this paper is motivated by the availability of the fairly large MEDIA corpus, carefully transcribed in French and semantically annotated in terms of constituents. A method is proposed for manually translating a portion of the training set and using it to train an automatic machine translation (MT) system that translates the remaining data. As the source language is annotated in terms of concept tags, a solution is presented for automatically transferring these tags to the translated corpus. Experimental results are presented on the accuracy of the translation, expressed with the BLEU score, as a function of the size of the training corpus. It is shown that the process leads to comparable concept error rates in the two languages, making the proposed approach suitable for SLU portability across languages.
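Translation accuracy above is reported with the BLEU score. A minimal sketch of the modified n-gram precision at BLEU's core follows (single reference, one n-gram order, no brevity penalty, unlike the full metric); the example sentences are invented.

```python
# Simplified modified n-gram precision, the core quantity behind BLEU.
# The full metric combines several n-gram orders and a brevity penalty.
from collections import Counter

def ngram_precision(hyp, ref, n):
    """Clipped n-gram matches in `hyp` divided by total n-grams in `hyp`."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    return clipped / max(1, sum(hyp_ngrams.values()))

hyp = "i would like a hotel in paris".split()   # invented MT output
ref = "i want a hotel in paris".split()         # invented reference
print(ngram_precision(hyp, ref, 1))
```

Clipping prevents a hypothesis from being rewarded for repeating a reference word more often than the reference contains it.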


Linguistic Annotation Workshop | 2007

Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus

Kepa Joseba Rodríguez; Stefanie Dipper; Michael Götze; Massimo Poesio; Giuseppe Riccardi; Christian Raymond; Joanna Rabiega-Wiśniewska

The LUNA corpus is a multi-lingual, multi-domain spoken dialogue corpus currently under development that will be used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services. The LUNA corpus will be annotated at multiple levels to include annotations of syntactic, semantic, and discourse information; specialized annotation tools will be used for the annotation at each of these levels. In order to synchronize these multiple layers of annotation, the PAULA standoff exchange format will be used. In this paper, we present the corpus and its PAULA-based architecture.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Sequential Decision Strategies for Machine Interpretation of Speech

Christian Raymond; Frédéric Béchet; Nathalie Camelin; R. De Mori; Géraldine Damnati

Recognition errors made by automatic speech recognition (ASR) systems may not prevent the development of useful dialogue applications if the interpretation strategy has an introspection capability for evaluating the reliability of the results. This paper proposes an interpretation strategy which is particularly effective when applications are developed with a training corpus of moderate size. From the lattice of word hypotheses generated by an ASR system, a short list of conceptual structures is obtained with a set of finite state machines (FSMs). An interpretation or rejection decision is then made by a tree-based strategy. The nodes of the tree correspond to elaboration-decision units containing a redundant set of classifiers. A decision-tree-based classifier and two large-margin classifiers are trained on a development set to become interpretation knowledge sources. Discriminative training of the classifiers selects linguistic and confidence-based features that contribute to a cooperative assessment of the reliability of an interpretation. Such an assessment leads to the definition of a limited number of reliability states. The probability that a proposed interpretation is correct is provided by its reliability state and transmitted to the dialogue manager. Experimental results are presented for a telephone service application.
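The reliability-state idea can be sketched as follows: several classifiers vote on an interpretation, the agreement pattern defines a coarse state, and each state carries an offline-estimated probability of correctness that is passed to the dialogue manager. The states, routing rules, and probabilities below are invented placeholders, not the paper's learned strategy.

```python
# Sketch of reliability states from classifier agreement. The three states
# and the correctness probabilities are invented; the paper estimates its
# states and their statistics on a development set.

def reliability_state(votes):
    """Map a list of accept/reject votes to a coarse reliability state."""
    accepts = sum(votes)
    if accepts == len(votes):
        return "RELIABLE"     # all classifiers agree to accept
    if accepts == 0:
        return "REJECT"       # all classifiers agree to reject
    return "UNCERTAIN"        # partial agreement

# Hypothetical per-state probabilities of correctness, estimated offline.
P_CORRECT = {"RELIABLE": 0.95, "UNCERTAIN": 0.60, "REJECT": 0.10}

votes = [True, True, False]   # e.g. decision tree + two margin classifiers
state = reliability_state(votes)
print(state, P_CORRECT[state])
```

The dialogue manager can then condition its confirmation behaviour on the state rather than on raw classifier scores.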


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Belief confirmation in spoken dialog systems using confidence measures

Christian Raymond; Yannick Estève; Frédéric Béchet; R. De Mori; Géraldine Damnati

The approach proposed is an alternative to the traditional architecture of spoken dialogue systems, where the system belief is either not taken into account during the automatic speech recognition process or included in the decoding process but never challenged. By representing all the conceptual structures handled by the dialogue manager as finite state machines and by building a conceptual model that contains all the possible interpretations of a given word graph, we propose a decoding architecture that searches first for the best conceptual interpretation before looking for the best string of words. Once both N-best sets (at the concept level and at the word level) are generated, a verification process is performed on each N-best set using acoustic and linguistic confidence measures. A first selection strategy, which does not yet include the dialogue context, is proposed, and significant error reductions on the understanding measures are obtained.


International Conference on Acoustics, Speech, and Signal Processing | 2005

Semantic interpretation with error correction

Christian Raymond; Frédéric Béchet; Nathalie Camelin; R. De Mori; Géraldine Damnati

The paper presents a semantic interpretation strategy for spoken dialogue systems that includes an error correction process. Semantic interpretations output by the spoken language understanding module may be incorrect, yet some of their semantic components may be correct. A set of situations is introduced, describing semantic confidence based on the agreement of semantic interpretations proposed by different classification methods. The interpretation strategy considers, with the highest priority, the validation of the interpretation arising from the most likely sequence of words. If the confidence score model gives a high probability that this interpretation is incorrect, possible corrections are considered using the other sequences in the N-best lists of possible interpretations. This strategy is evaluated on a dialogue corpus provided by France Telecom R&D, collected for a tourism telephone service. Significant reductions in the understanding error rate are obtained, along with powerful new confidence measures.
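The control flow of such a correction strategy can be sketched minimally: validate the top interpretation first, and only when its confidence falls below a threshold draw an agreement-based correction from the N-best list. The confidence scores, threshold, and interpretation format below are invented for illustration.

```python
# Sketch of a validate-then-correct strategy over an N-best list of
# semantic interpretations. Scores, threshold, and the BOOK_HOTEL(...)
# notation are invented placeholders.
from collections import Counter

def choose(nbest, threshold=0.8):
    """nbest: list of (interpretation, confidence) pairs, best-first."""
    top_interp, top_conf = nbest[0]
    if top_conf >= threshold:
        return top_interp            # top hypothesis validated as-is
    # Low confidence: fall back to the interpretation proposed most often
    # across the N-best list (a simple agreement-based correction).
    counts = Counter(interp for interp, _ in nbest)
    return counts.most_common(1)[0][0]

nbest = [("BOOK_HOTEL(paris)", 0.55),
         ("BOOK_HOTEL(lyon)", 0.30),
         ("BOOK_HOTEL(lyon)", 0.15)]
print(choose(nbest))
```

Here the low-confidence top hypothesis is overridden by the interpretation that more of the N-best entries agree on; the paper's strategy instead combines several classifiers and learned confidence situations.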


International Conference on Acoustics, Speech, and Signal Processing | 2004

Automatic learning of interpretation strategies for spoken dialogue systems

Christian Raymond; Frédéric Béchet; R. De Mori; Géraldine Damnati; Yannick Estève

The paper proposes a new application of automatically trained decision trees to derive the interpretation of a spoken sentence. A new strategy for building structured cohorts of candidates is also described. By evaluating predicates related to the acoustic confidence of the words expressing a concept, the linguistic and semantic consistency of candidates in the cohort, and the rank of a candidate within a cohort, the decision tree automatically learns a decision strategy for rescoring or rejecting an n-best list of candidates representing a user's utterance. A relative reduction of 18.6% in the understanding error rate is obtained by our rescoring strategy with no utterance rejection, and a relative reduction of 43.1% of the same error rate is achieved with a rejection rate of only 8% of the utterances.


IEEE Transactions on Speech and Audio Processing | 2003

On the use of linguistic consistency in systems for human-computer dialogues

Yannick Estève; Christian Raymond; R. De Mori; David Janiszek

This paper introduces new recognition strategies based on reasoning about results obtained with different Language Models (LMs). Strategies are built following the conjecture that the consensus among the results obtained with different models gives rise to different situations in which hypothesized sentences have different word error rates (WER) and may be further processed with other LMs. New LMs are built by data augmentation using ideas from latent semantic analysis and trigram analogy. Situations are defined by expressing the consensus among the recognition results produced with different LMs and by the amount of unobserved trigrams in the hypothesized sentence. The diagnostic power of the use of observed trigrams or their corresponding class trigrams is compared with that of situations based on values of sentence posterior probabilities. In order to avoid or correct errors due to syntactic inconsistency of the recognized sentence, automata obtained by explanation-based learning are introduced and used under certain conditions. Semantic Classification Trees are introduced to provide sentence patterns expressing constraints of long-distance syntactic coherence. Results on a dialogue corpus provided by France Telecom R&D have shown that, starting with a WER of 21.87% on a test set of 1422 sentences, it is possible to subdivide the sentences into three sets characterized by automatically recognized situations. The first one has a coverage of 68% with a WER of 7.44%. The second one has various types of sentences with a WER around 20%. The third one contains 13% of the sentences that should be rejected, with a WER around 49%. The second set characterizes sentences that should be processed with particular care by the dialogue interpreter, with the possibility of asking a confirmation from the user.
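The situation-based routing can be caricatured as follows: hypotheses from different LMs are compared, and their agreement together with the number of unobserved trigrams routes the sentence into one of the three sets. The thresholds and the three-way split below are invented stand-ins for the situations the paper derives from data.

```python
# Toy routing of a recognized sentence into accept / careful / reject sets
# based on LM consensus and unseen-trigram count. Thresholds are invented.

def situation(hyps, unseen_trigrams):
    """hyps: sentences hypothesized by different LMs for one utterance."""
    consensus = len(set(hyps)) == 1
    if consensus and unseen_trigrams == 0:
        return "ACCEPT"    # analogous to the low-WER first set
    if not consensus and unseen_trigrams > 2:
        return "REJECT"    # analogous to the high-WER third set
    return "CAREFUL"       # middle set: confirm with the user

print(situation(["book a room", "book a room"], 0))
print(situation(["book a room", "look at rome"], 4))
```

The value of such a split is operational: only the middle set needs a confirmation sub-dialogue, while the extremes can be accepted or rejected outright.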


International Conference on Multimedia Retrieval | 2016

Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications

Vedran Vukotić; Christian Raymond; Guillaume Gravier

Common approaches to problems involving multiple modalities (classification, retrieval, hyperlinking, etc.) are early fusion of the initial modalities and crossmodal translation from one modality to the other. Recently, deep neural networks, especially deep autoencoders, have proven promising both for crossmodal translation and for early fusion via multimodal embedding. In this work, we propose a flexible crossmodal deep neural network architecture for multimodal and crossmodal representation. By tying the weights of two deep neural networks, symmetry is enforced in central hidden layers thus yielding a multimodal representation space common to the two original representation spaces. The proposed architecture is evaluated in multimodal query expansion and multimodal retrieval tasks within the context of video hyperlinking. Our method demonstrates improved crossmodal translation capabilities and produces a multimodal embedding that significantly outperforms multimodal embeddings obtained by deep autoencoders, resulting in an absolute increase of 14.14 in precision at 10 on a video hyperlinking task.
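The weight-tying idea can be sketched with a single shared layer: the same weights (transposed) map out of the central representation as map into it, so an input from either modality can be embedded in the common space and decoded into the other. The real architecture is deeper and trained end to end; the dimensions and random data below are arbitrary.

```python
# Minimal numpy sketch of tied-weight crossmodal mapping: encode modality A
# into a shared space, decode into modality B with the transpose of B's
# encoder weights. Dimensions and data are arbitrary; nothing is trained.
import numpy as np

rng = np.random.default_rng(0)
d_a, d_b, d_shared = 8, 6, 4

W_a = rng.normal(size=(d_a, d_shared))   # modality A -> shared space
W_b = rng.normal(size=(d_b, d_shared))   # modality B -> shared space

def encode_a(x):
    return np.tanh(x @ W_a)

def decode_b(h):
    # Tied decoder: reuse W_b transposed, shared space -> modality B.
    return h @ W_b.T

x = rng.normal(size=d_a)     # e.g. a speech-transcript feature vector
h = encode_a(x)              # multimodal embedding in the shared space
y_hat = decode_b(h)          # crossmodal translation into modality B
print(h.shape, y_hat.shape)
```

Because both directions share the same central space, one network supports multimodal embedding (use `h`) and crossmodal translation (use `y_hat`) at once, which is the symmetry the paper enforces.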

Collaboration


Dive into Christian Raymond's collaborations.

Top Co-Authors

Fabienne Moreau

University of Franche-Comté


Vincent Claveau

Centre national de la recherche scientifique
