Stéphane Huet
Université de Montréal
Publications
Featured research published by Stéphane Huet.
Computer Speech & Language | 2010
Stéphane Huet; Guillaume Gravier; Pascale Sébillot
Many automatic speech recognition (ASR) systems rely solely on pronunciation dictionaries and language models to take information about language into account. Morphology and syntax are implicitly embedded in the language models to a certain extent, but the richness of such linguistic knowledge is not exploited. This paper studies the use of morpho-syntactic (MS) information in a post-processing stage of an ASR system, by reordering N-best lists. Each sentence hypothesis is first part-of-speech tagged. A morpho-syntactic score is computed over the tag sequence with a long-span language model and combined with the acoustic and word-level language model scores. This new sentence-level score is finally used to rescore N-best lists by reranking or consensus. Experiments on a French broadcast news task show that morpho-syntactic knowledge improves the word error rate and confidence measures. In particular, we observed that the corrected errors are not only agreement errors and errors on short grammatical words, but also errors on lexical words where the hypothesized lemma was modified.
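The combination described here is, in essence, a log-linear rescoring of the N-best list. Below is a minimal sketch of what such a reranking step could look like; the `pos_tagger` and `tag_lm` inputs, the 4-gram backoff floor, and the combination weights are illustrative assumptions, not the paper's actual models.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    words: list      # recognized word sequence
    acoustic: float  # acoustic log-score from the decoder
    lm: float        # word-level language-model log-score

def tag_lm_logprob(tags, tag_lm):
    """Score a POS-tag sequence with a long-span model over tags.

    `tag_lm` is assumed to map tag n-grams (tuples) to log-probabilities;
    unseen n-grams fall back to a crude floor instead of real smoothing.
    """
    logp = 0.0
    for i in range(len(tags)):
        ngram = tuple(tags[max(0, i - 3):i + 1])  # up to 4-gram context
        logp += tag_lm.get(ngram, -10.0)
    return logp

def rescore_nbest(nbest, pos_tagger, tag_lm, w_ac=1.0, w_lm=10.0, w_ms=5.0):
    """Rerank an N-best list with a combined sentence-level score.

    The weights w_* would normally be tuned on held-out data (e.g., to
    minimize WER); the values here are placeholders.
    """
    def combined(h):
        tags = pos_tagger(h.words)         # POS-tag the hypothesis
        ms = tag_lm_logprob(tags, tag_lm)  # morpho-syntactic score
        return w_ac * h.acoustic + w_lm * h.lm + w_ms * ms

    return sorted(nbest, key=combined, reverse=True)
```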
Machine Translation | 2010
Julien Bourdaillet; Stéphane Huet; Philippe Langlais; Guy Lapalme
As basic as bilingual concordancers may appear, they are among the computer-assisted translation tools most widely used by professional translators. Nevertheless, they have yet to benefit from recent breakthroughs in machine translation. This paper describes how the commercial bilingual concordancer TransSearch was improved to embed a word alignment feature. Statistical word alignment methods allow the system to spot the translations of user queries, turning the tool into a translation search engine. We describe several translation identification and post-processing algorithms that enhance the application. The excellent results obtained on a large translation memory of 8.3 million sentence pairs are confirmed by a human evaluation.
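As a rough illustration of translation spotting, the sketch below locates the target-side span aligned to a source-language query in an aligned sentence pair. The flat (src_idx, tgt_idx) link representation and the contiguity assumption are simplifications; the paper's actual identification and post-processing algorithms are more elaborate.

```python
def spot_translation(src_tokens, tgt_tokens, alignment, query_tokens):
    """Return the target span aligned to the first occurrence of a query.

    `alignment` is an iterable of (src_idx, tgt_idx) links, such as the
    output of IBM-model word alignment (assumed already symmetrized).
    The span is the smallest contiguous window covering all linked
    target words, a simplification of real translation spotting.
    """
    n = len(query_tokens)
    for start in range(len(src_tokens) - n + 1):
        if src_tokens[start:start + n] == query_tokens:
            linked = [j for i, j in alignment if start <= i < start + n]
            if linked:
                return tgt_tokens[min(linked):max(linked) + 1]
    return None

# Toy usage with a hand-made alignment:
src = "conformément à la loi".split()
tgt = "in accordance with the law".split()
links = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
print(spot_translation(src, tgt, links, ["conformément", "à"]))
# -> ['in', 'accordance', 'with']
```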
Polibits | 2011
Florian Boudin; Stéphane Huet; Juan-Manuel Torres-Moreno
Cross-language summarization is the task of generating a summary in a language different from the language of the source documents. In this paper, we propose a graph-based approach to multi-document summarization that integrates machine translation quality scores into the sentence extraction process. We evaluate our method on a manually translated subset of the DUC 2004 evaluation campaign. Results indicate that our approach improves the readability of the generated summaries without degrading their informativity.
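One plausible reading of this approach is a PageRank-style ranking over a sentence-similarity graph in which the MT quality scores bias the stationary distribution. The sketch below is one way to wire that up, assuming the similarity matrix and per-sentence quality scores are precomputed; the paper's exact integration of the quality scores may differ.

```python
import numpy as np

def rank_sentences(sim, mt_quality, d=0.85, iters=50):
    """Graph-based sentence ranking biased by MT quality scores.

    `sim` is an (n x n) sentence-similarity matrix and `mt_quality` an
    n-vector of translation quality scores, both assumed given. The
    quality vector replaces PageRank's uniform teleportation, so
    well-translated sentences receive more mass.
    """
    sim = np.asarray(sim, dtype=float)
    q = np.asarray(mt_quality, dtype=float)
    n = sim.shape[0]
    # Row-normalize similarities into a transition matrix.
    rowsum = sim.sum(axis=1, keepdims=True)
    P = np.divide(sim, rowsum, out=np.zeros_like(sim), where=rowsum > 0)
    q = q / q.sum()                      # teleportation biased by quality
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) * q + d * (P.T @ r)  # power-iteration update
    return r  # higher score = stronger extraction candidate
```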
Canadian Conference on Artificial Intelligence | 2009
Julien Bourdaillet; Stéphane Huet; Fabrizio Gotti; Guy Lapalme; Philippe Langlais
Despite the impressive amount of recent studies devoted to improving the state of the art of Machine Translation (MT), Computer-Assisted Translation (CAT) tools remain the preferred solution of human translators when publication quality is a concern. In this paper, we present our perspectives on improving the commercial bilingual concordancer TransSearch, a Web-based service whose core technology mainly relies on sentence-level alignment. We report on experiments which show that it can greatly benefit from statistical word-level alignment.
Archive | 2013
Stéphane Huet; Philippe Langlais
This chapter presents a case study relating how a user of TransSearch, a translation spotter and bilingual concordancer available over the Web, can use the tool to find translations of idiomatic expressions. We show that, by paying close attention to the queries made to the system, TransSearch can effectively identify a fair number of idiomatic expressions and their translations. For indicative purposes, we compare the translations identified by our application with those returned by Google Translate, and conduct a survey of recent Computer-Assisted Translation tools with functionalities similar to TransSearch.
Applications of Natural Language to Data Bases | 2018
Elvys Linhares Pontes; Stéphane Huet; Juan-Manuel Torres-Moreno; Andréa Carneiro Linhares
Cross-Language Automatic Text Summarization produces a summary in a language different from the language of the source documents. In this paper, we propose a French-to-English cross-lingual summarization framework that analyzes the information in both languages to identify the most relevant sentences. In order to generate more informative cross-lingual summaries, we introduce the use of chunks and two compression methods, at the sentence and multi-sentence levels. Experimental results on the MultiLing 2011 dataset show that our framework improves on the results obtained by state-of-the-art approaches according to ROUGE metrics.
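One simple way to picture "analyzing the information in both languages" is to score each sentence on the French source and on its English translation, then extract greedily under a length budget. The dictionary keys, the linear combination, and the greedy selection below are illustrative assumptions; the paper's chunk-based compression steps are not reproduced here.

```python
def select_sentences(sentences, budget, lam=0.5):
    """Greedy cross-lingual extraction under a word budget.

    Each sentence is a dict carrying a relevance score computed on the
    source side (`src_score`), one computed on its translation
    (`tgt_score`), and the translated text itself. The keys and the
    mixing weight `lam` are assumptions for illustration.
    """
    ranked = sorted(
        sentences,
        key=lambda s: lam * s["src_score"] + (1 - lam) * s["tgt_score"],
        reverse=True,
    )
    summary, used = [], 0
    for s in ranked:
        length = len(s["translation"].split())
        if used + length <= budget:   # keep sentence if it fits the budget
            summary.append(s["translation"])
            used += length
    return " ".join(summary)
```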
Cross-Language Evaluation Forum | 2018
Elvys Linhares Pontes; Stéphane Huet; Juan-Manuel Torres-Moreno
The content analysis task of the MC2 CLEF 2017 lab aims to generate small summaries in four languages to contextualize microblogs. This paper analyzes the challenges of this task and details the advantages and limitations of our cross-lingual compressive text summarization approach. We split the task into several subtasks and discuss the setup of each. In addition, we suggest an evaluation protocol to reduce the bias of current metrics toward extractive approaches.
Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents | 2017
Matthieu Riou; Bassam Jabaian; Stéphane Huet; Thierry Chaminade; Fabrice Lefèvre
In this paper, we present a brief overview of our ongoing work on artificial interactive agents and their adaptation to users. Several ways of introducing humorous productions into a spoken dialog system are investigated in order to enhance naturalness during social interactions between the agent and the user. Finally, we describe our plans for how neuroscience can help better evaluate the proposed systems, both objectively and subjectively.
Applications of Natural Language to Data Bases | 2016
Elvys Linhares Pontes; Stéphane Huet; Juan-Manuel Torres-Moreno; Andréa Carneiro Linhares
In this paper, we propose a new method that uses continuous vectors to map words to a reduced vocabulary in the context of Automatic Text Summarization (ATS). This method is evaluated on the MultiLing corpus with the ROUGE evaluation measures and four ATS systems. Our experiments show that the reduced vocabulary improves the performance of state-of-the-art systems.
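The abstract does not spell out how the continuous vectors produce the reduced vocabulary; clustering word embeddings and replacing each word by its cluster id is one natural reading. The k-means pass below is therefore an assumption, written out with NumPy for self-containment (scikit-learn's KMeans would serve equally well).

```python
import numpy as np

def reduce_vocabulary(embeddings, k=500, iters=20, seed=0):
    """Map words to a reduced vocabulary by clustering their embeddings.

    `embeddings` is a dict {word: vector}. Each word is assigned to one
    of k cluster ids, which then stand in for the words themselves.
    The clustering itself is a hypothetical stand-in: the paper only
    states that continuous vectors are used for the mapping.
    """
    words = list(embeddings)
    X = np.stack([embeddings[w] for w in words]).astype(float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest center (fine for a sketch;
        # a real run over a large vocabulary would batch this).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return {w: int(l) for w, l in zip(words, labels)}
```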
International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2011
Nathalie Camelin; Boris Detienne; Stéphane Huet; Dominique Quadri; Fabrice Lefèvre
Efficient statistical approaches have recently been proposed for natural language understanding in the context of dialogue systems. However, these approaches are trained on data semantically annotated at the segmental level, which increases the production cost of these resources. This kind of semantic annotation involves both determining the concepts present in a sentence and linking them to their corresponding word segments. In this paper, we propose a two-step automatic method for semantic annotation. The first step is an implementation of latent Dirichlet allocation aimed at discovering concepts in a dialogue corpus. This knowledge is then used as a bootstrap to automatically infer a segmentation of a word sequence into concepts, using either integer linear optimisation or stochastic word alignment models (IBM models). The relation between automatically derived and manually defined task-dependent concepts is evaluated on a spoken dialogue task with a reference annotation.
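The two-step pipeline could be prototyped as follows: gensim's LDA for the unsupervised concept discovery, then a segmentation of each turn. The second step here is a naive per-word argmax with adjacent-label merging, standing in for the integer linear optimisation or IBM alignment models the paper actually uses; treating each turn as an LDA document is also an assumption.

```python
from gensim import corpora, models

def discover_concepts(turns, n_concepts=20):
    """Step 1: discover latent concepts in a dialogue corpus with LDA.

    `turns` is a list of tokenized user turns, each treated as one
    LDA document (an assumption about the setup).
    """
    dictionary = corpora.Dictionary(turns)
    bow = [dictionary.doc2bow(t) for t in turns]
    lda = models.LdaModel(bow, num_topics=n_concepts, id2word=dictionary)
    return lda, dictionary

def segment_turn(turn, lda, dictionary):
    """Step 2 (simplified): label each word with its most likely concept,
    then merge adjacent identical labels into segments. A cheap stand-in
    for the paper's ILP / IBM-model segmentation.
    """
    labels = []
    for w in turn:
        wid = dictionary.token2id.get(w)
        topics = (lda.get_term_topics(wid, minimum_probability=0.0)
                  if wid is not None else [])
        labels.append(max(topics, key=lambda t: t[1])[0] if topics else None)
    segments, cur = [], None
    for w, l in zip(turn, labels):
        if cur and cur[0] == l:      # extend the current concept segment
            cur[1].append(w)
        else:                        # open a new segment
            cur = [l, [w]]
            segments.append(cur)
    return [(l, " ".join(ws)) for l, ws in segments]
```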