Stéphane Huet
Université de Montréal
Publications
Featured research published by Stéphane Huet.
Computer Speech & Language | 2010
Stéphane Huet; Guillaume Gravier; Pascale Sébillot
Many automatic speech recognition (ASR) systems rely solely on pronunciation dictionaries and language models to take information about language into account. Morphology and syntax are implicitly embedded in the language models to a certain extent, but the richness of such linguistic knowledge is not exploited. This paper studies the use of morpho-syntactic (MS) information in a post-processing stage of an ASR system, by reordering N-best lists. Each sentence hypothesis is first part-of-speech tagged. A morpho-syntactic score is computed over the tag sequence with a long-span language model and combined with the acoustic and word-level language model scores. This new sentence-level score is finally used to rescore N-best lists by reranking or consensus. Experiments on a French broadcast news task show that morpho-syntactic knowledge improves the word error rate and confidence measures. In particular, we observed that the corrected errors are not only agreement errors and errors on short grammatical words, but also errors on lexical words where the hypothesized lemma was modified.
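The combination described here is, in essence, a log-linear rescoring of the N-best list. Below is a minimal sketch of what such a reranking step could look like; the `pos_tagger` and `tag_lm` inputs, the 4-gram backoff floor, and the combination weights are illustrative assumptions, not the paper's actual models.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    words: list      # recognized word sequence
    acoustic: float  # acoustic log-score from the decoder
    lm: float        # word-level language-model log-score

def tag_lm_logprob(tags, tag_lm):
    """Score a POS-tag sequence with a long-span model over tags.

    `tag_lm` is assumed to map tag n-grams (tuples) to log-probabilities;
    unseen n-grams fall back to a crude floor instead of real smoothing.
    """
    logp = 0.0
    for i in range(len(tags)):
        ngram = tuple(tags[max(0, i - 3):i + 1])  # up to 4-gram context
        logp += tag_lm.get(ngram, -10.0)
    return logp

def rescore_nbest(nbest, pos_tagger, tag_lm, w_ac=1.0, w_lm=10.0, w_ms=5.0):
    """Rerank an N-best list with a combined sentence-level score.

    The weights w_* would normally be tuned on held-out data (e.g., to
    minimize WER); the values here are placeholders.
    """
    def combined(h):
        tags = pos_tagger(h.words)         # POS-tag the hypothesis
        ms = tag_lm_logprob(tags, tag_lm)  # morpho-syntactic score
        return w_ac * h.acoustic + w_lm * h.lm + w_ms * ms

    return sorted(nbest, key=combined, reverse=True)
```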
Machine Translation | 2010
Julien Bourdaillet; Stéphane Huet; Philippe Langlais; Guy Lapalme
As basic as bilingual concordancers may appear, they are among the computer-assisted translation tools most widely used by professional translators. Nevertheless, they have yet to benefit from recent breakthroughs in machine translation. This paper describes how the commercial bilingual concordancer TransSearch was improved to embed a word alignment feature. Statistical word alignment methods allow the system to spot the translations of user queries, turning the tool into a translation search engine. We describe several translation identification and post-processing algorithms that enhance the application. The excellent results obtained on a large translation memory of 8.3 million sentence pairs are confirmed by a human evaluation.
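As a rough illustration of translation spotting, the sketch below locates the target-side span aligned to a source-language query in an aligned sentence pair. The flat (src_idx, tgt_idx) link representation and the contiguity assumption are simplifications; the paper's actual identification and post-processing algorithms are more elaborate.

```python
def spot_translation(src_tokens, tgt_tokens, alignment, query_tokens):
    """Return the target span aligned to the first occurrence of a query.

    `alignment` is an iterable of (src_idx, tgt_idx) links, such as the
    output of IBM-model word alignment (assumed already symmetrized).
    The span is the smallest contiguous window covering all linked
    target words, a simplification of real translation spotting.
    """
    n = len(query_tokens)
    for start in range(len(src_tokens) - n + 1):
        if src_tokens[start:start + n] == query_tokens:
            linked = [j for i, j in alignment if start <= i < start + n]
            if linked:
                return tgt_tokens[min(linked):max(linked) + 1]
    return None

# Toy usage with a hand-made alignment:
src = "conformément à la loi".split()
tgt = "in accordance with the law".split()
links = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
print(spot_translation(src, tgt, links, ["conformément", "à"]))
# -> ['in', 'accordance', 'with']
```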
Polibits | 2011
Florian Boudin; Stéphane Huet; Juan-Manuel Torres-Moreno
Cross-language summarization is the task of generating a summary in a language different from the language of the source documents. In this paper, we propose a graph-based approach to multi-document summarization that integrates machine translation quality scores into the sentence extraction process. We evaluate our method on a manually translated subset of the DUC 2004 evaluation campaign. Results indicate that our approach improves the readability of the generated summaries without degrading their informativity.
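One plausible reading of this approach is a PageRank-style ranking over a sentence-similarity graph in which the MT quality scores bias the stationary distribution. The sketch below is one way to wire that up, assuming the similarity matrix and per-sentence quality scores are precomputed; the paper's exact integration of the quality scores may differ.

```python
import numpy as np

def rank_sentences(sim, mt_quality, d=0.85, iters=50):
    """Graph-based sentence ranking biased by MT quality scores.

    `sim` is an (n x n) sentence-similarity matrix and `mt_quality` an
    n-vector of translation quality scores, both assumed given. The
    quality vector replaces PageRank's uniform teleportation, so
    well-translated sentences receive more mass.
    """
    sim = np.asarray(sim, dtype=float)
    q = np.asarray(mt_quality, dtype=float)
    n = sim.shape[0]
    # Row-normalize similarities into a transition matrix.
    rowsum = sim.sum(axis=1, keepdims=True)
    P = np.divide(sim, rowsum, out=np.zeros_like(sim), where=rowsum > 0)
    q = q / q.sum()                      # teleportation biased by quality
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) * q + d * (P.T @ r)  # power-iteration update
    return r  # higher score = stronger extraction candidate
```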
Canadian Conference on Artificial Intelligence | 2009
Julien Bourdaillet; Stéphane Huet; Fabrizio Gotti; Guy Lapalme; Philippe Langlais
Despite the impressive amount of recent studies devoted to improving the state of the art of Machine Translation (MT), Computer-Assisted Translation (CAT) tools remain the preferred solution of human translators when publication quality is a concern. In this paper, we present our perspectives on improving the commercial bilingual concordancer TransSearch, a Web-based service whose core technology mainly relies on sentence-level alignment. We report on experiments which show that it can greatly benefit from statistical word-level alignment.
Archive | 2013
Stéphane Huet; Philippe Langlais
This chapter presents a case study relating how a user of TransSearch, a translation spotter and bilingual concordancer available over the Web, can use the tool to find translations of idiomatic expressions. We show that, by paying close attention to the queries made to the system, TransSearch can effectively identify a fair number of idiomatic expressions and their translations. For indicative purposes, we compare the translations identified by our application with those returned by Google Translate, and conduct a survey of recent Computer-Assisted Translation tools with functionalities similar to TransSearch.
Applications of Natural Language to Data Bases | 2018
Elvys Linhares Pontes; Stéphane Huet; Juan-Manuel Torres-Moreno; Andréa Carneiro Linhares
Cross-Language Automatic Text Summarization produces a summary in a language different from the language of the source documents. In this paper, we propose a French-to-English cross-lingual summarization framework that analyzes the information in both languages to identify the most relevant sentences. In order to generate more informative cross-lingual summaries, we introduce the use of chunks and two compression methods, at the sentence and multi-sentence levels. Experimental results on the MultiLing 2011 dataset show that our framework improves on the results obtained by state-of-the-art approaches according to ROUGE metrics.
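One simple way to picture "analyzing the information in both languages" is to score each sentence on the French source and on its English translation, then extract greedily under a length budget. The dictionary keys, the linear combination, and the greedy selection below are illustrative assumptions; the paper's chunk-based compression steps are not reproduced here.

```python
def select_sentences(sentences, budget, lam=0.5):
    """Greedy cross-lingual extraction under a word budget.

    Each sentence is a dict carrying a relevance score computed on the
    source side (`src_score`), one computed on its translation
    (`tgt_score`), and the translated text itself. The keys and the
    mixing weight `lam` are assumptions for illustration.
    """
    ranked = sorted(
        sentences,
        key=lambda s: lam * s["src_score"] + (1 - lam) * s["tgt_score"],
        reverse=True,
    )
    summary, used = [], 0
    for s in ranked:
        length = len(s["translation"].split())
        if used + length <= budget:   # keep sentence if it fits the budget
            summary.append(s["translation"])
            used += length
    return " ".join(summary)
```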
Cross-Language Evaluation Forum | 2018
Elvys Linhares Pontes; Stéphane Huet; Juan-Manuel Torres-Moreno
The content analysis task of the MC2 CLEF 2017 lab aims to generate small summaries in four languages to contextualize microblogs. This paper analyzes the challenges of this task and details the advantages and limitations of our cross-lingual compressive text summarization approach. We split the task into several subtasks and discuss the setup of each. In addition, we suggest an evaluation protocol to reduce the bias of current metrics toward extractive approaches.
Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents | 2017
Matthieu Riou; Bassam Jabaian; Stéphane Huet; Thierry Chaminade; Fabrice Lefèvre
In this paper, we present a brief overview of our ongoing work on artificial interactive agents and their adaptation to users. Several ways of introducing humorous productions into a spoken dialog system are investigated in order to enhance naturalness during social interactions between the agent and the user. Finally, we describe our plans for how neuroscience can help better evaluate the proposed systems, both objectively and subjectively.
Applications of Natural Language to Data Bases | 2016
Elvys Linhares Pontes; Stéphane Huet; Juan-Manuel Torres-Moreno; Andréa Carneiro Linhares
In this paper, we propose a new method that uses continuous vectors to map words to a reduced vocabulary in the context of Automatic Text Summarization (ATS). This method is evaluated on the MultiLing corpus with the ROUGE evaluation measures and four ATS systems. Our experiments show that the reduced vocabulary improves the performance of state-of-the-art systems.
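The abstract does not spell out how the continuous vectors produce the reduced vocabulary; clustering word embeddings and replacing each word by its cluster id is one natural reading. The k-means pass below is therefore an assumption, written out with NumPy for self-containment (scikit-learn's KMeans would serve equally well).

```python
import numpy as np

def reduce_vocabulary(embeddings, k=500, iters=20, seed=0):
    """Map words to a reduced vocabulary by clustering their embeddings.

    `embeddings` is a dict {word: vector}. Each word is assigned to one
    of k cluster ids, which then stand in for the words themselves.
    The clustering itself is a hypothetical stand-in: the paper only
    states that continuous vectors are used for the mapping.
    """
    words = list(embeddings)
    X = np.stack([embeddings[w] for w in words]).astype(float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest center (fine for a sketch;
        # a real run over a large vocabulary would batch this).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return {w: int(l) for w, l in zip(words, labels)}
```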
International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2011
Nathalie Camelin; Boris Detienne; Stéphane Huet; Dominique Quadri; Fabrice Lefèvre
Efficient statistical approaches have recently been proposed for natural language understanding in the context of dialogue systems. However, these approaches are trained on data semantically annotated at the segmental level, which increases the production cost of these resources. This kind of semantic annotation involves both determining the concepts present in a sentence and linking them to their corresponding word segments. In this paper, we propose a two-step automatic method for semantic annotation. The first step is an implementation of latent Dirichlet allocation aimed at discovering concepts in a dialogue corpus. This knowledge is then used as a bootstrap to automatically infer a segmentation of a word sequence into concepts, using either integer linear optimisation or stochastic word alignment models (IBM models). The relation between automatically derived and manually defined task-dependent concepts is evaluated on a spoken dialogue task with a reference annotation.
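The two-step pipeline could be prototyped as follows: gensim's LDA for the unsupervised concept discovery, then a segmentation of each turn. The second step here is a naive per-word argmax with adjacent-label merging, standing in for the integer linear optimisation or IBM alignment models the paper actually uses; treating each turn as an LDA document is also an assumption.

```python
from gensim import corpora, models

def discover_concepts(turns, n_concepts=20):
    """Step 1: discover latent concepts in a dialogue corpus with LDA.

    `turns` is a list of tokenized user turns, each treated as one
    LDA document (an assumption about the setup).
    """
    dictionary = corpora.Dictionary(turns)
    bow = [dictionary.doc2bow(t) for t in turns]
    lda = models.LdaModel(bow, num_topics=n_concepts, id2word=dictionary)
    return lda, dictionary

def segment_turn(turn, lda, dictionary):
    """Step 2 (simplified): label each word with its most likely concept,
    then merge adjacent identical labels into segments. A cheap stand-in
    for the paper's ILP / IBM-model segmentation.
    """
    labels = []
    for w in turn:
        wid = dictionary.token2id.get(w)
        topics = (lda.get_term_topics(wid, minimum_probability=0.0)
                  if wid is not None else [])
        labels.append(max(topics, key=lambda t: t[1])[0] if topics else None)
    segments, cur = [], None
    for w, l in zip(turn, labels):
        if cur and cur[0] == l:      # extend the current concept segment
            cur[1].append(w)
        else:                        # open a new segment
            cur = [l, [w]]
            segments.append(cur)
    return [(l, " ".join(ws)) for l, ws in segments]
```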