Publication


Featured research published by Michel Simard.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1999

Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web

Jian-Yun Nie; Michel Simard; Pierre Isabelle; Richard Durand

This paper describes the use of a probabilistic translation model for cross-language IR (CLIR). The performance of this approach is compared with that of an approach using machine translation (MT). We show that a probabilistic model can achieve performance close to that of an MT system. In addition, we investigated the possibility of automatically gathering parallel texts from the Web in an attempt to construct a reasonable training corpus. The results are very encouraging: in several tests, such a training corpus proved as good as a manually constructed one for CLIR purposes.


Computational Linguistics | 2003

Embedding web-based statistical translation models in cross-language information retrieval

Wessel Kraaij; Jian-Yun Nie; Michel Simard

Although more and more language pairs are covered by machine translation (MT) services, many pairs still lack translation resources. Cross-language information retrieval (CLIR) is an application that needs only relatively unsophisticated translation functionality, since current information retrieval (IR) models are still based on bag-of-words representations. The Web provides a vast resource for automatically constructing parallel corpora that can be used to train statistical translation models. The resulting translation models can be embedded in a retrieval model in several ways. In this article, we investigate the problem of automatically mining parallel texts from the Web, as well as different ways of integrating the translation models into the retrieval process. Our experiments on standard CLIR test collections show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open up the prospect of building a fully automatic query translation device for CLIR at very low cost.


Workshop on Statistical Machine Translation | 2007

Rule-Based Translation with Statistical Phrase-Based Post-Editing

Michel Simard; Nicola Ueffing; Pierre Isabelle; Roland Kuhn

This article describes a machine translation system based on an automatic post-editing strategy: first translate the input text into the target language using a rule-based MT system, then automatically post-edit the output using a statistical phrase-based system. An implementation of this approach based on the SYSTRAN and PORTAGE MT systems was used in the shared task of the Second Workshop on Statistical Machine Translation. Experimental results on the test data of the previous campaign are presented.


Empirical Methods in Natural Language Processing | 2005

Translating with Non-contiguous Phrases

Michel Simard; Nicola Cancedda; Bruno Cavestro; Marc Dymetman; Eric Gaussier; Cyril Goutte; Kenji Yamada; Philippe Langlais; Arne Mauser

This paper presents a phrase-based statistical machine translation method based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from word-aligned corpora is proposed. We also present a statistical translation model that handles such phrases, as well as a training method based on maximizing translation accuracy as measured by the NIST evaluation metric. Translations are produced by a beam-search decoder. Experimental results demonstrate that the proposed method generalizes better from the training data.


Machine Translation | 1998

Bilingual Sentence Alignment: Balancing Robustness and Accuracy

Michel Simard; Pierre Plamondon

Sentence alignment is the problem of making explicit the relations between the sentences of two texts that are known to be mutual translations. Automatic sentence-alignment methods typically face two kinds of difficulties. The first is robustness. In real life, discrepancies between a source text and its translation are quite common: differences in layout, omissions, inversions, etc. Sentence-alignment programs must be able to deal with such phenomena. The second is accuracy. Even when translations are “clean”, alignment is still not trivial: some decisions are hard to make, even for humans. We report here on the current state of our ongoing efforts to produce a sentence-alignment program that is both robust and accurate. The proposed method relies on two new alignment engines: one that produces highly reliable and robust character-level alignments, and one that relies on statistical lexical knowledge to produce accurate mappings. Experimental results demonstrate the method's effectiveness and highlight the problems that remain to be solved.


Meeting of the Association for Computational Linguistics | 1998

Methods and Practical Issues in Evaluating Alignment Techniques

Philippe Langlais; Michel Simard; Jean Véronis

This paper describes the work achieved in the first half of a 4-year cooperative research project (ARCADE), financed by AUPELF-UREF. The project is devoted to the evaluation of parallel text alignment techniques. In its first period, ARCADE ran a competition between six systems on a sentence-to-sentence alignment task, which yielded two main types of results. First, a large reference bilingual corpus was created, comprising texts of different genres, each presenting various degrees of difficulty with respect to the alignment task. Second, significant methodological progress was made, both on the evaluation protocols and metrics and on the algorithms used by the different systems. For the second phase, which is now underway, ARCADE has been opened to a larger number of teams, who will tackle the problem of word-level alignment.


North American Chapter of the Association for Computational Linguistics | 2003

Translation spotting for translation memories

Michel Simard

The term translation spotting (TS) refers to the task of identifying the target-language (TL) words that correspond to a given set of source-language (SL) words in a pair of text segments known to be mutual translations. This article examines this task in the context of a sub-sentential translation-memory system, i.e. a translation support tool capable of proposing translations for portions of an SL sentence, extracted from an archive of existing translations. Several methods based on a statistical translation model are proposed. These methods exploit certain characteristics of the application to produce TL segments subject to contiguity and compositionality constraints. Experiments show that imposing these constraints yields substantial gains in accuracy compared with the most probable alignments predicted by the model.


Workshop on Statistical Machine Translation | 2007

NRC's PORTAGE System for WMT 2007

Nicola Ueffing; Michel Simard; Samuel Larkin; Howard Johnson

We present the PORTAGE statistical machine translation system which participated in the shared task of the ACL 2007 Second Workshop on Statistical Machine Translation. The focus of this description is on improvements which were incorporated into the system over the last year. These include adapted language models, phrase table pruning, an IBM1-based decoder feature, and rescoring with posterior probabilities.


Natural Language Engineering | 2005

Parallel texts

Rada Mihalcea; Michel Simard

Parallel texts have become a vital element for natural language processing. We present a panorama of current research activities related to parallel texts, and offer some thoughts about the future of this rich field of investigation.


Cross-Language Evaluation Forum | 2001

Using Statistical Translation Models for Bilingual IR

Jian-Yun Nie; Michel Simard

This report describes our tests applying statistical translation models to bilingual IR tasks in CLEF-2001. These translation models were trained on a set of parallel web pages automatically mined from the Web. Our previous studies have shown the utility of such corpora for cross-language information retrieval. The goal of the current tests is to see how we can improve the quality of the translation models and make the best use of them. Several questions are considered: Is it useful to consider the IDF factor in addition to the translation probabilities? Is it useful to further clean the training corpora before model training, or to clean the translation models themselves? How can the translation models be combined with bilingual dictionaries? Although our tests do not allow us to answer all these questions, they provide useful indications for several further research directions.

Collaboration


Michel Simard's collaborations.

Top Co-Authors

James D. Wuest

Université de Montréal

Cyril Goutte

National Research Council

Roland Kuhn

National Research Council

Okba Saied

Université de Montréal

Thierry Maris

Université de Montréal

Michèle Dartiguenave

California Institute of Technology
