Marc Dymetman | Researchain

Publication

Featured research published by Marc Dymetman.

empirical methods in natural language processing | 2005

Translating with Non-contiguous Phrases

Michel Simard; Nicola Cancedda; Bruno Cavestro; Marc Dymetman; Eric Gaussier; Cyril Goutte; Kenji Yamada; Philippe Langlais; Arne Mauser

This paper presents a phrase-based statistical machine translation method based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from word-aligned corpora is proposed. A statistical translation model that deals with such phrases is also presented, as well as a training method based on the maximization of translation accuracy, as measured with the NIST evaluation metric. Translations are produced by means of a beam-search decoder. Experimental results are presented that demonstrate how the proposed method allows better generalization from the training data.

international conference on computational linguistics | 2000

XML and multilingual document authoring: convergent trends

Marc Dymetman; Veronika Lux; Aarne Ranta

Typical approaches to XML authoring view an XML document as a mixture of structure (the tags) and surface (text between the tags). We advocate a radical approach where the surface disappears from the XML document altogether to be handled exclusively by rendering mechanisms. This move is based on the view that the author's choices when authoring XML documents are best seen as language-neutral semantic decisions, that the structure can then be viewed as interlingual content, and that the textual output should be derived from this content by language-specific realization mechanisms, thus assimilating XML authoring to Multilingual Document Authoring. However, standard XML tools have important limitations when used for such a purpose: (1) they are weak at propagating semantic dependencies between different parts of the structure, and (2) current XML rendering tools are ill-suited for handling the grammatical combination of textual units. We present two related proposals for overcoming these limitations: one (GF) originating in the tradition of mathematical proof editors and constructive type theory, the other (IG), a specialization of Definite Clause Grammars strongly inspired by GF.

international joint conference on natural language processing | 2009

Source-Language Entailment Modeling for Translating Unknown Terms

Shachar Mirkin; Lucia Specia; Nicola Cancedda; Ido Dagan; Marc Dymetman; Idan Szpektor

This paper addresses the task of handling unknown terms in SMT. We propose using source-language monolingual models and resources to paraphrase the source text prior to translation. We further present a conceptual extension to prior work by allowing translations of entailed texts rather than paraphrases only. A method for performing this process efficiently is presented and applied to some 2500 sentences with unknown terms. Our experiments show that the proposed approach substantially increases the number of properly translated texts.

international conference on natural language generation | 2000

Document structure and multilingual authoring

Caroline Brun; Marc Dymetman; Veronika Lux

The use of XML-based authoring tools is swiftly becoming a standard in the world of technical documentation. An XML document is a mixture of structure (the tags) and surface (text between the tags). The structure reflects the choices made by the author during the top-down stepwise refinement of the document under control of a DTD grammar. These choices are typically choices of meaning which are independent of the language in which the document is rendered, and can be seen as a kind of interlingua for the class of documents which is modeled by the DTD. Based on this remark, we advocate a radicalization of XML authoring, where the semantic content of the document is accounted for exclusively in terms of choice structures, and where appropriate rendering/realization mechanisms are responsible for producing the surface, possibly in several languages simultaneously. In this view, XML authoring has strong connections to natural language generation and text authoring. We describe the IG (Interaction Grammar) formalism, an extension of DTDs which permits powerful linguistic manipulations, and show its application to the production of multilingual versions of a certain class of pharmaceutical documents.

international conference on computational linguistics | 1990

A symmetrical approach to parsing and generation

Marc Dymetman; Pierre Isabelle; François Perrault

Lexical Grammars are a class of unification grammars which share a fixed rule component, for which there exists a simple left-recursion elimination transformation. The parsing and generation programs are seen as two dual non-left-recursive versions of the original grammar, and are implemented through a standard top-down Prolog interpreter. Formal criteria for termination are given as conditions on lexical entries: during parsing as well as during generation the processing of a lexical entry consumes some amount of a guide; the guide used for parsing is a list of words remaining to be analyzed, while the guide for generation is a list of the semantics of constituents waiting to be generated.

international joint conference on natural language processing | 2009

Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem

Mikhail Zaslavskiy; Marc Dymetman; Nicola Cancedda

An efficient decoding algorithm is a crucial element of any statistical machine translation system. Some researchers have noted certain similarities between SMT decoding and the famous Traveling Salesman Problem; in particular, Knight (1999) has shown that any TSP instance can be mapped to a sub-case of a word-based SMT model, demonstrating NP-hardness of the decoding task. In this paper, we focus on the reverse mapping, showing that any phrase-based SMT decoding problem can be directly reformulated as a TSP. The transformation is very natural, deepens our understanding of the decoding problem, and allows direct use of any of the powerful existing TSP solvers for SMT decoding. We test our approach on three datasets, and compare a TSP-based decoder to the popular beam-search algorithm. In all cases, our method provides competitive or better performance.

international conference on computational linguistics | 1988

CRITTER: a translation system for agricultural market reports

Pierre Isabelle; Marc Dymetman; Elliott Macklovitch

The CRITTER system is being developed to translate agricultural market reports between English and French. It is based on a transfer model, and designed to be reversible. The source and target language texts are described by means of: a) a surface syntactic representation consisting of a tree annotated with feature structures, built by an extraposition grammar, and b) a semantic representation exhibiting predicate argument structures and constrained by type checking, built in parallel with the syntactic structure in compositional fashion. CRITTER's implementation is still incomplete, but results obtained so far are promising.

international conference on computational linguistics | 2002

Text authoring, knowledge acquisition and description logics

Marc Dymetman

We present a principled approach to the problem of connecting a controlled document authoring system with a knowledge base. We start by describing closed-world authoring situations, in which the knowledge base is used for constraining the possible documents and orienting the user's selections. Then we move to open-world authoring situations in which, additionally, choices made during authoring are echoed back to the knowledge base. In this way the information implicitly encoded in a document becomes explicit in the knowledge base and can be re-exploited for simplifying the authoring of new documents. We show how a Datalog KB is sufficient for the closed-world situation, while a Description Logic KB is better adapted to the more complex open-world situation. All along, we pay special attention to logically sound solutions and to decidability issues in the different processes.

international conference on computational linguistics | 1992

A generalized Greibach Normal Form for definite clause grammars

Marc Dymetman

An arbitrary definite clause grammar can be transformed into a so-called Generalized Greibach Normal Form (GGNF), a generalization of the classical Greibach Normal Form (GNF) for context-free grammars. The normalized definite clause grammar is declaratively equivalent to the original definite clause grammar, that is, it assigns the same analyses to the same strings. Offline-parsability of the original grammar is reflected in an elementary textual property of the transformed grammar. When this property holds, a direct (top-down) Prolog implementation of the normalized grammar solves the parsing problem: all solutions are enumerated on backtracking and execution terminates. When specialized to the simpler case of context-free grammars, the GGNF provides a variant to the GNF, where the transformed context-free grammar not only generates the same strings as the original grammar, but also preserves their degrees of ambiguity (this last property does not hold for the GNF). The GGNF seems to be the first normal form result for DCGs. It provides an explicit factorization of the potential sources of undecidability for the parsing problem, and offers valuable insights on the computational structure of unification grammars in general.

logical aspects of computational linguistics | 1996

Logical Aspects of Computational Linguistics: An Introduction

Patrick Blackburn; Marc Dymetman; Alain Lecomte; Aarne Ranta; Christian Retoré; Éric Villemonte de la Clergerie

The papers in this collection are all devoted to a single theme: logic and its applications in computational linguistics. They share many themes, goals and techniques, and any editorial classification is bound to highlight some connections at the expense of others. Nonetheless, we have found it useful to divide these papers (somewhat arbitrarily) into the following four categories: logical semantics of natural language, grammar and logic, mathematics with linguistic motivations, and computational perspectives. In this introduction, we use this four-way classification as a guide to the papers, and, more generally, to the research agenda that underlies them. We hope that the reader will find it a useful starting point to the collection.
