Fabien Cromieres | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Fabien Cromieres is active.

Explore More

Publication

Featured researches published by Fabien Cromieres.

meeting of the association for computational linguistics | 2009

An Alignment Algorithm Using Belief Propagation and a Structure-Based Distortion Model

Fabien Cromieres; Sadao Kurohashi

In this paper, we first demonstrate the interest of the Loopy Belief Propagation algorithm to train and use a simple alignment model where the expected marginal values needed for an efficient EM-training are not easily computable. We then improve this model with a distortion model based on structure conservation.

north american chapter of the association for computational linguistics | 2015

Leveraging Small Multilingual Corpora for SMT Using Many Pivot Languages

Raj Dabre; Fabien Cromieres; Sadao Kurohashi; Pushpak Bhattacharyya

We present our work on leveraging multilingual parallel corpora of small sizes for Statistical Machine Translation between Japanese and Hindi using multiple pivot languages. In our setting, the source and target part of the corpus remains the same, but we show that using several different pivot to extract phrase pairs from these source and target parts lead to large BLEU improvements. We focus on a variety of ways to exploit phrase tables generated using multiple pivots to support a direct source-target phrase table. Our main method uses the Multiple Decoding Paths (MDP) feature of Moses, which we empirically verify as the best compared to the other methods we used. We compare and contrast our various results to show that one can overcome the limitations of small corpora by using as many pivot languages as possible in a multilingual setting. Most importantly, we show that such pivoting aids in learning of additional phrase pairs which are not learned when the direct sourcetarget corpus is small. We obtained improvements of up to 3 BLEU points using multiple pivots for Japanese to Hindi translation compared to when only one pivot is used. To the best of our knowledge, this work is also the first of its kind to attempt the simultaneous utilization of 7 pivot languages at decoding time.

meeting of the association for computational linguistics | 2006

Sub-Sentential Alignment Using Substring Co-Occurrence Counts

Fabien Cromieres

In this paper, we will present an efficient method to compute the co-occurrence counts of any pair of substring in a parallel corpus, and an algorithm that make use of these counts to create sub-sentential alignments on such a corpus. This algorithm has the advantage of being as general as possible regarding the segmentation of text.

meeting of the association for computational linguistics | 2014

KyotoEBMT: An Example-Based Dependency-to-Dependency Translation Framework

John Richardson; Fabien Cromieres; Toshiaki Nakazawa; Sadao Kurohashi

This paper introduces the KyotoEBMT Example-Based Machine Translation framework. Our system uses a tree-to-tree approach, employing syntactic dependency analysis for both source and target languages in an attempt to preserve non-local structure. The effectiveness of our system is maximized with online example matching and a flexible decoder. Evaluation demonstrates BLEU scores competitive with state-of-the-art SMT systems such as Moses. The current implementation is intended to be released as open-source in the near future.

empirical methods in natural language processing | 2014

Translation Rules with Right-Hand Side Lattices

Fabien Cromieres; Sadao Kurohashi

In Corpus-Based Machine Translation, the search space of the translation candidates for a given input sentence is often defined by a set of (cyclefree) context-free grammar rules. This happens naturally in Syntax-Based Machine Translation and Hierarchical Phrase-Based Machine Translation (where the representation will be the set of the target-side half of the synchronous rules used to parse the input sentence). But it is also possible to describe Phrase-Based Machine Translation in this framework. We propose a natural extension to this representation by using lattice-rules that allow to easily encode an exponential number of variations of each rules. We also demonstrate how the representation of the search space has an impact on decoding efficiency, and how it is possible to optimize this representation.

north american chapter of the association for computational linguistics | 2016

Flexible Non-Terminals for Dependency Tree-to-Tree Reordering

John Richardson; Fabien Cromieres; Toshiaki Nakazawa; Sadao Kurohashi

A major benefit of tree-to-tree over treeto-string translation is that we can use target-side syntax to improve reordering. While this is relatively simple for binarized constituency parses, the reordering problem is considerably harder for dependency parses, in which words can have arbitrarily many children. Previous approaches have tackled this problem by restricting grammar rules, reducing the expressive power of the translation model. In this paper we propose a general model for dependency tree-to-tree reordering based on flexible non-terminals that can compactly encode multiple insertion positions. We explore how insertion positions can be selected even in cases where rules do not entirely cover the children of input sentence words. The proposed method greatly improves the flexibility of translation rules at the cost of only a 30% increase in decoding time, and we demonstrate a 1.2–1.9 BLEU improvement over a strong tree-to-tree baseline.

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016

The Kyoto University Cross-Lingual Pronoun Translation System.

Raj Dabre; Yevgeniy Puzikov; Fabien Cromieres; Sadao Kurohashi

In this paper we describe our system we designed and implemented for the crosslingual pronoun prediction task as a part of WMT 2016. The majority of the paper will be dedicated to the system whose outputs we submitted wherein we describe the simplified mathematical model, the details of the components and the working by means of an architecture diagram which also serves as a flowchart. We then discuss the results of the official scores and our observations on the same.

Journal of Information Processing | 2018

Exploiting Multilingual Corpora Simply and Efficiently in Neural Machine Translation

Raj Dabre; Fabien Cromieres; Sadao Kurohashi

In this paper, we explore a simple approach for “Multi-Source Neural Machine Translation” (MSNMT) which only relies on preprocessing a N-way multilingual corpus without modifying the Neural Machine Translation (NMT) architecture or training procedure. We simply concatenate the source sentences to form a single, long multisource input sentence while keeping the target side sentence as it is and train an NMT system using this preprocessed corpus. We evaluate our method in resource poor as well as resource rich settings and show its effectiveness (up to 4 BLEU using 2 source languages and up to 6 BLEU using 5 source languages) and compare them against existing approaches. We also provide some insights on how the NMT system leverages multilingual information in such a scenario by visualizing attention. We then show that this multi-source approach can be used for transfer learning to improve the translation quality for single-source systems without using any additional corpora thereby highlighting the importance of multilingual-multiway corpora in low resource scenarios. We also extract and evaluate a multilingual dictionary by a method that utilizes the multi-source attention and show that it works fairly well despite its simplicity.

Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers | 2016

Cross-language Projection of Dependency Trees with Constrained Partial Parsing for Tree-to-Tree Machine Translation.

Yu Shen; Chenhui Chu; Fabien Cromieres; Sadao Kurohashi

Tree-to-tree machine translation (MT) that utilizes syntactic parse trees on both source and target sides suffers from the non-isomorphism of the parse trees due to parsing errors and the difference of annotation criterion between the two languages. In this paper, we present a method that projects dependency parse trees from the language side that has a high quality parser, to the side that has a low quality parser, to improve the isomorphism of the parse trees. We first project a part of the dependencies with high confidence to make a partial parse tree, and then complement the remaining dependencies with partial parsing constrained by the already projected dependencies. MT experiments verify the effectiveness of our proposed method.

PLOS ONE | 2015

Analysis of binary multivariate longitudinal data via 2-dimensional orbits: An application to the Agincourt Health and Socio-Demographic Surveillance System in South Africa.

Maria Vivien Visaya; David Sherwell; Benn Sartorius; Fabien Cromieres

We analyse demographic longitudinal survey data of South African (SA) and Mozambican (MOZ) rural households from the Agincourt Health and Socio-Demographic Surveillance System in South Africa. In particular, we determine whether absolute poverty status (APS) is associated with selected household variables pertaining to socio-economic determination, namely household head age, household size, cumulative death, adults to minor ratio, and influx. For comparative purposes, households are classified according to household head nationality (SA or MOZ) and APS (rich or poor). The longitudinal data of each of the four subpopulations (SA rich, SA poor, MOZ rich, and MOZ poor) is a five-dimensional space defined by binary variables (questions), subjects, and time. We use the orbit method to represent binary multivariate longitudinal data (BMLD) of each household as a two-dimensional orbit and to visualise dynamics and behaviour of the population. At each time step, a point (x, y) from the orbit of a household corresponds to the observation of the household, where x is a binary sequence of responses and y is an ordering of variables. The ordering of variables is dynamically rearranged such that clusters and holes associated to least and frequently changing variables in the state space respectively, are exposed. Analysis of orbits reveals information of change at both individual- and population-level, change patterns in the data, capacity of states in the state space, and density of state transitions in the orbits. Analysis of household orbits of the four subpopulations show association between (i) households headed by older adults and rich households, (ii) large household size and poor households, and (iii) households with more minors than adults and poor households. Our results are compared to other methods of BMLD analysis.

Explore More