Publication


Featured research published by Josep Maria Crego.


Computational Linguistics | 2006

N-gram-based Machine Translation

José B. Mariño; Rafael E. Banchs; Josep Maria Crego; Adrià de Gispert; Patrik Lambert; José A. R. Fonollosa; Marta Ruiz Costa-Jussà

This article describes in detail an n-gram approach to statistical machine translation. This approach consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred to as tuples, along with four specific feature functions. Translation performance, which is at the state of the art, is demonstrated with Spanish-to-English and English-to-Spanish translations of the European Parliament Plenary Sessions (EPPS).
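
The log-linear combination described in this abstract can be illustrated with a minimal sketch. It assumes a bigram model over tuples with add-one smoothing, and the feature name `word_bonus`, the weights, and the toy corpus are hypothetical choices made for illustration; the abstract does not name the four feature functions or the smoothing used.

```python
import math
from collections import Counter

START = ("<s>", "<s>")  # hypothetical sentence-start tuple

def train_tuple_bigram(sequences):
    """Estimate an add-one-smoothed bigram model over bilingual tuples."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = [START] + list(seq)
        vocab.update(seq)
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    v = len(vocab)

    def logprob(prev, cur):
        return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + v))

    return logprob

def tuple_model_score(seq, logprob):
    """Log-probability of a tuple sequence under the bigram model."""
    padded = [START] + list(seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))

def loglinear_score(features, weights):
    """Weighted sum of feature values (all in the log domain)."""
    return sum(weights[name] * value for name, value in features.items())

# Toy corpus: each sentence pair is segmented into (source, target) tuples.
corpus = [
    [("la", "the"), ("casa blanca", "white house")],
    [("la", "the"), ("casa", "house")],
]
logprob = train_tuple_bigram(corpus)

seen = tuple_model_score([("la", "the"), ("casa", "house")], logprob)
unseen = tuple_model_score([("casa", "house"), ("la", "the")], logprob)

# Combine the tuple model with one illustrative extra feature.
weights = {"tuple_lm": 1.0, "word_bonus": 0.5}
total = loglinear_score({"tuple_lm": seen, "word_bonus": 2}, weights)
```

Here the tuple order observed in training scores higher than the unobserved order, and the final score is simply the weighted sum of the feature values.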


Machine Translation | 2006

Improving statistical MT by coupling reordering and decoding

Josep Maria Crego; José B. Mariño

In this paper we describe an elegant and efficient approach to coupling reordering and decoding in statistical machine translation, where the n-gram translation model is also employed as a distortion model. The reordering search problem is tackled through a set of linguistically motivated rewrite rules, which are used to extend a monotonic search graph with reordering hypotheses. The extended graph is traversed in the global search, when a fully informed decision can be taken. Further experiments show that the n-gram translation model can be successfully used as a reordering model when estimated with reordered source words. Experiments are reported on the Europarl task (Spanish–English and English–Spanish). Results are presented regarding translation accuracy and computational efficiency, showing significant improvements in translation quality with respect to monotonic search for both translation directions at a very low computational cost.
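
As a rough illustration of rule-based reordering hypotheses, the sketch below applies rewrite rules over part-of-speech tags to a source sentence. The rule format (a tag pattern mapped to a permutation) and the function name `reordering_hypotheses` are assumptions made for this example, and the result is a flat list of word orders rather than the search graph the abstract describes.

```python
def reordering_hypotheses(words, tags, rules):
    """Return the monotonic order plus one alternative order per rule match.

    rules maps a tuple of POS tags to a permutation of positions,
    e.g. ("NN", "JJ") -> (1, 0) swaps a noun and a following adjective.
    """
    words = list(words)
    hypotheses = [words]
    for pattern, permutation in rules.items():
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                span = [words[i + j] for j in permutation]
                hypotheses.append(words[:i] + span + words[i + n:])
    return hypotheses

# Hypothetical Spanish example: noun-adjective order is swapped so that
# the source follows English word order.
rules = {("NN", "JJ"): (1, 0)}
paths = reordering_hypotheses(
    ["la", "casa", "blanca"], ["DT", "NN", "JJ"], rules
)
```

A decoder would then choose among these alternatives during the global search, using the translation model itself to score them, instead of committing to one order in a preprocessing step.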


The Prague Bulletin of Mathematical Linguistics | 2011

Ncode: an Open Source Bilingual N-gram SMT Toolkit

Josep Maria Crego; François Yvon; José B. Mariño

This paper describes Ncode, an open source statistical machine translation (SMT) toolkit for translation models estimated as n-gram language models of bilingual units (tuples). This toolkit includes tools for extracting tuples, estimating models and performing translation. It can be easily coupled to several other open source toolkits to yield a complete SMT pipeline. In this article, we review the main features of the toolkit and explain how to build a translation engine with Ncode. We also report a short comparison with the widely known Moses system. Results show that Ncode outperforms Moses in terms of memory requirements and translation speed. Ncode also achieves slightly higher accuracy results.


Workshop on Statistical Machine Translation | 2008

Using Shallow Syntax Information to Improve Word Alignment and Reordering for SMT

Josep Maria Crego; Nizar Habash

We describe two methods to improve SMT accuracy using shallow syntax information. First, we use chunks to refine the set of word alignments typically used as a starting point in SMT systems. Second, we extend an N-gram-based SMT system with chunk tags to better account for long-distance reorderings. Experiments are reported on an Arabic-English task showing significant improvements. A human error analysis indicates that long-distance reorderings are captured effectively.


Spoken Language Technology Workshop | 2006

Reordering Experiments for N-gram-based SMT

Josep Maria Crego; José B. Mariño

This paper addresses the problem of reordering in statistical machine translation (SMT). We describe an elegant and efficient approach to couple reordering (word order monotonization) and decoding, which does not require any additional model. We use linguistically motivated reordering rules to extend a monotonic search graph (with reordering hypotheses). The extended graph is traversed in decoding, when a fully informed decision can be taken (no preprocessing decision about reordering is taken). We also show how the N-gram translation model can be successfully used as a reordering model when estimated with reordered source words (to harmonize the source and target word order). Experiments are reported on the Europarl task (Spanish-to-English and English-to-Spanish). Results are presented regarding translation accuracy and computational efficiency, showing significant improvements in translation quality for both translation directions at a very low computational cost.


North American Chapter of the Association for Computational Linguistics | 2007

Discriminative Alignment Training without Annotated Data for Machine Translation

Patrik Lambert; Rafael E. Banchs; Josep Maria Crego

In present Statistical Machine Translation (SMT) systems, alignment is trained in a stage that precedes translation model training. Consequently, alignment model parameters are tuned for the translation task only indirectly. In this paper, we propose a novel framework for discriminative training of alignment models with automated translation metrics as the maximization criterion. In this approach, alignments are optimized for the translation task. In addition, no link labels at the word level are needed. This framework is evaluated in terms of automatic translation evaluation metrics, and an improvement in translation quality is observed.


Machine Translation | 2010

Factored bilingual n-gram language models for statistical machine translation

Josep Maria Crego; François Yvon

In this work, we present an extension of n-gram-based translation models based on factored language models (FLMs). Translation units employed in the n-gram-based approach to statistical machine translation (SMT) are based on mappings of sequences of raw words, while translation model probabilities are estimated through standard language modeling of such bilingual units. Therefore, similar to other translation model approaches (phrase-based or hierarchical), the sparseness problem of the units being modeled leads to unreliable probability estimates, even under conditions where large bilingual corpora are available. In order to tackle this problem, we extend the n-gram-based approach to SMT by tightly integrating more general word representations, such as lemmas and morphological classes, and we use the flexible framework of FLMs to apply a number of different back-off techniques. In this work, we show that FLMs can also be successfully applied to translation modeling, yielding more robust probability estimates that integrate larger bilingual contexts during the translation process.


Meeting of the Association for Computational Linguistics | 2007

Extending MARIE: an N-gram-based SMT decoder

Josep Maria Crego; José B. Mariño

In this paper we present several extensions of MARIE, a freely available N-gram-based statistical machine translation (SMT) decoder. The extensions mainly consist of the ability to accept and generate word graphs and the introduction of two new N-gram models in the log-linear combination of feature functions the decoder implements. Additionally, the decoder is enhanced with a caching strategy that reduces the number of N-gram calls, improving the overall search efficiency. Experiments are carried out over the European Parliament Spanish-English translation task.
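
The idea of caching N-gram calls can be sketched as a thin memoizing wrapper around a scoring function. This is a generic illustration, not MARIE's actual implementation, which the abstract does not detail; the class name `CachedNgramModel` and the toy scoring function are hypothetical.

```python
class CachedNgramModel:
    """Memoize N-gram scores so repeated queries skip the underlying model."""

    def __init__(self, score_fn):
        self._score_fn = score_fn
        self._cache = {}
        self.calls = 0  # number of queries that reached the underlying model

    def score(self, context, word):
        key = (tuple(context), word)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._score_fn(key[0], word)
        return self._cache[key]

def toy_score(context, word):
    # Stand-in for an expensive language model lookup (an assumption).
    return -float(len(context) + len(word))

model = CachedNgramModel(toy_score)

# Competing hypotheses often share N-grams, so the same query recurs.
first = model.score(["the", "white"], "house")
second = model.score(["the", "white"], "house")  # served from the cache
third = model.score(["the", "red"], "house")
```

After these three queries, only two reach the underlying model, since the repeated N-gram is answered from the cache.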


Workshop on Statistical Machine Translation | 2007

Ngram-Based Statistical Machine Translation Enhanced with Multiple Weighted Reordering Hypotheses

Marta R. Costa-jussà; Josep Maria Crego; Patrik Lambert; Maxim Khalilov; José A. R. Fonollosa; José B. Mariño; Rafael E. Banchs

This paper describes the 2007 Ngram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) in Barcelona. Emphasis is put on improvements and extensions of the previous year's system, which are highlighted and empirically compared. Mainly, these include a novel word ordering strategy based on: (1) statistically monotonizing the training source corpus and (2) a novel reordering approach based on weighted reordering graphs. In addition, this system introduces a target language model based on statistical classes, a feature for out-of-domain units and an improved optimization procedure. The paper provides details of this system's participation in the ACL 2007 Second Workshop on Statistical Machine Translation. Results on three pairs of languages are reported, namely from Spanish, French and German into English (and the other way round) for both the in-domain and out-of-domain tasks.


North American Chapter of the Association for Computational Linguistics | 2007

Analysis and System Combination of Phrase- and N-Gram-Based Statistical Machine Translation Systems

Marta Ruiz Costa-Jussà; Josep Maria Crego; David Vilar; José A. R. Fonollosa; José B. Mariño; Hermann Ney

In the framework of the TC-STAR project, we analyze and propose a combination of two Statistical Machine Translation systems: a phrase-based and an N-gram-based one. The exhaustive analysis includes a comparison of the translation models in terms of efficiency (number of translation units used in the search and computational time) and an examination of the errors in each system's output. Additionally, we combine both systems, showing accuracy improvements.

Collaboration


Dive into Josep Maria Crego's collaborations.

Top Co-Authors

José B. Mariño

Polytechnic University of Catalonia

Patrik Lambert

Polytechnic University of Catalonia

François Yvon

Paris Diderot University

José A. R. Fonollosa

Polytechnic University of Catalonia

Marta Ruiz Costa-Jussà

Polytechnic University of Catalonia

Maxim Khalilov

Polytechnic University of Catalonia

Hermann Ney

RWTH Aachen University
