Gonzalo Iglesias
University of Vigo
Publications
Featured research published by Gonzalo Iglesias.
Computational Linguistics | 2010
Adrià de Gispert; Gonzalo Iglesias; Graeme W. Blackwood; Eduardo Rodríguez Banga; William Byrne
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.
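The advantage of lattices over k-best lists can be made concrete with a toy sketch (plain Python, not the actual HiFST implementation): because a lattice shares structure across hypotheses, a handful of stored words encodes multiplicatively many full translations, which is why lattice-based search needs less pruning than an explicit k-best list. The lattice contents below are invented for illustration.

```python
from itertools import product

# Hypothetical lattice: each slot holds alternative target words.
lattice = [
    ["the", "a"],
    ["big", "large", "huge"],
    ["house", "home"],
]

def num_paths(lattice):
    """Count the distinct hypotheses the lattice encodes."""
    n = 1
    for slot in lattice:
        n *= len(slot)
    return n

def expand(lattice):
    """Enumerate every hypothesis (what a k-best list stores explicitly)."""
    return [" ".join(words) for words in product(*lattice)]

# 7 stored words encode 2 * 3 * 2 = 12 full hypotheses.
paths = num_paths(lattice)
```

A k-best list of the same coverage would store all 12 sentences verbatim; the gap widens exponentially with sentence length.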
Meeting of the Association for Computational Linguistics | 2009
Gonzalo Iglesias; Adrià de Gispert; Eduardo Rodríguez Banga; William Byrne
We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on the number of non-terminals and the pattern, and various filtering strategies are then applied to assess the impact on translation speed and quality. Results are reported on the 2008 NIST Arabic-to-English evaluation task.
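The rule-classification idea can be sketched as follows; the Hiero-style rules, the pattern notation, and the filter below are illustrative inventions, not the paper's actual rule set or filtering strategies.

```python
import re

# Hypothetical hierarchical rules: right-hand sides mixing terminals
# and indexed non-terminals X1, X2, as in Hiero-style grammars.
rules = [
    "X1 said",        # 1 non-terminal
    "the X1 of X2",   # 2 non-terminals separated by terminals
    "X1 X2",          # 2 adjacent non-terminals
    "in fact",        # phrase rule, no non-terminals
]

def rule_class(rhs):
    """Class = (non-terminal count, pattern with terminal runs collapsed to 'w')."""
    toks = rhs.split()
    pat = []
    for t in toks:
        sym = "X" if re.fullmatch(r"X\d", t) else "w"
        if not pat or pat[-1] != sym or sym == "X":
            pat.append(sym)
    nts = sum(1 for t in toks if re.fullmatch(r"X\d", t))
    return (nts, ".".join(pat))

def filter_rules(rules, banned=("X.X",)):
    """Drop rules whose pattern is in the banned set, e.g. adjacent non-terminals."""
    return [r for r in rules if rule_class(r)[1] not in banned]
```

Grouping rules by `(count, pattern)` is what lets whole syntactic classes be kept or dropped at once when assessing speed/quality trade-offs.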
North American Chapter of the Association for Computational Linguistics | 2009
Gonzalo Iglesias; Adrià de Gispert; Eduardo Rodríguez Banga; William Byrne
This paper describes a lattice-based decoder for hierarchical phrase-based translation. The decoder is implemented with standard WFST operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, direct generation of translation lattices in the target language, better parameter optimization, and improved translation performance when rescoring with long-span language models and MBR decoding. We report translation experiments for the Arabic-to-English and Chinese-to-English NIST translation tasks and contrast the WFST-based hierarchical decoder with hierarchical translation under cube pruning.
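The MBR rescoring step mentioned above can be sketched in miniature: pick the hypothesis with the highest expected gain under the model's posterior. This toy uses a k-best list with invented scores and unigram overlap as a crude stand-in for the real gain function (typically sentence-level BLEU); it is not the paper's lattice-based MBR.

```python
import math
from collections import Counter

# Hypothetical k-best list with model scores (log probabilities).
kbest = [
    ("the big house", -1.0),
    ("the large house", -1.2),
    ("a big house", -2.5),
]

def posteriors(kbest):
    """Normalize exponentiated scores into a posterior over hypotheses."""
    z = sum(math.exp(s) for _, s in kbest)
    return {h: math.exp(s) / z for h, s in kbest}

def gain(h, ref):
    """Crude stand-in for sentence BLEU: clipped unigram overlap."""
    ch, cr = Counter(h.split()), Counter(ref.split())
    return sum(min(ch[w], cr[w]) for w in ch)

def mbr_decode(kbest):
    """Pick the hypothesis maximizing expected gain under the posterior."""
    post = posteriors(kbest)
    return max(post, key=lambda h: sum(p * gain(h, r) for r, p in post.items()))
```

Note that MBR can prefer a hypothesis that is not the single highest-scoring one when it agrees more with the rest of the probability mass.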
North American Chapter of the Association for Computational Linguistics | 2015
Adrià de Gispert; Gonzalo Iglesias; Bill Byrne
We propose the use of neural networks to model source-side preordering for faster and better statistical machine translation. The neural network trains a logistic regression model to predict whether two sibling nodes of the source-side parse tree should be swapped in order to obtain a more monotonic parallel corpus, based on samples extracted from the word-aligned parallel corpus. For multiple language pairs and domains, we show that this yields better reordering performance than other state-of-the-art techniques, resulting in improved translation quality and very fast decoding.
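The core classifier can be sketched with a tiny hand-rolled logistic regression: given features describing a pair of sibling parse-tree nodes, predict swap vs. keep. The feature scheme and training data below are invented for illustration; the paper's model, features, and training corpus are far richer.

```python
import math

# Toy training data (hypothetical): feature vectors for sibling-node
# pairs, label 1 = swap the siblings, 0 = keep source order.
# Features could be indicators like [left_is_NP, right_is_PP, bias].
data = [
    ([1.0, 0.0, 1.0], 0),
    ([0.0, 1.0, 1.0], 1),
    ([1.0, 1.0, 1.0], 1),
    ([0.0, 0.0, 1.0], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def predict_swap(w, x):
    """Swap the siblings iff the model's probability exceeds 0.5."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5

w = train(data)
```

Applying such a classifier top-down over the parse tree yields the reordered (more monotonic) source sentence before translation.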
Archive | 2012
Kathleen R. McKeown; Kristen Parton; Nizar Habash; Gonzalo Iglesias; Adrià de Gispert
Automatic post-editors (APEs) enable the re-use of black box machine translation (MT) systems for a variety of tasks where different aspects of translation are important. In this paper, we describe APEs that target adequacy errors, a critical problem for tasks such as cross-lingual question-answering, and compare different approaches for post-editing: a rule-based system and a feedback approach that uses a computer in the loop to suggest improvements to the MT system. We test the APEs on two different MT systems and across two different genres. Human evaluation shows that the APEs significantly improve adequacy, regardless of approach, MT system or genre: 30-56% of the post-edited sentences have improved adequacy compared to the original MT.
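The rule-based flavor of post-editing can be pictured as a cascade of pattern rewrites over the MT output. The two rules below are purely illustrative (simple fluency-style fixes); the paper's rules target adequacy errors and are derived from source-side analysis.

```python
import re

# Hypothetical post-editing rules: (pattern, replacement) pairs applied
# in order to the raw MT output. Illustrative only.
RULES = [
    (r"\bdo not not\b", "do not"),  # undo a doubled negation
    (r"\b(\w+) \1\b", r"\1"),       # collapse an immediately repeated word
]

def post_edit(mt_output):
    """Apply each rewrite rule in sequence to the MT output string."""
    for pattern, repl in RULES:
        mt_output = re.sub(pattern, repl, mt_output)
    return mt_output
```

A feedback-based APE would instead propose such fixes back to the MT system rather than editing the output directly.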
Computational Linguistics | 2014
Cyril Allauzen; Bill Byrne; Adrià de Gispert; Gonzalo Iglesias; Michael Riley
This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy that uses a weaker language model in the first pass, motivated by the PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.
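The essential difference from a finite-state automaton is the stack, which lets a PDA track nested grammar expansions without enumerating them. The toy recognizer below (not HiPDT) uses bracket symbols as a stand-in for entering and closing a non-terminal expansion.

```python
# Toy pushdown automaton: "(" marks entering a nested expansion (push),
# ")" marks closing it (pop); other symbols are ordinary emissions.
def pda_accepts(symbols):
    stack = []
    for s in symbols:
        if s == "(":
            stack.append(s)   # enter a nested expansion
        elif s == ")":
            if not stack:
                return False  # pop on an empty stack: reject
            stack.pop()       # close the current expansion
        # any other symbol leaves the stack unchanged
    return not stack          # accept iff every expansion was closed
```

An FSA would need a distinct state for every nesting depth, which is why the PDA representation stays compact for larger grammars.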
Machine Translation | 2013
Adrià de Gispert; Graeme W. Blackwood; Gonzalo Iglesias; William Byrne
We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
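The n-gram posterior idea reduces to: sum the normalized probabilities of all hypotheses containing the n-gram. The sketch below computes this over a small invented k-best list; the paper's algorithm operates on the full lattice, which the abstract shows gives better-calibrated posteriors than a k-best approximation.

```python
import math
from collections import defaultdict

# Hypothetical k-best list with log-probability scores.
kbest = [
    ("the cat sat", -0.5),
    ("the cat sits", -1.0),
    ("a cat sat", -2.0),
]

def ngrams(sent, n):
    """Set of n-grams (as tuples) occurring in the sentence."""
    toks = sent.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def ngram_posteriors(kbest, n):
    """Posterior of an n-gram = total normalized mass of hypotheses containing it."""
    z = sum(math.exp(s) for _, s in kbest)
    post = defaultdict(float)
    for sent, s in kbest:
        for ng in ngrams(sent, n):
            post[ng] += math.exp(s) / z
    return dict(post)

post = ngram_posteriors(kbest, 2)
```

An n-gram shared by most of the probability mass (here "the cat") gets a high posterior, matching its higher chance of appearing in a reference translation.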
Speech Communication | 2008
Manuel González González; Eduardo Rodríguez Banga; Francisco Campillo Díaz; Francisco Méndez Pazó; Leandro Rodríguez Liñares; Gonzalo Iglesias
In this article, we present the main linguistic and phonetic features of Galician which need to be considered in the development of speech technology applications for this language. We also describe the solutions adopted in our text-to-speech system, which are also useful for speech recognition and speech-to-speech translation. On the phonetic plane in particular, the handling of vocal contact and the determination of mid-vowel openness are discussed. On the linguistic plane we place special emphasis on the handling of clitics and verbs. It should be noted that in Galician there is a high degree of interrelation between phonetics and grammatical information. Therefore, the task of morphosyntactic disambiguation is also addressed. Moreover, this task is fundamental for higher-level linguistic analysis.
Empirical Methods in Natural Language Processing | 2015
Gonzalo Iglesias; Adrià de Gispert; Bill Byrne
We describe a simple and efficient algorithm to disambiguate non-functional weighted finite state transducers (WFSTs), i.e., to generate a new WFST that contains a unique, best-scoring path for each hypothesis in the input labels along with the best output labels. The algorithm uses topological features combined with a tropical sparse tuple vector semiring. We empirically show that our algorithm is more efficient than previous work in a PoS-tagging disambiguation task. We use our method to rescore very large translation lattices with a bilingual neural network language model, obtaining gains in line with the literature.
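The effect of disambiguation can be pictured on explicitly listed paths (a drastic simplification: real WFSTs share structure, and the paper's algorithm never enumerates paths): among all weighted paths with the same input-label sequence, keep only the best-scoring one, in the tropical sense that lower weight is better. The PoS-tagging-style data below is invented.

```python
# Much-simplified view of WFST disambiguation over an explicit path list.
def disambiguate(paths):
    """paths: list of (input_labels, output_labels, weight).
    Keep the minimum-weight path per input-label sequence (tropical semiring)."""
    best = {}
    for ins, outs, w in paths:
        key = tuple(ins)
        if key not in best or w < best[key][1]:
            best[key] = (tuple(outs), w)
    return best

paths = [
    (["time", "flies"], ["NN", "VBZ"], 1.2),  # best tagging for this input
    (["time", "flies"], ["VB", "NNS"], 2.7),  # competing tagging, pruned
    (["fruit", "flies"], ["NN", "NNS"], 0.9),
]
best = disambiguate(paths)
```

The resulting map has exactly one (output, weight) entry per distinct input sequence, which is the property the disambiguated WFST guarantees.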
North American Chapter of the Association for Computational Linguistics | 2016
Daniel Beck; Adrià de Gispert; Gonzalo Iglesias; Aurelien Waite; Bill Byrne
We address the problem of automatically finding the parameters of a statistical machine translation system that maximize BLEU scores while ensuring that decoding speed exceeds a minimum value. We propose the use of Bayesian Optimization to efficiently tune the speed-related decoding parameters by easily incorporating speed as a noisy constraint function. The obtained parameter values are guaranteed to satisfy the speed constraint with an associated confidence margin. Across three language pairs and two speed constraint values, we report overall optimization time reduction compared to grid and random search. We also show that Bayesian Optimization can decouple speed and BLEU measurements, resulting in a further reduction of overall optimization time as speed is measured over a small subset of sentences.
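The shape of the problem (maximize quality subject to a minimum decoding speed) can be sketched with plain random search under a hard constraint; note this is explicitly *not* Bayesian Optimization, and the `bleu`/`speed` functions below are synthetic stand-ins for real measurements over a tuning set.

```python
import random

random.seed(0)  # deterministic toy run

# Synthetic objective and constraint, both functions of one pruning parameter.
def bleu(beam):
    return 30.0 + 5.0 * (1.0 - 1.0 / beam)  # wider beam: better BLEU

def speed(beam):
    return 100.0 / beam                      # wider beam: slower decoding

def tune(min_speed=10.0, trials=50):
    """Keep the best-BLEU setting among candidates meeting the speed floor."""
    best = None
    for _ in range(trials):
        beam = random.uniform(1.0, 20.0)
        if speed(beam) < min_speed:
            continue                         # constraint violated: discard
        score = bleu(beam)
        if best is None or score > best[1]:
            best = (beam, score)
    return best

beam, score = tune()
```

Bayesian Optimization improves on this by modeling both BLEU and the noisy speed constraint with surrogates, so far fewer (expensive) decoder runs are wasted on infeasible or unpromising settings.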