Adam Lopez
University of Edinburgh
Publications
Featured research published by Adam Lopez.
ACM Computing Surveys | 2008
Adam Lopez
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and new ideas are constantly introduced. This survey presents a tutorial overview of the state of the art. We describe the context of the current research and then move to a formal problem description and an overview of the main subproblems: translation modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and a discussion of future directions.
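The subproblems named in this survey (translation modeling, parameter estimation, decoding) hang off one standard decomposition: decoding searches for the target sentence that maximizes the model score, classically via the noisy-channel factorization into a translation model and a language model. As a reminder of that formulation (standard notation, not taken from the abstract itself):

```latex
\hat{e} \;=\; \operatorname*{arg\,max}_{e}\, P(e \mid f)
       \;=\; \operatorname*{arg\,max}_{e}\, P(f \mid e)\, P(e)
```

Here $f$ is the source sentence, $P(f \mid e)$ the translation model, and $P(e)$ the language model; systems of this era typically generalize the product to a weighted log-linear combination of feature functions.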
international conference on computational linguistics | 2008
Adam Lopez
Translation model size is growing at a pace that outstrips improvements in computing power, and this hinders research on many interesting models. We show how an algorithmic scaling technique can be used to easily handle very large models. Using this technique, we explore several large model variants and show an improvement of 1.4 BLEU on the NIST 2006 Chinese-English task. This opens the door for work on a variety of models that are much less constrained by computational limitations.
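The scaling technique in this line of work indexes the training text with a suffix array and looks up phrase occurrences on demand, instead of precomputing a phrase table. A minimal word-level sketch of that idea (toy corpus; function names are illustrative, not from the paper):

```python
def build_suffix_array(tokens):
    """Sort all suffix start positions of the token sequence lexicographically."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_occurrences(tokens, sa, phrase):
    """Binary-search the suffix array for all positions where `phrase` occurs."""
    n = len(phrase)
    lo, hi = 0, len(sa)
    # Leftmost suffix whose first n tokens are >= phrase.
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] < phrase:
            lo = mid + 1
        else:
            hi = mid
    # Collect the contiguous block of exact matches.
    hits = []
    while lo < len(sa) and tokens[sa[lo]:sa[lo] + n] == phrase:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)

tokens = "the cat sat on the mat".split()
sa = build_suffix_array(tokens)
print(find_occurrences(tokens, sa, ["the"]))  # positions of "the": [0, 4]
```

The index costs space linear in the corpus rather than in the (much larger) set of extractable phrases, which is what lets very large models fit in memory.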
empirical methods in natural language processing | 2005
David Chiang; Adam Lopez; Nitin Madnani; Christof Monz; Philip Resnik; Michael Subotin
Hierarchical organization is a well-known property of language, and yet the notion of hierarchical structure has been largely absent from the best performing machine translation systems in recent community-wide evaluations. In this paper, we discuss a new hierarchical phrase-based statistical machine translation system (Chiang, 2005), presenting recent extensions to the original proposal, new evaluation results in a community-wide evaluation, and a novel technique for fine-grained comparative analysis of MT systems.
meeting of the association for computational linguistics | 2009
Adam Lopez
We present a unified view of many translation algorithms that synthesizes work on deductive parsing, semiring parsing, and efficient approximate search algorithms. This gives rise to clean analyses and compact descriptions that can serve as the basis for modular implementations. We illustrate this with several examples, showing how to build search spaces for several disparate phrase-based search strategies, integrate non-local features, and devise novel models. Although the framework is drawn from parsing and applied to translation, it is applicable to many dynamic programming problems arising in natural language processing and other areas.
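The unified view in this paper separates the deduction (which items can be built from which) from the semiring (how weights combine), so the same program computes different quantities. A toy sketch of that separation, not the paper's formalism: a dynamic program over topologically ordered hyperedges, parameterized by a semiring, where the tropical/Viterbi semiring yields the best derivation score and the inside semiring yields the total probability.

```python
from functools import reduce

# A semiring is (plus, times, zero): plus aggregates alternative
# derivations of an item, times combines an edge with its tail items.
VITERBI = dict(plus=max, times=lambda a, b: a * b, zero=0.0)
INSIDE = dict(plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0)

def run(edges, goal, semiring):
    """edges: list of (head, tail_items, weight), topologically ordered
    so every tail item is derived before it is used."""
    value = {}
    for head, tails, w in edges:
        contrib = reduce(semiring["times"], (value[t] for t in tails), w)
        value[head] = semiring["plus"](value.get(head, semiring["zero"]), contrib)
    return value[goal]

# Item "A" has two derivations (0.5 and 0.3); goal "G" is built from "A".
edges = [("A", [], 0.5), ("A", [], 0.3), ("G", ["A"], 1.0)]
print(run(edges, "G", VITERBI))  # best derivation: 0.5
print(run(edges, "G", INSIDE))   # total mass: 0.8
```

Swapping the semiring without touching the deduction is exactly the kind of modularity the abstract argues for.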
meeting of the association for computational linguistics | 2005
Adam Lopez; Philip Resnik
We introduce improvements to statistical word alignment based on the Hidden Markov Model. One improvement incorporates syntactic knowledge. Results on the workshop data show that alignment performance exceeds that of a state-of-the-art system based on more complex models, resulting in over a 5.5% absolute reduction in error on Romanian-English.
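In the base HMM alignment model that this paper builds on, each source word emits from the target word it aligns to, and transitions depend on the jump between successive alignment positions. A toy Viterbi decoder for that base model (illustrative probabilities and names; the paper's syntactic extensions are not shown):

```python
import math

def viterbi_align(src, tgt, t_prob, jump_prob):
    """Most likely alignment of each source word to a target position.
    t_prob[(f, e)]: emission prob; jump_prob(delta): transition prob."""
    n = len(tgt)
    NEG = -math.inf
    best = [[NEG] * n for _ in src]   # best[j][i]: src[j] aligned to tgt[i]
    back = [[0] * n for _ in src]
    for i in range(n):
        best[0][i] = math.log(t_prob.get((src[0], tgt[i]), 1e-9))
    for j in range(1, len(src)):
        for i in range(n):
            emit = math.log(t_prob.get((src[j], tgt[i]), 1e-9))
            for k in range(n):
                score = best[j - 1][k] + math.log(jump_prob(i - k)) + emit
                if score > best[j][i]:
                    best[j][i], back[j][i] = score, k
    # Trace back from the best final position.
    i = max(range(n), key=lambda i: best[-1][i])
    align = [i]
    for j in range(len(src) - 1, 0, -1):
        i = back[j][i]
        align.append(i)
    return align[::-1]
```

With strong emission probabilities for ("la", "the") and ("maison", "house") and a jump distribution favoring forward steps of one, the decoder recovers the monotone alignment [0, 1].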
workshop on statistical machine translation | 2009
Michael Auli; Adam Lopez; Hieu Hoang; Philipp Koehn
Translation systems are complex, and most metrics do little to pinpoint causes of error or isolate system differences. We use a simple technique to discover induction errors, which occur when good translations are absent from model search spaces. Our results show that a common pruning heuristic drastically increases induction error, and also strongly suggest that the search spaces of phrase-based and hierarchical phrase-based models are highly overlapping despite the well-known structural differences.
conference on computational natural language learning | 2009
Abhishek Arun; Chris Dyer; Barry Haddow; Phil Blunsom; Adam Lopez; Philipp Koehn
Recent advances in statistical machine translation have used beam search for approximate NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum expected risk training and decoding.
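The paper's sampler operates over translation derivations, but the core mechanism is the generic Gibbs loop: repeatedly resample each variable from its conditional distribution given the rest, so the chain's stationary distribution is the model posterior. A toy sketch over binary variables (not the paper's operators):

```python
import random

def gibbs(joint, n_vars, iters, seed=0):
    """Gibbs sampling over binary variables: resample one coordinate at a
    time from its conditional, computed by renormalizing the joint."""
    rng = random.Random(seed)
    state = [0] * n_vars
    samples = []
    for _ in range(iters):
        for i in range(n_vars):
            p = []
            for v in (0, 1):
                state[i] = v
                p.append(joint(tuple(state)))
            # Conditional P(state[i] = 1 | rest) by renormalization.
            state[i] = 1 if rng.random() < p[1] / (p[0] + p[1]) else 0
        samples.append(tuple(state))
    return samples
```

Because the samples approximate the full posterior rather than a pruned beam, quantities such as the maximum probability translation or expected risk can be estimated from sample frequencies, which is the advantage the abstract claims over beam search.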
international conference on acoustics, speech, and signal processing | 2012
Kenji Sagae; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Murat Saraclar; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley
This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are “hallucinated” for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results with training with “real” n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts - similar to methods from machine translation for extracting phrase tables - yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods.
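The training step described here is the structured perceptron over n-best lists: score each hypothesis with the current weights, and when the top-scoring hypothesis differs from the reference, move the weights toward the reference's features and away from the mistake's. A toy sketch with unigram features standing in for n-gram features (names are illustrative):

```python
from collections import Counter

def feats(sentence):
    """Unigram count features; a toy stand-in for n-gram features."""
    return Counter(sentence.split())

def score(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def perceptron_train(data, epochs=3):
    """data: (reference, nbest_list) pairs; the n-best hypotheses
    (real or 'hallucinated') supply the negative evidence."""
    w = {}
    for _ in range(epochs):
        for ref, nbest in data:
            guess = max(nbest, key=lambda h: score(w, feats(h)))
            if guess != ref:
                for k, v in feats(ref).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in feats(guess).items():
                    w[k] = w.get(k, 0.0) - v
    return w
```

The semi-supervised twist in the paper is in where the n-best lists come from (simulated from reference text via phrasal cohorts rather than decoded from audio); the update rule itself is unchanged.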
Machine Translation | 2010
Abhishek Arun; Barry Haddow; Philipp Koehn; Adam Lopez; Chris Dyer; Phil Blunsom
Recent advances in statistical machine translation have used approximate beam search for NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum risk training and decoding.
international conference on acoustics, speech, and signal processing | 2012
Arda Çelebi; Hasim Sak; Erinç Dikici; Murat Saraclar; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Kenji Sagae; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley
We present our work on semi-supervised learning of discriminative language models where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities, specifically, word, sub-word, syllable and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a variant of the perceptron algorithm. We find that morph-based confusion models with a sample selection strategy aiming to match the error distribution of the baseline ASR system give the best performance. We also observe that substituting half of the supervised training examples with those obtained in a semi-supervised manner gives similar results.