Eva Hasler
University of Edinburgh
Publications
Featured research published by Eva Hasler.
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Eva Hasler; Phil Blunsom; Philipp Koehn; Barry Haddow
Translating text from diverse sources poses a challenge to current machine translation systems, which are rarely adapted to structure beyond the corpus level. We explore topic adaptation on a diverse data set and present a new bilingual variant of Latent Dirichlet Allocation to compute topic-adapted, probabilistic phrase translation features. We dynamically infer document-specific translation probabilities for test sets of unknown origin, thereby capturing the effects of document context on phrase translations. We show gains of up to 1.26 BLEU over the baseline and 1.04 over a domain adaptation benchmark. We further provide an analysis of the domain-specific data and show additive gains of our model in combination with other types of topic-adapted features.
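To make the adaptation mechanism concrete, here is a minimal sketch of how per-topic phrase translation probabilities can be mixed under an inferred document topic distribution. The phrase pairs, topic count, and all probabilities are invented toy values, and the sketch does not reproduce the paper's bilingual LDA inference itself.

```python
# Toy sketch of a topic-adapted phrase translation feature (hypothetical
# data structures; the bilingual LDA inference is assumed to have run).

# Per-topic phrase translation probabilities p(e | f, k).
topic_phrase_probs = {
    ("bank", "Bank"): [0.9, 0.2],   # topic 0: finance, topic 1: geography
    ("bank", "Ufer"): [0.1, 0.8],
}

def adapted_phrase_prob(src, tgt, doc_topic_dist):
    """Mix per-topic probabilities with the document's inferred topic
    distribution: p(e|f,d) = sum_k p(e|f,k) * p(k|d)."""
    probs = topic_phrase_probs[(src, tgt)]
    return sum(p_k * theta_k for p_k, theta_k in zip(probs, doc_topic_dist))

# A finance-heavy test document prefers the finance sense of "bank".
theta = [0.7, 0.3]  # p(topic | document), inferred at test time
print(adapted_phrase_prob("bank", "Bank", theta))  # 0.69
print(adapted_phrase_prob("bank", "Ufer", theta))  # 0.31
```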
The Prague Bulletin of Mathematical Linguistics | 2011
Eva Hasler; Barry Haddow; Philipp Koehn
Margin Infused Relaxed Algorithm for Moses
We describe an open-source implementation of the Margin Infused Relaxed Algorithm (MIRA) for statistical machine translation (SMT). The implementation is part of the Moses toolkit and can be used as an alternative to standard minimum error rate training (MERT). We describe the implementation and its usage on core feature sets as well as large, sparse feature sets, and report experimental results comparing the performance of MIRA with MERT in terms of translation quality and stability.
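The core MIRA step can be illustrated with a small sketch. This is the standard single-constraint margin-infused update, not the Moses implementation itself; the feature vectors, the loss value, and the clipping constant C are toy assumptions.

```python
import numpy as np

def mira_update(w, feat_oracle, feat_hyp, loss, C=0.01):
    """One margin-infused relaxed update: move the weights just far
    enough that the oracle outscores the hypothesis by its loss,
    with the step size clipped at C (single-constraint sketch)."""
    diff = feat_oracle - feat_hyp
    margin = w @ diff                 # current score difference
    norm_sq = diff @ diff
    if norm_sq == 0.0:
        return w
    alpha = min(C, max(0.0, (loss - margin) / norm_sq))
    return w + alpha * diff

w = np.zeros(3)
w = mira_update(w,
                feat_oracle=np.array([1.0, 0.0, 2.0]),
                feat_hyp=np.array([0.0, 1.0, 1.0]),
                loss=0.5)             # e.g. 1 - sentence-level BLEU
print(w)                              # [ 0.01 -0.01  0.01]
```

The clipping constant C plays the role of a regularizer: it keeps any single sentence from moving the weights too far, which is part of why MIRA tends to be more stable than MERT on large, sparse feature sets.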
International Conference on Acoustics, Speech, and Signal Processing | 2012
Kenji Sagae; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Murat Saraclar; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley
This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are “hallucinated” for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English conversational telephone speech (CTS) system, comparing three methods for simulating ASR output, and compare the results with training on “real” n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts (similar to methods from machine translation for extracting phrase tables) yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods.
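A rough sketch of the training loop described here, assuming each n-best hypothesis (real or hallucinated) arrives with a precomputed error count against the reference. The bigram feature set, learning rate, and data are toy choices, not the paper's setup.

```python
from collections import Counter

def ngram_feats(words):
    """Bigram indicator features for a hypothesis (a toy feature set)."""
    return Counter(zip(words, words[1:]))

def model_score(w, words):
    return sum(w.get(f, 0.0) * c for f, c in ngram_feats(words).items())

def perceptron_step(w, nbest, lr=1.0):
    """One structured-perceptron update: push weights toward the oracle
    (lowest-error) hypothesis and away from the current model-best.
    `nbest` holds (words, error_count) pairs, whether they come from a
    real recognizer or from a hallucinated confusion model."""
    oracle = min(nbest, key=lambda h: h[1])[0]
    best = max(nbest, key=lambda h: model_score(w, h[0]))[0]
    if best != oracle:
        for f, c in ngram_feats(oracle).items():
            w[f] = w.get(f, 0.0) + lr * c
        for f, c in ngram_feats(best).items():
            w[f] = w.get(f, 0.0) - lr * c
    return w

w = {}
nbest = [("the cast sat".split(), 1), ("the cat sat".split(), 0)]
print(perceptron_step(w, nbest))
```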
Meeting of the Association for Computational Linguistics | 2016
Felix Stahlberg; Eva Hasler; Aurelien Waite; Bill Byrne
We investigate the use of hierarchical phrase-based SMT lattices in end-to-end neural machine translation (NMT). Weight pushing transforms the Hiero scores for complete translation hypotheses, with the full translation grammar score and full n-gram language model score, into posteriors compatible with NMT predictive probabilities. With a slightly modified NMT beam-search decoder we find gains over both Hiero and NMT decoding alone, with practical advantages in extending NMT to very large input and output vocabularies.
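A toy illustration of the score combination during decoding, assuming the lattice has already been weight-pushed so that arc scores behave like local posteriors. The lattice, the stand-in NMT model, and the interpolation weight `lam` are all invented for this sketch.

```python
import math

# Hypothetical toy lattice: state -> {token: (next_state, log_posterior)},
# standing in for a weight-pushed Hiero translation lattice.
lattice = {
    0: {"the": (1, math.log(0.9)), "a": (1, math.log(0.1))},
    1: {"house": (2, math.log(0.6)), "home": (2, math.log(0.4))},
    2: {"</s>": (3, 0.0)},
}

def toy_nmt_logprob(prefix, token):
    """Stand-in for the NMT decoder's next-token log probability."""
    table = {"the": 0.5, "a": 0.5, "house": 0.3, "home": 0.7, "</s>": 1.0}
    return math.log(table[token])

def constrained_beam_search(beam_size=2, lam=0.5):
    """Beam search restricted to lattice arcs, scoring each expansion
    as log p_NMT + lam * log p_lattice (a sketch of the combination)."""
    beams = [((), 0, 0.0)]  # (prefix, lattice state, score)
    finished = []
    while beams:
        expansions = []
        for prefix, state, score in beams:
            for tok, (nxt, lat_lp) in lattice.get(state, {}).items():
                s = score + toy_nmt_logprob(prefix, tok) + lam * lat_lp
                hyp = (prefix + (tok,), nxt, s)
                (finished if tok == "</s>" else expansions).append(hyp)
        beams = sorted(expansions, key=lambda h: -h[2])[:beam_size]
    return max(finished, key=lambda h: h[2])

print(constrained_beam_search())
```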
Workshop on Statistical Machine Translation | 2014
Philip Williams; Rico Sennrich; Maria Nadejde; Matthias Huck; Eva Hasler; Philipp Koehn
This paper describes the string-to-tree systems built at the University of Edinburgh for the WMT 2014 shared translation task. We developed systems for English-German, Czech-English, French-English, German-English, Hindi-English, and Russian-English. This year we improved our English-German system through target-side compound splitting, morphosyntactic constraints, and refinements to parse tree annotation; we addressed the out-of-vocabulary problem using transliteration for Hindi and Russian and using morphological reduction for Russian; we improved our German-English system through tree binarization; and we reduced system development time by filtering the tuning sets.
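One of the listed components, target-side compound splitting, lends itself to a short sketch. Below is a greedy frequency-based splitter in the spirit of Koehn and Knight (2003), assuming a vocabulary with counts; it is a simplification for illustration, not the Edinburgh system's actual module.

```python
def split_compound(word, vocab_freq, min_len=4):
    """Split a compound into two parts if both parts are frequent
    vocabulary items (greedy, single-split, frequency-based sketch)."""
    best = [word]
    best_score = vocab_freq.get(word, 0)
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in vocab_freq and right in vocab_freq:
            # geometric mean of the part frequencies
            score = (vocab_freq[left] * vocab_freq[right]) ** 0.5
            if score > best_score:
                best, best_score = [left, right], score
    return best

vocab = {"aktien": 500, "gesellschaft": 800, "aktiengesellschaft": 3}
print(split_compound("aktiengesellschaft", vocab))  # ['aktien', 'gesellschaft']
```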
Workshop on Statistical Machine Translation | 2007
Yu Chen; Andreas Eisele; Christian Federmann; Eva Hasler; Michael Jellinghaus; Silke Theison
We describe an architecture that allows us to combine statistical machine translation (SMT) with rule-based machine translation (RBMT) in a multi-engine setup. We use a variant of standard SMT technology to align translations from one or more RBMT systems with the source text. We incorporate phrases extracted from these alignments into the phrase table of the SMT system and use the open-source decoder Moses to find good combinations of phrases from SMT training data with the phrases derived from RBMT. First experiments based on this hybrid architecture achieve promising results.
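The merging step can be sketched as follows, assuming phrase pairs have already been extracted from the RBMT output alignments. The feature layout and the origin indicator are hypothetical conveniences for the sketch, not Moses' actual phrase table format.

```python
def merge_phrase_tables(smt_table, rbmt_pairs):
    """Add phrase pairs extracted from RBMT output alignments into an
    SMT phrase table, marking their origin with an extra indicator
    feature so tuning can learn how much to trust each source."""
    merged = {}
    for (src, tgt), feats in smt_table.items():
        merged[(src, tgt)] = feats + [0.0]       # 0 = from SMT data
    for src, tgt in rbmt_pairs:
        if (src, tgt) not in merged:
            merged[(src, tgt)] = [1.0, 1.0, 1.0]  # 1 = from RBMT output
    return merged

smt = {("das haus", "the house"): [0.8, 0.7]}
rbmt = [("das haus", "the building"), ("das haus", "the house")]
for entry, feats in merge_phrase_tables(smt, rbmt).items():
    print(entry, feats)
```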
Workshop on Statistical Machine Translation | 2014
Eva Hasler; Barry Haddow; Philipp Koehn
Despite its potential to improve lexical selection, most state-of-the-art machine translation systems take only minimal contextual information into account. We capture context with a topic model over distributional profiles built from the context words of each translation unit. Topic distributions are inferred for each translation unit and used to adapt the translation model dynamically to a given test context by measuring their similarity. We show that combining information from both local and global test contexts helps to improve lexical selection and outperforms a baseline system by up to 1.15 BLEU. We test our topic-adapted model on a diverse data set containing documents from three different domains and achieve competitive performance in comparison with two supervised domain-adapted systems.
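A minimal sketch of the similarity-based adaptation, assuming topic distributions have already been inferred for the translation unit and for the local and global test contexts. The interpolation weight and all distributions are invented toy values.

```python
import math

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def topic_similarity_feature(unit_topics, local_ctx, global_ctx, w_local=0.5):
    """Score a translation unit by the similarity of its topic
    distribution to an interpolation of the local (sentence) and
    global (document) test context distributions."""
    ctx = [w_local * l + (1 - w_local) * g
           for l, g in zip(local_ctx, global_ctx)]
    return cosine(unit_topics, ctx)

unit = [0.7, 0.2, 0.1]      # topic distribution of a translation unit
local = [0.6, 0.3, 0.1]     # inferred from the surrounding sentence
globl = [0.4, 0.4, 0.2]     # inferred from the whole document
print(topic_similarity_feature(unit, local, globl))
```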
International Conference on Acoustics, Speech, and Signal Processing | 2012
Arda Çelebi; Hasim Sak; Erinç Dikici; Murat Saraclar; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Kenji Sagae; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley
We present our work on semi-supervised learning of discriminative language models where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities, specifically the word, sub-word, syllable, and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a variant of the perceptron algorithm. We find that morph-based confusion models with a sample selection strategy aiming to match the error distribution of the baseline ASR system give the best performance. We also observe that substituting half of the supervised training examples with those obtained in a semi-supervised manner gives similar results.
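A toy sketch of hallucinating competing hypotheses from a confusion model, here at the morph level. The confusion sets, the substitution rate, and the morph sequence are invented; the paper's sampling strategies for matching the baseline error distribution are only gestured at via `sub_rate`.

```python
import random

# Hypothetical morph-level confusion sets: each unit maps to
# acoustically confusable alternatives with substitution weights.
confusions = {
    "kitap": [("kitab", 0.6), ("kitap", 0.4)],
    "lar":   [("ler", 0.5), ("lar", 0.5)],
}

def hallucinate_hypothesis(morphs, sub_rate=0.3, rng=random):
    """Generate one competing hypothesis for a reference morph sequence
    by sampling substitutions from the confusion model; sub_rate is a
    crude stand-in for matching the baseline ASR error rate."""
    out = []
    for m in morphs:
        if m in confusions and rng.random() < sub_rate:
            alts, weights = zip(*confusions[m])
            out.append(rng.choices(alts, weights=weights)[0])
        else:
            out.append(m)
    return out

random.seed(0)
reference = ["kitap", "lar", "im"]
for _ in range(3):
    print(hallucinate_hypothesis(reference))
```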
International Conference on Acoustics, Speech, and Signal Processing | 2012
Puyang Xu; Sanjeev Khudanpur; Maider Lehr; Emily Prud'hommeaux; Nathan Glenn; Damianos Karakos; Brian Roark; Kenji Sagae; Murat Saraclar; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley
Discriminative language modeling is a structured classification problem. Log-linear models have previously been used to address this problem. In this paper, the standard dot-product feature representation used in log-linear models is replaced by a non-linear function parameterized by a neural network. Embeddings are learned for each word and features are extracted automatically through the use of convolutional layers. Experimental results show that as a stand-alone model the continuous space model yields a significantly lower word error rate (1% absolute) while having a much more compact parameterization (60%-90% smaller). When combined with the baseline model's scores, our approach performs equally well.
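A small numpy sketch of the non-linear scoring function described here: embed the words of a hypothesis, convolve fixed-width filters over the sequence, max-pool, and apply a linear output layer. All dimensions and weights are random toy values, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, FILTERS, WIDTH = 100, 8, 4, 3

E = rng.normal(size=(VOCAB, DIM))            # word embeddings
W = rng.normal(size=(FILTERS, WIDTH * DIM))  # convolution filters
v = rng.normal(size=FILTERS)                 # linear output layer

def score(word_ids):
    """Non-linear hypothesis score: embed words, slide a WIDTH-wide
    convolution over the sequence, max-pool over positions, then
    apply the output layer (replacing a dot-product feature score)."""
    X = E[word_ids]                                     # (T, DIM)
    windows = np.stack([X[i:i + WIDTH].ravel()          # (T-WIDTH+1, WIDTH*DIM)
                        for i in range(len(word_ids) - WIDTH + 1)])
    H = np.tanh(windows @ W.T)                          # (T-WIDTH+1, FILTERS)
    pooled = H.max(axis=0)                              # max over positions
    return float(v @ pooled)

print(score([5, 17, 42, 9, 3]))   # score for one toy hypothesis
```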
International Conference on Computational Linguistics | 2014
Eva Hasler
We describe our systems for SemEval 2014 Task 5 (L2 writing assistant), where a system has to find appropriate translations of L1 segments in a given L2 context. We participated in three of the four possible language pairs (English-Spanish, French-English, and Dutch-English) and achieved the best performance with all of our submitted systems according to word-based accuracy. Our models are based on phrase-based machine translation systems and combine topical context information with language model scoring.
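A minimal sketch of the scoring combination, assuming a phrase table and a language model are available. The toy unigram LM, the phrase probabilities, and the weights w_tm/w_lm are invented for illustration.

```python
import math

def score_candidate(l1_segment, candidate, left_ctx, right_ctx,
                    phrase_probs, lm_logprob, w_tm=1.0, w_lm=1.0):
    """Rank translations of an L1 segment by combining the phrase
    translation probability with the language model score of the
    candidate placed into its L2 context."""
    tm = math.log(phrase_probs[(l1_segment, candidate)])
    lm = lm_logprob(left_ctx + [candidate] + right_ctx)
    return w_tm * tm + w_lm * lm

# Toy models: a two-entry phrase table and a unigram "LM".
probs = {("fiets", "bike"): 0.6, ("fiets", "bicycle"): 0.4}
unigram = {"i": 0.1, "ride": 0.05, "my": 0.1, "bike": 0.02, "bicycle": 0.005}
lm = lambda words: sum(math.log(unigram.get(w, 1e-6)) for w in words)

for cand in ("bike", "bicycle"):
    print(cand, score_candidate("fiets", cand, ["i", "ride", "my"], [], probs, lm))
```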