Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Alexandra Birch is active.

Publications


Featured research published by Alexandra Birch.


Meeting of the Association for Computational Linguistics | 2016

Neural Machine Translation of Rare Words with Subword Units

Rico Sennrich; Barry Haddow; Alexandra Birch

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by 1.1 and 1.3 BLEU, respectively.
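
The merge loop at the heart of this segmentation is compact enough to sketch. Below is a minimal Python version of the byte-pair-encoding procedure on a toy vocabulary (characters plus an end-of-word marker </w>); it illustrates the idea rather than reproducing the authors' released implementation.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the vocabulary, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of `pair` into a single new symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy vocabulary: each word is a sequence of characters plus an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(10):                      # number of merges = subword vocab budget
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)     # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print(best)                          # first merges: ('e', 's'), ('es', 't'), ...
```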


Meeting of the Association for Computational Linguistics | 2016

Improving Neural Machine Translation Models with Monolingual Data

Rico Sennrich; Barry Haddow; Alexandra Birch

Neural Machine Translation (NMT) has obtained state-of-the-art performance for several language pairs, while using only parallel data for training. Target-side monolingual data plays an important role in boosting fluency for phrase-based statistical machine translation, and we investigate the use of monolingual data for NMT. In contrast to previous work, which combines NMT models with separately trained language models, we note that encoder-decoder NMT architectures already have the capacity to learn the same information as a language model, and we explore strategies to train with monolingual data without changing the neural network architecture. By pairing monolingual training data with an automatic back-translation, we can treat it as additional parallel training data, and we obtain substantial improvements on the WMT 15 task English→German (+2.8–3.7 BLEU) and on the low-resource IWSLT 14 task Turkish→English (+2.1–3.4 BLEU), obtaining new state-of-the-art results. We also show that fine-tuning on in-domain monolingual and parallel data gives substantial improvements for the IWSLT 15 task English→German.
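
The back-translation idea is a data-side recipe rather than an architectural change, so it can be sketched in a few lines. The names below are hypothetical: `translate_t2s` stands in for any trained target-to-source model.

```python
# Hypothetical names throughout: this shows the data-side idea, not the authors' code.
def back_translate(monolingual_target, translate_t2s):
    """Turn target-side monolingual sentences into synthetic parallel pairs."""
    pairs = []
    for tgt in monolingual_target:
        src = translate_t2s(tgt)     # machine-generated (possibly noisy) source side
        pairs.append((src, tgt))     # the human-written target side is kept clean
    return pairs

def build_training_data(parallel_pairs, monolingual_target, translate_t2s):
    """Mix real and synthetic pairs; the NMT architecture itself is unchanged."""
    return parallel_pairs + back_translate(monolingual_target, translate_t2s)
```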


arXiv: Computation and Language | 2016

Edinburgh Neural Machine Translation Systems for WMT 16

Rico Sennrich; Barry Haddow; Alexandra Birch

We participated in the WMT 2016 shared news translation task by building neural translation systems for four language pairs, each trained in both directions: English↔Czech, English↔German, English↔Romanian, and English↔Russian. Our systems are based on an attentional encoder-decoder, using BPE subword segmentation for open-vocabulary translation with a fixed vocabulary. We experimented with using automatic back-translations of the monolingual News corpus as additional training data, with pervasive dropout, and with target-bidirectional models. All reported methods give substantial improvements, and we see improvements of 4.3–11.2 BLEU over our baseline systems. In the human evaluation, our systems were the (tied) best constrained system for 7 out of the 8 translation directions in which we participated.


Workshop on Statistical Machine Translation | 2007

CCG Supertags in Factored Statistical Machine Translation

Alexandra Birch; Miles Osborne; Philipp Koehn

Combinatory Categorial Grammar (CCG) supertags present phrase-based machine translation with an opportunity to access rich syntactic information at the word level. The challenge is incorporating this information into the translation process. Factored translation models allow the inclusion of supertags as a factor in the source or target language. We show that this results in an improvement in translation quality, and that the value of syntactic supertags in flat-structured phrase-based models is largely due to better local reorderings.
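
As an illustration of what "supertags as a factor" looks like in practice, factored systems such as Moses encode each token as pipe-separated factors. The sketch below uses invented supertags; a real pipeline would obtain them from a CCG supertagger.

```python
# Toy illustration of the Moses-style factored input format "word|factor".
def to_factored(words, supertags):
    """Attach a CCG supertag to each surface word as a second factor."""
    return " ".join(f"{w}|{t}" for w, t in zip(words, supertags))

words = ["Peter", "reads", "books"]
tags = ["NP", r"(S\NP)/NP", "NP"]      # transitive verb: takes NP right, then NP left
print(to_factored(words, tags))        # Peter|NP reads|(S\NP)/NP books|NP
```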


Empirical Methods in Natural Language Processing | 2008

Predicting Success in Machine Translation

Alexandra Birch; Miles Osborne; Philipp Koehn

The performance of machine translation systems varies greatly depending on the source and target languages involved. Determining how different characteristics of a language pair affect system performance is key to knowing which aspects of machine translation to improve and which are irrelevant. This paper investigates the effect of different explanatory variables on the performance of a phrase-based system for 110 European language pairs. We show that three factors are strong predictors of performance in isolation: the amount of reordering, the morphological complexity of the target language, and the historical relatedness of the two languages. Together, these factors account for 75% of the variability in system performance.
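
A toy version of this kind of analysis can be written as an ordinary least-squares regression of performance on the three predictors. All numbers below are invented for illustration and are not the paper's data.

```python
import numpy as np

# Columns: amount of reordering, target morphological complexity, relatedness.
X = np.array([[0.12, 1.3, 0.9],
              [0.48, 2.6, 0.2],
              [0.25, 1.8, 0.7],
              [0.61, 3.1, 0.1],
              [0.33, 2.2, 0.5]])
y = np.array([30.4, 12.1, 24.7, 8.9, 18.3])      # e.g. BLEU per language pair (toy)

X1 = np.column_stack([np.ones(len(X)), X])        # add an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)     # ordinary least-squares fit
pred = X1 @ coef
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"R^2 = {r2:.2f}")                          # share of variability explained
```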


Machine Translation | 2010

Metrics for MT evaluation: evaluating reordering

Alexandra Birch; Miles Osborne; Phil Blunsom

Translating between dissimilar languages requires an account of the use of divergent word orders when expressing the same semantic content. Reordering poses a serious problem for statistical machine translation systems and has generated a considerable body of research aimed at meeting its challenges. Direct evaluation of reordering requires automatic metrics that explicitly measure the quality of word order choices in translations. Current metrics, such as BLEU, only evaluate reordering indirectly. We analyse the ability of current metrics to capture reordering performance. We then introduce permutation distance metrics as a direct method for measuring word order similarity between translations and reference sentences. By correlating all metrics with a novel method for eliciting human judgements of reordering quality, we show that current metrics are largely influenced by lexical choice, and that they are not able to distinguish between different reordering scenarios. We also show that permutation distance metrics correlate very well with human judgements and are impervious to lexical differences.
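
One concrete permutation distance in this family is Kendall's tau distance, which counts pairwise ordering disagreements between the hypothesis and reference word orders. A minimal sketch on toy permutations:

```python
def kendall_tau_distance(perm):
    """Fraction of discordant pairs relative to monotone order:
    0 = identical word order, 1 = completely reversed."""
    n = len(perm)
    discordant = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return discordant / (n * (n - 1) / 2)

# Each permutation maps reference positions to hypothesis positions (toy data).
print(kendall_tau_distance([0, 1, 2, 3]))   # 0.0  : monotone, no reordering
print(kendall_tau_distance([1, 0, 3, 2]))   # ~0.33: two local swaps
print(kendall_tau_distance([3, 2, 1, 0]))   # 1.0  : order fully reversed
```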


Workshop on Statistical Machine Translation | 2009

A Quantitative Analysis of Reordering Phenomena

Alexandra Birch; Phil Blunsom; Miles Osborne

Reordering is a serious challenge in statistical machine translation. We propose a method for analysing syntactic reordering in parallel corpora and apply it to understanding the differences in the performance of SMT systems. Results at recent large-scale evaluation campaigns show that synchronous grammar-based statistical machine translation models produce superior results for language pairs such as Chinese to English. However, for language pairs such as Arabic to English, phrase-based approaches continue to be competitive. Until now, our understanding of these results has been limited to differences in BLEU scores. Our analysis shows that current state-of-the-art systems fail to capture the majority of reorderings found in real data.


North American Chapter of the Association for Computational Linguistics | 2016

Controlling Politeness in Neural Machine Translation via Side Constraints

Rico Sennrich; Barry Haddow; Alexandra Birch

Many languages use honorifics to express politeness, social distance, or the relative social status between the speaker and their addressee(s). In machine translation from a language without honorifics such as English, it is difficult to predict the appropriate honorific, but users may want to control the level of politeness in the output. In this paper, we perform a pilot study to control honorifics in neural machine translation (NMT) via side constraints, focusing on English→German. We show that by marking up the (English) source side of the training data with a feature that encodes the use of honorifics on the (German) target side, we can control the honorifics produced at test time. Experiments show that the choice of honorifics has a big impact on translation quality as measured by BLEU, and oracle experiments show that substantial improvements are possible by constraining the translation to the desired level of politeness.
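
The side-constraint mechanism itself reduces to marking up the source with an artificial token. The token names in this sketch are illustrative rather than the paper's exact markup:

```python
def add_side_constraint(source_sentence, polite):
    """Append an artificial token telling the model which register to produce."""
    tag = "<polite>" if polite else "<informal>"   # hypothetical token names
    return f"{source_sentence} {tag}"

# At training time the tag is derived from the German target side (e.g. "Sie" vs
# "du"); at test time the user sets it to control the output register.
print(add_side_constraint("Are you coming with me?", polite=True))
# -> "Are you coming with me? <polite>"
```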


Workshop on Statistical Machine Translation | 2006

Constraining the Phrase-Based, Joint Probability Statistical Translation Model

Alexandra Birch; Chris Callison-Burch; Miles Osborne; Philipp Koehn

The joint probability model proposed by Marcu and Wong (2002) provides a strong probabilistic framework for phrase-based statistical machine translation (SMT). The model's usefulness is, however, limited by the computational complexity of estimating parameters at the phrase level. We present the first model to use word alignments to constrain the space of phrasal alignments searched during Expectation Maximization (EM) training. Constraining the joint model improves performance, with results very close to state-of-the-art phrase-based models. It also allows the model to scale up to larger corpora and therefore to be more widely applicable.
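
The constraint that word alignments impose can be illustrated with the standard phrase-pair consistency test: a phrasal alignment is only considered during EM if no alignment link crosses the phrase boundary. A sketch with toy alignment links:

```python
def consistent(alignment, src_span, tgt_span):
    """alignment: set of (i, j) word links; spans are inclusive (start, end)."""
    s0, s1 = src_span
    t0, t1 = tgt_span
    has_internal_link = False
    for i, j in alignment:
        in_src = s0 <= i <= s1
        in_tgt = t0 <= j <= t1
        if in_src != in_tgt:           # a link crosses the phrase boundary
            return False
        if in_src and in_tgt:
            has_internal_link = True
    return has_internal_link           # require at least one internal link

links = {(0, 0), (1, 2), (2, 1)}            # toy word alignment
print(consistent(links, (1, 2), (1, 2)))    # True: all links stay inside the spans
print(consistent(links, (0, 1), (0, 1)))    # False: link (1, 2) crosses out
```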


Proceedings of the Second Conference on Machine Translation | 2017

The University of Edinburgh's Neural MT Systems for WMT17

Rico Sennrich; Alexandra Birch; Anna Currey; Ulrich Germann; Barry Haddow; Kenneth Heafield; Antonio Valerio Miceli Barone; Philip Williams

This paper describes the University of Edinburgh's submissions to the WMT17 shared news translation and biomedical translation tasks. We participated in 12 translation directions for news, translating between English and Czech, German, Latvian, Russian, Turkish, and Chinese. For the biomedical task we submitted systems for English to Czech, German, Polish, and Romanian. Our systems are neural machine translation systems trained with Nematus, an attentional encoder-decoder. We follow our setup from last year and build BPE-based models with parallel and back-translated monolingual training data. Novelties this year include the use of deep architectures, layer normalization, and more compact models due to weight tying and improvements in BPE segmentation. We perform extensive ablative experiments, reporting on the effectiveness of layer normalization, deep architectures, and different ensembling techniques.
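
Of the novelties listed, weight tying is the easiest to illustrate: the decoder's output projection shares a single matrix with the target-side embeddings, shrinking the model. The PyTorch sketch below is a generic illustration, not the Nematus implementation.

```python
import torch.nn as nn

class TiedDecoderHead(nn.Module):
    """Output layer whose projection reuses the target embedding matrix."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size, bias=False)
        self.proj.weight = self.embed.weight    # one shared (vocab, dim) matrix

    def forward(self, hidden):                  # hidden: (batch, dim)
        return self.proj(hidden)                # logits over the target vocabulary

head = TiedDecoderHead(vocab_size=50000, dim=512)
params = sum(p.numel() for p in head.parameters())
print(params)   # 25600000: the shared matrix is counted once, not twice
```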

Collaboration


Dive into Alexandra Birch's collaborations.

Top Co-Authors

Barry Haddow, University of Edinburgh
Marcin Junczys-Dowmunt, Adam Mickiewicz University in Poznań
Peter Bell, University of Edinburgh