Ann Irvine | Researchain

Publication

Featured research published by Ann Irvine.

Natural Language Engineering | 2016

End-to-end statistical machine translation with zero or small parallel texts

Ann Irvine; Chris Callison-Burch

We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present detailed analysis of the accuracy of bilingual lexicon induction, and show how a discriminative model can be used to combine various signals of translation equivalence (like contextual similarity, temporal similarity, orthographic similarity and topic similarity). Our discriminative model produces higher accuracy translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features on a phrase-based SMT system. These monolingually-estimated features enhance low-resource SMT systems in addition to allowing end-to-end machine translation without parallel corpora. Natural Language Engineering 1 (1): 1–34.
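As an illustration of one of the signals named in the abstract, orthographic similarity can be approximated with a length-normalized edit distance. This is a minimal sketch and not the authors' implementation: the function names `edit_distance` and `orthographic_similarity` and the normalization by the longer word are assumptions made here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orthographic_similarity(a, b):
    """Hypothetical similarity in [0, 1]: 1 minus the normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

A score like this would be most informative for cognates and borrowed words in languages that share a script; the other signals cover word pairs with no surface resemblance.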

Conference on Computational Natural Language Learning | 2014

Hallucinating Phrase Translations for Low Resource MT

Ann Irvine; Chris Callison-Burch

We demonstrate that “hallucinating” phrasal translations can significantly improve the quality of machine translation in low resource conditions. Our hallucinated phrase tables consist of entries composed from multiple unigram translations drawn from the baseline phrase table and from translations that are induced from monolingual corpora. The hallucinated phrase table is very noisy. Its translations are low precision but high recall. We counter this by introducing 30 new feature functions (including a variety of monolingually-estimated features) and by aggressively pruning the phrase table. Our analysis evaluates the intrinsic quality of our hallucinated phrase pairs as well as their impact in end-to-end Spanish-English and Hindi-English MT.
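The composition step described in the abstract can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the paper's procedure: the `unigram_table` format, the function name `hallucinate_phrase`, and the choice to add the reversed order for two-word phrases are all hypothetical. It also shows why such a table is high recall but low precision, since every combination is kept.

```python
from itertools import product


def hallucinate_phrase(source_words, unigram_table, allow_reordering=True):
    """Compose candidate phrase translations from per-word translations."""
    # A phrase cannot be composed if any of its words has no translation.
    if any(word not in unigram_table for word in source_words):
        return set()
    candidates = set()
    for combo in product(*(unigram_table[word] for word in source_words)):
        candidates.add(" ".join(combo))
        # Assumption: for two-word phrases, also propose the swapped order.
        if allow_reordering and len(combo) == 2:
            candidates.add(" ".join(reversed(combo)))
    return candidates


# Toy unigram translations (invented for illustration).
toy_table = {"casa": ["house", "home"], "blanca": ["white"]}
phrases = hallucinate_phrase(["casa", "blanca"], toy_table)
```

Here `phrases` contains four candidates, only one of which ("white house") is a good translation, which is the kind of noise that feature scoring and pruning would then need to filter.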

Computational Linguistics | 2017

A comprehensive analysis of bilingual lexicon induction

Ann Irvine; Chris Callison-Burch

Bilingual lexicon induction is the task of inducing word translations from monolingual corpora in two languages. In this article we present the most comprehensive analysis of bilingual lexicon induction to date. We present experiments on a wide range of languages and data sizes. We examine translation into English from 25 foreign languages: Albanian, Azeri, Bengali, Bosnian, Bulgarian, Cebuano, Gujarati, Hindi, Hungarian, Indonesian, Latvian, Nepali, Romanian, Serbian, Slovak, Somali, Spanish, Swedish, Tamil, Telugu, Turkish, Ukrainian, Uzbek, Vietnamese, and Welsh. We analyze the behavior of bilingual lexicon induction on low-frequency words, rather than testing solely on high-frequency words, as previous research has done. Low-frequency words are more relevant to statistical machine translation, where systems typically lack translations of rare words that fall outside of their training data. We systematically explore a wide range of features and phenomena that affect the quality of the translations discovered by bilingual lexicon induction. We provide illustrative examples of the highest ranking translations for orthogonal signals of translation equivalence like contextual similarity and temporal similarity. We analyze the effects of frequency and burstiness, and the sizes of the seed bilingual dictionaries and the monolingual training corpora. Additionally, we introduce a novel discriminative approach to bilingual lexicon induction. Our discriminative model is capable of combining a wide variety of features that individually provide only weak indications of translation equivalence. When feature weights are discriminatively set, these signals produce dramatically higher translation quality than previous approaches that combined signals in an unsupervised fashion (e.g., using minimum reciprocal rank). 
We also directly compare our model's performance against a sophisticated generative approach, the matching canonical correlation analysis (MCCA) algorithm used by Haghighi et al. (2008). Our algorithm achieves an accuracy of 42% versus MCCA's 15%.
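The idea of setting feature weights discriminatively can be sketched with a small logistic regression over similarity scores. This is only a sketch under assumptions: the three-feature layout (contextual, temporal, orthographic), the toy training pairs, and the plain stochastic gradient training are invented here and are not taken from the article.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(examples, labels, epochs=500, lr=0.5):
    """Fit feature weights and a bias with stochastic gradient ascent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias


def score(x, weights, bias):
    """Probability-like score that a candidate pair is a translation."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy feature vectors: [contextual, temporal, orthographic] similarity.
X = [[0.8, 0.7, 0.9], [0.6, 0.9, 0.2], [0.7, 0.6, 0.5],   # correct translations
     [0.2, 0.3, 0.1], [0.1, 0.4, 0.3], [0.3, 0.1, 0.2]]   # incorrect pairs
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(X, y)
```

Candidate translations for a source word could then be ranked by `score`, so that signals which are individually weak still contribute in proportion to their learned weights.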

Workshop on Statistical Machine Translation | 2014

Using Comparable Corpora to Adapt MT Models to New Domains

Ann Irvine; Chris Callison-Burch

In previous work we showed that when using an SMT model trained on old-domain data to translate text in a new domain, most errors are due to unseen source words, unseen target translations, and inaccurate translation model scores (Irvine et al., 2013a). In this work, we target errors due to inaccurate translation model scores using new-domain comparable corpora, which we mine from Wikipedia. We assume that we have access to a large old-domain parallel training corpus but only enough new-domain parallel data to tune model parameters and do evaluation. We use the new-domain comparable corpora to estimate additional feature scores over the phrase pairs in our baseline models. Augmenting models with the new features improves the quality of machine translations in the medical and science domains by up to 1.3 BLEU points over very strong baselines trained on the 150 million word Canadian Hansard dataset.

The Prague Bulletin of Mathematical Linguistics | 2010

Integrating Output from Specialized Modules in Machine Translation: Transliterations in Joshua

Ann Irvine; Mike Kayser; Zhifei Li; Wren N. G. Thornton; Chris Callison-Burch

In many cases in SMT we want to allow specialized modules to propose translation fragments to the decoder and allow them to compete with translations contained in the phrase table. Transliteration is one module that may produce such specialized output. In this paper, as an example, we build a specialized Urdu transliteration module and integrate its output into an Urdu-English MT system. The module marks up the test text using an XML format, and the decoder allows alternate translations (transliterations) to compete.

Technical Symposium on Computer Science Education | 2013

How PhD students at research universities can prepare for a career at a liberal arts college (abstract only)

Ann Irvine; Darakhshan J. Mir; Michael Hay

We will discuss how to better organize as graduate students and postdoctoral researchers seeking a career in liberal arts colleges (LACs). The BoF will bring together those who are interested in a career path to a LAC but do not have reliable advice and mentorship in their home departments and often turn out to be the only person in their department with such a career choice. Additionally, several people who have recently made a successful transition from graduate school to new faculty positions will attend the BoF.

Conference of the European Chapter of the Association for Computational Linguistics | 2012