Andy Way
Dublin City University
Publications
Featured research published by Andy Way.
Meeting of the Association for Computational Linguistics | 2004
Aoife Cahill; Michael Burke; Ruth O'Donovan; Josef van Genabith; Andy Way
This paper shows how finite approximations of long-distance dependency (LDD) resolution can be obtained automatically for wide-coverage, robust, probabilistic Lexical-Functional Grammar (LFG) resources acquired from treebanks. We extract LFG subcategorisation frames and paths linking LDD reentrancies from f-structures generated automatically for the Penn-II treebank trees and use them in an LDD resolution algorithm to parse new text. Unlike (Collins, 1999; Johnson, 2000), in our approach LDDs are resolved at the level of f-structure (attribute-value structure representations of basic predicate-argument or dependency structure), without empty productions, traces and coindexation in CFG parse trees. Currently our best automatically induced grammars achieve an f-score of 80.97% for f-structures when parsing Section 23 of the WSJ part of the Penn-II treebank, evaluating against the DCU 105, and 80.24% against the PARC 700 Dependency Bank (King et al., 2003), performing at the same level as, or slightly better than, state-of-the-art hand-crafted grammars (Kaplan et al., 2004).
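As an illustration, the path-based resolution step can be sketched in Python with f-structures as nested dicts. The LDD paths and subcategorisation frames below are invented toy miniatures of the resources the paper induces from Penn-II f-structures, not the actual induced grammar:

```python
def resolve_ldd(fstruct, paths, frames):
    """Link a displaced TOPIC to the first grammatical function reachable
    via a known LDD path, provided the local predicate's subcat frame
    licenses that function and it is not yet filled."""
    topic = fstruct.get("TOPIC")
    if topic is None:
        return fstruct
    for path in paths:
        node = fstruct
        for attr in path[:-1]:          # descend through the f-structure
            node = node.get(attr) if isinstance(node, dict) else None
            if node is None:
                break
        if node is None:
            continue
        gf = path[-1]                   # grammatical function to fill
        if gf in frames.get(node.get("PRED"), ()) and gf not in node:
            node[gf] = topic            # reentrancy: structure sharing,
            return fstruct              # no traces or empty productions
    return fstruct

# "What did Mary claim John saw?" -- TOPIC must fill the embedded OBJ slot.
frames = {"claim": ("SUBJ", "COMP"), "see": ("SUBJ", "OBJ")}
f = {"PRED": "claim", "SUBJ": {"PRED": "Mary"},
     "TOPIC": {"PRED": "what"},
     "COMP": {"PRED": "see", "SUBJ": {"PRED": "John"}}}
resolve_ldd(f, [("OBJ",), ("COMP", "OBJ")], frames)
print(f["COMP"]["OBJ"] is f["TOPIC"])   # True
```

Note that the subcat-frame check is what blocks the spurious shorter path ("OBJ",): "claim" does not subcategorise for OBJ, so the TOPIC is linked into the embedded COMP instead.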
Computational Linguistics | 2004
Michael Carl; Andy Way; Walter Daelemans
I. Foundations of EBMT
1. An Overview of EBMT
2. What is Example-Based Machine Translation?
3. Example-Based Machine Translation in a Controlled Environment
4. EBMT Seen as Case-based Reasoning
II. Run-time Approaches to EBMT
5. Formalizing Translation Memory
6. EBMT Using DP-Matching Between Word Sequences
7. A Hybrid Rule and Example-Based Method for Machine Translation
8. EBMT of POS-Tagged Sentences via Inductive Learning
III. Template-Driven EBMT
9. Learning Translation Templates from Bilingual Translation Examples
10. Clustered Transfer Rule Induction for Example-Based Translation
11. Translation Patterns, Linguistic Knowledge and Complexity in EBMT
12. Inducing Translation Grammars from Bracketed Alignments
IV. EBMT and Derivation Trees
13. Extracting Translation Knowledge from Parallel Corpora
14. Finding Translation Patterns from Dependency Structures
15. A Best-First Alignment Algorithm for Extraction of Transfer Mappings
16. Translating with Examples: The LFG-DOT Models of Translation
Workshop on Statistical Machine Translation | 2008
Sergio Penkale; Rejwanul Haque; Sandipan Dandapat; Pratyush Banerjee; Ankit Kumar Srivastava; Jinhua Du; Pavel Pecina; Sudip Kumar Naskar; Mikel L. Forcada; Andy Way
In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the evaluation campaign of the Third Workshop on Statistical Machine Translation at ACL 2008. We describe the modular design of our data-driven MT system with particular focus on the components used in this participation. We also describe some of the significant modules which were unused in this task. We participated in the EuroParl task for the following translation directions: Spanish-English and French-English, in which we employed our hybrid EBMT-SMT architecture to translate. We also participated in the Czech-English News and News Commentary tasks which represented a previously untested language pair for our system. We report results on the provided development and test sets.
Computational Linguistics | 2008
Aoife Cahill; Michael Burke; Ruth O'Donovan; Stefan Riezler; Josef van Genabith; Andy Way
A number of researchers have recently conducted experiments comparing deep, hand-crafted, wide-coverage parsers with shallow treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert tree output generated by the shallow parsers into dependencies. In this article, we revisit such experiments, this time using sophisticated automatic LFG f-structure annotation methodologies, with surprising results. We compare various PCFG and history-based parsers to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers, RASP and XLE. We evaluate using dependency-based gold standards and use the Approximate Randomization Test to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank, a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system, and an f-score of 80.23% against the CBS 500 Dependency Bank, a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system.
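The Approximate Randomization Test used for significance testing here can be sketched as follows: paired per-sentence scores from the two systems are randomly swapped many times, and the p-value estimates how often chance alone produces a score gap at least as large as the observed one. The score lists and trial count below are illustrative:

```python
import random

def approx_randomization(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided approximate randomization test on paired per-sentence
    scores. Returns a smoothed p-value for the observed mean difference."""
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    at_least = 0
    for _ in range(trials):
        sa = sb = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:          # swap the paired outputs
                a, b = b, a
            sa += a
            sb += b
        if abs(sa - sb) / n >= observed:
            at_least += 1
    return (at_least + 1) / (trials + 1)    # add-one smoothing

print(approx_randomization([0.7] * 50, [0.7] * 50))          # 1.0
print(approx_randomization([1.0] * 50, [0.0] * 50) < 0.05)   # True
```

Identical score lists give a p-value of 1.0 (no evidence of a difference), while a large, consistent gap is judged significant.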
Workshop on Statistical Machine Translation | 2007
Karolina Owczarzak; Josef van Genabith; Andy Way
We present a method for evaluating the quality of Machine Translation (MT) output, using labelled dependencies produced by a Lexical-Functional Grammar (LFG) parser. Our dependency-based method, in contrast to most popular string-based evaluation metrics, does not unfairly penalize perfectly valid syntactic variations in the translation, and the addition of WordNet provides a way to accommodate lexical variation. In comparison with other metrics on 16,800 sentences of Chinese-English newswire text, our method reaches high correlation with human scores.
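A minimal sketch of dependency-based scoring, assuming parses have already been reduced to labelled (relation, head, dependent) triples; the triples below are invented, and the WordNet lexical-variation component is omitted:

```python
from collections import Counter

def dep_fscore(candidate, reference):
    """F-score over labelled dependency triples. Because triples abstract
    away from surface word order, valid syntactic variants of the
    reference are not penalized the way string-based metrics penalize them."""
    c, r = Counter(candidate), Counter(reference)
    matched = sum((c & r).values())         # multiset intersection
    if matched == 0:
        return 0.0
    precision = matched / sum(c.values())
    recall = matched / sum(r.values())
    return 2 * precision * recall / (precision + recall)

ref = [("subj", "resign", "chairman"), ("obj", "announce", "resignation")]
hyp = [("subj", "resign", "chairman"), ("obj", "announce", "plan")]
print(round(dep_fscore(hyp, ref), 2))   # 0.5
```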
Workshop on Statistical Machine Translation | 2006
Karolina Owczarzak; Declan Groves; Josef van Genabith; Andy Way
In this paper we present a novel method for deriving paraphrases during automatic MT evaluation, using only the source and reference texts, which are necessary for the evaluation, and word and phrase alignment software. Using target-language paraphrases produced through word and phrase alignment, a number of alternative reference sentences are constructed automatically for each candidate translation. The method produces lexical and low-level syntactic paraphrases that are relevant to the domain at hand, does not use external knowledge resources, and can be combined with a variety of automatic MT evaluation systems.
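The reference-expansion idea can be sketched as follows, assuming a paraphrase table has already been mined from word and phrase alignments; the table and sentence below are invented for illustration:

```python
def expand_references(reference, paraphrases):
    """Generate alternative reference sentences by substituting aligned
    paraphrases, one occurrence at a time, until no new variant appears."""
    refs = {reference}
    frontier = [reference]
    while frontier:
        sent = frontier.pop()
        for phrase, alternatives in paraphrases.items():
            if phrase in sent:
                for alt in alternatives:
                    variant = sent.replace(phrase, alt, 1)
                    if variant not in refs:
                        refs.add(variant)
                        frontier.append(variant)
    return sorted(refs)

# Hypothetical paraphrase table, as might be mined from alignments.
paras = {"passed away": ["died"], "due to": ["because of"]}
for r in expand_references("he passed away due to illness", paras):
    print(r)
```

Each candidate translation is then scored against whichever expanded reference it matches best, so a translation using "died" is no longer penalized against a reference containing "passed away".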
Computational Linguistics | 2003
Andy Way; Nano Gough
We have developed an example-based machine translation (EBMT) system that uses the World Wide Web for two different purposes. First, we populate the system's memory with translations gathered from rule-based MT systems located on the Web. The source strings input to these systems were extracted automatically from an extremely small subset of the rule types in the Penn-II Treebank. In subsequent stages, the source-target translation pairs obtained are automatically transformed into a series of resources that render the translation process more successful. Despite the fact that the output from on-line MT systems is often faulty, we demonstrate in a number of experiments that when used to seed the memories of an EBMT system, they can in fact prove useful in generating translations of high quality in a robust fashion. In addition, we demonstrate the relative gain of EBMT in comparison to on-line systems. Second, despite the perception that the documents available on the Web are of questionable quality, we demonstrate in contrast that such resources are extremely useful in automatically postediting translation candidates proposed by our system.
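The retrieval step at the heart of such an EBMT system can be sketched as a fuzzy match against the seeded memory; the memory entries, similarity measure, and threshold here are illustrative stand-ins, not the paper's actual resources:

```python
import difflib

def ebmt_lookup(source, memory, threshold=0.5):
    """Return the target side of the closest-matching source sentence in
    the example memory, or None if nothing is similar enough."""
    best, best_score = None, threshold
    for src, tgt in memory:
        score = difflib.SequenceMatcher(
            None, source.split(), src.split()).ratio()
        if score > best_score:
            best, best_score = tgt, score
    return best

# A toy memory; the paper seeds it with output from on-line MT systems.
memory = [("the cat sleeps", "le chat dort"),
          ("the dog barks", "le chien aboie")]
print(ebmt_lookup("the cat sleeps here", memory))   # le chat dort
```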
International Conference on Computational Linguistics | 2009
John Tinsley; Mary Hearne; Andy Way
Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT.
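Phrase-pair extraction from a node-aligned parallel treebank can be sketched as follows: each pair of aligned nodes contributes the yields of its two subtrees as a linguistically motivated phrase pair. The trees and alignments are toy examples; the real system induces the node alignments automatically:

```python
def tree_yield(tree):
    """Leaves of a tree encoded as (label, child, child, ...) tuples,
    with bare strings as terminals."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree[1:]:
        leaves.extend(tree_yield(child))
    return leaves

def phrase_pairs(node_alignments):
    """One phrase pair per aligned node pair: the subtree yields."""
    return [(" ".join(tree_yield(s)), " ".join(tree_yield(t)))
            for s, t in node_alignments]

src_np = ("NP", ("DT", "the"), ("JJ", "red"), ("NN", "car"))
tgt_np = ("NP", ("DT", "la"), ("NN", "voiture"), ("JJ", "rouge"))
alignments = [(src_np, tgt_np), (("NN", "car"), ("NN", "voiture"))]
print(phrase_pairs(alignments))
# [('the red car', 'la voiture rouge'), ('car', 'voiture')]
```

Pairs like these are then added to the translation model of the baseline PBSMT system alongside the conventionally extracted phrases.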
International Conference on Computational Linguistics | 2008
Ventsislav Zhechev; Andy Way
The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist. The ones that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this paper we introduce a novel platform for fast and robust automatic generation of parallel treebanks. The software we have developed based on this platform has been shown to handle large data sets. We also present evaluation results demonstrating the quality of the derived treebanks and discuss some possible modifications and improvements that can lead to even better results. We expect the presented platform to help boost research in the field of data-oriented machine translation and lead to advancements in other fields where parallel treebanks can be employed.
Natural Language Engineering | 2005
Andy Way; Nano Gough
In previous work (Gough and Way 2004), we showed that our Example-Based Machine Translation (EBMT) system improved with respect to both coverage and quality when seeded with increasing amounts of training data, so that it significantly outperformed the on-line MT system Logomedia according to a wide variety of automatic evaluation metrics. While it is perhaps unsurprising that system performance is correlated with the amount of training data, we address in this paper the question of whether a large-scale, robust EBMT system such as ours can outperform a Statistical Machine Translation (SMT) system. We obtained a large English-French translation memory from Sun Microsystems from which we randomly extracted a near 4K test set. The remaining data was split into three training sets, of roughly 50K, 100K and 200K sentence-pairs in order to measure the effect of increasing the size of the training data on the performance of the two systems. Our main observation is that contrary to perceived wisdom in the field, there appears to be little substance to the claim that SMT systems are guaranteed to outperform EBMT systems when confronted with ‘enough’ training data. Our tests on a 4.8 million word bitext indicate that while SMT appears to outperform our system for French-English on a number of metrics, for English-French, on all but one automatic evaluation metric, the performance of our EBMT system is superior to the baseline SMT model.