Anthony Hartley | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Anthony Hartley is active.

Explore More

Publication

Featured researches published by Anthony Hartley.

conference of the european chapter of the association for computational linguistics | 2003

Improving machine translation quality with automatic named entity recognition

Bogdan Babych; Anthony Hartley

Named entities create serious problems for state-of-the-art commercial machine translation (MT) systems and often cause translation failures beyond the local context, affecting both the overall morphosyntactic well-formedness of sentences and word sense disambiguation in the source text. We report on the results of an experiment in which MT input was processed using output from the named entity recognition module of Sheffields GATE information extraction (IE) system. The gain in MT quality indicates that specific components of IE technology could boost the performance of current MT systems.

meeting of the association for computational linguistics | 2006

Using Comparable Corpora to Solve Problems Difficult for Human Translators

Serge Sharoff; Bogdan Babych; Anthony Hartley

In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that are considered by translators as difficult. For a phrase in the source language the tool identifies a range of possible expressions used in similar contexts in target language corpora and presents them to the translator as a list of suggestions. In the paper we discuss the method and present results of human evaluation of the performance of the tool, which highlight its usefulness when dictionary solutions are lacking.

conference of the association for machine translation in the americas | 2004

A fluency error categorization scheme to guide automated machine translation evaluation

Debbie Elliott; Anthony Hartley; Eric Atwell

Existing automated MT evaluation methods often require expert human translations. These are produced for every language pair evaluated and, due to this expense, subsequent evaluations tend to rely on the same texts, which do not necessarily reflect real MT use. In contrast, we are designing an automated MT evaluation system, intended for use by post-editors, purchasers and developers, that requires nothing but the raw MT output. Furthermore, our research is based on texts that reflect corporate use of MT. This paper describes our first step in system design: a hierarchical classification scheme of fluency errors in English MT output, to enable us to identify error types and frequencies, and guide the selection of errors for automated detection. We present results from the statistical analysis of 20,000 words of MT output, manually annotated using our classification scheme, and describe correlations between error frequencies and human scores for fluency and adequacy.

language resources and evaluation | 2009

‘Irrefragable answers’ using comparable corpora to retrieve translation equivalents

Serge Sharoff; Bogdan Babych; Anthony Hartley

international conference on computational linguistics | 2004

Extending MT evaluation tools with translation complexity metrics

Bogdan Babych; Debbie Elliott; Anthony Hartley

In this paper we report on the results of an experiment in designing resource-light metrics that predict the potential translation complexity of a text or a corpus of homogenous texts for state-of-the-art MT systems. We show that the best prediction of translation complexity is given by the average number of syllables per word (ASW). The translation complexity metrics based on this parameter are used to normalise automated MT evaluation scores such as BLEU, which otherwise are variable across texts of different types. The suggested approach makes a fairer comparison between the MT systems evaluated on different corpora. The translation complexity metric was integrated into two automated MT evaluation packages - BLEU and the Weighted N-gram model. The extended MT evaluation tools are available from the first authors web site: http://www.comp.leeds.ac.uk/bogdan/evalMT.html

Archive | 2001

Translation, controlled languages, generation

Anthony Hartley; Cécile Paris

In this chapter we explore the relationship between translation and controlled languages (CLs). These are stringent sets of writing rules or guidelines designed to prevent authors from introducing ambiguities into their texts, and they are increasingly used in the commercial world for authoring technical documents such as maintenance and user manuals. Very often, these documents then serve as the source from which translations are produced into a large number of target languages. We further explore the relationship between controlled languages and generation, by which we mean natural language generation (NLG)—the production by computer of texts in human languages, such as English, French and German. At first sight, it looks as if NLG has much to gain from work on CLs, by adopting rules designed for human writers as the basis of its computer programs. However, on closer examination it becomes apparent that CL research can benefit as much, if not more, from work in NLG. In building NLG systems it is good practice to clearly distinguish rules concerned with pragmatic and semantic function from rules concerned with syntactic form, and then to specify appropriate mappings between them.

Social Media for Government Services | 2015

‘Garbage Let’s Take Away’: Producing Understandable and Translatable Government Documents: A Case Study from Japan

Rei Miyata; Anthony Hartley; Kyo Kageura; Cécile Paris

Government departments increasingly communicate information to citizens digitally via web sites, and, in many societies, the linguistic diversity of these citizens is also growing. In Japan, a largely monolingual society, municipal governments now routinely address the necessity of providing practical and legal information to residents with limited Japanese by machine-translating their public service web sites into selected languages. Cost constraints often mean the translation is left un-edited and, as a result, may be unclear, misleading or even incomprehensible. While machine translation from Japanese is particularly challenging because of its structural uniqueness, the state of the art in the field generally is such that poor output is a universal problem. The solution we propose draws on recent advances in controlled authoring, document structuring and machine translation evaluation. It is realised as a prototype tool that enables non-professional writers to create documents where individual sentences and overall flow are both clear. The tool is designed to enhance machine-translatability into English without compromising the readability of the Japanese original. The originality of the tool is to provide an interactive sentence checker that is context-sensitive to the individual functional elements of a document template specialised for the public administration domain. Where natural Japanese sentences give bad translation results, we pre-process them internally into a form which yields acceptable machine translation output. Evaluation of the tool will target three concerns: its usability by non-professional authors; the acceptability of the Japanese document; and the comprehensibility of the English translation. We suggest that such an authoring framework could facilitate government communication with citizens in many societies beyond Japan.

Languages for specific purposes in the digital era, 2013, ISBN 9783319022215, págs. 197-222 | 2014

Innovative Methods for LSP-Teaching: How We Use Corpora to Teach Business Russian

James Wilson; Serge Sharoff; Paul Stephenson; Anthony Hartley

While still not in the mainstream of language learning and teaching, corpora over the last two decades have grown substantially in size and number, tools to manipulate them have become more sophisticated and user-friendly, and several corpus-based dictionaries and grammars have been published. The field of Language for Specific Purposes (hereafter, LSP) teaching, a rapidly growing market and an industry-relevant branch of language pedagogy for which there is a lack of “conventional” printed teaching materials, can benefit considerably from a corpus-based approach. In this chapter, we present how we use corpora to teach a business Russian course at the University of Leeds, UK. In the first part of the chapter, we describe how, on the IntelliText project (http:corpus.leeds.ac.uk/it), we have simplified our existing corpus interface in order to make it accessible to users with no training in or background knowledge of computational or corpus linguistics and implemented functions to meet the demands of a wide range of users in the humanities. Then we look at how some of these specific corpus-based tools and functions can be used to facilitate and enhance two core LSP-based skills: vocabulary acquisition and register recognition and differentiation. In addition, we present sample exercises that we use to support our corpus-based tools in order to maximise their effectiveness in LSP teaching as well as in learning and teaching more generally.

Archive | 2002