Bogdan Babych
University of Leeds
Publication
Featured research published by Bogdan Babych.
conference of the european chapter of the association for computational linguistics | 2003
Bogdan Babych; Anthony Hartley
Named entities create serious problems for state-of-the-art commercial machine translation (MT) systems and often cause translation failures beyond the local context, affecting both the overall morphosyntactic well-formedness of sentences and word sense disambiguation in the source text. We report on the results of an experiment in which MT input was processed using output from the named entity recognition module of Sheffield's GATE information extraction (IE) system. The gain in MT quality indicates that specific components of IE technology could boost the performance of current MT systems.
meeting of the association for computational linguistics | 2004
Bogdan Babych; Tony Hartley
We present the results of an experiment on extending BLEU, the automatic method of Machine Translation evaluation, with statistical weights for lexical items, such as tf.idf scores. We show that this extension gives additional information about evaluated texts; in particular it allows us to measure translation Adequacy, which, for statistical MT systems, is often overestimated by the baseline BLEU method. The proposed model uses a single human reference translation, which increases the usability of the proposed method for practical purposes. The model suggests a linguistic interpretation which relates frequency weights to human intuition about translation Adequacy and Fluency.
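The idea of weighting lexical items by tf.idf can be illustrated with a minimal sketch. It is not the model from the paper: the smoothed idf formula, the restriction to unigrams, and the function names `tfidf_weights` and `weighted_unigram_recall` are all assumptions made for illustration. It only shows how a content word missing from the MT output can cost more than a missing function word.

```python
import math
from collections import Counter


def tfidf_weights(reference_tokens, corpus_docs):
    """Return a tf.idf weight for every word type in the reference.

    corpus_docs is a list of tokenised documents used to estimate
    document frequency. The smoothed idf below is an assumption.
    """
    doc_sets = [set(doc) for doc in corpus_docs]
    n_docs = len(doc_sets)
    tf = Counter(reference_tokens)
    weights = {}
    for word, freq in tf.items():
        df = sum(1 for doc in doc_sets if word in doc)
        weights[word] = freq * math.log((n_docs + 1) / (df + 1))
    return weights


def weighted_unigram_recall(mt_tokens, reference_tokens, weights):
    """Share of the reference's total tf.idf weight that the MT output covers."""
    ref_types = set(reference_tokens)
    matched = ref_types & set(mt_tokens)
    total = sum(weights.get(w, 0.0) for w in ref_types)
    if total == 0:
        return 0.0
    return sum(weights.get(w, 0.0) for w in matched) / total
```

In this sketch a word that occurs in every document of the collection receives zero weight, so omitting it from the MT output does not lower the score, while omitting a rare content word does.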
meeting of the association for computational linguistics | 2006
Serge Sharoff; Bogdan Babych; Anthony Hartley
In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that are considered by translators as difficult. For a phrase in the source language the tool identifies a range of possible expressions used in similar contexts in target language corpora and presents them to the translator as a list of suggestions. In the paper we discuss the method and present results of human evaluation of the performance of the tool, which highlight its usefulness when dictionary solutions are lacking.
language resources and evaluation | 2009
Serge Sharoff; Bogdan Babych; Anthony Hartley
In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that are considered by translators as difficult. For a phrase in the source language the tool identifies a range of possible expressions used in similar contexts in target language corpora and presents them to the translator as a list of suggestions. In the paper we discuss the method and present results of human evaluation of the performance of the tool, which highlight its usefulness when dictionary solutions are lacking.
international conference on computational linguistics | 2004
Bogdan Babych; Debbie Elliott; Anthony Hartley
In this paper we report on the results of an experiment in designing resource-light metrics that predict the potential translation complexity of a text or a corpus of homogeneous texts for state-of-the-art MT systems. We show that the best prediction of translation complexity is given by the average number of syllables per word (ASW). The translation complexity metrics based on this parameter are used to normalise automated MT evaluation scores such as BLEU, which otherwise vary across texts of different types. The suggested approach makes for a fairer comparison between MT systems evaluated on different corpora. The translation complexity metric was integrated into two automated MT evaluation packages: BLEU and the Weighted N-gram model. The extended MT evaluation tools are available from the first author's website: http://www.comp.leeds.ac.uk/bogdan/evalMT.html
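The average number of syllables per word can be approximated with very few resources. The sketch below is an assumption-laden illustration rather than the authors' tool: it counts groups of consecutive vowel letters as syllables, which is only a rough heuristic for English, and the function names are hypothetical. How the paper turns ASW into a normalisation factor for BLEU is not stated in the abstract and is not reproduced here.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of vowel letters."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def average_syllables_per_word(text: str) -> float:
    """Average syllables per word (ASW) over the alphabetic tokens of a text."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(count_syllables(w) for w in words) / len(words)
```

A text made up of short everyday words scores close to 1, while a text dense with long technical terms scores higher, which is the kind of signal the abstract associates with translation complexity.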
conference of the european chapter of the association for computational linguistics | 2006
Serge Sharoff; Bogdan Babych; Paul Rayson; Olga Mudraya; Scott Piao
The problem we address in this paper is that of providing contextual examples of translation equivalents for words from the general lexicon using comparable corpora and semantic annotation that is uniform for the source and target languages. For a sentence, phrase or a query expression in the source language the tool detects the semantic type of the situation in question and gives examples of similar contexts from the target language corpus.
iberian conference on information systems and technologies | 2017
Yu Yuan; Bogdan Babych; Serge Sharoff
In this study the plausibility of automated human translation quality estimation is investigated to tackle the slowness, expensiveness and inconsistency of human evaluation. A reference free approach using machine learning is advanced to address four research questions. The methodology characteristic of this approach is then presented in detail. Finally, the author reports the latest progress of the project and some preliminary findings.
Archive | 2016
Marta R. Costa-jussà; Reinhard Rapp; Patrik Lambert; Kurt Eberle; Rafael E. Banchs; Bogdan Babych
This volume provides an overview of the field of Hybrid Machine Translation (MT) and presents some of the latest research conducted by linguists and practitioners from different multidisciplinary areas. Nowadays, the most important developments in MT are achieved by combining data-driven and rule-based techniques. These combinations typically involve hybridization of different traditional paradigms, such as the introduction of linguistic knowledge into statistical approaches to MT, the incorporation of data-driven components into rule-based approaches, or statistical and rule-based pre- and post-processing for both types of MT architectures. The book is of interest primarily to MT specialists, but also to readers in the wider fields of Computational Linguistics, Machine Learning and Data Mining, and to translators and managers of translation companies and departments who are interested in recent developments concerning automated translation tools.
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra) | 2014
Bogdan Babych; Jonathan Geiger; Mireia Ginestí Rosell; Kurt Eberle
Linguistic resources available in the public domain, such as lemmatisers, part-of-speech taggers and parsers, can be used for the development of MT systems: as separate processing modules or as annotation tools for the training corpus. For SMT this annotation is used for training factored models, and for rule-based systems a linguistically annotated corpus is the basis for creating analysis, generation and transfer dictionaries from corpora. However, the annotation in many cases is insufficient for rule-based MT, especially for the generation tasks. In this paper we analyze a specific case in which the part-of-speech tagger does not provide information about the de/het gender of Dutch nouns that is needed for our rule-based MT systems translating into Dutch. We show that this information can be derived from large annotated monolingual corpora using a set of context-checking rules on the basis of co-occurrence of nouns and determiners in certain morphosyntactic configurations. As not all contexts are sufficient for disambiguation, we evaluate the coverage and the accuracy of our method for different frequency thresholds.
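Deriving de/het gender from noun and determiner co-occurrence can be sketched as follows. This is a simplified illustration, not the rule set from the paper: it uses a single context (a definite article immediately before a noun), assumes hypothetical tag names `DET` and `NOUN`, and uses illustrative threshold parameters `min_count` and `min_ratio`. A real rule would also need number information, because Dutch plural nouns take "de" regardless of gender.

```python
from collections import Counter, defaultdict


def infer_dutch_gender(tagged_sentences, min_count=2, min_ratio=0.9):
    """Guess de/het gender for nouns from a POS-tagged corpus.

    tagged_sentences: list of sentences, each a list of (token, pos) pairs.
    A noun is assigned a gender only if it was seen with an article at
    least min_count times and one article accounts for min_ratio of them.
    """
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        # Look at each adjacent pair: article followed directly by a noun.
        for (tok, pos), (next_tok, next_pos) in zip(sent, sent[1:]):
            article = tok.lower()
            if pos == "DET" and article in ("de", "het") and next_pos == "NOUN":
                counts[next_tok.lower()][article] += 1

    genders = {}
    for noun, c in counts.items():
        total = c["de"] + c["het"]
        best, n = c.most_common(1)[0]
        if total >= min_count and n / total >= min_ratio:
            genders[noun] = best
    return genders
```

Raising `min_count` trades coverage for accuracy: fewer nouns receive a label, but each label rests on more evidence, which mirrors the frequency-threshold evaluation the abstract describes.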
language resources and evaluation | 2012
Inguna Skadiņa; Ahmet Aker; Nikos Mastropavlos; Fangzhong Su; Dan Tufiș; Mateja Verlic; Andrejs Vasiļjevs; Bogdan Babych; Paul D. Clough; Robert J. Gaizauskas; Nikos Glaros; Monica Lestari Paramita