
RTM Stacking Results for Machine Translation Performance Prediction

 

Abstract


We obtain new results using referential translation machines (RTMs) with an increased number of learning models in the set of results that are stacked to obtain a better mixture of experts prediction. We combine features extracted from the word-level predictions with the sentence- or document-level features, which significantly improves the results on the training sets but decreases the test set results.

1 Referential Translation Machines for Machine Translation Performance Prediction

The quality estimation task at WMT19 (Specia et al., 2019) (QET19) addresses machine translation performance prediction (MTPP), where translation quality is predicted without using reference translations, at the sentence and word levels (Task 1) and at the document level (Task 2). The tasks contain subtasks involving English-German, English-Russian, and English-French machine translation (MT). The target to predict in Task 1 is the HTER (human-targeted translation edit rate) score (Snover et al., 2006) along with a binary classification of word-level translation errors; the target in Task 2 is multi-dimensional quality metrics (MQM) (Lommel, 2015). Table 1 lists the number of sentences in the training and test sets for each task and the number of instances used as interpretants in the RTM models (M for million).

Task             Train   Test   RTM interpretants
                                Training    LM
Task 1 (en-de)   14442   1000   0.250M      5M
Task 1 (en-ru)   16089   1000
Task 2 (en-fr)    1468    180

Table 1: Number of instances and interpretants used.

We use referential translation machine (RTM) (Biçici, 2018; Biçici and Way, 2015) models for building our prediction models. RTMs predict data translation between the instances in the training set and the test set using interpretants, data close to the task instances. Interpretants provide context for the prediction task and are used during the derivation of the features measuring the closeness of the test sentences to the training data and the difficulty of translating them, and to identify translation acts between any two data sets for building prediction models. With the enlarging parallel and monolingual corpora made available by WMT, the capability of the interpretant datasets selected by RTM models to provide context for the training and test sets improves, as can be seen in the data statistics of parfda instance selection (Biçici, 2019). Figure 1 depicts RTMs and explains the model building process. RTMs use parfda for instance selection and the machine translation performance prediction system (MTPPS) for obtaining the features, which include additional features from word alignment and from GLMd for word-level prediction.

We use ridge regression, kernel ridge regression, k-nearest neighbors (KNN), support vector regression (SVR), AdaBoost (Freund and Schapire, 1997), gradient tree boosting, Gaussian process regression, extremely randomized trees (Geurts et al., 2006), and multi-layer perceptron (Bishop, 2006) as learning models in combination with feature selection (FS) (Guyon et al., 2002) and partial least squares (PLS) (Wold et al., 1984); most of these models can be found in scikit-learn (http://scikit-learn.org/).

We experiment with:

• including the statistics of the binary tags obtained as features extracted from word-level tag predictions for sentence-level prediction (see the first sketch after this list),

• using KNN to estimate the noise level for … (see the second sketch after this list)
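The first experiment turns word-level tag predictions into sentence-level features. Below is a minimal sketch of that idea, assuming binary OK/BAD tags (1 = BAD); the particular statistics (length, BAD count, BAD ratio, longest BAD run) are illustrative guesses, not necessarily the paper's exact feature set.

```python
# A minimal sketch: summarize predicted word-level binary tags (1 = BAD,
# 0 = OK) into fixed-length sentence-level features. The chosen statistics
# are illustrative, not the paper's confirmed feature set.
from itertools import groupby

def tag_statistics(tags):
    """Sentence-level features from a sentence's word-level tag predictions."""
    n = len(tags)
    n_bad = sum(tags)
    # Length of the longest consecutive run of BAD tags.
    longest_bad_run = max((len(list(g)) for v, g in groupby(tags) if v == 1),
                          default=0)
    return [n, n_bad, n_bad / n if n else 0.0, longest_bad_run]

# Append the tag statistics to hypothetical sentence-level feature values.
word_tags = [0, 1, 1, 0, 0, 1]
sentence_features = [0.42, 0.13]  # placeholder sentence-level features
sentence_features += tag_statistics(word_tags)
print(sentence_features)  # [0.42, 0.13, 6, 3, 0.5, 2]
```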
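The second experiment's sentence is cut off in the source, so its exact target is unclear; one plausible reading, given that a Gaussian process regressor is among the listed learners, is estimating the GP's observation noise from k-nearest-neighbor residuals. The sketch below implements that assumed reading with scikit-learn.

```python
# Assumed reading of the truncated bullet: estimate an observation-noise
# level via KNN residuals and pass it to a Gaussian process as alpha.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neighbors import NearestNeighbors

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# For each training point, average the targets of its k nearest neighbors
# (excluding the point itself); the residual variance approximates the noise.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)
neighbor_means = y[idx[:, 1:]].mean(axis=1)
noise_level = np.var(y - neighbor_means)

# alpha adds the estimated noise to the diagonal of the GP's kernel matrix.
gp = GaussianProcessRegressor(alpha=noise_level, random_state=0).fit(X, y)
print(noise_level, gp.predict(X[:2]))
```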
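For the feature selection and PLS combination mentioned above, the following is a minimal sketch assuming scikit-learn's RFE (in the spirit of Guyon et al., 2002) and PLSRegression; the SVR learner, the number of kept features, and the number of PLS components are illustrative choices.

```python
# A minimal sketch, assuming scikit-learn: recursive feature elimination
# for FS, then partial least squares to project the kept features onto
# latent components, then any of the listed learners on the result.
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=50, noise=0.1, random_state=0)

# FS: keep the 20 features ranked highest by a linear SVR's weights.
fs = RFE(SVR(kernel="linear"), n_features_to_select=20).fit(X, y)
X_sel = fs.transform(X)

# PLS: reduce the selected features to 5 latent components.
pls = PLSRegression(n_components=5).fit(X_sel, y)
Z = pls.transform(X_sel)

# Train a learner on the reduced representation.
model = SVR().fit(Z, y)
print(model.predict(Z[:3]))
```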
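Finally, the stacked mixture of experts described in the abstract can be approximated with scikit-learn's StackingRegressor over the listed learners; the Ridge meta-learner, 5-fold out-of-fold predictions, and default hyperparameters below are illustrative stand-ins rather than the paper's exact stacking setup.

```python
# A minimal stacking sketch, assuming scikit-learn: base predictions from
# the paper's list of learners are combined by a Ridge meta-learner.
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, StackingRegressor)
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=0)

base_models = [
    ("ridge", Ridge()),
    ("krr", KernelRidge()),
    ("knn", KNeighborsRegressor()),
    ("svr", SVR()),
    ("ada", AdaBoostRegressor(random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("gpr", GaussianProcessRegressor(random_state=0)),
    ("etr", ExtraTreesRegressor(random_state=0)),
    ("mlp", MLPRegressor(max_iter=2000, random_state=0)),
]

# Out-of-fold predictions of the base models become inputs to a final
# Ridge regressor, which learns how to mix the experts.
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5)
stack.fit(X, y)
print(stack.predict(X[:3]))
```

Using out-of-fold predictions (cv=5) for the meta-learner keeps the mixture weights from being fit on predictions the base models made for their own training data.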

Pages 73-77
DOI 10.18653/v1/W19-5405
Language English
