Linguistic evaluation of German-English Machine Translation using a Test Suite

Abstract

We present the results of applying a grammatical test suite for German → English MT to the systems submitted at WMT19, with a detailed analysis of 107 phenomena organized in 14 categories. The systems still mistranslate one out of four test items on average. Low performance is indicated for idioms, modals, pseudo-clefts, multi-word expressions and verb valency. When compared to last year, there has been an improvement in function words, non-verbal agreement and punctuation. More detailed conclusions about particular systems and phenomena are also presented.



Eleftherios Avramidis, Vivien Macketanz, Ursula Strohriegel, Hans Uszkoreit

German Research Center for Artificial Intelligence (DFKI), Berlin, Germany


For decades, the development of Machine Translation (MT) has been based on either automatic metrics or human evaluation campaigns, with the main focus on producing scores or comparisons (rankings) expressing a generic notion of quality. Through the years there have been few examples of more detailed analyses of translation quality, both automatic (HTER (Snover et al., 2009), Hjerson (Popović, 2011)) and human (MQM; Lommel et al., 2014). Nevertheless, these efforts have not been systematic and have only focused on a few shallow error categories (e.g. morphology, lexical choice, reordering), whereas the human evaluation campaigns have been limited by the requirement for manual human effort. Additionally, previous work on MT evaluation focused mostly on the ability of the systems to translate test sets sampled from generic text sources, based on the assumption that this text is representative of a common translation task (Callison-Burch et al., 2007).

In order to provide more systematic methods to evaluate MT at a more fine-grained level, recent research has relied on the idea of test suites (Guillou and Hardmeier, 2016; Isabelle et al., 2017). Test suites are assembled in a way that allows testing the particular issues which are the focus of the evaluation. The evaluation of the systems is not based on generic text samples, but on whether they fulfil a priori quality requirements.

In this paper we use the DFKI test suite for German → English MT (Burchardt et al., 2017) in order to analyze the performance of the 16 MT systems that took part in the translation task of the Fourth Conference on Machine Translation. The evaluation focuses on 107 mostly grammatical phenomena organized in 14 categories. In order to apply the test suite, we follow a semi-automatic methodology that benefits from regular expressions, followed by minimal human refinement (Section 3). The application of the suite allows us to form conclusions on the particular grammatical performance of the systems and to perform several comparisons (Section 4).

Several test suites have been presented as part of the Test Suite track of the Third Conference on Machine Translation (Bojar et al., 2018a). Each test suite focused on a particular phenomenon, such as discourse (Bojar et al., 2018b), morphology (Burlot et al., 2018), grammatical contrasts (Cinkova and Bojar, 2018), pronouns (Guillou et al., 2018) and word sense disambiguation (Rios et al., 2018). In contrast to the above test suites, ours is the only one that performs such a systematic evaluation of more than one hundred phenomena. A direct comparison can be made with the last of these papers, since it focuses on the same language direction. Its authors use automated methods to extract test items, whereas in our test suite the test items are created manually.

Method

The test suite is a manually devised test set whose contents are chosen with the purpose of testing the performance of MT systems on specific phenomena or quality-related requirements. For each phenomenon a subset of relevant test sentences is chosen manually. Then, each MT system is requested to translate the given subset, and the performance of the system on the particular phenomenon is calculated as the percentage of the phenomenon instances that have been properly translated.

For this paper we use the latest version of the DFKI Test Suite for MT on German to English. The test suite was presented by Burchardt et al. (2017) and applied extensively in last year's shared task (Macketanz et al., 2018b). The current version contains 5560 test sentences in order to control 107 phenomena organised in 14 categories. It is similar to the one used last year, with a few minor corrections. The number of test instances per phenomenon varies, ranging between 20 and 180 sentences. A full list of the phenomena and their categories can be seen as part of the results in the Appendix. An example list of test sentences with correct and incorrect translations is available on GitHub (https://github.com/DFKI-NLP/TQ_AutoTest).

Table 1: Examples of passing and failing MT outputs

Lexical ambiguity
Source: Das Gericht gestern Abend war lecker.
fail: The court last night was delicious.
pass: The dish last night was delicious.

Conditional
Source: Er würde einkaufen gehen, wenn die Geschäfte nicht geschlossen hätten.
fail: He would go shopping if the stores didn't close.
pass: He would go shopping if the shops hadn't closed.

Passive voice
Source: Es wurde viel gefeiert und getanzt.
fail: A lot was celebrated and danced.
pass: There was a lot of celebration and dancing.

The construction and the application of the test suite follow the steps below, also indicated in Figure 1:

(a) Produce paradigms: A person with good knowledge of German and English grammar devises or selects a set of source language sentences that may trigger translation errors related to particular phenomena. These sentences may be written from scratch, inspired by previous observations of common MT errors, or drawn from existing resources (Lehmann et al., 1996).

(b) Fetch sample translations: The source sentences are given as input to easily accessible MT systems and their outputs are fetched.

(c) Write regular expressions: By inspecting the MT output for every given sentence, the annotator writes rules that control whether the output contains a correct translation regarding the respective phenomenon. The rules are written as positive or negative regular expressions, which signify a correct or an incorrect translation respectively.

(d) Fetch more translations: When the test suite contains a sufficient number of test items with the respective control rules, it is ready for its broad application. The test items are subsequently given to a large number of MT systems. This is done in contact with their developers or through the submission process of a shared task, as is the case described in this paper.

(e) Apply regular expressions: The control rules are applied to the MT outputs in order to check whether the relevant phenomena have been translated properly. When the MT output matches a positive regular expression, the translation is considered correct (pass), whereas when it matches a negative regular expression, the translation is considered incorrect (fail). Examples can be seen in Table 1. In case an MT output matches neither a positive nor a negative regular expression, or in case the two contradict each other, the automatic evaluation results in an uncertain decision (warning).

(f) Resolve warnings and refine regular expressions: The warnings are given to the annotator, so that they can manually resolve them and, if possible, refine the rules to address similar cases in the future. Through the iterative execution of steps (e) and (f) (which are an extension of steps (c) and (d) respectively) the rules become more robust and attain better coverage.
Additionally, the annotator can add full sentences as rules, instead of regular expressions, when […]

Figure 1: Example of the preparation and application of the test suite for one test sentence

For every system we calculate the phenomenon-specific translation accuracy as the number of test sentences for the phenomenon which were translated properly, divided by the number of all test sentences for this phenomenon:

$$\text{accuracy} = \frac{\text{correct translations}}{\text{sum of test sentences}}$$

When doing comparisons, the significance of every comparison is confirmed with a one-tailed Z-test with α = 0.[…].

In the evaluation presented in this paper, MT outputs are obtained from the 16 systems that are part of the news translation task of the Fourth Conference on Machine Translation (WMT19). According to the details that the developers had published by the time this paper was written, 10 of the systems are declared to be Neural Machine Translation (NMT) systems and 9 of them confirm that they follow the Transformer paradigm, whereas for the remaining 6 systems no details were given. For the evaluation of the MT outputs the software TQ-AutoTest (Macketanz et al., 2018a) was used.

After processing the MT output for the 5560 items of the test suite, the automatic application of the regular expressions resulted in about 10% warnings. Consequently, one human annotator (a student of linguistics) committed about 70 hours of work in order to reduce the warnings to 3%. The final results were calculated using 5393 test items which, after the manual inspection, did not have any warning for any of the respective MT outputs.

Since we applied the same test suite as last year, this year's automatic evaluation profits from the manual refinement of the regular expressions that took place last year. The first application of the test suite in 2018 resulted in about 10-45% warnings, whereas in this year's application we only had 8-28%. This year's results are therefore based on 16% more valid test items, as compared to last year.
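The pass/fail/warning decision of steps (e) and (f), the accuracy formula, and the removal of test items that still carry a warning for any system can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the code of TQ-AutoTest: the rule format (a list of positive and a list of negative patterns per item), the functions `classify` and `accuracies`, and the example rules for the "Gericht" sentence of Table 1 are all hypothetical.

```python
import re
from typing import Dict, List

PASS, FAIL, WARNING = "pass", "fail", "warning"


def classify(output: str, positive: List[str], negative: List[str]) -> str:
    """Label one MT output using hypothetical positive/negative regex rules."""
    pos = any(re.search(p, output) for p in positive)
    neg = any(re.search(n, output) for n in negative)
    if pos and not neg:
        return PASS
    if neg and not pos:
        return FAIL
    # Neither kind of rule matched, or the rules contradict each other.
    return WARNING


def accuracies(verdicts: Dict[str, List[str]]) -> Dict[str, float]:
    """Per-system accuracy over the test items of one phenomenon.

    verdicts maps a system name to its verdicts, aligned by test item.
    Items with a warning for any system are dropped before counting.
    """
    n_items = len(next(iter(verdicts.values())))
    valid = [i for i in range(n_items)
             if all(v[i] != WARNING for v in verdicts.values())]
    if not valid:
        raise ValueError("no warning-free test items for this phenomenon")
    return {system: sum(v[i] == PASS for i in valid) / len(valid)
            for system, v in verdicts.items()}


if __name__ == "__main__":
    # Hypothetical rules for the "Gericht" example of Table 1.
    positive, negative = [r"\bdish\b"], [r"\bcourt\b"]
    for out in ["The dish last night was delicious.",
                "The court last night was delicious.",
                "The meal last night was delicious."]:
        print(classify(out, positive, negative), "|", out)
    # -> pass, fail, warning; in step (f) an annotator might resolve the
    #    warning and add r"\bmeal\b" as a further positive rule.
    print(accuracies({"sys1": [PASS, FAIL, WARNING, PASS],
                      "sys2": [PASS, PASS, PASS, PASS]}))
```

In the usage example, the third item is excluded for both systems because sys1 received a warning on it, so each accuracy is computed over the three remaining items.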

The results of the test suite evaluation can be seen in Tables 3 and 4, where the best systems for every category or phenomenon are boldfaced. The average accuracy per system is calculated either over all test items (with the assumption that all items have equal importance) or over the categories (with the assumption that all categories have equal importance). In any case, since the averages are calculated on an artificial test suite and not on a sample test set, one must be careful with their interpretation.
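The two averages differ only in how categories are weighted: the item average lets large categories dominate, while the category average gives every category the same weight. A minimal Python sketch, assuming per-category counts of correct and valid items for one system; the function name `averages` and the counts in the example are hypothetical and not taken from the evaluation.

```python
from typing import Dict, Tuple


def averages(counts: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """Return (average over items, average over categories) for one system.

    counts maps a category name to (correctly translated items, valid items).
    """
    correct = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    item_average = correct / total  # every test item weighs the same
    category_average = sum(c / n for c, n in counts.values()) / len(counts)
    return item_average, category_average  # every category weighs the same


# Hypothetical counts: one small and one large category.
item_avg, cat_avg = averages({"Negation": (19, 20),
                              "Verb tense/aspect/mood": (3000, 4000)})
print(round(item_avg, 3), round(cat_avg, 3))  # 0.751 0.85
```

With these illustrative numbers the large category pulls the item average close to its own 75%, whereas the category average sits halfway between 95% and 75%.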

Despite the significant progress of NMT and the recent claims of human parity, the results in terms of the test suite are somewhat mediocre. The MT systems achieve 75.6% accuracy on average over all given test items, which indicates that one out of four test items is not translated properly. If one considers the categories separately, only four categories have an accuracy of more than 80%: negation, where there are hardly any mistakes, followed by composition, function word and non-verbal agreement. The lowest-performing categories are the multi-word expressions (MWE) and the verb valency, with about 66% accuracy. Most MT systems seem to struggle with idioms, since they could properly translate only 11.6% of the ones in our test set, whereas a similar situation can be observed with resultative predicates (17.8%).

Negated modal pluperfect and modal pluperfect have an accuracy of only 23-28%, and pseudo-cleft sentences of 36.6%. Some of the phenomena have an accuracy of about 50%, in particular the domain-specific terms, the pseudo-cleft clauses and the modal of pluperfect subjunctive II (negated or not). We may assume that these phenomena are not correctly translated because they do not occur often enough in the training and development corpora.

On the other hand, for quite a few phenomena an accuracy of more than 90% has been achieved. This includes several cases of verb conjugation concerning transitive, intransitive and ditransitive verbs, mostly in perfect and future tenses, as well as the passive voice, the polar question, the infinitive clause, the conditional, the focus particles, the location and the phrasal verbs.

As seen in Table 3, the system that significantly wins the most categories is Facebook, with 11 categories and an average of 87.5% (if all categories are counted equally), followed by DFKI and RWTH, which are in the best cluster for 10 categories. When it comes to averaging over all test items, the best systems are RWTH and online-A. On specific categories, the clearest results come in punctuation, where NEU has the best performance with 100% accuracy, whereas online-X has the worst with 31.7%. Concerning ambiguity, Facebook has the highest performance with 92.6% accuracy. In verb tense/aspect/mood, RWTH Aachen and online-A have the highest performance with 84% accuracy, whereas in this category MSRA.MADL has the lowest performance with 60.4%. For the rest of the categories there are small differences between the systems, since more than five systems fall into the same significance cluster of the best performance.

When looking into particular phenomena (Table 4), Facebook has the highest accuracy on lexical ambiguity, with 93.7%. NEU and MSRA.MADL do best on quotation marks, with more than 95%. The best system for translating modal pluperfect is online-A with 75.6%, whereas for the same phenomenon online-Y and online-G perform worst, with less than 2.2%. On modal negated-preterite, the best systems are RWTH and UCAM with more than 95%. On the contrary, MSRA.MADL achieves the worst accuracy, as compared to other systems, in phenomena related to modals (perfect, present, preterite, negated modal future I), where it mistranslates half of the test items. One system, online-X, was the worst on quotation marks, as it did not correctly convey any of them, whereas the other systems did relatively well. Online-Y also performs significantly worse than the other systems on domain-specific terms.
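The comparisons above rest on a one-tailed Z-test and on clusters of systems that are not significantly worse than the best one. The following Python sketch shows one way to form such a cluster. It assumes a pooled two-proportion Z-test on per-system (correct, total) counts and uses α = 0.05 purely as an illustrative default, since the exact test formulation and α value are not given here; the functions `one_tailed_p` and `best_cluster` and the example counts are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict, List, Tuple


def one_tailed_p(c1: int, n1: int, c2: int, n2: int) -> float:
    """p-value for H1: system 1 is more accurate than system 2 (pooled Z-test)."""
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both systems all-pass or all-fail: no difference to test
    z = (p1 - p2) / se
    return 1.0 - NormalDist().cdf(z)


def best_cluster(results: Dict[str, Tuple[int, int]],
                 alpha: float = 0.05) -> List[str]:
    """Systems whose accuracy is not significantly below the top system."""
    best = max(results, key=lambda s: results[s][0] / results[s][1])
    cb, nb = results[best]
    return sorted(s for s, (c, n) in results.items()
                  if s == best or one_tailed_p(cb, nb, c, n) >= alpha)


# Hypothetical (correct, total) counts for one category.
print(best_cluster({"A": (180, 200), "B": (175, 200), "C": (140, 200)}))
# ['A', 'B']
```

Here B (87.5%) cannot be separated from A (90%) at this sample size (p ≈ 0.21), while C (70%) is clearly worse (z = 5), which is why several systems can share the best cluster of a category.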

One can attempt a rough comparison of the statistics between two consecutive years (Table 2). Here, the last column indicates the percentage of improvement from the average accuracy of all systems in last year's shared task to the average accuracy of all systems this year. Although this is not entirely accurate, since different systems participate, we assume that the large number of test items allows some generalisations in this direction. When one compares the overall accuracy, there has been an improvement of about 6%. When focusing on particular categories, the biggest improvements are seen in function words (+12.5%), non-verbal agreement (+9.7%) and punctuation (+8%). The smallest improvement is seen in named entity and terminology (+0.3%).

We also attempt to compare the systems which were submitted under the same name in both years. Again, the comparison should be made bearing in mind that the MT systems differ in many aspects, which could not be taken into account at the time this paper was written. The highest improvement is shown by the system online-G, which has an average accuracy improvement of 18.7%, most notably concerning negation, function words and non-verbal agreement. Online-A has also improved in composition, verb issues and non-verbal agreement, and RWTH and UEDIN in punctuation. On the contrary, UCAM's accuracy deteriorated for several categories, mostly for coordination and ellipsis (-13.1%), verb issues (-7.6%) and composition (-4.7%). JHU, online-G and RWTH show some deterioration for three categories each, whereas online-A seems to have worsened considerably regarding punctuation (-21.6%) and UEDIN regarding negation (-10.5%).

Table 2: Percentage (%) of accuracy improvement or deterioration between WMT18 and WMT19 for all the systems submitted (averaged in last column) and the systems submitted with the same name. Unsupervised systems are excluded.
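The two comparisons above (all-system averages per year, and systems with the same name in both years) can be expressed as a short calculation. This Python sketch assumes that the reported figures are absolute differences in percentage points; if they are relative changes, each difference would instead be divided by the 2018 value. The function `year_over_year`, the system names and the accuracies in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def year_over_year(acc_2018: Dict[str, float],
                   acc_2019: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Accuracy change (percentage points) between two evaluation rounds.

    Returns the change of the average over all systems of each year, and the
    per-system change for names that appear in both years.
    """
    overall = mean(acc_2019.values()) - mean(acc_2018.values())
    same_name = {name: acc_2019[name] - acc_2018[name]
                 for name in sorted(acc_2018.keys() & acc_2019.keys())}
    return overall, same_name


# Hypothetical accuracies (%) for one category.
overall, same_name = year_over_year({"sysA": 70.0, "sysB": 66.0, "old": 60.0},
                                    {"sysA": 76.0, "sysB": 65.0, "new": 80.0})
print(round(overall, 2), same_name)  # 8.33 {'sysA': 6.0, 'sysB': -1.0}
```

The example shows why the all-system figure is only a rough indicator: it is influenced by systems that took part in only one of the two years.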

The application of the test suite results in a multitude of findings of minor or major importance. Despite the recent advances, state-of-the-art German → English MT still mistranslates one out of four test items of our test suite, indicating that there is still room for improvement. For instance, one can note the low performance on MWE and verb valency, whereas there are issues with idioms, modals and pseudo-clefts. Function words, non-verbal agreement and punctuation, on the other hand, have significantly improved.

One potential benefit of the test suite would be to investigate the effect of particular development settings and design decisions on particular phenomena. For some superficial issues, such as punctuation, this would be relatively easy, as pre- and post-processing steps may be responsible. For more complex phenomena, however, further comparative analysis of settings is needed. Unfortunately, this was hard to achieve for this shared task due to the heterogeneity of the systems, but also because, at the time this paper was written, no exact details about the systems were known. We aim to pursue such an analysis in future work.

Acknowledgments

This research was supported by the German Federal Ministry of Education and Research through the projects DEEPLEE (01IW17001) and BBDC2 (01IS18025E).

References

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, and Karin Verspoor, editors. 2018a. Proceedings of the Third Conference on Machine Translation. Association for Computational Linguistics, Brussels, Belgium.

Ondřej Bojar, Jiří Mírovský, Kateřina Rysová, and Magdaléna Rysová. 2018b. EvalD Reference-Less Discourse Evaluation for WMT18. In Proceedings of the Third Conference on Machine Translation, pages 545–549, Brussels, Belgium. Association for Computational Linguistics.

Aljoscha Burchardt, Vivien Macketanz, Jon Dehdari, Georg Heigold, Jan-Thorsten Peter, and Philip Williams. 2017. A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines. The Prague Bulletin of Mathematical Linguistics, 108:159–170.

Franck Burlot, Yves Scherrer, Vinit Ravishankar, Ondřej Bojar, Stig-Arne Grönroos, Maarit Koponen, Tommi Nieminen, and François Yvon. 2018. The WMT'18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English. In Proceedings of the Third Conference on Machine Translation, pages 550–564, Brussels, Belgium. Association for Computational Linguistics.

Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz, and Josh Schroeder. 2007. (Meta-) evaluation of machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 136–158, Prague, Czech Republic. Association for Computational Linguistics.

Silvie Cinkova and Ondřej Bojar. 2018. Testsuite on Czech–English Grammatical Contrasts. In Proceedings of the Third Conference on Machine Translation, pages 565–575, Brussels, Belgium. Association for Computational Linguistics.

Liane Guillou and Christian Hardmeier. 2016. PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation. In Tenth International Conference on Language Resources and Evaluation (LREC 2016).

Liane Guillou, Christian Hardmeier, Ekaterina Lapshinova-Koltunski, and Sharid Loáiciga. 2018. A Pronoun Test Suite Evaluation of the English–German MT Systems at WMT 2018. In Proceedings of the Third Conference on Machine Translation, pages 576–583, Brussels, Belgium. Association for Computational Linguistics.

Pierre Isabelle, Colin Cherry, and George Foster. 2017. A Challenge Set Approach to Evaluating Machine Translation. In EMNLP 2017: Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark. Association for Computational Linguistics.

Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter, Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry, Dominique Estival, Eva Dauphin, Herve Compagnion, Judith Baur, Lorna Balkan, and Doug Arnold. 1996. TSNLP - Test Suites for Natural Language Processing. In Proceedings of the 16th …, page 7.

Arle Lommel, Aljoscha Burchardt, Maja Popović, Kim Harris, Eleftherios Avramidis, and Hans Uszkoreit. 2014. Using a new analytic measure for the annotation and analysis of MT errors on real data. In Proceedings of the 17th Annual Conference of the European Association for Machine Translation, pages 165–172. Croatian Language Technologies Society, European Association for Machine Translation.

Vivien Macketanz, Renlong Ai, Aljoscha Burchardt, and Hans Uszkoreit. 2018a. TQ-AutoTest – An Automated Test Suite for (Machine) Translation Quality. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 7-12, Miyazaki, Japan. European Language Resources Association (ELRA).

Vivien Macketanz, Eleftherios Avramidis, Aljoscha Burchardt, and Hans Uszkoreit. 2018b. Fine-grained evaluation of German-English Machine Translation based on a Test Suite. In Proceedings of the Third Conference on Machine Translation (WMT18), Brussels, Belgium. Association for Computational Linguistics.

Maja Popović. 2011. Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output. The Prague Bulletin of Mathematical Linguistics, 96(-1):59–68.

Annette Rios, Mathias Müller, and Rico Sennrich. 2018. The Word Sense Disambiguation Test Suite at WMT18. In Proceedings of the Third Conference on Machine Translation, pages 594–602, Brussels, Belgium. Association for Computational Linguistics.

Matthew Snover, Nitin Madnani, Bonnie J. Dorr, and Richard Schwartz. 2009. Fluency, adequacy, or HTER? Exploring different human judgments with a tunable MT metric. In Proceedings of the Fourth Workshop on Statistical Machine Translation, StatMT '09, pages 259–268, Stroudsburg, PA, USA. Association for Computational Linguistics.

Appendices

Table 3: Accuracies of successful translations for the systems and 14 categories. Boldface indicates the significantly best systems in each row.
Columns: DFKI, FB, JHU, MLLP, MSRA, NEU, onl-A, onl-B, onl-G, onl-X, onl-Y, PROMT, RWTH, Tartu, UCAM, UEDIN, avg.
Rows: Ambiguity, Composition, Coordination & ellipsis, False friends, Function word, LDD & interrogatives, MWE, Named entity & terminology, Negation, Non-verbal agreement, Punctuation, Subordination, Verb tense/aspect/mood, Verb valency, average (items), average (categories).

Table 4: Accuracies (%) of successful translations for the systems and 107 phenomena organized in 14 categories. Boldface indicates the significantly best systems in each row.
Columns: as in Table 3.
Rows (category: phenomena):
Ambiguity: lexical ambiguity, structural ambiguity
Composition: compound, phrasal verb
Coordination & ellipsis: gapping, right node raising, sluicing, stripping
False friends
Function word: focus particle, modal particle, question tag
LDD & interrogatives: extended adjective construction, extraposition, multiple connectors, pied-piping, polar question, scrambling, topicalization, wh-movement
MWE: collocation, idiom, prepositional MWE, verbal MWE
Named entity & terminology: date, domain-specific term, location, measuring unit, proper name
Negation
Non-verbal agreement: coreference, external possessor, internal possessor
Punctuation: comma, quotation marks
Subordination: adverbial clause, cleft sentence, free relative clause, indirect speech, infinitive clause, object clause, pseudo-cleft sentence, relative clause, subject clause
Verb tense/aspect/mood: conditional, imperative, progressive; ditransitive, intransitive, reflexive and transitive verbs, each in future I, future I subjunctive II, future II, future II subjunctive II, perfect, pluperfect, pluperfect subjunctive II, present, preterite and preterite subjunctive II; modal and modal negated, each in the same forms except future II and future II subjunctive II
Verb valency: case government, mediopassive voice, passive voice, resultative predicates
average (items)