Understanding Pre-Editing for Black-Box Neural Machine Translation
Rei Miyata†  Atsushi Fujita‡
† Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan
  [email protected]
‡ National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Souraku-gun, Kyoto, 619-0289, Japan
  [email protected]
Abstract
Pre-editing is the process of modifying the source text (ST) so that it can be translated by machine translation (MT) with better quality. Despite the unpredictability of black-box neural MT (NMT), pre-editing has been deployed in various practical MT use cases. Although many studies have demonstrated the effectiveness of pre-editing methods for particular settings, thus far, a deep understanding of what pre-editing is and how it works for black-box NMT is lacking. To elicit such understanding, we extensively investigated human pre-editing practices. We first implemented a protocol to incrementally record the minimum edits for each ST and collected 6,652 instances of pre-editing across three translation directions, two MT systems, and four text domains. We then analysed the instances from three perspectives: the characteristics of the pre-edited ST, the diversity of pre-editing operations, and the impact of the pre-editing operations on NMT outputs. Our findings include the following: (1) enhancing the explicitness of the meaning of an ST and its syntactic structure is more important for obtaining better translations than making the ST shorter and simpler, and (2) although the impact of pre-editing on NMT is generally unpredictable, there are some tendencies of changes in the NMT outputs depending on the editing operation types.
Introduction
Recent advances in machine translation (MT) have greatly facilitated its practical use in various settings, from business documentation to personal communication. In many practical cases, MT systems are used as black boxes, and one well-tested approach to making use of a black-box MT system is pre-editing, i.e., modifying the source text (ST) to make it suitable for the intended MT system.
The effectiveness of pre-editing has so far been demonstrated in many studies (Pym, 1990; O'Brien and Roturier, 2007; Seretan et al., 2014). A study focusing on statistical MT (SMT) has also shown that more than 90% of an ST can be rewritten into a text that can be machine-translated with sufficient quality (Miyata and Fujita, 2017), exhibiting the potential of the pre-editing approach.
However, the feasibility and potential of pre-editing for neural MT (NMT) have not been examined extensively. While efforts have recently been invested in the implementation of pre-editing strategies for black-box NMT settings, achieving improved MT quality (e.g., Hiraoka and Yamada, 2019; Mehta et al., 2020), the potential gains of pre-editing remain unexplored. Notably, the impact of pre-editing on black-box MT is unpredictable in nature. In particular, NMT models trained in an end-to-end manner can be sensitive to minor modifications of the ST (Cheng et al., 2019), which may affect the feasibility of pre-editing.
In short, while pre-editing has been implemented in practical MT use cases, what pre-editing is and how it works with black-box NMT systems remain open questions. To explore the possibilities of pre-editing and its automation, in this study we provide fine-grained analyses of human pre-editing practices and their impact on NMT. We systematically collected pre-editing instances under various conditions, i.e., translation directions, NMT systems, and text domains (§3).
We then conducted in-depth analyses of the collected instances from the following three perspectives: the characteristics of the pre-edited ST (§4), the diversity of pre-editing operations (§5), and the impact of pre-editing operations on the NMT outputs (§6). The findings of these analyses provide useful insights into the effective and efficient implementation of pre-editing for the better use of black-box NMT systems in the future, as well as into the robustness of current NMT systems when STs are manually perturbed.
Related Work
Pre-editing is the process of rewriting the source text (ST) to be translated in order to obtain better translations from MT. Though the scope of effective pre-editing operations depends on the downstream MT system, and there is no deterministic relation between pre-editing operations and the quality of the MT output, its effectiveness has been demonstrated for various translation directions, MT architectures, and text domains.
Manual pre-editing has long been implemented in combination with controlled languages (Pym, 1990; Reuther, 2003; Nyberg et al., 2003; Kuhn, 2014). In the period of rule-based MT (RBMT), pre-editing was considered a promising approach, since the behaviour of RBMT is more predictable and controllable. For example, O'Brien and Roturier (2007) examined the impact of English controlled language rules on two different MT engines, identifying highly effective rules. The pre-editing approach with controlled languages has also been tested for statistical MT (SMT) (Aikawa et al., 2007; Hartley et al., 2012; Seretan et al., 2014). These studies developed or utilised sets of controlled language rules for rewriting the ST. While these rule sets are optimised for particular MT systems and differ from each other, we can observe some shared characteristics among them. In particular, rules that prohibit long sentences (e.g., of more than 25 words) are widely adopted in the existing rule sets (O'Brien, 2003).
Automation of pre-editing is also an important research field in natural language processing. Semi-automatic tools such as controlled language checkers (Bernth and Gdaniec, 2001; Mitamura et al., 2003) and interactive rewriting assistants (Mirkin et al., 2013; Gulati et al., 2015) were developed to facilitate manual pre-editing activities. Fully automatic pre-editing has also long been explored (e.g., Shirai et al., 1998; Mitamura and Nyberg, 2001; Yoshimi, 2001; Sun et al., 2010).
In particular, many researchers have examined methods of reordering the source-side word order as pre-translation processing (Xia and McCord, 2004; Li et al., 2007; Hoshino et al., 2015). While the reordering approach has generally proven effective for SMT, its effectiveness for NMT is not obvious; negative effects have even been reported (Zhu, 2015; Du and Way, 2017). In recent years, techniques of automatic text simplification have been applied to improve NMT outputs (Štajner and Popović, 2018; Mehta et al., 2020). The underlying assumption of these studies is that simpler sentences are more machine-translatable.
Previous studies have investigated various pre-editing methods from different perspectives, focusing on different linguistic phenomena. Indeed, individual research has led to improved MT results. However, what is crucially needed is a broad understanding of what pre-editing is and how it works. For example, Miyata and Fujita (2017) addressed this issue by collecting instances of bilingual pre-editing, i.e., pre-editing the ST while referring to its MT output, done by human editors, and analysing them in detail. They demonstrated the maximum gain of pre-editing for an SMT system and provided a comprehensive typology of editing operations. Nevertheless, their study has two major limitations: (1) recent NMT was not examined, and (2) practical insights for better practices of pre-editing were not sufficiently presented.
NMT models trained in an end-to-end manner behave very differently from SMT and RBMT, which, in turn, affects pre-editing practices. As reported in several studies, despite their rapid improvement, NMT models are still vulnerable to input noise (Belinkov and Bisk, 2018; Ebrahimi et al., 2018; Cheng et al., 2019; Niu et al., 2020). The pre-editing operations identified in previous studies are therefore not necessarily effective for current black-box NMT systems.
For example, Marzouk and Hansen-Schirra (2019) adopted nine controlled language rules and evaluated their impact on the MT output for German-to-English translation in the technical domain. (The rules are as follows: (1) using straight quotes for interface texts, (2) avoiding light-verb constructions, (3) formulating conditions as "if" sentences, (4) using unambiguous pronominal references, (5) avoiding participial constructions, (6) avoiding passives, (7) avoiding constructions with "sein" + "zu" + infinitive, (8) avoiding superfluous prefixes, and (9) avoiding omitting parts of words; Marzouk and Hansen-Schirra, 2019, p. 184.) The human evaluation results revealed that these rules improved the performance of the RBMT, SMT, and hybrid systems, but did not have positive effects on the NMT system.
(Note that the ideal goal of the pre-editing approach is to adapt the STs to what the intended NMT system can properly translate, and, in the end, to what it has been trained on, i.e., its training data. For a black-box MT system, because we cannot directly refer to its training data, we should grasp its statistical characteristics indirectly through the MT output.)
Hiraoka and Yamada (2019) demonstrated the effectiveness of the following three pre-editing rules in improving Japanese-to-English TED Talk subtitle
5. Perfect
Information in the original text has been completely translated. There are no grammatical errors in the translation. The word choice and phrasing are natural even from a native speaker's point of view.
4. Good
The word choice and phrasing are slightly unnatural, but the information in the original text has been completely translated, and there are no grammatical errors in the translation.
3. Fair
There are some minor errors in the translation of less important information in the original text, but the meaning of the original text can be easily understood.
2. Acceptable
Important parts of the original text are omitted or incorrectly translated, but the core meaning of the original text can still be understood with some effort.
1. Incorrect/nonsense
The meaning of the original text is incomprehensible.
Table 1: MT evaluation criterion adopted from Miyata and Fujita (2017): The "Perfect" and "Good" ratings are regarded as satisfactory quality.
[Figure 1 depicts a unit as a tree of ST versions rooted at the Org-ST, with the Best-ST marked and the Best path connecting the Org-ST to the Best-ST.]
Figure 1: Tree representation of ST versions in a unit.
Name       Domain                                    Mode     Size   Avg. length (S.D.)
hospital   hospital conversation                     spoken     25   13.0 (4.7)
municipal  municipal procedure                       written    25   20.4 (10.7)
bccwj      Japanese-origin news article from BCCWJ   written    25   28.6 (18.6)
reuters    English-origin news article from Reuters  written    25   36.8 (15.3)
Table 2: Statistics for the Org-ST datasets for pre-editing.
translation using a black-box NMT system: (1) inserting punctuation, (2) making implied subjects and objects explicit, and (3) writing proper nouns in the target language (English).
As these studies cover a limited range of linguistic phenomena, translation directions, and text domains, we are not in a position to draw decisive conclusions; we still do not know what types of pre-editing operations are possible and how NMT is affected when these operations are performed. To elicit the best pre-editing practices for NMT, as a starting point, we need to understand what is happening and what can be obtained in the process of pre-editing, while also re-examining the previous findings and conventional methods.
To collect fine-grained manual pre-editing instances, we adopted the protocol formalised by Miyata and Fujita (2017), in which a human editor incrementally and minimally rewrites an ST on a trial-and-error basis with the aim of obtaining better MT output. An original ST (Org-ST) and its pre-edited versions are collectively called a unit. Using an online editing platform we developed, editors implement the protocol in the following steps:
Step 1.
Evaluate the MT output of the current ST based on the 5-point scale criterion shown in Table 1. If the quality of the MT output is satisfactory (i.e., "Perfect" or "Good"), go to Step 4; otherwise, go to Step 2.
Step 2.
Select one of the versions of the ST in the unit to be rewritten and go to Step 3. If none of the versions are likely to become satisfactory through further edits, go to Step 4.
Step 3.
Minimally edit the ST while maintaining its meaning, referring to the corresponding MT output. The MT output for the edited ST is automatically generated and registered in the unit. Return to Step 1.
Step 4.
Select one version of the ST that achieves the best MT quality (Best-ST) from among all the versions in the unit, and terminate the process for the unit.
The pre-editing instances in a unit collected through this protocol form a tree structure, as shown in Figure 1. We refer to the shortest path between the Org-ST and the Best-ST as the Best path. An important extension to the work of Miyata and Fujita (2017) is that our platform provides editors with a visualisation of the tree representation of the pre-editing history. This can facilitate the selection of ST versions in Step 2.
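As a rough illustration, the unit tree and the Best-path extraction described above might be bookkept as follows. This is a minimal sketch, not the authors' implementation: the `Version` and `Unit` names are ours, and integer ratings stand in for the 5-point human evaluation of Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional

SATISFACTORY = 4  # ratings 4 ("Good") and 5 ("Perfect") in Table 1

@dataclass
class Version:
    """One version of the source text, with its human MT-quality rating."""
    text: str
    rating: int
    parent: Optional["Version"] = None

class Unit:
    """All versions of one Org-ST, forming a tree rooted at the Org-ST."""

    def __init__(self, org_text: str, rating: int):
        self.root = Version(org_text, rating)
        self.versions: List[Version] = [self.root]

    def add_edit(self, parent: Version, new_text: str, rating: int) -> Version:
        """Step 3: register a minimally edited version under `parent`."""
        version = Version(new_text, rating, parent)
        self.versions.append(version)
        return version

    def best_path(self) -> List[Version]:
        """Step 4 plus Best-path extraction: pick the highest-rated version
        (Best-ST) and follow parent links back to the Org-ST."""
        best = max(self.versions, key=lambda v: v.rating)
        path = [best]
        while path[-1].parent is not None:
            path.append(path[-1].parent)
        return list(reversed(path))
```

Because every version records its parent, the shortest Org-ST to Best-ST path is simply the chain of parent links, which is what the platform visualises for the editors.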
To extensively investigate pre-editing phenomena, we prepared the following conditions:
Translation directions:
We targeted Japanese-to-English (Ja-En), Japanese-to-Chinese (Ja-Zh), and Japanese-to-Korean (Ja-Ko) translations. (We operationally defined "to minimally edit" as "to modify an ST with a small edit that is difficult to further decompose into more than one independent edit, without inducing ungrammaticality in the edited sentence.")

Lang.  System  Domain     Num. of pre-editing instances    Num. of units
                          Total   Avg.   Med.   Max        Org=Satisfactory  Best=Satisfactory
Ja-En  Google  hospital     255   10.2      7    55             4/25
               municipal    162    6.5      5    44             9/25
               bccwj        545   21.8   10.5   171             7/25              23/25
               reuters      370   14.8    6.5    80             7/25
       TexTra  hospital     139    5.6    5.5    25             7/25
               municipal    136    5.4      4    35            10/25
               bccwj        493   19.7   11.5    79             2/25              22/25
               reuters      492   19.7     18    86             4/25              24/25
Ja-Zh  Google  hospital     264   10.6     10    30             0/25              24/25
               municipal    376   15.0     13    41             0/25              23/25
               bccwj        427   17.1     16    41             2/25              20/25
               reuters      304   12.2     10    27             0/25              24/25
       TexTra  hospital     160    6.4    6.5    15             1/25
               municipal    172    6.9      7    20             2/25
               bccwj        231    9.2      5    38             4/25              22/25
               reuters      249   10.0      7    31             1/25              22/25
Ja-Ko  Google  hospital     209    8.4      9    22             0/25
               municipal    225    9.0      8    26             0/25
               bccwj        223    8.9      7    27             1/25              22/25
               reuters      293   11.7     10    33             0/25              24/25
       TexTra  hospital     160    6.4      6    26             2/25
               municipal    171    6.8      5    32             2/25
               bccwj        277   11.1      6    28             3/25              23/25
               reuters      319   12.8     11    38             1/25              23/25

Table 3: Statistics for the collected pre-editing instances and the MT quality achievement.
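As a quick sanity check, the 24 per-condition totals in Table 3 do sum to the 6,652 instances reported later in the text:

```python
# The "Total" column of Table 3, grouped by translation direction and system.
totals = {
    ("Ja-En", "Google"): [255, 162, 545, 370],
    ("Ja-En", "TexTra"): [139, 136, 493, 492],
    ("Ja-Zh", "Google"): [264, 376, 427, 304],
    ("Ja-Zh", "TexTra"): [160, 172, 231, 249],
    ("Ja-Ko", "Google"): [209, 225, 223, 293],
    ("Ja-Ko", "TexTra"): [160, 171, 277, 319],
}

# Grand total over all 24 conditions (hospital, municipal, bccwj, reuters).
grand_total = sum(sum(counts) for counts in totals.values())
print(grand_total)  # 6652
```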
MT systems:
As black-box MT systems, we adopted Google Translate and TexTra. Both are general-purpose NMT systems that are prevalently used for translating Japanese texts into other languages.
Text domains:
We selected four text domains whose linguistic characteristics, such as mode and sentence length, differ from each other (see Table 2 for details).
We randomly selected 25 Japanese sentences for each of the four text domains and used the resulting ST set, consisting of 100 sentences, for all six combinations of translation direction and MT system. We assigned one editor to each translation direction. Each editor was asked to work with both MT systems, without being informed of the type of MT system used in the task. All editors were professional translators with sufficient writing skills in Japanese and experience in evaluating MT outputs. Before the commencement of the formal tasks, we trained the editors using example sentences so that they could become accustomed to the task and the platform.
The Ja-En task was implemented from November to December 2019; the Ja-Zh and Ja-Ko tasks were implemented from December 2019 to February 2020. (Google Translate: https://translate.google.com/; TexTra: https://textra.nict.go.jp/)
Table 3 shows statistics for the pre-editing instances collected through the protocol described above. In general, the numbers of collected instances for the hospital and municipal domains were smaller than those for the bccwj and reuters domains, reflecting the influence of the sentence length of the Org-ST. In other words, the shorter the sentence, the fewer parts there are to be edited.
A notable finding is that while only about 11% (69/600) of the MT outputs for the Org-ST were of satisfactory quality, 95% (571/600) of the MT outputs for the Best-ST were satisfactory. This means that almost all the STs can be pre-edited into a form that leads to satisfactory MT output, demonstrating the potential of both pre-editing and NMT.
The number of collected instances can be interpreted as the editing effort required to obtain the Best-ST from the Org-ST. In most of the settings, the median number of collected instances for a unit falls in the range of 5 to 10.
It is thus necessary to optimise the pre-editing process for an intended MT system. The length of the Best path approximates the minimum editing effort needed to obtain the Best-ST. The total number of pre-editing instances in the Best path was 2,443, while the total of all instances is 6,652. This implies that there is substantial opportunity for reducing the pre-editing effort.

[Table 4 compares the Org-ST with the Best-ST of each condition (Ja-En, Ja-Zh, and Ja-Ko, each under Google and TexTra) on five blocks of indices: sentence length (Avg., S.D., Med.), attachment distance (Avg. per sentence, S.D., Med.), dependency depth (Avg., S.D., Med.), lexical diversity (Token (A), Type (B), A/B), and word frequency rank percentiles (25th, 50th (Med.)).]
Table 4: Linguistic characteristics of the Org-ST and Best-ST.
To understand the differences between the original and pre-edited STs, in this section we describe their general linguistic characteristics. Here, we compare the Org-ST with the Best-STs that achieved a satisfactory MT result, in order to elicit the features of machine-translatable STs.
To quantify structural complexity, we used the following three indices:
(1) sentence length: the number of words per sentence
(2) attachment distance: the average distance over all attachment pairs of the Japanese base phrases in a sentence
(3) dependency depth: the maximum distance from the root word in the dependency tree
(If an ST instance included multiple sentences, we averaged the scores.) We used the Japanese tokeniser MeCab (https://taku910.github.io/mecab/) to calculate (1) and the Japanese dependency parser JUMAN/KNP (http://nlp.ist.i.kyoto-u.ac.jp/index.php?KNP) to calculate (2) and (3).
The first three blocks in Table 4 show the results for these indices. It is evident that on all indices, the Org-ST exhibits the lowest scores. In other words, the length and surface complexity of the sentences generally increased through the pre-editing operations. This is a counter-intuitive finding, in that most previous pre-editing practices have axiomatically assumed that shorter and less complex sentences are better for MT. We delve further into this in §5.
The remaining two blocks in Table 4 present statistics for the lexical characteristics of the STs. The results for lexical diversity indicate that both the total number of word types and the Token/Type ratio increased from the Org-ST to the Best-ST under all conditions. This suggests that although the diversity of words increased slightly, the word distribution became peakier through pre-editing.
We also calculated the word frequency rank with Wikipedia as the reference. To assess the status of word frequency in relation to MT, it would be ideal to use the training data for each MT system, but such data are unavailable in black-box MT settings. Therefore, we decided to use Wikipedia as a convenient way to observe general word frequency. Lower numbers indicate higher word frequencies in Wikipedia.
The 50th and 75th percentile values in the datasets imply that pre-editing induced the avoidance of low-frequency words. (We used the whole text of Japanese Wikipedia obtained in October 2019; https://dumps.wikimedia.org/.)
To further inspect the differences between the Org-ST and the Best-ST, we extracted the word types (a) that appeared only in the Org-ST and (b) that appeared only in the Best-ST. Figure 2 illustrates the rank distributions of (a) and (b) for each condition. It is clear that low-frequency words with a frequency rank of around 10,000 decreased in the Best-ST, while words with a frequency rank of around 2,000–4,000 increased in the Best-ST. As Koehn and Knowles (2017) demonstrated, low-frequency words still pose major obstacles for NMT systems. Our results endorse this claim from a different perspective and can provide general strategies for word choice in the pre-editing task.
Figure 2: Differences in word frequency rank distribution between the Org-ST and Best-ST (G: Google, T: TexTra). The numbers in parentheses indicate the number of instances, i.e., word types.
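The comparison behind Figure 2 can be sketched as follows, assuming a word-frequency Counter built from a reference corpus (the paper uses Japanese Wikipedia). The function names and the `unseen_rank` fallback are our own illustrative choices, not the authors' code.

```python
from collections import Counter
from typing import Dict, List, Tuple

def frequency_ranks(corpus_counts: Counter) -> Dict[str, int]:
    """Map each word to its 1-based frequency rank (1 = most frequent)."""
    return {word: rank
            for rank, (word, _count)
            in enumerate(corpus_counts.most_common(), start=1)}

def exclusive_ranks(org_tokens: List[str], best_tokens: List[str],
                    ranks: Dict[str, int],
                    unseen_rank: int) -> Tuple[List[int], List[int]]:
    """Frequency ranks of word types appearing only in the Org-ST versus
    only in the Best-ST; words absent from the corpus get `unseen_rank`."""
    org_only = set(org_tokens) - set(best_tokens)
    best_only = set(best_tokens) - set(org_tokens)
    rank_of = lambda words: sorted(ranks.get(w, unseen_rank) for w in words)
    return rank_of(org_only), rank_of(best_only)
```

Plotting the two resulting rank lists per condition would reproduce the kind of distributional comparison shown in Figure 2.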
To understand the diversity of edit operations for pre-editing, we manually annotated the collected pre-editing instances in terms of linguistic operations. Given that the Best path contains effective editing operations for improved MT quality, we focused on the pairs of ST versions in the Best path (e.g., the pairs {→3, 3→7, 7→} in Figure 1). We randomly selected 10 units for each of the 24 combinations of translation direction, MT system, and text domain, resulting in a total of 961 pre-editing instances. We then excluded 26 instances that could be decomposed into multiple smaller edits and classified the remaining 935 instances, each of which consists of a minimum edit of the ST, based on the typology proposed by Miyata and Fujita (2017). Through the classification, we refined the existing typology to consistently accommodate all the instances.
Table 5 presents our typology of editing operations with the number of instances under the different conditions. The typology consists of 39 operation types under 6 major categories, which enables us to grasp the diversity and trends of pre-editing operations. Compared to structural editing, local modifications of words and phrases were frequently used in the Best path. The dominant type is C01 (Use of synonymous words): content words are replaced by a synonymous word. This operation is important for achieving appropriate word choice in the MT output.
C07 (Change of content), the second dominant type, includes the addition of information that is inferred by human editors based on the intra-sentential context or even external knowledge. For example, a named entity 'Nemuro-sho' (Nemuro office) was changed into 'Nemuro-keisatsu-sho' (Nemuro police office) by using knowledge of the entity. It might be challenging to automate such creative operations. (Only 2.7% of the edits were not regarded as minimum, which demonstrates satisfactory adherence to our instructions, compared with the implementation by Miyata and Fujita (2017), in which 568 pre-editing instances were finally decomposed into 979 instances.)
It is also notable that S01 (Sentence splitting) amounts to only 1.5% of all instances, which supports the observation in §4.1 that, in general, sentence length was not reduced and was even increased by pre-editing. Among the 14 cases of this type, nine of the split sentences were 60–67 words in length. These results support the empirical observation by Koehn and Knowles (2017) that NMT systems still have difficulty translating sentences longer than 60 words, and suggest that sentence splitting may only be promising for such very long sentences.
Towards the effective exercise of pre-editing, we further analysed the pre-editing instances in terms of informational strategies based on the notion of explicitation/implicitation acknowledged in translation studies (Vinay and Darbelnet, 1958; Chesterman, 1997; Murtisari, 2016). Following these studies, we broadly defined explicitation as the act of indicating what is implied in the text to clarify its meaning, and implicitation as the inverse act of explicitation. We classified all the instances analysed above, except for the E01 and E02 types, into three general strategies, namely, explicitation, implicitation, and (information) preservation. The right side of Table 5 shows the classification results. The total numbers of instances classified into each strategy were 329, 88, and 480, respectively. Not surprisingly, this indicates that explicitation is an essential
ID   Editing operation type                  Ja-En    Ja-Zh    Ja-Ko   Total  Expl.  Impl.  Pres.
                                             G    T   G    T   G    T
S01  Sentence splitting                      1    0   3    3   4    3     14      0      0     14
S02  Structural change                       3    5   9    4   4    2     27      8      1     18
S03  Use/disuse of topicalisation            1    7   4    3   1    3     19      5      2     12
S04  Insertion of subject/object             2    1   1    3   5    2     14     14      0      0
S05  Use/disuse of clause-ending noun        3    2   2    2   2    1     12     12      0      0
S06  Change of voice                         1    3   0    0   0    0      4      2      0      2
S07  Other structural changes                1    0   2    1   1    0      5      3      0      2
P01  Insertion/deletion of punctuation      19   16   5   12   9   10     71      0      0     71
P02  Use/disuse of chunking marker(s)        6   12   2    1   3    4     28     11      8      9
P03  Phrase reordering                       6    4   7    1   9    4     31      0      0     31
P04  Change of modification                  1    3   3    0   0    0      7      0      0      7
P05  Change of connective expression         3   18   4    2  10    3     40     24      5     11
P06  Change of parallel expression           3    8   2    8   4   11     36      7      2     27
P07  Change of apposition expression         1    7   2    1   1    4     16      8      4      4
P08  Change of noun/verb phrase              1    3   2    1   3    3     13      9      3      1
P09  Use/disuse of compound noun             1    5   2    2   6   12     28     16     12      0
P10  Use/disuse of affix                     4    4   1    2   3    3     17      1      0     16
P11  Change of sahen noun expression         0    1   1    1   2    0      5      1      0      4
P12  Change of formal noun expression        1    2   2    2   2    0      9      4      0      5
P13  Other phrasal changes                   0    1   0    1   2    1      5      4      0      1
C01  Use of synonymous words                18   18  19   18  25   20    118     14     10     94
C02  Use/disuse of abbreviation              2    7   2    2   1    7     21     19      2      0
C03  Use/disuse of anaphoric expression      4    4   2    2   1    1     14     10      2      2
C04  Use/disuse of emphatic expression       1    2   2    1   4    1     11     10      1      0
C05  Category indication/suppression         5    3   6    5   4    7     30     29      1      0
C06  Explanatory paraphrase                  3    4   1    0   1    1     10      0      0     10
C07  Change of content                      22   20  21    9  14    8     94     57     23     14
F01  Change of particle                      9   14   4    6   7    7     47     13      5     29
F02  Change of compound particle             8    5   5    2   5    6     31     24      2      5
F03  Change of aspect                        1    4   1    0   5    1     12      0      0     12
F04  Change of tense                         0    0   1    1   1    1      4      0      0      4
F05  Change of modality                      3    1   2    1   3    1     11      5      0      6
F06  Use/disuse of honorific expression      3    1   1    2   2    1     10      0      0     10
O01  Japanese orthographical change         10   16   9    5   9   12     61     12      4     45
O02  Change of half-/full-width character    0    5   3    2   2    4     16      7      1      8
O03  Insertion/deletion/change of symbol     0    2   0    0   0    0      2      0      0      2
O04  Other orthographical change             0    1   0    0   3    0      4      0      0      4
E01  Grammatical errors                      0    8   5    2   2    5     22      –      –      –
E02  Content errors                          5    0   8    1   1    1     16      –      –      –
Table 5: Constructed typology of editing operations (G: Google, T: TexTra). The first letter of the ID indicates the six major categories (S: Structure, P: Phrase, C: Content word, F: Functional word, O: Orthography, E: Errors casually introduced in the ST). The right three columns provide the frequencies for the general informational strategies (Expl.: Explicitation, Impl.: Implicitation, Pres.: Preservation).

strategy for effective pre-editing.
We also grouped all 329 instances of explicitation into the following four subcategories.
Information addition is the strategy of adding supplementary information, such as subjects, modality, and explanations, to clarify the content of the ST. For example, subjects were sometimes inserted, as they tend to be omitted in Japanese sentences. This strategy generally corresponds to the operation
C07 (Change of content) described earlier.
Use of clear relation includes structural changes and the use of explicit connective markers to make the relation between words, phrases, and clauses more intelligible (see Appendix A for details). For example, the relation between the subject and object can be clarified by using the nominative case marker 'ga' in Japanese.
Use of narrower sense is the strategy of replacing general words with more specific ones. For example, the verb 'dasu', which has multiple meanings such as 'put', 'take', and 'send', was replaced with the verb 'teishutsusuru', which has a narrower range of meaning and was correctly translated as 'submit'.
Normalisation includes the use of authorised or standardised expressions, style, and notation. For example, an elliptic sentence ending was completed to construct a normal structure.

[Table 6 reports, for each translation direction (Ja-En, Ja-Zh, Ja-Ko) and system (Google, TexTra), Pearson's r and Spearman's ρ for both the TER and the number of edits.]
Table 6: Correlation of the TER and the number of edits between ST and MT.
These strategies can be used as concise pre-editing principles for human editors and can guide researchers in devising effective tools for pre-editing. We also emphasise that these general informational strategies are not specific to the Japanese language and could be applied to other languages.
This section investigates how pre-editing opera-tions affect the NMT output. As indicated in §2,NMT systems still lack robustness, and minor mod-ifications of the input would drastically change theoutput. From the practical viewpoint of deployingpre-editing, predictability is an important object topursue. Here, we examine the impacts of minimumedits of the ST on the NMT output. To measurethe amount of text editing, hereafter, we use theTranslation Edit Rate (TER), which is calculatedby dividing the number of edits (insertion, deletion,substitution, and shift) required to change a stringinto the reference string by the average numberof reference words (Snover et al., 2006). For anyconsecutive pair of STs or their corresponding MToutputs, we used the chronologically later versionas the reference. For word-level tokenisation, weused MeCab for Japanese, NLTK for English,jieba for Chinese, and KoNLPy for Korean. To grasp the general tendency, using all the col-lected pre-editing instances (see Table 3), wefirst calculated the correlation coefficients (Pear-son’s r and Spearman’s ρ ) between the amountof edits (the TER and the number of edits)in the ST and in the MT. More formally, let ST (cid:48) be the pre-edited versions of ST . ForTER, the correlation is between TER ( ST , ST (cid:48) ) and TER ( MT ( ST ) , MT ( ST (cid:48) )) . For the number of edits, the https://github.com/fxsjy/jieba https://konlpy.org/en/latest/api/konlpy.tag/ correlation is between EditCount ( ST , ST (cid:48) ) and EditCount ( MT ( ST ) , MT ( ST (cid:48) )) .As shown in Table 6, most coefficients are inthe range of 0.15–0.25, suggesting a very weakcorrelation. This means that the change in NMToutput is hardly predictable based on the amount ofedits in the ST. 
For example, the replacement of asingle particle in the ST sometimes caused drasticchanges of lexical choices in the MT output.The Japanese-to-Korean translation is an excep-tion; in particular, the correlation coefficients ofthe TER for the Google NMT system, i.e., 0.580for Pearson’s r and 0.574 for Spearman’s ρ , in-dicate a moderate positive relationship betweenthe changes in the ST and those in the MT. Thisis partly attributable to the fact that the syntacticstructures of Japanese and Korean, including theword order and usage of particles, are substantiallyclose. Thus, it is relatively easy to build sufficientlyaccurate MT systems. Finally, using the pre-editing instances in the Bestpath analysed in §5, we further investigated to whatextent each type of minimum editing operation af-fects the MT output. At this stage, we focused onthe 28 editing types that have at least 10 instances,considering that it is difficult to derive reliable in-sights from fewer data.Figure 3 presents the distribution of the degreeof changes in the MT output when an ST is pre-edited, measured by
TER ( MT ( ST ) , MT ( ST (cid:48) )) . Mostof the structural edits ( S01–S04 ) resulted in size-able changes in the MT. This is reasonable sincestructural modifications in the ST tended to causemajor changes in the MT as well, leading to highTER. In contrast, many of the editing types that in-clude local modifications of functional words andorthographic notations (
F01–F03, F05, F06, O01,O02 ) did not have major impacts on the MT results.It is worth noticing that
P03 (Phrase reorder-ing) did not drastically affect the MT output. Inother words, recent NMT systems in practical usemanage to retain the phrase-level equivalence even ( ) S ( ) S ( ) S ( ) S ( ) P ( ) P ( ) P ( ) P ( ) P ( ) P ( ) P ( ) P ( ) P ( ) C ( ) C ( ) C ( ) C ( ) C ( ) C ( ) C ( ) F ( ) F ( ) F ( ) F ( ) F ( ) O ( ) O ( ) TE R Figure 3: Distribution of the TER for changes in the MT for each operation type with at least 10 instances. Thenumbers in parentheses indicate the number of instances. when the position of a phrase is shifted. The influ-ence of
P02 (Use/disuse of chunking marker(s)) is fairly significant. For human readers, the use of chunking markers, such as double quotes and square brackets, does not greatly affect sentence parsing, but for NMT it can seriously affect the tokenisation result, eventually leading to a large change in the final output.
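The degree-of-change measure used above, TER(MT(ST), MT(ST′)), scores the MT output of the original sentence against the MT output of its pre-edited version. The following Python sketch illustrates the idea with a simplified, word-level TER; note that full TER (Snover et al., 2006) additionally counts block shifts found by a greedy search, which this sketch omits, and the example sentences below are invented for illustration only.

```python
# Simplified word-level approximation of TER (Snover et al., 2006).
# Real TER also counts block shifts; this sketch uses only plain
# edit distance (insertion/deletion/substitution), so it is an
# illustrative approximation, not the official metric.

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def ter(hyp, ref):
    """Approximate TER: number of edits normalised by reference length."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    if not ref_toks:
        return 0.0 if not hyp_toks else 1.0
    return edit_distance(hyp_toks, ref_toks) / len(ref_toks)

# Degree of change in the MT output caused by a pre-edit:
# TER(MT(ST), MT(ST')) compares the two MT outputs directly.
mt_of_st = "Withdraw from your registered credit card in about 10 days"
mt_of_pre_edited_st = "Your credit card will be debited in about 10 days"
print(ter(mt_of_st, mt_of_pre_edited_st))  # prints 0.6
```

A high value indicates that the pre-edit caused a large change in the MT output; a value near zero indicates that the NMT system was largely insensitive to that edit.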
Towards a better understanding of pre-editing for black-box NMT settings, in this study, we collected instances of manual pre-editing under various conditions and conducted in-depth analyses of the instances. We implemented a human-in-the-loop protocol to incrementally record minimum edits of the ST for all combinations of three translation directions, two NMT systems, and four text domains, and obtained a total of 6,652 instances of manual pre-editing. Since more than 95% of the STs were successfully pre-edited into a version that led to a satisfactory MT quality, our collected instances contain empirical, tacit human knowledge on the effective use of black-box NMT systems. We also investigated the collected data from three perspectives: the characteristics of the pre-edited STs, the diversity of pre-editing operations, and the impact of pre-editing operations on the NMT output. The remarkable findings can be summarised as follows:

• Contrary to the acknowledged practices of pre-editing, the operation of making source sentences shorter and simpler was not frequently observed. Rather, it is more important to make the content, syntactic relations, and word senses clearer and more explicit, even if the ST becomes longer.

• As indicated by recent studies, NMT systems are still sensitive to minor edits in the ST and are unpredictable in general. However, there are recognisable tendencies in the MT output according to the type of editing operation, such as the relatively small impact of phrase reordering on NMT.

In future work, we plan to explore the effective implementation of pre-editing. The findings of this study provide a broad overview of the range of pre-editing operations and their expected benefits, which enables us to find feasible pre-editing solutions in practical use cases of black-box NMT systems.
To develop automatic pre-editing tools using a collection of pre-editing instances, we need to handle the data insufficiency issue in machine learning, filling the gap between the training data and the targeted black-box MT systems. Moreover, as our pre-editing instances contain a wide variety of perturbations of the ST, they can also be used to evaluate the robustness of MT systems, which can lead to advances in MT research. We aim to jointly improve the two wheels of translation technology: pre-editing and MT.
Acknowledgments
This work was partly supported by JSPS KAKENHI Grant Numbers 19K20628 and 19H05660, and the Research Grant Program of KDDI Foundation, Japan. One of the corpora used in our study was created under a program "Research and Development of Enhanced Multilingual and Multipurpose Speech Translation System" of the Ministry of Internal Affairs and Communications, Japan.

References
Takako Aikawa, Lee Schwartz, Ronit King, Monica Corston-Oliver, and Carmen Lozano. 2007. Impact of controlled language on translation quality and post-editing in a statistical machine translation environment. In Proceedings of the Machine Translation Summit XI, pages 1–7, Copenhagen, Denmark.
Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and natural noise both break neural machine translation. In Proceedings of the 6th International Conference on Learning Representations (ICLR), pages 1–13, Vancouver, Canada.
Arendse Bernth and Claudia Gdaniec. 2001. MTranslatability. Machine Translation, 16(3):175–218.
Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. Robust neural machine translation with doubly adversarial inputs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pages 4324–4333, Florence, Italy.
Andrew Chesterman. 1997. Memes of Translation. John Benjamins, Amsterdam.
Jinhua Du and Andy Way. 2017. Pre-reordering for neural machine translation: Helpful or harmful? The Prague Bulletin of Mathematical Linguistics, 108:171–182.
Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018. On adversarial examples for character-level neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), pages 653–663, Santa Fe, New Mexico, USA.
Asheesh Gulati, Pierrette Bouillon, Johanna Gerlach, Victoria Porro, and Violeta Seretan. 2015. The ACCEPT Academic Portal: A user-centred online platform for pre-editing and post-editing. In Proceedings of the 7th International Conference of the Iberian Association of Translation and Interpreting Studies (AIETI), Malaga, Spain.
Anthony Hartley, Midori Tatsumi, Hitoshi Isahara, Kyo Kageura, and Rei Miyata. 2012. Readability and translatability judgments for 'Controlled Japanese'. In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT), pages 237–244, Trento, Italy.
Yusuke Hiraoka and Masaru Yamada. 2019. Pre-editing plus neural machine translation for subtitling: Effective pre-editing rules for subtitling of TED talks. In Proceedings of the Machine Translation Summit XVII, pages 64–72, Dublin, Ireland.
Sho Hoshino, Yusuke Miyao, Katsuhito Sudoh, Katsuhiko Hayashi, and Masaaki Nagata. 2015. Discriminative preordering meets Kendall's τ maximization. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pages 139–144, Beijing, China.
Philipp Koehn and Rebecca Knowles. 2017. Six challenges for neural machine translation. In Proceedings of the 1st Workshop on Neural Machine Translation (NMT), pages 28–39, Vancouver, Canada.
Tobias Kuhn. 2014. A survey and classification of controlled natural languages. Computational Linguistics, 40(1):121–170.
Chi-Ho Li, Minghui Li, Dongdong Zhang, Mu Li, Ming Zhou, and Yi Guan. 2007. A probabilistic approach to syntax-based reordering for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), pages 720–727, Prague, Czech Republic.
Shaimaa Marzouk and Silvia Hansen-Schirra. 2019. Evaluation of the impact of controlled language on neural machine translation compared to other MT architectures. Machine Translation, 33(1-2):179–203.
Sneha Mehta, Bahareh Azarnoush, Boris Chen, Avneesh Saluja, Vinith Misra, Ballav Bihani, and Ritwik Kumar. 2020. Simplify-then-translate: Automatic preprocessing for black-box translation. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), pages 8488–8495, New York, USA.
Shachar Mirkin, Sriram Venkatapathy, Marc Dymetman, and Ioan Calapodescu. 2013. SORT: An interactive source-rewriting tool for improved translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), System Demonstrations, pages 85–90, Sofia, Bulgaria.
Teruko Mitamura, Kathryn L. Baker, Eric Nyberg, and David Svoboda. 2003. Diagnostics for interactive controlled language checking. In Proceedings of the Joint Conference Combining the 8th International Workshop of the European Association for Machine Translation and the 4th Controlled Language Applications Workshop (EAMT/CLAW), pages 237–244, Dublin, Ireland.
Teruko Mitamura and Eric Nyberg. 2001. Automatic rewriting for controlled language translation. In Proceedings of the NLPRS2001 Workshop on Automatic Paraphrasing: Theories and Applications, pages 1–12, Tokyo, Japan.
Rei Miyata and Atsushi Fujita. 2017. Dissecting human pre-editing toward better use of off-the-shelf machine translation systems. In Proceedings of the 20th Annual Conference of the European Association for Machine Translation (EAMT), pages 54–59, Prague, Czech Republic.
Elisabet Titik Murtisari. 2016. Explicitation in Translation Studies: The journey of an elusive concept. The International Journal for Translation & Interpreting Research, 8(2):64–81.
Xing Niu, Prashant Mathur, Georgiana Dinu, and Yaser Al-Onaizan. 2020. Evaluating robustness to input perturbations for neural machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 8538–8544, Online.
Eric Nyberg, Teruko Mitamura, and Willem-Olaf Huijsen. 2003. Controlled language for authoring and translation. In Harold Somers, editor, Computers and Translation: A Translator's Guide, pages 245–281. John Benjamins, Amsterdam.
Sharon O'Brien. 2003. Controlling controlled English: An analysis of several controlled language rule sets. In Proceedings of the Joint Conference Combining the 8th International Workshop of the European Association for Machine Translation and the 4th Controlled Language Applications Workshop (EAMT/CLAW), pages 105–114, Dublin, Ireland.
Sharon O'Brien and Johann Roturier. 2007. How portable are controlled language rules? In Proceedings of the Machine Translation Summit XI, pages 345–352, Copenhagen, Denmark.
Peter Pym. 1990. Pre-editing and the use of simplified writing for MT. In Pamela Mayorcas, editor, Translating and the Computer 10: The Translation Environment 10 Years on, pages 80–95. Aslib, London.
Ursula Reuther. 2003. Two in one – Can it work?: Readability and translatability by means of controlled language. In Proceedings of the Joint Conference Combining the 8th International Workshop of the European Association for Machine Translation and the 4th Controlled Language Applications Workshop (EAMT/CLAW), pages 124–132, Dublin, Ireland.
Violeta Seretan, Pierrette Bouillon, and Johanna Gerlach. 2014. A large-scale evaluation of pre-editing strategies for improving user-generated content translation. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), pages 1793–1799, Reykjavik, Iceland.
Satoshi Shirai, Satoru Ikehara, Akio Yokoo, and Yoshifumi Ooyama. 1998. Automatic rewriting method for internal expressions in Japanese to English MT and its effects. In Proceedings of the 2nd International Workshop on Controlled Language Applications (CLAW), pages 62–75, Pennsylvania, USA.
Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas (AMTA), pages 223–231, Cambridge, Massachusetts, USA.
Yanli Sun, Sharon O'Brien, Minako O'Hagan, and Fred Hollowood. 2010. A novel statistical pre-processing model for rule-based machine translation system. In Proceedings of the 14th Annual Conference of the European Association for Machine Translation (EAMT), Saint-Raphaël, France.
Jean-Paul Vinay and Jean Darbelnet. 1958. Stylistique comparée du français et de l'anglais. Didier, Paris. Translated and edited by J. C. Sager and M.-J. Hamel (1995) as Comparative Stylistics of French and English: A Methodology for Translation. John Benjamins, Amsterdam.
Sanja Štajner and Maja Popović. 2018. Improving machine translation of English relative clauses with automatic text simplification. In Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA), pages 39–48, Tilburg, Netherlands.
Fei Xia and Michael McCord. 2004. Improving a statistical MT system with automatically learned rewrite patterns. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 508–514, Geneva, Switzerland.
Takehiko Yoshimi. 2001. Improvement of translation quality of English newspaper headlines by automatic pre-editing. Machine Translation, 16(4):233–250.
Zhongyuan Zhu. 2015. Evaluating neural machine translation in English-Japanese task. In Proceedings of the 2nd Workshop on Asian Translation (WAT), pages 61–68, Kyoto, Japan.
Information addition (142 instances)
  ST: 12日は台湾の休日のため休場。
      MT: The twelfth is a holiday in Taiwan.
  Pre-edited ST: 12日は台湾の休日のため株式市場は休場。 (... kabushiki shijo wa kyujo.)
      MT: The stock market was closed on the twelfth due to a holiday in Taiwan.

Use of clear relation (103 instances)
  ST: 来院しなくても10日前後で登録のクレジットカードから引き落としを行います。 (Raiin-shinakutemo toka zengo de touroku no kurejitto kado kara hikiotoshi o okonaimasu.)
      MT: Withdraw from your registered credit card in about 10 days without visiting the hospital.
  Pre-edited ST: 来院しなくても10日前後で登録のクレジットカードから引き落としが行われます。 (Raiin-shinakutemo toka zengo de touroku no kurejitto kado kara hikiotoshi ga okonawaremasu.)
      MT: Even if you do not visit the hospital, your credit card will be debited in about 10 days.

Use of narrower sense (54 instances)
  ST: 採尿と採便を出してください。 (Sai-nyo to sai-ben o dashite kudasai.)
      MT: Please collect urine and feces.
  Pre-edited ST: 採尿と採便を提出してください。 (Sai-nyo to sai-ben o teishutsushite kudasai.)
      MT: Please submit urine and stool samples.

Normalisation (30 instances)
  ST: 単位は億円。 (Tan'i wa oku en.)
      MT: Figures are in billions of yen.
  Pre-edited ST: 単位は億円です。 (Tan'i wa oku en desu.)
      MT: The unit is 100 million yen.

Table 7: The number of instances and an example of each explicitation strategy for pre-editing ST with MT outputs.
A Details of Explicitation Strategy
Table 7 shows the statistics and examples of each subcategory of the explicitation strategy. A total of 329 pre-editing instances of the explicitation strategy can be further classified into four subcategories: information addition, use of clear relation, use of narrower sense, and normalisation.

The example of information addition illustrates the insertion of a subject 'kabushiki shijo wa' ('stock market'), which is implicit in the preceding ST. The example of the use of clear relation shows that the relation between the subject and object can be clarified by using the nominative case marker 'ga' instead of the accusative one 'o' and accordingly changing the voice of the main clause. As a result, the inappropriate imperative construction 'Withdraw from ...' in the MT output is changed to the correct passive construction 'will be debited.' In the example of the use of narrower sense, the verb 'dashite,' which has multiple meanings such as 'put,' 'take,' and 'send,' was replaced with the verb 'teishutsushite,' which has a narrower range of meaning and was correctly translated as 'submit.' In the example of normalisation, the elliptic sentence ending was completed with a normal structure '... desu.' This operation led not only to the improvement of the sentence construction, but also to semantic correctness in the MT output ('billions' → '100 million').