Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces

Linyang Li, Yunfan Shao∗, Demin Song, Xipeng Qiu†, Xuanjing Huang
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
School of Computer Science, Fudan University
{linyangli19, yfshao19, dmsong20, xpqiu, xjhuang}@fudan.edu.cn
∗ Equal contribution. † Corresponding author.

Abstract
Adversarial attacks on texts are mostly substitution-based methods that replace words or characters in the original texts to achieve successful attacks. Recent methods use pre-trained language models as the substitute generator. In Chinese, however, such methods are not directly applicable, since Chinese words require segmentation first. In this paper, we propose pre-training a language model over sentence-pieces and using it as the substitute generator to craft adversarial examples in Chinese. The substitutions in the generated adversarial examples are neither characters nor words but 'pieces', which are more natural to Chinese readers. Experimental results show that the generated adversarial samples can mislead strong target models while remaining fluent and semantically preserved.
Adversarial attacks (Goodfellow et al., 2014; Kurakin et al., 2016; Chakraborty et al., 2018) were first introduced in the computer vision field: neural networks are vulnerable to adversarial samples that add small, gradient-based perturbations to the original inputs. In natural language processing, perturbations cannot be applied directly, so most methods rely on strategies such as replacement, insertion, or deletion (Ebrahimi et al., 2017; Alzantot et al., 2018; Jia and Liang, 2017; Jin et al., 2019). Most replacement strategies incorporate a synonym dictionary (Dong et al., 2010; Fellbaum, 1998) or use word embeddings (Pennington et al., 2014; Mrkšić et al., 2016; Jin et al., 2019). Recently, pre-trained language models such as BERT (Devlin et al., 2018) have been introduced as substitute generators for adversarial example generation in texts (Li et al., 2020; Garg and Ramakrishnan, 2020).
[Figure 1 content: the piece-level attacker rewrites the input sentence 这种情况放在十年以前是不可想象的，侧面反映出公信力的坍塌。 ("This situation could not be possible ten years ago, which indicates the collapse of the public credibility.") into 这种情况在十年以前是不可想象的，侧面反映出公信力的衰落。 ("... the fading of the public credibility."), and the adversarial sentence is fed to a strong victim model such as BERT on a text classification task ("What type is this sentence?") with labels such as 愤怒 (anger) and 开心 (happy).]
Figure 1: Example of a piece-level adversary.

Such a process brought adversarial example generation to a higher level: using well-learned models to generate adversarial samples instead of specific rules such as entity checking or grammar checking (Jin et al., 2019). However, in Chinese, crafting adversarial examples is more challenging: Chinese does not contain explicit whitespace between words, so the boundaries between characters and words are vague. Current Chinese pre-trained language models are therefore character-based, yet replacing individual characters in Chinese cannot maintain fluency and semantics.

Therefore, in this paper, we propose incorporating sentence-pieces into the tokenization of Chinese pre-trained language models, and using such a model as a sentence-piece-level substitution generator to craft high-quality adversarial examples in Chinese. First, we use the sentence-piece tokenizer (Kudo and Richardson, 2018) to create a piece-level vocabulary based on a large-scale corpus collected online (Xu, 2019). After sentence-piece tokenization, we use this piece-level vocabulary to pre-train a masked language model as the substitution generator, following the standard pre-training process of BERT. Then we use the pre-trained substitution generator to craft piece-level adversarial samples in Chinese. As seen in Fig. 1, we can generate character-level, word-level, and phrase-level substitutions in Chinese.

We use the trained Chinese Substitution Generator to attack broadly used Chinese text classification tasks: the Sogou, IflyTek, Weibo, and Law34 datasets.
The experiments show that the generated adversarial samples successfully mislead target models while preserving semantic information and fluency.

To summarize the key contribution of this paper: we use sentence-piece-based tokenization to train a masked language model that can generate piece-level substitutions for Chinese adversarial examples. To our knowledge, we are the first to craft adversarial samples in Chinese with multi-level substitutions.

In the NLP field, adversarial attacks face a major challenge: one cannot apply gradients in the embedding space to craft adversarial samples, as is widely explored in the CV field (Goodfellow et al., 2014; Chakraborty et al., 2018). Replacing characters (Ebrahimi et al., 2017) or words (Alzantot et al., 2018; Jin et al., 2019; Ren et al., 2019; Papernot et al., 2016) and paraphrasing sentences (Jia and Liang, 2017) are therefore the main streams of generating adversarial samples in texts.

The formulation of attacking NLP models is as follows: given an input X = [w_1, ..., w_i, ...], where w_i is the target character/word, we have a candidate list S(w_i) = [s_{i1}, ..., s_{ij}] of potential substitutes for w_i. The goal is to find an adversarial sample X' = [w_1, ..., s_{id}, ...] such that argmax(F(X)) ≠ argmax(F(X')). In most cases, we assume that we only know the output scores of the target model, which is a black-box scenario, while in white-box attacks we also know the model architecture and therefore the gradients over the inputs.

Jin et al. (2019) use word embeddings to build the candidate list S, while Zang et al. (2020) use knowledge extracted from WordNet (Fellbaum, 1998). Recently, Li et al. (2020), Garg and Ramakrishnan (2020), and Shi et al. (2019) use pre-trained models to find similar tokens as the candidate list.

Pre-trained models exemplified by BERT (Devlin et al., 2018) introduce a masked language model task to predict masked tokens.
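As an illustration, the label-flip condition in the formulation above can be sketched with a stand-in scorer; the function F below is a hypothetical toy, not a real victim model:

```python
# Toy sketch of the attack formulation: X' is adversarial iff the victim's
# predicted label flips. F is an invented stand-in scorer, not a real model.
def F(tokens):
    # pretend class 1 fires whenever the piece "collapse" is present
    return [0.2, 0.8] if "collapse" in tokens else [0.7, 0.3]

def argmax(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

def is_adversarial(x, x_adv, model=F):
    # black-box success condition: argmax(F(X)) != argmax(F(X'))
    return argmax(model(x)) != argmax(model(x_adv))

x = ["the", "collapse", "of", "credibility"]
x_adv = ["the", "fading", "of", "credibility"]  # one piece substituted
print(is_adversarial(x, x_adv))
```

Note that only the output scores of the model are consulted, matching the black-box scenario described above.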
These models (Devlin et al., 2018) use BPE-based tokenizations to avoid unknown words in English. With massive computation, these pre-trained models have revolutionized NLP tasks. In Chinese text generation, a whole-word-mask strategy has been introduced to predict entire words properly (Cui et al., 2019), but the tokenization is still character-level. Therefore, such a model cannot be directly applied to generate multi-character candidates for the substitution list (Li et al., 2020; Garg and Ramakrishnan, 2020) in adversarial attacks.
Sentence-piece tokenization (Kudo and Richardson, 2018) is derived from byte-pair encoding (Sennrich et al., 2016) and the unigram language model (Kudo, 2018), with the extension of training directly from raw sentences. Different from BPE, sentence-piece is trained directly from raw texts, so no word segmentation is needed. Therefore, we can create a vocabulary based on sentence-piece tokenization in Chinese.
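To make the idea concrete, here is a toy greedy longest-match segmenter over a hand-built piece vocabulary. Real SentencePiece instead learns its pieces from raw text with BPE or a unigram LM, so this is only a didactic sketch of how a piece vocabulary yields multi-character units without a separate word segmenter:

```python
# Toy longest-match segmentation over a hand-built piece vocabulary.
# The vocabulary below is invented for illustration.
PIECE_VOCAB = {"今天", "今天的", "今", "天", "气", "天气", "不", "错", "不错"}

def piece_tokenize(text, vocab=PIECE_VOCAB, max_len=4):
    pieces = []
    i = 0
    while i < len(text):
        # try the longest candidate first, fall back to a single character
        for L in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + L]
            if cand in vocab or L == 1:
                pieces.append(cand)
                i += L
                break
    return pieces

print(piece_tokenize("今天天气不错"))
```

The output mixes multi-character pieces (今天, 天气, 不错) without any whitespace or external segmenter, which is the property the paper exploits.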
In this section, we introduce two main parts: (1) how we train a piece-level masked language model as a Chinese substitution generator; (2) how we use such a model to generate adversarial samples in Chinese.
To train a Chinese Substitution Generator, we follow the standard protocol of pre-training masked language models, simply replacing the vocabulary with Chinese sentence-pieces in order to generate proper substitutions. Since there are only a few thousand commonly used Chinese characters, we extend the vocabulary to 60 thousand pieces, based on training data collected online (Xu, 2019).

The statistics in Table 1 indicate that around 9% (5,400) of the pieces in the vocabulary are single characters. Most pieces are bi-character or tri-character words. There is also a considerable number of pieces with more than 3 characters, which are usually phrases or continuous entities.

Chars  Num(%)  Examples
1      9%      是 (is), 在 (in), 前 (before)
2      44%     但是 (but), 这个 (this), 生活 (life)
3      24%     自己的 (belong to me), 是不是 (is or not)
4      17%     这个问题 (this problem)
5+     4%      证券投资基金 (Securities Investment Funds)

Table 1: Statistics of the vocabulary.

The pre-trained model uses the same architecture as BERT-base, with 12 layers and a hidden size of 768. We use the LAMB optimizer (You et al., 2019) to pre-train our model on NVIDIA 3090 GPUs using the fairseq toolkit (Ott et al., 2019).
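A Table-1-style length breakdown can be computed from any piece vocabulary; the sketch below uses a tiny made-up sample rather than the actual 60K-piece vocabulary:

```python
from collections import Counter

# Sketch: Table-1-style statistics (percentage of pieces per character
# length, bucketing 5+ together). `vocab` is a tiny invented sample.
def length_stats(vocab):
    counts = Counter(min(len(p), 5) for p in vocab)
    total = sum(counts.values())
    return {("5+" if L == 5 else str(L)): round(100 * c / total, 1)
            for L, c in sorted(counts.items())}

vocab = ["是", "在", "但是", "这个", "生活", "自己的", "是不是",
         "这个问题", "证券投资基金"]
print(length_stats(vocab))
```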
In crafting adversarial examples, we apply a two-step algorithm that is widely used in substitution-based attacks (Jin et al., 2019; Li et al., 2020). (1) First, we find the most vulnerable pieces by iteratively ranking the piece importance in the original input sentence. Following Jin et al. (2019) and Li et al. (2020), we measure the piece importance by masking pieces iteratively and calculating the output scores:

    I_{w_i} = o_y(S) - o_y(S\w_i),    (1)

where S\w_i = [w_1, ..., w_{i-1}, [MASK], w_{i+1}, ...] is the sentence after replacing w_i with [MASK], and o_y(·) is the output score after the softmax function in the final classification layer.

(2) Then, we replace pieces with the candidates predicted by the Chinese Substitution Generator. Following Li et al. (2020), we do not mask the original pieces, so the semantic information is preserved. We use the top-K predictions for the given piece in the masked language model as the substitution candidates. Since the tokenization is piece-level, the substitution candidates are flexible: they include single characters, phrases that expand the meaning of the original piece, and words whose meanings are similar to the original piece.

As seen in Table 2, a single piece admits different types of substitutions: we can replace the Chinese word '今天 (today)' with the synonym '今日 (today)', find the expanded phrase '今天的 (today's)', or use the single character '今 (now)', a shortened version of 'today'. With the piece-level substitution generator, the generated examples can be very diversified.

Piece   Candidate type  Candidate
今天    word            今日
        phrase          今天的
        character       今

Table 2: Example of top-K predictions.
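The importance score of Eq. (1) can be sketched as follows, with a stub in place of the fine-tuned victim model; the scorer below is invented for illustration:

```python
# Sketch of the piece-importance ranking of Eq. (1):
#   I_{w_i} = o_y(S) - o_y(S \ w_i)
# `output_score` is a hypothetical stand-in for the victim's softmax score.
MASK = "[MASK]"

def output_score(tokens, label):
    # toy victim: the score for label 1 drops when "坍塌" is masked out
    return 0.9 if (label == 1 and "坍塌" in tokens) else 0.4

def piece_importance(tokens, label, score_fn=output_score):
    base = score_fn(tokens, label)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]  # S \ w_i
        scores.append((base - score_fn(masked, label), i))
    return sorted(scores, reverse=True)  # most vulnerable piece first

tokens = ["这种情况", "反映出", "公信力", "的", "坍塌"]
ranked = piece_importance(tokens, label=1)
print(ranked[0])  # masking "坍塌" causes the largest score drop
```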
After obtaining the candidate list, we replace pieces in the order of the ranked piece importance. In practice, the candidate list size K is 12 for all datasets. We take the most harmful piece in the candidate list as the perturbation of the current piece; if the model can no longer correctly classify the sample, we return the generated adversarial sample. Otherwise, we continue to replace another piece until we find a proper adversarial example.

We use several popular Chinese text classification tasks as our attack datasets:
Sogou: a 12-class sentence-genre task.
IflyTek: part of the ChineseGLUE benchmark; a 119-class sentence-genre task.
Weibo: a sentiment classification task covering 8 emotions.
Law34: a 34-class task predicting the court decision type of the given texts.

We fine-tune the standard BERT-base-chinese model as the victim model for each task using huggingface transformers (Wolf et al., 2020). We randomly select 200 examples from the development set of each dataset and craft their corresponding adversarial examples.

The major metric is the attack success rate, the percentage of successful attacks on the dataset. The second metric is the change rate of the generated adversarial samples; intuitively, fewer changes should cause a smaller semantic shift. Further, we run a human evaluation to measure the quality of the generated adversarial samples.
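The greedy replacement loop described at the top of this section can be sketched as below. For brevity this version returns on the first label-flipping substitution rather than keeping the most harmful one, and `predict` / `candidates` are stand-ins for the victim model and the generator's top-K predictions:

```python
# Simplified greedy attack loop: walk pieces in importance order and take
# the first substitution that flips the prediction. All components here
# are toy stand-ins, not the paper's fine-tuned models.
def greedy_attack(tokens, label, ranked_idx, candidates, predict):
    adv = list(tokens)
    for i in ranked_idx:                        # most vulnerable piece first
        for sub in candidates.get(adv[i], []):  # top-K candidate pieces
            trial = adv[:i] + [sub] + adv[i + 1:]
            if predict(trial) != label:         # successful attack
                return trial
    return None                                 # no adversarial sample found

tokens = ["公信力", "的", "坍塌"]
cands = {"坍塌": ["衰落", "下降"]}               # invented candidate list
predict = lambda toks: 1 if "坍塌" in toks else 0
adv = greedy_attack(tokens, 1, [2, 0, 1], cands, predict)
print(adv)
```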
https://github.com/LinyangLee/CN-TC-datasets
https://github.com/CLUEbenchmark/CLUE
https://github.com/google-research/bert

Dataset  Method        Ori Acc  Atk Acc  Ptb%        Human Consist%  Fluency
IflyTek  Char-R        60.0     9.0      2.7 (c)     90              4.2
         Word-R                 10.0     2.5 (w)     92              4.3
         Pce-R (ours)           11.0     2.7 (pce)   94              4.5
Weibo    Char-R        92.0     27.0     9.1 (c)     88              4.0
         Word-R                 30.0     8.7 (w)     89              4.0
         Pce-R (ours)           35.0     10.2 (pce)  90              4.2
Sogou    Char-R        93.5     15.0     3.2 (c)     90              3.8
         Word-R                 16.0     3.0 (w)     91              3.9
         Pce-R (ours)           18.5     4.4 (pce)   93              4.2
Law34    Char-R        93.0     10.0     2.8 (c)     94              4.5
         Word-R                 11.0     2.7 (w)     95              4.6
         Pce-R (ours)           12.0     4.2 (pce)   97              4.8
Table 3: Main Results of Generated Examples
We mix original samples with generated adversaries and ask human judges to predict the label and rate the fluency of the examples. Since there could be too many labels for human judges, we instead ask them to judge whether the given text matches its label or not. We also ask them to score the fluency on a scale from 1 to 5.
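The two human-evaluation numbers in Table 3 (label-consistency percentage and mean fluency) reduce to simple averages over the judges' ratings; a sketch with invented ratings:

```python
# Sketch of the human-evaluation metrics: each judgement records whether
# the text still matches its label (bool) and a 1-5 fluency rating.
# The ratings below are invented for illustration.
def human_metrics(judgements):
    n = len(judgements)
    consist = 100.0 * sum(ok for ok, _ in judgements) / n  # consistency %
    fluency = sum(f for _, f in judgements) / n            # mean fluency
    return round(consist, 1), round(fluency, 1)

ratings = [(True, 5), (True, 4), (False, 4), (True, 5)]
print(human_metrics(ratings))  # (75.0, 4.5)
```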
We set up strong baselines to compare with our method:

Char-Replace: We incorporate the original character-level Chinese BERT to generate adversarial samples by replacing characters. The hyper-parameters are the same as for our piece-level model.

Word-Replace: We use the 50-dimension word embeddings collected by Zhang and Yang (2018) and use cosine similarity to find the candidate list, as done by Jin et al. (2019). Since it is a word-level attack, we use the Jieba tokenization tool to tokenize the sequence. We use a threshold on the cosine similarity to constrain the quality of the candidates, and we use the roughly 60K most frequent words in the word embeddings.

https://github.com/fxsjy/jieba

As seen in Table 3, the generated adversarial samples successfully mislead the strong fine-tuned BERT-chinese models. Human judges also achieve high accuracy in predicting whether the classification is correct and give a high fluency score. Compared with the character-level and word-level methods, the piece-level adversarial samples achieve similar attacking results while maintaining a higher fluency score. Adversarial samples should be both harmful to the target models and semantically fluent; therefore the piece-level adversarial samples are better adversarial samples even though their success rate is lower than that of the character-replace and word-replace methods. The success rate could be raised by expanding the size of the candidate list, following Morris et al. (2020), so what matters most is the quality of the adversarial samples.

As seen in the appendix, the generated piece-level adversarial samples make successful attacks from different aspects and remain fluent, while the character-level adversarial samples are harder to comprehend. We can replace tokens with substitutes of similar meaning, and we can also replace them with irrelevant but fluent, label-preserving substitutes.
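The Word-Replace baseline's candidate selection can be sketched as a thresholded nearest-neighbour search over word embeddings; the vectors below are made-up toys, not the actual 50-dimension embeddings of Zhang and Yang (2018):

```python
import math

# Sketch of cosine-similarity candidate selection for the Word-Replace
# baseline. The 2-d embeddings are invented for illustration.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def candidate_list(word, embeddings, threshold=0.8, k=12):
    # nearest neighbours above a similarity threshold, at most top-k
    sims = [(cosine(embeddings[word], vec), w)
            for w, vec in embeddings.items() if w != word]
    return [w for s, w in sorted(sims, reverse=True) if s >= threshold][:k]

emb = {"今天": (1.0, 0.1), "今日": (0.9, 0.2), "苹果": (0.0, 1.0)}
print(candidate_list("今天", emb))
```

Only 今日 passes the threshold here; the unrelated word 苹果 is filtered out, which is the role of the similarity constraint in the baseline.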
Figure 2: Trade-off curve between candidate size K and attack success rate on Weibo, IflyTek, Sogou, and Law34.

A larger candidate size results in easier attacks. We therefore plot a trade-off curve showing that we can achieve a significantly higher attack success rate with a large candidate size. However, it is intuitive that a larger candidate size may also significantly harm the quality of the generated adversarial samples in both semantics and fluency. Therefore we believe that the success rate is not the most important metric, since maintaining fluency and semantics is a major concern in crafting adversarial samples. In our experiments, piece-level candidates are generally more natural to human judges.
In this paper, we propose a piece-level adversarial sample generation strategy for Chinese texts, which fills a gap in textual adversarial sample generation for languages other than English.

References
Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. CoRR, abs/1804.07998.

Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. 2018. Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069.

Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and Guoping Hu. 2019. Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

Zhendong Dong, Qiang Dong, and Changling Hao. 2010. HowNet and its computation of meaning. Pages 53–56.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2017. HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751.

Christiane Fellbaum. 1998. WordNet: An electronic lexical database. Bradford Books.

Siddhant Garg and Goutham Ramakrishnan. 2020. BAE: BERT-based adversarial examples for text classification. arXiv preprint arXiv:2004.01970.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328.

Di Jin, Zhijing Jin, Joey Tianyi Zhou, and Peter Szolovits. 2019. Is BERT really robust? Natural language attack on text classification and entailment. CoRR, abs/1907.11932.

Taku Kudo. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. arXiv preprint arXiv:1804.10959.

Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, Brussels, Belgium. Association for Computational Linguistics.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.

Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, and Xipeng Qiu. 2020. BERT-ATTACK: Adversarial attack against BERT using BERT. arXiv preprint arXiv:2004.09984.

John X. Morris, Eli Lifland, Jack Lanchantin, Yangfeng Ji, and Yanjun Qi. 2020. Reevaluating adversarial examples in natural language. ArXiv, abs/2004.14174.

Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. In Proceedings of HLT-NAACL.

Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations.

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting adversarial input sequences for recurrent neural networks. In MILCOM 2016 - 2016 IEEE Military Communications Conference, pages 49–54. IEEE.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1532–1543.

Shuhuai Ren, Yihe Deng, Kun He, and Wanxiang Che. 2019. Generating natural language adversarial examples through probability weighted word saliency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1085–1097.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Zhouxing Shi, Minlie Huang, Ting Yao, and Jingfang Xu. 2019. Robustness to modification with shared words in paraphrase identification. CoRR, abs/1909.02560.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Bright Xu. 2019. NLP Chinese corpus: Large scale Chinese corpus for NLP.

Yang You, Jing Li, Jonathan Hseu, Xiaodan Song, James Demmel, and Cho-Jui Hsieh. 2019. Reducing BERT pre-training time from 3 days to 76 minutes. CoRR, abs/1904.00962.

Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, and Maosong Sun. 2020. Word-level textual adversarial attacking as combinatorial optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6066–6080.

Yue Zhang and Jie Yang. 2018. Chinese NER using lattice LSTM. ArXiv.

A Appendices
Here we provide some of the generated adversarial samples. As seen in Table 4, piece-level substitutions enable various strategies for crafting adversaries:
• Synonym Replacing: using synonyms or similar pieces as substitutions.
• Rewrite: paraphrasing the given pieces.
• Expansion: expanding the given pieces.
Dataset: Law
Ori: 被告人黎某甲伙同绰号“杨某”等人去到该房内对严某乙进行殴打，之后黎某甲等人又拿着铁棍、木棍等追赶乘坐小汽车离开的严某乙至苍梧大道广场罗马柱对出街道处，将小汽车的挡风玻璃等物品砸烂，并将严某乙打伤。经法医鉴定严某乙所受损伤构成轻伤。
(The defendant Li, along with associates nicknamed "Yang" et al., went to the room to beat Yan. Then they chased Yan, who was leaving in a car, with iron rods and wooden sticks to the street facing the Roman pillar of Cangwu Avenue Square, smashed the windshield of the car, and injured Yan. According to the forensic examination, Yan's injury constituted a minor injury.)
Label: 故意伤害 (Intentional injury)
Adv: 被告人黎某甲伙同绰号“杨某”等人去到该房内对严某乙进行殴打，之后黎某甲等人又拿着铁棍、木棍等追赶乘坐小汽车离开的严某乙至苍梧大道广场罗马柱对出街道处，将小汽车的挡风玻璃等物品砸烂，并将严某乙重伤。经法医鉴定严某乙所受损伤构成轻伤。
(The defendant Li, along with associates nicknamed "Yang" et al., went to the room to beat Yan. Then they chased Yan, who was leaving in a car, with iron rods and wooden sticks to the street facing the Roman pillar of Cangwu Avenue Square, smashed the windshield of the car, and seriously injured Yan. According to the forensic examination, Yan's injury constituted a minor injury.)
Label: 酒驾 (Drunk driving)

Dataset: Weibo
Ori: 你竟然还给我唱《夜上海》《给我一个吻》啊啊啊啊啊啊啊啊啊啊啊啊!!!
(You even sang "Night Shanghai" and "Give Me a Kiss" to me. OMG!)
Label: 开心 (Happy)
Adv: 你还可以给我唱的《夜上海》《给我一个吻》啊啊啊啊啊啊啊啊啊啊啊啊!!!
(You could also sing "Night Shanghai" and "Give Me a Kiss" to me. OMG!)
Label: 嫌弃 (Disgust)

Table 4: Generated adversarial samples.

Dataset: IflyTek