Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces

Linyang Li, Yunfan Shao∗, Demin Song, Xipeng Qiu†, Xuanjing Huang
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
School of Computer Science, Fudan University
{linyangli19, yfshao19, dmsong20, xpqiu, xjhuang}@fudan.edu.cn
∗ Equal contribution. † Corresponding author.

Abstract
Adversarial attacks on texts are mostly substitution-based methods that replace words or characters in the original texts to achieve successful attacks. Recent methods use pre-trained language models as the substitute generator. In Chinese, however, such methods are not directly applicable, since Chinese words require segmentation first. In this paper, we propose pre-training a language model over sentence-pieces and using it as the substitute generator to craft adversarial examples in Chinese. The substitutions in the generated adversarial examples are neither characters nor words but 'pieces', which are more natural to Chinese readers. Experimental results show that the generated adversarial samples can mislead strong target models while remaining fluent and semantically preserved.
Adversarial attacks (Goodfellow et al., 2014; Kurakin et al., 2016; Chakraborty et al., 2018) were first introduced in the computer vision field: neural networks are vulnerable to adversarial samples that add small, gradient-based perturbations to the original inputs. In natural language processing, perturbations cannot be applied directly, so most methods rely on strategies such as replacement, insertion, or deletion (Ebrahimi et al., 2017; Alzantot et al., 2018; Jia and Liang, 2017; Jin et al., 2019). Most replacement strategies incorporate a synonym dictionary (Dong et al., 2010; Fellbaum, 1998) or use word embeddings (Pennington et al., 2014; Mrkšić et al., 2016; Jin et al., 2019). Recently, pre-trained language models such as BERT (Devlin et al., 2018) have been introduced as substitute generators for adversarial example generation in texts (Li et al., 2020; Garg and Ramakrishnan, 2020).
[Figure 1 content: the piece-level attacker rewrites the input sentence 这种情况放在十年以前是不可想象的，侧面反映出公信力的坍塌。 ("This situation could not be possible ten years ago, which indicates the collapse of the public credibility.") into 这种情况在十年以前是不可想象的，侧面反映出公信力的衰落。 ("... the fading of the public credibility."), and the adversarial sentence is fed to a strong victim model such as BERT on a text classification task ("What type is this sentence?") with labels such as 愤怒 (anger) and 开心 (happy).]
Figure 1: Example of a piece-level adversary.

Such a process brought adversarial example generation to a higher level: using well-learned models to generate adversarial samples instead of specific rules such as entity checking or grammar checking (Jin et al., 2019). However, in Chinese, crafting adversarial examples is more challenging: Chinese does not contain explicit whitespace between words, so the boundaries between characters and words are vague. Current Chinese pre-trained language models are therefore character-based, yet replacing individual characters in Chinese cannot maintain fluency and semantics.

Therefore, in this paper, we propose incorporating sentence-pieces into the tokenization of Chinese pre-trained language models, and using such a model as a sentence-piece-level substitution generator to craft high-quality adversarial examples in Chinese. First, we use the sentence-piece tokenizer (Kudo and Richardson, 2018) to create a piece-level vocabulary based on a large-scale corpus collected online (Xu, 2019). After sentence-piece tokenization, we use this piece-level vocabulary to pre-train a masked language model as the substitution generator, following the standard pre-training process of BERT. Then we use the pre-trained substitution generator to craft piece-level adversarial samples in Chinese. As seen in Fig. 1, we can generate character-level, word-level, and phrase-level substitutions in Chinese.

We use the trained Chinese Substitution Generator to attack broadly used Chinese text classification tasks: the Sogou, IflyTek, Weibo, and Law34 datasets.
The experiments show that the generated adversarial samples successfully mislead target models while preserving semantic information and fluency.

To summarize the key contribution of this paper: we use sentence-piece-based tokenization to train a masked language model that can generate piece-level substitutions for Chinese adversarial examples. To our knowledge, we are the first to craft adversarial samples in Chinese with multi-level substitutions.

In the NLP field, adversarial attacks face a major challenge: one cannot apply gradients in the embedding space to craft adversarial samples, as is widely explored in the CV field (Goodfellow et al., 2014; Chakraborty et al., 2018). Replacing characters (Ebrahimi et al., 2017) or words (Alzantot et al., 2018; Jin et al., 2019; Ren et al., 2019; Papernot et al., 2016) and paraphrasing sentences (Jia and Liang, 2017) are therefore the main streams of generating adversarial samples in texts.

The formulation of attacking NLP models is as follows: given an input X = [w_1, ..., w_i, ...], where w_i is the target character/word, we have a candidate list S(w_i) = [s_{i1}, ..., s_{ij}] of potential substitutes for w_i. The goal is to find an adversarial sample X' = [w_1, ..., s_{id}, ...] such that argmax(F(X)) ≠ argmax(F(X')). In most cases, we assume that we only know the output scores of the target model, which is a black-box scenario, while in white-box attacks we also know the model architecture and therefore the gradients over the inputs.

Jin et al. (2019) use word embeddings to build the candidate list S, while Zang et al. (2020) use knowledge extracted from WordNet (Fellbaum, 1998). Recently, Li et al. (2020), Garg and Ramakrishnan (2020), and Shi et al. (2019) use pre-trained models to find similar tokens as the candidate list.

Pre-trained models exemplified by BERT (Devlin et al., 2018) introduce a masked language model task to predict masked tokens.
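As an illustration, the label-flip condition in the formulation above can be sketched with a stand-in scorer; the function F below is a hypothetical toy, not a real victim model:

```python
# Toy sketch of the attack formulation: X' is adversarial iff the victim's
# predicted label flips. F is an invented stand-in scorer, not a real model.
def F(tokens):
    # pretend class 1 fires whenever the piece "collapse" is present
    return [0.2, 0.8] if "collapse" in tokens else [0.7, 0.3]

def argmax(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

def is_adversarial(x, x_adv, model=F):
    # black-box success condition: argmax(F(X)) != argmax(F(X'))
    return argmax(model(x)) != argmax(model(x_adv))

x = ["the", "collapse", "of", "credibility"]
x_adv = ["the", "fading", "of", "credibility"]  # one piece substituted
print(is_adversarial(x, x_adv))
```

Note that only the output scores of the model are consulted, matching the black-box scenario described above.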
These models (Devlin et al., 2018) use BPE-based tokenizations to avoid unknown words in English. With massive computation, these pre-trained models have revolutionized NLP tasks. In Chinese text generation, a whole-word-mask strategy has been introduced to predict entire words properly (Cui et al., 2019), but the tokenization is still character-level. Therefore, such a model cannot be directly applied to generate multi-character candidates for the substitution list (Li et al., 2020; Garg and Ramakrishnan, 2020) in adversarial attacks.
Sentence-piece tokenization (Kudo and Richardson, 2018) is derived from byte-pair encoding (Sennrich et al., 2016) and the unigram language model (Kudo, 2018), with the extension of training directly from raw sentences. Different from BPE, sentence-piece is trained directly from raw texts, so no word segmentation is needed. Therefore, we can create a vocabulary based on sentence-piece tokenization in Chinese.
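To make the idea concrete, here is a toy greedy longest-match segmenter over a hand-built piece vocabulary. Real SentencePiece instead learns its pieces from raw text with BPE or a unigram LM, so this is only a didactic sketch of how a piece vocabulary yields multi-character units without a separate word segmenter:

```python
# Toy longest-match segmentation over a hand-built piece vocabulary.
# The vocabulary below is invented for illustration.
PIECE_VOCAB = {"今天", "今天的", "今", "天", "气", "天气", "不", "错", "不错"}

def piece_tokenize(text, vocab=PIECE_VOCAB, max_len=4):
    pieces = []
    i = 0
    while i < len(text):
        # try the longest candidate first, fall back to a single character
        for L in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + L]
            if cand in vocab or L == 1:
                pieces.append(cand)
                i += L
                break
    return pieces

print(piece_tokenize("今天天气不错"))
```

The output mixes multi-character pieces (今天, 天气, 不错) without any whitespace or external segmenter, which is the property the paper exploits.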
In this section, we introduce two main parts: (1) how we train a piece-level masked language model as a Chinese substitution generator; (2) how we use such a model to generate adversarial samples in Chinese.
To train a Chinese Substitution Generator, we follow the standard protocol of pre-training masked language models, simply replacing the vocabulary with Chinese sentence-pieces in order to generate proper substitutions. Since there are only a few thousand commonly used Chinese characters, we extend the vocabulary to 60 thousand pieces, based on training data collected online (Xu, 2019).

The statistics in Table 1 indicate that around 9% (5,400) of the pieces in the vocabulary are single characters. Most pieces are bi-character or tri-character words. There is also a considerable number of pieces with more than 3 characters, which are usually phrases or continuous entities.

Chars  Num(%)  Examples
1      9%      是 (is), 在 (in), 前 (before)
2      44%     但是 (but), 这个 (this), 生活 (life)
3      24%     自己的 (belong to me), 是不是 (is or not)
4      17%     这个问题 (this problem)
5+     4%      证券投资基金 (Securities Investment Funds)

Table 1: Statistics of the vocabulary.

The pre-trained model uses the same architecture as BERT-base, with 12 layers and a hidden size of 768. We use the LAMB optimizer (You et al., 2019) to pre-train our model on NVIDIA 3090 GPUs using the fairseq toolkit (Ott et al., 2019).
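A Table-1-style length breakdown can be computed from any piece vocabulary; the sketch below uses a tiny made-up sample rather than the actual 60K-piece vocabulary:

```python
from collections import Counter

# Sketch: Table-1-style statistics (percentage of pieces per character
# length, bucketing 5+ together). `vocab` is a tiny invented sample.
def length_stats(vocab):
    counts = Counter(min(len(p), 5) for p in vocab)
    total = sum(counts.values())
    return {("5+" if L == 5 else str(L)): round(100 * c / total, 1)
            for L, c in sorted(counts.items())}

vocab = ["是", "在", "但是", "这个", "生活", "自己的", "是不是",
         "这个问题", "证券投资基金"]
print(length_stats(vocab))
```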
In crafting adversarial examples, we apply a two-step algorithm that is widely used in substitution-based attacks (Jin et al., 2019; Li et al., 2020). (1) First, we find the most vulnerable pieces by iteratively ranking the piece importance in the original input sentence. Following Jin et al. (2019) and Li et al. (2020), we measure the piece importance by masking pieces iteratively and calculating the output scores:

    I_{w_i} = o_y(S) - o_y(S\w_i),    (1)

where S\w_i = [w_1, ..., w_{i-1}, [MASK], w_{i+1}, ...] is the sentence after replacing w_i with [MASK], and o_y(·) is the output score after the softmax function in the final classification layer.

(2) Then, we replace pieces with the candidates predicted by the Chinese Substitution Generator. Following Li et al. (2020), we do not mask the original pieces, so the semantic information is preserved. We use the top-K predictions for the given piece in the masked language model as the substitution candidates. Since the tokenization is piece-level, the substitution candidates are flexible: they include single characters, phrases that expand the meaning of the original piece, and words whose meanings are similar to the original piece.

As seen in Table 2, a single piece admits different types of substitutions: we can replace the Chinese word '今天 (today)' with the synonym '今日 (today)', find the expanded phrase '今天的 (today's)', or use the single character '今 (now)', a shortened version of 'today'. With the piece-level substitution generator, the generated examples can be very diversified.

Piece   Candidate type  Candidate
今天    word            今日
        phrase          今天的
        character       今

Table 2: Example of top-K predictions.
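The importance score of Eq. (1) can be sketched as follows, with a stub in place of the fine-tuned victim model; the scorer below is invented for illustration:

```python
# Sketch of the piece-importance ranking of Eq. (1):
#   I_{w_i} = o_y(S) - o_y(S \ w_i)
# `output_score` is a hypothetical stand-in for the victim's softmax score.
MASK = "[MASK]"

def output_score(tokens, label):
    # toy victim: the score for label 1 drops when "坍塌" is masked out
    return 0.9 if (label == 1 and "坍塌" in tokens) else 0.4

def piece_importance(tokens, label, score_fn=output_score):
    base = score_fn(tokens, label)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]  # S \ w_i
        scores.append((base - score_fn(masked, label), i))
    return sorted(scores, reverse=True)  # most vulnerable piece first

tokens = ["这种情况", "反映出", "公信力", "的", "坍塌"]
ranked = piece_importance(tokens, label=1)
print(ranked[0])  # masking "坍塌" causes the largest score drop
```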
After obtaining the candidate list, we replace pieces in the order of the ranked piece importance. In practice, the candidate list size K is 12 for all datasets. We take the most harmful piece in the candidate list as the perturbation of the current piece; if the model can no longer correctly classify the sample, we return the generated adversarial sample. Otherwise, we continue to replace another piece until we find a proper adversarial example.

We use several popular Chinese text classification tasks as our attack datasets:
Sogou: a 12-class sentence-genre task.
IflyTek: part of the ChineseGLUE benchmark; a 119-class sentence-genre task.
Weibo: a sentiment classification task covering 8 emotions.
Law34: a 34-class task predicting the court decision type of the given texts.

We fine-tune the standard BERT-base-chinese model as the victim model for each task using huggingface transformers (Wolf et al., 2020). We randomly select 200 examples from the development set of each dataset and craft their corresponding adversarial examples.

The major metric is the attack success rate, the percentage of successful attacks on the dataset. The second metric is the change rate of the generated adversarial samples; intuitively, fewer changes should cause a smaller semantic shift. Further, we run a human evaluation to measure the quality of the generated adversarial samples.
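The greedy replacement loop described at the top of this section can be sketched as below. For brevity this version returns on the first label-flipping substitution rather than keeping the most harmful one, and `predict` / `candidates` are stand-ins for the victim model and the generator's top-K predictions:

```python
# Simplified greedy attack loop: walk pieces in importance order and take
# the first substitution that flips the prediction. All components here
# are toy stand-ins, not the paper's fine-tuned models.
def greedy_attack(tokens, label, ranked_idx, candidates, predict):
    adv = list(tokens)
    for i in ranked_idx:                        # most vulnerable piece first
        for sub in candidates.get(adv[i], []):  # top-K candidate pieces
            trial = adv[:i] + [sub] + adv[i + 1:]
            if predict(trial) != label:         # successful attack
                return trial
    return None                                 # no adversarial sample found

tokens = ["公信力", "的", "坍塌"]
cands = {"坍塌": ["衰落", "下降"]}               # invented candidate list
predict = lambda toks: 1 if "坍塌" in toks else 0
adv = greedy_attack(tokens, 1, [2, 0, 1], cands, predict)
print(adv)
```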
https://github.com/LinyangLee/CN-TC-datasets
https://github.com/CLUEbenchmark/CLUE
https://github.com/google-research/bert

Dataset  Method        Ori Acc  Atk Acc  Ptb%        Human Consist%  Fluency
IflyTek  Char-R        60.0     9.0      2.7 (c)     90              4.2
         Word-R                 10.0     2.5 (w)     92              4.3
         Pce-R (ours)           11.0     2.7 (pce)   94              4.5
Weibo    Char-R        92.0     27.0     9.1 (c)     88              4.0
         Word-R                 30.0     8.7 (w)     89              4.0
         Pce-R (ours)           35.0     10.2 (pce)  90              4.2
Sogou    Char-R        93.5     15.0     3.2 (c)     90              3.8
         Word-R                 16.0     3.0 (w)     91              3.9
         Pce-R (ours)           18.5     4.4 (pce)   93              4.2
Law34    Char-R        93.0     10.0     2.8 (c)     94              4.5
         Word-R                 11.0     2.7 (w)     95              4.6
         Pce-R (ours)           12.0     4.2 (pce)   97              4.8
Table 3: Main Results of Generated Examples
We mix original samples with generated adversaries and ask human judges to predict the label and rate the fluency of the examples. Since there could be too many labels for human judges, we instead ask them to judge whether the given text matches its label or not. We also ask them to score the fluency on a scale from 1 to 5.
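The two human-evaluation numbers in Table 3 (label-consistency percentage and mean fluency) reduce to simple averages over the judges' ratings; a sketch with invented ratings:

```python
# Sketch of the human-evaluation metrics: each judgement records whether
# the text still matches its label (bool) and a 1-5 fluency rating.
# The ratings below are invented for illustration.
def human_metrics(judgements):
    n = len(judgements)
    consist = 100.0 * sum(ok for ok, _ in judgements) / n  # consistency %
    fluency = sum(f for _, f in judgements) / n            # mean fluency
    return round(consist, 1), round(fluency, 1)

ratings = [(True, 5), (True, 4), (False, 4), (True, 5)]
print(human_metrics(ratings))  # (75.0, 4.5)
```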
We set up strong baselines to compare with our method:

Char-Replace: We incorporate the original character-level Chinese BERT to generate adversarial samples by replacing characters. The hyper-parameters are the same as for our piece-level model.

Word-Replace: We use the 50-dimension word embeddings collected by Zhang and Yang (2018) and use cosine similarity to find the candidate list, as done by Jin et al. (2019). Since it is a word-level attack, we use the Jieba tokenization tool to tokenize the sequence. We use a threshold on the cosine similarity to constrain the quality of the candidates, and we use the roughly 60K most frequent words in the word embeddings.

https://github.com/fxsjy/jieba

As seen in Table 3, the generated adversarial samples successfully mislead the strong fine-tuned BERT-chinese models. Human judges also achieve high accuracy in predicting whether the classification is correct and give a high fluency score. Compared with the character-level and word-level methods, the piece-level adversarial samples achieve similar attacking results while maintaining a higher fluency score. Adversarial samples should be both harmful to the target models and semantically fluent; therefore the piece-level adversarial samples are better adversarial samples even though their success rate is lower than that of the character-replace and word-replace methods. The success rate could be raised by expanding the size of the candidate list, following Morris et al. (2020), so what matters most is the quality of the adversarial samples.

As seen in the appendix, the generated piece-level adversarial samples make successful attacks from different aspects and remain fluent, while the character-level adversarial samples are harder to comprehend. We can replace tokens with substitutes of similar meaning, and we can also replace them with irrelevant but fluent, label-preserving substitutes.
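The Word-Replace baseline's candidate selection can be sketched as a thresholded nearest-neighbour search over word embeddings; the vectors below are made-up toys, not the actual 50-dimension embeddings of Zhang and Yang (2018):

```python
import math

# Sketch of cosine-similarity candidate selection for the Word-Replace
# baseline. The 2-d embeddings are invented for illustration.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def candidate_list(word, embeddings, threshold=0.8, k=12):
    # nearest neighbours above a similarity threshold, at most top-k
    sims = [(cosine(embeddings[word], vec), w)
            for w, vec in embeddings.items() if w != word]
    return [w for s, w in sorted(sims, reverse=True) if s >= threshold][:k]

emb = {"今天": (1.0, 0.1), "今日": (0.9, 0.2), "苹果": (0.0, 1.0)}
print(candidate_list("今天", emb))
```

Only 今日 passes the threshold here; the unrelated word 苹果 is filtered out, which is the role of the similarity constraint in the baseline.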
Figure 2: Trade-off curve between candidate size K and attack success rate on Weibo, IflyTek, Sogou, and Law34.

A larger candidate size results in easier attacks. We therefore plot a trade-off curve showing that we can achieve a significantly higher attack success rate with a large candidate size. However, it is intuitive that a larger candidate size may also significantly harm the quality of the generated adversarial samples in both semantics and fluency. Therefore we believe that the success rate is not the most important metric, since maintaining fluency and semantics is a major concern in crafting adversarial samples. In our experiments, piece-level candidates are generally more natural to human judges.
In this paper, we propose a piece-level adversarial sample generation strategy for Chinese texts, which fills a gap in textual adversarial sample generation for languages other than English.

References
Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. CoRR, abs/1804.07998.

Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. 2018. Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069.

Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and Guoping Hu. 2019. Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

Zhendong Dong, Qiang Dong, and Changling Hao. 2010. HowNet and its computation of meaning. Pages 53–56.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2017. HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751.

Christiane Fellbaum. 1998. WordNet: An electronic lexical database. Bradford Books.

Siddhant Garg and Goutham Ramakrishnan. 2020. BAE: BERT-based adversarial examples for text classification. arXiv preprint arXiv:2004.01970.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328.

Di Jin, Zhijing Jin, Joey Tianyi Zhou, and Peter Szolovits. 2019. Is BERT really robust? Natural language attack on text classification and entailment. CoRR, abs/1907.11932.

Taku Kudo. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. arXiv preprint arXiv:1804.10959.

Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, Brussels, Belgium. Association for Computational Linguistics.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.

Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, and Xipeng Qiu. 2020. BERT-ATTACK: Adversarial attack against BERT using BERT. arXiv preprint arXiv:2004.09984.

John X. Morris, Eli Lifland, Jack Lanchantin, Yangfeng Ji, and Yanjun Qi. 2020. Reevaluating adversarial examples in natural language. ArXiv, abs/2004.14174.

Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. In Proceedings of HLT-NAACL.

Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations.

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting adversarial input sequences for recurrent neural networks. In MILCOM 2016 - 2016 IEEE Military Communications Conference, pages 49–54. IEEE.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1532–1543.

Shuhuai Ren, Yihe Deng, Kun He, and Wanxiang Che. 2019. Generating natural language adversarial examples through probability weighted word saliency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1085–1097.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Zhouxing Shi, Minlie Huang, Ting Yao, and Jingfang Xu. 2019. Robustness to modification with shared words in paraphrase identification. CoRR, abs/1909.02560.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Bright Xu. 2019. NLP Chinese corpus: Large scale Chinese corpus for NLP.

Yang You, Jing Li, Jonathan Hseu, Xiaodan Song, James Demmel, and Cho-Jui Hsieh. 2019. Reducing BERT pre-training time from 3 days to 76 minutes. CoRR, abs/1904.00962.

Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, and Maosong Sun. 2020. Word-level textual adversarial attacking as combinatorial optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6066–6080.

Yue Zhang and Jie Yang. 2018. Chinese NER using lattice LSTM. ArXiv.

A Appendices
Here we provide some of the generated adversarial samples. As seen in Table 4, piece-level substitutions enable various strategies for crafting adversaries:
• Synonym Replacing: using synonyms or similar pieces as substitutions.
• Rewrite: paraphrasing the given pieces.
• Expansion: expanding the given pieces.
Dataset: Law
Ori: 被告人黎某甲伙同绰号“杨某”等人去到该房内对严某乙进行殴打，之后黎某甲等人又拿着铁棍、木棍等追赶乘坐小汽车离开的严某乙至苍梧大道广场罗马柱对出街道处，将小汽车的挡风玻璃等物品砸烂，并将严某乙打伤。经法医鉴定严某乙所受损伤构成轻伤。
(The defendant Li, along with associates nicknamed "Yang" et al., went to the room to beat Yan. Then they chased Yan, who was leaving in a car, with iron rods and wooden sticks to the street facing the Roman pillar of Cangwu Avenue Square, smashed the windshield of the car, and injured Yan. According to the forensic examination, Yan's injury constituted a minor injury.)
Label: 故意伤害 (Intentional injury)
Adv: 被告人黎某甲伙同绰号“杨某”等人去到该房内对严某乙进行殴打，之后黎某甲等人又拿着铁棍、木棍等追赶乘坐小汽车离开的严某乙至苍梧大道广场罗马柱对出街道处，将小汽车的挡风玻璃等物品砸烂，并将严某乙重伤。经法医鉴定严某乙所受损伤构成轻伤。
(The defendant Li, along with associates nicknamed "Yang" et al., went to the room to beat Yan. Then they chased Yan, who was leaving in a car, with iron rods and wooden sticks to the street facing the Roman pillar of Cangwu Avenue Square, smashed the windshield of the car, and seriously injured Yan. According to the forensic examination, Yan's injury constituted a minor injury.)
Label: 酒驾 (Drunk driving)

Dataset: Weibo
Ori: 你竟然还给我唱《夜上海》《给我一个吻》啊啊啊啊啊啊啊啊啊啊啊啊!!!
(You even sang "Night Shanghai" and "Give Me a Kiss" to me. OMG!)
Label: 开心 (Happy)
Adv: 你还可以给我唱的《夜上海》《给我一个吻》啊啊啊啊啊啊啊啊啊啊啊啊!!!
(You could also sing "Night Shanghai" and "Give Me a Kiss" to me. OMG!)
Label: 嫌弃 (Disgust)

Table 4: Generated adversarial samples.

Dataset: IflyTek