Enhancing Model Robustness By Incorporating Adversarial Knowledge Into Semantic Representation
Jinfeng Li∗, Tianyu Du, Xiangyu Liu, Rong Zhang, Hui Xue, Shouling Ji
Alibaba Group, Hangzhou, China; Zhejiang University, Hangzhou, China
ABSTRACT
Despite the enormous success that deep neural networks (DNNs) have achieved in many domains such as natural language processing (NLP), they have also been proven vulnerable to maliciously generated adversarial examples. This inherent vulnerability threatens various real-world deployed DNN-based applications. To strengthen model robustness, several countermeasures have been proposed in the English NLP domain and have obtained satisfactory performance. However, due to the unique language properties of Chinese, it is not trivial to extend existing defenses to the Chinese domain. Therefore, we propose AdvGraph, a novel defense which enhances the robustness of Chinese-based NLP models by incorporating adversarial knowledge into the semantic representation of the input. Extensive experiments on two real-world tasks show that AdvGraph exhibits better performance compared with previous work: (i) effective – it significantly strengthens model robustness even under the adaptive attack setting, without negative impact on model performance over legitimate input; (ii) generic – its key component, i.e., the representation of connotative adversarial knowledge, is task-agnostic and can be reused in any Chinese-based NLP model without retraining; and (iii) efficient – it is a lightweight defense with sub-linear computational complexity, which guarantees the efficiency required in practical scenarios.
Index Terms — Adversarial examples, model robustness, adversarial defense
1. INTRODUCTION
Deep neural networks (DNNs) have recently revolutionized natural language processing (NLP) with their impressive performance on many tasks, including text classification [1, 2, 3], machine translation [4, 5] and question answering [6, 7]. Such advances have led to the broad deployment of DNN-based systems on important tasks in the physical world. However, recent studies have revealed that DNNs are inherently vulnerable to adversarial examples, which are maliciously crafted by adding small perturbations to benign input that trigger the target DNNs to misbehave [8, 9, 10]. This vulnerability has raised great concerns about the security and deployment of DNNs in real-world tasks, especially security-sensitive ones.

In the meantime, DNN-based text classification has been the backbone technique behind online toxic content censorship systems [11, 12], which are widely used to automatically detect toxic user-generated content for purifying online social networks, and have achieved great success in replacing time-consuming and laborious manual censorship. Nevertheless, many malicious netizens in online social networks obfuscate their insulting comments by replacing the toxic words with their variants (also known as morphs) to evade the detection of censorship systems [13], and the situation is even worse on Chinese social media [14]. For instance, to make their insulting comments evasive, malicious netizens may obfuscate some toxic words in their comments with the corresponding variants, such as substituting "智障" (idiot) with "智樟" as shown in Fig. 1. These variants are usually visually or phonetically similar to the original words and thus retain the toxic meaning from the human perspective, owing to the powerful cognitive and perceptual capabilities of human beings. However, due to the inherent vulnerability of DNNs to adversarial examples, DNN-based censorship systems can easily be deceived into making wrong decisions on these adversarial texts. It was reported that major social media platforms like Twitter, Facebook, and Sina Weibo were all criticized for not doing enough to curb the diffusion of such toxic content and are under pressure to cleanse their platforms [15, 16]. Therefore, it is urgent to develop defense countermeasures that improve the robustness of real-world deployed systems against such adversarial texts.

Fig. 1: User-generated obfuscated texts which bypass the commercial text moderation systems of Baidu and Netease. The characters in brackets are the original ones while those in red are the variants.
– 专业婇僄(彩票)指导, 莓夭(每天)稳贝兼(赚)800, 详婧茄莪崴(请加我微). (Professional lottery guide that helps you earn at least ¥800 every day. For more details, please add my WeChat.) Task: Antispam. Classifier: TextCNN+AdvGraph. Result: 86% Spam.
– 真他蚂(妈)稀拦(烂), 电影两个小时的信息量竟然比2分钟的预告片还少, 难道是拍给智樟(障)看的吗? (It's damn trash. The two-hour movie is even less informative than the two-minute trailer. Is it filmed for idiots?) Task: Sentiment Analysis. Classifier: BiLSTM+AdvGraph. Result: 92% Negative.

∗ Contact e-mail: [email protected]

To this end, several defense methods have been proposed in the English NLP domain, which can be summarized as adversarial training and spelling correction. Specifically, Wang et al. [17] and Cheng et al. [18] proposed to retrain NLP models with diversified adversarial training data and showed a marginal increase in robustness. Zhou et al.
[19] proposed a spelling correction-based framework for blocking adversarial texts, which first identifies the perturbed tokens with a discriminator and then recovers the tokens from the discriminated perturbations with a masked language model objective using contextualized language modeling. Similarly, Li et al. [9] adopted a context-aware spelling correction method to mitigate editorial adversarial attacks and achieved satisfactory performance.

However, although these methods have been shown to be effective in enhancing the robustness of English-based NLP models, it is still intractable to extend them to the Chinese NLP domain due to the following unique properties of Chinese: (i) complexity – Chinese is a logographic language without word delimiters, in which each character is individually meaningful and the meaning of each character changes dramatically with its context, which makes Chinese NLP models inherently more vulnerable; (ii) sparsity and diversity – there is an extremely large and sparse character space (i.e., there are more than 50,000 Chinese characters, of which only about 3,500 are commonly used), and each character might be perturbed by various variation strategies (e.g., glyph- and phonetic-based strategies) [13], which makes the variants more diverse and sparse and thus limits the efficacy of common defenses; and (iii) dynamicity – in the context of toxic content detection on Chinese online social media or e-commerce platforms, the arms race between adversaries and defenders is extremely fierce [20], and static defense methods like adversarial training and spelling correction are usually only effective against known variants and remain vulnerable to new attacks. Hence, the defense against adversarial texts in the Chinese NLP domain remains a challenging and unsolved problem.

Fig. 2: The framework of our defense approach. The letters "P" and "G" in the adversarial graph denote the phonetic-based and glyph-based variation relationship, respectively.

To tackle the aforementioned challenges, in this paper we propose AdvGraph, a novel defense that enhances the inherent robustness of Chinese-based NLP models by incorporating adversarial knowledge into the semantic representation of input texts (shown in Fig. 2). At a high level, we first construct an undirected adversarial graph based on the glyph and phonetic similarity of Chinese characters to model the adversarial relationship between characters explicitly. The intuition is that real-world adversarial texts are usually generated using glyph-based or phonetic-based perturbations [21]. Then, we leverage a graph embedding scheme to learn the node representation of the adversarial graph, capturing the prior adversarial knowledge between characters.
Finally, we incorporate the node representation into the semantic representation through multimodal fusion, and the fused semantic-rich representation is then ready for the downstream tasks. Through extensive experiments, we show that AdvGraph can greatly improve the inherent model robustness even under the adaptive attack setting, without negative impact on model performance over benign texts, outperforming the baselines by a significant margin.
2. PROPOSED METHOD

2.1. Problem definition
Consider a Chinese-based text classifier F : X → Y that maps input from the feature space X to the label space Y. An adversary who has query access to the classification confidence returned by F aims to generate an adversarial text x_adv from a legitimate input x ∈ X containing N characters (i.e., x = {x_1, x_2, ..., x_N}) with ground-truth label y ∈ Y, such that F(x_adv) ≠ F(x). In this paper, we aim to defend against such attacks by incorporating the adversarial knowledge learnt from an adversarial graph into the semantic representation of the input. Our defense is then formalized as

F(x_adv) = F_f(E_s(x_adv) ⊕ E_g(x_adv)) = F(x) = y,    (1)

where F_f(·) is the classification function of F, E_s(x_adv) is the semantic representation of x_adv, E_g(x_adv) is the adversarial representation learnt from an adversarial graph G_adv, and ⊕ is the fusing operation which incorporates the adversarial knowledge into the semantic representation to form a semantic-rich representation.

(Footnote: https://en.wikipedia.org/wiki/Chinese_characters)

Fig. 3: Architecture of the glyph representation model.

The framework of AdvGraph is presented in Fig. 2; it is built upon an adversarial graph, semantic embedding and multimodal fusion. Below, we elaborate on each of these backbone techniques.
Adversarial graph.
The variation association among Chinese characters is usually a many-to-many relationship, i.e., a Chinese character may have multiple variants, and it may itself be a variant of different characters. The relationship is symmetric, since two characters that are visually or phonetically similar are variants of each other. Hence, we first leverage an undirected graph to model such adversarial relationships explicitly, in which each node denotes a Chinese character and an edge is formed if two characters have a variation relationship. Then, we learn the representation of the adversarial relationships among characters using graph embedding.

(1) Graph construction.
The adversarial graph G_adv is built based on the glyph and phonetic similarity of Chinese characters, since real-world adversarial texts are usually generated through glyph- and phonetic-based perturbations [21]. For the phonetic-based variation relation, we first convert each character into its pinyin form and then calculate the edit distance between their pinyin. Two characters are viewed as similar if the corresponding edit distance equals 0 or 1 (only the removal operation is considered), and an edge is then formed between them. Different from phonetic similarity, glyph similarity cannot be directly quantified since it is reflected through human visual perception. It is also intractable to manually collect visually similar characters due to the extremely large character space of Chinese. To tackle this challenge, we first convert each character into a fixed-size image, and then implement a convolutional neural network g-CNN (shown in Fig. 3), as done in [13], to learn the glyph representation of each character over the converted image. Different from [13], our g-CNN is trained on 10,000 manually labelled triplets (x_i, x_i^+, x_i^-) over 3,000 commonly used characters, in which (x_i, x_i^+) and (x_i, x_i^-) are similar and dissimilar character pairs, respectively. The glyph representation is learned by minimizing the triplet loss [22] defined as

L = Σ_{i=1}^{M=10,000} [ ||h(x_i) − h(x_i^+)||² − ||h(x_i) − h(x_i^-)||² + α ]_+,    (2)

where h(x_i) is the hidden representation of character x_i and α is a margin hyper-parameter. Then, the glyph-based variation relation is built by calculating the similarity between glyph representations h(x_i), i.e., each character in G_adv is connected to its top-10 visually similar characters. The constructed G_adv is illustrated in Fig. 2.
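The phonetic edge rule from the graph-construction step (pinyin edit distance 0, or 1 with only removal allowed) can be sketched as follows. This is a hypothetical illustration: the tiny hard-coded pinyin table stands in for a full pinyin dictionary, and only a handful of characters are shown.

```python
from itertools import combinations

# Illustrative pinyin table (a real system would use a complete pinyin lookup).
PINYIN = {"妈": "ma", "蚂": "ma", "每": "mei", "莓": "mei", "天": "tian", "夭": "yao"}

def phonetically_similar(p, q):
    """Edit distance 0 (identical pinyin) or 1 where only removal counts:
    deleting one letter from the longer string must yield the shorter one."""
    if p == q:
        return True
    if abs(len(p) - len(q)) != 1:
        return False
    longer, shorter = (p, q) if len(p) > len(q) else (q, p)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

# Form an undirected edge for every phonetically similar character pair.
edges = {frozenset(pair) for pair in combinations(PINYIN, 2)
         if phonetically_similar(PINYIN[pair[0]], PINYIN[pair[1]])}
```

Note that the glyph pair 天/夭 correctly receives no phonetic edge here; glyph-based edges come from the g-CNN similarity instead.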
(2) Graph embedding. To easily incorporate the complex adversarial knowledge into the semantic representation of the input, we first map the adversarial relationships in G_adv to the feature space by learning node representations with a graph embedding scheme. Concretely, we adopt node2vec [23] to learn the node representation of G_adv. Given G_adv = (V, E), the mapping function f : V → R^d from the node set V to the feature space is learnt based on the idea of Skip-gram [24] by maximizing the objective

L(f, θ) = Σ_{x_i ∈ V} log( Π_{x_j ∈ N_S(x_i)} p(x_j | f(x_i)) ),    (3)

where θ is the parameter of f, p(x_j | f(x_i)) is the probability of observing the neighbor node x_j of node x_i conditioned on its feature representation, and N_S(x_i) is the neighborhood of x_i obtained by sampling strategies including both breadth-first sampling (BFS) and depth-first sampling (DFS). In particular, BFS helps to model potential adversarial relationships based on structural equivalence, i.e., nodes with similar structural roles in G_adv are embedded closely since they might face the same variation situation in adversarial scenarios. DFS helps to model direct adversarial relationships based on homophily, i.e., nodes that are highly interconnected or belong to similar clusters are represented closely since they usually express the same meaning in adversarial texts. Through this framework, we learn not only the adversarial knowledge in direct variation relationships, but also the knowledge in more complex variation relationships (e.g., the secondary variation "微" → "崴" shown in Figs. 1 and 2, where "崴" does not have a direct relationship with "微" but is a common variant of "微" in spam advertisements), which usually cannot be handled by previous defenses.

Note that the process of building G_adv and learning the node representation is task-agnostic and is done offline. Hence, once learnt, the adversarial representation can be reused in any Chinese NLP model, which greatly improves the generality and efficiency of the defense.

(Footnote: Pinyin is the official Mandarin romanization system for standard Chinese.)

Semantic embedding.
We can either use classical Word2Vec models such as CBoW and Skip-gram [25], or directly utilize the pre-trained embeddings of language models like BERT [2], to obtain the semantic embedding. To enable a fair comparison that better verifies the efficacy of the proposed defense, we adopt the Skip-gram model and learn the semantic embedding from scratch.
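As a brief illustration of what the Skip-gram objective consumes, here is a minimal sketch of generating (center, context) training pairs from a character-tokenized text. This is a simplification: real training also involves negative sampling and the embedding lookup itself, both omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within a symmetric window,
    the unit of supervision in Skip-gram training."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# Character-level pairs for a short Chinese string.
pairs = skipgram_pairs(list("智障"), window=1)  # [('智', '障'), ('障', '智')]
```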
Fusion and classification.
To improve the model robustness, we leverage intermediate multimodal fusion [26], a fine-grained fusion scheme, to incorporate the node representation into the semantic representation. Specifically, we design two unimodal models φ^(g)(·) and φ^(s)(·) for the graph and semantics modalities, respectively. Hence, for a given adversarial text x_adv = {x'_1, x'_2, ..., x'_N}, its adversarial representation E_g(x_adv) is denoted by the last hidden output of φ^(g), i.e., E_g(x_adv) = φ^(g)([f(x'_1), f(x'_2), ..., f(x'_N)]), and the semantic representation E_s(x_adv) can be obtained similarly. Then, we concatenate E_g(x_adv) and E_s(x_adv) to form a semantic-rich representation for the downstream classification tasks, i.e.,

F(x_adv) = argmax_ŷ  e^{F_ŷ(E_g(x_adv) ⊕ E_s(x_adv))} / Σ_{i=1}^{C} e^{F_i(E_g(x_adv) ⊕ E_s(x_adv))},    (4)

where F_i is the confidence of the i-th class, C is the total number of classes, and the defense is viewed as successful if ŷ = y.
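A toy numeric sketch of the fuse-then-classify step in Eq. (4), with concatenation as the fusion operation ⊕ and a plain softmax over per-class scores. The linear scoring weights below are made-up illustrative values, not the paper's learned parameters.

```python
import math

def fuse(e_g, e_s):
    """Fusion operator: concatenate graph and semantic representations."""
    return e_g + e_s

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(e_g, e_s, class_weights):
    """Score the fused representation with one weight vector per class,
    then pick the argmax class as in Eq. (4)."""
    fused = fuse(e_g, e_s)
    scores = [sum(w * x for w, x in zip(weights, fused))
              for weights in class_weights]
    probs = softmax(scores)
    return max(range(len(probs)), key=lambda i: probs[i]), probs

# Illustrative 2-dim graph and semantic embeddings, two classes.
label, probs = classify([0.2, 0.8], [0.5, 0.1],
                        class_weights=[[1.0, 0.0, 1.0, 0.0],
                                       [0.0, 1.0, 0.0, 1.0]])  # label -> 1
```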
3. EXPERIMENT

3.1. Experimental setting

Datasets. AdvGraph is evaluated on two classification datasets: (i) Douban Short Movie Comments (DMSC) [27]: a public benchmark for Chinese sentiment analysis consisting of millions of movie comments; and (ii) Spam Advertisement (SpamAds): a real-world dataset of 100,000 user comments collected from the e-commerce platform Taobao, of which
Table 1: Model performance in the non-adversarial scenario. Avg-conf is the average confidence on correctly classified texts.

Model             | Antispam Accuracy | Antispam Avg-conf | Sentiment Accuracy | Sentiment Avg-conf
TextCNN           | 0.928             | 0.944             | 0.874              | 0.873
TextCNN+SC        | 0.920             | 0.936             | 0.864              | 0.867
TextCNN+AdvGraph  | —                 | —                 | —                  | —
BiLSTM            | 0.893             | 0.894             | 0.851              | 0.849
BiLSTM+SC         | 0.886             | 0.887             | 0.845              | 0.844
BiLSTM+AdvGraph   | —                 | —                 | —                  | —
Setup. We adopt the black-box attack TextBugger [9] to evaluate the efficacy of AdvGraph. We use the attack success rate (ASR) and the average number of perturbed words (perturbation) in the adversarial texts to measure attack performance, and we use semantic similarity [28] as well as adversarial similarity (i.e., the maximum of the phonetic and glyph similarity between the original and generated texts) to quantify the quality of the generated adversarial texts. We experiment with two commonly used models, TextCNN [1] and BiLSTM [3], to evaluate the generalizability of AdvGraph across architectures. In addition, we compare AdvGraph with a spelling correction-based approach (SC) [29].
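For concreteness, the two attack-side metrics can be computed as follows. This is a hypothetical helper, not the authors' evaluation code; `results` is assumed to be a list of per-sample records carrying a success flag and a perturbed-word count, and averaging the perturbation over successful attacks is one reasonable reading of the metric.

```python
def attack_metrics(results):
    """Compute the attack success rate (ASR) and the average number of
    perturbed words over the successfully attacked samples."""
    successes = [r for r in results if r["success"]]
    asr = len(successes) / len(results)
    avg_perturbation = (sum(r["n_perturbed"] for r in successes) / len(successes)
                        if successes else 0.0)
    return asr, avg_perturbation

# Toy run: 2 of 4 attack attempts succeed, with 1 and 3 perturbed words.
asr, pert = attack_metrics([
    {"success": True,  "n_perturbed": 1},
    {"success": False, "n_perturbed": 0},
    {"success": True,  "n_perturbed": 3},
    {"success": False, "n_perturbed": 0},
])  # asr = 0.5, pert = 2.0
```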
We first evaluate the effectiveness of AdvGraph in the non-adversarial scenario, to verify whether the added defense has a negative impact on model performance on legitimate texts. The main results are summarized in Table 1, from which we can see that all the TextCNN and BiLSTM models achieve considerably high accuracy on the two tasks. However, the accuracy decreases by about 1% when SC is adopted as the defense method. This is mainly because the SC method makes erroneous corrections on some legitimate texts, which negatively impacts the downstream classification. In comparison, the accuracy of the models defended by AdvGraph is promoted in most cases, which indicates that AdvGraph does not affect model performance in the non-adversarial scenario. In addition, the average confidence of the models protected by AdvGraph on correctly classified samples is higher than that of both the undefended models and the models defended by SC. This shows that the models defended by AdvGraph have learned better decision boundaries, separating samples of different classes more cleanly. We argue that this mainly benefits from the semantic-rich representation obtained by multimodal fusion, which introduces knowledge into the final decision that can hardly be learned by simply increasing the amount of data.
Table 2: Model performance on user-generated obfuscated texts.

Model             | Antispam Accuracy | Antispam Perturbation | Sentiment Accuracy | Sentiment Perturbation
TextCNN           | 0.630             | 1.23                  | 0.669              | 1.16
TextCNN+SC        | 0.758             | 1.47                  | 0.734              | 1.25
TextCNN+AdvGraph  | —                 | —                     | —                  | —
BiLSTM            | 0.618             | 1.19                  | 0.622              | 1.14
BiLSTM+SC         | 0.743             | 1.41                  | 0.715              | 1.22
BiLSTM+AdvGraph   | —                 | —                     | —                  | —
We then evaluate the efficacy of AdvGraph in mitigating user-generated obfuscated texts. Specifically, we first collect 1,000 obfuscated spam ads and 1,000 obfuscated negative movie comments from e-commerce platforms and online social media, respectively. Each collected text is manually confirmed to contain at least one variant (as shown in Fig. 1). The efficacy is then evaluated on these texts in terms of accuracy and the average number of perturbations in the correctly handled texts.

The evaluation results are shown in Table 2. Both the SC-based defense and AdvGraph promote the model performance on user-generated obfuscated texts. For instance, the accuracy of TextCNN in the antispam task is promoted by 0.128 when leveraging the SC-based defense, and is promoted even further when adopting AdvGraph. This indicates that both methods are effective in the static adversarial scenario and that AdvGraph still outperforms the baseline.

Table 3: The attack performance against all the target models under the adaptive setting.

Model             | Antispam ASR | Antispam Pert. | Antispam Adv. Sim. | Antispam Sem. Sim. | Sentiment ASR | Sentiment Pert. | Sentiment Adv. Sim. | Sentiment Sem. Sim.
TextCNN           | 0.769        | 1.63           | 0.917              | 0.874              | 0.703         | 2.07            | 0.911               | 0.832
TextCNN+SC        | 0.763        | 1.56           | 0.919              | 0.873              | 0.673         | 2.02            | 0.902               | 0.831
TextCNN+AdvGraph  | —            | —              | —                  | —                  | —             | —               | —                   | —
BiLSTM            | 0.757        | 1.97           | 0.903              | 0.858              | 0.759         | 2.04            | 0.916               | 0.831
BiLSTM+SC         | 0.738        | 1.92           | 0.931              | 0.872              | 0.716         | 1.99            | 0.910               | 0.837
BiLSTM+AdvGraph   | —            | —              | —                  | —                  | —             | —               | —                   | —

Fig. 4: The impact of the maximum perturbation allowed on ASR, for (a) antispam and (b) sentiment analysis.
Finally, we evaluate the efficacy of AdvGraph under the adaptive attack setting, in which attackers know our defense [30]. Under this setting, attackers can explore the vulnerability of both the target model and the defense through query access to the whole pipeline. This is a realistic worst-case setting, since attackers in real adversarial scenarios usually have only black-box query access to the target models, and they will adopt new variation strategies to evade the defense once they perceive it. In this evaluation, we apply TextBugger to mimic real-world adversaries, and the adversarial texts are generated from 1,000 correctly classified samples randomly sampled for each task. The maximum perturbation allowed per text is 4, since the average length of the sampled texts is about 40.
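To make such an adversary concrete, a generic greedy query-based substitution loop can be sketched as follows. This is a simplification of how black-box attacks in the TextBugger family operate, not the authors' exact procedure; the toy `confidence` oracle and variant table below are illustrative stand-ins.

```python
def greedy_attack(text, variants, confidence, max_perturb=4):
    """Greedy black-box substitution: at each step, query the model on every
    candidate single-character swap and keep the swap that lowers the
    toxic-class confidence most; stop once the label flips (confidence drops
    below 0.5) or the perturbation budget is exhausted."""
    chars = list(text)
    for _ in range(max_perturb):
        if confidence(chars) < 0.5:
            break
        best_swap, best_conf = None, confidence(chars)
        for i, c in enumerate(chars):
            for v in variants.get(c, []):
                trial = chars[:i] + [v] + chars[i + 1:]
                if confidence(trial) < best_conf:
                    best_swap, best_conf = (i, v), confidence(trial)
        if best_swap is None:
            break  # no remaining swap reduces the confidence
        i, v = best_swap
        chars[i] = v
    return "".join(chars)

# Toy oracle: highly confident "toxic" whenever the original character appears.
toy_confidence = lambda chars: 0.9 if "障" in chars else 0.2
adv = greedy_attack("智障", {"障": ["樟"]}, toy_confidence)  # -> "智樟"
```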
Attack performance.
The results of the adaptive attack are presented in Table 3. There is only an unnoticeable reduction in ASR against the target models when the SC-based method is used as the defense. By analyzing the corrected texts output by the SC model, we find that the SC model itself is vulnerable to adaptive attacks, while the efficacy of the SC-based defense depends entirely on its correction performance, leading to limited effectiveness in the adaptive attack scenario. On the contrary, the ASR against the models protected by AdvGraph is greatly decreased and the required perturbations increase across all cases, outperforming the baselines by a significant margin. In addition, the adversarial similarity and semantic similarity metrics show that the quality of the generated adversarial texts is also worse. This demonstrates that our proposed defense is more robust against the adaptive attack and is more effective in weakening the attack threat as well as increasing the attack cost.

Fig. 5: The model sensitivity against perturbations in the antispam task, for (a) TextCNN and (b) BiLSTM.
Impact of maximum perturbation on ASR.
We also investigate the impact of the maximum number of perturbations allowed per text on the ASR against all the target models. The results are shown in Fig. 4, from which we can see that AdvGraph exhibits good performance in mitigating the attack power: the ASR against the target models defended by AdvGraph increases only slightly as the allowed maximum number of perturbations grows, outperforming the baselines by a significant margin. In contrast, the defense efficacy of the SC-based method is negligible, which once again shows that it is not practical in the adaptive adversarial scenario.
Sensitivity analysis.
We further analyze the model sensitivity to each perturbation when generating adversarial texts, taking the antispam task as an example. The cumulative distribution of the sensitivity score, i.e., the reduction in classification confidence after each perturbation, is visualized in Fig. 5. The sensitivity scores of the models defended by AdvGraph are much smaller than those of the undefended models and the models defended by SC, with an even more obvious trend on the BiLSTM models. This indicates that AdvGraph does enhance the inherent robustness of the target models, effectively mitigating their sensitivity to adversarial perturbations.
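The sensitivity score and its empirical CDF can be computed directly from the per-step confidences. This is a hypothetical sketch; `confidences` is assumed to hold the model's confidence on the correct class after each successive perturbation, starting from the unperturbed text.

```python
def sensitivity_scores(confidences):
    """Reduction in classification confidence caused by each perturbation."""
    return [confidences[k] - confidences[k + 1]
            for k in range(len(confidences) - 1)]

def empirical_cdf(scores, threshold):
    """Fraction of sensitivity scores at or below the given threshold."""
    return sum(s <= threshold for s in scores) / len(scores)

# Confidence trajectory 0.90 -> 0.70 -> 0.65 under two perturbations.
scores = sensitivity_scores([0.90, 0.70, 0.65])  # ~[0.20, 0.05]
```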
4. CONCLUSION
In this paper, we introduce a novel defense specifically designed for Chinese-based NLP models, which greatly enhances model robustness by incorporating adversarial knowledge into the semantic representation of the input via a carefully built adversarial relationship graph. Extensive evaluations on two real-world tasks show that
AdvGraph exhibits excellent performance in defending against user-generated obfuscated texts as well as adaptive adversarial attacks, without negative impact on model performance over benign inputs. Although the proposed method is only evaluated on Chinese tasks, we argue that its basic idea can be extended to other languages such as English, and evaluating its generalizability across languages is a promising direction for future work.

5. REFERENCES

[1] Yoon Kim, "Convolutional neural networks for sentence classification," arXiv preprint arXiv:1408.5882, 2014.
[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[3] Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, and Bo Xu, "Attention-based bidirectional long short-term memory networks for relation classification," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016, pp. 207–212.
[4] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[5] Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," arXiv preprint arXiv:1409.1259, 2014.
[6] Daniele Bonadiman, Antonio Uva, and Alessandro Moschitti, "Effective shared representations with multitask learning for community question answering," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017, pp. 726–732.
[7] Merve Ünlü, Ebru Arisoy, and Murat Saraçlar, "Question answering for spoken lecture processing," in ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 7365–7369.
[8] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.
[9] Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, and Ting Wang, "TextBugger: Generating adversarial text against real-world applications," in , 2019.
[10] Melika Behjati, Seyed-Mohsen Moosavi-Dezfooli, Mahdieh Soleymani Baghshah, and Pascal Frossard, "Universal adversarial attacks on text classifiers," in ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 7345–7349.
[11] Mladen Karan and Jan Šnajder, "Cross-domain detection of abusive language online," in Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), 2018, pp. 132–137.
[12] Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang, "Abusive language detection in online user content," in Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 145–153.
[13] Jinfeng Li, Tianyu Du, Shouling Ji, Rong Zhang, Quan Lu, Min Yang, and Ting Wang, "TextShield: Robust text classification based on multimodal embedding and neural machine translation," in , 2020.
[14] Le Chen, Chi Zhang, and Christo Wilson, "Tweeting under pressure: Analyzing trending topics and evolving word choice on Sina Weibo," in Proceedings of the First ACM Conference on Online Social Networks, 2013, pp. 89–100.
[15] Joe Mayes and Stefan Nicola, "Facebook warns it can't fully solve toxic content problem," 2019.
[16] Hui Zhang, "Weibo portals close for week following 'harmful content' criticism from regulator," 2018.
[17] Yicheng Wang and Mohit Bansal, "Robust machine comprehension models via adversarial training," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), 2018, pp. 575–581.
[18] Yong Cheng, Lu Jiang, Wolfgang Macherey, and Jacob Eisenstein, "AdvAug: Robust adversarial augmentation for neural machine translation," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 5961–5970.
[19] Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, and Wei Wang, "Learning to discriminate perturbations for blocking adversarial attacks in text classification," arXiv preprint arXiv:1909.03084, 2019.
[20] Ao Li, Zhou Qin, Runshi Liu, Yiqun Yang, and Dong Li, "Spam review detection with graph convolutional networks," in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2703–2711.
[21] C.-L. Liu, M.-H. Lai, K.-W. Tien, Y.-H. Chuang, S.-H. Wu, and C.-Y. Lee, "Visually and phonologically similar characters in incorrect Chinese words: Analyses, identification, and applications," ACM Transactions on Asian Language Information Processing (TALIP), vol. 10, no. 2, pp. 1–39, 2011.
[22] Florian Schroff, Dmitry Kalenichenko, and James Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
[23] Aditya Grover and Jure Leskovec, "node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
[24] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[25] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[26] Jennifer Williams, Ramona Comanescu, Oana Radu, and Leimin Tian, "DNN multimodal fusion techniques for predicting video sentiment," in Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML)