Adversarial Examples on Graph Data: Deep Insights into Attack and Defense
Huijun Wu, Chen Wang, Yuriy Tyshetskiy, Andrew Docherty, Kai Lu, Liming Zhu
University of New South Wales, Australia; Data61, CSIRO; National University of Defense Technology, China
{first, second}@data61.csiro.au, [email protected]

Abstract
Graph deep learning models, such as graph convolutional networks (GCN), achieve remarkable performance for tasks on graph data. Similar to other types of deep models, graph deep learning models often suffer from adversarial attacks. However, compared with non-graph data, the discrete features, graph connections and different definitions of imperceptible perturbations bring unique challenges and opportunities for adversarial attacks and defenses on graph data. In this paper, we propose both attack and defense techniques. For the attack, we show that the discreteness problem can be resolved by introducing integrated gradients, which accurately reflect the effect of perturbing certain features or edges while still benefiting from parallel computation. For the defense, we observe that the adversarially manipulated graph for a targeted attack differs from normal graphs statistically. Based on this observation, we propose a defense approach which inspects the graph and recovers the potential adversarial perturbations. Our experiments on a number of datasets show the effectiveness of the proposed methods.
Introduction

Graphs are commonly used to model many real-world relationships, such as social networks [Newman et al., 2002], citation networks, transactions [Ron and Shamir, 2013] and the control flow of programs [Allen, 1970]. Recent advances in deep learning [Kipf and Welling, 2017; Veličković et al., 2018; Cao et al., 2016; Henaff et al., 2015] have expanded its applications to graph data. One common task on graph data is node classification: given a graph and the labels of a portion of the nodes, the goal is to predict the labels of the unlabelled nodes. This can be used to classify unknown roles in a graph, for example, the topics of papers in a citation network or the types of customers in a recommendation system. Compared with classic methods [Bhagat et al., 2011; Xu et al., 2013], deep learning has pushed forward the performance of node classification tasks. Graph convolutional networks [Bruna et al., 2013; Edwards and Xie, 2016] and their recent variants [Kipf and Welling, 2017] perform convolution operations in the graph domain by aggregating and combining the information of neighbor nodes. In these works, both node features and graph structure (i.e., edges) are considered when classifying nodes.

Deep learning methods are often criticized for their lack of robustness [Goodfellow et al., 2015]. In other words, it is not difficult to craft adversarial examples that fool deep neural networks into giving incorrect predictions by perturbing only a tiny portion of an example. Graph convolutional networks are no exception. These vulnerabilities to adversarial attacks are major obstacles to applying deep learning in safety-critical scenarios. In graph neural networks, a node can be a user in a social network or on an e-commerce website. A malicious user may manipulate his profile or connect to targeted users on purpose to mislead the analytics system. Similarly, adding fake comments to specific products can fool the recommender system of a website.

The key challenge in directly adopting existing adversarial attack techniques from non-graph data to graph convolutional networks is the discrete input problem. Specifically, the features of graph nodes are often discrete, and the edges, especially those in unweighted graphs, are discrete as well. To address this, some recent studies have proposed greedy methods [Wang et al., 2018; Zügner et al., 2018] to attack graph-based deep learning systems: a greedy method perturbs either features or graph structure iteratively, while the graph structure and feature statistics are preserved during the attack. In this paper, we show that despite the discrete input issue, the gradients can still be approximated accurately by integrated gradients. Integrated gradients approximate Shapley values [Hart, 1989; Lundberg and Lee, 2016] by integrating partial gradients with respect to the input features from a reference input to the actual input. Integrated gradients greatly improve the efficiency of node and edge selection in comparison to iterative methods.

Compared with the exploration of attacks, the defense against adversarial examples in graph models is not well studied. In this paper, we show that one key reason for the vulnerability of graph models, such as GCN, is that these models essentially aggregate features according to the graph structure: they heavily rely on nearest-neighbor information when making predictions on target nodes. We looked into the perturbations made by the existing attack techniques
and found that adding edges which connect to nodes with different features plays the key role in all of the attack methods. In this paper, we show that simply performing pre-processing on the adjacency matrix of the graph is able to identify the manipulated edges. For nodes with bag-of-words (BOW) features, the Jaccard index is effective for measuring the similarity between connected nodes. By removing edges that connect very dissimilar nodes, we are able to defend against targeted adversarial attacks without decreasing the accuracy of the GCN models. Our results on a number of real-world datasets show the effectiveness and efficiency of the proposed attack and defense.

Preliminaries

Given an attributed graph $G = (A, X)$, $A \in \{0,1\}^{N \times N}$ is the adjacency matrix and $X \in \{0,1\}^{N \times D}$ represents the $D$-dimensional binary node features. We denote the index sets of nodes and features by $V = \{1, 2, ..., N\}$ and $F = \{1, 2, ..., D\}$, respectively. We consider the task of semi-supervised node classification, where a subset of nodes $V_L \subseteq V$ is labelled with labels from the classes $C = \{c_1, c_2, ..., c_K\}$. The goal of the task is to map each node in the graph to a class label. This is often called transductive learning, given the fact that the test nodes are already known at training time.

In this work, we study the Graph Convolutional Network (GCN) [Kipf and Welling, 2017], a well-established method for semi-supervised node classification. For GCN, initially $H^{(0)} = X$. The GCN model then follows the rule below to aggregate the neighboring features:

$H^{(l+1)} = \sigma(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)})$   (1)

where $\tilde{A} = A + I_N$ is the adjacency matrix of the graph $G$ with self-connections added, $\tilde{D}$ is a diagonal matrix with $\tilde{D}_{i,i} = \Sigma_j \tilde{A}_{ij}$, and $\sigma$ is the activation function used to introduce non-linearity. Each such equation corresponds to one graph convolution layer. A fully connected layer with softmax loss is usually used after $L$ graph convolution layers for the classification. A two-layer GCN is commonly used for semi-supervised node classification tasks [Kipf and Welling, 2017]. The model can therefore be described as:

$Z = f(X, A) = \mathrm{softmax}(\hat{A} \, \sigma(\hat{A} X W^{(0)}) W^{(1)})$   (2)

where $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is essentially the symmetrically normalized adjacency matrix, and $W^{(0)}$ and $W^{(1)}$ are the input-to-hidden and hidden-to-output weights, respectively.
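To make Eq. (2) concrete, the following is a minimal NumPy sketch of the two-layer GCN forward pass. This is a dense-matrix illustration under our own naming, not the authors' implementation; practical GCN code uses sparse matrices and trained weights:

```python
import numpy as np

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN forward pass of Eq. (2) with dense matrices."""
    A_tilde = A + np.eye(A.shape[0])            # add self-connections
    d_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
    A_hat = d_inv_sqrt @ A_tilde @ d_inv_sqrt   # symmetric normalization
    H = np.maximum(A_hat @ X @ W0, 0.0)         # first layer with ReLU
    logits = A_hat @ H @ W1                     # second layer
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)     # row-wise softmax over classes
```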
Gradient-Based Adversarial Attacks

Gradients are commonly exploited to attack deep learning models [Yuan et al., 2019]. One can use either the gradients of the loss function or the gradients of the model output w.r.t. the input data to construct attacks. Two examples are the Fast Gradient Sign Method (FGSM) attack and the Jacobian-based Saliency Map Approach (JSMA) attack. FGSM [Ian J. Goodfellow, 2014] generates adversarial examples by performing a gradient update along the direction of the sign of the gradient of the loss function w.r.t. each pixel for image data. Its perturbation can be expressed as:

$\eta = \epsilon \, \mathrm{sign}(\nabla_x J_\theta(x, l))$   (3)

where $\epsilon$ is the magnitude of the perturbation. The generated example is $x' = x + \eta$.

The JSMA attack was first proposed in [Papernot et al., 2016]. By exploiting the forward derivative of a DNN model, one can find the adversarial perturbations that force the model to misclassify the test point into a specific target class. Given a feed-forward neural network $F$ and a sample $X$, the Jacobian is computed by:

$\nabla F(X) = \frac{\partial F(X)}{\partial X} = \left[ \frac{\partial F_j(X)}{\partial x_i} \right]_{i \in 1..N,\; j \in 1..M}$   (4)

where $M$ and $N$ are the dimensions of the model output and the input data, respectively. To achieve a target class $t$, one wants $F_t(X)$ to increase while $F_j(X)$ for all other $j \neq t$ decrease. This is accomplished by exploiting the adversarial saliency map, which is defined by:

$S(X, t)[i] = \begin{cases} 0, & \text{if } \frac{\partial F_t(X)}{\partial X_i} < 0 \text{ or } \Sigma_{j \neq t} \frac{\partial F_j(X)}{\partial X_i} > 0 \\ \frac{\partial F_t(X)}{\partial X_i} \left| \Sigma_{j \neq t} \frac{\partial F_j(X)}{\partial X_i} \right|, & \text{otherwise} \end{cases}$   (5)

Starting from a normal example, the attacker follows the saliency map and iteratively perturbs the example by a very tiny amount until the predicted label is flipped. For an untargeted attack, one instead tries to minimize the prediction score of the winning class.
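As an illustration of Eq. (5), the sketch below computes the adversarial saliency map for a single input with PyTorch autograd. The `model` callable mapping a 1-D feature tensor to class scores is an assumption made for this example:

```python
import torch

def saliency_map(model, x, target_class):
    """Adversarial saliency map of Eq. (5) for one input vector x."""
    jac = torch.autograd.functional.jacobian(model, x)  # [num_classes, num_features]
    dt = jac[target_class]                  # dF_t/dx_i for every feature i
    others = jac.sum(dim=0) - dt            # sum_{j != t} dF_j/dx_i
    s = dt * others.abs()
    s[(dt < 0) | (others > 0)] = 0.0        # zero out per the first case of Eq. (5)
    return s
```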
Although adversarial attacks on graphs are a relatively new topic, a number of works have been done on defending against adversarial images in convolutional neural networks (e.g., [Xu et al., 2018; Papernot and McDaniel, 2018]). For images, as the feature space is continuous, adversarial examples are carefully crafted with little perturbation. Therefore, in some cases, adding some randomization to the images is able to defeat the attacks [Xie et al., 2018]. Other forms of input pre-processing, such as local smoothing [Xu et al., 2018] and image compression [Shaham et al., 2018], have also been used to defend against the attacks. These pre-processing methods build on the observation that neighboring pixels of natural images are normally similar. Adversarial training [Tramèr et al., 2018] introduces the generated examples into the training data to enhance the robustness of the model.

Integrated Gradients Guided Attack

Although FGSM and JSMA are not the most sophisticated attack techniques, they are still not well studied for graph models. For image data, the success of FGSM and JSMA benefits from the continuous features in the pixel color space. However, recent explorations of graph adversarial attack techniques [Zügner et al., 2018; Dai et al., 2018] show that simply applying these methods may not lead to successful attacks. These works address the problem with either greedy methods or reinforcement-learning-based methods, which are often expensive.

The node features in a graph are often bag-of-words features which can only be 1 or 0. The unweighted edges in a graph are also frequently used to express the existence of specific relationships, thus taking only the values 1 or 0 in the adjacency matrix. When attacking the model, the adversarial perturbations are limited to either changing 1 to 0 or vice versa. The main issue with applying vanilla FGSM and JSMA to graph models is inaccurate gradients. Given a target node $t$, for the FGSM attack, $\nabla J_{W^{(1)},W^{(2)}}(t) = \frac{\partial J_{W^{(1)},W^{(2)}}(t)}{\partial X}$ measures the importance of all node features to the loss function value. Here, $X$ is the feature matrix, each row of which describes the features of a node in the graph. For a specific feature $i$ of node $n$, a larger gradient value indicates that perturbing feature $i$ to 1 helps get the target node misclassified. However, following this gradient may not help, for two reasons: first, the feature value might already be 1, so we cannot perturb it further; second, even if the feature value is 0, since a GCN model may not learn a locally linear function between 0 and 1 for this feature, the result of the perturbation is unpredictable. The same holds for JSMA, as the Jacobian of the model shares all the limitations of the gradients of the loss. In other words, vanilla gradients suffer from the local gradient problem. Take a simple ReLU network $f(x) = \mathrm{ReLU}(x)$ as an example: when $x$ increases from 0 to 1, the function value also increases by 1. However, computing the gradient at $x = 0$ gives 0, which does not capture the model behavior accurately.

To address this, we propose an integrated gradients based method rather than directly using vanilla derivatives for the attacks. Integrated gradients were initially proposed by [Sundararajan et al., 2017] to provide sensitivity and implementation invariance for feature attribution in deep neural networks, particularly convolutional neural networks for images. The integrated gradient is defined as follows: for a given model $F: \mathbb{R}^n \to [0, 1]$, let $x \in \mathbb{R}^n$ be the input and $x'$ the baseline input (e.g., the black image for image data). Consider a straight-line path from $x'$ to the input $x$; the integrated gradients are obtained by accumulating the gradients at all points along the path. Formally, for the $i$th feature of $x$, the integrated gradient (IG) is:

$IG_i(F(x)) ::= (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial F(x' + \alpha(x - x'))}{\partial x_i} \, d\alpha$   (6)
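In practice the integral in Eq. (6) is approximated by a Riemann sum over a fixed number of steps. A minimal PyTorch sketch, assuming `f` returns a scalar such as a loss or a single prediction score:

```python
import torch

def integrated_gradients(f, x, baseline, steps=50):
    """Riemann-sum approximation of Eq. (6) for every input dimension."""
    total = torch.zeros_like(x)
    for k in range(1, steps + 1):
        # point on the straight-line path from the baseline to x
        point = (baseline + (k / steps) * (x - baseline)).detach().requires_grad_(True)
        f(point).backward()
        total += point.grad
    return (x - baseline) * total / steps
```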
For GCN on graph data, we propose a generic attack framework. Given the adjacency matrix $A$, the feature matrix $X$, and the target node $t$, we compute the integrated gradients of the function $F_{W^{(1)},W^{(2)}}(A, X, t)$ w.r.t. the input $I$ under attack, where $I = A$ indicates edge attacks and $I = X$ indicates feature attacks. When $F$ is the loss function of the GCN model, we call the attack technique an FGSM-like attack with integrated gradients, namely IG-FGSM. Similarly, we call the attack technique IG-JSMA when $F$ is the prediction output of the GCN model. For a targeted IG-JSMA or IG-FGSM attack, the optimization goal is to maximize the value of $F$. Therefore, for the features or edges having the value 1, we select those with the lowest negative IG scores and perturb them to 0. The untargeted IG-JSMA attack aims to minimize the prediction score of the winning class, so we perturb the input dimensions with the highest IG scores, e.g., flipping 1-valued features or edges to 0.

Note that unlike image feature attribution, where the baseline input is the black image, we use the all-zero or all-one feature/adjacency matrices as baselines. To remove edges/features (1 → 0 perturbations), we set the adjacency matrix $A$ or the feature matrix $X$ to all-zero, since we want to describe the overall change pattern of the target function $F$ while gradually adding edges/features up to the current state of $A$ and $X$. On the contrary, to add edges/features, we compute the change pattern by gradually removing edges/features from the all-one state to the current state, thus setting either $A$ or $X$ to an all-one matrix. To keep the direction of the gradients consistent and ensure the computation is tractable, the IG (for an edge attack) is computed as follows:

$IG(F(X, A, t))[i, j] \approx \begin{cases} (A_{ij} - 0) \times \sum_{k=1}^{m} \frac{\partial F(\frac{k}{m} \times (A_{ij} - 0))}{\partial A_{ij}} \times \frac{1}{m}, & \text{for removing edges} \\ (1 - A_{ij}) \times \sum_{k=1}^{m} \frac{\partial F(\frac{k}{m} \times (1 - A_{ij}))}{\partial A_{ij}} \times \frac{1}{m}, & \text{for adding edges} \end{cases}$   (7)

Algorithm 1 shows the pseudo-code for the untargeted IG-JSMA attack. We compute the integrated gradients of the prediction score for the winning class $c$ w.r.t. the entries of $A$ and $X$. The integrated gradients are then used as metrics to measure the priority of perturbing specific features or edges in the graph $G$. Note that the current edge and feature values are taken into account, and only the scores of feasible perturbations are computed (see Eq. (7)). For example, we only compute the importance of adding an edge if the edge does not already exist. For a feature or an edge with high perturbation priority, we then perturb it by simply flipping it to the opposite binary value.

When setting the number of steps $m$ for computing integrated gradients, one size does not fit all. Essentially, more steps are required to accurately estimate the discrete gradients when the function learned for certain features/edges is non-linear. Therefore, we enlarge the number of steps while attacking the nodes with low classification margins until stable performance is achieved. Moreover, the calculation can be done incrementally if we increase the number of steps by integer multiples.

To ensure the perturbations are unnoticeable, the graph structure and feature statistics should be preserved for edge attacks and feature attacks, respectively. The specific properties to preserve highly depend on the application requirements. For our IG-based attacks, we simply check against these application-level requirements while selecting an edge or a feature for perturbation. In practice, this process can be trivial, as many statistics can be pre-computed or re-computed incrementally [Zügner et al., 2018].
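The edge-scoring routine used in Algorithm 1 below can be sketched directly from Eq. (7). The snippet scores edge removals for a target node; the differentiable `model(A, X)` returning per-node class probabilities on a dense adjacency matrix is a hypothetical interface, and flipping the path baseline to the all-one matrix would score edge additions instead:

```python
import torch

def calculate_edge_importance(model, A, X, target, steps=20):
    """IG scores of Eq. (7) for removing existing edges (all-zero baseline)."""
    winning = model(A, X)[target].argmax()   # winning class c of the target node
    scores = torch.zeros_like(A)
    for k in range(1, steps + 1):
        # point on the straight-line path from the all-zero matrix to A
        Ak = ((k / steps) * A).detach().requires_grad_(True)
        model(Ak, X)[target, winning].backward()
        scores += Ak.grad
    return A * scores / steps                # zero for non-existing edges
```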
Algorithm 1: IG-JSMA: Integrated Gradients Guided Untargeted JSMA Attack on GCN

Input: graph $G^{(0)} = (A^{(0)}, X^{(0)})$; target node $v$; the GCN model $F$ trained on $G^{(0)}$; budget $\Delta$: the maximum number of perturbations.
Output: modified graph $G' = (A', X')$.

Procedure Attack():
    // compute integrated gradients as perturbation scores for edges and features
    s_e ← calculate_edge_importance(A)
    s_f ← calculate_feature_importance(X)
    // sort edges and features according to their scores
    features ← sort_by_importance(s_f)
    edges ← sort_by_importance(s_e)
    f ← features.first; e ← edges.first
    while |A' − A| + |X' − X| < Δ do
        // perturb whichever candidate has the higher score
        if s_e[e] > s_f[f] then
            flip edge e; e ← e.next
        else
            flip feature f; f ← f.next
    return G'

Defending Against Adversarial Attacks

To defend against adversarial targeted attacks on GCNs, we first hypothesize that GCNs are easily attacked because the models strongly rely on the graph structure and local aggregation. A model trained on the attacked graph therefore inherits the attack surface crafted into the adversarial graph, since it is well known that adversarial attacks on deep learning systems transfer to models with similar architectures trained on the same dataset. Existing attacks on GCN models succeed because the attacked graphs are directly used to train the new model. Given that, one feasible defense is to make the adjacency matrix trainable: if the edge weights are learned during the training process, they may evolve so that the graph becomes different from the graph crafted by the adversary.

We verify this idea by making the edge weights trainable in GCN models. In the CORA-ML dataset, we select a node that is correctly classified and has the highest prediction score for its ground-truth class. The adversarial graph is constructed using nettack [Zügner et al., 2018]. Without any defense, the target node is misclassified with a confidence of 0.998 after the attack. Our defense initializes the edge weights to those of the adversarial graph. We then train the GCN model without making any additional modifications to the loss function or other parameters of the model. Interestingly, with such a simple defense, the target node is correctly classified with high confidence (0.912) after the attack.

To explain why the defense works, we observe the following characteristics of the attacks. First, perturbing edges is more effective than modifying features. This is consistent across all the attacks (i.e., FGSM, JSMA, nettack, and IG-JSMA): feature-only perturbations generally fail to change the predicted class of the target node. Moreover, the attack approaches tend to favour adding edges over removing edges. Second, nodes with more neighbors are more difficult to attack than those with fewer neighbors. This is consistent with the observations in [Zügner et al., 2018] that nodes with higher degrees have higher classification accuracy in both the clean and the attacked graphs. Last, the attacks tend to connect the target node to nodes with different features and labels. We find that this is the most powerful way to perform attacks, and we verify this observation on the CORA-ML dataset. To measure feature similarity, we use the Jaccard similarity score, since the features of the CORA-ML dataset are bag-of-words. Note that our defense mechanism is generic, while the similarity measure may vary between datasets; for graphs with other types of features, such as numeric features, different similarity measures can be used. Given two nodes $u$ and $v$ with $n$ binary features, the Jaccard similarity score measures the overlap between the features of $u$ and $v$. Each feature of $u$ and $v$ can be either 0 or 1.
The counts of each combination of feature values over $u$ and $v$ are defined as follows: $M_{11}$ is the number of features that have the value 1 in both $u$ and $v$; $M_{01}$ is the number of features that have the value 0 in node $u$ but 1 in node $v$; similarly, $M_{10}$ is the number of features that have the value 1 in node $u$ but 0 in node $v$; and $M_{00}$ is the number of features that are 0 in both nodes. The Jaccard similarity score is given as

$J_{u,v} = \frac{M_{11}}{M_{11} + M_{01} + M_{10}}$   (8)

We train a two-layer GCN on the CORA-ML dataset, study the nodes that are classified correctly with high probability, and attack them with nettack [Zügner et al., 2018]. For example, we enable both feature and edge attacks in nettack and attack node 200 in the GCN model trained on CORA-ML. Given the node degree of 3, the attack removes the edge from node 200 to a neighbor with similarity $J = 0.113$, and adds edges connecting node 200 to nodes that share no feature similarity with it, pushing the target node away from its correct class. Figure 1 shows the resulting shift in the distribution of similarities between connected nodes.

Figure 1: Histograms for the Jaccard similarities between connected nodes before and after FGSM attack. (a) Clean; (b) Attacked.

Correspondingly, when removing edges, the attack tends to remove the edges connecting nodes that share many similarities with the target node. Edge attacks are more effective because adding or removing one edge affects all the feature dimensions during aggregation. In contrast, modifying one feature only affects one dimension of the feature vector, and the perturbation can easily be masked by the other neighbors of a high-degree node.

Based on these observations, we make another hypothesis: the above defense approach works because the model assigns lower weights to the edges that connect the target node to nodes sharing little feature similarity with it. To verify this, we plot the learned weights and the Jaccard similarity scores of the end nodes for the edges starting from the target node (see Figure 2). Note that for the target node we choose, the Jaccard similarity scores between every neighbor of the target node and itself are larger than 0 in the clean graph; the edges with zero similarity scores are all added by the attack. As expected, the model learns low weights for most of the edges with low similarity scores.
Figure 2: The normalized learned edge weights and the Jaccard sim-ilarity scores for the end nodes of the edges. Each value of the x-axisrepresents an edge in the neighborhood of the target node.
To make the defense more efficient, we do not even need learnable edge weights. Learning the edge weights inevitably introduces extra parameters to the model, which may affect its scalability and accuracy. A simple approach is potentially just as effective, based on the following: first, normal nodes generally do not connect to many nodes that share no similarity with them; second, the learning process essentially assigns low weights to the edges connecting two dissimilar nodes. We therefore propose a simple yet effective defense based on this insight.

Our defense is pre-processing based. Before training, we check the adjacency matrix of the given graph and inspect its edges. All the edges that connect nodes with a low similarity score (e.g., $J = 0$) are selected as candidates for removal. Although the clean graph may also have a small number of such edges, we find that removing them does little harm to the prediction of the target node. On the contrary, removing these edges may even improve the prediction in some cases. This is intuitive, as aggregating features from nodes that differ sharply from the target often over-smooths the node representations. In fact, a recent study [Wu et al., 2019] shows that the non-linearity and the multiple weight matrices at different layers do not contribute much to the predictive capability of GCN models but introduce unnecessary complexity, and [Zügner et al., 2018] uses a simplified surrogate model to attack GCN models for the same reason. Dai et al. [Dai et al., 2018] briefly introduce a defense that drops some edges during training, and show that it decreases the attack rate slightly. In fact, their method works only when the edges connecting dissimilar nodes happen to be removed; since it fails to differentiate the useful edges from those that need to be removed, it achieves sub-optimal defense performance. The proposed defense is computationally efficient, as it makes only one pass over the existing edges of the graph, giving a complexity of $O(|E|)$ where $|E|$ is the number of edges. For large graphs, calculating the similarity scores can easily be parallelized in implementation.
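A minimal sketch of this pre-processing defense, assuming binary node features and a SciPy sparse adjacency matrix; the function name and the threshold parameter are illustrative, not from the paper:

```python
import numpy as np
from scipy.sparse import csr_matrix

def drop_dissimilar_edges(adj: csr_matrix, features: np.ndarray,
                          threshold: float = 0.0) -> csr_matrix:
    """Remove edges whose end nodes have Jaccard similarity <= threshold."""
    adj = adj.tolil()                      # efficient element-wise edits
    rows, cols = adj.nonzero()
    for u, v in zip(rows, cols):
        if u >= v:                         # undirected graph: visit each edge once
            continue
        m11 = np.sum((features[u] == 1) & (features[v] == 1))
        m10 = np.sum((features[u] == 1) & (features[v] == 0))
        m01 = np.sum((features[u] == 0) & (features[v] == 1))
        denom = m11 + m10 + m01
        jaccard = m11 / denom if denom > 0 else 0.0   # Eq. (8)
        if jaccard <= threshold:
            adj[u, v] = 0
            adj[v, u] = 0
    return adj.tocsr()
```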
Experiments

We use the widely used CORA-ML, Citeseer [Bojchevski and Günnemann, 2018] and Polblogs [Adamic and Glance, 2005] datasets. An overview of the datasets is given in Table 1.

Table 1: Statistics of the datasets.

Dataset    Nodes   Features   Edges
CORA-ML    2708    1433       5429
Citeseer   3327    3703       4732
Polblogs   1490    -          19025
We split each graph into labeled (20%) and unlabeled (80%) nodes. Among the labeled nodes, half are used for training and the other half for validation. For the Polblogs dataset, since there are no feature attributes, we set the attribute matrix to an identity matrix.
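For concreteness, the split described above can be produced as follows; the helper name and random seed are ours, not the paper's:

```python
import numpy as np

def split_nodes(num_nodes: int, seed: int = 0):
    """20% labeled (half train / half validation), 80% unlabeled test nodes."""
    idx = np.random.default_rng(seed).permutation(num_nodes)
    n_labeled = int(0.2 * num_nodes)
    train = idx[: n_labeled // 2]
    val = idx[n_labeled // 2 : n_labeled]
    test = idx[n_labeled:]
    return train, val, test
```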
As mentioned, due to the transductive setting, the models are not regarded as fixed during the attack: after perturbing either features or edges, the model is retrained to evaluate the attack's effectiveness. To verify the effectiveness of the attack, we select nodes with different prediction scores. Specifically, we select 40 nodes in total: the 10 nodes with the top scores, the 10 nodes with the lowest scores, and 20 randomly selected nodes. We compare the proposed IG-JSMA with several baselines, including random attacks, FGSM, and nettack. Note that for the baselines we conduct direct attacks on the features of the target node or on the edges directly connected to it; direct attacks are much more effective and therefore serve as stronger baselines.

To evaluate how effective an attack is, we use the classification margin as the metric. For a target node $v$, the classification margin of $v$ is $Z_{v,c} - \max_{c' \neq c} Z_{v,c'}$, where $c$ is the ground-truth class and $Z_{v,c}$ is the probability of class $c$ given to node $v$ by the graph model. A lower classification margin indicates better attack performance. Figure 3 shows the classification margins of nodes after retraining the model on the modified graph. We find that IG-JSMA outperforms the baselines. More remarkably, IG-JSMA is quite stable, as the classification margins have much less variance. Just as stated in [Zügner et al., 2018], vanilla gradient-based methods such as FGSM are not able to capture the actual change of loss for discrete data. Similarly, when used to build the saliency map, the vanilla gradients are also inaccurate.

To demonstrate the effectiveness of IG-JSMA, we also compare it with the original JSMA method, where the saliency map is computed with vanilla gradients. Table 2 compares the ratios of correctly classified nodes after the JSMA and IG-JSMA attacks on 100 randomly sampled nodes. A lower value is better, as more nodes are misclassified. We can see that IG-JSMA outperforms the JSMA attack, which shows that the saliency map given by integrated gradients approximates the change patterns of the discrete features/edges better.
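The classification margin defined above is straightforward to compute from the model's output probabilities; a small sketch:

```python
import numpy as np

def classification_margin(probs: np.ndarray, true_class: int) -> float:
    """Z_{v,c} - max_{c' != c} Z_{v,c'}; negative values mean misclassification."""
    others = np.delete(probs, true_class)
    return float(probs[true_class] - others.max())
```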
Table 2: The ratio of correctly classified nodes under JSMA and IG-JSMA attacks.

Attack     CORA   Citeseer   Polblogs
JSMA       0.04   0.06       0.04
IG-JSMA    0.00   0.01       0.01
Figure 4 gives an intuitive example of this. For this graph, we conducted an evasion attack, where the parameters of the model are kept fixed to those trained on the clean graph. For a target node in the graph, given a two-layer GCN model, the prediction for the target node relies only on its two-hop ego graph. We define the importance of a feature/edge as follows. For a target node $v$, the brute-force way to measure the importance of nodes and edges is to remove one node or one edge at a time from the graph and check the change in the prediction score of the target node. Assume the prediction score for the winning class $c$ is $p_c$. After setting entry $A_{ij}$ of the adjacency matrix from 1 to 0, $p_c$ changes to $p'_c$; we define the importance of the edge as $\Delta p_c = p'_c - p_c$. To measure the importance of a node, we simply remove all the edges connected to the node and observe how the prediction score changes. These importance values can be regarded as the ground-truth discrete gradients.

Both vanilla gradients and integrated gradients are approximations of the ground-truth importance scores. The node importance can be approximated by the sum of the gradients of the prediction score w.r.t. all the features of the node, together with the gradients w.r.t. the entries of the adjacency matrix. In Figure 4, the node color represents the class of the node. Round nodes indicate positive importance scores while diamond nodes indicate negative importance scores, and the node size indicates the magnitude of the score: a larger node means higher importance. Similarly, red edges have positive importance scores while blue ones have negative importance scores; thicker edges correspond to more important edges, and the pentagram represents the target node of the attack. Figures 4a, 4b and 4c show the node importance results of the brute-force, vanilla gradients and integrated gradients approaches, respectively.
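The ground-truth edge importance described above can be sketched as a brute-force loop, assuming a hypothetical `model(A, X)` callable that returns per-node class probabilities for a dense NumPy adjacency matrix:

```python
import numpy as np

def edge_importance_ground_truth(model, A: np.ndarray, X: np.ndarray, target: int):
    """Delta p_c = p'_c - p_c from removing each existing edge in turn."""
    base = model(A, X)[target]
    c = int(base.argmax())                 # winning class of the target node
    scores = {}
    rows, cols = np.triu(A).nonzero()      # visit each undirected edge once
    for i, j in zip(rows, cols):
        A_mod = A.copy()
        A_mod[i, j] = A_mod[j, i] = 0      # remove the edge (i, j)
        scores[(i, j)] = model(A_mod, X)[target][c] - base[c]
    return scores
```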
In the following, we study the effectiveness of the proposed defense technique under different settings. We use the CORA-ML and Citeseer datasets, which have node features. We first evaluate whether the proposed defense affects the performance of the model; Table 3 shows the accuracy of the GCN models with and without the defense.

Table 3: Accuracy (%) of models on clean data with/without the proposed defense. We remove the outliers (i.e., runs with accuracy below a threshold for CORA-ML/Citeseer) due to the high variance.

Dataset    w/o defense   w/ defense
CORA-ML    80.9 ±
Citeseer

We find that the proposed defense is cheap to use, as its pre-processing makes almost no negative impact on the performance of the GCN models. Moreover, the time overhead is negligible: enabling the defense on the GCN models for the two datasets increases the training time by only 7.52s and 3.79s, respectively. Note that the run time results are obtained with our non-optimized Python implementation.

For different attacks, we then evaluate how the classification margins and the accuracy of the attacked nodes change with and without the defense. As in the transductive attack experiments, we select 40 nodes with different prediction scores. For the CORA-ML and Citeseer datasets, we train the GCN models on the clean graphs and record the classification margins of the selected nodes on each dataset.

Figure 3: The classification margin under different attack techniques. (a) CORA; (b) Citeseer; (c) Polblogs.

Figure 4: The approximations of node/edge importance. (a) Ground Truth; (b) Vanilla Gradients; (c) Integrated Gradients.

Table 4: Classification margins and error rates (%) for the GCN models under different attacks.

Dataset   Attack   CM (w/ attack)             Accu (w/ attack)
                   w/ defense   no defense    w/ defense   no defense
CORA      FGSM     0.299 ±

The results are given in Table 4. First of all, without the defense, most of the selected nodes are misclassified, as the accuracy is always under 0.05 for every attack. Enabling the defense improves the accuracy significantly regardless of the attack method. This, to some degree, shows that all the attack methods seek similar edges to attack and that the proposed defense is attack-independent. Although a few nodes are still misclassified with the defense on, the prediction confidence for their winning class is much lower, since the classification margins increase. It therefore becomes harder to fool the users, because manual checks are generally involved for predictions with low confidence. Overall, the proposed defense is effective even though we only remove the edges that connect nodes with a Jaccard similarity score of 0.
Conclusion

Graph neural networks (GNN) significantly improve the analytics performance on many types of graph data. However, like deep neural networks on other types of data, GNNs suffer from robustness problems. In this paper, we gave insights into the robustness problem of graph convolutional networks (GCN). We proposed an integrated gradients based attack method that outperforms existing iterative and gradient-based techniques in terms of attack performance. We also analyzed attacks on GCN and revealed that the robustness issue is rooted in the local aggregation of GCN. We gave an effective defense method to improve the robustness of GCN models, and demonstrated the effectiveness and efficiency of our methods on benchmark data.

References

[Adamic and Glance, 2005] Lada A. Adamic and Natalie Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pages 36–43. ACM, 2005.

[Allen, 1970] Frances E. Allen. Control flow analysis. In ACM Sigplan Notices, volume 5, pages 1–19. ACM, 1970.

[Bhagat et al., 2011] Smriti Bhagat, Graham Cormode, and S. Muthukrishnan. Node classification in social networks. In Social Network Data Analytics, pages 115–148. Springer, 2011.

[Bojchevski and Günnemann, 2018] Aleksandar Bojchevski and Stephan Günnemann. Deep Gaussian embedding of attributed graphs: Unsupervised inductive learning via ranking. Proceedings of ICLR'18, 2018.

[Bruna et al., 2013] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

[Cao et al., 2016] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In Proceedings of AAAI'16, 2016.

[Dai et al., 2018] Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, and Le Song. Adversarial attack on graph structured data. Proceedings of ICML'18, 2018.

[Edwards and Xie, 2016] Michael Edwards and Xianghua Xie. Graph based convolutional neural network. Proceedings of BMVC'16, 2016.

[Goodfellow et al., 2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. Proceedings of ICLR'15, 2015.

[Hart, 1989] Sergiu Hart. Shapley value. In Game Theory, pages 210–216. Springer, 1989.

[Henaff et al., 2015] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. Proceedings of NeurIPS'15, 2015.

[Ian J. Goodfellow, 2014] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[Kipf and Welling, 2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. Proceedings of ICLR'17, 2017.

[Lundberg and Lee, 2016] Scott Lundberg and Su-In Lee. An unexpected unity among methods for interpreting model predictions. Proceedings of NeurIPS'16, 2016.

[Newman et al., 2002] Mark E. J. Newman, Duncan J. Watts, and Steven H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1):2566–2572, 2002.

[Papernot and McDaniel, 2018] Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018.

[Papernot et al., 2016] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.

[Ron and Shamir, 2013] Dorit Ron and Adi Shamir. Quantitative analysis of the full Bitcoin transaction graph. In International Conference on Financial Cryptography and Data Security, pages 6–24. Springer, 2013.

[Shaham et al., 2018] Uri Shaham, James Garritano, Yutaro Yamada, Ethan Weinberger, Alex Cloninger, Xiuyuan Cheng, Kelly Stanton, and Yuval Kluger. Defending against adversarial images using basis functions transformations. arXiv preprint arXiv:1803.10840, 2018.

[Sundararajan et al., 2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. Proceedings of ICML'17, 2017.

[Tramèr et al., 2018] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. Proceedings of ICLR'18, 2018.

[Veličković et al., 2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. Proceedings of ICLR'18, 2018.

[Wang et al., 2018] Xiaoyun Wang, Joe Eaton, Cho-Jui Hsieh, and Felix Wu. Attack graph convolutional networks by adding fake nodes. arXiv preprint arXiv:1810.10751, 2018.

[Wu et al., 2019] Felix Wu, Tianyi Zhang, Amauri Holanda Souza Jr., Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153, 2019.

[Xie et al., 2018] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. Proceedings of ICLR'18, 2018.

[Xu et al., 2013] Huan Xu, Yujiu Yang, Liangwei Wang, and Wenhuang Liu. Node classification in social network via a factor graph model. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 213–224. Springer, 2013.

[Xu et al., 2018] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. Proceedings of NDSS'18, 2018.

[Yuan et al., 2019] Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 2019.

[Zügner et al., 2018] Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In Proceedings of SIGKDD'18, 2018.