Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution
Yan Feng, Baoyuan Wu, Yanbo Fan, Li Liu, Zhifeng Li, Shutao Xia
EEfficient Black-Box Adversarial Attack Guided by theDistribution of Adversarial Perturbations
Yan Feng , Baoyuan Wu , Yanbo Fan Zhifeng Li , Shutao Xia Tsinghua Shenzhen International Graduate School, Tsinghua University, China Tencent AI Lab, China [email protected] ; [email protected] Abstract
This work studied the score-based black-box adversarial attack problem, whereonly a continuous score is returned for each query, while the structure and pa-rameters of the attacked model are unknown. A promising approach to solve thisproblem is evolution strategies (ES), which introduces a search distribution tosample perturbations that are likely to be adversarial. Gaussian distribution iswidely adopted as the search distribution in the standard ES algorithm. However, itmay not be flexible enough to capture the diverse distributions of adversarial per-turbations around different benign examples. In this work, we propose to transformthe Gaussian-distributed variable to another space through a conditional flow-basedmodel, to enhance the capability and flexibility of capturing the intrinsic distribu-tion of adversarial perturbations conditioned on the benign example. Besides, tofurther enhance the query efficiency, we propose to pre-train the conditional flowmodel based on some white-box surrogate models, utilizing the transferability ofadversarial perturbations across different models, which has been widely observedin the literature of adversarial examples. Consequently, the proposed method couldtake advantages of both query-based and transfer-based attack methods, to achievesatisfied attack performance on both effectiveness and efficiency. Extensive experi-ments of attacking four target models on CIFAR-10 and Tiny-ImageNet verify thesuperior performance of the proposed method to state-of-the-art methods.
It has been well known [2, 13] that adversarial examples are the serious threat to deep neural networks.Although massive attack methods have been developed, most of them assume that all information ofthe attacked model is accessible, such that the gradient can be easily computed to generate adversarialperturbations, which is called white-box adversarial attack. However, a more practical setting inreal world scenarios is that the structure and parameters of the attacked model is inaccessible to theattacker, while only the feedback of each query is provided, which is called black-box adversarialattack. Further, if the feedback is only the discrete label, then it is dubbed decision-based black-boxattack; if the feedback is the continuous score ( e.g. , the posterior probability w.r.t. each class), then itis dubbed score-based black-box attack, which is also the focus of this work.The score-based black-box attack can be formulated as a derivative-free optimization problem. Apromising derivative-free optimization approach is evolution strategies (ES) [39]. The core ideaof the ES-based black-box attack is introducing a search distribution to model the distribution ofadversarial perturbations. Given the search distribution, several perturbations are sampled to obtainnew queries to the attacked model; then, the query feedbacks are adopted to update the searchdistribution to get better values of the black-box objective function. The Gaussian distribution iswidely used as the search distribution in many ES methods [46, 42, 17]. However, we don’t think *This work was done when Yan Feng was an intern at Tencent AI Lab. Correspondence to: Baoyuan Wu andShutao Xia. a r X i v : . [ c s . CR ] J u l hat the simple Gaussian distribution is a good choice for modeling the distribution of adversarialperturbations. Because it ignores the close dependency between adversarial examples and benignexamples. 
Considering that even the loss landscapes around different benign examples are quitediverse, it is difficult to imagine that the adversarial perturbations around different benign examplescould follow one identical distribution. A recent work in ES [12] proposed to transform the Gaussian-distributed variable to another space through a reversible flow-based generative model, such thatthe modeling capability for probabilistic distributions is enhanced. However, the flow-based modeldoesn’t take into account the variation due to benign examples. Inspired by that work, we propose toadopt a conditional generative flow model, called c-Glow , which is expected to be flexible enough tocapture the complex distribution of adversarial perturbations conditioned on diverse benign examples.However, due to the additional parameters of c-Glow, it may require more queries to learn a goodapproximation of the distribution of adversarial perturbations, while the query is the main cost inthe black-box attack. To accelerate the attack efficiency, we propose to pre-train the c-Glow modelbased on some white-box surrogate models, according to the observation [34, 35, 29] that adversarialexamples generated for one model may also be adversarial for another model, dubbed adversarialtransferability . Specifically, we propose to minimize the K-L divergence between the c-Glow modeland the energy-based model w.r.t. the adversarial loss, based on surrogate models. Consequently, theproposed method utilizes the advantages from both query-based and transfer-based attack methods,with the expectation to achieve high adversarial success rate and high attack efficiency simultaneously.The main contributions of this work are three-fold. We propose to utilize the conditional Glowmodel coupled with Gaussian as the search distribution in the ES algorithm for solving the score-based black-box adversarial attack problem. 
We propose to pre-train the c-Glow model viaapproximating the energy-based model of the perturbation distribution of surrogate models. Extensive experiments on benchmark datasets demonstrate the superiority of the proposed attackmethod to several state-of-the-art methods.
Here we only focus on black-box adversarial attack methods, which can be generally partitioned totwo categories, including decision-based and score-based adversarial attacks.
Decision-based Adversarial Attacks.
For decision-based attacks, an attacker can only acquirethe output label of the target model. A boundary search method [3] randomly sampled candidateperturbations following the normal distribution, and the perturbation with the lower objective isupdated as the new solution. An evolution based search method [10] utilized the history queries toapproximate a Gaussian distribution as the search distribution. [6] formulated the decision-basedattack problem as a continuous optimization by alternatively optimizing the perturbation magnitudeand perturbation direction. This method was further accelerated in [7] by only estimating the signof gradient. HopSkipJumpAttack [5] developed an iterative search algorithm by utilizing binaryinformation at the decision boundary to estimate the gradient. It is further improved in [27] bylearning a more representative subspace for perturbation sampling. Based on the observation of thelow curvature of the decision boundary around adversarial examples, [30] approximated the gradientusing the gradients of neighbour points; [36] locally approximated the decision boundary with ahyper-plane, and searched the closest point on the hyper-plane to the benign input as the perturbation.
Score-based Adversarial Attacks.
There are generally three sub-categories of score-based black-box attacks, including transfer-based attack, query-based attack and their combination .
1) Transfer-based methods attempt to generate adversarial perturbations utilizing the information of white-boxsurrogate models. For example, [34] proposed to firstly train a white-box surrogate model with adataset labeled by querying the target model, then utilize the gradient of the trained surrogate model togenerate adversarial perturbations to attack the target model. [29] found that adversarial perturbationsgenerated on an ensemble of source models show good attack performance on the target model.Although transfer-based attack methods are very efficient, the attack performance is often lowerthan query-based attack methods.
2) Query-based methods solve the black-box optimization byiteratively querying the target model. SimBA [14] randomly sampled a perturbation from a predefinedorthonormal basis, and then either added or subtracted this perturbation to the attacked image. [22]utilized the natural evolution strategy (NES) [45, 46] method to minimize a continuous expectation ofthe black-box objective function based on a search distribution. Bandit [23] improved the NES methodby incorporating data and temporal priors into the gradient estimation. SignHunter [1] adopted the2radient sign rather than the gradient as the search direction. Query-based methods often achievebetter attack performance than transfer-based methods, but require more queries.
3) Combinationmethods try to take advantages of both transfer-based and query-based methods, to achieve highattack success rate and high query efficiency simultaneously. The general idea is firstly learning sometypes of priors from surrogate models, then incorporating these priors into the query-based method toguide the attack procedure for the target model. For example, the prior used in N -Attack [28] is themean parameter of the search distribution in NES, which is learned using a regression neural networktrained based on surrogate models. Methods in [8] and [15] utilized the gradient of surrogate modelsas the gradient prior. The TREMBA method [21] treated the projection from a low-dimensional spaceto the original space as the prior, such that the perturbation could be search in the low-dimensionalspace. The hybrid method [43] directly adopted adversarial examples from surrogate models as theprior, but surrogate models could be updated using the returned prediction by the target model. Theproposed method also belongs to this type, but the prior we adopted is the perturbation distribution. We denote a classification model F : X → Y , with X being the input space, n = |X | indicating thedimension of the input space, and Y being the output space, Given a benign example x ∈ X andits ground-truth label y ∈ Y , F ( x , y ) indicates the classification score w.r.t. the y -th label. In thiswork, we adopt the logit as the classification score. The goal of adversarial attack is finding a smallperturbation η within a (cid:96) p -ball, i.e. , B (cid:15) = { η | η ∈ R n , (cid:107) η (cid:107) p ≤ (cid:15) } ( (cid:15) > being a attacker definedscalar, which will be specified in experiments), such that the prediction of x + η is different with theprediction of x . 
Specifically, the untargeted attack problem is formulated as min η L uadv ( η , x , y ) = max (cid:18) , F ( x + η , y ) − max j (cid:54) = y F ( x + η , j ) (cid:19) + δ (cid:0) η ∈ B (cid:15) (cid:1) , (1)where δ ( a ) = 0 if a is true, otherwise δ ( a ) = + ∞ . The targeted attack problem is formulated as min η L taradv ( η , x , t ) = max (cid:18) , max j (cid:54) = t F ( x + η , j ) − F ( x + η , t ) (cid:19) + δ (cid:0) η ∈ B (cid:15) (cid:1) . (2)Note that both L uadv ( η , x , y ) and L taradv ( η , x , t ) are non-negative. If is achieved, then the corre-sponding η is a successful adversarial perturbation. For clarity, hereafter we use L adv ( η , x ) torepresent the untargeted or targeted attack when there is no need to distinguish between them.In the case of score-based black-box adversarial attacks , the structure and parameters of the attackedmodel F is inaccessible to the attacker, while only the output score F ( x , y ) is returned for eachquery x . Consequently, the gradient of the attack objective L adv w.r.t. the perturbation η cannot bedirectly computed, which is the main challenge of black-box adversarial attacks. Algorithm 1
Evolution strategies for score-based black-box adversarial attacks input:
The black-box attack objective L adv ( · , x ) , benign input x , the ground-truth label y or the target label t ,search distribution π , population size k . repeat (Sampling) : sample k perturbations η , ..., η k ∼ π (Evaluation) : evaluate L adv ( η , x ) , ..., L adv ( η k , x ) (Update) : update π to increase the probability of producing perturbations of potentially betterobjective values, i.e. , lower L adv ( · , x ) until converge One promising approach for the black-box optimization is evolutionary strategies (ES) [37]. The mainidea is introducing a search distribution π to sample some perturbations η to obtain the better valuesof the black-box objective function, i.e. , the smaller L adv in the score-based black-box adversarialattack problem. The general procedure of ES for the score-based black-box adversarial attack problemis summarized in Algorithm 1. Many variants of ES have been developed, such as natural ES (NES)345, 46], co-variance matrix adaptation ES (CMA-ES) [17], self-adaptation ES (SA-ES) [18, 39], etc .The main difference among these variants is the update step of the search distribution π . Amongthese variants, CMA-ES has been considered as one of the state-of-the-art variants in ES, especiallyfor the optimization problem in high-dimensional space.The basic idea of CMA-ES is to update the parameters of π by maximizing the weighted averageof log-likelihoods (cid:80) mi =1 w i log P π ( η i : k ) , where log P π ( η ) denotes the log-likelihood of η from thedistribution π , where m, w i , η i : k will be defined soon later. Consequently, it is more likely to sampleperturbations of better values of the objective function, i.e. , lower values of L adv ( · , x ) . The searchdistribution π used in CMA-ES is set to Gaussian, i.e. , π := N ( µ , σ · C ) . Specifically, given the Sampling and
Evaluation step in Algorithm 1, the
Update step consists the following sequential parts: • Update µ : µ (cid:48) = µ , µ ← m (cid:88) i =1 w i · η i : k , (3)where η i : k indicates the i -th best perturbation out of k sampled perturbations, i.e. , L adv ( η k , x ) ≤L adv ( η k , x ) ≤ . . . L adv ( η k : k , x ) , and m ≤ k, (cid:80) mi =1 w i = 1 are hyper-parameters. • Update σ : p σ ← (1 − c σ ) p σ + (cid:112) c σ (2 − c σ ) µ eff C − ( µ − µ (cid:48) σ ) ,σ ← σ × exp (cid:18) c σ d σ (cid:18) (cid:107) p σ (cid:107) E (cid:107)N ( , I ) (cid:107) − (cid:19)(cid:19) , (4)where E (cid:107)N ( , I ) (cid:107) = √ n +12 ) / Γ( n ) with Γ( · ) being the gamma function [9]. • Update C : p c ← (1 − c σ ) p c + h σ (cid:112) c c (2 − c c ) µ eff ( µ − µ (cid:48) σ ) , ¯ w i = w i × (1 if w i ≥ else k/ (cid:107) C − ( µ − µ (cid:48) σ ) (cid:107) ) , C ← C + c p c p (cid:62) c + c µ m (cid:80) i =1 ¯ w i ( µ − µ (cid:48) σ )( µ − µ (cid:48) σ ) (cid:62) . (5)We refer the readers to [17] for the detailed meanings of p σ , p c , as well as the empirical settings ofall hyper-parameters ( m, w i =1 ,...,m , µ eff , d σ , c σ , c µ , c c , c ) . Furthermore, to reduce the number ofparameters, we simply adopt the diagonal co-variance matrix C , such that the search distribution canbe represented as π := N ( µ , diag ( σ )) with σ = [ σ ; σ ; . . . ; σ n ] . In most variants of ES, a simple distribution is adopted as the search distribution, such as Gaussiandistribution in CMA-ES. This simple setting has shown its effectiveness on solving many black-boxoptimization problems. However, it may be unsuitable for the black-box adversarial attack problem.Each adversarial perturbation is dependent on its corresponding benign example. Considering thediversity of benign examples, Gaussian distribution may not be capable and flexible enough toapproximate the complex perturbation distributions conditioned on different benign examples.
The c-Glow Model.
Inspired by the recent development in the literature of evolution strategies [12],we propose to replace the widely used Gaussian distribution by conditional generative flow modelscoupled with a Gaussian distribution as the search distribution π . Specifically, we adopt the onerecently proposed model, dubbed the conditional Glow (c-Glow) model [31]. It can be formulated asan inverse function g x , φ : z → η , and there exists g − x , φ : η → z . φ indicates the model parameter;the condition variable x corresponds to the benign example; z ∈ R |X | is a latent variable following asimple distribution (specified later); η ∈ R |X | represents the perturbation variable. Further, g φ , x canbe decomposed to the composition of M inverse functions, as follows: η = g x , φ ( z ) = g x , φ ( g x , φ ( ... ( g x , φ M ( z )) ... )) , (6)where φ = ( φ , . . . , φ M ) , and φ i indicates the parameter of g x , φ i ( · ) . Note that each function canbe implemented by a transformation layer. Then, the c-Glow model can be represented by a neuralnetwork with M layers, and we set M = 3 . Each layer consists of a conditional actnorm module,4ollowed by an conditional × convolutional module and a conditional coupling module. Due tothe space limit, the detailed definition of g x , φ i ( · ) will be presented in the supplementary material . Conditional Distribution with the c-Glow Model. If z = µ + σ (cid:12) z with z ∼ N ( , I ) , where (cid:12) is the entry-wise product and I indicates the identity matrix, utilizing the change of variables [44]of Eq. (6), then the conditional likelihood of η is formulated as log P θ ( η | x ) = log P , ( z ) + M +1 (cid:88) i =1 log (cid:12)(cid:12)(cid:12)(cid:12) det (cid:18) ∂g − x , φ i ( r i − ) ∂ r i − (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) , (7)where θ = ( φ , µ , σ ) , r i = g − φ i , x ( r i − ) , r = η , r M = z and r M +1 = z . det( · ) indicates thedeterminant of a matrix. 
P , ( · ) indicates the probability density function of the multi-variant normaldistribution N ( , I ) . Note that in the above equation, for simplicity, we treat the transformation z = µ + σ (cid:12) z as the M + 1 layer of the c-Glow model, i.e. , g x , φ M +1 ( z ) = µ + δ (cid:12) z with φ M +1 = ( µ , δ ) , which is also invertible and independent with x . Thus, we also have η = g x , θ ( z ) . Parameter Learning.
Consequently, we have π := P θ ( η | x ) . Compared to N ( µ , diag ( σ )) , thisnew search distribution P θ ( η | x ) is not only more capable to model the perturbation distribution dueto the projection from the c-Glow model, but also more flexible for different benign examples dueto its dependency on x . However, in order to unleash these potential advantages, a good c-Glowmodel is required. Similar to [12], one feasible approach is to alternatively update ( µ , σ ) and φ when maximizing the weighted average of log-likelihoods in the update step of CMA-ES (see Section3.2). However, it may require more queries to achieve good states of φ . Instead, we adopt a simpleapproach with two sequential steps, including: firstly pre-training the c-Glow model (includingboth ( µ , σ ) and φ ) in a different way (specified in the next sub-section); given the pre-trainedmapping parameter φ , optimizing ( µ , σ ) using the standard CMA-ES algorithm (see Section 3.2). Given a surrogate model F s : X → Y with the same input and output space with the target model F , we can adopt any off-the-shelf white-box adversarial attack method to generate adversarialperturbations. The adversarial loss of the untargeted attack is formulated as follows L uadv,s ( η , x , y ) = max (cid:18) F s ( x + η , y ) − max j (cid:54) = y F s ( x + η , j ) + ξ, (cid:19) + δ (cid:0) η ∈ B (cid:15) (cid:1) , (8)where the slack variable ξ ≥ is introduced to enhance the flexibility (its value is specified inexperiments), and B (cid:15) has been defined in Eq. (1). Based on L uadv,s ( η , x , y ) , we propose to utilizethe energy based model to define the distribution of the untargeted adversarial perturbation η aroundthe benign example ( x , y ) , as follows: P us ( η | x , y ) = exp (cid:0) − β · L uadv,s ( η , x , y ) (cid:1)(cid:82) η ∈ B (cid:15) exp (cid:0) − β · L uadv,s ( η , x , y ) (cid:1) d η . (9)Note that given F s , the normalization term ( i.e. , the denominator) is an intractable constant. 
Thus,we simply omit it hereafter, and set log P us ( η | x , y ) ≈ − λ · L uadv,s ( η , x , y ) , (10)where β, λ are two positive hyper-parameters. Later, we will use Eq. (10) to train the c-Glow model,and we only need to tune λ (see experiments). For the targeted attack of F s , the adversarial loss L taradv,s ( η , x , t ) ( t is the target label), as well as the perturbation distribution P tars ( η | x , t ) , can be de-fined similarly. They are not presented here for clarity. Hereafter, we will use L adv,s ( η , x ) , P s ( η | x ) to represent the adversarial loss and the perturbation distribution of F s , respectively, if there is noneed to distinguish between untargeted and targeted attacks. Recall the ES algorithm for adversarial attacks (see Algorithm 1), if the search distribution P θ ( η | x ) (see Eq. (7)) is exactly the perturbation distribution of the attacked model F , then the attack willbe very efficient. However, in the scenario of black-box attacks, it is infeasible to explicitly model5he perturbation distribution of F like that of F s , which requires a tremendous number of queries.Although the c-Glow model is capable to capture the perturbation distribution of F , it may require lotsof queries to achieve a good state of its parameters. Thus, we resort to the adversarial transferability[34, 29, 35] that the adversarial example generated for one model may be also adversarial for anothermodel. Inspired by this observation, we assume that there is also somewhat similarity between theperturbation distributions of different models. Thus, we propose to pre-train the c-Glow model byminimizing the KL divergence [26] between P s ( η | x ) and P θ ( η | x ) . Without loss of generality, herewe only consider one benign example x , then the training objective is formulated as min θ L = E P s ( η | x ) (cid:20) log P s ( η | x ) P θ ( η | x ) (cid:21) . (11)We adopt the gradient-based method to optimize this problem. The gradient of L w.r.t. 
θ is presentedin Theorem 1. The proof technique is inspired by that of AGAS [40] and CFG-GAN [24]. Due to thespace limit, the proof of Theorem 1 will be presented in the supplementary material . Note that eachterm within the expectation in Eq. (12) is tractable, thus ∇ θ L can be easily computed. In practice, K instantiations of z are sampled from N ( , I ) , then ∇ θ L is empirically estimated as the averagevalue over these K instantiations. K will be specified in experiments. Theorem 1.
Utilizing the definition η = g x , θ ( z ) and z ∼ N ( , I ) (see Section 4.1), and definingthe term D ( η , x ) = log P s ( η | x ) P θ ( η | x ) , then the gradient of L w.r.t. θ is computed as follows ∇ θ L = − E z ∼N ( , I ) (cid:20) exp D ( η , x ) ·∇ η D ( η , x ) (cid:62) (cid:12)(cid:12) η = g x , θ ( z ) · ∇ θ g x , θ ( z ) (cid:21) , (12) = − E z ∼N ( , I ) (cid:20) exp − λ ·L adv,s ( η , x ) P θ ( η | x ) · ∇ η D ( η , x ) (cid:62) (cid:12)(cid:12) η = g x , θ ( z ) · ∇ θ g x , θ ( z ) (cid:21) , where ∇ η D ( η , x ) = ∇ η (cid:2) − λ · L adv,s ( η , x ) − log P θ ( η | x ) (cid:3) . Following the setting in [11], we choose 1,000 images randomlyfrom the testing set of CIFAR-10 [25] and the validation set of Tiny-ImageNet [38] for evaluation,respectively. For both datasets, we normalize the input to [0 , and set the maximum distortionof adversarial images to (cid:15) = 8 / . The maximum number of queries is set to 10,000 for bothuntargeted and targeted attacks. As did in prior works [15, 32], we adopt the attack success rate (ASR),the mean and median number of queries of successful attacks to evaluate the attack performance. Target and Surrogate Models.
We consider four target models: VGG-15 [41], ResNet-Preact-110[19], DenseNet-BC-110 [20] and PyramidNet-110 [16]. The implementations of these models aredownloaded from a GitHub repository . We conduct the standard training on the training set of eachdataset to obtain the checkpoints of these target models. The top-1 error rates of these four targetmodels are (7.29%, 6.47%, 4.69%, 3.92%) on the standard testing set of CIFAR-10, and (28.33%,26.82%, 26.38%, 25.26%) on the standard validation set of Tiny-ImageNet, respectively. On eachdataset, when attacking one target model, we treat the other three as surrogate models. For clarity,hereafter we use VGG, ResNet, DenseNet, PyramidNet to represent these target models. Compared methods.
Several state-of-the-art score-based black-box attack methods are compared,including Bandits [23], SimBA [14], Subspace [15], P-RGF [8], TREMBA [21], MetaAttack [11]and Signhunter [1]. All of them are implemented using the source codes provided by their authors.
Implementation Details. 1) Pre-training of the c-Glow model is conducted on the standard trainingset of CIFAR-10 and Tiny-ImageNet, respectively. The adversarial loss L adv,s ( η , x ) in Eq. (12)is specified as the average of CW-L2 losses [4] w.r.t. three surrogate models, and ξ is set as 20.We adopt the normalized gradient descent (NGD) [33] method to achieve the stable training. Thebatch-size is set as 2 and the learning rate is 0.0002. We sample K = 32 instantiations of z for eachiteration of training. For finetuing the hyper-parameter λ , we randomly split 10% of the training set ofCIFAR-10 and Tiny-ImageNet as validation set, and search λ within the range { , , ..., } . The https://github.com/hysts/pytorch_image_classification % ), mean and median number of queries of untargeted attack andtargeted attack (target class being ) on CIFAR-10. The best and second-best values among methodsthat achieve more than 90% ASR are highlighted in bold and underline, respectively. Target Model → ResNet DenseNet VGG PyramidNetAttack Method ↓ ASR Mean Median ASR Mean Median ASR Mean Median ASR Mean MedianUntargetedAttack Bandits [23] 90.8 193.4 88.0 96.0 206.3 96.0 93.0 361.5 158.0 92.0 194.9 92.0SimBA [14] 93.2 432.1 235.0 74.0 480.5 223.0 68.3 632.3 237.0 84.0 455.5 270.0Subspace [15] 93.0 301.8 12.0 96.0 115.8 12.0 90.0 272.0 12.0 91.0 255.4 10.0P-RGF [8] 92.2 121.8 62.0 99.6 111.7 62.0 96.8 176.4 62.0 98.2 135.8 62.0TREMBA [21] 90.9 120.7 64.0 97.8 126.4 66.0 97.7 125.5 63.0 97.9 82.3 39.0MetaAttack [11]
TargetedAttack Bandits [23] 72.6 3660.1 2812.0 80.0 4154.8 3842.0 83.4 3967.6 3860.0 77.8 4484.6 3876.0SimBA [14] fine-tuned values of λ are 20 for CIFAR-10 and 50 for Tiny-ImageNet.
2) The CMA-ES algorithm is implemented using PyCMA , with the population size k = 20 and the selection size m = 10 . Themean µ is initialized using the pre-trained c-Glow model, while the co-variance matrix diag ( σ ) isinitialized as the identity matrix I . All other hyper-parameters are set as default values in PyCMA. In this case, one attack is successful if the predicted class of the adversarialexample is different from the ground-truth label. The results are reported in the top half of Table 1. Itshows that the proposed CG-ES achieves 100% ASR on ResNet, DenseNet and PyramidNet, and99.9% ASR on VGG, which demonstrates the effectiveness of our method. CG-ES is also very query-efficient. The mean number of queries is the lowest under all four target models in Table 1. Moresurprisingly, the median number of queries of CG-ES is just 1, which means that we successfully foolthe target model with just one query for more than attacked images. It reveals that the c-Glowmodel pre-trained on surrogate models is a good approximation to the perturbation distribution of thetarget model. In contrast, the second-best median queries are obtained by Subspace [15], which aremore than 10x of ours, and with much lower ASR. The curves of the average ASR on all evaluationimages v.s. the query number are shown in Fig. 1. It clearly highlights the superiority of our CG-ESmethod to all compared methods. Especially in the stage of low query numbers, CG-ES achievesvery high ASR efficiently. log (Queries) A S R ( % ) ResNet Bandits SimBA Subspace P-RGF TREMBA MetaAttack Signhunter CG-ES1 2 3 3.7 log (Queries) log (Queries) log (Queries) Figure 1: Attack success rate (ASR % ) w.r.t. query numbers for untargeted attacks on CIFAR-10. Targeted Attack.
Following [21], we conduct targeted attacks with three target classes, including0 (airplane), 4 (deer) and 9 (truck). When attacking for one target class, images with the sameground-truth class are skipped. Due to space limit, we report the attack results of the target class 0 inthe bottom half of Table 1, and leave the results of the other two target classes in the supplementarymaterial . As shown in Table 1, our CG-ES method achieves at least 98.8% ASR on all target models.Besides, the mean and median query numbers of CG-ES are significantly lower than that of allcompared methods, demonstrating its query efficiency. Signhunter [1] obtains a slightly higher ASRthan CG-ES on VGG (0.9% higher) and PyramidNet (1.1% higher), but with the cost of more than1.6x query numbers. https://github.com/CMA-ES/pycma % ), mean and median number of queries of untargeted attack andtargeted attack (target class being ) on Tiny-ImageNet. The best and second-best values amongmethods that achieve more than 90% ASR are highlighted in bold and underline, respectively. Target model → ResNet DenseNet VGG PyramidNetAttack Method ↓ ASR Mean Median ASR Mean Median ASR Mean Median ASR Mean MedianUntargetedAttack Bandits [23] 82.9 1846.6 168.0 77.6 2629.3 1194.0 81.8 2421.9 940.0 85.0 2508.1 1012.0SimBA [14] 99.4 616.9 398.0 99.0 1571.0 1198.0 97.5 1597.4 1168.0 97.5 1071.9 849.0Subspace [15] 78.6 642.0 6.0 86.9 778.4 10.0 81.4 975.9 10.0 83.3 856.7 10.0P-RGF [8] 98.2 203.2 112.0 91.2 209.5 112.0 91.8 452.0 112.0 95.3 482.9 112.0TREMBA [21] 99.1 139.3
TargetedAttack Bandits [23] 47.4 5374.6 5592.0 41.7 6081.0 6476.0 44.2 5674.9 5910.0 47.8 4717.4 4582.0SimBA [14]
The results are summarized in the top half of Table 2. It shows that CG-ESperforms better than compared methods at most cases. Specifically, when attacking the ResNet model,CG-ES achieves the highest ASR with the lowest mean and median number of queries among allmethods. When attacking DenseNet, CG-ES achieves ASR of 98.9% with the lowest mean andmedian number of queries. The Signhunter is slightly higher than ours in terms of ASR (1.1% higher),but its mean and median number of queries are 2.4x of ours. In terms of VGG, CG-ES achieves thesecond-highest ASR and the best values of both mean and median number of queries. In terms ofPyramidNet, CG-ES obtains the second-best values of both ASR and the mean number of queries,while the median query is just 1. In contrast, the median number of queries of Signhunter is 68x ofours. These comparisons demonstrate the effectiveness and efficiency of the proposed method.
Targeted Attack.
Similar to that on CIFAR-10, we also randomly select three target classes: 94(jellyfish), 113 (fly) and 171 (chain). Due to space limit, we report the results of the target class 94in the bottom half of Table 2, and leave other results in the supplementary material . As shown inTable 2, our CG-ES is more effective and efficient than compared method at most cases. Specifically,when attacking ResNet and PyramidNet, CG-ES obtains the best performance on ASR, the meanand median number of queries. When attacking DenseNet, CG-ES also obtains the lowest mean andmedian number of queries. Although SimBA obtains slightly higher ASR (0.9% higher) than ours,its mean and median number of queries are about 1.9x of ours. For the target model VGG, CG-ESachieves the second-best value of mean number of queries. Above results demonstrate the superiorperformance of CG-ES.
In all above results summarized in Tables 1-2, and there are 48evaluation results in total. Among these results, our CG-ES method obtains 36 best and 6 second-bestresults. It fully demonstrates the superior performance of CG-ES on both effectiveness and efficiency,to all compared methods. Moreover, CG-ES always achieves the lowest median numbers of queries(except the targeted attack on VGG of Tiny-ImageNet), and even 1 at 4 results. It reflects that thesearch distribution π is very close to the intrinsic perturbation distribution of the target model, due tothe powerful flexibility of the c-Glow model coupled with the Gaussian distribution, as well as thegood transferability of the c-Glow model pre-trained on surrogate models. Supplementary Material.
Due to the space limit, some important information is presented in the supplementary material, including: the detailed definition of the c-Glow model (see Section 4.1), the proof of Theorem 1, additional results of targeted attacks on both CIFAR-10 and Tiny-ImageNet, ablation studies on the effects of the c-Glow model and its initialization, as well as the empirical verification of the energy-based model for capturing the perturbation distribution (see Section 4.2.1).
Future Extensions. 1)
The main idea of our method is replacing the search distribution in ES with the c-Glow model, while the ES algorithm itself is not changed. Thus, our method is applicable to any ES variant, such as NES [45, 46]. 2) As described in the last paragraph of Section 4.1, in this work we simply fix the parameter φ of the c-Glow model at its pre-trained value, while only fine-tuning the Gaussian parameters (µ, σ). Although this simple setting has shown surprisingly good performance, it is still interesting to explore what happens if φ is also fine-tuned. The ASR could possibly be further improved, as the search distribution would then be even closer to the perturbation distribution of the target model. The above two extensions will be explored in our future work.

Conclusion. In this work, we proposed a novel search distribution for the evolution strategy (ES) method for solving the score-based black-box attack problem, based on the conditional Glow (c-Glow) model coupled with a Gaussian distribution. This search distribution is flexible enough to capture the intrinsic distribution of adversarial perturbations conditioned on different benign examples. Besides, we proposed to pre-train the c-Glow model by approximating an energy-based model of the perturbation distribution of surrogate models. The pre-trained c-Glow model is then used as the initialization in ES for attacking the target model. Consequently, the proposed CG-ES method takes advantage of both query-based and transfer-based attack methods, obtaining a high attack success rate and high efficiency simultaneously. Extensive experiments of attacking four models on two benchmark datasets have fully verified the superior attack performance of the proposed method compared to several state-of-the-art methods.
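To make the fine-tuning scheme concrete, the following sketch runs a minimal rank-based ES loop in which only the Gaussian mean is updated, while a frozen transform `g_phi` (a stand-in for the pre-trained c-Glow mapping conditioned on one benign example) shapes the sampled perturbations. All names, and the simplified update rule, are illustrative rather than the paper's exact algorithm; σ is also kept fixed here for brevity.

```python
import random

def es_finetune_mu(loss, g_phi, dim, sigma=0.1, pop=20, lr=0.02, steps=200):
    """Minimal rank-based ES that fine-tunes only the Gaussian mean mu,
    keeping sigma and the transform g_phi (stand-in for the frozen,
    pre-trained c-Glow mapping) fixed. Illustrative sketch only."""
    mu = [0.0] * dim
    for _ in range(steps):
        # Sample a population of latent codes z ~ N(mu, sigma^2 I).
        zs = [[m + sigma * random.gauss(0.0, 1.0) for m in mu]
              for _ in range(pop)]
        # Map each latent through the frozen transform to a perturbation,
        # then query the black-box loss once per sample.
        losses = [loss(g_phi(z)) for z in zs]
        # Rank-based fitness shaping: the best (lowest-loss) sample gets
        # weight +1, the worst gets -1, linearly in between.
        order = sorted(range(pop), key=lambda i: losses[i])
        w = [0.0] * pop
        for rank, i in enumerate(order):
            w[i] = 1.0 - 2.0 * rank / (pop - 1)
        # Score-function update of mu: move toward well-ranked samples.
        for d in range(dim):
            grad = sum(w[i] * (zs[i][d] - mu[d])
                       for i in range(pop)) / (pop * sigma ** 2)
            mu[d] += lr * grad
    return mu

# Toy check: with the identity transform and a quadratic loss centered
# at (1, 1), mu should approach that minimizer.
random.seed(0)
mu = es_finetune_mu(lambda x: sum((xi - 1.0) ** 2 for xi in x),
                    lambda z: z, dim=2)
```

In the attack setting, `loss` would be the margin score returned by the target model for the perturbed image, and `g_phi` the conditional flow; fine-tuning φ as well would mean also backpropagating the shaped-fitness signal into the flow parameters.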
References

[1] Abdullah Al-Dujaili and Una-May O'Reilly. Sign bits are all you need for black-box attacks. In ICLR, 2020.
[2] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In ECML PKDD, 2013.
[3] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.
[4] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In IEEE S&P, 2017.
[5] Jianbo Chen, Michael I. Jordan, and Martin J. Wainwright. HopSkipJumpAttack: A query-efficient decision-based attack. arXiv preprint arXiv:1904.02144, 2019.
[6] Minhao Cheng, Thong Le, Pin-Yu Chen, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. In ICLR, 2019.
[7] Minhao Cheng, Simranjit Singh, Patrick H. Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. Sign-OPT: A query-efficient hard-label adversarial attack. In ICLR, 2020.
[8] Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Improving black-box adversarial attacks with a transfer-based prior. In NeurIPS, 2019.
[9] Philip J. Davis. Leonhard Euler's integral: A historical profile of the gamma function. American Mathematical Monthly, 66(10):849–869, 1959.
[10] Yinpeng Dong, Hang Su, Baoyuan Wu, Zhifeng Li, Wei Liu, Tong Zhang, and Jun Zhu. Efficient decision-based black-box adversarial attacks on face recognition. In CVPR, 2019.
[11] Jiawei Du, Hu Zhang, Joey Tianyi Zhou, Yi Yang, and Jiashi Feng. Query-efficient meta attack to deep neural networks. In ICLR, 2020.
[12] Louis Faury, Clément Calauzènes, Olivier Fercoq, and Syrine Krichene. Improving evolutionary strategies with generative neural networks. arXiv preprint arXiv:1901.11271, 2019.
[13] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[14] Chuan Guo, Jacob R. Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Q. Weinberger. Simple black-box adversarial attacks. In ICML, 2019.
[15] Yiwen Guo, Ziang Yan, and Changshui Zhang. Subspace attack: Exploiting promising subspaces for query-efficient black-box attacks. In NeurIPS, 2019.
[16] Dongyoon Han, Jiwhan Kim, and Junmo Kim. Deep pyramidal residual networks. In CVPR, 2017.
[17] Nikolaus Hansen. The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772, 2016.
[18] Nikolaus Hansen, Dirk V. Arnold, and Anne Auger. Evolution strategies. In Springer Handbook of Computational Intelligence. 2015.
[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In ECCV, 2016.
[20] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
[21] Zhichao Huang and Tong Zhang. Black-box adversarial attack with transferable model-based embedding. In ICLR, 2020.
[22] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In ICML, 2018.
[23] Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Prior convictions: Black-box adversarial attacks with bandits and priors. In ICLR, 2019.
[24] Rie Johnson and Tong Zhang. A framework of composite functional gradient methods for generative adversarial models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[25] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[26] Solomon Kullback and Richard A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, pages 79–86, 1951.
[27] Huichen Li, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, and Bo Li. QEBA: Query-efficient boundary-based blackbox attack. In CVPR, 2020.
[28] Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, and Boqing Gong. NATTACK: Learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In ICML, 2019.
[29] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.
[30] Yujia Liu, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. A geometry-inspired decision-based attack. In ICCV, 2019.
[31] You Lu and Bert Huang. Structured output learning with conditional generative flows. In AAAI, 2020.
[32] Seungyong Moon, Gaon An, and Hyun Oh Song. Parsimonious black-box adversarial attacks via efficient combinatorial optimization. In ICML, 2019.
[33] Ryan Murray, Brian Swenson, and Soummya Kar. Revisiting normalized gradient descent: Fast evasion of saddle points. IEEE Transactions on Automatic Control, 64(11):4818–4824, 2019.
[34] Nicolas Papernot, Patrick D. McDaniel, and Ian J. Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
[35] Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In AsiaCCS, 2017.
[36] Ali Rahmati, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard, and Huaiyu Dai. GeoDA: A geometric framework for black-box adversarial attacks. In CVPR, 2020.
[37] I. Rechenberg. Evolutionsstrategien. In Simulationsmethoden in der Medizin und Biologie. 1978.
[38] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Fei-Fei Li. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[39] H.-P. Schwefel. Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie: mit einer vergleichenden Einführung in die Hill-Climbing- und Zufallsstrategie. Birkhäuser, 1977.
[40] Xinwei Shen, Tong Zhang, and Kani Chen. Bidirectional generative modeling using adversarial gradient estimation. arXiv preprint arXiv:2002.09161, 2020.
[41] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[42] Yi Sun, Daan Wierstra, Tom Schaul, and Jürgen Schmidhuber. Efficient natural evolution strategies. In GECCO, 2009.
[43] Fnu Suya, Jianfeng Chi, David Evans, and Yuan Tian. Hybrid batch attacks: Finding black-box adversarial examples with limited queries. In USENIX Security, 2020.
[44] Esteban G. Tabak and Eric Vanden-Eijnden. Density estimation by dual ascent of the log-likelihood. Communications in Mathematical Sciences, 8(1):217–233, 2010.
[45] Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jan Peters, and Jürgen Schmidhuber. Natural evolution strategies. The Journal of Machine Learning Research, 15(1):949–980, 2014.
[46] Daan Wierstra, Tom Schaul, Jan Peters, and Jürgen Schmidhuber. Natural evolution strategies. In