CAAD 2018: Powerful None-Access Black-Box Attack Based on Adversarial Transformation Network
Xiaoyi Dong, USTC, [email protected]
Weiming Zhang, USTC, [email protected]
Nenghai Yu, USTC, [email protected]
1. Method
In this paper, we propose an improvement of Adversarial Transformation Networks (ATN) [1] to generate adversarial examples, which can fool white-box models and black-box models with state-of-the-art performance and won the
second place in the non-targeted task in CAAD 2018.

In this section, we first introduce the overall architecture of our method; we then present our improved loss functions for generating adversarial examples that satisfy the $L_\infty$ norm restriction in the non-targeted attack problem; next we illustrate how a Robust-enhance module makes our adversarial examples more robust and gives them better transferability; finally, we show how our method attacks an ensemble of models.

Our work is based on ATN and proposes a new training framework and two powerful loss functions that improve transferability and training speed. Figure 1 shows our framework.

Figure 1. Architecture of our method.

Our framework is composed of a Generate module and a Robust-enhance module. The Generate module contains an Encoder and a Decoder, just like ATN. But before feeding the clean image and the adversarial example into the pre-trained model, we add a Robust-enhance module that imitates the image pre-processing used in some defence methods. The generating part can be defined as a neural network

$$g_{k,\theta}(x): x \in \mathcal{X} \rightarrow x' \quad (1)$$

where $\theta$ is the parameter of $g(x)$, $k(x)$ is the target model, which outputs a probability distribution across class labels, and $k_f(x)$ is the feature map before the target model's last average pooling layer. $x$ is the input image, $\mathcal{X}$ is the distribution domain of the input image, and $x'$ is the adversarial example generated by $g(x)$, with

$$\operatorname{argmax} k(x) \neq \operatorname{argmax} k(x') \quad (2)$$

With the $L_\infty$ norm restriction, we abandon the space-domain loss and use the following loss functions to improve performance.

Feature Based Loss function: inspired by SRGAN [7] and the guided denoiser [8], we use the target model's feature map before the last average pooling layer as the image's feature and try to maximize the $L$ distance between the feature of the real image and that of the adversarial example:

$$l_F(x, x') = 1 - L(k_f(x), k_f(x')) \quad (3)$$

Minimizing $l_F$ therefore maximizes the feature distance. We used this feature-level loss function in the DLight team's attack model and got the third prize.

Prediction Based Loss function: we also consider a loss based on the target model's output and find a simple but powerful loss function:

$$l_P(x, x') = \begin{cases} P'_{fir} - P'_{sec}, & \text{if } \operatorname{argmax} f(x) = fir \\ P'_{sec} - P'_{fir}, & \text{if } \operatorname{argmax} f(x) \neq fir \end{cases} \quad (4)$$

where, for a target model $f$ with $N$ classes whose output prediction is $f(x') = [P'_1, P'_2, \dots, P'_N]$, we define $fir$ and $sec$ as the labels of the top two probabilities in $f(x')$ and $P'_{fir}$ and $P'_{sec}$ as their corresponding probabilities. Minimizing this loss pushes the probability of the clean prediction below the runner-up while the example is not yet adversarial, and keeps widening the gap once it is. In the following experiments we show that adversarial examples generated by the model trained with this simple loss have strong transferability compared with other methods. We used this loss function in the Hooin Zira submission and won the second place in the non-targeted attack competition.
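The two losses can be made concrete with a short PyTorch sketch; the mean-squared error standing in for the distance $L$, the function names and the batching are illustrative assumptions rather than the exact implementation used in the competition.

```python
import torch
import torch.nn.functional as F

def feature_loss(feat_clean, feat_adv):
    # l_F (Eq. 3): minimizing 1 - L(k_f(x), k_f(x')) maximizes the distance
    # between clean and adversarial feature maps. MSE stands in for L here.
    return 1.0 - F.mse_loss(feat_adv, feat_clean)

def prediction_loss(logits_adv, clean_labels):
    # l_P (Eq. 4): signed gap between the top-two probabilities of f(x').
    probs = F.softmax(logits_adv, dim=1)
    top2 = probs.topk(2, dim=1)
    p_fir, p_sec = top2.values[:, 0], top2.values[:, 1]
    fir = top2.indices[:, 0]
    not_yet_fooled = (fir == clean_labels).float()
    # still predicted as the clean label: P'_fir - P'_sec (shrink the gap)
    # already fooled:                     P'_sec - P'_fir (keep widening it)
    return (not_yet_fooled * (p_fir - p_sec)
            + (1.0 - not_yet_fooled) * (p_sec - p_fir)).mean()
```

Both losses are minimized while training the Generate module, so gradients flow through the pre-trained target model back into the generator.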
There are two main methods for defending against adversarial examples: adversarial training and image pre-processing. In order to attack the second kind of defence, we insert a Robust-enhance module into the training process: after the generated adversarial examples are clipped, we apply the Robust-enhance module to them and feed the processed images to the target model. We find that with this Robust-enhance module, our adversarial examples are more robust to the second sort of defence and gain more transferability when attacking models trained with adversarial examples (the first sort of defence). Considering the implementation of back-propagation, we use three basic image processing operations in the Robust-enhance module (a minimal sketch is given after this list).

Random Noise: add random noise to the adversarial examples.

Pre-trained Filter: we randomly apply either the Random Noise or a small pre-trained filter network to the adversarial examples.

Training Filter: we randomly apply either the Random Noise or a small filter network to the adversarial examples, and the filter is trained jointly with the Generate module.
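As an illustration, a minimal PyTorch sketch of such a Robust-enhance module is given below; the filter architecture, the 50/50 choice between noise and filter, and the default noise scale are assumptions, not the exact configuration used in the submissions.

```python
import random
import torch
import torch.nn as nn

class RobustEnhance(nn.Module):
    """Randomly perturbs the generated adversarial example before it is fed to
    the target model, imitating image pre-processing defences. Differentiable
    w.r.t. its input so gradients still reach the Generate module."""

    def __init__(self, beta=6.0, use_filter=False, train_filter=False):
        super().__init__()
        self.beta = beta                      # noise scale (the paper's beta), pixels in [0, 255]
        self.use_filter = use_filter
        self.filter = nn.Sequential(          # small filter network (assumed shape)
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )
        if not train_filter:                  # "Pre-trained Filter": keep it frozen;
            for p in self.filter.parameters():  # "Training Filter": leave it trainable
                p.requires_grad_(False)

    def forward(self, x_adv):
        if self.use_filter and random.random() < 0.5:
            return self.filter(x_adv)         # filter branch
        noise = (torch.rand_like(x_adv) * 2.0 - 1.0) * self.beta
        return x_adv + noise                  # "Random Noise" branch
```

During training, this module sits between the clipped output of the Generate module and the target model, so the generator must produce perturbations that survive the random processing.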
In this section, we show how to attack an ensemble of models. Previous research and competitions show that ensembling is an efficient way to enhance performance and improve robustness in many areas, including adversarial examples. Adversarial examples generated by ensemble training are able to fool many white-box models at the same time and transfer better to black-box models.

We propose to attack multiple models by adding their losses together, and we call this ensemble in loss. As our loss functions are defined at both the feature level and the prediction level, the ensemble-in-loss method can be used with any kind of loss function. In the back-propagation step, every loss computes its gradients individually and they are summed at each parameter, which guides the model to become more aggressive towards all the target models. Specifically, to attack an ensemble of $N$ models, we fuse the losses as

$$l(x) = \sum_{n=1}^{N} w_n\, l_n(x) \quad (5)$$

where $l_n(x)$ is the loss function of the $n$-th model (either the feature-based or the prediction-based loss) and $w_n$ is the ensemble weight with $w_n > 0$, which we use to balance the magnitude of the gradients coming from different models.

Meanwhile, as shown in Fig. 1, we find that once our adversarial examples are adversarial to a model, they become even more adversarial to it within a few iterations. To keep the generate model from specializing to one of the target models, we add a threshold $\gamma$ to our prediction-based loss $l_P$ and obtain the new loss $l^{ens}_P(x, x')$:

$$l^{ens}_P(x, x') = \max(\gamma, l_P(x, x')) \quad (6)$$
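A minimal sketch of this ensemble-in-loss fusion, reusing the prediction_loss function sketched earlier and applying the threshold of Eq. (6) per model, might look as follows; the signature and the per-model clamping are assumptions for illustration.

```python
import torch

def ensemble_loss(x, x_adv, models, weights, gamma):
    """Weighted sum of per-model prediction losses (Eq. 5), each clamped from
    below by the threshold gamma (Eq. 6) so that a model that is already
    fooled cannot dominate the gradients of the whole ensemble."""
    total = torch.zeros((), device=x.device)
    for model, w in zip(models, weights):
        with torch.no_grad():
            clean_labels = model(x).argmax(dim=1)         # predictions on the clean images
        l_n = prediction_loss(model(x_adv), clean_labels)  # l_P from the earlier sketch
        total = total + w * torch.clamp(l_n, min=gamma)    # max(gamma, l_P)
    return total
```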
2. Experiments
In this section, in order to validate the effectiveness of the proposed methods, we generate adversarial examples for fooling classifiers pre-trained on the ImageNet dataset [2], which consists of 1.2 million natural images collected from the Internet and categorized into 1000 classes. We first specify the experimental settings, then show the results for attacking a single model and for attacking an ensemble of models.
We use eleven models, nine of which are normally trained: Inception V3 (Inc-v3) [11], Inception V4 (Inc-v4) [10], Inception ResNet V2 (IncRes-v2) [10], ResNet V2-101 (Res-101) [4], PolyNet [13], SENet154 (SENet) [5], PNASNet-5-Large (PNASNet) [9], NASNet-A-Large (NASNet) [14] and DenseNet-121 (Den-121) [6]. To keep the evaluation from being influenced by the resize operation and to simplify ensemble training, we fine-tune all the models to accept a common input size by modifying the pooling size of the last average pooling layer. The other two models are adversarially trained variants of Inc-v3 obtained by ensemble adversarial training (denoted Inc-v3_ens). As we did not have enough time and resources to prepare these models ourselves, we used the models shared at https://github.com/dongyp13/Non-Targeted-Adversarial-Attacks and converted them from TensorFlow to PyTorch with https://github.com/Microsoft/MMdnn.
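The input-size adaptation described above can be reproduced in torchvision by swapping the fixed final average pooling for an adaptive one; ResNet-50 is used here only as a stand-in for the models listed, so the backbone and the 299x299 input are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Replace the fixed-size final average pooling with an adaptive one so the
# classifier accepts a different resolution than it was trained on.
net = models.resnet50(pretrained=True)
net.avgpool = nn.AdaptiveAvgPool2d(1)   # pools whatever spatial size arrives
net.eval()

with torch.no_grad():
    logits = net(torch.rand(1, 3, 299, 299))   # 299x299 instead of the usual 224x224
print(logits.shape)                             # torch.Size([1, 1000])
```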
Table 1. Results for single-model attack. Inc-v3 is the target white-box model and the others are black-box models. We compare our methods P-ATN and F-ATN with FGSM, PGD and MI-FGSM; our methods perform similarly to the state-of-the-art MI-FGSM on the white-box attack and achieve a much higher fooling rate when attacking black-box models. We also test the effect of the Robust-enhance module: P-ATN (No Robust) and F-ATN (No Robust) show the fooling rates of models trained without the Robust-enhance module. The results show that with the Robust-enhance module, P-ATN improves greatly when attacking black-box models, while F-ATN improves only slightly. The last row shows the average black-box fooling rate; P-ATN has the best performance.
As we focus on fooling the target white-box or black-box models, we use the fooling rate instead of the attack success rate. We define the fooling rate as the fraction of adversarial examples whose prediction labels differ from the prediction labels of the original images. Because the classifier is not correct for every input, in most cases the attack success rate is higher than the fooling rate. We use the DEV image set released by CAAD to test all of our methods.

In our experiments, we compare our methods with FGSM [3] (a one-step gradient-based method) and with MI-FGSM and PGD [12] (iterative methods). We also compare with the method we build on, ATN (based on an autoencoder); since ATN's loss function is designed for targeted attacks, we modify it for the non-targeted setting by minimizing the prediction of the true label. Since optimization-based methods cannot explicitly control the distance between the adversarial examples and the corresponding real images, we do not compare with them.
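The fooling rate defined above is straightforward to compute; the following PyTorch sketch assumes the clean and adversarial images are already preprocessed tensors, and the function name and batching are illustrative.

```python
import torch

def fooling_rate(model, clean_images, adv_images, batch_size=32):
    """Fraction of adversarial examples whose predicted label differs from the
    prediction on the corresponding clean image; this is independent of the
    ground-truth labels, unlike the attack success rate."""
    model.eval()
    fooled, total = 0, 0
    with torch.no_grad():
        for i in range(0, clean_images.size(0), batch_size):
            x = clean_images[i:i + batch_size]
            x_adv = adv_images[i:i + batch_size]
            fooled += (model(x).argmax(1) != model(x_adv).argmax(1)).sum().item()
            total += x.size(0)
    return fooled / total
```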
We show the fooling rates of attacks against the models described in the experimental settings in Table 1. The adversarial examples are generated for Inc-v3 using FGSM, MI-FGSM, PGD and four of our methods: feature-loss based ATN (F-ATN), prediction-loss based ATN (P-ATN), and both models trained without the Robust-enhance module. Inc-v4, IncRes-v2, PolyNet, NASNet, Res-101, Inc-v3_ens and Inc-v3_ens are black-box models used to evaluate the transferability of all the methods. The maximum perturbation ε is set to 16 in all experiments, with pixel values in [0, 255]. The number of iterations is 10 for MI-FGSM, and the decay factor μ is 1.0 as used in the MI-FGSM paper. The noise mean factor β is 6 in both P-ATN and F-ATN.

From the table we can observe that our two models attack the white-box model with a near 100% fooling rate, like MI-FGSM, and better than FGSM and PGD.
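The $L_\infty$ constraint is enforced by clipping the generated examples around the clean images; a minimal sketch, assuming pixel values in [0, 255] as stated above (the function name and projection order are illustrative):

```python
import torch

def clip_epsilon(x_adv, x_clean, eps=16.0):
    """Project the adversarial example back into the L-infinity ball of radius
    eps around the clean image and into the valid pixel range [0, 255]."""
    x_adv = torch.max(torch.min(x_adv, x_clean + eps), x_clean - eps)
    return x_adv.clamp(0.0, 255.0)
```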
Robust method       Inc-v3  Inc-v3_ens  Inc-v3_ens
None                0.98    0.58        0.47
Random Noise        0.97
Pre-trained Filter  0.97    0.43        0.56
Training Filter     0.97    0.59        0.43

Table 2. Fooling rates of P-ATN adversarial examples generated for Inc-v3, trained with different Robust-enhance methods.

When it comes to the black-box attack, the performance of FGSM, MI-FGSM and PGD drops sharply, especially when attacking the adversarially trained models Inc-v3_ens and Inc-v3_ens, where all three attacks are nearly powerless. With the Robust-enhance module, both our F-ATN and P-ATN keep a high fooling rate. The last row of Table 1 shows the average black-box fooling rate; P-ATN has the best performance.

Meanwhile, we also test the effect of the Robust-enhance module: P-ATN (No Robust) and F-ATN (No Robust) show the fooling rates of models trained without the Robust-enhance module. The results show that with the Robust-enhance module, P-ATN improves greatly when attacking black-box models, while F-ATN improves only slightly.

Although our methods greatly improve the success rate of black-box attacks, even when the target is adversarially trained, the performance is still not good enough (less than 90%); we will show that with multi-model ensemble training our methods obtain better results.

The Robust-enhance module is the most important part for improving our model's transferability and robustness. Therefore, we study the difference between the Random Noise method, the Pre-trained Filter method and the Training Filter method.

Attack    Resize  Inc-v3  IncRes-v2  Res-101
FGSM      N       0.71    0.28       0.34
FGSM      Y       0.46    0.21       0.30
PGD       N       0.99    0.12       0.18
PGD       Y       0.36    0.09       0.13
MI-FGSM   N       0.99    0.38       0.45
MI-FGSM   Y

Table 3. Fooling rates with (Y) and without (N) the resize pre-processing; our method remains better than the others.
We attack the Inc-v3 model with P-ATN using only the Random Noise method, the Pre-trained Filter combined with noise, and the Training Filter. For the noise method, the noise mean factor β is 6. We show the fooling rates of the generated adversarial examples against Inc-v3, Inc-v3_ens and Inc-v3_ens in Table 2. Among the different methods, random noise works best.

We then study the adversarial examples' robustness when the black-box model applies image pre-processing to defend against the attack. We use the same hyper-parameters for all attack methods as in the single-model experiments, and before feeding the adversarial examples to the target model we resize them from 299 to 399, then from 399 to 199, and finally back to 299 (this pipeline is sketched after Table 4). The attack performance is shown in Table 3. The results show that after the resize operation the white-box attack performance (Inc-v3) of all methods decreases, especially ours. For the black-box attacks (IncRes-v2 and Res-101), the gradient-based methods only decrease slightly, while our method's fooling rate is affected more seriously but remains better than the other methods.

Attack    IncRes-v2  Inc-v3_ens
MI-FGSM   0.955      0.949
F-ATN     0.971      0.963
P-ATN

Table 4. Fooling rates of the ensemble attacks, compared with last year's winner's submission (MI-FGSM).
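The resize pre-processing used as a defence in the robustness test above can be sketched as follows; bilinear interpolation is an assumption, since the interpolation mode is not stated.

```python
import torch.nn.functional as F

def resize_defence(x):
    """Resize 299 -> 399 -> 199 -> 299 before classification, as in the
    robustness test above."""
    x = F.interpolate(x, size=(399, 399), mode='bilinear', align_corners=False)
    x = F.interpolate(x, size=(199, 199), mode='bilinear', align_corners=False)
    return F.interpolate(x, size=(299, 299), mode='bilinear', align_corners=False)
```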
We finally study the influence of the size of the adversarial perturbation on the fooling rates. We attack the Inc-v3 model with P-ATN and MI-FGSM, varying ε from 1 to 32 with a granularity of 4, with pixel values in [0, 255]. We evaluate the attack performance on the white-box model Inc-v3 and on a black-box model, IncRes-v2. For P-ATN we use the Random Noise module with β = 6, and the step size α for PGD and MI-FGSM is 10. As training P-ATN for every ε would cost too much time, we only train models with ε = 4, 8, 16, 32 and clip to generate the other perturbation sizes.

Figure 2. Fooling rates of P-ATN and MI-FGSM under different maximum perturbations ε.

Figure 2 shows the result. When attacking the white-box model Inc-v3, MI-FGSM keeps a high fooling rate for every ε. When ε is small (2, 4, 6), P-ATN performs poorly; as ε grows, its fooling rate reaches 100%. When attacking the black-box model, the fooling rate of MI-FGSM grows linearly with the size of the perturbation, while P-ATN's fooling rate grows exponentially; once ε is larger than 8, P-ATN performs better and finally reaches 99%.

There are three sub-competitions in the Competition on Adversarial Attacks and Defenses 2018 organized by GeekPwn: the Non-targeted Adversarial Attack, the Targeted Adversarial Attack and the Defense Against Adversarial Attack. The organizers provide 1000 ImageNet-compatible images for evaluating the attack and defense submissions. In the non-targeted attack, we won the second place with P-ATN and the third place with F-ATN. For both networks, we used the ten pretrained models described in the experimental settings (all except Inc-v3) together with the Robust-enhance module. The noise mean factor β is 6 in both P-ATN and F-ATN, and we set the ensemble threshold factor γ to a small negative value. For PolyNet and Inc-v3_ens we set the weight to 0.5, and for the rest (Inc-v3_ens, Inc-v4, IncRes-v2, Res-101, SENet, PNASNet, NASNet and Den-121) we set the weight to 1.0. Table 4 shows the performance compared with last year's winner's submission.

References

[1] S. Baluja and I. Fischer. Adversarial transformation networks: Learning to generate adversarial examples. CoRR, abs/1703.09387, 2017.
[2] J. Deng, W. Dong, R. Socher, and L. J. Li. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255, 2009.
[3] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. ArXiv e-prints, Dec. 2014.
[4] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. pages 770–778, 2015.
[5] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. 2017.
[6] G. Huang, Z. Liu, L. v. d. Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2261–2269, 2017.
[7] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. CoRR, abs/1609.04802, 2016.
[8] F. Liao, M. Liang, Y. Dong, T. Pang, J. Zhu, and X. Hu. Defense against adversarial attacks using high-level representation guided denoiser. CoRR, abs/1712.02976, 2017.
[9] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L. J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. 2017.
[10] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. 2016.
[11] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
[12] J. Uesato, B. O'Donoghue, A. van den Oord, and P. Kohli. Adversarial risk and the dangers of evaluating against weak attacks. ArXiv e-prints, Feb. 2018.
[13] X. Zhang, Z. Li, C. L. Chen, and D. Lin. PolyNet: A pursuit of structural diversity in very deep networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3900–3908, 2017.
[14] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. 2017.