Block Switching: A Stochastic Approach for Deep Learning Security
Xiao Wang∗, Siyue Wang∗, Pin-Yu Chen, Xue Lin, and Peter Chin
1. Boston University 2. Northeastern University 3. IBM Research. ∗Equal contribution.
[email protected], [email protected], [email protected], [email protected], [email protected]
This work is supported by the Air Force Research Laboratory FA8750-18-2-0058.
ABSTRACT
Recent study of adversarial attacks has revealed the vulnerability of modern deep learning models. That is, subtly crafted perturbations of the input can make a trained network with high accuracy produce arbitrary incorrect predictions, while remaining imperceptible to the human visual system. In this paper, we introduce Block Switching (BS), a defense strategy against adversarial attacks based on stochasticity. BS replaces a block of model layers with multiple parallel channels, and the active channel is randomly assigned at run time, hence unpredictable to the adversary. We show empirically that BS leads to a more dispersed input gradient distribution and superior defense effectiveness compared with other stochastic defenses such as stochastic activation pruning (SAP). Compared to other defenses, BS is also characterized by the following features: (i) BS causes less test accuracy drop; (ii) BS is attack-independent; and (iii) BS is compatible with other defenses and can be used jointly with them.
1 INTRODUCTION

Powered by rapid improvements in learning algorithms [11, 14, 16, 31, 32], computing platforms [1, 12], and hardware implementations [10, 17], deep neural networks have become the workhorse of more and more real-world applications, many of which are security critical, such as self-driving cars [3] and image recognition [11, 14, 22, 28, 30], where malfunctions of these deep learning models can lead to serious losses.

However, deep neural networks are vulnerable to adversarial attacks, as discovered by Szegedy et al. [24], who showed that in the context of classification, malicious perturbations can be crafted and added to the input, leading to arbitrary erroneous predictions of the target neural network, while the perturbations can be small in size and scale or even invisible to human eyes. This phenomenon has triggered wide interest among researchers, and a large number of attack methods have been developed. Typical attack methods include the Fast Gradient Sign Method (FGSM) by Goodfellow et al. [8], the Jacobian-based Saliency Map Attack (JSMA) by Papernot et al. [20], and the CW attack by Carlini and Wagner [5]. These attacks utilize gradients of a specific objective function with respect to the input, and design perturbations accordingly in order to obtain a desired output from the network. Among these attacks, the CW attack is known to be the strongest and is often used as a benchmark for evaluating model robustness.

In the meantime, a rich body of defense methods has been developed, attempting to improve model robustness from different angles. Popular directions include adversarial training [18], detection [9, 19], input rectification [6, 29], and stochastic defenses [7, 25–27]. However, although these defenses alleviate the vulnerability of deep learning to some extent, they are either shown to be invalid against counter-measures of the adversary [4] or require additional resources or sacrifices. A significant trade-off of these methods is between defense effectiveness and test accuracy, where a stronger defense is often achieved at the cost of worse performance on clean examples [27].

Motivated by designing a defense method with less harm to test accuracy, in this article we introduce Block Switching (BS) as an effective stochastic defense strategy against adversarial attacks. BS assembles a switching block consisting of a number of parallel channels. Since the active channel at run time is random, it prevents the adversary from exploiting the weakness of a fixed model structure. On the other hand, with proper training, the BS model is capable of adapting to the switching of active channels and maintains high accuracy on clean examples. As a result, BS achieves drastic model variation and thus has strong resistance against the adversary without a noticeable drop in legitimate accuracy. The nature of BS also enables its use jointly with other types of defenses such as adversarial training.

Our experimental results show that a BS model with 5 channels can reduce the fooling ratio (the percentage of generated adversarial examples that successfully fool the target model) of the CW attack from 100% to 21.0% on the MNIST dataset and to 22.2% on the CIFAR-10 dataset, with very minor testing accuracy loss on legitimate inputs. For comparison, another recent stochastic defense, stochastic activation pruning (SAP), only reduces the fooling ratio to 32.1% and 93.3% under the same attack. The fooling ratio can be further decreased with more parallel channels.

The rest of this article is organized as follows: in Section 2, we introduce related work on both the attacking and defending sides. The defense strategy and its analysis are given in Section 3. Experimental results are given in Section 4, and Section 5 concludes this work.
2 RELATED WORK

FGSM. The Fast Gradient Sign Method (FGSM) [8] utilizes the gradient of the loss function to determine the direction in which to modify the pixels. It is designed to be fast rather than optimal. Specifically, targeted adversarial examples are generated as follows:

$$x' = x - \epsilon \cdot \mathrm{sign}(\nabla_x \, \mathrm{loss}_{F,t}(x)) \qquad (1)$$

where $\epsilon$ is the magnitude of the added distortion and $t$ is the target label. Since FGSM performs only a single step of gradient descent, it is a typical example of a "one-shot" attack.

CW. The Carlini & Wagner (CW) attack [5] generates adversarial examples by solving the following optimization problem:

$$\begin{aligned} \text{minimize} \quad & D(\delta) + c \cdot f(x + \delta) \\ \text{subject to} \quad & x + \delta \in [0, 1]^n \end{aligned} \qquad (2)$$

where $c > 0$ is a constant that balances the distortion term $D$ and the loss term $f$. The loss term $f$ takes the following form:

$$f(x + \delta) = \max\big(\max\{ Z(x + \delta)_i : i \neq t \} - Z(x + \delta)_t,\; -\kappa\big) \qquad (3)$$

where $Z(\cdot)$ denotes the logits of the network and $\kappa$ controls the confidence of the attack.

3 BLOCK SWITCHING

Training a Block Switching model involves two phases. In the first phase, a number of sub-models with the same architecture are trained individually from random weight initializations. With the training process and data being the same, these models tend to have similar characteristics in terms of classification accuracy and robustness, yet different model parameters due to random initialization and stochasticity in the training process.

After the first round of training, each sub-model is split into two parts. The lower parts are grouped together and form the parallel channels of the switching block, while the upper parts are discarded. The switching block is then connected to a randomly initialized common upper model, as shown in Fig. 1. At run time, a random channel is selected to be active and processes the input, while all other channels remain inactive, resulting in a stochastic model that behaves differently at different times.

The whole BS model is then trained for a second round on the same training dataset in order to regain classification accuracy. In this phase, the common upper model is forced to adapt to the inputs produced by different channels so that a legitimate example can be correctly classified whichever channel is active. Usually, this phase is much faster than the first round of training since the parallel channels are already trained.
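To make the two-phase procedure concrete, the following is a minimal sketch in Keras of how a switching block could be assembled and retrained. The architecture, split point, epoch counts, and helper names (make_sub_model, bs_forward, phase2_step) are illustrative placeholders under our own assumptions, not the exact models or code used in the experiments.

```python
# A minimal sketch (not the exact experimental code) of the two-phase
# Block Switching procedure, assuming Keras on MNIST. The architecture,
# split point, and epoch counts are illustrative placeholders.
import random
import tensorflow as tf
from tensorflow import keras

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

def make_sub_model():
    return keras.Sequential([
        keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),    # end of the "lower part"
        keras.layers.Dense(10, activation="softmax"),  # "upper part", discarded later
    ])

# Phase 1: train sub-models independently from random initializations.
n_channels = 5
sub_models = []
for _ in range(n_channels):
    m = make_sub_model()
    m.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
    m.fit(x_train, y_train, batch_size=128, epochs=10, verbose=0)
    sub_models.append(m)

# Keep the lower layers as parallel channels; attach a fresh common upper model.
split_at = 3
channels = [keras.Sequential(m.layers[:split_at]) for m in sub_models]
upper = keras.layers.Dense(10, activation="softmax")  # randomly initialized

def bs_forward(x, training=False):
    # One randomly chosen channel is active per call; the rest stay idle.
    k = random.randrange(n_channels)
    return upper(channels[k](x, training=training))

# Phase 2: retrain the whole BS model. Only the active channel and the
# common upper model receive non-zero gradients in each step.
opt = keras.optimizers.SGD()
loss_fn = keras.losses.SparseCategoricalCrossentropy()

def phase2_step(xb, yb):
    with tf.GradientTape() as tape:
        loss = loss_fn(yb, bs_forward(xb, training=True))
    variables = upper.trainable_variables + sum(
        (c.trainable_variables for c in channels), [])
    grads = tape.gradient(loss, variables)
    # Inactive channels get None gradients and are simply skipped.
    opt.apply_gradients(
        [(g, v) for g, v in zip(grads, variables) if g is not None])
    return loss
```

Note how the gradient filtering in phase2_step mirrors the observation below that only the active channel's weights are updated in any given step.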
Let $Y = \widetilde{F}(x)$ denote the learned mapping of a stochastic model. Note that $\widetilde{F}$ is a stochastic function, so $Y$ is now a random variable. The defense against adversarial attacks can be understood in two aspects:

• Stochasticity of inference: Since $Y = \widetilde{F}(x)$ is a random variable, an adversarial example that fools an instance $F$ of the stochastic model $\widetilde{F}$ sampled at time $t$ may not fool another instance $F'$ sampled at a different time $t'$.

• Stochasticity of gradient: Due to the stochasticity of the network, the gradient of the attacker's objective loss with respect to the input is also stochastic. That is, the gradient backpropagated to the input is just one instance sampled from the gradient distribution, and this instance may not represent the most promising gradient descent direction.

Note that these two aspects are correlated. From the attacker's point of view, the goal is to find

$$\arg\max_{x} \mathbb{E}[A(\widetilde{F}(x), T)]$$

where $A(\cdot)$ outputs 1 if the attack is successful and 0 otherwise, and $T$ is the target class. Therefore, the attacker benefits from using stochastic gradients rather than gradients from a fixed model instance, in order to generate adversarial examples that are robust to model variation. In other words, the adversary cannot gain anything by simply disabling the variation of the stochastic model and crafting perturbations on a fixed model instance.

Figure 1: The steps of assembling a block switching. (a) Sub-models are trained individually. (b) The lower parts of sub-models are used to initialize the parallel channels of block switching.

The above analysis holds for any stochastic model, but the question is: what makes a good randomization strategy against adversarial attacks? Intuitively, a good randomization strategy should cause the input gradients to have wider distributions. In an extreme case, if the gradient direction is uniformly distributed, performing gradient descent is no better than a random walk, which means the attacker cannot take any advantage of the target model.

Knowing this, we can explain why block switching performs better than existing stochastic strategies such as SAP. In Fig. 2 we visualize gradient distributions under CW attacks on a SAP model and a BS model, respectively. We observe that the gradient (of the attacker's objective function w.r.t. the input) distribution of the SAP model is unimodal and concentrated, while the gradient of BS has a multimodal distribution over a wider range. This distribution indicates that it is harder to attack BS than SAP, which is verified by our experimental results in Section 4.

Figure 2: (a-c) Gradient distributions of CW attack on a SAP model. (d-f) Corresponding gradient distributions on a block switching. Distributions in the same column belong to the same input dimension. Each distribution is sampled 100 times.

Usually, dramatic variation of a stochastic model tends to harm classification accuracy on clean inputs; that is why, in SAP, smaller activation outputs have a higher chance of being dropped. The reason that Block Switching maintains high test accuracy despite drastic model change is that each channel connected to the common upper model is able to function independently. As long as the common upper model can learn to adapt to the different knowledge representations given by different channels, the stochastic model will not suffer significant test accuracy loss.

An interesting question that readers may ask is: why does the stochasticity of the model not impede the second round of training? The fact is that although the gradients with respect to the input are random variables, the gradients with respect to model parameters are not. Since the gradients of the inactive channels are just zeros, only the weight parameters in the active channel are updated in each training step. Therefore, although the set of weights being updated alternates, the gradients with respect to model parameters are deterministic at any time.
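As an illustration of how the per-dimension gradient distributions in Fig. 2 can be sampled, the sketch below repeatedly backpropagates an attacker-style loss through the stochastic model. It assumes the hypothetical bs_forward from the previous sketch, and a simple targeted cross-entropy stands in for the attacker's objective; this is not the exact measurement code used for the figure.

```python
# A sketch of sampling the per-dimension input-gradient distribution of a
# stochastic model, in the spirit of Fig. 2. Assumes bs_forward from the
# previous sketch; x is one input image, t the attacker's target class.
import numpy as np
import tensorflow as tf

def sample_input_gradients(x, t, n_samples=100):
    x = tf.convert_to_tensor(x[None, ...])  # add a batch dimension
    grads = []
    for _ in range(n_samples):
        with tf.GradientTape() as tape:
            tape.watch(x)
            probs = bs_forward(x)  # each call samples a fresh channel
            # Stand-in for the attacker's objective: cross-entropy toward t.
            loss = tf.keras.losses.sparse_categorical_crossentropy(
                tf.constant([t]), probs)
        grads.append(tape.gradient(loss, x).numpy().ravel())
    # Column i holds n_samples gradient values for input dimension i; its
    # histogram is one distribution of the kind plotted in Fig. 2.
    return np.stack(grads)
```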
4 EXPERIMENTS

In this section, we compare the defense effectiveness of regular, SAP, and BS models against FGSM [8] and CW [5] attacks on the MNIST [15] and CIFAR-10 [13] datasets. FGSM is a typical "one-shot" method which performs only one gradient descent step, and the CW attack is known to be the strongest attack method so far [2].

Both datasets contain separate training and testing sets. In our experiments, the training sets are used to train the defending models, and the testing sets are used to evaluate classification performance and generate adversarial examples.

This section is organized in the following way: details about the defending models, including their architectures and training methods, are given in Section 4.1. Defense records against FGSM and CW attacks are shown in Section 4.2. A study of how the number of channels in the block switching influences its defense effectiveness and classification accuracy is provided in Section 4.3.
4.1 Defending Models

Regular model. We use two standard Convolutional Neural Network (CNN) architectures for the MNIST and CIFAR-10 datasets respectively, as they have served as baseline models repeatedly in previous works [21]. Both CNNs have 4 convolutional layers, 2 pooling layers, and 2 fully-connected layers, but the kernel sizes of the convolution filters and the layer widths differ.

Both models are trained using stochastic gradient descent with a mini-batch size of 128. Dropout [23] is used as regularization during training.
SAP model. SAP can be applied post-hoc to a pre-trained model [7]. Therefore, in order to make the experimental results more comparable, we use the same trained weights for the SAP model as for the regular model. Stochastic activation pruning is added between the first and second fully-connected layers (a sketch of the pruning step is given at the end of this subsection).

Block switching. The switching block in this experiment consists of 5 channels. During the first round of training, 5 regular models are trained as described above. Each regular model is split into a lower part, containing all convolutional layers and the first fully-connected layer, and an upper part, containing the second fully-connected layer. The lower parts of the regular models are kept, providing the parallel channels of the block switching, while the upper parts are discarded. An upper model, which is the same as the upper part of the regular models except that its weights are randomly initialized, is added on top of all channels. The whole block switching is then trained on the original training set for a second round. We found that the second round of training is much faster than the first round: the block switching is retrained for 1 epoch on MNIST and for 5 epochs on CIFAR-10.
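For reference, below is a rough sketch of the stochastic activation pruning step following our reading of Dhillon et al. [7]: activations are sampled in proportion to their magnitudes, units drawn at least once are kept, and survivors are rescaled by their inverse survival probabilities. This is a paraphrase of the published method, not the authors' reference implementation.

```python
# A rough sketch of stochastic activation pruning (SAP), per our reading
# of Dhillon et al. [7]; not the authors' reference implementation.
# h is one vector of post-activation values, n_draws the number of samples.
import numpy as np

def sap(h, n_draws):
    p = np.abs(h) / np.sum(np.abs(h))               # sample prop. to magnitude
    picked = np.random.multinomial(n_draws, p) > 0  # drawn at least once
    keep_prob = 1.0 - (1.0 - p) ** n_draws          # survival probability
    out = np.zeros_like(h)
    out[picked] = h[picked] / keep_prob[picked]     # inverse-prob. rescaling
    return out
```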
Table 1: Testing accuracy of different models on the MNIST and CIFAR-10 datasets.
Model              Test Acc. on MNIST    Test Acc. on CIFAR-10
Regular            99.04%                78.31%
SAP                99.02%                78.28%
Sub-models Avg.    99.02%                78.97%
Switching          98.95%                78.73%
The test classification accuracy of all models is summarized in Table 1. The direct comparisons are between the regular model and the SAP model, since they share the same weights, and between the average of the sub-models used to construct the block switching and the block switching itself. We can conclude that both SAP and block switching are excellent at maintaining testing accuracy.
4.2 Defense Against FGSM and CW Attacks

We use the fooling ratio, i.e., the percentage of adversarial examples generated by an attack method that successfully fool a neural network model into predicting the target label, to evaluate the defense effectiveness of the target model. The lower the fooling ratio, the stronger the model is at defending against adversarial attacks. We also record the average L2 norm of the distortion of the generated adversarial examples from the legitimate input images, since it is only fair to compare two attacks at similar distortion levels. For attacks like the CW attack that use an objective function trading off distortion against misclassification, a large distortion also indicates that it is hard for the attacking algorithm to find an adversarial example in a small region.

For the sake of reproducibility of our experiments, we report the hyper-parameter settings we use for the FGSM and CW attacks. FGSM has one hyper-parameter, the attacking strength ϵ in Equation 1. When using ϵ = 0.1, the L2 norm of the adversarial examples roughly matches CW, but the fooling ratio is too small to allow a meaningful comparison. Thus we also test the case ϵ = 0.25, although the L2 norm is then significantly larger. For the CW attack, gradient descent is performed for 100 iterations with a step size of 0.1. The number of binary search iterations for c in Equation 2 is set to 10.

We use FGSM and CW attacks to generate adversarial examples targeting the regular model, the SAP model, and the block switching respectively. Experimental results are shown in Table 2.

Table 2: Fooling ratio (FR) and distortion of FGSM and CW attacks with different target models on the MNIST dataset.

Attack            Regular          SAP              Switching
                  FR      L2       FR      L2       FR      L2
FGSM (ϵ = 0.25)   34.0%   6.84     32.8%   6.84     20.3%   6.84
CW                100.0%  2.28     32.1%   2.28     21.0%   2.37
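For clarity, the two metrics reported in the tables can be computed as in the following sketch. The names model, x_adv, x_clean, and targets are assumed to come from an attack pipeline and are not defined in the paper; for a stochastic model such as BS, each predict call samples a fresh channel.

```python
# A minimal sketch of the metrics reported in Tables 2 and 3. `model`,
# `x_adv`, `x_clean`, and `targets` are assumed inputs from the attack
# pipeline; for a stochastic model each predict call samples a channel.
import numpy as np

def fooling_ratio(model, x_adv, targets):
    preds = np.argmax(model.predict(x_adv), axis=-1)
    return float(np.mean(preds == targets))  # fraction hitting the target label

def avg_l2_distortion(x_adv, x_clean):
    diffs = (x_adv - x_clean).reshape(len(x_adv), -1)
    return float(np.mean(np.linalg.norm(diffs, axis=1)))
```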
Although the SAP model demonstrates extra robustness against both FGSM and CW compared to the regular model, block switching is clearly superior and decreases the fooling ratio further.
We use ϵ = 0.01 for FGSM in this experiment in order to obtain adversarial examples with a distortion level similar to the examples generated by the CW attack. The hyper-parameter setting for the CW attack is the same as above.

Experimental results on the CIFAR-10 dataset are shown in Table 3. Block switching significantly decreases the fooling ratios of FGSM and CW to 8.1% and 22.2% respectively, while the SAP model shows only minor advantages over the regular model.
Table 3: Fooling ratio (FR) and distortion of FGSM and CW attacks with different target models on the CIFAR-10 dataset.
Attack            Regular          SAP              Switching
                  FR      L2       FR      L2       FR      L2
FGSM (ϵ = 0.01)   25.0%   0.55     24.8%   0.55     8.1%    0.55
CW                100.0%  0.54     93.3%   0.52     22.2%   0.69
4.3 Impact of the Number of Channels

To analyze how the number of channels in a block switching affects its defense effectiveness as well as its testing accuracy, we run the CW attack on BS models with numbers of channels ranging from 1 (which is a regular model) to 9.

Figure 3: Quantifying the impact of channel numbers: defense effectiveness in terms of fooling ratio and L2 distortion, and testing classification accuracy, for block switchings with 1 to 9 channels.

In Fig. 3 we plot the fooling ratio, distortion, and test accuracy over different channel numbers. In general, the defense becomes stronger with more channels: the fooling ratio is lowest, 12.1%, when using 9 channels. The fooling ratio drops rapidly from 1 channel to 4 channels, while the drop decelerates after 5 channels, which indicates that the effectiveness provided by switching channels starts to saturate. The increasing distortion of the adversarial examples also indicates that BS with more channels is stronger at defending against adversarial attacks. The trend of testing accuracy, on the other hand, is almost flat, with a very slight descent from 78.31% to 78.17%. This indicates that BS is very effective in defending against adversarial attacks with very minor classification accuracy loss.

5 CONCLUSION

In this paper, we investigate block switching as a defense against adversarial perturbations. We provide analysis of how the switching scheme defends against adversarial attacks, as well as empirical results showing that a block switching model can decrease the fooling ratio of the CW attack from 100% to 12.1%. We also illustrate that a stronger defense can be achieved by using more channels, at the cost of a slight classification accuracy drop.

Block switching is easy to implement: it requires neither additional training data nor information about the potential adversary. Also, it incurs no extra computational complexity compared to a regular model in the inference phase, since only one channel is used at a time. In practice, the parallel channels can be stored in a distributed manner with periodic updating, which can provide extra protection against leaks of important model information.

More importantly, BS demonstrates that it is possible to enhance model variation yet maintain test accuracy at the same time, and we hope this paper can inspire more work in this direction.

REFERENCES

[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: a system for large-scale machine learning. In OSDI, Vol. 16. 265–283.
[2] Naveed Akhtar and Ajmal Mian. 2018. Threat of adversarial attacks on deep learning in computer vision: A survey. arXiv preprint arXiv:1801.00553 (2018).
[3] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016).
[4] Nicholas Carlini and David Wagner. 2017. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. ACM, 3–14.
[5] Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on. IEEE, 39–57.
[6] Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Li Chen, Michael E Kounavis, and Duen Horng Chau. 2017. Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression. arXiv preprint arXiv:1705.02900 (2017).
[7] Guneet S. Dhillon, Kamyar Azizzadenesheli, Jeremy D. Bernstein, Jean Kossaifi, Aran Khanna, Zachary C. Lipton, and Animashree Anandkumar. 2018. Stochastic activation pruning for robust adversarial defense. In International Conference on Learning Representations. https://openreview.net/forum?id=H1uR4GZRZ
[8] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations.
[9] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. 2017. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280 (2017).
[10] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: efficient inference engine on compressed deep neural network. In Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 243–254.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
[12] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia. ACM, 675–678.
[13] Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Technical Report. Citeseer.
[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097–1105.
[15] Yann LeCun. 1998. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/ (1998).
[16] Yann LeCun et al. 2015. LeNet-5, convolutional neural networks. URL: http://yann.lecun.com/exdb/lenet (2015), 20.
[17] Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian, et al. 2019. E-RNN: Design optimization for efficient recurrent neural networks in FPGAs. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 69–80.
[18] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017).
[19] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. 2017. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267 (2017).
[20] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. 2016. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on. IEEE, 372–387.
[21] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 582–597.
[22] Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman, et al. 2015. Deep Face Recognition. In BMVC, Vol. 1. 6.
[23] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
[24] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
[25] Siyue Wang, Xiao Wang, Shaokai Ye, Pu Zhao, and Xue Lin. 2018. Defending DNN Adversarial Attacks with Pruning and Logits Augmentation. IEEE, 1144–1148.
[26] Siyue Wang, Xiao Wang, Pu Zhao, Wujie Wen, David Kaeli, Peter Chin, and Xue Lin. 2018. Defensive dropout for hardening deep neural networks under adversarial attacks. In Proceedings of the International Conference on Computer-Aided Design. ACM, 71.
[27] Xiao Wang, Siyue Wang, Pin-Yu Chen, Yanzhi Wang, Brian Kulis, Xue Lin, and Peter Chin. 2019. Protecting neural networks with hierarchical random switching: towards better robustness-accuracy trade-off for stochastic defenses. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 6013–6019.
[28] Xiao Wang, Jie Zhang, Tao Xiong, Trac Duy Tran, Sang Peter Chin, and Ralph Etienne-Cummings. 2018. Using deep learning to extract scenery information in real time spatiotemporal compressed sensing. IEEE, 1–4.
[29] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. 2017. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991 (2017).
[30] An Zhao, Kun Fu, Siyue Wang, Jiawei Zuo, Yuhang Zhang, Yanfeng Hu, and Hongqi Wang. 2017. Aircraft recognition based on landmark detection in remote sensing images. IEEE Geoscience and Remote Sensing Letters 14, 8 (2017), 1413–1417.
[31] Pu Zhao, Sijia Liu, Yanzhi Wang, and Xue Lin. 2018. An ADMM-based universal framework for adversarial attacks on deep neural networks. In Proceedings of the 26th ACM international conference on Multimedia. 1065–1073.
[32] Pu Zhao, Siyue Wang, Cheng Gongye, Yanzhi Wang, Yunsi Fei, and Xue Lin. 2019. Fault sneaking attack: A stealthy framework for misleading deep neural networks. In Proceedings of the 56th Annual Design Automation Conference. IEEE, 1–6.