Beneficial Perturbations Network for Defending Adversarial Examples
Shixian Wen, Laurent Itti
University of Southern California, 3641 Watt Way, Los Angeles, California
[email protected]
Abstract
Adversarial training, in which a network is trained on both adversarial and clean examples, is one of the most trusted defense methods against adversarial attacks. However, there are three major practical difficulties in implementing and deploying this method: it is expensive in terms of running memory and computation costs; there is an accuracy trade-off between clean and adversarial examples; and it cannot foresee all adversarial attacks at training time. Here, we present a new solution to ease these three difficulties, the Beneficial Perturbations Network (BPN). BPN generates and leverages beneficial perturbations (somewhat opposite to the well-known adversarial perturbations) as biases within the parameter space of the network, to neutralize the effects of adversarial perturbations on data samples. Thus, BPN can effectively defend against adversarial examples. Compared to adversarial training, we demonstrate that BPN can significantly reduce the required running memory and computation costs by generating beneficial perturbations through recycling of the gradients computed from training on clean examples. In addition, BPN can alleviate the accuracy trade-off difficulty and the difficulty of foreseeing multiple attacks by improving the generalization of the network, thanks to increased diversity of the training set achieved through neutralization between adversarial and beneficial perturbations.
Introduction
Neural networks have led to a series of breakthroughs in many fields, such as image classification [10, 3] and natural language processing [6, 2]. Model performance on clean examples was the main evaluation criterion for these applications until the unveiling of weaknesses to adversarial attacks by [20, 1]. Neural networks were shown to be vulnerable to adversarial perturbations: carefully computed small perturbations added to legitimate clean examples to create so-called "adversarial examples" can cause misclassification on state-of-the-art machine learning models. For example, consider a task of recognizing handwritten digits "1" versus "2" (Fig. 1 a1, a2). Adversarial perturbations aimed at misclassifying an image of digit 1 as digit 2 may be obtained by backpropagating from the class digit 2 to the input space, following any of the available adversarial directions. In input space, adding adversarial perturbations to the input image can be viewed as adding an adversarial direction vector (red arrows δ_AP) to the clean (non-perturbed) input image of digit 1. The resulting vector crosses the decision boundary. As a consequence, adversarial perturbations can force the neural network into misclassification, here from digit 1 to digit 2. Thus, building a deep learning system that can robustly classify both adversarial examples and clean examples has emerged as a critical requirement.

Researchers have proposed a number of adversarial defense strategies to increase the robustness of deep learning systems. Adversarial training [8, 11], in which a network is trained on both adversarial examples (x_adv) and clean examples (x_cln) with class labels y, is perhaps the most popular defense against adversarial attacks, withstanding strong attacks. Adversarial examples are the summation of adversarial perturbations lying inside the input space (δ_AP) and clean examples: x_adv = x_cln + δ_AP. Given a classifier with a classification loss function L and parameters θ, the objective function of adversarial training is:

min_θ L(x_adv, x_cln, y; θ)   (1)

After adversarial training, the network learns a new decision boundary to incorporate both clean and adversarial examples (Fig. 1 a2, a3).

Despite the efficacy of adversarial training in building a robust system, there are three major practical difficulties in implementing and deploying this method. Difficulty one: adversarial training is expensive in terms of running memory and computation costs (Fig. 2 a).
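To make the cost argument concrete, here is a minimal PyTorch-style sketch of one adversarial-training step implementing Eqn. 1 (our own illustration, not the authors' code; FGSM is used to craft x_adv, and the function and variable names are hypothetical). Note the extra forward and backward pass needed to craft the adversarial batch, and the doubled batch held in memory:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x_cln, y, eps=0.3):
    """One hypothetical minibatch step of adversarial training (Eqn. 1), with FGSM-crafted x_adv."""
    # Extra forward/backward pass just to craft the adversarial examples.
    x = x_cln.clone().detach().requires_grad_(True)
    loss_craft = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss_craft, x)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

    # Train on both clean and adversarial examples (roughly twice the batch held in memory).
    optimizer.zero_grad()
    loss = F.cross_entropy(model(torch.cat([x_cln, x_adv])), torch.cat([y, y]))
    loss.backward()
    optimizer.step()
    return loss.item()
```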
Producing an adversarial example requires multiple gradient computations. In a practical scenario, we further produce more than one adversarial example for each clean example [21]. Most implementations need to at least double the amount of running memory on GPU, to store those adversarial examples alongside the clean examples. In addition, during adversarial training, the network has to train on both clean and adversarial examples; hence, adversarial training typically requires at least twice the computation power of training on clean examples alone. For example, even on reasonably-sized datasets, such as ImageNet, adversarial training can take multiple days on a single GPU.
Figure 1: Difference between adversarial training (a1-a3, input space) and BPN (b1-b3, activation space) on recognizing handwritten digits "1" versus "2". (a1): After training a model on clean input images, digits "1" and "2" are separated by a purple decision boundary. (a2): Adding adversarial perturbations to test input images of digit 1 can be viewed as adding adversarial direction vectors (red arrows δ_AP) to the clean (non-perturbed) input images. Such adversarial vectors cross the decision boundary, forcing the neural network into misclassification (here from digit 1 to digit 2). (a3): Adversarial training: training a model on both clean and adversarial examples learns a new decision boundary to incorporate both clean and adversarial examples, but at great computation and running memory cost. (b1) and (b2) are similar to (a1) and (a2), but are represented in activation space. (b3): BPN. Beneficial perturbations are opposite to the effects in activation space of adversarial perturbations applied to inputs. Adding beneficial perturbations to the activation representation of adversarial examples can be viewed as adding beneficial direction vectors (green arrows δ_BP) to the representations of adversarial examples of digit 1. The resulting vectors cross the decision boundary and drag the misclassified adversarial examples back to the correct classification region.

[12] used 53 P100 GPUs and [24] used 100 V100s for targeted adversarial training on ImageNet. As a consequence, although adversarial training remains among the most trusted defenses, it has only been within reach of research labs having hundreds of GPUs. Difficulty two: accuracy trade-off between clean examples and adversarial examples. Although adversarial training can improve robustness against adversarial examples, it sometimes hurts accuracy on clean examples. Thus, there is an accuracy trade-off between adversarial examples and clean examples [7, 18, 19, 25]. Because most of the test data in real applications are clean examples, test accuracy on clean examples should be as good as possible. Thus, this accuracy trade-off hinders the practical usefulness of adversarial training because it often ends up lowering performance on clean examples.
Difficulty three: impractical to foresee multiple attacks. Even if one has sufficient computation resources to train a network on both adversarial and clean examples, it is unrealistic and expensive to introduce all unknown attack samples into the adversarial training. For example, [21] proposed Ensemble Adversarial Training, which can increase the diversity of adversarial perturbations in a training set by generating adversarial perturbations transferred from other models (they won the competition on Defenses against Adversarial Attacks), but again at an extraordinary computation and running memory cost. Thus, broad diversity of adversarial examples is crucial for adversarial training.

In this paper, we introduce the Beneficial Perturbations Network (BPN) to address these three difficulties. BPN generates and leverages beneficial perturbations (somewhat opposite to the well-known adversarial perturbations) as biases within the parameter space of the network, to neutralize the effects of adversarial perturbations on data samples. We evaluated BPN on three datasets (MNIST, FashionMNIST and TinyImageNet).
Figure 2: Difference in training pipelines between adversarial training and BPN to defend against adversarial examples. (a) Classical adversarial training has two steps: (1) generating adversarial perturbations from corresponding clean examples and adding them to the clean examples (creation of adversarial examples); (2) training the network, usually on both clean and adversarial examples. (b) BPN creates a shortcut with only one step: training on clean examples. The shortcut is feasible because BPN can generate beneficial perturbations during the training of clean examples, with negligible additional costs. Thus, BPN cuts computation and running memory costs at least in half compared to typical adversarial training. The learned beneficial perturbations can neutralize the effects of adversarial perturbations on the data samples at test time.

Our results show:

(1) When training only on clean examples (our main use case scenario), BPN achieved good classification accuracy on adversarial examples, while saving at least half of the computation and running memory costs compared to standard adversarial training. The saving arises because BPN creates a shortcut (Fig. 2 b) compared to adversarial training (Fig. 2 a): BPN can generate beneficial perturbations (which can be used to neutralize the effects of adversarial perturbations on the data samples at test time) during the training of clean examples. In contrast, adversarial training requires training on both clean and adversarial examples. For example, on the TinyImageNet dataset, BPN achieved 53.29% accuracy on adversarial examples, 3575% better than the performance of a classical network (baseline) trained on clean examples only.

(2) When slightly more computation is available, one can also train BPN on adversarial examples only. BPN alleviated the accuracy trade-off by increasing the diversity of the training set. BPN boosted the classification accuracy on both clean and adversarial examples because of the diversification of the training set: the extracted beneficial perturbations convert some adversarial examples into clean examples through neutralization (Eqn. 2). Quantitatively, on the TinyImageNet dataset, BPN achieved 20.69% and 79.92% correct classification on clean and adversarial examples, 10.8% and 16.63% better than the performance of the classical network (baseline) trained on adversarial examples only.

(3) When more computation and GPU memory are available, BPN can be trained on both clean and adversarial examples. In this case, BPN improved the generalization of the network through the diversification of the training set. Thus, it improved classification accuracy on both clean and adversarial examples. For example, on the TinyImageNet dataset, BPN achieved 66.84% and 88.16% correct classification on clean and adversarial examples, 0.4% and 2.81% better than the performance of a classical network (baseline) trained on both clean and adversarial examples.
Related Work and Background Information
Beneficial perturbations for overcoming catastrophic forgetting

[22] used bias units to store beneficial perturbations (opposite to the well-known adversarial perturbations). [22] showed that beneficial perturbations, stored inside task-dependent bias units, can bias the network outputs toward the correct classification region for each task, allowing a single neural network to have multiple input-to-output mappings. Multiple input-to-output mappings alleviate the catastrophic forgetting problem [16] in sequential learning scenarios (a previously learned mapping of an old task is erased during learning of a new mapping for a new task).
Beneficial Perturbation Network (BPN)
Understanding beneficial perturbations
To understand beneficial perturbations in more detail, we first revisit the meaning of adversarial perturbations in adversarial examples. With adversarial examples [20], it has been shown that noise patterns (calculated from a specific class) added to input images can bias a network to misclassify the perturbed input images into that specific class. Here, we leverage this idea, but instead of adding input "noise" (adversarial perturbations) calculated from other classes to force the network into misclassification, we add "noise" (beneficial perturbations) calculated from the input's own correct class to assist correct classification. Moreover, instead of adding perturbations to the input images, during the training of clean examples we update the bias term in each layer of the neural network to store the beneficial perturbations, by recycling gradient information that is already computed. During testing on adversarial examples, the stored beneficial perturbations can neutralize adversarial perturbations on data samples. Thus, BPN can correctly classify adversarial examples. For example, in activation space (Fig. 1 b2, b3), consider a task of recognizing handwritten digits "1" versus "2". Adding beneficial perturbations to the activation representation of adversarial examples can be viewed as adding a beneficial direction vector (green arrows δ_BP) to the adversarial examples of digit 1. The resulting vector crosses the decision boundary and drags the misclassified adversarial examples back to the correct classification region.
Figure 4: BPN extension to a deep convolutional neural network. Deep convolutional neural networks are made of two parts: a feature extraction part (convolutional and non-linear layers) and a classifier part (fully connected layers). We introduce the beneficial perturbation bias (replacing the normal bias term) in the last few fully connected layers of the deep convolutional network and update it using FGSM.

Thus, the beneficial perturbations neutralize the effects of the adversarial perturbations and recover the clean examples:

x_cln ≈ x_cln + δ_AP + δ_BP   (2)

since δ_AP and δ_BP cancel out. As a result, instead of updating the decision boundary by training on both clean and adversarial examples, BPN can correctly classify both clean and adversarial examples by training only on clean examples, without updating its decision boundary.
Figure 3: Structural difference between the normal network (baseline) and BPN for the forward (a-b) and backward (c-d) passes. (a) Forward rule of the normal network (baseline): V_i = W_{i-1} V_{i-1} + b_{i-1}. (b) Forward rule of BPN: V_i = W_{i-1} V_{i-1} + b^BP_{i-1}; the beneficial perturbation bias (b^BP_{i-1}) plays the same role as the normal bias (b_{i-1}) in the forward pass. (c) Backward rule of the normal network (baseline); only the update rule for the normal bias term is shown (db_{i-1} = Σ Grad). (d) Backward rule of BPN; the difference is that the beneficial perturbation bias is updated using FGSM (db^BP_{i-1} = ε sign(Σ Grad)). Notation: i, layer index; W, weights; V, activations; b, bias; b^BP, beneficial perturbation bias; Grad, gradients from the next layer; db, gradient for the bias; db^BP, gradient for the beneficial perturbation bias.

Formulation of beneficial perturbations
Beneficial perturbations are formulated as an additive contribution to each layer's weighted activations (Fig. 3 b):

V_i = W_{i-1} V_{i-1} + b^BP_{i-1}   (3)

where W_{i-1}, V_{i-1} and b^BP_{i-1} are the weights, activations and beneficial perturbation bias at layer i-1. The beneficial perturbation bias has the same structure as the normal bias term b (Fig. 3 a), but it is used to store the beneficial perturbations (δ_BP).

Instead of adding "noise" (adversarial perturbations) to the input space, calculated from other classes to force the network into misclassification, we add "noise" (beneficial perturbations δ_BP) to the activation space, calculated from the input's own correct class to assist correct classification. Thus, the beneficial perturbations at each layer i are obtained by updating the beneficial perturbation bias using the Fast Gradient Sign Method (FGSM; [8]) with the input's own correct class:

b^BP_i = b^BP_i + η db^BP_i   (4)

db^BP_i = ε sign(∇_{b^BP_i} L(b^BP_i, y_true, θ))   (5)

where η is the learning rate, b^BP_i is the beneficial perturbation bias, db^BP_i is the gradient for the beneficial perturbation bias, ε is a hyperparameter that decides how far we go in the fast gradient sign direction, y_true is the true label (the input's own correct class), and θ are the parameters of the neural network.

Conveniently, we always use the input's own correct class to train a neural network. Thus, we can generate the gradient for the beneficial perturbations at layer i-1 by recycling the gradients already computed while training the network (Fig. 3 d). We use Eqn. 6 in place of Eqn. 5:

db^BP_{i-1} = ε sign(Σ Grad)   (6)

where db^BP_{i-1} is the gradient for the beneficial perturbation bias, Grad is the gradient from the next layer i, and ε is the same as in Eqn. 5.

Thus, to generate beneficial perturbations, we do not introduce any extra computation costs beyond FGSM. The forward (backward) pass computation costs of BPN are only 0.00% (0.006%) more than the costs of a classical network trained on clean examples (Tab. 1). This feature enables BPN to save at least half of the computation and running memory costs of standard adversarial training (Fig. 2).

Table 1: Computation costs of BPN trained on clean examples compared to a classical network trained on clean examples, on ResNet-50. For the forward (backward) pass, the computation costs of BPN are 0.00% (0.006%) more than those of the classical network.

Network                          Forward (FLOPS)    Backward (FLOPS)
Classical Network (ResNet-50)    51,112,224         51,112,225
BPN (ResNet-50)                  51,112,224         51,115,321
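A minimal PyTorch-style sketch of Eqns. 3-6 is given below (our own illustration, not released code from the paper). It assumes a fully connected layer whose ordinary bias plays the role of the beneficial perturbation bias; the class and argument names (BPLinear, eps_bp, lr_bp) are hypothetical:

```python
import torch
import torch.nn as nn


class BPLinear(nn.Linear):
    """Fully connected layer whose bias stores beneficial perturbations (Eqn. 3).

    The forward pass is identical to a standard linear layer (V_i = W V + b_BP).
    After the usual backward pass on clean examples, the bias is updated with the
    sign of its already-computed gradient (Eqns. 4 and 6), so generating beneficial
    perturbations adds only one sign and one multiplication per layer.
    """

    def __init__(self, in_features, out_features, eps_bp=0.3, lr_bp=1e-3):
        super().__init__(in_features, out_features, bias=True)
        self.eps_bp = eps_bp  # epsilon: step along the fast-gradient-sign direction
        self.lr_bp = lr_bp    # eta: learning rate for the beneficial perturbation bias

    @torch.no_grad()
    def update_bp_bias(self):
        # Eqn. 6: db_BP = eps * sign(sum of gradients from the next layer), which is
        # exactly eps * sign of the bias gradient that the backward pass already computed.
        if self.bias.grad is None:
            return
        db_bp = self.eps_bp * torch.sign(self.bias.grad)
        # Eqn. 4: b_BP <- b_BP + eta * db_BP
        self.bias.add_(self.lr_bp * db_bp)


# Hypothetical usage inside a standard training loop on clean examples
# (the BP biases would be excluded from the main optimizer so that only
# the sign-based rule above updates them):
#   loss = criterion(model(x_clean), y_true)
#   loss.backward()                      # ordinary backward pass, no extra cost
#   optimizer.step()                     # update weights as usual
#   for m in model.modules():
#       if isinstance(m, BPLinear):
#           m.update_bp_bias()           # recycle gradients to update b_BP
```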
Extending BPN to deep convolutional networks
Most deep convolutional neural networks are made of two parts: a feature extraction part (convolutional and non-linear layers) and a classifier (fully connected layers). Here, we introduce the beneficial perturbation bias (b_BP) in the last few fully connected layers of the deep convolutional network, replacing the normal bias term (Fig. 4). We use FGSM (Eqn. 6) to update those beneficial perturbation biases.

In summary, through training on clean examples, BPN generates beneficial perturbations as biases within the last few fully connected layers of the deep neural network. BPN leverages these beneficial perturbations to defend against future adversarial examples by neutralizing the effects of adversarial perturbations in the datasets. In addition, if BPN is trained on adversarial examples alone or on a combination of adversarial and clean examples, the neutralization can diversify the training set by converting adversarial examples into clean examples. As a consequence, the diversification further improves the generalization of BPN, which eases the accuracy trade-off and the impracticality of foreseeing multiple attacks.
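As a rough illustration of this model surgery, the sketch below builds a hypothetical BPN version of ResNet-18 by swapping its final fully connected layer for a small stack of BPLinear layers (the class from the previous sketch). The torchvision backbone and the layer sizes (three fully connected layers with 1028 hidden units, as described in the Experiments section) are assumptions of this example:

```python
import torch.nn as nn
from torchvision import models
# BPLinear is assumed to be the layer defined in the previous sketch.


def build_bpn_resnet18(num_classes=200, hidden=1028, eps_bp=0.3):
    """Assemble a hypothetical BPN version of ResNet-18.

    The convolutional feature extractor is kept unchanged; only the fully connected
    classifier head uses BPLinear layers, whose bias terms store the beneficial
    perturbations (Fig. 4).
    """
    backbone = models.resnet18()
    in_features = backbone.fc.in_features  # 512 for ResNet-18
    backbone.fc = nn.Sequential(
        BPLinear(in_features, hidden, eps_bp=eps_bp), nn.ReLU(),
        BPLinear(hidden, hidden, eps_bp=eps_bp), nn.ReLU(),
        BPLinear(hidden, num_classes, eps_bp=eps_bp),
    )
    return backbone
```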
Experiments

Datasets
MNIST.
MNIST [14] is a dataset of handwritten digits, with a training set of 60,000 examples and a test set of 10,000 examples.
FashionMNIST.
FashionMNIST [23] is a dataset of article images, with a training set of 60,000 examples and a test set of 10,000 examples.
TinyImageNet.
TinyImageNet is a subset of ImageNet [4], a large-scale visual dataset. TinyImageNet consists of 200 classes and has a training set of 100k examples and a test set of 10k examples.
Network structure
For MNIST and FashionMNIST (LeNet).
We use the convolutional and non-linear layers of LeNet [14] as the feature extraction part (classical LeNet). Then, for the classifier part, we create our version of LeNet (LeNet with beneficial perturbation bias) by adding the beneficial perturbation bias into the fully connected layers, replacing the normal bias.
For TinyImageNet (ResNet-18).
We use the convolutional and non-linear layers of ResNet-18 [9] as the feature extraction part (classical ResNet-18). Then, we use three fully connected layers with 1028 hidden units as the classifier. We create our version of ResNet-18 (ResNet-18 with beneficial perturbation bias) by adding the beneficial perturbation bias into the fully connected layers, replacing the normal bias. We trained the BPN (ResNet-18) for 5,000 epochs on TinyImageNet.
Various attack methods
To demonstrate how BPN can successfully defend against a broad range of adversarial attacks, we tested our BPN structure on adversarial examples generated by the following attack methods (a minimal sketch of two of these attacks follows the list):

(1-3) PGD Linf, L2, L1 [15]: Projected Gradient Descent attack with order Linf, L2, or L1.
(4) Basic Iterative Attack L2 [13]: perturbing the input with the gradient of the loss with respect to the input, with several steps for each epsilon.
(5) FGSM [8]: one-step fast gradient sign method.
(6) Aka Basic Iterative Attack [13]: like GradientSignAttack (FGSM), but with several steps for each epsilon.
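Since FGSM and PGD appear throughout the experiments, here is a minimal, self-contained PyTorch sketch of the two under their standard definitions (our own illustration; it assumes images normalized to [0, 1] and is not the exact attack code used for the paper's evaluation):

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """One-step fast gradient sign method: x_adv = x + eps * sign(grad_x L)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()


def pgd_linf(model, x, y, eps, alpha=0.01, steps=40):
    """Projected Gradient Descent attack within an Linf ball of radius eps."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                 # ascend the loss
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # project back into the Linf ball
            x_adv = x_adv.clamp(0, 1)                           # keep a valid normalized image
    return x_adv.detach()
```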
Results
BPN can defend against adversarial examples with negligible additional computation costs
When the neural network can only be trained on clean examples because of modest computation power, BPN achieves much better test accuracy on adversarial examples than the classical network (baseline, Tab. 2; MNIST: 98.88% vs. 18.08%, FashionMNIST: 54.07% vs. 11.87%, TinyImageNet: 53.29% vs. 1.45%). Thus, for companies with modest computation resources, BPN can help a system achieve moderate robustness against adversarial examples while introducing only negligible additional computation costs. For example, on FashionMNIST, our method uses only 59% of the training time of adversarial training with just one adversarial example per clean example, saving 43.51 minutes of training time over 500 training epochs on an NVIDIA Tesla-V100 platform. We only introduce one sign and one multiplication operation for each fully connected layer (Tab. 1); the extra 9% cost arises because we implement a custom layer in the PyTorch framework, which introduces considerable overhead for custom layers. This overhead should be greatly reduced if the custom layer were incorporated into the PyTorch framework with a C++ implementation, bringing the extra cost down to 0.006%. The savings would be even larger on a bigger dataset such as ImageNet [5].
BPN can alleviate the accuracy trade-off through the diversification of the training set
BPN can alleviate the accuracy trade-off difficulty and increase the classification accuracy for both clean and adversarial examples.

Table 2: Training on clean examples for BPN and the classical network (CN). Testing on clean examples (Cln Ex) and adversarial examples (Adv Ex) (generated by FGSM, ε = 0.3, for MNIST, FashionMNIST and TinyImageNet). CN does poorly on adversarial examples, while BPN can successfully defend against them.

                  MNIST (LeNet)   FashionMNIST (LeNet)   TinyImageNet (ResNet-18)
Cln Ex   BPN          —                  —                        —
         CN         99.01              89.17                    64.30
Adv Ex   BPN        98.88              54.07                    53.29
         CN         18.08              11.87                     1.45

Although training a classical neural network A on adversarial examples can achieve high test accuracy on adversarial examples (Tab. 3, MNIST: 99.01%, FashionMNIST: 91.49%, TinyImageNet: 68.52%), it hurts test accuracy on clean examples. Compared to a classical network B trained on clean examples, the test accuracy on clean examples of classical network A decreases from 99.01%, 89.17%, 64.30% (Tab. 2) to 95.54%, 65.64%, 18.67% (Tab. 3) for the MNIST, FashionMNIST and TinyImageNet datasets. In comparison to classical network B, by training BPN only on adversarial examples, BPN not only achieves better test accuracy on adversarial examples (Tab. 3, MNIST: 99.27%, FashionMNIST: 92.07%, TinyImageNet: 79.92%), but also achieves better test accuracy on clean examples (Tab. 3, MNIST: 97.32%, FashionMNIST: 71.54%, TinyImageNet: 20.69%). This accuracy on clean examples is still worse than that of classical network B (trained only on clean examples), but it is much better than the accuracy of classical network A (trained only on adversarial examples). The reason is that beneficial perturbations convert some adversarial examples into clean examples through neutralization (Eqn. 2). As a consequence, the increased diversity of clean examples improves the generalization of BPN.

Table 3: Training on adversarial examples for BPN and the classical network (CN). Testing on clean examples (Cln Ex) and adversarial examples (Adv Ex) (generated by FGSM, ε = 0.3, for MNIST, FashionMNIST and TinyImageNet). BPN achieves better classification accuracy on clean examples than CN.
                  MNIST (LeNet)   FashionMNIST (LeNet)   TinyImageNet (ResNet-18)
Cln Ex   BPN        97.32              71.54                    20.69
         CN         95.54              65.64                    18.67
Adv Ex   BPN        99.27              92.07                    79.92
         CN         99.01              91.49                    68.52
BPN can improve generalization through diversification of the training set
Training a classical network on both clean and adversarial examples can achieve good test accuracy on both clean and adversarial examples. By diversifying the training set, BPN can achieve even better test accuracy. BPN achieves slightly higher accuracy on clean examples than the classical network (Tab. 4, MNIST: 99.13% vs. 99.09%, FashionMNIST: 89.65% vs. 89.49%, TinyImageNet: 66.84% vs. 66.56%). In addition, BPN achieves higher accuracy on adversarial examples than the classical network (MNIST: 97.62% vs. 97.01%, FashionMNIST: 95.39% vs. 94.98%, TinyImageNet: 88.16% vs. 85.75%). The reason is that the generalization of BPN is improved through the diversification of the training set caused by neutralization (Eqn. 2). BPN and the classical network can both achieve high accuracy in this scenario. However, this scenario should normally be avoided because training on both clean and adversarial examples is expensive in terms of running memory and computation costs.

Table 4: Training on both clean and adversarial examples for BPN and the classical network (CN). Testing on clean examples (Cln Ex) and adversarial examples (Adv Ex) (generated by FGSM, ε = 0.3, for MNIST, FashionMNIST and TinyImageNet). Both BPN and CN achieve high classification accuracy. However, this scenario should be avoided because of the expensive running memory and computation costs.
                  MNIST (LeNet)   FashionMNIST (LeNet)   TinyImageNet (ResNet-18)
Cln Ex   BPN        99.13              89.65                    66.84
         CN         99.09              89.49                    66.56
Adv Ex   BPN        97.62              95.39                    88.16
         CN         97.01              94.98                    85.75
Influence of adversarial perturbation budget
The higher the adversarial perturbation budget, the higher the chance of successfully attacking a neural network. However, attacks with higher adversarial perturbation budgets are easier to detect by a program or by humans. For example, ε = 0.3 (Fig. 5a) represents very high noise, which makes FashionMNIST images difficult to classify, even for humans, and the distribution differences between adversarial and clean examples are so large that they can easily be captured by defense programs. A smaller ε therefore makes for a better attack, since the differences caused by the adversarial perturbations are too small to be detected by most defense programs. For such small adversarial perturbations (Fig. 5b), just by training on clean images, BPN achieves moderate robustness against adversarial examples with negligible costs. Thus, our method is beneficial for companies with modest computation power who still want to achieve moderate robustness against adversarial examples.

BPN can successfully defend against various attack methods
When the neural network can only be trained on clean examples because of modest computation power, we trained BPN on clean examples and tested it on adversarial examples generated by the various attack methods discussed in the experiments section. As shown in Tab. 5, BPN can successfully defend against these attack methods. In particular, BPN achieves better classification accuracy than the classical network for PGD Linf, PGD L2, PGD L1, Basic Iterative Attack L2, Aka Basic Iterative Attack and FGSM. However, despite the excellent performance of BPN on PGD Linf and FGSM, its performance on PGD L2, PGD L1, Basic Iterative Attack L2 and Aka Basic Iterative Attack is moderate. We discuss how to further improve BPN on a variety of attack methods in the discussion section.

Table 5: Training on clean examples of MNIST and TinyImageNet for BPN and the classical network (CN). Testing on adversarial examples generated by a variety of adversarial attack methods. BPN can successfully defend against those adversarial examples.

Dataset & network       PGD Linf   PGD L2   PGD L1   Basic Iterative Attack L2   Aka Basic Iterative Attack   FGSM
MNIST          BPN          —         —        —                —                           —                  —
               CN         2.18      97.26      —                —                           —                  —
TinyImageNet   BPN          —         —        —                —                           —                  —
               CN         0.00      15.11    15.12            15.11                        0.5                1.29

Figure 5: (a) Adversarial example with a high adversarial perturbation budget (ε = 0.3). (b) Test accuracy on adversarial examples after training BPN only on clean examples from the MNIST (blue) or FashionMNIST (orange) datasets, as a function of the ε value in the Fast Gradient Sign Method.
Discussion
We proposed a new solution for defending against adversarial examples, which we refer to as the Beneficial Perturbations Network (BPN). BPN, for the first time, leverages beneficial perturbations (opposite to the well-known adversarial perturbations) to counteract the effects of adversarial perturbations on input data. Compared to adversarial training, this approach offers three main advantages: (1) we demonstrated that BPN can effectively defend against adversarial examples with negligible additional running memory and computation costs; (2) we demonstrated that BPN can alleviate the accuracy trade-off by increasing the diversity of the training set; (3) further, we demonstrated that the increased diversity of the training set can improve the generalization of the network by converting some adversarial examples into clean examples.
Beneficial perturbations: the opposite "twins" of adversarial perturbations
Beneficial perturbations can be viewed as the opposite "twins" of adversarial perturbations. Much research is underway on how to generate ever more advanced adversarial perturbations [15, 13, 20, 17, 8] to fool increasingly sophisticated machine learning systems. However, there is little research [22] on how to generate beneficial perturbations and on their possible applications.
Future Beneficial Perturbations research
In this research, we used one of the simplest methods available in the adversarial perturbations world, FGSM [8], to generate beneficial perturbations. We demonstrated that beneficial perturbations can effectively defend against adversarial examples by neutralizing the effects of adversarial perturbations on data samples. More research could be done to improve BPN. (1) Update rules for beneficial perturbations: other than the FGSM rule implemented in this paper, one could use other methods (e.g., PGD) to update the beneficial perturbation bias (a hypothetical sketch is given below); as a consequence, BPN might become more robust to various kinds of adversarial examples. (2) Network structure for storing and generating beneficial perturbations: in this paper, we used beneficial perturbation biases, with the same structure as the normal bias, to store and generate the beneficial perturbations, replacing the normal bias term of the last few fully connected layers of deep convolutional networks. This might not be the optimal structure; it might be better to embed the beneficial perturbations into the earlier convolutional and non-linear layers of the deep convolutional network, to store and generate them more effectively and efficiently.
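As an illustration of point (1), a multi-step, PGD-like update of the beneficial perturbation bias could look like the hypothetical sketch below (our speculation, not a variant evaluated in the paper). Here grad_fn is assumed to return a fresh gradient of the loss with respect to the bias at each step, which would cost extra backward passes compared to the one-step FGSM rule:

```python
import torch


def pgd_style_bp_update(bias, grad_fn, eps_bp=0.3, alpha=0.05, steps=5):
    """Hypothetical multi-step (PGD-like) variant of the FGSM update in Eqns. 4 and 6.

    Each small sign step is projected back into an eps_bp ball around the original
    bias value, trading extra computation for a potentially more robust
    beneficial perturbation.
    """
    original = bias.detach().clone()
    for _ in range(steps):
        g = grad_fn()  # fresh gradient w.r.t. the bias at its current value
        with torch.no_grad():
            bias.add_(alpha * g.sign())                                  # small sign step
            bias.copy_(original + (bias - original).clamp(-eps_bp, eps_bp))  # project onto the ball
    return bias
```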
Acknowledgment
This work was supported by the National Science Foundation (grant number CCF-1317433), C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), and the Intel Corporation. The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.
References

[1] Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.; Šrndić, N.; Laskov, P.; Giacinto, G.; and Roli, F. 2013. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 387–402. Springer.

[2] Brown, T. B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

[3] Chen, X.; Fan, H.; Girshick, R.; and He, K. 2020. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297.

[4] Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009a. ImageNet: A large-scale hierarchical image database. In CVPR09.

[5] Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009b. ImageNet: A large-scale hierarchical image database. In CVPR09, 248–255. IEEE.

[6] Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[7] Di, X.; Yu, P.; and Tian, M. 2018. Towards adversarial training with moderate performance improvement for neural network classification. arXiv preprint arXiv:1807.00340.

[8] Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

[9] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2015. Deep residual learning for image recognition. CoRR abs/1512.03385.

[10] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

[11] Huang, R.; Xu, B.; Schuurmans, D.; and Szepesvári, C. 2015. Learning with a strong adversary. CoRR abs/1511.03034.

[12] Kannan, H.; Kurakin, A.; and Goodfellow, I. 2018. Adversarial logit pairing. arXiv preprint arXiv:1803.06373.

[13] Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.

[14] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P.; et al. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE.

[15] Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.

[16] McCloskey, M., and Cohen, N. J. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, 109–165. Elsevier.

[17] Narodytska, N., and Kasiviswanathan, S. P. 2016. Simple black-box adversarial perturbations for deep networks. arXiv preprint arXiv:1612.06299.

[18] Raghunathan, A.; Xie, S. M.; Yang, F.; Duchi, J. C.; and Liang, P. 2019. Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032.

[19] Stanforth, R.; Fawzi, A.; Kohli, P.; et al. 2019. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725.

[20] Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

[21] Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I.; Boneh, D.; and McDaniel, P. 2017. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204.

[22] Wen, S., and Itti, L. 2019. Beneficial perturbation network for continual learning. CoRR abs/1906.10528.

[23] Xiao, H.; Rasul, K.; and Vollgraf, R. 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.

[24] Xie, C.; Wu, Y.; Maaten, L. v. d.; Yuille, A. L.; and He, K. 2019. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 501–509.

[25] Zhang, H.; Yu, Y.; Jiao, J.; Xing, E. P.; Ghaoui, L. E.; and Jordan, M. I. 2019. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573.