Beneficial Perturbations Network for Defending Adversarial Examples
Shixian Wen, Laurent Itti
University of Southern California, 3641 Watt Way, Los Angeles, California
[email protected]
Abstract
Adversarial training, in which a network is trained on both adversarial and clean examples, is one of the most trusted defense methods against adversarial attacks. However, there are three major practical difficulties in implementing and deploying this method: it is expensive in terms of running memory and computation costs; there is an accuracy trade-off between clean and adversarial examples; and it cannot foresee all adversarial attacks at training time. Here, we present a new solution to ease these three difficulties, the Beneficial Perturbations Network (BPN). BPN generates and leverages beneficial perturbations (somewhat opposite to the well-known adversarial perturbations) as biases within the parameter space of the network, to neutralize the effects of adversarial perturbations on data samples. Thus, BPN can effectively defend against adversarial examples. Compared to adversarial training, we demonstrate that BPN can significantly reduce the required running memory and computation costs by generating beneficial perturbations through recycling of the gradients computed from training on clean examples. In addition, BPN can alleviate the accuracy trade-off difficulty and the difficulty of foreseeing multiple attacks by improving the generalization of the network, thanks to increased diversity of the training set achieved through neutralization between adversarial and beneficial perturbations.
Introduction
Neural networks have led to a series of breakthroughs in many fields, such as image classification [10, 3] and natural language processing [6, 2]. Model performance on clean examples was the main evaluation criterion for these applications until the unveiling of weaknesses to adversarial attacks by [20, 1]. Neural networks were shown to be vulnerable to adversarial perturbations: carefully computed small perturbations added to legitimate clean examples to create so-called "adversarial examples" can cause misclassification on state-of-the-art machine learning models. For example, consider a task of recognizing handwritten digits "1" versus "2" (Fig. 1 a1, a2). Adversarial perturbations aimed at misclassifying an image of digit 1 as digit 2 may be obtained by backpropagating from the class digit 2 to the input space, following any of the available adversarial directions. In input space, adding adversarial perturbations to the input image can be viewed as adding an adversarial direction vector (red arrows δ_AP) to the clean (non-perturbed) input image of digit 1. The resulting vector crosses the decision boundary. As a consequence, adversarial perturbations can force the neural network into misclassification, here from digit 1 to digit 2. Thus, building a deep learning system that can robustly classify both adversarial examples and clean examples has emerged as a critical requirement.

Researchers have proposed a number of adversarial defense strategies to increase the robustness of deep learning systems. Adversarial training [8, 11], in which a network is trained on both adversarial examples (x_adv) and clean examples (x_cln) with class labels y, is perhaps the most popular defense against adversarial attacks, withstanding strong attacks. Adversarial examples are the summation of adversarial perturbations lying inside the input space (δ_AP) and clean examples: x_adv = x_cln + δ_AP. Given a classifier with a classification loss function L and parameters θ, the objective function of adversarial training is:

min_θ L(x_adv, x_cln, y; θ)   (1)

After adversarial training, the network learns a new decision boundary to incorporate both clean and adversarial examples (Fig. 1 a2, a3).

Despite the efficacy of adversarial training in building a robust system, there are three major practical difficulties in implementing and deploying this method. Difficulty one: adversarial training is expensive in terms of running memory and computation costs (Fig. 2 a).
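To make the cost argument concrete, here is a minimal PyTorch-style sketch of one adversarial-training step implementing Eqn. 1 (our own illustration, not the authors' code; FGSM is used to craft x_adv, and the function and variable names are hypothetical). Note the extra forward and backward pass needed to craft the adversarial batch, and the doubled batch held in memory:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x_cln, y, eps=0.3):
    """One hypothetical minibatch step of adversarial training (Eqn. 1), with FGSM-crafted x_adv."""
    # Extra forward/backward pass just to craft the adversarial examples.
    x = x_cln.clone().detach().requires_grad_(True)
    loss_craft = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss_craft, x)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()

    # Train on both clean and adversarial examples (roughly twice the batch held in memory).
    optimizer.zero_grad()
    loss = F.cross_entropy(model(torch.cat([x_cln, x_adv])), torch.cat([y, y]))
    loss.backward()
    optimizer.step()
    return loss.item()
```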
Producing an adversarial example requires multiple gradient computations. In a practical scenario, we further produce more than one adversarial example for each clean example [21]. Most implementations need to at least double the amount of running memory on GPU, to store those adversarial examples alongside the clean examples. In addition, during adversarial training, the network has to train on both clean and adversarial examples; hence, adversarial training typically requires at least twice the computation power of training on clean examples alone. For example, even on reasonably-sized datasets, such as ImageNet, adversarial training can take multiple days on a single GPU.
Figure 1: Difference between adversarial training (a1-a3, input space) and BPN (b1-b3, activation space) on recognizing handwritten digits "1" versus "2". (a1): After training a model on clean input images, digits "1" and "2" are separated by a purple decision boundary. (a2): Adding adversarial perturbations to test input images of digit 1 can be viewed as adding adversarial direction vectors (red arrows δ_AP) to the clean (non-perturbed) input images. Such adversarial vectors cross the decision boundary, forcing the neural network into misclassification (here from digit 1 to digit 2). (a3): Adversarial training: training a model on both clean and adversarial examples learns a new decision boundary to incorporate both clean and adversarial examples, but at great computation and running memory cost. (b1) and (b2) are similar to (a1) and (a2), but are represented in activation space. (b3): BPN. Beneficial perturbations are opposite to the effects in activation space of adversarial perturbations applied to inputs. Adding beneficial perturbations to the activation representation of adversarial examples can be viewed as adding beneficial direction vectors (green arrows δ_BP) to the representations of adversarial examples of digit 1. The resulting vectors cross the decision boundary and drag the misclassified adversarial examples back to the correct classification region.

[12] used 53 P100 GPUs and [24] used 100 V100s for targeted adversarial training on ImageNet. As a consequence, although adversarial training remains among the most trusted defenses, it has only been within reach of research labs having hundreds of GPUs. Difficulty two: accuracy trade-off between clean examples and adversarial examples. Although adversarial training can improve robustness against adversarial examples, it sometimes hurts accuracy on clean examples. Thus, there is an accuracy trade-off between adversarial examples and clean examples [7, 18, 19, 25]. Because most of the test data in real applications are clean examples, test accuracy on clean examples should be as good as possible. Thus, this accuracy trade-off hinders the practical usefulness of adversarial training because it often ends up lowering performance on clean examples.
Difficulty three: impractical to foresee multiple attacks. Even if one has sufficient computation resources to train a network on both adversarial and clean examples, it is unrealistic and expensive to introduce all unknown attack samples into the adversarial training. For example, [21] proposed Ensemble Adversarial Training, which can increase the diversity of adversarial perturbations in a training set by generating adversarial perturbations transferred from other models (they won the competition on Defenses against Adversarial Attacks), but again at an extraordinary computation and running memory cost. Thus, broad diversity of adversarial examples is crucial for adversarial training.

In this paper, we introduce the Beneficial Perturbations Network (BPN) to address these three difficulties. BPN generates and leverages beneficial perturbations (somewhat opposite to the well-known adversarial perturbations) as biases within the parameter space of the network, to neutralize the effects of adversarial perturbations on data samples. We evaluated BPN on three datasets (MNIST, FashionMNIST and TinyImageNet).
Figure 2: Difference in training pipelines between adversarial training and BPN to defend against adversarial examples. (a) Classical adversarial training has two steps: (1) generating adversarial perturbations from corresponding clean examples and adding them to the clean examples (creation of adversarial examples); (2) training the network, usually on both clean and adversarial examples. (b) BPN creates a shortcut with only one step: training on clean examples. The shortcut is feasible because BPN can generate beneficial perturbations during the training of clean examples, with negligible additional costs. Thus, BPN cuts computation and running memory costs at least in half compared to typical adversarial training. The learned beneficial perturbations can neutralize the effects of adversarial perturbations on the data samples at test time.

Our results show:

(1) When training only on clean examples (our main use case scenario), BPN achieved good classification accuracy on adversarial examples, while saving at least half of the computation and running memory costs compared to standard adversarial training. The saving arises because BPN creates a shortcut (Fig. 2 b) compared to adversarial training (Fig. 2 a): BPN can generate beneficial perturbations (which can be used to neutralize the effects of adversarial perturbations on the data samples at test time) during the training of clean examples. In contrast, adversarial training requires training on both clean and adversarial examples. For example, on the TinyImageNet dataset, BPN achieved 53.29% accuracy on adversarial examples, 3575% better than the performance of a classical network (baseline) trained on clean examples only.

(2) When slightly more computation is available, one can also train BPN on adversarial examples only. BPN alleviated the accuracy trade-off by increasing the diversity of the training set. BPN boosted the classification accuracy on both clean and adversarial examples because of the diversification of the training set: the extracted beneficial perturbations convert some adversarial examples into clean examples through neutralization (Eqn. 2). Quantitatively, on the TinyImageNet dataset, BPN achieved 20.69% and 79.92% correct classification on clean and adversarial examples, 10.8% and 16.63% better than the performance of the classical network (baseline) trained on adversarial examples only.

(3) When more computation and GPU memory are available, BPN can be trained on both clean and adversarial examples. In this case, BPN improved the generalization of the network through the diversification of the training set. Thus, it improved classification accuracy on both clean and adversarial examples. For example, on the TinyImageNet dataset, BPN achieved 66.84% and 88.16% correct classification on clean and adversarial examples, 0.4% and 2.81% better than the performance of a classical network (baseline) trained on both clean and adversarial examples.
Related Work and Background Information
Beneficial perturbations for overcoming catastrophic forgetting

[22] used bias units to store beneficial perturbations (opposite to the well-known adversarial perturbations). [22] showed that beneficial perturbations, stored inside task-dependent bias units, can bias the network outputs toward the correct classification region for each task, allowing a single neural network to have multiple input-to-output mappings. Multiple input-to-output mappings alleviate the catastrophic forgetting problem [16] in sequential learning scenarios (a previously learned mapping of an old task is erased during learning of a new mapping for a new task).
Beneficial Perturbation Network (BPN)
Understanding beneficial perturbations
To understand beneficial perturbations in more detail, we first revisit the meaning of adversarial perturbations in adversarial examples. With adversarial examples [20], it has been shown that noise patterns (calculated from a specific class) added to input images can bias a network to misclassify the perturbed input images into that specific class. Here, we leverage this idea, but instead of adding input "noise" (adversarial perturbations) calculated from other classes to force the network into misclassification, we add "noise" (beneficial perturbations) calculated from the input's own correct class to assist correct classification. Moreover, instead of adding perturbations to the input images, during the training of clean examples we update the bias term in each layer of the neural network to store the beneficial perturbations, by recycling gradient information that is already computed. During testing on adversarial examples, the stored beneficial perturbations can neutralize adversarial perturbations on data samples. Thus, BPN can correctly classify adversarial examples. For example, in activation space (Fig. 1 b2, b3), consider a task of recognizing handwritten digits "1" versus "2". Adding beneficial perturbations to the activation representation of adversarial examples can be viewed as adding a beneficial direction vector (green arrows δ_BP) to the adversarial examples of digit 1. The resulting vector crosses the decision boundary and drags the misclassified adversarial examples back to the correct classification region.
Figure 4: BPN extension to a deep convolutional neural network. Deep convolutional neural networks are made of two parts: a feature extraction part (convolutional and non-linear layers) and a classifier part (fully connected layers). We introduce the beneficial perturbation bias (replacing the normal bias term) in the last few fully connected layers of the deep convolutional network and update it using FGSM.

Thus, the beneficial perturbations neutralize the effects of the adversarial perturbations and recover the clean examples:

x_cln ≈ x_cln + δ_AP + δ_BP   (2)

since δ_AP and δ_BP cancel out. As a result, instead of updating the decision boundary by training on both clean and adversarial examples, BPN can correctly classify both clean and adversarial examples by training only on clean examples, without updating its decision boundary.
Figure 3: Structural difference between the normal network (baseline) and BPN for the forward (a-b) and backward (c-d) passes. (a) Forward rule of the normal network (baseline): V_i = W_{i-1} V_{i-1} + b_{i-1}. (b) Forward rule of BPN: V_i = W_{i-1} V_{i-1} + b^BP_{i-1}; the beneficial perturbation bias (b^BP_{i-1}) plays the same role as the normal bias (b_{i-1}) in the forward pass. (c) Backward rule of the normal network (baseline); only the update rule for the normal bias term is shown (db_{i-1} = Σ Grad). (d) Backward rule of BPN; the difference is that the beneficial perturbation bias is updated using FGSM (db^BP_{i-1} = ε sign(Σ Grad)). Notation: i, layer index; W, weights; V, activations; b, bias; b^BP, beneficial perturbation bias; Grad, gradients from the next layer; db, gradient for the bias; db^BP, gradient for the beneficial perturbation bias.

Formulation of beneficial perturbations
Beneficial perturbations are formulated as an additive contribution to each layer's weighted activations (Fig. 3 b):

V_i = W_{i-1} V_{i-1} + b^BP_{i-1}   (3)

where W_{i-1}, V_{i-1} and b^BP_{i-1} are the weights, activations and beneficial perturbation bias at layer i-1. The beneficial perturbation bias has the same structure as the normal bias term b (Fig. 3 a), but it is used to store the beneficial perturbations (δ_BP).

Instead of adding "noise" (adversarial perturbations) to the input space, calculated from other classes to force the network into misclassification, we add "noise" (beneficial perturbations δ_BP) to the activation space, calculated from the input's own correct class to assist correct classification. Thus, the beneficial perturbations at each layer i are obtained by updating the beneficial perturbation bias using the Fast Gradient Sign Method (FGSM; [8]) with the input's own correct class:

b^BP_i = b^BP_i + η db^BP_i   (4)

db^BP_i = ε sign(∇_{b^BP_i} L(b^BP_i, y_true, θ))   (5)

where η is the learning rate, b^BP_i is the beneficial perturbation bias, db^BP_i is the gradient for the beneficial perturbation bias, ε is a hyperparameter that decides how far we go in the fast gradient sign direction, y_true is the true label (the input's own correct class), and θ are the parameters of the neural network.

Conveniently, we always use the input's own correct class to train a neural network. Thus, we can generate the gradient for the beneficial perturbations at layer i-1 by recycling the gradients already computed while training the network (Fig. 3 d). We use Eqn. 6 in place of Eqn. 5:

db^BP_{i-1} = ε sign(Σ Grad)   (6)

where db^BP_{i-1} is the gradient for the beneficial perturbation bias, Grad is the gradient from the next layer i, and ε is the same as in Eqn. 5.

Thus, to generate beneficial perturbations, we do not introduce any extra computation costs beyond FGSM. The forward (backward) pass computation costs of BPN are only 0.00% (0.006%) more than the costs of a classical network trained on clean examples (Tab. 1). This feature enables BPN to save at least half of the computation and running memory costs of standard adversarial training (Fig. 2).

Table 1: Computation costs of BPN trained on clean examples compared to a classical network trained on clean examples, on ResNet-50. For the forward (backward) pass, the computation costs of BPN are 0.00% (0.006%) more than those of the classical network.

Network                          Forward (FLOPS)    Backward (FLOPS)
Classical Network (ResNet-50)    51,112,224         51,112,225
BPN (ResNet-50)                  51,112,224         51,115,321
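A minimal PyTorch-style sketch of Eqns. 3-6 is given below (our own illustration, not released code from the paper). It assumes a fully connected layer whose ordinary bias plays the role of the beneficial perturbation bias; the class and argument names (BPLinear, eps_bp, lr_bp) are hypothetical:

```python
import torch
import torch.nn as nn


class BPLinear(nn.Linear):
    """Fully connected layer whose bias stores beneficial perturbations (Eqn. 3).

    The forward pass is identical to a standard linear layer (V_i = W V + b_BP).
    After the usual backward pass on clean examples, the bias is updated with the
    sign of its already-computed gradient (Eqns. 4 and 6), so generating beneficial
    perturbations adds only one sign and one multiplication per layer.
    """

    def __init__(self, in_features, out_features, eps_bp=0.3, lr_bp=1e-3):
        super().__init__(in_features, out_features, bias=True)
        self.eps_bp = eps_bp  # epsilon: step along the fast-gradient-sign direction
        self.lr_bp = lr_bp    # eta: learning rate for the beneficial perturbation bias

    @torch.no_grad()
    def update_bp_bias(self):
        # Eqn. 6: db_BP = eps * sign(sum of gradients from the next layer), which is
        # exactly eps * sign of the bias gradient that the backward pass already computed.
        if self.bias.grad is None:
            return
        db_bp = self.eps_bp * torch.sign(self.bias.grad)
        # Eqn. 4: b_BP <- b_BP + eta * db_BP
        self.bias.add_(self.lr_bp * db_bp)


# Hypothetical usage inside a standard training loop on clean examples
# (the BP biases would be excluded from the main optimizer so that only
# the sign-based rule above updates them):
#   loss = criterion(model(x_clean), y_true)
#   loss.backward()                      # ordinary backward pass, no extra cost
#   optimizer.step()                     # update weights as usual
#   for m in model.modules():
#       if isinstance(m, BPLinear):
#           m.update_bp_bias()           # recycle gradients to update b_BP
```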
Extending BPN to deep convolutional networks
Most deep convolutional neural networks are made of two parts: a feature extraction part (convolutional and non-linear layers) and a classifier (fully connected layers). Here, we introduce the beneficial perturbation bias (b_BP) in the last few fully connected layers of the deep convolutional network, replacing the normal bias term (Fig. 4). We use FGSM (Eqn. 6) to update those beneficial perturbation biases.

In summary, through training on clean examples, BPN generates beneficial perturbations as biases within the last few fully connected layers of the deep neural network. BPN leverages these beneficial perturbations to defend against future adversarial examples by neutralizing the effects of adversarial perturbations in the datasets. In addition, if BPN is trained on adversarial examples alone or on a combination of adversarial and clean examples, the neutralization can diversify the training set by converting adversarial examples into clean examples. As a consequence, the diversification further improves the generalization of BPN, which eases the accuracy trade-off and the impracticality of foreseeing multiple attacks.
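As a rough illustration of this model surgery, the sketch below builds a hypothetical BPN version of ResNet-18 by swapping its final fully connected layer for a small stack of BPLinear layers (the class from the previous sketch). The torchvision backbone and the layer sizes (three fully connected layers with 1028 hidden units, as described in the Experiments section) are assumptions of this example:

```python
import torch.nn as nn
from torchvision import models
# BPLinear is assumed to be the layer defined in the previous sketch.


def build_bpn_resnet18(num_classes=200, hidden=1028, eps_bp=0.3):
    """Assemble a hypothetical BPN version of ResNet-18.

    The convolutional feature extractor is kept unchanged; only the fully connected
    classifier head uses BPLinear layers, whose bias terms store the beneficial
    perturbations (Fig. 4).
    """
    backbone = models.resnet18()
    in_features = backbone.fc.in_features  # 512 for ResNet-18
    backbone.fc = nn.Sequential(
        BPLinear(in_features, hidden, eps_bp=eps_bp), nn.ReLU(),
        BPLinear(hidden, hidden, eps_bp=eps_bp), nn.ReLU(),
        BPLinear(hidden, num_classes, eps_bp=eps_bp),
    )
    return backbone
```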
Experiments

Datasets
MNIST.
MNIST [14] is a dataset of handwritten digits, with a training set of 60,000 examples and a test set of 10,000 examples.
FashionMNIST.
FashionMNIST [23] is a dataset of article images, with a training set of 60,000 examples and a test set of 10,000 examples.
TinyImageNet.
TinyImageNet is a subset of ImageNet [4], a large-scale visual dataset. TinyImageNet consists of 200 classes and has a training set of 100k examples and a test set of 10k examples.
Network structure
For MNIST and FashionMNIST (LeNet).
We use the convolutional and non-linear layers of LeNet [14] as the feature extraction part (classical LeNet). Then, for the classifier part, we create our version of LeNet (LeNet with beneficial perturbation bias) by adding the beneficial perturbation bias into the fully connected layers, replacing the normal bias.
For TinyImageNet (ResNet-18).
We use the convolutional and non-linear layers of ResNet-18 [9] as the feature extraction part (classical ResNet-18). Then, we use three fully connected layers with 1028 hidden units as the classifier. We create our version of ResNet-18 (ResNet-18 with beneficial perturbation bias) by adding the beneficial perturbation bias into the fully connected layers, replacing the normal bias. We trained the BPN (ResNet-18) for 5,000 epochs on TinyImageNet.
Various attack methods
To demonstrate how BPN can successfully defend against a broad range of adversarial attacks, we tested our BPN structure on adversarial examples generated by the following attack methods (a minimal sketch of two of these attacks follows the list):

(1-3) PGD Linf, L2, L1 [15]: Projected Gradient Descent attack with order Linf, L2, or L1.
(4) Basic Iterative Attack L2 [13]: perturbing the input with the gradient of the loss with respect to the input, with several steps for each epsilon.
(5) FGSM [8]: one-step fast gradient sign method.
(6) Aka Basic Iterative Attack [13]: like GradientSignAttack (FGSM), but with several steps for each epsilon.
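Since FGSM and PGD appear throughout the experiments, here is a minimal, self-contained PyTorch sketch of the two under their standard definitions (our own illustration; it assumes images normalized to [0, 1] and is not the exact attack code used for the paper's evaluation):

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """One-step fast gradient sign method: x_adv = x + eps * sign(grad_x L)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()


def pgd_linf(model, x, y, eps, alpha=0.01, steps=40):
    """Projected Gradient Descent attack within an Linf ball of radius eps."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                 # ascend the loss
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # project back into the Linf ball
            x_adv = x_adv.clamp(0, 1)                           # keep a valid normalized image
    return x_adv.detach()
```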
Results
BPN can defend against adversarial examples with negligible additional computation costs
When the neural network can only be trained on clean examples because of modest computation power, BPN achieves much better test accuracy on adversarial examples than the classical network (baseline, Tab. 2; MNIST: 98.88% vs. 18.08%, FashionMNIST: 54.07% vs. 11.87%, TinyImageNet: 53.29% vs. 1.45%). Thus, for companies with modest computation resources, BPN can help a system achieve moderate robustness against adversarial examples while introducing only negligible additional computation costs. For example, on FashionMNIST, our method uses only 59% of the training time of adversarial training with just one adversarial example per clean example, saving 43.51 minutes of training time over 500 training epochs on an NVIDIA Tesla-V100 platform. We only introduce one sign and one multiplication operation for each fully connected layer (Tab. 1); the extra 9% cost arises because we implement a custom layer in the PyTorch framework, which introduces considerable overhead for custom layers. This overhead should be greatly reduced if the custom layer were incorporated into the PyTorch framework with a C++ implementation, bringing the extra cost down to 0.006%. The savings would be even larger on a bigger dataset such as ImageNet [5].
BPN can alleviate the accuracy trade-off through the diversification of the training set
BPN can alleviate the accuracy trade-off difficulty and increase the classification accuracy for both clean and adversarial examples.

Table 2: Training on clean examples for BPN and the classical network (CN). Testing on clean examples (Cln Ex) and adversarial examples (Adv Ex) (generated by FGSM, ε = 0.3, for MNIST, FashionMNIST and TinyImageNet). CN does poorly on adversarial examples, while BPN can successfully defend against them.

                  MNIST (LeNet)   FashionMNIST (LeNet)   TinyImageNet (ResNet-18)
Cln Ex   BPN          —                  —                        —
         CN         99.01              89.17                    64.30
Adv Ex   BPN        98.88              54.07                    53.29
         CN         18.08              11.87                     1.45

Although training a classical neural network A on adversarial examples can achieve high test accuracy on adversarial examples (Tab. 3, MNIST: 99.01%, FashionMNIST: 91.49%, TinyImageNet: 68.52%), it hurts test accuracy on clean examples. Compared to a classical network B trained on clean examples, the test accuracy on clean examples of classical network A decreases from 99.01%, 89.17%, 64.30% (Tab. 2) to 95.54%, 65.64%, 18.67% (Tab. 3) for the MNIST, FashionMNIST and TinyImageNet datasets. In comparison to classical network B, by training BPN only on adversarial examples, BPN not only achieves better test accuracy on adversarial examples (Tab. 3, MNIST: 99.27%, FashionMNIST: 92.07%, TinyImageNet: 79.92%), but also achieves better test accuracy on clean examples (Tab. 3, MNIST: 97.32%, FashionMNIST: 71.54%, TinyImageNet: 20.69%). This accuracy on clean examples is still worse than that of classical network B (trained only on clean examples), but it is much better than the accuracy of classical network A (trained only on adversarial examples). The reason is that beneficial perturbations convert some adversarial examples into clean examples through neutralization (Eqn. 2). As a consequence, the increased diversity of clean examples improves the generalization of BPN.

Table 3: Training on adversarial examples for BPN and the classical network (CN). Testing on clean examples (Cln Ex) and adversarial examples (Adv Ex) (generated by FGSM, ε = 0.3, for MNIST, FashionMNIST and TinyImageNet). BPN achieves better classification accuracy on clean examples than CN.
                  MNIST (LeNet)   FashionMNIST (LeNet)   TinyImageNet (ResNet-18)
Cln Ex   BPN        97.32              71.54                    20.69
         CN         95.54              65.64                    18.67
Adv Ex   BPN        99.27              92.07                    79.92
         CN         99.01              91.49                    68.52
BPN can improve generalization through diversification of the training set
Training a classical network on both clean and adversarial examples can achieve good test accuracy on both clean and adversarial examples. By diversifying the training set, BPN can achieve even better test accuracy. BPN achieves slightly higher accuracy on clean examples than the classical network (Tab. 4, MNIST: 99.13% vs. 99.09%, FashionMNIST: 89.65% vs. 89.49%, TinyImageNet: 66.84% vs. 66.56%). In addition, BPN achieves higher accuracy on adversarial examples than the classical network (MNIST: 97.62% vs. 97.01%, FashionMNIST: 95.39% vs. 94.98%, TinyImageNet: 88.16% vs. 85.75%). The reason is that the generalization of BPN is improved through the diversification of the training set caused by neutralization (Eqn. 2). BPN and the classical network can both achieve high accuracy in this scenario. However, this scenario should normally be avoided because training on both clean and adversarial examples is expensive in terms of running memory and computation costs.

Table 4: Training on both clean and adversarial examples for BPN and the classical network (CN). Testing on clean examples (Cln Ex) and adversarial examples (Adv Ex) (generated by FGSM, ε = 0.3, for MNIST, FashionMNIST and TinyImageNet). Both BPN and CN achieve high classification accuracy. However, this scenario should be avoided because of the expensive running memory and computation costs.
                  MNIST (LeNet)   FashionMNIST (LeNet)   TinyImageNet (ResNet-18)
Cln Ex   BPN        99.13              89.65                    66.84
         CN         99.09              89.49                    66.56
Adv Ex   BPN        97.62              95.39                    88.16
         CN         97.01              94.98                    85.75
Influence of adversarial perturbation budget
The higher the adversarial perturbation budget, the higher the chance of successfully attacking a neural network. However, attacks with higher adversarial perturbation budgets are easier to detect by a program or by humans. For example, ε = 0.3 (Fig. 5a) represents very high noise, which makes FashionMNIST images difficult to classify, even for humans, and the distribution differences between adversarial and clean examples are so large that they can easily be captured by defense programs. A smaller ε therefore makes for a better attack, since the differences caused by the adversarial perturbations are too small to be detected by most defense programs. For such small adversarial perturbations (Fig. 5b), just by training on clean images, BPN achieves moderate robustness against adversarial examples with negligible costs. Thus, our method is beneficial for companies with modest computation power who still want to achieve moderate robustness against adversarial examples.

BPN can successfully defend against various attack methods
When the neural network can only be trained on clean examples because of modest computation power, we trained BPN on clean examples and tested it on adversarial examples generated by the various attack methods discussed in the experiments section. As shown in Tab. 5, BPN can successfully defend against these attack methods. In particular, BPN achieves better classification accuracy than the classical network for PGD Linf, PGD L2, PGD L1, Basic Iterative Attack L2, Aka Basic Iterative Attack and FGSM. However, despite the excellent performance of BPN on PGD Linf and FGSM, its performance on PGD L2, PGD L1, Basic Iterative Attack L2 and Aka Basic Iterative Attack is moderate. We discuss how to further improve BPN on a variety of attack methods in the discussion section.

Table 5: Training on clean examples of MNIST and TinyImageNet for BPN and the classical network (CN). Testing on adversarial examples generated by a variety of adversarial attack methods. BPN can successfully defend against those adversarial examples.

Dataset & network       PGD Linf   PGD L2   PGD L1   Basic Iterative Attack L2   Aka Basic Iterative Attack   FGSM
MNIST          BPN          —         —        —                —                           —                  —
               CN         2.18      97.26      —                —                           —                  —
TinyImageNet   BPN          —         —        —                —                           —                  —
               CN         0.00      15.11    15.12            15.11                        0.5                1.29

Figure 5: (a) Adversarial example with a high adversarial perturbation budget (ε = 0.3). (b) Test accuracy on adversarial examples after training BPN only on clean examples from the MNIST (blue) or FashionMNIST (orange) datasets, as a function of the ε value in the Fast Gradient Sign Method.
Discussion
We proposed a new solution for defending against adversarial examples, which we refer to as the Beneficial Perturbations Network (BPN). BPN, for the first time, leverages beneficial perturbations (opposite to the well-known adversarial perturbations) to counteract the effects of adversarial perturbations on input data. Compared to adversarial training, this approach offers three main advantages: (1) we demonstrated that BPN can effectively defend against adversarial examples with negligible additional running memory and computation costs; (2) we demonstrated that BPN can alleviate the accuracy trade-off by increasing the diversity of the training set; (3) further, we demonstrated that the increased diversity of the training set can improve the generalization of the network by converting some adversarial examples into clean examples.
Beneficial perturbations: the opposite "twins" of adversarial perturbations
Beneficial perturbations can be viewed as the opposite "twins" of adversarial perturbations. Much research is underway on how to generate ever more advanced adversarial perturbations [15, 13, 20, 17, 8] to fool increasingly sophisticated machine learning systems. However, there is little research [22] on how to generate beneficial perturbations and on their possible applications.
Future Beneficial Perturbations research
In this research, we used one of the simplest methods available in the adversarial perturbations world, FGSM [8], to generate beneficial perturbations. We demonstrated that beneficial perturbations can effectively defend against adversarial examples by neutralizing the effects of adversarial perturbations on data samples. More research could be done to improve BPN. (1) Update rules for beneficial perturbations: other than the FGSM rule implemented in this paper, one could use other methods (e.g., PGD) to update the beneficial perturbation bias (a hypothetical sketch is given below); as a consequence, BPN might become more robust to various kinds of adversarial examples. (2) Network structure for storing and generating beneficial perturbations: in this paper, we used beneficial perturbation biases, with the same structure as the normal bias, to store and generate the beneficial perturbations, replacing the normal bias term of the last few fully connected layers of deep convolutional networks. This might not be the optimal structure; it might be better to embed the beneficial perturbations into the earlier convolutional and non-linear layers of the deep convolutional network, to store and generate them more effectively and efficiently.
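As an illustration of point (1), a multi-step, PGD-like update of the beneficial perturbation bias could look like the hypothetical sketch below (our speculation, not a variant evaluated in the paper). Here grad_fn is assumed to return a fresh gradient of the loss with respect to the bias at each step, which would cost extra backward passes compared to the one-step FGSM rule:

```python
import torch


def pgd_style_bp_update(bias, grad_fn, eps_bp=0.3, alpha=0.05, steps=5):
    """Hypothetical multi-step (PGD-like) variant of the FGSM update in Eqns. 4 and 6.

    Each small sign step is projected back into an eps_bp ball around the original
    bias value, trading extra computation for a potentially more robust
    beneficial perturbation.
    """
    original = bias.detach().clone()
    for _ in range(steps):
        g = grad_fn()  # fresh gradient w.r.t. the bias at its current value
        with torch.no_grad():
            bias.add_(alpha * g.sign())                                  # small sign step
            bias.copy_(original + (bias - original).clamp(-eps_bp, eps_bp))  # project onto the ball
    return bias
```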
Acknowledgment
This work was supported by the National Science Foundation (grant number CCF-1317433), C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), and the Intel Corporation. The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.
References

[1] Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.; Šrndić, N.; Laskov, P.; Giacinto, G.; and Roli, F. 2013. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 387–402. Springer.

[2] Brown, T. B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

[3] Chen, X.; Fan, H.; Girshick, R.; and He, K. 2020. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297.

[4] Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009a. ImageNet: A large-scale hierarchical image database. In CVPR09.

[5] Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009b. ImageNet: A large-scale hierarchical image database. In CVPR09, 248–255. IEEE.

[6] Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[7] Di, X.; Yu, P.; and Tian, M. 2018. Towards adversarial training with moderate performance improvement for neural network classification. arXiv preprint arXiv:1807.00340.

[8] Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

[9] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2015. Deep residual learning for image recognition. CoRR abs/1512.03385.

[10] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

[11] Huang, R.; Xu, B.; Schuurmans, D.; and Szepesvári, C. 2015. Learning with a strong adversary. CoRR abs/1511.03034.

[12] Kannan, H.; Kurakin, A.; and Goodfellow, I. 2018. Adversarial logit pairing. arXiv preprint arXiv:1803.06373.

[13] Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.

[14] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P.; et al. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE.

[15] Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.

[16] McCloskey, M., and Cohen, N. J. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, 109–165. Elsevier.

[17] Narodytska, N., and Kasiviswanathan, S. P. 2016. Simple black-box adversarial perturbations for deep networks. arXiv preprint arXiv:1612.06299.

[18] Raghunathan, A.; Xie, S. M.; Yang, F.; Duchi, J. C.; and Liang, P. 2019. Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032.

[19] Stanforth, R.; Fawzi, A.; Kohli, P.; et al. 2019. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725.

[20] Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

[21] Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I.; Boneh, D.; and McDaniel, P. 2017. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204.

[22] Wen, S., and Itti, L. 2019. Beneficial perturbation network for continual learning. CoRR abs/1906.10528.

[23] Xiao, H.; Rasul, K.; and Vollgraf, R. 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.

[24] Xie, C.; Wu, Y.; Maaten, L. v. d.; Yuille, A. L.; and He, K. 2019. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 501–509.

[25] Zhang, H.; Yu, Y.; Jiao, J.; Xing, E. P.; Ghaoui, L. E.; and Jordan, M. I. 2019. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573.