MetaAdvDet: Towards Robust Detection of Evolving Adversarial Attacks
Chen Ma, Chenxu Zhao, Hailin Shi, Li Chen, Junhai Yong, Dan Zeng
Chen Ma ([email protected]), School of Software, Tsinghua University & Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China
Chenxu Zhao ([email protected]), AI Research, Beijing, China
Hailin Shi ([email protected]), AI Research, Beijing, China
Li Chen∗ ([email protected]), School of Software, Tsinghua University & BNRist, Beijing, China
Junhai Yong ([email protected]), School of Software, Tsinghua University & BNRist, Beijing, China
Dan Zeng ([email protected]), Shanghai University, Shanghai, China
ABSTRACT
Deep neural networks (DNNs) are vulnerable to adversarial attacks, which are maliciously implemented by adding human-imperceptible perturbations to images and thus lead to incorrect predictions. Existing studies have proposed various methods to detect adversarial attacks. However, new attack methods keep evolving and yield new adversarial examples that bypass the existing detectors. Training a detector requires collecting tens of thousands of samples, while new attacks evolve much more frequently than such high-cost data collection allows, so samples of newly evolved attacks remain available only at small scale. To solve this few-shot problem with evolving attacks, we propose a meta-learning based robust detection method that detects new adversarial attacks with limited examples. Specifically, the learning consists of a double-network framework: a task-dedicated network and a master network, which alternately learn the detection capability for seen attacks and for a new attack. To validate the effectiveness of our approach, we construct benchmarks with few-shot-fashion protocols based on three conventional datasets, i.e. CIFAR-10, MNIST and Fashion-MNIST. Comprehensive experiments are conducted on them to verify the superiority of our approach over traditional adversarial attack detection methods. The implementation code is available online at https://github.com/sharpstill/MetaAdvDet.
CCS CONCEPTS
• Computing methodologies → Computer vision problems.

∗ Corresponding author: Li Chen
KEYWORDS
adversarial example detection, meta-learning, few-shot learning, evolving adversarial attacks
ACM Reference Format:
Chen Ma, Chenxu Zhao, Hailin Shi, Li Chen, Junhai Yong, and Dan Zeng. 2019. MetaAdvDet: Towards Robust Detection of Evolving Adversarial Attacks. In Proceedings of the 27th ACM International Conference on Multimedia (MM '19), October 21–25, 2019, Nice, France. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3343031.3350887
1 INTRODUCTION
The evolving adversarial attacks threaten deep neural networks (DNNs) by adding human-imperceptible perturbations to clean images, leading to incorrect predictions. Various defense methods have been proposed for detecting attacks; they distinguish adversarial images from real images by capturing the features of DNNs under attack [3, 28, 45, 51]. However, new attack methods keep evolving and yield new adversarial examples that bypass existing detectors. For example, the C&W attack [4] was proposed to circumvent all detection techniques existing at that time. Certain detection techniques [8, 43] have been proposed to detect new attacks and are promising; however, most of them need tens of thousands of examples for training, which is infeasible in practice because new attacks evolve much faster than such high-cost data collection. This results in a few-shot learning problem with evolving attacks and keeps the detection of adversarial examples challenging.

Therefore, we study how to tackle this few-shot learning problem, and propose a meta-learning based training approach with the learning-to-learn strategy. It focuses on learning to detect a new attack from one or a few instances of that attack. We name our approach MetaAdvDet, which refers to Meta-learning based Adversarial Detection. To this end, the approach is equipped with a double-network framework for learning from tasks, where a task is defined as a small data collection with real examples and a randomly chosen type of attack. The purpose of introducing tasks is to simulate new attack scenarios.
Figure 1: The procedure of MetaAdvDet training in one mini-batch (best viewed in color). The approach consists of a double-network framework: M and T. T is the task-dedicated network which focuses on learning each task. It copies parameters from the master network M at the beginning, and then trains on the support set. After a couple of iterations (the inner update step), T converges and computes the gradient G_i on the query set of task i. M accumulates the gradients ∑_{i=1}^{K} G_i to update its parameters M_θ, which are prepared for the next mini-batch of learning. The learned M can be used to detect new attacks with limited new samples. More details can be found in Sec. 3.2 and Algorithm 1.

To better learn from tasks, MetaAdvDet uses one network to focus on learning individual tasks, and the other network to learn the general detection strategy over multiple tasks. Fig. 1 illustrates the training procedure of one mini-batch; more details are described in Sec. 3.2. Each task is divided into a support set and a query set, which are used for learning the basic detection capability on old attacks and for minimizing the test error on new attacks, respectively. After training, the framework efficiently detects a new attack by fine-tuning on limited examples. In contrast, DNN-based methods that use the traditional training approach perform much worse than ours in detecting new attacks.

To comprehensively validate detection techniques in terms of evolving attacks, we propose evaluations along the following dimensions to validate the superiority of our approach in the few-shot problem.

Cross-adversary Dimension. To assess the capability of detecting new types of attacks in the test set with few-shot samples.
Cross-domain Dimension. To assess the capability of detecting all attacks across different domains with few-shot samples.
Cross-architecture Dimension. To assess the capability of detecting adversarial examples generated by attacking a classifier with a new architecture.
White-box Attack Dimension. To assess the capability of detecting white-box attacks with few-shot samples.

To validate the effectiveness of our approach along the above dimensions, we propose benchmarks with the few-shot-fashion protocol on three conventional datasets, i.e. CIFAR-10, MNIST and Fashion-MNIST. The benchmarks include adversarial examples generated using various types of attacks, and they also define the partition of the train set and test set to simulate the scenario of detecting an evolving attack. In experiments, we compare our approach with end-to-end state-of-the-art methods on these benchmarks, and the results show that our approach surpasses the existing methods by a large margin. We summarize the main contributions below:

(1) To the best of our knowledge, we are the first to define the adversarial attack detection problem as a few-shot learning problem of detecting evolving new attacks.

(2) We propose a meta-learning based approach, MetaAdvDet, equipped with a double-network framework and the learning-to-learn strategy for detecting evolving attacks. Benefiting from this strategy, our approach achieves high performance in detecting new attacks.

(3) To comprehensively validate our approach in terms of evolving attacks, we construct benchmarks with the few-shot-fashion protocol on three datasets, i.e. CIFAR-10, MNIST and Fashion-MNIST. The benchmarks define the partition of the train set and test set to simulate the scenario of detecting an evolving attack. We believe the proposed benchmarks will be useful for future research on defending against evolving attacks.
2 RELATED WORK
Many attempts have been made to detect or defend against adversarial attacks. We first introduce the defense techniques, and then the meta-learning techniques related to our work.
2.1 Adversarial Attack Detection
An adversary algorithm generates adversarial examples that make a classifier output incorrect predictions. Many defense techniques have been proposed to defend against adversarial attacks; they generally fall into two categories.

The first category attempts to build a robust model that classifies adversarial examples correctly, such as [1, 26, 38, 42]. However, certain new attacks [7, 23] are deliberately designed to grasp the weaknesses of these methods and circumvent the defense. For example, Athalye et al. [2] identify obfuscated gradients, a kind of gradient masking, which lead to a false sense of security in defenses; based on their findings, new attacks are proposed that circumvent 7 of 9 defenses relying on obfuscated gradients.

Due to this difficulty, the second category of defense techniques turns to distinguishing adversarial examples from real ones, in order to improve security and detect malicious users. This category is adversarial attack detection. Unlike the first category, adversarial detection does not need to classify the adversarial image correctly, but only to identify it. Essentially, a detector is a binary classifier trained on real and adversarial examples. Based on this idea, certain detection techniques [5, 30, 43] build a subnet classifier to capture the hidden-layer features of adversarial examples. Other methods include (1) capturing the difference in the DNN's output between real and adversarial images when applying certain transformations to the input [3, 8, 45, 51], (2) utilizing the intrinsic dimensionality of adversarial regions [28], (3) employing new loss functions to encourage the DNN to learn more distinguishable representations [34, 48], (4) using statistical tests [15], and (5) using the capsule network [12].

However, high-cost data collection cannot keep up with the evolution frequency of the attacks, which makes training detectors for new attacks difficult. For example, when a new attack first appears without published source code, most defenders have insufficient examples to train a detector. This situation makes the issue of detecting evolving attacks highly urgent. We categorize this issue as a new defense problem: a few-shot learning problem of detecting evolving attacks.
2.2 Meta-learning
The few-shot learning problem [41, 47], defined as learning from few samples, has been studied for a long time. Meta-learning techniques [10, 11, 18, 24, 31] are promising for addressing it: they usually train a meta-learner on a distribution of few-shot tasks so that it can generalize and perform well on unseen tasks. Model-agnostic meta-learning (MAML) [11] is a typical meta-learning approach which learns an internal representation that is widely suitable for many tasks: it learns a proper weight initialization on the support set and then updates itself to perform well on the query set. To update the weights more efficiently, Meta-SGD [24] makes the meta-learner learn not only the weight initialization but also the update direction and learning rate. For a better understanding of this field, we introduce the terminology of meta-learning below.
Task: A meta-learning model (meta-learner) should be trained over a variety of tasks and optimized for the best performance on the task distribution, including potentially unseen tasks. The concept of a "task" in this paper is entirely different from that in "multi-task learning"; it is only a manner of data partition used to train the meta-learner.
Support & query set: Each task is split into two subsets: the support set, for learning the basic classification on old tasks, and the query set, for training in the train stage or testing in the test stage. It should be emphasized that the support set and query set of the same task have the same data distribution.
Way: A way is a class in each task that the meta-learner wishes to discriminate. The number of ways may be specified arbitrarily and does not need to equal the ground-truth class number.
Shot: A shot is the number of samples in each way of the support set. For example, an N-way, K-shot classification task includes a support set with K labeled examples for each of the N classes.

Based on the spirit of meta-learning, we propose a training method with a double-network framework and introduce a double-update scheme for achieving fast adaptation capability. Experiments show the superiority of our approach in detecting new attacks.

3 METHODOLOGY
The evolving adversarial attacks are hard to distinguish because of the insufficient new adversarial examples for training the detector, which results in a few-shot learning problem. One of the keys to solving this problem is to use the power of meta-learning techniques. Typical meta-learning methods (e.g. MAML [11]) are trained to learn the task distribution. Because the categories of data in each task are randomly chosen, the meta-learner acquires fast adaptation capability to unseen data types via learning these tasks.
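To make the task terminology concrete, the following is a minimal sketch of how a 2-way, K-shot detection task with its support and query sets could be assembled under the fixed-way setting described later in Sec. 3.2. The array names (real_images, adv_pools) and the sampling details are our own illustration, not code from the paper:

```python
import numpy as np

def sample_task(real_images, adv_pools, shots=1, query_per_way=35, rng=np.random):
    """Sample one 2-way detection task: way 0 uses a single randomly chosen
    attack type, way 1 uses real examples (fixed-way setting)."""
    attack = rng.choice(list(adv_pools.keys()))      # each task simulates one attack scenario
    adv = adv_pools[attack][rng.permutation(len(adv_pools[attack]))]
    real = real_images[rng.permutation(len(real_images))]

    support_x = np.concatenate([adv[:shots], real[:shots]])
    support_y = np.array([0] * shots + [1] * shots)  # 0 = adversarial, 1 = real
    query_x = np.concatenate([adv[shots:shots + query_per_way],
                              real[shots:shots + query_per_way]])
    query_y = np.array([0] * query_per_way + [1] * query_per_way)
    return support_x, support_y, query_x, query_y
```

Here query_per_way=35 mirrors the default query set size of 70 for two ways reported in the experiments.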
Figure 2: The details of constructing tasks for training and testing the meta-learner (including T and M). Each task (support & query set) is sampled independently from the dataset, and each mini-batch consists of K tasks. The training employs a double-update scheme: the inner update, in which T learns on the support set, and the outer update, in which M updates with the accumulated gradients of T on the query set (as described in Fig. 1 and Sec. 3.2). M can be used to detect new attacks.

3.1 Overview
To model the attack detection technique in the meta-learning framework, we collect various types of attacks to construct adversarial example datasets in the multi-task form (Fig. 2). Each task is a small data collection with a randomly chosen attack, representing one attack scenario, so the large number of tasks makes the meta-learner experience various attack scenarios and thereby adapt to new attacks rapidly. Our approach is equipped with a double-network framework with the learning-to-learn strategy, which focuses on learning how to learn new tasks faster by reusing previous experience, rather than considering new tasks in isolation. Specifically, one network of our framework focuses on learning from individual tasks (the task-dedicated network T), while the other updates its parameters based on the gradients accumulated from T (the master network M) to learn a general strategy over all tasks (Fig. 1). This double-network framework leads to a double-update scheme, corresponding to the two networks. Fig. 2 shows the details of constructing tasks for training and testing the meta-learner, Fig. 1 demonstrates the procedure of training in one mini-batch, and detailed steps are shown in Algorithm 1.

3.2 Learning with the Double-Network Framework
As mentioned earlier, the learning-to-learn strategy is proposed to learn new attacks by reusing previous experience of detecting old attacks. Following the typical setting of meta-learning, all training data are organized into tasks, and each task is divided into two subsets: the support set, for learning the basic capability of detecting old attacks, and the query set, which acts as a surrogate of new attacks for achieving rapid adaptation in detecting the new attacks of the test set. To learn the tasks, the meta-learner includes a double-network framework, i.e. the master network M and a task-dedicated network T, which is cloned from M to learn from individual tasks. T updates its parameters T_θ based on each task's support set, and then calculates its gradient on the query set, which is accumulated to update M's parameters M_θ (Fig. 1). The same M_θ will be copied and overwritten to T_θ before learning the next task. M and T output the classification probability to distinguish real and adversarial examples, corresponding to the two-way configuration, which stipulates that one of the ways uses real examples in all tasks. Two options are considered: the randomized-way setting, in which the two-way labels are shuffled in each task, and the fixed-way setting, which uses label 1 for real examples and label 0 for adversarial examples in all tasks. We compare the two options in Sec. 5.4.

Algorithm 1 shows the training procedure, and Fig. 1 shows the detail of one mini-batch of training. T copies all the parameters from M at the beginning of learning task T_i, where the subscript i denotes the task index.
Then, the inner update step updates the parameters of T by using the support set of T_i for multiple iterations. Lines 8 and 9 of Algorithm 1 demonstrate this step, which is the same as supervised learning in a traditional DNN: we directly feed input images to T and use gradient descent to update its parameters based on the classification ground truth. Unlike existing methods that apply transformations to input images [45, 51], we should note that no transformation is applied to the input image in this step of our approach. Finally, the meta-learner acquires rapid adaptation capability by minimizing the test error on new data; this is the role that the outer update step on the query set plays. More specifically, we calculate the cross-entropy loss L on the query set of task T_i to obtain the gradient G_i w.r.t. T_θ, which is accumulated over all tasks T_1, ..., T_K and finally sent to M. Because M and T use the same network structure and parameters, the accumulated gradient can be used to update the parameters of M, namely M_θ. Thus, ∑_{i=1}^{K} G_i updates M_θ for learning the strategy over the multi-task distribution.

Algorithm 1: MetaAdvDet training procedure
Input: master network M and its parameters M_θ; task-dedicated network T and its parameters T_θ; the feed-forward function f_{T_θ} of T; max iterations N; inner-update learning rate λ1; outer-update learning rate λ2; inner-update iterations T; the multi-task format dataset D; cross-entropy loss function L.
Output: the learned network M.
 1: for iter ← 1 to N do
 2:     sample K tasks T_i, i ∈ {1, ..., K} from D
 3:     for i ← 1 to K do
 4:         S_i, Q_i ← support set and query set of T_i
 5:         T_θ ← M_θ   ▷ copy parameters from M to T
 6:         T_θ′ ← T_θ   ▷ T_θ will be used in the outer update
 7:         for t ← 1 to T do
 8:             calculate ∇_{T_θ′} L(f_{T_θ′}) by using S_i
 9:             T_θ′ ← T_θ′ − λ1 ∇_{T_θ′} L(f_{T_θ′})   ▷ inner update
10:         end for
11:         G_i ← ∇_{T_θ} L(f_{T_θ′}) by using Q_i
12:     end for
13:     M_θ ← M_θ − λ2 ∑_{i=1}^{K} G_i   ▷ outer update
14: end for
15: return M

Following the popular few-shot-fashion testing procedure [39], the evaluation requires that each method be evaluated on all test tasks. Algorithm 2 shows the testing procedure. The few-shot-fashion testing procedure includes a fine-tune step using few-shot examples, as shown in line 6 of Algorithm 2. In experiments, we adopt a general binary DNN classifier as the baseline for comparison. The DNN uses a single network for training, whereas MetaAdvDet uses a double-network framework to obtain the learning-to-learn strategy. The experiments prove the superiority of our approach in detecting new attacks (Sec. 5.5).
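As a concrete illustration of Algorithm 1, the following PyTorch-style sketch performs one mini-batch of the double-update scheme with the default hyper-parameters from Tab. 6. It is a simplification: Algorithm 1 takes the query gradient G_i with respect to the pre-update parameters T_θ, whereas this sketch uses the common first-order approximation (the gradient at the adapted parameters); the function and variable names are ours, not from the released code:

```python
import copy
import torch
import torch.nn as nn

def meta_train_step(master, tasks, inner_lr=0.001, outer_lr=0.0001, inner_iters=12):
    """One mini-batch of Algorithm 1 (first-order sketch).
    `tasks` yields (support_x, support_y, query_x, query_y) tensors."""
    loss_fn = nn.CrossEntropyLoss()
    outer_grads = [torch.zeros_like(p) for p in master.parameters()]
    for support_x, support_y, query_x, query_y in tasks:
        task_net = copy.deepcopy(master)                     # step 1: T_theta <- M_theta
        inner_opt = torch.optim.SGD(task_net.parameters(), lr=inner_lr)
        for _ in range(inner_iters):                         # step 2: inner updates on support set
            inner_opt.zero_grad()
            loss_fn(task_net(support_x), support_y).backward()
            inner_opt.step()
        task_net.zero_grad()                                 # step 3: gradient G_i on the query set
        loss_fn(task_net(query_x), query_y).backward()
        for acc, p in zip(outer_grads, task_net.parameters()):
            acc += p.grad
    with torch.no_grad():                                    # step 4: outer update of M_theta
        for p, acc in zip(master.parameters(), outer_grads):
            p -= outer_lr * acc                              # uses the summed gradients of all K tasks
```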
Algorithm 2: MetaAdvDet testing procedure
Input: master network M and its learned parameters M_θ; task-dedicated network T and its parameters T_θ; the feed-forward function f_{T_θ} of T; fine-tune iterations T; learning rate λ; test tasks T_i, i ∈ {1, ..., N}, obtained by reorganizing the test set; cross-entropy loss L; ground truth Y_i, i ∈ {1, ..., N} of the query sets.
Output: the average F1 score over all tasks.
 1: for i ← 1 to N do   ▷ iterate over all test tasks
 2:     S_i, Q_i ← support set and query set of T_i
 3:     T_θ ← M_θ   ▷ copy parameters to ensure each task is tested independently
 4:     for t ← 1 to T do
 5:         calculate ∇_{T_θ} L(f_{T_θ}) by using S_i
 6:         T_θ ← T_θ − λ ∇_{T_θ} L(f_{T_θ})   ▷ fine-tune step
 7:     end for
 8:     Ŷ_i ← f_{T_θ}(Q_i)   ▷ get the prediction on the query set of task i
 9:     score_i ← F1(Ŷ_i, Y_i)
10: end for
11: F1 score ← (1/N) ∑_{i=1}^{N} score_i
12: return F1 score

The F1 score of the query set is adopted as the metric for evaluating the performance of detection techniques (Sec. 5.2). Note that the F1 score is calculated on individual tasks; the final F1 score is obtained by averaging the F1 scores of all tasks, following the few-shot-fashion testing procedure of MiniImagenet [39] (line 11 of Algorithm 2). All compared methods use this metric and include the fine-tune step for a fair comparison.
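A matching sketch of Algorithm 2, under the same assumptions as the training sketch above: each test task fine-tunes a fresh copy of the learned master network on its few-shot support set and is scored by F1 on its query set, with the per-task scores averaged at the end.

```python
import copy
import torch
import torch.nn as nn

def meta_test(master, test_tasks, lr=0.001, finetune_iters=20):
    """Algorithm 2: per-task fine-tuning on the support set, F1 on the query set."""
    loss_fn = nn.CrossEntropyLoss()
    scores = []
    for support_x, support_y, query_x, query_y in test_tasks:
        net = copy.deepcopy(master)          # each task is tested independently
        opt = torch.optim.SGD(net.parameters(), lr=lr)
        for _ in range(finetune_iters):      # fine-tune step (line 6 of Algorithm 2)
            opt.zero_grad()
            loss_fn(net(support_x), support_y).backward()
            opt.step()
        with torch.no_grad():
            pred = net(query_x).argmax(dim=1)
        tp = ((pred == 1) & (query_y == 1)).sum().item()   # label 1 = real example
        fp = ((pred == 1) & (query_y == 0)).sum().item()
        fn = ((pred == 0) & (query_y == 1)).sum().item()
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        scores.append(2 * precision * recall / max(precision + recall, 1e-12))
    return sum(scores) / len(scores)         # average F1 over all tasks (line 11)
```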
4 BENCHMARKS
4.1 Adversarial Example Datasets
In order to validate the effectiveness of our approach, we construct adversarial example datasets based on conventional datasets. The built datasets use fifteen adversaries to yield examples whose data sources come from the CIFAR-10 [20], MNIST [22] and Fashion-MNIST [50] datasets; they are named AdvCIFAR, AdvMNIST and AdvFashionMNIST, respectively. To train detectors that distinguish real examples from adversarial examples, each dataset includes an additional real-example category whose data are directly transferred from the original dataset (i.e. CIFAR-10 etc.). All fifteen types of adversarial examples are generated using the CleverHans library [35], which attacks classifiers with three architectures for each adversary, namely a 4 conv-layer network (conv-4), ResNet-10 [16] and ResNet-18 [16]. Note that the MI-FGSM, BIM and PGD attacks adopt the ℓ∞-norm version, while the C&W and Deepfool attacks adopt the ℓ2-norm version; these choices are based on the attack success rate. In addition, the adversarial examples of the L-BFGS attack [44] are used as the validation set. The BPDA attack [2], which exploits the obfuscated gradients of a defense, is not used, because our approach does not rely on obfuscated gradients. The statistics of the adversarial example datasets are shown in Tab. 1.

Table 1: Our adversarial example datasets contain the examples generated by attacking different architectures, including a 4 conv-layer network (conv-4), ResNet-10 and ResNet-18. This table lists the number of adversarial examples generated by successfully attacking the conv-4 network.
adversary | AdvCIFAR train | AdvCIFAR test | AdvMNIST train | AdvMNIST test | AdvFashionMNIST train | AdvFashionMNIST test
FGSM [14] | ,851 | 9260 | 23,646 | 3853 | 48,368 | 7999
MI-FGSM [9] | ,742 | 9205 | 58,445 | 9701 | 56,744 | 9362
BIM [21] | ,076 | 9118 | 58,522 | 9715 | 56,587 | 9341
PGD [29] | ,911 | 9504 | 58,439 | 9693 | 57,060 | 9428
C&W [4] | ,810 | 9509 | 58,121 | 9651 | 57,072 | 9435
jsma [37] | ,141 | 9807 | 26,377 | 4305 | 39,804 | 6770
EAD [6] | ,146 | 8908 | 59,458 | 9885 | 56,157 | 9283
SPSA [46] | ,183 | 8436 | 1260 | 245 | 12,604 | 2225
Spatial Transformation [49] | ,075 | 9320 | 58,820 | 9770 | 59,520 | 9917
VAT [32] | ,758 | 5788 | 11,392 | 1869 | 24,774 | 4159
semantic [17] | ,704 | 6415 | 52,398 | 8723 | 47,401 | 7918
MaxConfidence [13] | ,293 | 9565 | 57,676 | 9604 | 57,309 | 9469
Deepfool [33] | ,740 | 8879 | 59,461 | 9886 | 56,171 | 9294
NewtonFool [19] | ,240 | 8916 | 59,473 | 9884 | 56,249 | 9294
L-BFGS [44] (validation set) | | | | | |
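The paper generates its adversarial examples with the CleverHans library [35]. As a library-independent illustration of the same idea, the sketch below crafts ℓ∞ FGSM examples directly in PyTorch and keeps only those that fool the classifier, mirroring how Table 1 counts "successfully attacking" examples; the ε value and the function name are our assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

def fgsm_successful(classifier, x, y, eps=8.0 / 255):
    """Craft l_inf FGSM examples and keep only those that fool `classifier`."""
    x = x.clone().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(classifier(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0).detach()   # single FGSM step
    with torch.no_grad():
        fooled = classifier(x_adv).argmax(dim=1) != y          # success test
    return x_adv[fooled]
```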
4.2 Cross-adversary Benchmark
To validate the effectiveness of detection techniques in detecting new attacks, we configure the train set and test set to contain no common type of adversarial examples, simulating this situation. To this end, the attacks are grouped by category, and we propose the cross-adversary benchmark, which assigns different adversary groups to the train set and the test set.
Table 2: The definition of adversary groups in the cross-adversary benchmark.

Train Adversary Group | FGSM, MI-FGSM, BIM, PGD, C&W, jsma, SPSA, VAT, MaxConfidence
Test Adversary Group | EAD, semantic, Deepfool, Spatial Transformation, NewtonFool
Validation | L-BFGS
Train & Test | same domain
Tab. 2 shows the adversary groups of the cross-adversary benchmark. The grouping principles of this benchmark are: (1) each adversary should be assigned to one group only; (2) similar adversaries should be assigned to the same group. For example, the MI-FGSM adversary is a modification of FGSM, so they are similar and we put them in one group. Based on this benchmark, the train set and test set never include attacks of the same group simultaneously. Note that in this benchmark, the adversaries of the train group use the train split of the adversarial example dataset (e.g. the train set of FGSM in Tab. 1) to train the detectors. Similarly, the detectors are evaluated on the test split of the test group's adversaries.
4.3 Cross-domain Benchmark
The concept of a domain indicates an adversarial example dataset, e.g. AdvMNIST. Since different domains have different data distributions, the cross-domain benchmark is more challenging. To evaluate the capability of detecting the adversarial examples generated from a new domain, the detectors are trained on one domain (Train Domain) and tested on the other domain (Test Domain). In this benchmark, we focus on the transferability between two datasets, namely AdvMNIST and AdvFashionMNIST, as listed in Tab. 3. Note that in this benchmark, all types of attacks are used to train the detector.

Table 3: The cross-domain benchmark consists of 2 protocols on AdvMNIST and AdvFashionMNIST.

Protocol | Train Domain | Test Domain | Attack Types | Test Shots
1 | AdvMNIST | AdvFashionMNIST | all | 1, 5
2 | AdvFashionMNIST | AdvMNIST | all | 1, 5
4.4 Cross-architecture Benchmark
Existing studies show that adversarial examples generated by attacking one architecture can fool another architecture [27, 36]. To validate the detection capability in this situation, this benchmark stipulates that the train set and the test set include adversarial examples generated by attacking different architectures. For example, the detector is trained on adversarial examples generated by attacking a classifier with the conv-4 network (Train Arch), but tested on those of ResNet-10 (Test Arch). Tab. 4 shows the details of this benchmark; all types of attacks are used to train the detectors. Three architectures are used, namely conv-4, ResNet-10 and ResNet-18. Note that the concept of architecture in this benchmark relates only to the classifier's backbone used during adversarial example generation, not to the detector model.

Table 4: The cross-architecture benchmark consists of 4 protocols; the examples of the train set and the test set are generated by attacking different networks.

Protocol | Train Arch | Test Arch | Attack Types | Test Shots
1 | ResNet-10 | ResNet-18 | all | 1, 5
2 | ResNet-18 | ResNet-10 | all | 1, 5
3 | conv-4 | ResNet-10 | all | 1, 5
4 | ResNet-10 | conv-4 | all | 1, 5
4.5 White-box Attack Benchmark
A white-box attack means that the adversary has the information of the image classifier and is also aware of the detector, i.e. it has full knowledge of the detector. In other words, the adversary needs to fool both the classifier and the detector simultaneously, which makes the attack more challenging to defend against. We use the targeted iterative FGSM (I-FGSM) [21] and C&W [4] attacks to simulate white-box attacks with the method presented in Carlini and Wagner [4]. The basic idea is to construct a combined model from the original classifier and the detector. The original classifier has N output labels; the new model outputs N + 1 labels, with the last label indicating whether the input is an adversarial example. More specifically, denote the new model as B, which combines the classifier C and the detector D. B's output logits are denoted as Z_B, C's output as Z_C, and D's output as Z_D. Z_B is constructed using the following formula:

    Z_B(x)_i = Z_C(x)_i                        if i ≤ N
    Z_B(x)_i = Z_D(x) × 2 × max_j Z_C(x)_j     if i = N + 1        (1)

It is easy to see that when an input is detected as an adversarial example by D, Z_D is larger than 0.5, which leads Z_B(x)_{N+1} to be larger than Z_B(x)_i for 1 ≤ i ≤ N. If an input is detected as a real example, B classifies it with the same label as C does. In this way, the new model B combines C and D.

Now, we can use the targeted I-FGSM or C&W adversary to attack the new model B to generate white-box adversarial examples. The target label is set to make C classify the example incorrectly while the example bypasses the detector D. In MetaAdvDet, D stands for the learned master network M, which is the network under attack. Although the white-box attack leads M to misclassify, MetaAdvDet can benefit from the learning-to-learn strategy and recover the correct prediction with limited white-box examples provided, following the steps in Algorithm 2.
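A direct transcription of Eq. (1) as a sketch, assuming z_c holds the classifier's N logits per example and z_d holds the detector's probability that the input is adversarial (the names are ours):

```python
import torch

def combined_logits(z_c, z_d):
    """Eq. (1): append an (N+1)-th logit that dominates whenever Z_D(x) > 0.5.
    z_c: (batch, N) classifier logits; z_d: (batch, 1) detector probability."""
    extra = z_d * 2.0 * z_c.max(dim=1, keepdim=True).values
    return torch.cat([z_c, extra], dim=1)   # N+1 logits; the last flags "adversarial"
```

A targeted I-FGSM or C&W attack on B must then push the prediction to some label i ≤ N, i.e. simultaneously fool C and keep Z_D(x) below 0.5.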
5 EXPERIMENTS

Table 5: The modules of one block in the conv-3 backbone of MetaAdvDet and the compared methods. The conv-3 backbone consists of 3 such blocks in total, and the last block connects to a fully-connected layer that outputs a vector with two probabilities.

index | module | parameter configuration
1 | conv-layer | 3 × 3

Table 6: The default parameter configuration, used in the ablation study of Sec. 5.4 and in the comparative experiments of Sec. 5.5, Sec. 5.6, Sec. 5.7 and Sec. 5.8.

name | default value | description
shots | 1 | number of examples in a way; MetaAdvDet sets the same shots in training and testing
ways | 2 | alias of the class number; data of the same way come from using the same adversary to attack the same category's images
train query set size | 70 | number of examples of a query set in training
test query set size | 30 | number of examples of a query set in testing
task number K | 30 | number of tasks in each mini-batch
inner update times | 12 | iterations of the inner update during training
fine-tune times | 20 | iterations of fine-tuning during testing
total tasks | 20000 | total number of constructed tasks
inner learning rate | 0.001 | learning rate of the inner update
outer learning rate | 0.0001 | learning rate of the outer update
dataset | AdvCIFAR | the dataset for the ablation study
backbone | conv-3 | the backbone of MetaAdvDet and the compared methods
benchmark | cross-adversary | the benchmark for the ablation study
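Table 5 fixes only the 3 × 3 conv-layer of each block; for concreteness, a conv-3 backbone consistent with the caption (three blocks, then a two-way fully-connected head) could be built as below, where the batch-norm/ReLU/max-pooling layout inside each block is our assumption rather than the paper's specification:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # assumed block layout: 3x3 conv + batch norm + ReLU + 2x2 max-pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class Conv3Detector(nn.Module):
    """conv-3 backbone: 3 blocks, then a fully-connected two-way head."""
    def __init__(self, in_channels=3, width=64):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, width),
            conv_block(width, width),
            conv_block(width, width),
        )
        self.head = nn.Linear(width * 4 * 4, 2)   # for 32x32 inputs (e.g. CIFAR-10)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))
```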
5.1 Parameter Configuration
In the construction of tasks, we set the total number of tasks to 20,000, and the inner-update learning rate λ is set to 0.001 empirically. Because of the summation of gradients in Algorithm 1, the outer-update learning rate is set to 0.1λ. The number of training epochs is set to 4, since we observe that the F1 score on the validation set is stable after 4 epochs. The query set size used for the outer update is set to 70 for two ways, i.e. 35 samples per way. The number of fine-tune iterations is set to 20, at which performance becomes stable (Fig. 4b). The full parameter configuration is shown in Tab. 6; it is set empirically based on the validation set.

5.2 Evaluation Metric
Our metric requires that all compared methods be evaluated on 1000 testing tasks to cover all test samples. To quantify the detection performance of all methods, we adopt the F1 score, following Liang et al. [25] and Sabokrou et al. [40]. It is defined as the harmonic mean between precision and recall:
    recall = TP / (TP + FN),   precision = TP / (TP + FP),
    F1 = (2 × precision × recall) / (precision + recall)        (2)

We use label 1 to represent real examples and label 0 to represent adversarial examples, so TP is the number of correctly detected real examples, FN is the number of real examples incorrectly detected as adversarial, and FP is the number of adversarial images detected as real. Note that the final F1 score is obtained by averaging the F1 scores of all tasks (Algorithm 2).

5.3 Compared Methods
The compared state-of-the-art methods are selected according to two principles: (1) to comply with the few-shot-fashion benchmarks, a compared method must be an end-to-end learning approach that can be fine-tuned in the test stage; (2) the compared methods must be able to detect new attacks, so that detection techniques can be evaluated in terms of evolving adversarial attacks. Based on these principles, MetaAdvDet is compared with an image-rotation-transformation based detector, TransformDet [45], and a detection technique based on a secret fingerprint, NeuralFP [8]. NeuralFP is trained for 100 epochs on each dataset, and TransformDet is trained for 10 epochs on each dataset. In the fine-tune step, because NeuralFP is trained on real examples only, we extract the real examples from the support set for fine-tuning. In addition, NeuralFP obtains its F1 score by determining the best threshold for each task. We configure these methods following their original settings [8, 45] (Tab. 7).
Table 7: The configuration of the train, validation and test sets of all compared methods on the proposed benchmarks.

Method | Train Set
DNN | train set of the adversarial example datasets (e.g. AdvCIFAR etc.)
DNN (balanced) | train set of the adversarial example datasets, down-sampled to keep the classes balanced
NeuralFP [8] | real examples of the train set in the original dataset (e.g. CIFAR-10 etc.)
TransformDet [45] | train set of the adversarial example datasets, down-sampled if necessary
MetaAdvDet | constructed tasks of the train set in the adversarial example datasets

For all methods, the test set consists of the constructed tasks of the test split of the adversarial example datasets, and the validation set consists of the constructed tasks of the validation split; each task contains a support set and a query set, and performance is evaluated on the query set.
DNN . DNN is trained on all data of adversarial exampledataset and its backbone is the same with MetaAdvDet, which isa 3 conv-layers network (Tab. 5). Because the adversarial exampledatasets are the highly class imbalance datasets which contain muchmore adversarial examples than real ones, we train the other DNNby using the balanced data between adversarial and real samples bydown-sampling, which is denoted as
DNN (balanced) . The datasetconfigurations of different methods are listed in Tab. 7.
To inspect the effect of each key parameter respectively, we conductthe control experiments on AdvCIFAR by adjusting one parameterwhile keeping other parameters fixed as listed Tab. 6. Fig. 3 and Fig.4 are the results of the cross-adversary benchmark. (a) train query set size study (b) task number K study Figure 3: Ablation study results of train query set size andtask number of a training mini-batch. (a) shots study (b) fine-tune iterations study
Figure 4: Ablation study results of shots (a) and fine-tune iterations (b). MetaAdvDet outperforms the baseline DNN and DNN (balanced) by a large margin.
From Fig. 4, the following conclusions can be drawn: (1) MetaAdvDet outperforms DNN with only a few fine-tune iterations; e.g. MetaAdvDet surpasses the results of all fine-tune iterations of DNN using only a single iteration (Fig. 4b). (2) The balanced training data of DNN (balanced) help to improve performance over DNN under few-shot fine-tuning (Fig. 4b).
Table 8: F1 score of the randomized-way and fixed-way settings on AdvCIFAR. The randomized-way setting shuffles the labels of the two ways in each task; the fixed-way setting uses label 1 for real examples and label 0 for adversarial examples.

shots | fixed-way | randomized-way
1 | |

Tab. 8 shows the F1 scores of the randomized-way and fixed-way settings on the AdvCIFAR dataset. The fixed-way setting outperforms the randomized-way setting, so we use the fixed-way setting in the following experiments.
5.5 Results on the Cross-adversary Benchmark
In this section, we compare our approach with other state-of-the-art methods under the cross-adversary benchmark, collecting the results of TransformDet [45], NeuralFP [8], the baseline DNN, and DNN (balanced). Tab. 9 shows that MetaAdvDet outperforms the baseline and the other methods on nearly all datasets. Thus, MetaAdvDet is able to achieve high performance in detecting a new attack with limited examples of that attack. Our approach is particularly effective in detecting attacks whose appearance differs markedly from the training attacks, such as Spatial Transformation; the results of three representative attacks are shown in Tab. 10.
Table 9: F1 score of the cross-adversary benchmark, on the adversarial examples generated by attacking the classifier with the conv-4 architecture.

Dataset | Method | 1-shot | 5-shot
AdvCIFAR | DNN | 0.495 | 0.639
AdvCIFAR | DNN (balanced) | 0.536 | 0.643
AdvCIFAR | NeuralFP [8] | |
AdvCIFAR | TransformDet [45] | |
AdvCIFAR | MetaAdvDet (ours) | |
AdvMNIST | DNN | 0.812 | 0.852
AdvMNIST | DNN (balanced) | 0.797 | 0.808
AdvMNIST | NeuralFP [8] | 0.780 | 0.906
AdvMNIST | TransformDet [45] | 0.840 | 0.904
AdvMNIST | MetaAdvDet (ours) | |
AdvFashionMNIST | DNN | 0.782 | 0.885
AdvFashionMNIST | DNN (balanced) | 0.744 | 0.850
AdvFashionMNIST | NeuralFP [8] | 0.798 | 0.817
AdvFashionMNIST | TransformDet [45] | 0.712 | 0.879
AdvFashionMNIST | MetaAdvDet (ours) | |
Table 10: F1 score of representative adversaries on the AdvCIFAR dataset, cross-adversary benchmark.

Adversary | Method | 1-shot | 5-shot
Spatial Transformation [49] | DNN | 0.498 | 0.599
Spatial Transformation [49] | DNN (balanced) | 0.529 | 0.589
Spatial Transformation [49] | NeuralFP [8] | 0.708 | 0.696
Spatial Transformation [49] | TransformDet [45] | 0.633 | 0.660
Spatial Transformation [49] | MetaAdvDet (ours) | |
semantic [17] | DNN | 0.488 | 0.644
semantic [17] | DNN (balanced) | 0.529 | 0.657
semantic [17] | NeuralFP [8] | 0.698 | 0.700
semantic [17] | TransformDet [45] | 0.662 | 0.688
semantic [17] | MetaAdvDet (ours) | |
NewtonFool [19] | DNN | 0.511 | 0.664
NewtonFool [19] | DNN (balanced) | 0.542 | 0.670
NewtonFool [19] | NeuralFP [8] | |
NewtonFool [19] | TransformDet [45] | |
NewtonFool [19] | MetaAdvDet (ours) | |
5.6 Results on the Cross-domain Benchmark
In the cross-domain benchmark, the models are trained in one domain and tested on the other domain's test set (Sec. 4.3). We use DNN (balanced) instead of DNN in this benchmark, because all types of attacks are used for training, which would cause a highly imbalanced classification problem for DNN. Tab. 11 shows the results, which demonstrate that MetaAdvDet has an advantage on the hard test set: for example, when training on AdvMNIST and testing on AdvFashionMNIST, MetaAdvDet outperforms DNN (balanced) by a large margin in the 1-shot setting.

Table 11: F1 score of the cross-domain benchmark, evaluated on the adversarial examples generated by attacking the conv-4 network. All types of attacks are used to train the detectors.
Train Domain | Test Domain | Method | 1-shot | 5-shot
AdvMNIST | AdvFashionMNIST | DNN (balanced) | 0.698 | 0.813
AdvMNIST | AdvFashionMNIST | NeuralFP [8] | 0.748 | 0.811
AdvMNIST | AdvFashionMNIST | TransformDet [45] | 0.664 | 0.808
AdvMNIST | AdvFashionMNIST | MetaAdvDet (ours) | |
AdvFashionMNIST | AdvMNIST | DNN (balanced) | 0.950 | 0.977
AdvFashionMNIST | AdvMNIST | NeuralFP [8] | 0.775 | 0.836
AdvFashionMNIST | AdvMNIST | TransformDet [45] | 0.934 | 0.940
AdvFashionMNIST | AdvMNIST | MetaAdvDet (ours) | |
5.7 Results on the Cross-architecture Benchmark

Table 12: F1 score of the cross-architecture benchmark.

Dataset | Train Arch | Test Arch | Method | 1-shot | 5-shot
AdvCIFAR | ResNet-10 | ResNet-18 | NeuralFP [8] | 0.713 | 0.709
AdvCIFAR | ResNet-10 | ResNet-18 | TransformDet [45] | 0.758 | 0.880
AdvCIFAR | ResNet-10 | ResNet-18 | DNN (balanced) | 0.702 | 0.768
AdvCIFAR | ResNet-10 | ResNet-18 | MetaAdvDet (ours) | |
AdvCIFAR | ResNet-18 | ResNet-10 | NeuralFP [8] | 0.712 | 0.703
AdvCIFAR | ResNet-18 | ResNet-10 | TransformDet [45] | 0.788 | 0.874
AdvCIFAR | ResNet-18 | ResNet-10 | DNN (balanced) | 0.711 | 0.752
AdvCIFAR | ResNet-18 | ResNet-10 | MetaAdvDet (ours) | |
AdvCIFAR | conv-4 | ResNet-10 | NeuralFP [8] | 0.712 | 0.703
AdvCIFAR | conv-4 | ResNet-10 | TransformDet [45] | 0.763 | 0.868
AdvCIFAR | conv-4 | ResNet-10 | DNN (balanced) | 0.723 | 0.779
AdvCIFAR | conv-4 | ResNet-10 | MetaAdvDet (ours) | |
AdvCIFAR | ResNet-10 | conv-4 | NeuralFP [8] | 0.709 | 0.702
AdvCIFAR | ResNet-10 | conv-4 | TransformDet [45] | 0.766 | 0.885
AdvCIFAR | ResNet-10 | conv-4 | DNN (balanced) | 0.739 | 0.790
AdvCIFAR | ResNet-10 | conv-4 | MetaAdvDet (ours) | |
AdvMNIST | ResNet-10 | ResNet-18 | NeuralFP [8] | 0.906 | 0.882
AdvMNIST | ResNet-10 | ResNet-18 | TransformDet [45] | 0.973 | 0.988
AdvMNIST | ResNet-10 | ResNet-18 | DNN (balanced) | 0.943 | 0.972
AdvMNIST | ResNet-10 | ResNet-18 | MetaAdvDet (ours) | |
AdvMNIST | ResNet-18 | ResNet-10 | NeuralFP [8] | 0.894 | 0.738
AdvMNIST | ResNet-18 | ResNet-10 | TransformDet [45] | 0.967 | 0.990
AdvMNIST | ResNet-18 | ResNet-10 | DNN (balanced) | 0.912 | 0.953
AdvMNIST | ResNet-18 | ResNet-10 | MetaAdvDet (ours) | |
AdvMNIST | conv-4 | ResNet-10 | NeuralFP [8] | 0.894 | 0.738
AdvMNIST | conv-4 | ResNet-10 | TransformDet [45] | |
AdvMNIST | conv-4 | ResNet-10 | DNN (balanced) | 0.897 | 0.959
AdvMNIST | conv-4 | ResNet-10 | MetaAdvDet (ours) | 0.963 | 0.983
AdvMNIST | ResNet-10 | conv-4 | NeuralFP [8] | 0.917 | 0.961
AdvMNIST | ResNet-10 | conv-4 | TransformDet [45] | 0.984 | 0.992
AdvMNIST | ResNet-10 | conv-4 | DNN (balanced) | 0.958 | 0.978
AdvMNIST | ResNet-10 | conv-4 | MetaAdvDet (ours) | |
AdvFashionMNIST | ResNet-10 | ResNet-18 | NeuralFP [8] | 0.813 | 0.856
AdvFashionMNIST | ResNet-10 | ResNet-18 | TransformDet [45] | 0.936 | 0.974
AdvFashionMNIST | ResNet-10 | ResNet-18 | DNN (balanced) | 0.848 | 0.932
AdvFashionMNIST | ResNet-10 | ResNet-18 | MetaAdvDet (ours) | |
AdvFashionMNIST | ResNet-18 | ResNet-10 | NeuralFP [8] | 0.820 | 0.838
AdvFashionMNIST | ResNet-18 | ResNet-10 | TransformDet [45] | 0.935 | 0.972
AdvFashionMNIST | ResNet-18 | ResNet-10 | DNN (balanced) | 0.829 | 0.918
AdvFashionMNIST | ResNet-18 | ResNet-10 | MetaAdvDet (ours) | |
AdvFashionMNIST | conv-4 | ResNet-10 | NeuralFP [8] | 0.820 | 0.838
AdvFashionMNIST | conv-4 | ResNet-10 | TransformDet [45] | 0.946 | 0.970
AdvFashionMNIST | conv-4 | ResNet-10 | DNN (balanced) | 0.920 | 0.968
AdvFashionMNIST | conv-4 | ResNet-10 | MetaAdvDet (ours) | |
AdvFashionMNIST | ResNet-10 | conv-4 | NeuralFP [8] | 0.817 | 0.911
AdvFashionMNIST | ResNet-10 | conv-4 | TransformDet [45] | 0.945 | 0.979
AdvFashionMNIST | ResNet-10 | conv-4 | DNN (balanced) | 0.886 | 0.945
AdvFashionMNIST | ResNet-10 | conv-4 | MetaAdvDet (ours) | |
Tab. 12 shows the results of the cross-architecture benchmark. Because NeuralFP is trained on real samples, the same NeuralFP model is tested on the examples of the different test architectures (Test Arch). Tab. 12 shows that MetaAdvDet outperforms the other methods under the different train and test architecture combinations, proving the superiority of MetaAdvDet on the cross-architecture benchmark.
5.8 Results under White-box Attack
In Tab. 13, we present the detection performance on the white-box benchmark. The NeuralFP [8] result is omitted because NeuralFP detects attacks by thresholding rather than by classification, so it cannot be used with the method of Sec. 4.5. Tab. 13 shows that: (1) MetaAdvDet can effectively detect white-box attacks even with only one white-box example provided; (2) the white-box attack targets the master network of the meta-learner in MetaAdvDet, whereas it targets the detector itself in the other methods.
Table 13: F1 score of the white-box attack benchmark.

Dataset | Method | I-FGSM 1-shot | I-FGSM 5-shot | C&W 1-shot | C&W 5-shot
CIFAR-10 | DNN (balanced) | 0.466 | 0.537 | 0.459 | 0.527
CIFAR-10 | TransformDet [45] | | | |
CIFAR-10 | MetaAdvDet (ours) | | | |
MNIST | DNN (balanced) | 0.857 | 0.956 | 0.814 | 0.913
MNIST | TransformDet [45] | 0.864 | 0.952 | 0.775 | 0.893
MNIST | MetaAdvDet (ours) | | | |
FashionMNIST | DNN (balanced) | 0.745 | 0.890 | 0.726 | 0.853
FashionMNIST | TransformDet [45] | 0.837 | 0.920 | 0.747 | 0.853
FashionMNIST | MetaAdvDet (ours) | | | |
Table 14: The inference time (ms) of all methods.

Method | DNN | NeuralFP [8] | TransformDet [45] | MetaAdvDet (ours)
Inference time (ms) | | ≈2185 | ≈69 | ≈4

We further evaluate the inference time (excluding fine-tune steps) of all methods, measured in milliseconds on one NVIDIA GeForce GTX 1080Ti GPU (Tab. 14). MetaAdvDet obtains an inference time comparable to DNN, because both methods use the same network architecture and the same feed-forward procedure for inference. In contrast, TransformDet applies multiple transformations to the input image, which increases the inference time, and NeuralFP tests multiple thresholds to determine the best one for detection, which significantly increases the inference time.
6 CONCLUSION
In this paper, we present a meta-learning based adversarial attack detection approach for detecting evolving adversarial attacks with limited examples. The approach is equipped with a double-network framework, which includes a task-dedicated network and a master network, to learn from individual tasks and from the task distribution, respectively. In this way, the rapid adaptation capability for detecting new attacks is achieved. The experimental results show that: (1) Tab. 9, Tab. 11 and Tab. 12 show that NeuralFP obtains lower F1 scores than ours under the different benchmarks, indicating that NeuralFP, which is trained on real examples only, cannot detect evolving attacks effectively. (2) We obtain the lowest results on the AdvCIFAR dataset (Tab. 9, Tab. 12 and Tab. 13), indicating that adversarial examples generated on CIFAR-10 are more difficult to detect. (3) MetaAdvDet performs well on the cross-adversary (Tab. 9), cross-domain (Tab. 11), cross-architecture (Tab. 12) and white-box attack (Tab. 13) benchmarks, proving that MetaAdvDet is a suitable method for detecting evolving attacks with limited examples.
REFERENCES
[1] Naveed Akhtar, Jian Liu, and Ajmal Mian. 2018. Defense against universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3389–3398.
[2] Anish Athalye, Nicholas Carlini, and David Wagner. 2018. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80. PMLR, Stockholmsmässan, Stockholm, Sweden, 274–283. http://proceedings.mlr.press/v80/athalye18a.html
[3] Arjun Nitin Bhagoji, Daniel Cullina, and Prateek Mittal. 2017. Dimensionality Reduction as a Defense against Evasion Attacks on Machine Learning Classifiers. CoRR abs/1704.02654 (2017). http://arxiv.org/abs/1704.02654
[4] Nicholas Carlini and David A. Wagner. 2017. Towards Evaluating the Robustness of Neural Networks. In IEEE Symposium on Security and Privacy (SP). 39–57. https://doi.org/10.1109/SP.2017.49
[5] Fabio Carrara, Fabrizio Falchi, Roberto Caldelli, Giuseppe Amato, Roberta Fumarola, and Rudy Becarelli. 2017. Detecting Adversarial Example Attacks to Deep Neural Networks. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (CBMI '17). ACM, New York, NY, USA, Article 38, 7 pages. https://doi.org/10.1145/3095713.3095753
[6] Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. 2018. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Thirty-Second AAAI Conference on Artificial Intelligence.
[7] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. 2017. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. ACM, 15–26.
[8] Sumanth Dathathri, Stephan Zheng, Richard M. Murray, and Yisong Yue. 2018. Detecting Adversarial Examples via Neural Fingerprinting. arXiv preprint arXiv:1803.03870 (2018).
[9] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. 2018. Boosting Adversarial Attacks With Momentum. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, and Thomas L. Griffiths. 2018. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. In International Conference on Learning Representations.
[11] Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70. JMLR.org, 1126–1135.
[12] Nicholas Frosst, Sara Sabour, and Geoffrey Hinton. 2018. DARCCC: Detecting Adversaries by Reconstruction from Class Conditional Capsules. arXiv preprint arXiv:1811.06969 (2018).
[13] Ian Goodfellow, Yao Qin, and David Berthelot. 2019. Evaluation Methodology for Attacks Against Confidence Thresholding Models. https://openreview.net/forum?id=H1g0piA9tQ
[14] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
[15] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. 2017. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280 (2017).
[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[17] Hossein Hosseini, Baicen Xiao, Mayoore Jaiswal, and Radha Poovendran. 2017. On the limitation of convolutional neural networks in recognizing negative images. IEEE, 352–358.
[18] Muhammad Abdullah Jamal and Guo-Jun Qi. 2019. Task Agnostic Meta-Learning for Few-Shot Learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Uyeong Jang, Xi Wu, and Somesh Jha. 2017. Objective metrics and gradient descent algorithms for adversarial examples in machine learning. In Proceedings of the 33rd Annual Computer Security Applications Conference. ACM, 262–277.
[20] Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. Technical Report.
[21] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2017. Adversarial examples in the physical world. ICLR Workshop (2017). https://arxiv.org/abs/1607.02533
[22] Yann LeCun and Corinna Cortes. 2010. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/
[23] Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, and Boqing Gong. 2019. NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks. arXiv preprint arXiv:1905.00441 (2019).
[24] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. 2017. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835 (2017).
[25] B. Liang, H. Li, M. Su, X. Li, W. Shi, and X. Wang. 2018. Detecting Adversarial Image Examples in Deep Neural Networks with Adaptive Noise Reduction. IEEE Transactions on Dependable and Secure Computing (2018), 1–1. https://doi.org/10.1109/TDSC.2018.2874243
[26] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. 2018. Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. 2017. Delving into Transferable Adversarial Examples and Black-box Attacks. In Proceedings of the 5th International Conference on Learning Representations.
[28] Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Michael E. Houle, Dawn Song, and James Bailey. 2018. Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. In International Conference on Learning Representations. https://openreview.net/forum?id=B1gJ1L2aW
[29] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards Deep Learning Models Resistant to Adversarial Attacks. In International Conference on Learning Representations. https://openreview.net/forum?id=rJzIBfZAb
[30] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. 2017. On Detecting Adversarial Perturbations. In International Conference on Learning Representations.
[31] Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. 2018. A Simple Neural Attentive Meta-Learner. In International Conference on Learning Representations. https://openreview.net/forum?id=B1DmUzWAW
[32] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. 2016. Distributional smoothing with virtual adversarial training. International Conference on Learning Representations (2016).
[33] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Tianyu Pang, Chao Du, Yinpeng Dong, and Jun Zhu. 2018. Towards Robust Detection of Adversarial Examples. In Advances in Neural Information Processing Systems 31. Curran Associates, Inc., 4579–4589. http://papers.nips.cc/paper/7709-towards-robust-detection-of-adversarial-examples.pdf
[35] Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas Rauber, and Rujun Long. 2018. Technical Report on the CleverHans v2.1.0 Adversarial Examples Library. arXiv preprint arXiv:1610.00768 (2018).
[36] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. 2016. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016).
[37] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. 2016. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 372–387.
[38] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (SP). IEEE.
[39] Sachin Ravi and Hugo Larochelle. 2017. Optimization as a Model for Few-Shot Learning. In International Conference on Learning Representations. https://openreview.net/forum?id=rJY0-Kcll
[40] Mohammad Sabokrou, Mohammad Khalooei, Mahmood Fathy, and Ehsan Adeli. 2018. Adversarially learned one-class classifier for novelty detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3379–3388.
[41] Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems. 4077–4087.
[42] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. 2018. PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples. In International Conference on Learning Representations. https://openreview.net/forum?id=rJUYGxbCW
[43] Jiajun Lu, Theerasit Issaranon, and David Forsyth. 2017. SafetyNet: Detecting and Rejecting Adversarial Examples Robustly. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
[44] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations.
[45] Shixin Tian, Guolei Yang, and Ying Cai. 2018. Detecting Adversarial Examples Through Image Transformation. In AAAI Conference on Artificial Intelligence. https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17408
[46] Jonathan Uesato, Brendan O'Donoghue, Pushmeet Kohli, and Aaron van den Oord. 2018. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80. PMLR, Stockholmsmässan, Stockholm, Sweden, 5025–5034. http://proceedings.mlr.press/v80/uesato18a.html
[47] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. 2016. Matching Networks for One Shot Learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS '16). Curran Associates Inc., USA, 3637–3645. http://dl.acm.org/citation.cfm?id=3157382.3157504
[48] Weitao Wan, Yuanyi Zhong, Tianpeng Li, and Jiansheng Chen. 2018. Rethinking Feature Distribution for Loss Functions in Image Classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. 2018. Spatially Transformed Adversarial Examples. In International Conference on Learning Representations. https://openreview.net/forum?id=HyydRMZC-
[50] Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv preprint arXiv:1708.07747 (2017).
[51] Weilin Xu, David Evans, and Yanjun Qi. 2018. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. In 25th Annual Network and Distributed System Security Symposium (NDSS 2018), San Diego, California, USA, February 18–21, 2018.