Online Alternate Generator against Adversarial Attacks
Haofeng Li, Yirui Zeng, Guanbin Li, Liang Lin, Yizhou Yu
Abstract—The field of computer vision has witnessed phenomenal progress in recent years, partially due to the development of deep convolutional neural networks. However, deep learning models are notoriously sensitive to adversarial examples, which are synthesized by adding quasi-perceptible noises to real images. Some existing defense methods require re-training the attacked target networks and augmenting the training set via known adversarial attacks, which is inefficient and might be unpromising against unknown attack types. To overcome the above issues, we propose a portable defense method, the online alternate generator, which does not need to access or modify the parameters of the target networks. The proposed method works by synthesizing another image online from scratch for an input image, instead of removing or destroying adversarial noises. To avoid pretrained parameters being exploited by attackers, we alternately update the generator and the synthesized image at the inference stage. Experimental results demonstrate that the proposed defense scheme outperforms a series of state-of-the-art defense models against gray-box adversarial attacks.
Index Terms—Deep Neural Network, Adversarial Attack, Image Classification
I. INTRODUCTION

In recent years, deep convolutional neural networks have obtained state-of-the-art performance on many machine learning benchmarks, since they can harvest adaptive features on large-scale training sets, in comparison to traditional methods based on handcrafted features. However, deep learning models are found to be vulnerable to adversarial attacks, which aim at synthesizing adversarial samples that are perceptually similar to real images but can mislead the attacked models into yielding totally incorrect labels, as shown in Fig. 1(b). Adversarial examples can be generated by applying quasi-perceptible perturbations which do not change the labels recognized by human subjects. Such perturbations can be computed via constrained optimization or backward propagation with an incorrect supervision. Thus, given a pretrained target network that might be accessed and attacked by hackers, how to protect it from adversarial attacks remains an important problem.
This work was supported in part by the National Key Research & Development Program (No. 2020YFC2003902), in part by the Guangdong Basic and Applied Basic Research Foundation (No. 2020B1515020048), in part by the National Natural Science Foundation of China (No. 61976250, No. 61702565, No. U1811463), in part by the Fundamental Research Funds for the Central Universities (No. 18lgpy63), and was also sponsored by the CCF-Tencent Open Research Fund (Corresponding author: Guanbin Li).
H. Li is with Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen), Shenzhen 518172, China (e-mail: [email protected]).
Y. Zeng, G. Li and L. Lin are with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China (e-mail: [email protected]; [email protected]; [email protected]).
Y. Yu is with Deepwise AI Lab (e-mail: [email protected]).

Fig. 1. The effectiveness of removing adversarial noise by our proposed method. (a) and (b) denote original images and their adversarial examples. (c) and (d) denote our generated results taking original images and adversarial examples as reference respectively. The second row and the fourth row are the normalized absolute difference between the original images (a) and (b-d). Our results have a different noise distribution from the input adversarial examples. The predicted probability of the correct label is attached under each sample of (a-d). Our proposed defense method significantly increases the probability of predicting correct labels.
Many defense methods have been developed to resist adversarial attacks on deep learning models. These defense methods can be roughly categorized into two groups. One group of defenses act as a preprocessing component which does not require accessing, modifying or re-training the attacked target network. These methods are portable and practical with different target networks or tasks, since the knowledge of target networks may be confidential in real applications. They usually resort to image denoising and smoothing to remove adversarial noises, or to image transformations that could destroy adversarial noises to some extent. Another group of defense methods require access to or re-training of the parameters of the target network. We argue that these methods may be impractical and inefficient in real applications. For example, adversarial training methods need to obtain the knowledge of adversarial attacks and might be unpromising against unseen attack types. Kurakin et al. [1] also suggest that adversarial training with single-step attacks does not confer robustness to iterative adversarial samples. Ensemble adversarial training [2] requires augmenting the training set by N × M times and designing and training N different target networks, which is inefficient; M is the number of different known adversarial attacks used in adversarial training. It is difficult to transfer Defense-GAN [3] to large images, since training a GAN with large images is unstable and might require adjusting the network architecture for different datasets.

Motivated by the above observations, in this paper we aim at addressing the following problem: developing a portable defense method that protects a pretrained target network from unseen adversarial attacks on images of large size.

We define a portable defense as a method that does not need to access, modify or re-train the attacked target network. We claim that developing a portable defense is important because some parameters of the target network might be commercially confidential, and re-designing and re-training the target network could impose a heavy load. A portable defense can serve as a reusable, self-contained component invoked via an API. Developing a defense method against unseen attacks is critical since it is impractical to know the attack type used by attackers. On the other hand, developing a defense method that works with large images, whose size is not smaller than ×, is practical and meaningful. Given the same L∞ norm upper bound, the number of possible adversarial samples increases exponentially with the number of pixels; thus defending against adversarial attacks with larger images is more difficult. Tramer et al. [2] also suggest that results obtained on simple datasets [4] with small images do not always generalize to harder tasks, for example, a classification benchmark with larger images.

To address the above-mentioned issues, we conceive a novel defense framework, the online alternate generator. The proposed defense scheme works by synthesizing online an image that shares the same semantics with the input image but is almost free from adversarial noises. To prevent the model parameters from being stolen and exploited by attackers, we also propose to update the generator and the synthesized image at the inference stage, in an iterative and alternate manner.
Besides, Gaussian noise is utilized as an additional perturbation when updating the synthesized image, to prevent the generator from fitting adversarial noises.

Our proposed method enjoys the following strengths. First, since the proposed method does not require any knowledge of target networks or adversarial attacks, it is a portable defense that can theoretically protect arbitrary target classifiers from arbitrary unseen adversarial attacks. Second, a Gaussian perturbation is added when yielding an image, which not only introduces randomness but also decreases the probability of fitting the adversarial noise on the given input. Third, as the proposed method adopts online training, its model parameters are not fixed during inference and cannot be accessed beforehand by potential attackers in a real scene to synthesize an adversarial example.

In summary, this paper has the following contributions:
• We develop a novel portable defense framework, the online alternate generator, which can resist unseen adversarial attacks for unlimited pretrained classifiers, without the knowledge of target networks or adversarial attacks.
• We propose a stopping criterion which does not require accessing any adversarial samples, such that the proposed method can deal with unseen attacks.
• We verify the transferability and generalization of the proposed method with different adversarial attacks, target models and benchmarks. Different selections of hyper-parameters are also well investigated.
• This paper presents extensive experimental results to verify that the proposed framework surpasses a wide range of existing state-of-the-art defense algorithms.
II. RELATED WORK
A. Adversarial Attack
Deep convolutional neural networks have demonstrated powerful fitting capacity in solving computer vision problems over the last several years, but they are threatened by adversarial attacks [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. The Fast Gradient Sign Method (FGSM) [6] synthesizes adversarial examples by adding weighted gradients that increase the prediction loss of the attacked target network, as shown in $I_{adv} = I + \epsilon \cdot \mathrm{sign}(\partial L(F(I, \theta), Y_I) / \partial I)$, where $I$ denotes a real image and $I_{adv}$ is the generated adversarial example. $\mathrm{sign}(\cdot)$ returns 1 for positive input and -1 for negative input. $L(x, y)$ denotes a loss function that estimates the difference between a prediction $x$ and the ground truth $y$. $F(\cdot, \theta)$ is the attacked target neural network with parameters $\theta$. $Y_I$ denotes the ground truth annotation of image $I$. $\epsilon$ is the $l_\infty$ norm of the adversarial perturbation and the weight of the gradient with respect to $I$; the attack strength can be controlled by $\epsilon$. The above algorithm is referred to as an untargeted attack. A targeted variant [1] of FGSM encourages the attacked network to predict a high probability for some deliberately incorrect category $Y_{adv}$, as formulated in $I_{adv} = I - \epsilon \cdot \mathrm{sign}(\partial L(F(I, \theta), Y_{adv}) / \partial I)$. The Iterative Gradient Sign Method (IGSM) [1], [17] iteratively applies FGSM multiple times with a small step size to locate a stronger adversarial sample, as shown in the following:

$$I'_{t+1} = I_t - \alpha \cdot \mathrm{sign}(\partial L(F(I_t, \theta), Y_{adv}) / \partial I), \qquad I_{t+1} = \mathrm{clip}(I'_{t+1}, I - \epsilon, I + \epsilon) \quad (1)$$

where $I_t$ is the adversarial example synthesized after $t$ iterations and $I_0 = I$. $I'_{t+1}$ is a temporary variable. $\mathrm{clip}(\cdot, l, u)$, with lower bound $l$ and upper bound $u$, is an element-wise operator that keeps the $L_\infty$ norm of the adversarial perturbation $|I_{t+1} - I|$ within the bound $\epsilon$. Momentum-based Iterative FGSM (MI-FGSM) [12] introduces a momentum term to stabilize the gradient descent and avoid the adversarial example getting stuck in poor local minima, which helps to synthesize more transferable adversarial examples. Athalye et al. [18] propose three tricks, Backward Pass Differentiable Approximation, Expectation over Transformation and Reparameterization, which have broken most existing obfuscated-gradient based defenses under a white-box attack setting.
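For concreteness, the following is a minimal PyTorch sketch of the targeted IGSM update in Eq. (1). It assumes pixel intensities in [0, 255] (so that a step size of 1 and an $\epsilon$ of about 6 apply directly), a classifier `model` that returns logits, and a cross-entropy loss; these are illustrative assumptions rather than the exact attack implementation used in the experiments.

```python
# Hedged sketch of targeted IGSM (Eq. 1). Assumed (not from the paper):
# pixel range [0, 255], logits-producing `model`, cross-entropy as L.
import torch
import torch.nn.functional as F_nn

def targeted_igsm(model, image, target_label, epsilon=6.0, alpha=1.0, steps=8):
    """Iteratively step toward a chosen incorrect label under an L-infinity bound."""
    adv = image.clone()
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        loss = F_nn.cross_entropy(model(adv), target_label)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv - alpha * grad.sign()                                     # descend on the target-class loss
        adv = torch.min(torch.max(adv, image - epsilon), image + epsilon)   # the clip of Eq. (1)
        adv = adv.clamp(0.0, 255.0)                                         # stay in the valid pixel range
    return adv.detach()
```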
B. Gray-box Attack Setting

Adversarial attacks have multiple settings, according to how much knowledge adversaries can access.
The settings consist of white-box, gray-box and black-box. Adversaries in white-box attacks can obtain all information about target models and defense methods. In the black-box setting, adversaries know neither the architecture, the training data, nor the parameters of the target classifiers and defense methods. Gray-box attacks have more than one definition; previous works have defined different gray-box adversarial attacks and developed defense methods under their own settings. Taran et al. [19] define a gray-box setting where the attacker knows the target network architecture, the training/testing data and the output label for each input, but has no knowledge of the network parameters and the defense mechanism. Zheng and Hong [20] utilize another gray-box setting, where attackers know the architecture of the target network and its defense strategy, but have no knowledge of their parameters. Different from the above settings where adversaries cannot obtain the network parameters, Guo et al. [21] introduce the following gray-box setting: the adversary has access to the model architecture and the model parameters, but is unaware of defense strategies. In this paper we adopt the gray-box setting in [21]. It is a strong gray-box attack setting since both the architecture and the parameters of target networks can be accessed by adversaries.
C. Defense against Adversarial Attack
Many methods that aim at protecting a neural model from adversarial examples have been proposed recently [22], [21], [23], [24], [3], [25], [26], [27], [28], [29], [30], [31]. SafetyNet [22] incorporates a deep convolutional neural network with an RBF-SVM, which converts the final ReLU outputs into discrete codes to detect adversarial examples. Guo et al. [21] utilize bit-depth quantization, JPEG compression, total variance minimization and image quilting to destroy or remove adversarial noise. Xie et al. [23] resort to random resizing and random padding during the inference stage to leverage the cue that many adversarial examples are not scale invariant. PixelDefend [24] iteratively updates each pixel of an input image using a pretrained PixelCNN [32] that is learned to predict a pixel value based on other pixels. Defense-GAN [3] learns to model the data distribution of clean images, and solves for a vector in its learned latent space that approximates an input image; the latent vector is exploited to synthesize a substitute for the input via a generative neural network. HGD [25] makes use of high-level features extracted by the attacked target network to train an image denoiser that can remove adversarial noise. Since HGD requires accessing the intermediate outputs of target classifiers to tune the denoiser, it is not a portable defense. MagNet [33] utilizes a detector to reject inputs far from the manifold boundary of training data, and an auto-encoder as a reformer to find a substitute that is close to the input on the manifold. It is not straightforward to fairly compare MagNet with other methods that do not reject adversarial examples. MagNet is based on the assumption that samples of some classification task lie on a manifold of lower dimensions, and it is only evaluated with small-size images. Pixel Deflection [26] iteratively and locally swaps two randomly sampled pixels according to their positions, before applying an image denoiser to destroy adversarial noise. DIP [34] online trains a CNN that reconstructs the input image from a noise map; the output of the trained CNN is sent to be classified. Different from DIP, the proposed method updates an image and a CNN from scratch in an alternate way. The CNN does not directly output the synthesized image but approximates the energy of a data distribution for input images.
III. METHOD
This section describes the details of our proposed defense framework, the online alternate generator, and its mathematical explanation. The proposed method can be regarded as a pre-processing component that protects a target network or target classifier from adversarial samples. The target network is defined as a pre-trained neural model that is exposed to attack; its parameters can be accessed by attackers. That is to say, adversarial samples are synthesized with the target network. The proposed method is portable and practical, and it does not require accessing, modifying or re-training the parameters of the target network. Different from some existing pre-processing defenses that try to remove or destroy adversarial noises, our proposed method synthesizes an image from scratch. The synthesized image is almost identical in appearance and semantics to the original image, but contains much fewer adversarial perturbations, and hence leads to more robust classification.

The overall pipeline of the proposed method is described in Algorithm 1. Given an input image that may be an adversarial sample, we define the reference image (denoted as $I_z$ in Algorithm 1) as the input image, and synthesize another image $I_s$ to replace the original input before passing it into the target network for classification. For each input image, the parameters $\theta$ of the generator $F$ are initialized randomly and the synthesized image $I^1$ is filled with zeros at the beginning. $T_N$ denotes the maximum number of outer iterations while $T_I$ denotes the maximum number of inner iterations. Within each outer iteration, the synthesized image is updated $T_I$ times while the network parameters of our proposed method are updated once. Training the generator and synthesizing the image are conducted alternately. For each input image, the generator is updated exactly $T_N$ times while the synthesized image $I_s$ is updated $T_N \cdot T_I$ times. Notice that $\theta^t$ in Algorithm 1 does not include any parameters of the target network, but only the parameters of the proposed defense method. In Line 7 of Algorithm 1, $I^{T_I+1}$ is assigned to $I^1$, since we employ a circular array $I^{\{1,\ldots,T_I+1\}}$ to store the $T_I + 1$ latest snapshots of the synthesized image. An overview of the proposed method is illustrated in Fig. 2; it is composed of an image updating procedure and a network updating process. Please refer to Algorithm 1 for the detailed iteration. In the following sections, we will focus on the computation of both image updating and network updating, and the theoretical basis behind this alternate updating mechanism.
Fig. 2. A Defense Framework: Online Alternate Generator. ref denotes the reference image $I_z$ and syn denotes the synthesized image $I_s$. $F(\cdot, \theta)$ is the neural network with parameters $\theta$ (in the figure, a CONV layer with 100 filters and stride 3 followed by an FC layer). The blue-boundary box represents image updating according to Eq. (3) while the orange-boundary box denotes network updating according to Eq. (5). The blue bending arrow marks the $T_I$ inner iterations of image updating in Algorithm 1 while the orange bending arrows denote the $T_N$ outer iterations of network updating.
Algorithm 1: Online Alternate Generation
Input: $I_z$, reference image with potential adversarial noise
Output: $I_s$, synthesized image
1: Randomly initialize $\theta^1$
2: $I^1 = 0$
3: for $t = 1$ to $T_N$ do
4:   for $s = 1$ to $T_I$ do
5:     Update $I^{s+1}$ with $I^s$ according to Eq. (3)
6:   end for
7:   $I^1 = I^{T_I+1}$
8:   Update $\theta^{t+1}$ with $\theta^t$ according to Eq. (5)
9: end for
10: $I_s = I^{T_I+1}$
11: return $I_s$

A. Image Updating

Let us discuss how to update a synthesized image, initialized with zeros, given a reference image $I_z$ and a neural network $F$. Here $F$ denotes the proposed generator instead of the target classifier. Suppose $I_z$ and $I_s$ are sampled from the same data distribution denoted as $p(I; \theta) = (1/Z) e^{-U(I; \theta)}$, where $I$ denotes an image, $\theta$ represents the parameters of the model and $Z$ is a normalization term. $e$ denotes the exponential and $U$ is an energy function. We utilize the neural network $F$ in the proposed framework to approximate the energy function, i.e., $F(I, \theta) = -U(I; \theta)$. To maximize the probability density of the synthesized image, we update $I^s$ to minimize the energy function by $I^{s+1} = I^s - \alpha \, \partial U(I^s; \theta)/\partial I = I^s - \alpha (-\partial F(I^s, \theta)/\partial I)$, where $I^s$ is the current synthesized image and $I^{s+1}$ is the updated image. $\alpha$ denotes the learning rate. $\partial F(I, \theta)/\partial I$ is the gradient of the neural network $F(\cdot, \theta)$ with respect to the image $I$, and can be computed by backward propagation. In a sense, generating an image is to reconstruct the reference image. However, synthesizing an image with adversarial noise is undesirable. Thus we further introduce a noise model during image updating:

$$I^{s+1} = I^s - \alpha (-\partial F(I^s, \theta)/\partial I) + \epsilon_g D \quad (2)$$

where $D$ denotes some noise distribution, such as a Gaussian noise $N(0, 1)$, and $\epsilon_g$ is the strength of the noise. Adding noise during image synthesis increases the difficulty of recovering subtle details, and thus decreases the chance of fitting adversarial noise. Langevin Dynamics, which was originally proposed to simulate how particles move under a random force, has a form similar to Eq. (2): a modified gradient descent with a Gaussian perturbation. To relate $\alpha$ and $\epsilon_g$, we resort to Langevin Dynamics and arrive at Eq. (3), following a previous work [35]:

$$I^{s+1} = I^s - (\epsilon_g^2/2) (I^s - \partial F(I^s, \theta)/\partial I) + \epsilon_g N(0, 1) \quad (3)$$

where $\epsilon_g$ controls the magnitude of the Gaussian noise, $\epsilon_g^2/2$ corresponds to the learning rate $\alpha$, and $1 - \epsilon_g^2/2$ is the inertia factor of $I^s$. Since a random fluctuation is used to generate an image, the distribution of images is changed into $p(I; \theta) = (1/Z) e^{-U(I; \theta)} \cdot (1/(2\pi)^{|S|/2}) e^{-\|I\|^2/2}$. The multiplicative term on the right is a Gaussian distribution with $\sigma = 1$, and $|S|$ denotes the number of elements in image $I$.
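The Langevin update of Eq. (3) can be written in a few lines of PyTorch. The sketch below assumes `F` is any differentiable network mapping an image tensor to a scalar that approximates $-U(I; \theta)$; it illustrates the update rule only, not the authors' released code.

```python
# Minimal sketch of the image update in Eq. (3), under the assumption that F
# maps an image tensor to a scalar energy score.
import torch

def image_update_step(I_syn, F, eps_g):
    """One step of Eq. (3): I <- I - (eps_g^2 / 2) * (I - dF/dI) + eps_g * N(0, 1)."""
    I_syn = I_syn.detach().requires_grad_(True)
    score = F(I_syn)                               # scalar approximation of -U(I; theta)
    grad_I = torch.autograd.grad(score, I_syn)[0]  # dF(I, theta) / dI via backpropagation
    noise = torch.randn_like(I_syn)                # the Gaussian perturbation
    return (I_syn - 0.5 * eps_g ** 2 * (I_syn - grad_I) + eps_g * noise).detach()
```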
B. Network Updating

The following details how to update the neural network $F$ such that the synthesized image $I_s$ gradually approximates the reference image $I_z$. Notice that at the very beginning $F$ is initialized by randomization. Thus we update $F$ to maximize the likelihood with respect to $I_z$. Let $L(\theta) = \log(p(I_z; \theta))$. $\theta$ is trained along the direction maximizing the log likelihood $L(\theta)$ with gradient ascent: $\theta^{t+1} = \theta^t + \beta \, \partial L(\theta^t)/\partial \theta$, where $\theta^t$ denotes the current parameters at time step $t$ and $\theta^{t+1}$ denotes the updated parameters. $\partial L(\theta^t)/\partial \theta$ is the gradient of the log likelihood function w.r.t. $\theta$. As suggested in [36], the gradient is computed as:

$$\partial L(\theta)/\partial \theta = \partial F(I_z; \theta^t)/\partial \theta - E_{p(I; \theta)}[\partial F(I; \theta^t)/\partial \theta], \quad (4)$$

where $E_{p(I; \theta)}[\cdot]$ is the expectation with $I$ following the distribution $p(I; \theta)$. The expectation is not explicitly calculated but approximated by sampling. Langevin Dynamics, which is adopted to update the image, is also a tool to sample an image $I$ from the distribution $p(I; \theta)$. Thus we choose $\partial F(I_s; \theta^t)/\partial \theta$ to approximate the expectation $E_{p(I; \theta)}[\cdot]$ for simplicity. Then learning the neural network $F$ can be formulated as in Eq. (5):

$$\theta^{t+1} = \theta^t + \beta (\partial F(I_z; \theta)/\partial \theta - \partial F(I_s; \theta)/\partial \theta) \quad (5)$$

For a testing image, $T_N$ samples of $\partial F(I_s; \theta)/\partial \theta$ are selected to approximate $E_{p(I; \theta)}[\partial F(I; \theta)/\partial \theta]$. In practice, $T_N$ ranges from 200 to 300. Thus, hundreds of samples $\partial F(I_s; \theta)/\partial \theta$ are used to approximate the expectation, and such an approximation is effective.
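Putting Eq. (3) and Eq. (5) together, a hedged sketch of Algorithm 1 could look as follows, reusing `image_update_step` from the sketch above. The generator layout loosely follows Fig. 2 (a convolution with 100 filters and stride 3 followed by a fully-connected layer producing a scalar), but the kernel size, learning rates, $\epsilon_g$ and the use of plain SGD are illustrative assumptions, not the paper's exact values.

```python
# Hedged sketch of Algorithm 1; architecture details and hyper-parameters are
# assumptions, only the update rules follow Eq. (3) and Eq. (5).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, in_ch=3, filters=100, kernel=15, stride=3, image_size=224):
        super().__init__()
        side = (image_size - kernel) // stride + 1
        self.conv = nn.Conv2d(in_ch, filters, kernel, stride=stride)
        self.act = nn.ReLU()
        self.fc = nn.Linear(filters * side * side, 1)

    def forward(self, x):
        return self.fc(self.act(self.conv(x)).flatten(1)).sum()  # scalar energy score

def online_alternate_generate(I_ref, T_N=300, T_I=20, eps_g=0.01, beta=1e-4):
    F = Generator(image_size=I_ref.shape[-1])   # randomly re-initialized for each input image
    opt = torch.optim.SGD(F.parameters(), lr=beta)
    I_syn = torch.zeros_like(I_ref)             # I^1 = 0
    for _ in range(T_N):                        # outer loop: network steps
        for _ in range(T_I):                    # inner loop: image steps via Eq. (3)
            I_syn = image_update_step(I_syn, F, eps_g)
        # Eq. (5): theta <- theta + beta * (dF(I_z)/dtheta - dF(I_s)/dtheta),
        # realized as gradient descent on F(I_s) - F(I_z).
        loss = F(I_syn.detach()) - F(I_ref)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return I_syn.detach()
```

Minimizing $F(I_s) - F(I_z)$ with SGD at learning rate $\beta$ adds exactly $\beta(\partial F(I_z)/\partial\theta - \partial F(I_s)/\partial\theta)$ to the parameters, so the descent step coincides with the ascent direction of Eq. (5).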
C. Analysis
This subsection explains why the synthesized image $I_s$ approximates the reference image $I_z$ after alternately updating $I_s$ and the network $F$. In the following, we consider $F$ as a very simple prototype. The prototype model contains a convolution layer, a ReLU layer and a summation operator. $I$, the input of the prototype model, can be reshaped as a vector of shape $1 \times chw$. The convolution layer has $K$ kernels and the kernel size is $r_h \times r_w$. The weight of the convolution is denoted as $W$ of size $c r_h r_w \times K$, while the bias $B$ is of size $1 \times K$. For simplicity, we let $h = r_h$ and $w = r_w$ so that the convolution operator is only applied at one position. The output of the convolution is $IW + B$, of size $1 \times K$. The ReLU function $\lambda$ is applied to each element of $IW + B$, and the summation operator sums up all elements of $\lambda(IW + B)$. Thus $F(I)$ outputs a single scalar and is formulated as:

$$F(I) = \sum_{k=1}^{K} \lambda(I W_k + B_k) \quad (6)$$

According to the definition of ReLU, $\lambda(x) = \max(0, x)$. The ReLU function can be represented as a multiplication between the input and a Heaviside step function $u$: for $x > 0$, $u(x)$ equals 1, otherwise $u(x)$ is 0, so $\lambda(x) = u(x) x$. Then we formulate Eq. (6) as:

$$F(I) = \sum_{k=1}^{K} \big( u(I W_k + B_k)(I W_k + B_k) \big) = \sum_{k=1}^{K} \big( u(I, \theta)(I W_k + B_k) \big), \quad (7)$$

where $u(I W_k + B_k)$ is a single scalar depending on $I$ and $\theta$; we denote $u(I W_k + B_k)$ as $u(I, \theta)$ for simplicity.

Assume that the synthesized image $I_s$ and the reference image $I_z$ belong to the same data distribution $p(I; \theta)$. Since we introduce Gaussian noise to synthesize $I_s$, we assume that

$$p(I; \theta) = \frac{1}{Z} e^{-U(I; \theta)} \cdot \frac{1}{(2\pi)^{|S|/2}} e^{-\|I\|^2/2} \quad (8)$$

Let $I^*$ denote the image that maximizes the probability $p(I; \theta)$. The proposed method is to synthesize an image $I_s$ that approximates $I^*$. Let $F(I; \theta)$ approximate $-U(I; \theta)$. To maximize the probability, we need to maximize $-U(I; \theta) - \|I\|^2/2$, i.e.:

$$F(I) - \|I\|^2/2 = -\|I\|^2/2 + I \sum_{k=1}^{K} (u(I, \theta) W_k) + \sum_{k=1}^{K} u(I, \theta) B_k = -\Big\|I - \sum_{k=1}^{K} (u(I, \theta) W_k)\Big\|^2 / 2 + C(u(I, \theta), \theta) \quad (9)$$

where $C(u(I, \theta), \theta)$ depends on $I$ and $\theta$. When we update image $I^s$ to $I^{s+1}$, we can set $u(I, \theta)$ as $u(I^s, \theta)$ and fix the network parameters $\theta$. Then we obtain $I^* = \sum_{k=1}^{K} (u(I^s, \theta) W_k)$ as the maximizer of Eq. (9). Recall how we update $I^{s+1}$ as shown in Eq. (3):

$$I^{s+1} = I^s - (\epsilon_g^2/2)(I^s - \partial F(I^s, \theta)/\partial I) + \epsilon_g N(0, 1) = (1 - \epsilon_g^2/2) I^s + (\epsilon_g^2/2) \, \partial F(I^s, \theta)/\partial I + \epsilon_g N(0, 1) = (1 - \epsilon_g^2/2) I^s + (\epsilon_g^2/2) \sum_{k=1}^{K} (u(I^s, \theta) W_k) + \epsilon_g N(0, 1) = (1 - \epsilon_g^2/2) I^s + (\epsilon_g^2/2) I^* + \epsilon_g N(0, 1) \quad (10)$$

As Eq. (10) shows, updating $I^{s+1}$ amounts to a linear interpolation between $I^s$ and $I^*$ with a Gaussian perturbation. Thus the pixel values of the synthesized image $I_s$ will approach those of $I^*$, which is the peak of highest probability. Since $I_z$ is a sample of high probability in the same data distribution as $I^*$, $I_s$ will also be visually similar to $I_z$. When updating the network, the proposed method tunes $\theta$ to guarantee that the probability of sampling $I_s$ is as high as that of sampling $I_z$.

In our real implementation (shown in Fig. 2) of the proposed method, the neural network $F$, which contains a convolution, a fully-connected layer and a non-linear activation, can be approximated by the above-mentioned model. Therefore the above analysis also works for our actual implementation.
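The key step of the argument, $\partial F(I, \theta)/\partial I = \sum_k u(I, \theta) W_k$ for the prototype model, can be checked numerically; the sizes below are arbitrary illustrative choices, not values from the paper.

```python
# Numeric check of the identity behind Eq. (9)-(10): for the prototype
# F(I) = sum_k ReLU(I W_k + B_k), the gradient dF/dI equals sum_k u(I, theta) W_k,
# the point I* toward which Eq. (3) interpolates.
import torch

torch.manual_seed(0)
chw, K = 27, 5                                   # e.g. a 3x3x3 patch and 5 kernels (illustrative)
I = torch.randn(1, chw, requires_grad=True)
W = torch.randn(chw, K)
B = torch.randn(1, K)

F_I = torch.relu(I @ W + B).sum()                # the prototype model, Eq. (6)
grad_auto = torch.autograd.grad(F_I, I)[0]       # dF/dI from autograd

u = ((I @ W + B) > 0).float()                    # Heaviside indicator u(I, theta)
grad_manual = u @ W.t()                          # sum_k u_k W_k
print(torch.allclose(grad_auto, grad_manual))    # True
```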
Here we discuss the time complexity of the proposed online alternate generation algorithm (shown in Algorithm 1). The proposed algorithm consists of two nested loops: the outer loop contains $T_N$ iterations while the inner loop has $T_I$ iterations. Assume that computing Eq. (3) takes the same constant time as computing Eq. (5). The overall time complexity is $O(T_N (T_I + 1)) = O(T_N \cdot T_I)$. In practice, it takes about 30 seconds to process an input image of size ×.
D. Stopping Criteria

The proposed online alternate generation terminates after $T_N$ outer iterations, as shown in Algorithm 1. On one hand, a larger network step $T_N$ leads to unnecessary computation. On the other hand, with a smaller $T_N$, the proposed generator could fail to reproduce the semantics of an input image, which seriously drops the classification accuracy of target networks. More importantly, to develop a portable defense against unseen attacks, we need to determine $T_N$ without tuning it on known adversarial samples. Here, we propose to utilize images perturbed by Gaussian noises to choose the network steps $T_N$. This is based on the assumption that the online alternate generating processes for adversarial samples and for samples with Gaussian noises are similar. The experimental details and results are presented in Section IV-E.

IV. EXPERIMENTS
A. Experimental Setting

a) Dataset:
We evaluate the performance of our method on the ILSVRC 2012 dataset [39] and the Oxford Flower-17 dataset [40]. Most images in the ILSVRC 2012 dataset are large images whose size is not smaller than ×, and most images in the Oxford Flower-17 dataset are larger than ×. We claim that it is practical and meaningful to develop a defense method that works with large images, because experimental results obtained on simple datasets such as MNIST [4] do not always generalize to harder tasks, as suggested by [2]. As suggested in [23], it is less meaningful to attack misclassified images, so we randomly choose 2000 correctly classified images (2 images per class) from the validation set to perform experiments.
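A hedged sketch of how such an evaluation subset could be assembled (keep only validation images the target classifier already labels correctly and take two per class); the iterable of single `(image, integer label)` pairs is an assumption about the data interface, not a detail from the paper.

```python
# Hedged sketch of selecting correctly classified validation images, two per class.
import torch
from collections import defaultdict

@torch.no_grad()
def pick_correctly_classified(model, val_samples, per_class=2):
    model.eval()
    chosen, counts = [], defaultdict(int)
    for image, label in val_samples:                           # image: (C, H, W), label: int
        pred = model(image.unsqueeze(0)).argmax(dim=1).item()
        if pred == label and counts[label] < per_class:
            chosen.append((image, label))
            counts[label] += 1
    return chosen                                              # e.g. 2 x 1000 = 2000 images on ILSVRC 2012
```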
TABLE I
TOP-1 ACCURACY OF DIFFERENT DEFENSE METHODS AGAINST FGSM, IGSM, MI-FGSM AND C&W ON ILSVRC 2012 DATASET AND OXFORD FLOWER-17 DATASET. * INDICATES THAT THE METHOD NEEDS TO BE TRAINED BEFORE INFERENCE.

                         ILSVRC 2012                            Oxford Flower-17
Defense Methods          FGSM     IGSM     MI-FGSM  C&W         FGSM     IGSM     MI-FGSM  C&W
None                     8.35%    0.05%    0.30%    0.00%       27.65%   7.35%    0.59%    0.00%
Mean Filter [37]         39.65%   67.65%   38.50%   75.55%      60.59%   73.24%   47.06%   66.47%
JPEG [38]                19.70%   54.55%   0.85%    67.50%      45.88%   47.94%   20.88%   28.82%
TVM [21]                 41.00%   66.70%   53.45%   69.00%      69.12%   83.82%   70.88%   80.29%
Pixel Deflection [26]    41.70%   55.80%   51.10%   57.00%      71.47%   82.06%   70.59%   78.82%
Randomization [23]       40.05%   67.15%   44.80%   68.30%      72.65%   83.24%   62.94%   75.88%
*Pixel Defend [24]       20.15%   55.25%   20.05%   66.75%      48.82%   53.53%   27.06%   37.35%
Ours                     49.10%   75.25%   55.90%   77.95%      76.76%   87.35%   73.82%   84.41%
b) Target Network: We choose ResNet18 [41] as the attacked target network on the Oxford Flower-17 dataset, and utilize ResNet50 on the ILSVRC 2012 dataset. To demonstrate the transferability of our proposed defense method, we further conduct experiments on the Oxford Flower-17 dataset with different target networks, including VGG11 [42], MobileNet v2 [43] and DenseNet121 [44].
c) Attack Methods: We exploit FGSM [6] and IGSM [1] to construct targeted adversarial examples with randomly selected target categories, and utilize MI-FGSM [12] and the C&W attack [45] to synthesize untargeted adversarial samples. In this paper, we adopt gray-box attacks [21] in which attackers can access the target network and its parameters, but have no knowledge of defense methods. The $L_\infty$ norm is adopted to bound adversarial perturbations, and $\epsilon$ denotes the upper bound of the $L_\infty$ norm. We choose $\epsilon$ for each attack type such that the adversarial attack is strong enough and its resulting perturbation is imperceptible. For example, we attack a ResNet18 model by targeted IGSM, as shown in Fig. 3. As $\epsilon$ grows from 0 to 5, the top-1 accuracy of the attacked model drops rapidly. For $\epsilon$ larger than 5, the top-1 accuracy becomes stable; in such cases, larger perturbations do not lead to stronger attacks but only synthesize noisy and undesirable images. For FGSM, IGSM and MI-FGSM, $\epsilon$ is selected respectively as {6, 6, 2} on the ILSVRC 2012 dataset and {6, 6, 4} on the Oxford Flower-17 dataset. The number of iterations is set as min($\epsilon$+4, ceil(1.25$\epsilon$)), according to [1], where ceil(·) denotes rounding up to an integer; for example, $\epsilon$ = 6 gives min(10, 8) = 8 iterations. The step size is set as 1.

Fig. 3. Top-1 accuracy of ResNet18 attacked by targeted IGSM with different $\epsilon$ ($l_\infty$ norm of adversarial perturbations) on the Oxford Flower-17 dataset.

d) Defense Methods: In this paper, we compare our method with six state-of-the-art portable defenses, including Mean Filter [37], JPEG compression and decompression [38], TVM [21], Pixel Deflection [26], Randomization [23] and Pixel Defend [24]. All of these methods play the role of pre-processing. In addition, the first five methods can be used without training. Although Pixel Defend needs to train a PixelCNN [32] network on the training set, it does not need to access the parameters of adversarial attacks. Pixel Defend has been successfully attacked by Athalye et al. [18] in the white-box setting, but it is still meaningful to compare it with our proposed method in the gray-box setting.
B. Comparison with the State-of-the-art

We choose existing portable defenses that can work with unseen attack types and large images for comparison. These defense methods do not require accessing target networks or attack types. This experiment is conducted in the gray-box attack setting, and the parameters of all defenses are invisible to attackers. As can be observed in TABLE I, our proposed defense method achieves the best top-1 accuracy on both the ILSVRC 2012 dataset and the Oxford Flower-17 dataset, against four kinds of attacks, including FGSM, IGSM, MI-FGSM and C&W. On the ILSVRC 2012 dataset, the proposed method outperforms the second best Pixel Deflection by 7.4% against FGSM, and surpasses the second best Mean Filter by 7.6% against IGSM. Our proposed method outperforms the second best method TVM by 2.45% top-1 accuracy on MI-FGSM, and Mean Filter by 2.4% top-1 accuracy on C&W. On the Oxford Flower-17 dataset, the top-1 accuracy of our method is 4.1% higher than the second best against FGSM and 3.5% higher than TVM against IGSM. The proposed method outperforms the second best method, TVM, by 2.94% top-1 accuracy on MI-FGSM and by 4.12% top-1 accuracy on the C&W attack.
TABLE I shows that top-1 accuracy against IGSM is higher than against FGSM, while the results in [17] suggest that IGSM is stronger than FGSM in white-box attacks with the same $\epsilon$. We claim that our experimental results are reasonable because our experiments are conducted with gray-box attacks. In our cases, the defense methods shown in TABLE I have modified the adversarial perturbations in adversarial samples. Since adversarial perturbations computed by iterative methods 'fit' the target network better than those of single-step methods, minor changes on iterative adversarial samples could reduce more of the attack effect than on single-step adversarial samples. This can be regarded as evidence supporting that stronger adversaries decrease transferability [17]. Experimental results in Pixel Deflection [26] also show that classifier accuracy against IGSM is higher than against FGSM under gray-box attacks. Notice that the numeric results of Pixel Deflection in our experiment may differ from those in [26]. This is because we adopt the $L_\infty$ norm following the original FGSM/IGSM while Prakash et al. [26] use a different norm; different norms may result in different distributions of adversarial noises and attack effects.

We discuss two reasons why the proposed method is better than the previous works. First, the proposed method synthesizes a new image with fewer adversarial noises to replace the original input. In Sec. III-C we have shown that the synthesized image $I_s$ approximates the original input $I_z$, and the Gaussian perturbation introduced in Eq. (3) avoids recovering the adversarial noises of $I_z$. Second, the transferability among CNN models may be a reason. When updating the image $I_s$, $F$ has also been updated hundreds of times. That is to say, $I_s$ is synthesized based on hundreds of different CNN models (a model $F$ with different parameters). The pixel values of $I_s$ could thus be suitable for another CNN model (such as the target classifier) to extract effective features.

TABLE II
INVESTIGATION OF THE PROPOSED METHOD WITH ENSEMBLE ADVERSARIAL TRAINING. ADVERSARIAL EXAMPLES ARE SYNTHESIZED USING MI-FGSM ON ILSVRC 2012 DATASET.

Attacks With    IncV3ens    IncV3 + ours    IncV3ens + ours
IncV4           71.80%      73.30%          78.45%
IncV3           34.70%      39.60%          71.70%
C. Investigation with Adversarial Training

This section presents how our proposed method works with the Ensemble Adversarial Training method [2]. Note that adversarial training based methods are not portable defenses, as they require re-training the attacked target networks. As shown in TABLE II, IncV3ens, IncV3+ours and IncV3ens+ours respectively denote an ensemble adversarially trained IncV3 (Inception-V3) [46] model, a plain IncV3 model defended by our proposed method, and the Ensemble Adversarial Training model defended by our proposed method. 'Attacks With' denotes the pre-trained models used to synthesize adversarial examples. The pre-trained IncV3 model has the same architecture as IncV3ens but different parameters. Thus attacks with IncV4 (Inception-V4) [47] and IncV3 are in the black-box setting and the gray-box setting respectively. The results indicate that the proposed method outperforms Ensemble Adversarial Training by 1.5% and 4.9% top-1 accuracy in the black-box and gray-box setting respectively. Besides, our proposed method significantly enhances the top-1 accuracy of the IncV3ens model, by 6.65% and 37% against attacks with IncV4 and IncV3 respectively. Notice that Ensemble Adversarial Training and the proposed method obtain a relatively low accuracy (less than 40%) in the gray-box setting. This could be because the IncV3 model is relatively more sensitive to the MI-FGSM attack than other models such as ResNet. Besides, we find that combining the proposed method with Ensemble Adversarial Training achieves an acceptable top-1 accuracy of 71.70%.
TABLE III
TOP-1 ACCURACY OF DIFFERENT DEFENSE METHODS AGAINST IGSM ON OXFORD FLOWER-17 DATASET WITH DIFFERENT TARGET NETWORKS. * INDICATES THAT THE METHOD REQUIRES OFFLINE TRAINING.

Defense Methods      ResNet18    VGG11     MobileNet v2    DenseNet121
None                 7.35%       9.71%     5.59%           5.29%
Mean Filter          73.24%      69.12%    82.65%          60.00%
JPEG                 47.94%      50.00%    58.53%          27.56%
TVM                  83.82%      67.94%    88.53%          73.24%
Pixel Deflection     82.06%      77.06%    82.65%          82.56%
Randomization        83.24%      72.65%    86.75%          72.53%
*Pixel Defend        53.53%      50.88%    61.47%          35.29%
Ours                 87.35%      80.29%    91.18%          87.65%
D. Transferability Analysis
This section investigates whether our defense method is transferable to protect different kinds of target networks. Experiments are conducted on the Oxford Flower-17 dataset, and adversarial examples are generated using IGSM in this section. The target networks include VGG11, MobileNet v2 and DenseNet121. These target networks are initialized with weights pretrained on ImageNet, and then fine-tuned on the training set of the Oxford Flower-17 dataset. The top-1 accuracies of VGG11, MobileNet v2 and DenseNet121 are 91.47%, 97.35% and 96.47% respectively on clean images from the test set. We determine $\epsilon$ according to the criteria discussed in the above section, and select $\epsilon = 6$ for ResNet18, $\epsilon = 8$ for VGG11, $\epsilon = 6$ for MobileNet v2 and $\epsilon = 8$ for DenseNet121. As shown in TABLE III, the proposed method demonstrates higher top-1 accuracy than the other state-of-the-art defense algorithms. Our method exceeds TVM by 3.5% top-1 accuracy on ResNet18, and outperforms Pixel Deflection by 3.2% top-1 accuracy on VGG11. On MobileNet v2, the proposed method surpasses the second best TVM by 2.6% top-1 accuracy. The top-1 accuracy of our method is 5.1% higher than the second best Pixel Deflection on DenseNet121. Notice that the proposed method does not need offline training beforehand, and therefore does not obtain any knowledge or response of the attacked target networks; tuning hyper-parameters is also independent of the attacked target models. Thus the proposed method is not biased towards any specific models, but enjoys superior transferability with various types of target networks.
Fig. 4. Comparison among reference images with different noises. 'Accuracy' denotes top-1 accuracy. 'Step' corresponds to Network Steps. Gaussian Noise means that images are degraded by Gaussian noise. FGSM and IGSM denote adversarial noises generated by FGSM and IGSM.
E. Ablation Study
Our proposed defense method contains several essential hyper-parameters, including Network Steps $T_N$, Image Steps $T_I$ and the kernel size. We investigate how different settings of these hyper-parameters affect the performance of our method on the Oxford Flower-17 dataset. When investigating one hyper-parameter, the other hyper-parameters are set to the default values suggested in Section IV-A.

a) Network Steps $T_N$: To determine $T_N$, the number of iterations of the outer loop, a straightforward way is to generate adversarial samples from the training set, run the proposed method on these adversarial samples, and find a suitable value for $T_N$. However, as defenders we usually do not know the exact attack type at testing time, so it is meaningful to determine $T_N$ without the knowledge of adversarial attacks. Since our proposed method does not estimate the attack type of an input image, we assume that samples with non-adversarial noises can also determine the iteration number, and we conduct an experiment to verify this assumption. We generate three sets of noisy samples from the training set: one set is generated by adding Gaussian noises ($\mu = 0$, $\sigma = \pm .$), and the other two sets are generated by adding adversarial noises of FGSM and IGSM. We run the proposed method for different numbers of iterations on these three sets of data, and use the target image classifier to classify the images synthesized by the proposed method at different iterations. The results, averaged over the whole training set, are shown in Fig. 4 and reflect most cases. The resulting curves of IGSM, FGSM and Gaussian noises are marked in red, blue and green respectively in Fig. 4. As can be seen, these three curves have almost the same trend, converging and becoming flat after around 200 steps. Thus we can rely solely on the curve corresponding to Gaussian noise to determine a suitable Network Steps. On the ILSVRC 2012 dataset, Network Steps is selected as 600 in the same way.
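A hedged sketch of this selection procedure is given below: degrade training images with Gaussian noise, run the online generator, and record the target classifier's top-1 accuracy at several outer steps, then pick $T_N$ where the curve flattens. The noise scale, the checkpoint spacing and the hypothetical `generate_with_snapshots` helper (an alternate generator that also returns intermediate synthesized images) are our own assumptions, not details from the paper.

```python
# Hedged sketch of choosing T_N from Gaussian-noised images only.
import torch

@torch.no_grad()
def top1_correct(model, image, label):
    return float(model(image).argmax(dim=1).item() == label)

def accuracy_vs_network_steps(model, images, labels, checkpoints=(50, 100, 200, 300)):
    curves = {t: [] for t in checkpoints}
    for image, label in zip(images, labels):                 # image: (1, C, H, W), label: int
        noisy = image + 0.05 * torch.randn_like(image)       # Gaussian-noised reference (scale assumed)
        snaps = generate_with_snapshots(noisy, checkpoints)  # hypothetical helper: {step: image}
        for t in checkpoints:
            curves[t].append(top1_correct(model, snaps[t], label))
    return {t: sum(v) / len(v) for t, v in curves.items()}   # choose T_N where accuracy plateaus
```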
TABLE IV
COMPARISON AMONG DIFFERENT IMAGE STEPS.

Image Steps $T_I$    10    20    30    40
Top-1 Accuracy

TABLE V
COMPARISON AMONG DIFFERENT KERNEL SIZES.

Kernel Sizes    ×    11×11    15×15    21×21
Top-1 Accuracy

b) Image Steps $T_I$: We evaluate four different Image Steps with the kernel size set to × and Network Steps set to 300. As shown in TABLE IV, the top-1 accuracy increases as more Image Steps are adopted. However, as larger Image Steps lead to a higher time cost, we select $T_I = 20$ to make a trade-off between performance and efficiency.

c) Kernel Size: We evaluate four different kernel sizes for the first convolution layer, with Image Steps set to 20 and Network Steps set to 300. TABLE V shows that our method performs the best with a kernel size of ×. A small receptive field is susceptible to adversarial perturbations: convolutions with a smaller receptive field benefit the reconstruction of image details, even including the adversarial perturbation on the given inference image. On the contrary, images synthesized by convolutions with a larger receptive field may lower the quality of small patterns and details, which also degrades the classification accuracy.
F. Investigation with Natural/Clean Images

We present the results on both clean and adversarial samples, as shown in TABLE VI. The results are obtained on Oxford Flower-17 with C&W as the attack and ResNet18 as the target classifier. 'None' denotes the target classifier without any defenses. In TABLE VI, JPEG and Pixel Defend are the best on clean images, but their top-1 accuracies are less than 40% and the worst on adversarial images. TVM is worse than our method on both clean and adversarial samples. Pixel Deflection is close to our method on clean images but our method exceeds it by 5.59% top-1 accuracy against the adversarial attack. Randomization surpasses our method by 2.9% on clean images but our method outperforms it by 8.6% on adversarial examples. Compared with DIP, our proposed method obtains 1.5% higher top-1 accuracy against adversarial samples, and 2.4% lower accuracy on clean images. Overall, the proposed method achieves the state-of-the-art trade-off between natural images and adversarial examples.
G. Visualization of Online Alternate Generation
This section visualizes the synthesized image of our defense method during the alternate generation. As shown in Fig. 5, the columns captioned with 'Iteration k' are the intermediate synthesized images of our proposed defense method at iteration k. The maximum number of iterations corresponds to Network Steps, which controls how many times the neural model in our method is updated. As the iteration number increases, the synthesized image becomes clearer and sharper, and the synthesized image at the last iteration is visually similar to the original image. The column captioned with (a) is the residual map of an adversarial example (synthesized by Iterative FGSM). The residual map of a synthesized image is defined as the normalized pixel-wise absolute difference between the synthesized image and its corresponding original image. The column captioned with (b) is the residual map of the synthesized image in our method. Comparing column (a) with column (b), the residual map of an adversarial example differs from that of the synthesized image in our method, which suggests that the noise distributions of an adversarial example and its corresponding synthesized image are quite distinct. The synthesized images could thus be less affected by the adversarial noises.
Fig. 5. The visualization of our defense method on the ILSVRC 2012 and Oxford Flower-17 datasets under the targeted Iterative FGSM attack. We exhibit images generated by our defense method at different iterations. (a) visualizes the difference between adversarial examples and original images. (b) visualizes the difference between our results and original images. As can be observed, our results have a different noise distribution from the input adversarial examples.
TABLE VI
INVESTIGATION WITH NATURAL/CLEAN IMAGES.

Defense Methods          Clean Images    Adversarial
None                     97.06%          0.00%
Mean Filter [37]         94.71%          66.47%
JPEG [38]                97.06%          28.82%
TVM [21]                 90.88%          80.29%
Pixel Deflection [26]    92.35%          78.82%
Randomization [23]       95.00%          75.88%
Pixel Defend [24]        96.76%          37.35%
DIP [34]                 94.41%          82.94%
Ours                     92.06%          84.41%
H. Whether Alternate Update is Necessary
To understand whether the alternate update scheme in our proposed method is necessary, let us consider a simple auto-encoder. The auto-encoder employs the same online training strategy as the proposed method to achieve a portable defense, but does not adopt the two-step update. Given an image, we first randomly initialize the parameters of the auto-encoder, and then online tune its parameters by taking the image as both input and supervision. After $T_N$-step updates, the auto-encoder takes the image as input and outputs another image, which serves as a substitute to be sent into the target classifier. For a fair comparison, we instantiate the auto-encoder in a symmetric form, with its encoder part the same as the proposed generator. The auto-encoder consists of a convolution layer, two fully-connected layers, a non-linear activation and a deconvolution layer. The fully-connected layers lie between the convolution and the deconvolution, while the non-linear layer is located between the two fully-connected layers. The first fully-connected layer encodes a tensor into a scalar while the second one decodes a scalar into a tensor. The auto-encoder achieves 82.94% on the Oxford Flower-17 benchmark with C&W as the attack and ResNet18 as the target classifier. Our proposed method using the alternate update obtains 84.41%, slightly better than the auto-encoder using the one-step update. This may be because the alternate update with Langevin Dynamics can better sample a representative point in some latent space, and hence recover more accurate semantics for an input image.
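For reference, a rough sketch of such an online-trained auto-encoder baseline is given below (convolution, two fully-connected layers with a non-linearity in between, then a deconvolution). The channel counts, kernel size, stride, MSE reconstruction loss and Adam optimizer are our own assumptions rather than the settings used in the experiment.

```python
# Hedged sketch of the one-step (non-alternate) auto-encoder baseline.
import torch
import torch.nn as nn

class OnlineAutoEncoder(nn.Module):
    def __init__(self, in_ch=3, filters=100, kernel=15, stride=3, image_size=224):
        super().__init__()
        side = (image_size - kernel) // stride + 1
        pad = (image_size - kernel) % stride               # restores the original resolution
        feat = filters * side * side
        self.conv = nn.Conv2d(in_ch, filters, kernel, stride=stride)
        self.fc_enc = nn.Linear(feat, 1)                   # encodes a tensor into a scalar
        self.act = nn.ReLU()                               # non-linearity between the two FC layers
        self.fc_dec = nn.Linear(1, feat)                   # decodes the scalar back into a tensor
        self.deconv = nn.ConvTranspose2d(filters, in_ch, kernel, stride=stride, output_padding=pad)
        self._shape = (filters, side, side)

    def forward(self, x):
        z = self.act(self.fc_enc(self.conv(x).flatten(1)))
        return self.deconv(self.fc_dec(z).view(-1, *self._shape))

def autoencoder_defense(I_ref, T_N=300, lr=1e-4):
    net = OnlineAutoEncoder(image_size=I_ref.shape[-1])
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(T_N):                                   # online training on the input itself
        loss = ((net(I_ref) - I_ref) ** 2).mean()          # the image is both input and supervision
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net(I_ref).detach()                             # substitute sent to the target classifier
```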
V. CONCLUSION

In this paper we develop a novel portable defense framework that reconstructs an image with less adversarial noise and almost the same semantics as an input image. The reconstructed image acts as a substitute to defend against adversarial attacks. The hyper-parameters of our proposed defense do not need to be tuned on any adversarial examples, which avoids a bias towards some known attacks. The proposed defense does not access or modify any parameters or intermediate outputs of target models, which makes the defense portably transferable to a wide range of target classifiers. Experimental results show that our method obtains state-of-the-art performance.
REFERENCES

[1] A. Kurakin, I. J. Goodfellow, and S. Bengio, "Adversarial machine learning at scale," in International Conference on Learning Representations, 2017.
[2] F. Tramer, A. Kurakin, N. Papernot, I. J. Goodfellow, D. Boneh, and P. D. McDaniel, "Ensemble adversarial training: Attacks and defenses," in International Conference on Learning Representations, 2018.
[3] P. Samangouei, M. Kabkab, and R. Chellappa, "Defense-GAN: Protecting classifiers against adversarial attacks using generative models," in International Conference on Learning Representations, 2018.
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[5] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," in International Conference on Learning Representations, 2014.
[6] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in International Conference on Learning Representations, 2015.
[7] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, "DeepFool: A simple and accurate method to fool deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2574–2582.
[8] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. L. Yuille, "Adversarial examples for semantic segmentation and object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1378–1387.
[9] J. H. Metzen, M. C. Kumar, T. Brox, and V. Fischer, "Universal adversarial perturbations against semantic image segmentation," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2774–2783.
[10] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal adversarial perturbations," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 86–94.
[11] M. M. Cisse, Y. Adi, N. Neverova, and J. Keshet, "Houdini: Fooling deep structured visual and speech recognition models with adversarial examples," in Advances in Neural Information Processing Systems, 2017, pp. 6977–6987.
[12] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, "Boosting adversarial attacks with momentum," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[13] O. Poursaeed, I. Katsman, B. Gao, and S. J. Belongie, "Generative adversarial perturbations," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4422–4431.
[14] Z. Zhao, D. Dua, and S. Singh, "Generating natural adversarial examples," in International Conference on Learning Representations, 2018.
[15] A. Arnab, O. Miksik, and P. H. S. Torr, "On the robustness of semantic segmentation models to adversarial attacks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 888–897.
[16] C. Xiao, J. Zhu, B. Li, W. He, M. Liu, and D. Song, "Spatially transformed adversarial examples," in International Conference on Learning Representations, 2018.
[17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," in International Conference on Learning Representations, 2018.
[18] A. Athalye, N. Carlini, and D. A. Wagner, "Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples," in Proceedings of International Conference on Machine Learning, 2018, pp. 274–283.
[19] O. Taran, S. Rezaeifar, and S. Voloshynovskiy, "Bridging machine learning and cryptography in defence against adversarial attacks," in European Conference on Computer Vision. Springer, 2018, pp. 267–279.
[20] Z. Zheng and P. Hong, "Robust detection of adversarial attacks by modeling the intrinsic properties of deep neural networks," in Advances in Neural Information Processing Systems, 2018, pp. 7924–7933.
[21] C. Guo, M. Rana, M. Cisse, and L. van der Maaten, "Countering adversarial images using input transformations," in International Conference on Learning Representations, 2018.
[22] J. Lu, T. Issaranon, and D. A. Forsyth, "SafetyNet: Detecting and rejecting adversarial examples robustly," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 446–454.
[23] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. L. Yuille, "Mitigating adversarial effects through randomization," in International Conference on Learning Representations, 2018.
[24] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman, "PixelDefend: Leveraging generative models to understand and defend against adversarial examples," in International Conference on Learning Representations, 2018.
[25] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu, "Defense against adversarial attacks using high-level representation guided denoiser," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[26] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. A. Storer, "Deflecting adversarial attacks with pixel deflection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[27] N. Akhtar, J. Liu, and A. S. Mian, "Defense against universal adversarial perturbations," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3389–3398, 2018.
[28] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff, "On detecting adversarial perturbations," in International Conference on Learning Representations, 2017.
[29] H. Li, G. Li, and Y. Yu, "ROSA: Robust salient object detection against adversarial attacks," IEEE Transactions on Cybernetics, 2019.
[30] X. He, S. Yang, G. Li, H. Li, H. Chang, and Y. Yu, "Non-local context encoder: Robust biomedical image segmentation against adversarial attacks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 8417–8424.
[31] J. Yang, R. Xu, R. Li, X. Qi, X. Shen, G. Li, and L. Lin, "An adversarial perturbation oriented domain adaptation approach for semantic segmentation," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
[32] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, "Pixel recurrent neural networks," in International Conference on Machine Learning, 2016, pp. 1747–1756.
[33] D. Meng and H. Chen, "MagNet: A two-pronged defense against adversarial examples," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS '17. New York, NY, USA: ACM, 2017, pp. 135–147. [Online]. Available: http://doi.acm.org/10.1145/3133956.3134057
[34] A. Kattamis, T. Adel, and A. Weller, "Exploring properties of the deep image prior," in Advances in Neural Information Processing Systems Workshop, 2019.
[35] Y. Lu, S. Zhu, and Y. N. Wu, "Learning FRAME models using CNN filters," in National Conference on Artificial Intelligence, 2016, pp. 1902–1910.
[36] J. Xie, W. Hu, S. Zhu, and Y. N. Wu, "Learning sparse FRAME models for natural image patterns," International Journal of Computer Vision, vol. 114, no. 2, pp. 91–112, 2015.
[37] X. Li and F. Li, "Adversarial examples detection in deep networks with convolutional filter statistics," in International Conference on Computer Vision, 2017, pp. 5775–5783.
[38] G. K. Dziugaite, Z. Ghahramani, and D. M. Roy, "A study of the effect of JPG compression on adversarial images," arXiv preprint arXiv:1608.00853, 2016.
[39] J. Deng, W. Dong, R. Socher, and L. J. Li, "ImageNet: A large-scale hierarchical image database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[40] M.-E. Nilsback and A. Zisserman, "A visual vocabulary for flower classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 1447–1454.
[41] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[42] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations, 2015.
[43] M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[44] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2261–2269.
[45] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in IEEE Symposium on Security and Privacy (SP), 2017, pp. 39–57.
[46] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[47] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
Haofeng Li is a research scientist at Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen). He received his Ph.D. degree from the Department of Computer Science, the University of Hong Kong in 2020, and his B.Sc. degree from the School of Data and Computer Science, Sun Yat-sen University in 2015. He is a recipient of the Hong Kong PhD Fellowship. He has been serving as a reviewer for TIP, Pattern Recognition, Neurocomputing and IEEE Access. His current research interests include computer vision, image processing and deep learning.
Yirui Zeng received her B.S. degree from the School of Electronics and Information Engineering, Chongqing University in 2017. She is currently pursuing the master's degree in the School of Electronics and Information Engineering, Sun Yat-sen University. Her current research interests include computer vision and deep learning.
Guanbin Li (M'15) is currently an associate professor in the School of Data and Computer Science, Sun Yat-sen University. He received his PhD degree from the University of Hong Kong in 2016. His current research interests include computer vision, image processing, and deep learning. He is a recipient of the ICCV 2019 Best Paper Nomination Award. He has authored and co-authored more than 60 papers in top-tier academic journals and conferences. He serves as an area chair for the conference VISAPP. He has been serving as a reviewer for numerous academic journals and conferences such as TPAMI, IJCV, TIP, TMM, TCyb, CVPR, ICCV, ECCV and NeurIPS.
Liang Lin (M'09, SM'15) is a full Professor of Sun Yat-sen University. He is the Excellent Young Scientist of the National Natural Science Foundation of China. From 2008 to 2010, he was a Post-Doctoral Fellow at the University of California, Los Angeles. From 2014 to 2015, as a senior visiting scholar, he was with the Hong Kong Polytechnic University and the Chinese University of Hong Kong. He currently leads the SenseTime R&D teams to develop cutting-edge and deliverable solutions on computer vision, data analysis and mining, and intelligent robotic systems. He has authored and co-authored more than 100 papers in top-tier academic journals and conferences. He has been serving as an associate editor of IEEE Trans. Human-Machine Systems, The Visual Computer and Neurocomputing. He served as Area/Session Chair for numerous conferences, including ICME, ACCV, and ICMR. He was the recipient of the Best Paper Runners-Up Award at ACM NPAR 2010, a Google Faculty Award in 2012, the Best Paper Diamond Award at IEEE ICME 2017, and the Hong Kong Scholars Award in 2014. He is a Fellow of IET.