Learning to Defend by Learning to Attack
Haoming Jiang∗, Zhehui Chen∗, Yuyang Shi, Bo Dai, Tuo Zhao†
March 12, 2020
Abstract
Adversarial training is a principled approach for training robust neural networks. From an optimization perspective, adversarial training is solving a bilevel optimization problem (a general form of minimax approaches): the leader problem targets learning a robust classifier; the follower problem tries to generate adversarial samples. Unfortunately, such a bilevel problem is very challenging to solve due to its highly complicated structure. This work proposes a new adversarial training method based on a generic learning-to-learn (L2L) framework. Specifically, instead of applying hand-designed algorithms to the follower problem, we learn an optimizer, which is parametrized by a convolutional neural network. Meanwhile, a robust classifier is learned to defend against the adversarial attacks generated by the learned optimizer. Our experiments over CIFAR datasets demonstrate that L2L improves upon existing methods in both robust accuracy and computational efficiency. Moreover, the L2L framework can be extended to other popular bilevel problems in machine learning.
∗ Code will be released after the acceptance of the paper.
† H. Jiang, Z. Chen, Y. Shi, and T. Zhao are affiliated with the School of Industrial and Systems Engineering at Georgia Tech; B. Dai is affiliated with Google Brain; Tuo Zhao is the corresponding author; Email: [email protected].

This decade has witnessed great breakthroughs in deep learning in a variety of applications, such as computer vision (Taigman et al., 2014; Girshick et al., 2014; He et al., 2016; Liu et al., 2017). Recent studies (Szegedy et al., 2013), however, show that most of these deep learning models are very vulnerable to adversarial attacks. Specifically, by injecting a small perturbation into a normal sample, one can obtain an adversarial sample. Although the adversarial sample is semantically indistinguishable from the normal one, it can fool deep learning models and undermine the security of deep learning, causing reliability problems in autonomous driving, biometric authentication, etc.

Researchers have devoted many efforts to studying efficient adversarial attack and defense (Szegedy et al., 2013; Goodfellow et al., 2014b; Nguyen et al., 2015; Zheng et al., 2016; Madry et al., 2017; Carlini and Wagner, 2017). There is a growing body of work on generating adversarial samples, e.g., the fast gradient sign method (FGSM, Goodfellow et al. (2014b)), the projected gradient method (PGM, Kurakin et al. (2016)), Carlini-Wagner (CW, Carlini and Wagner (2017)), etc. As for defense, existing methods can be unified as a bilevel optimization problem as follows:

\[
\text{(Leader)} \quad \min_{\theta} \; \mathbb{E}_{P^*}\Big[\ell\big(f_{\theta}(\widetilde{x}), \widetilde{y}\big)\Big], \tag{1}
\]
\[
\text{(Follower)} \quad \text{subject to} \quad P^* \in \mathop{\mathrm{argmax}}_{\widetilde{P} \in \mathcal{P}} \; \mathbb{E}_{\widetilde{P}}\Big[q_{f_{\theta}}\big((x, y), (\widetilde{x}, \widetilde{y})\big)\Big], \tag{2}
\]

where f_θ denotes the neural network classifier with parameter θ, (x, y) denotes the clean sample from distribution D, q_{f_θ}(·, ·) denotes a measure depending on the network f_θ, and 𝒫 denotes a set of joint distributions of the perturbed sample (x̃, ỹ) and the clean sample (x, y). Here every P̃ ∈ 𝒫 satisfies that each sample (x̃, ỹ) is close to (x, y) and that the marginal distribution of P̃ over (x, y) is D. By solving (2), P* essentially represents an effective adversarial distribution. Existing adversarial training methods use different approaches to find P* under different q_{f_θ} and 𝒫. For example, Goodfellow et al. (2014b) consider a special case of this problem, distributionally robust optimization (DRO, Gao and Kleywegt (2016); Rahimian and Mehrotra (2019)). In DRO, q_{f_θ} in (2) is the same as ℓ in (1), and every P̃ ∈ 𝒫 satisfies ỹ = y for each sample, i.e., we train the network f_θ over adversarial samples and still require f_θ to yield the correct labels. Another example is adversarial interpolation training (AIT, Haichao Zhang (2019)), where q_{f_θ} is the cosine similarity between the features of the adversarial sample and the clean sample, and 𝒫 is a set of adversarial distributions yielded by mixup (Zhang et al., 2017). More details are in Section 2.

In the optimization literature, (1) and (2) are referred to as the leader and follower optimization problems, respectively.
Such a bilevel formulation naturally provides us a unified perspective on prior works of robustifying neural networks: the leader aims to find a robust network with parameter θ so that the loss given by the training distribution from the follower problem is minimized; the follower targets finding an optimal distribution that maximizes a certain measure, which yields a distribution of adversarial samples.

Though the bilevel problem is straightforward and well formulated, it is hard to solve. Even the simplest version of the bilevel problem, linear-linear bilevel optimization, is shown to be NP-hard (Colson et al., 2007). In our case, the problem becomes even more challenging, since the loss function ℓ in the leader problem is highly nonconvex in θ and the follower targets finding an optimal distribution under a nonconcave measure q_{f_θ}. Besides, in general, the feasible domain of the follower problem is a space of continuous distributions, while in practice we only have finite samples to approximate the original problem. Such a gap makes the problem even more challenging to solve.

There are several approaches to solving the original problem (1) and (2). Under the DRO setting, Goodfellow et al. (2014b) propose to use FGSM to solve the DRO. However, Kurakin et al. (2016) then find that FGSM with the true label suffers from a "label leaking" issue, which ruins adversarial training. Madry et al. (2017) further suggest finding adversarial samples by PGM and obtain a better result than FGSM, since FGSM essentially is one-iteration PGM. Alternatively, Haichao Zhang (2019) propose to combine FGSM and mixup to yield adversarial samples for both features and labels. All these methods need to find an adversarial (x̃_i, ỹ_i) for each clean sample (x_i, y_i), thus the dimension of the overall search space for all samples is substantial, which makes the computation expensive. More recently, Li et al. (2019) propose to use the natural evolution strategy to learn an adversarial distribution over features under the black-box setting, which is beyond the scope of this paper.

To address the above challenges, we propose a new learning-to-learn (L2L) framework that provides a more principled and efficient way of solving adversarial training. Specifically, we parameterize the optimizer of the follower problem by a neural network denoted by g_φ(A_{f_θ}(x, y)), where A_{f_θ}(x, y) denotes the input of the optimizer g_φ and φ is the parameter of the optimizer. We also call the optimizer the attacker. Since neural networks are very powerful in function approximation, our parameterization ensures that g_φ is able to yield strong adversarial samples. Under our framework, instead of directly solving (2), we update the parameter φ of the optimizer g_φ. Our training procedure becomes updating the parameters of two neural networks, which is quite similar to the generative adversarial network (GAN, Goodfellow et al. (2014a)). The proposed L2L is a generic framework and can be extended to other bilevel optimization problems, e.g., generative adversarial imitation learning, which is studied in Section 5.

Different from hand-designed methods that compute the adversarial perturbation δ_i = x̃_i − x_i for each individual sample (x_i, y_i) using gradients from backpropagation, our method generates the perturbations for all samples through the shared optimizer g_φ. This enables the optimizer g_φ to learn potential common structures of the perturbations.
Therefore, our method is capable of yielding strong perturbations and accelerating the training process. Furthermore, the L2L framework is very flexible: we can either choose different inputs A_{f_θ}(x, y) or use different architectures. For example, we can include gradient information in A_{f_θ}(x, y) and use a recurrent neural network (RNN) to mimic multi-step gradient-type methods. Instead of simply computing higher-order information with finite difference approximations or multiple gradients, by parameterizing the algorithm as a neural network, our proposed method can capture this information in a more adaptive way (Finn et al., 2017). Our experiments demonstrate that our proposed method not only outperforms existing adversarial training methods, e.g., PGM training, but also enjoys computational efficiency over the CIFAR-10 and CIFAR-100 datasets (Krizhevsky and Hinton, 2009).

The research on L2L has a long history (Schmidhuber, 1987, 1992, 1993; Younger et al., 2001; Hochreiter et al., 2001; Andrychowicz et al., 2016). The basic idea is that the updating formula of a complicated optimization algorithm is first modeled in a parametric form, and then the parameters are learned by some simple algorithm, e.g., a stochastic gradient algorithm. Among existing works, Hochreiter et al. (2001) propose a system allowing the output of backpropagation from one network to feed into an additional learning network, with both networks trained jointly; based on this, Andrychowicz et al. (2016) further show that the design of an optimization algorithm can be cast as a learning problem. Specifically, they use long short-term memory RNNs to model the algorithm and allow the RNNs to exploit structure in the problems of interest in an adaptive way, which is undoubtedly one of the most popular methods for learning-to-learn.

However, there are two major drawbacks of existing L2L methods: (1) They require a large amount of data (or a large number of tasks in multi-task learning) to guarantee that the learned optimizer generalizes, which significantly limits their applicability (most related works only consider image encoding as the motivating application); (2)
The number of layers/iterations in the RNNs used to model algorithms cannot be large, in order to avoid a significant computational burden in backpropagation.

Our contribution is that we fill the gap of the L2L framework in solving bilevel optimization problems, and our proposed methods do not suffer from the aforementioned drawbacks: (1)
Different f_θ and (x, y) essentially yield different follower problems. Therefore, for adversarial training, we have sufficiently many tasks for learning-to-learn; (2) The follower problem does not need a large-scale RNN, and we use a convolutional neural network (CNN) or a length-two RNN (sequence length equal to 2) as our attacker network, which eases computation.
Notations. Given a ∈ ℝ, denote (a)₊ = max(a, 0). Given two vectors x, y ∈ ℝ^d, denote x_i as the i-th element of x, ||x||_∞ = max_i |x_i| as the ℓ∞-norm of x, x ∘ y = [x_1 y_1, ..., x_d y_d]^⊤ as the element-wise product, and e_i as the vector with the i-th element equal to 1 and all others equal to 0. Denote the simplex in ℝ^d by Δ(d), the ℓ∞-ball centered at x with radius ε by B(x, ε) = {y ∈ ℝ^d : ||y − x||_∞ ≤ ε}, and the projection onto B(0, ε) by Π_ε(δ) = sign(δ) ∘ min(|δ|, ε), where sign, min, and |·| are element-wise operators.

This paper focuses on defense against ℓ∞-norm attacks. In this section, we first introduce two popular cases of the original problem in the literature: distributionally robust optimization (DRO) and adversarial interpolation training (AIT). Then we discuss the fundamental hardness of solving the original problem and the drawbacks of existing approaches.

Instead of using the population loss in problems (1) and (2), we use the empirical loss in the following context, since in practice we only have finite samples. Given n samples {(x_i, y_i)}_{i=1}^n, where x_i is the i-th image and y_i is the one-hot vector for the corresponding label, DRO solves the following problem:

\[
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_{\theta}(x_i + \delta_i), y_i\big), \tag{3}
\]
\[
\text{subject to} \quad \delta_i \in \mathop{\mathrm{argmax}}_{\delta \in B(0, \epsilon)} \ell\big(f_{\theta}(x_i + \delta), y_i\big). \tag{4}
\]

The standard pipeline of the DRO version is shown in Algorithm 1. Since the step of generating the adversarial perturbation δ_i in Algorithm 1 is intractable, most adversarial training methods adopt hand-designed algorithms. For example, Kurakin et al. (2016) propose to solve the follower problem (4) approximately by first-order methods such as PGM. Specifically, PGM iteratively updates the adversarial perturbation by the projected sign-gradient ascent method for each sample: given one sample (x_i, y_i), at the t-th iteration, PGM takes

\[
\delta_i^{t} \leftarrow \Pi_{\epsilon}\Big(\delta_i^{t-1} + \eta \cdot \mathrm{sign}\big(\nabla_x \ell(f_{\theta}(\widetilde{x}_i^{t}), y_i)\big)\Big), \tag{5}
\]

where x̃_i^t = x_i + δ_i^{t−1}, η is the perturbation step size, T is a pre-defined total number of iterations, δ_i^0 = 0, and t = 1, ..., T. Finally, PGM takes δ_i = δ_i^T. Note that FGSM essentially is one-iteration PGM. Besides, some works adopt other optimization methods, e.g., the momentum gradient method (Dong et al., 2018) and L-BFGS (Tabacof and Valle, 2016).
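To make the update (5) concrete, the following is a minimal PyTorch-style sketch of the PGM attack; the loss, classifier handle, and hyperparameter values are illustrative placeholders rather than the exact settings used in our experiments.

```python
import torch
import torch.nn.functional as F

def pgm_attack(f, x, y, eps=8/255, eta=2/255, T=10):
    """Projected gradient method: T steps of the sign-gradient update in (5)."""
    delta = torch.zeros_like(x)                      # delta_i^0 = 0
    for _ in range(T):
        delta.requires_grad_(True)
        loss = F.cross_entropy(f(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Pi_eps( delta^{t-1} + eta * sign(grad) ): clamp implements the l_inf projection.
        delta = (delta + eta * grad.sign()).clamp(-eps, eps).detach()
    return (x + delta).clamp(0, 1)                   # keep the adversarial image in the valid range
```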
Algorithm 1 Distributionally Robust Optimization.
Input: {(x_i, y_i)}_{i=1}^n: data, α: learning rate, N: number of iterations, ε: perturbation magnitude.
for t ← 1 to N do
    Sample a minibatch M_t
    for i in M_t do
        δ_i ← argmax_{δ ∈ B(0, ε)} ℓ(f_θ(x_i + δ), y_i)    // Generate adversarial data.
    θ ← θ − (α/|M_t|) Σ_{i ∈ M_t} ∇_θ ℓ(f_θ(x_i + δ_i), y_i)    // Update θ over adversarial data.

Alternatively, AIT adopts the mixup method to generate an adversarial distribution for a given sample (x_i, y_i) and then randomly selects a sample (x̃_i, ỹ_i) from this adversarial distribution. Specifically, AIT solves the following problem:

\[
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{(\widetilde{x}_i, \widetilde{y}_i) \sim D_i}\Big[\ell\big(f_{\theta}(\widetilde{x}_i), \widetilde{y}_i\big)\Big], \tag{6}
\]

where D_i = {(x̃_i^j, ỹ_i^j)}_{j=1}^n is generated as follows:

\[
\widetilde{x}_i^{j} = \mathop{\mathrm{argmin}}_{\widetilde{x} \in B(x_i, \epsilon)} \frac{f_{\theta}^{s}(x_j) \cdot f_{\theta}^{s}(\widetilde{x})}{\|f_{\theta}^{s}(x_j)\| \, \|f_{\theta}^{s}(\widetilde{x})\|}, \qquad
\widetilde{y}_i^{j} = \mathop{\mathrm{argmin}}_{\widetilde{y} \in \Delta(C) \cap B(y_i, \epsilon_y)} \Big\| \widetilde{y} - \frac{1 - y_j}{C - 1} \Big\|, \tag{7}
\]

where f_θ^s(·) denotes the output of the s-th layer of the network f_θ, and C denotes the number of classes. The standard pipeline is shown in Algorithm 2. To ease the computation, Haichao Zhang (2019) use a one-step gradient update as the solution of (7).

Ideally, we want to obtain the optimum of the follower problem, i.e.,

\[
P^* := \mathop{\mathrm{argmax}}_{\widetilde{P} \in \mathcal{P}} \; \mathbb{E}_{\widetilde{P}}\Big[q_{f_{\theta}}\big((x, y), (\widetilde{x}, \widetilde{y})\big)\Big].
\]

However, the measure q_{f_θ} depends on the network f_θ, which makes solving for P* intractable. Therefore, in reality the sample (x̃_i, ỹ_i) from the obtained solution P̃ is very unlikely to be the sample (x*_i, y*_i) from P*. This often leads to a highly unreliable or even completely wrong search direction, i.e.,

\[
\big\langle \nabla_{\theta} \ell(f_{\theta}(\widetilde{x}_i), \widetilde{y}_i), \; \nabla_{\theta} \ell(f_{\theta}(x^*_i), y^*_i) \big\rangle < 0,
\]

which may further result in a limiting cycle as shown in Figure 1 (a detailed discussion is in Appendix A). This becomes even worse when sample noise exists. Moreover, among the methods mentioned earlier, all except FGSM require numerous queries for gradients, which is computationally expensive.
Algorithm 2 Adversarial Interpolation Training.
Input: {(x_i, y_i)}_{i=1}^n: data, α: learning rate, N: number of iterations, ε, ε_y: perturbation magnitudes, s: the output layer of the network for the follower.
for t ← 1 to N do
    Sample a minibatch M_t
    for i in M_t do
        Sample another index j
        ỹ_i ← (1 − ε_y) y_i + ε_y (1 − y_j)/(C − 1)
        x̃_i ← argmin_{x̃ ∈ B(x_i, ε)} [f_θ^s(x_j) · f_θ^s(x̃)] / (||f_θ^s(x_j)|| ||f_θ^s(x̃)||)    // Generate adversarial data.
    θ ← θ − (α/|M_t|) Σ_{i ∈ M_t} ∇_θ ℓ(f_θ(x̃_i), ỹ_i)    // Update θ over adversarial data.

Figure 1: Illustration of the hardness of problem (1) and (2). A wrong update direction leads to a limiting cycle and algorithms fail to converge. More details are in Appendix A.
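For concreteness, a small sketch of the two ingredients used by Algorithm 2 is given below: the interpolated adversarial label from (7) and the cosine-similarity measure q_{f_θ} whose gradient drives the feature perturbation. The helper `feature_extractor` stands for the s-th layer features f_θ^s and is an assumed placeholder.

```python
import torch
import torch.nn.functional as F

def adversarial_label(y_i, y_j, eps_y, num_classes):
    # y_tilde = (1 - eps_y) * y_i + eps_y * (1 - y_j) / (C - 1), with one-hot y_i and y_j.
    return (1 - eps_y) * y_i + eps_y * (1 - y_j) / (num_classes - 1)

def cosine_measure(feature_extractor, x_tilde, x_j):
    # q_{f_theta}: cosine similarity between the s-th layer features of x_tilde and x_j.
    f_tilde = feature_extractor(x_tilde).flatten(1)
    f_j = feature_extractor(x_j).flatten(1)
    return F.cosine_similarity(f_tilde, f_j, dim=1).mean()
```

AIT perturbs x̃ so as to decrease this measure (a single gradient step in practice), while the label is pushed toward the other class as above.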
Since the hand-designed methods for the bilevel problem (1) and (2) do not perform well, we propose to learn an optimizer for the follower problem. Specifically, we parameterize δ = x̃ − x, the difference between the adversarial sample and the clean input, by a neural network g_φ(A_{f_θ}(x, y)) with input A_{f_θ}(x, y) summarizing the information of the data and the classifier f_θ(·). We first show how our method works on the DRO approach: we convert the DRO problem (3) and (4) to

\[
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\Big(f_{\theta}\big(x_i + g_{\phi^*}(A_{f_{\theta}}(x_i, y_i))\big), y_i\Big), \tag{8}
\]

where φ* is defined as the solution to the following optimization problem:

\[
\phi^* \in \mathop{\mathrm{argmax}}_{\phi} \; \frac{1}{n} \sum_{i=1}^{n} \ell\Big(f_{\theta}\big(x_i + g_{\phi}(A_{f_{\theta}}(x_i, y_i))\big), y_i\Big),
\quad \text{subject to} \quad g_{\phi}(A_{f_{\theta}}(x_i, y_i)) \in B(0, \epsilon), \; i \in [1, \ldots, n].
\]

The optimizer g_φ targets generating optimal perturbations under the constraints g_φ(A_{f_θ}(x_i, y_i)) ∈ B(0, ε). These constraints can be easily handled by a tanh activation function and an ε scaler in the last layer of g_φ.
Figure 2: An illustration of L2L: a neural network models the optimizer for generating attacks. (Panels: Zero-Step, One-Step, Multi-Step.)
This L2L framework is very flexible: we can choose different A_{f_θ}(x, y) as the input and mimic multi-step algorithms, as shown in Figure 2. Here we provide three examples for the DRO.

Naive Attacker.
This is the simplest example among our methods, taking the original image x_i as the input, i.e., A_{f_θ}(x_i, y_i) = x_i and δ_i = g_φ(x_i). Under this setting, L2L training is similar to GAN training. The major difference is that the generator in GAN yields synthetic data by transforming random noise, while the naive attacker generates perturbations from training samples.
Gradient Attacker.
Motivated by FGSM, we design an attacker which takes the gradient information into account. Specifically, we concatenate the image x_i and the gradient ∇_x ℓ(f_θ(x_i), y_i) as the input of g_φ, i.e., A_{f_θ}(x_i, y_i) = [x_i, ∇_x ℓ(f_θ(x_i), y_i)] and δ_i = g_φ([x_i, ∇_x ℓ(f_θ(x_i), y_i)]). This helps to handle the constraint δ ∈ B(0, ε).
Figure 3: The architecture of PGM adversarial training with the gradient attacker network.
Since more information is provided, we expect the attacker network to learn more effectively and yield more powerful perturbations.
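A minimal sketch of a gradient attacker is shown below. The convolutional body is only a stand-in for the architecture in Table 1, and the ε-scaled tanh output enforces g_φ(·) ∈ B(0, ε); all names and default values are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradAttacker(nn.Module):
    """Maps A_{f_theta}(x, y) = [x, grad_x loss] (6 channels for RGB) to a perturbation in B(0, eps)."""
    def __init__(self, eps=8/255):
        super().__init__()
        self.eps = eps
        self.body = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x, grad):
        a = torch.cat([x, grad], dim=1)              # concatenate image and gradient
        return self.eps * torch.tanh(self.body(a))   # tanh + eps scaler keeps ||delta||_inf <= eps

def attacker_input(f, x, y):
    """One backpropagation to obtain grad_x loss(f_theta(x), y) as part of the attacker input."""
    x = x.detach().requires_grad_(True)
    loss = F.cross_entropy(f(x), y)
    return torch.autograd.grad(loss, x)[0]
```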
Multi-Step Gradient Attacker.
Motivated by PGM, we adopt an RNN to mimic a multi-step gradient update. Specifically, we use the gradient attacker network as the cell of the RNN, sharing the same parameter φ. As we mentioned earlier, the number of layers/iterations in the RNN for modeling algorithms cannot be very large, so as to avoid a significant computational burden in backpropagation. In this paper, we focus on a length-two RNN to mimic a two-step gradient update. The corresponding perturbation becomes

\[
\widetilde{x}_i = x_i + \Pi_{\epsilon}\Big(\delta_i^{(0)} + g_{\phi}\big(\big[\widetilde{x}_i^{(0)}, \nabla_x \ell(f_{\theta}(\widetilde{x}_i^{(0)}), y_i)\big]\big)\Big),
\]

where x̃_i^{(0)} = x_i + δ_i^{(0)} and δ_i^{(0)} = g_φ([x_i, ∇_x ℓ(f_θ(x_i), y_i)]).
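A sketch of this two-step computation is below; `attacker` denotes a gradient-attacker network whose parameters are shared across both steps, the clamp implements Π_ε, and the helper for the input gradient mirrors the one in the previous sketch.

```python
import torch
import torch.nn.functional as F

def grad_wrt_input(f, x, y):
    x = x.detach().requires_grad_(True)
    loss = F.cross_entropy(f(x), y)
    return torch.autograd.grad(loss, x)[0]

def two_step_adversarial(f, attacker, x, y, eps=8/255):
    # Step 1: delta^(0) = g_phi([x, grad_x loss(f(x), y)]).
    delta0 = attacker(x, grad_wrt_input(f, x, y))
    x0 = x + delta0
    # Step 2: x_tilde = x + Pi_eps( delta^(0) + g_phi([x^(0), grad_x loss(f(x^(0)), y)]) ).
    delta = (delta0 + attacker(x0, grad_wrt_input(f, x0, y))).clamp(-eps, eps)
    return x + delta
```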
Algorithm 3 Learning-to-learn-based DRO with gradient attacker
Input: {(x_i, y_i)}_{i=1}^n: clean data, α_1, α_2: learning rates, N: number of epochs.
for t ← 1 to N do
    Sample a minibatch M_t
    for i in M_t do
        u_i ← ∇_x ℓ(f_θ(x_i), y_i),  δ_i ← g_φ([x_i, u_i])    // Generate perturbation by g_φ.
    θ ← θ − (α_1/|M_t|) Σ_{i ∈ M_t} ∇_θ ℓ(f_θ(x_i + δ_i), y_i)    // Update θ over adversarial data.
    φ ← φ + (α_2/|M_t|) Σ_{i ∈ M_t} ∇_φ ℓ(f_θ(x_i + δ_i), y_i)    // Update φ over adversarial data.

Taking the gradient attacker as an example, Figure 3 illustrates how L2L works and jointly trains the two networks: the first forward pass is used to obtain the gradient of the classification loss over the clean data; the second forward pass is used to generate the perturbation δ_i by the attacker g_φ; the third forward pass is used to calculate the adversarial loss ℓ in (8). Since our gradient attacker network only needs one backpropagation to query the gradient, it amortizes the adversarial training cost, which leads to better computational efficiency. Moreover, L2L may adapt to the underlying optimization problem and yield a better solution for the follower problem. The corresponding procedure of L2L is shown in Algorithm 3.
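A compact sketch of one epoch of Algorithm 3 is given below. The optimizers and data loader are placeholders; after a single backward pass, the classifier descends on the adversarial loss while the attacker ascends on it (by flipping the sign of its gradients before its optimizer step).

```python
import torch
import torch.nn.functional as F

def l2l_dro_epoch(f, g, loader, opt_theta, opt_phi):
    """f: classifier f_theta; g: attacker g_phi taking (image, input-gradient) as input."""
    for x, y in loader:
        # Pass 1: query grad_x loss on clean data (treated as an input, hence detached).
        xg = x.detach().requires_grad_(True)
        u = torch.autograd.grad(F.cross_entropy(f(xg), y), xg)[0]
        # Pass 2: generate the perturbation; Pass 3: adversarial loss in (8).
        delta = g(x, u)
        loss = F.cross_entropy(f(x + delta), y)

        opt_theta.zero_grad()
        opt_phi.zero_grad()
        loss.backward()
        opt_theta.step()                              # theta <- theta - alpha_1 * grad_theta loss
        for p in g.parameters():                      # flip sign so the attacker ascends
            if p.grad is not None:
                p.grad.neg_()
        opt_phi.step()                                # phi <- phi + alpha_2 * grad_phi loss
```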
Algorithm 4 Learning-to-learn with Adversarial Interpolation Training
Input: {(x_i, y_i)}_{i=1}^n: data, α: learning rate, N: number of iterations, ε, ε_y: perturbation magnitudes.
for t ← 1 to N do
    Sample a minibatch M_t
    for i in M_t do
        Sample another index j
        ỹ_i ← (1 − ε_y) y_i + ε_y (1 − y_j)/(C − 1)
        u_i ← ∇_{x_i} q_{f_θ}(x_i, x_j),  δ_i ← g_φ([x_i, u_i])    // Generate perturbation by g_φ.
    φ ← φ − (α/|M_t|) Σ_{i ∈ M_t} ∇_φ q_{f_θ}(x_i + δ_i, x_j)    // Update φ over adversarial data.
    θ ← θ − (α/|M_t|) Σ_{i ∈ M_t} ∇_θ ℓ(f_θ(x_i + δ_i), ỹ_i)    // Update θ over adversarial data.

It is straightforward to extend L2L to AIT, as shown in Algorithm 4. For the feature perturbation, we simply replace the gradient of ℓ, ∇_x ℓ(f_θ(x_i), y_i), in the attacker input by the gradient of

\[
q_{f_{\theta}}(x_i, x_j) = \frac{f_{\theta}^{s}(x_i) \cdot f_{\theta}^{s}(x_j)}{\|f_{\theta}^{s}(x_i)\| \, \|f_{\theta}^{s}(x_j)\|},
\]

that is, ∇_{x_i} q_{f_θ}(x_i, x_j). Taking the gradient attacker as an example, given a sample (x_i, y_i), we first randomly select another sample (x_j, y_j) and yield the adversarial training feature as

\[
\widetilde{x}_i = x_i + g_{\phi}\big(\big[x_i, \nabla_{x_i} q_{f_{\theta}}(x_i, x_j)\big]\big), \tag{9}
\]

and adopt the corresponding label vector ỹ_i from (7).

To demonstrate the effectiveness and computational efficiency of our methods, we conduct experiments over both the CIFAR-10 and CIFAR-100 datasets. We compare our methods with the original PGM training and adversarial interpolation training. All implementations are done in PyTorch with a single NVIDIA 2080 Ti GPU. Here we discuss the white-box setting, which is the most direct way to evaluate the robustness.
Classifier Network.
All experiments adopt a 34-layer wide residual network (WRN-34-10, Zagoruyko and Komodakis (2016)) implemented by Zhang et al. (2019) as the classifier network. For each method, we train the classifier network from scratch.
Attacker.
Table 1 presents the architecture of our attacker network. The ResBlock uses the same structure as the generator proposed in Miyato et al. (2018); the detailed structure of the ResBlock is provided in Appendix C. Batch normalization (BN) and activations, e.g., ReLU and tanh, are applied when specified. The tanh function makes it easy for the output of the attacker to satisfy the constraints. We provide another attacker architecture with down-sampling modules in Appendix C; with such an attacker, L2L adversarial training is less stable, but faster.
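As a sketch, the attacker listed in Table 1 below could be assembled as follows; `res_block` stands for the residual block of Miyato et al. (2018) described in Appendix C, and its exact constructor signature is an assumption.

```python
import torch
import torch.nn as nn

class Attacker(nn.Module):
    """Conv-BN-ReLU -> three ResBlocks (128/256/128 channels) -> Conv -> tanh, scaled by eps."""
    def __init__(self, res_block, eps, in_channels=6):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=1, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            res_block(64, 128),
            res_block(128, 256),
            res_block(256, 128),
            nn.Conv2d(128, 3, 3, stride=1, padding=1), nn.Tanh(),
        )

    def forward(self, a):                 # a = A_{f_theta}(x, y), e.g., [x, grad]
        return self.eps * self.net(a)     # perturbation in B(0, eps)
```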
Table 1: Attacker architecture: k, c, s, and p denote the kernel size, output channels, stride, and padding parameters of convolutional layers, respectively.
Conv: [k = 3 × 3, c = 64, s = 1, p = 1], BN + ReLU
ResBlock: [k = 3 × 3, c = 128, s = 1, p = 1]
ResBlock: [k = 3 × 3, c = 256, s = 1, p = 1]
ResBlock: [k = 3 × 3, c = 128, s = 1, p = 1]
Conv: [k = 3 × 3, c = 3, s = 1, p = 1], tanh

White-box and Black-box. We compare different methods under both white-box and black-box settings. Under the white-box setting, attackers can access all parameters of the target models and generate adversarial examples based on the models; whereas under the black-box setting, accessing parameters is prohibited. Therefore, we adopt the standard transfer attack method from Liu et al. (2016).
Robust Evaluation. We evaluate the robustness of the networks by PGM and CW attacks with the maximum perturbation magnitude ε (after rescaling the pixels to [0, 1]) over CIFAR-10 and CIFAR-100. For the PGM attack, we use 20- and 100-iteration PGM with a fixed perturbation step size η, and for each sample we initialize the adversarial perturbation randomly. For the CW attack, we adopt the implementation from Paszke et al. (2017) and fix the maximum number of iterations. For each method, we repeat the runs with different random initial seeds and report the worst result. For CIFAR-10, we also evaluate the robustness of our Grad L2L and 2-Step L2L networks using random attacks, for which we uniformly sample perturbations within the allowed ℓ∞ ball and add them to each test sample. We also evaluate the robustness of our Grad L2L and 2-Step L2L networks under their own attackers.

For simplicity, we denote by PGM Net the classifier with PGM training, and by Naive L2L, Grad L2L, and 2-Step L2L the classifiers using L2L training with the corresponding attackers.
Original PGM.
For CIFAR-10, we directly report the result from Madry et al. (2017) as the baseline; for CIFAR-100, we train a PGM Net as the baseline. To update the classifier's parameter θ, we use the stochastic gradient descent (SGD) algorithm with Polyak's momentum (Liu et al., 2018) and weight decay (Krogh and Hertz, 1992). In addition, we adapt the setting from Madry et al. (2017) but train the network with learning-rate decay schedule [30, 60, 90]. We use a 10-iteration PGM with a fixed perturbation step size in (5) to generate adversarial samples.

PGM+L2L.
We train two networks for 100 epochs. To update the classifier's parameter θ, we use the same configuration as the original PGM training; to update the attacker's parameter φ, we use Adam (Kingma and Ba, 2014) with a fixed initial learning rate (no learning-rate decay) and weight decay, so that it adaptively balances the updates in both the leader and follower optimization problems.

Due to the space limit, we leave the results of the black-box setting in Appendix B. A more detailed robustness checklist is provided in Appendix D.
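The two-time-scale optimizer setup above can be sketched as follows; the model objects and all hyperparameter values here (learning rates, momentum, weight decay, decay factor) are placeholders, not the exact values used in our experiments.

```python
import torch
import torch.nn as nn

classifier = nn.Linear(8, 8)   # placeholder for the WRN-34-10 classifier
attacker = nn.Linear(8, 8)     # placeholder for the attacker network

# SGD with momentum and weight decay for the classifier parameter theta.
opt_theta = torch.optim.SGD(classifier.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=2e-4)
sched_theta = torch.optim.lr_scheduler.MultiStepLR(opt_theta, milestones=[30, 60, 90], gamma=0.1)

# Adam for the attacker parameter phi, with a fixed learning rate (no decay schedule).
opt_phi = torch.optim.Adam(attacker.parameters(), lr=1e-3, weight_decay=2e-4)
```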
Table 2: Results of different defense methods under the white-box setting.

Defense Method | Attack | Dataset | Clean Accuracy | Robust Accuracy
Stability Training (Zheng et al., 2016) | PGM-20 | CIFAR10 | 94.64% | 0.15%
PGM Net (Madry et al., 2017) | PGM-20 | CIFAR10 | 87.30% | 47.04%
Naive L2L | PGM-20 | CIFAR10 | 94.53% | 0.01%
Grad L2L | PGM-20 | CIFAR10 | 85.84% | 51.17%
2-Step L2L | PGM-20 | CIFAR10 | 85.35% | 54.32%
Grad L2L | PGM-100 | CIFAR10 | 85.84% | 47.72%
2-Step L2L | PGM-100 | CIFAR10 | 85.35% | 52.12%
Grad L2L | CW | CIFAR10 | 85.84% | 53.5%
2-Step L2L | CW | CIFAR10 | 85.35% | 57.07%
Grad L2L | Random | CIFAR10 | 85.84% | 82.67%
2-Step L2L | Random | CIFAR10 | 85.35% | 83.10%
Grad L2L | Grad L2L | CIFAR10 | 85.84% | 49.68%
2-Step L2L | 2-Step L2L | CIFAR10 | 85.35% | 52.71%
PGM Net | PGM-20 | CIFAR100 | 62.68% | 23.75%
Grad L2L | PGM-20 | CIFAR100 | 62.18% | 28.67%
2-Step L2L | PGM-20 | CIFAR100 | 60.95% | 31.03%
PGM Net | PGM-100 | CIFAR100 | 62.68% | 22.06%
Grad L2L | PGM-100 | CIFAR100 | 62.18% | 26.69%
2-Step L2L | PGM-100 | CIFAR100 | 60.95% | 29.75%
PGM Net | CW | CIFAR100 | 62.68% | 25.95%
Grad L2L | CW | CIFAR100 | 62.18% | 29.65%
2-Step L2L | CW | CIFAR100 | 60.95% | 32.28%
Experiment Results.
Table 2 shows the results of all PGM training methods over CIFAR-10 and CIFAR-100 under the white-box setting. As can be seen, without gradient information, Naive L2L is vulnerable to the PGM attack. However, when the attacker utilizes the gradient information, Grad L2L and 2-Step L2L significantly outperform the PGM Net over CIFAR-10 and CIFAR-100, with a slight loss in clean accuracy. From the experiments on CIFAR-10, our Grad L2L and 2-Step L2L are robust to random attacks, where the accuracy is only slightly lower than the clean accuracy. Furthermore, the accuracy of our Grad/2-Step L2L models under the Grad/2-Step L2L attacker is comparable to the accuracy under PGM attacks, which shows that L2L attackers are able to generate strong attacks. As can be seen, PGM-100 is stronger than the Grad L2L attacker (47.72% vs. 49.68%), but similar to the 2-Step L2L attacker (52.12% vs. 52.71%). This means the 2-Step L2L attacker is much stronger than the Grad L2L attacker and explains why 2-Step L2L is stronger than Grad L2L and PGM Net.

In addition, Table 3 shows the one-epoch running time of all methods over CIFAR-10 and CIFAR-100. As can be seen, Grad L2L and 2-Step L2L are much faster than PGM Net. By further comparing the accuracy of Grad/2-Step L2L and PGM Net in Table 2, we find that our proposed L2L methods enjoy computational efficiency. In addition, Figure 4 presents the robust accuracy against the number of iterations (with fixed perturbation magnitude ε) and against the perturbation magnitude (with fixed number of iterations T = 10). As can be seen, 2-Step L2L is much more robust than PGM Net.

Figure 4: Robust accuracy against perturbation magnitude and number of iterations of PGM over CIFAR-100 adversarial samples. (Top) Absolute accuracy; (Bottom) Performance gain over PGM Net. More results are provided in Appendix D.

Table 3: One-epoch running time. (Unit: s)
Dataset | Plain Net | PGM Net | Naive L2L | Grad L2L | 2-Step L2L
CIFAR-10 | . ± . | . ± . | . ± . | . ± . | . ± .
CIFAR-100 | . ± . | . ± . | . ± . | . ± . | . ± .

We conduct the experiments of AIT over CIFAR-10 using the code from Haichao Zhang (2019).

Original AIT.
We follow the experimental setting in Haichao Zhang (2019), but use a WRN-34-10. To update the classifier's parameter θ, we use the same configuration as in the original PGM training. We choose a fixed perturbation magnitude ε_y over the label. In addition, we train the whole network for 200 epochs with learning-rate decay schedule [60, 90]. Moreover, in each epoch, we first use FGSM to yield training samples via (7), and then train the AIT Net over these adversarial samples.

AIT+L2L.
We train two networks for 200 epochs. To update the classifier's parameter θ, we adopt the SGD configuration from the original AIT; to update the attacker's parameter φ, we use Adam with a fixed initial learning rate (no learning-rate decay) and weight decay.

(Code from Haichao Zhang (2019): https://github.com/Adv-Interp/adv_interp)

Table 4: Results of AIT-based defense methods under the white-box setting.
Defense Method | Attack | Dataset | Clean Accuracy | Robust Accuracy
AIT | PGM-20 | CIFAR10 | 90.43% | 75.33%
Grad L2L | PGM-20 | CIFAR10 | 91.65% | 80.87%
AIT | PGM-100 | CIFAR10 | 90.43% | 67.84%
Grad L2L | PGM-100 | CIFAR10 | 91.65% | 79.20%
AIT | CW-20 | CIFAR10 | 90.43% | 64.79%
Grad L2L | CW-20 | CIFAR10 | 91.65% | 74.88%
AIT | CW-100 | CIFAR10 | 90.43% | 61.69%
Grad L2L | CW-100 | CIFAR10 | 91.65% | 73.46%
Experiment Results.
Table 4 shows the results of AIT methods over CIFAR-10 under the white-box setting. As can be seen, Grad L2L significantly improves upon the AIT Net over CIFAR-10 in both clean accuracy and robust accuracy.
Figure 5 provides an illustrative example of adversarial perturbations generated by FGSM, PGM-20, and the 2-Step L2L attacker for a cat in CIFAR-10. As can be seen, the attacks for these two networks are very different. Moreover, the perturbation generated by the 2-Step L2L attacker is much smoother than those of FGSM and PGM. In this example, 2-Step L2L labels all adversarial samples correctly, whereas the PGM Net is fooled by the PGM-20 attack and misclassifies the image as a dog.

Figure 6 provides an illustrative example of adversarial perturbations generated by PGM, AIT, and Grad L2L for a dog in CIFAR-10. As can be seen, the attacks for these two networks are very different: the attacks for Grad L2L are richer across the three channels. In this example, Grad L2L labels all adversarial samples correctly, whereas the AIT Net is fooled by all attacks and misclassifies the image as a horse.

Our proposed L2L framework is quite general and applicable to a broad class of minimax optimization problems. We present an extension of our proposed L2L framework to generative adversarial imitation learning (GAIL, Ho and Ermon (2016)) and provide numerical experiments comparing the original GAIL and GAIL with L2L on two environments: CartPole and Mountain Car (Brockman et al., 2016).
Figure 5: Illustrative adversarial examples of FGSM (top), PGM-20 (middle), and 2-Step L2L (bottom) perturbations for a cat under PGM Net and 2-Step L2L. Panels: (a) PGM Net adversarial samples; (b) 2-Step L2L adversarial samples. Columns: adversarial sample, 5 times magnitude, 30 times magnitude.
Figure 6: Illustrative adversarial examples of PGM-20 (top), AIT (middle), and Grad L2L (bottom) perturbations for a dog under AIT Net and Grad L2L. Panels: (a) AIT adversarial samples; (b) Grad L2L adversarial samples. Columns: adversarial sample, 5 times magnitude, 30 times magnitude.

GAIL aims to learn a policy from an expert's behavior by recovering the expert's cost function and extracting a policy from the recovered cost function, which can also be formulated as a bilevel optimization problem:

\[
\min_{\theta_{\pi}} \; \mathbb{E}_{s, a \sim \pi(s; \theta_{\pi})}\big[\log D(s, a; \theta^*_D)\big] + \mathbb{E}_{\widetilde{s}, \widetilde{a} \sim \pi_E}\big[\log\big(1 - D(\widetilde{s}, \widetilde{a}; \theta^*_D)\big)\big] - \lambda H(\pi),
\]
\[
\text{subject to} \quad \theta^*_D \in \mathop{\mathrm{argmax}}_{\theta_D} \; \mathbb{E}_{s, a \sim \pi(s; \theta_{\pi})}\big[\log D(s, a; \theta_D)\big] + \mathbb{E}_{\widetilde{s}, \widetilde{a} \sim \pi_E}\big[\log\big(1 - D(\widetilde{s}, \widetilde{a}; \theta_D)\big)\big] - \lambda H(\pi), \tag{10}
\]

where π(·; θ_π) is the trained policy parameterized by θ_π, π_E denotes the expert policy, D(·, ·; θ_D) is the discriminator parameterized by θ_D, λH(π) denotes an entropy regularizer with tuning parameter λ, and (s, a) and (s̃, ã) denote the state-action pairs for the trained policy and the expert policy, respectively. By optimizing (10), the discriminator D distinguishes the state-action pairs (s, a) generated from the learned policy π from the sampled trajectories (s̃, ã) generated by some expert policy π_E. In the original GAIL training, in each iteration we update the parameter θ_D of D by stochastic gradient ascent and then update θ_π by trust region policy optimization (TRPO, Schulman et al. (2015)).

Similar to adversarial training with L2L, we apply our L2L framework to GAIL by parameterizing the inner optimizer as a neural network U(·; θ_U) with parameter θ_U. Its input contains two parts: the parameter θ_D and the gradient of the loss function with respect to θ_D,

\[
g_D(\theta_D, \theta_{\pi}) = \mathbb{E}_{s, a \sim \pi(s; \theta_{\pi})}\big[\nabla_{\theta_D} \log D(s, a; \theta_D)\big] + \mathbb{E}_{\widetilde{s}, \widetilde{a} \sim \pi_E}\big[\nabla_{\theta_D} \log\big(1 - D(\widetilde{s}, \widetilde{a}; \theta_D)\big)\big].
\]

In practice, we use a minibatch (several sample trajectories) to estimate g_D(θ_D, θ_π), denoted by ĝ_D(θ_D, θ_π). Specifically, at the t-th iteration, we first calculate ĝ_D^t = ĝ_D(θ_D^t, θ_π^t) and then update θ_D^{t+1} = U(θ_D^t, ĝ_D^t; θ_U^t). Next, we update θ_U by gradient ascent based on the sample estimate of

\[
\mathbb{E}_{s, a \sim \pi(s; \theta_{\pi}^{t})}\big[\nabla_{\theta_U} \log D(s, a; \theta_D^{t+1})\big] + \mathbb{E}_{\widetilde{s}, \widetilde{a} \sim \pi_E}\big[\nabla_{\theta_U} \log\big(1 - D(\widetilde{s}, \widetilde{a}; \theta_D^{t+1})\big)\big].
\]

The detailed algorithm is presented in Algorithm 5.
Algorithm 5
Learning-to-learn-based generative adversarial imitation learning
Input: π_E(s̃): expert; θ_π: policy; θ_D: discriminator; θ_U: updater.
for t ← 1 to N do
    Sample trajectories ((s, a) ∼ π(·; θ_π)) and expert trajectories ((s̃, ã) ∼ π_E).
    g_D^t ← (1/|(s,a)|) Σ_{(s,a)} ∇_{θ_D} log D(s, a; θ_D^t) + (1/|(s̃,ã)|) Σ_{(s̃,ã)} ∇_{θ_D} log(1 − D(s̃, ã; θ_D^t))    // Compute gradient.
    θ_D^{t+1} = U(θ_D^t, g_D^t; θ_U^t)    // Update the discriminator parameters.
    θ_U^{t+1} ← argmax_{θ_U} (1/|(s,a)|) Σ_{(s,a)} log D(s, a; θ_D^{t+1}) + (1/|(s̃,ã)|) Σ_{(s̃,ã)} log(1 − D(s̃, ã; θ_D^{t+1}))    // Update θ_U of the updater.
    Update the policy parameter θ_π by a policy step using the TRPO rule (Ho and Ermon, 2016).

Updater Architecture.
We use a simple 3-layer perceptron with a skip layer as our updater. The numbers of hidden units are (m → m → m → m), where m is the dimension of θ_D, which depends on the original task. For the first and second layers, we use the Parametric ReLU (PReLU, He et al. (2015)) as the activation function, while the last layer has no activation function. Finally, we add the output to the θ_D in the original input to obtain the updated parameters for the discriminator network.
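A sketch of this updater is given below; concatenating the flattened θ_D and g_D as the network input (dimension 2m) is our assumption about the exact interface, and the layer widths follow the (m → m → m → m) description above.

```python
import torch
import torch.nn as nn

class Updater(nn.Module):
    """3-layer perceptron with PReLU activations and a skip connection back to theta_D."""
    def __init__(self, m):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * m, m), nn.PReLU(),
            nn.Linear(m, m), nn.PReLU(),
            nn.Linear(m, m),                      # last layer: no activation
        )

    def forward(self, theta_d, g_d):
        out = self.net(torch.cat([theta_d, g_d], dim=-1))
        return theta_d + out                      # skip layer: add the output to theta_D
```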
Hyperparameter Settings. For all baselines we exactly follow the setting in Ho and Ermon (2016), except that we use a 2-layer discriminator with tanh as the activation function. We use the same neural network architecture for π and the same optimizer configuration. The expert trajectories are obtained by an expert trained using TRPO. For L2L-based GAIL, we also use the Adam optimizer to update θ_U with the same configuration as for updating θ_D in the original GAIL.
As can be seen in Figure 7, GAIL has a sudden performance drop after training for a long time. We conjecture that this is because the discriminator overfits the expert trajectories and converges to a bad optimum, which does not generalize. On the other hand, GAIL with L2L is much more stable. This is very important for real applications of GAIL: since the reward in a real-world environment is usually inaccessible, we cannot know whether there is a sudden performance drop or not. With L2L, we can stabilize the training and obtain a much more reliable algorithm for real-world applications.
Figure 7: Reward vs. iteration of the trained policy using the original GAIL and L2L GAIL under two environments: Mountain Car and CartPole.
We discuss several closely related works:
• By leveraging the Fenchel duality and a feature embedding technique, Dai et al. (2016) convert a problem of learning a conditional distribution into a minimax problem, which is similar to our naive attacker. Both approaches, however, lack the primal information. In contrast, the gradient attacker network considers the gradient information of the primal variables and achieves good results.
• Goodfellow et al. (2014a) propose the GAN, which is very similar to our L2L framework. Both GAN and L2L contain one generator network and one classifier network, and jointly train these two networks. There are two major differences between GAN and our framework: (1) GAN aims to transform random noise into synthetic data that resembles the training examples, while ours targets transforming the training examples into adversarial examples for robustifying the classifier; (2) our generator network does not only take the training examples (analogous to the random noise in GAN) as the input, but also exploits the gradient information of the objective function, since it essentially represents an optimization algorithm. The training procedures of the two, however, are quite similar. We adopt some tricks from GAN training to stabilize our training process, e.g., in Grad L2L, we use the two-time-scale trick (Heusel et al., 2017).
• There are some other works simply combining the GAN framework and adversarial training together. For example, Baluja and Fischer (2017) and Xiao et al. (2018) propose some ad hoc GAN-based methods to robustify neural networks. Specifically, for generating adversarial examples, they only take training examples as the input of the generator, which lacks the information of the outer minimization problem. Instead, our proposed L2L methods (e.g., Grad L2L, 2-Step L2L) connect the outer and inner problems by delivering the gradient information of the objective function to the generator. This is a very important reason for our performance gain on the benchmark datasets. As a result, the aforementioned GAN-based methods are only robust to simple attacks, e.g., FGSM, on simple datasets, e.g., MNIST, but fail against strong attacks, e.g., PGM and CW, on complicated datasets, e.g., CIFAR, where our L2L methods achieve significantly better performance.
Training Stability: To improve training stability, we use both the clean image and the corresponding gradient as the input of the attacker. Without such gradient information, the attacker severely suffers from training instability, e.g., the naive attacker network. Furthermore, we tried another architecture with downsampling modules, called the "slim attacker" in Appendix C. We observed that the slim attacker also suffers from training instability; we suspect that the downsampling causes a loss of information. Thus, we tried to enhance the slim attacker with skip layer connections. In this way, the training is stabilized; however, the robust performance is still worse than that of the proposed architecture.
Benefits of our L2L approach in adversarial training: (1) Since neural networks are known to be powerful in function approximation, our attacker network g_φ can yield strong adversarial perturbations. Since the perturbations for all samples are generated by the same attacker, g_φ essentially learns common structures shared across all samples. (2) Overparametrization is conjectured to ease the training of deep neural networks. We believe that a similar phenomenon occurs with our attacker network and eases the adversarial training.
This paper proposes an L2L framework to improve adversarial training, which is a bilevel optimization problem. Instead of applying hand-designed algorithms to the follower problem, we learn an optimizer parametrized by a neural network. Our numerical results show that our proposed methods significantly improve the robustness of neural networks and enjoy computational efficiency.

We remark that bilevel problems are notorious for their difficulty, and most existing algorithms are heuristic and ad hoc, working only for a specific small class of problems. Our proposed L2L framework is well structured and can be generalized to solve more complicated bilevel problems, e.g., GAIL. Taking our results as a start, we expect more principled and stronger follow-up work that applies L2L to solve bilevel problems.
References
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B. and De Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems.

Athalye, A., Carlini, N. and Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420.

Baluja, S. and Fischer, I. (2017). Adversarial transformation networks: Learning to generate adversarial examples. arXiv preprint arXiv:1703.09387.

Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J. and Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.

Carlini, N., Athalye, A., Papernot, N., Brendel, W., Rauber, J., Tsipras, D., Goodfellow, I. and Madry, A. (2019). On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705.

Carlini, N. and Wagner, D. (2017). Towards evaluating the robustness of neural networks. IEEE.

Colson, B., Marcotte, P. and Savard, G. (2007). An overview of bilevel optimization. Annals of Operations Research.

Dai, B., He, N., Pan, Y., Boots, B. and Song, L. (2016). Learning from conditional distributions via dual embeddings. arXiv preprint arXiv:1607.04579.

Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X. and Li, J. (2018). Boosting adversarial attacks with momentum. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Finn, C., Abbeel, P. and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400.

Gao, R. and Kleywegt, A. J. (2016). Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199.

Girshick, R., Donahue, J., Darrell, T. and Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. (2014a). Generative adversarial nets. In Advances in Neural Information Processing Systems.

Goodfellow, I. J., Shlens, J. and Szegedy, C. (2014b). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

Haichao Zhang, W. X. (2019). Adversarial interpolation training: A simple approach for improving model robustness. URL https://openreview.net/pdf?id=Syejj0NYvr

He, K., Zhang, X., Ren, S. and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision.

He, K., Zhang, X., Ren, S. and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. and Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems.

Ho, J. and Ermon, S. (2016). Generative adversarial imitation learning. CoRR abs/1606.03476. URL http://arxiv.org/abs/1606.03476

Hochreiter, S., Younger, A. S. and Conwell, P. R. (2001). Learning to learn using gradient descent. In International Conference on Artificial Neural Networks. Springer.

Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep., Citeseer.

Krogh, A. and Hertz, J. A. (1992). A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems.

Kurakin, A., Goodfellow, I. and Bengio, S. (2016). Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.

Li, Y., Li, L., Wang, L., Zhang, T. and Gong, B. (2019). NATTACK: Learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. arXiv preprint arXiv:1905.00441.

Liu, T., Chen, Z., Zhou, E. and Zhao, T. (2018). Toward deeper understanding of nonconvex stochastic optimization with momentum using diffusion approximations. arXiv preprint arXiv:1802.05155.

Liu, W., Zhang, Y.-M., Li, X., Yu, Z., Dai, B., Zhao, T. and Song, L. (2017). Deep hyperspherical learning. In Advances in Neural Information Processing Systems.

Liu, Y., Chen, X., Liu, C. and Song, D. (2016). Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D. and Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.

Miyato, T., Kataoka, T., Koyama, M. and Yoshida, Y. (2018). Spectral normalization for generative adversarial networks. In International Conference on Learning Representations. URL https://openreview.net/forum?id=B1QRgziT-

Nguyen, A., Yosinski, J. and Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L. and Lerer, A. (2017). Automatic differentiation in PyTorch.

Rahimian, H. and Mehrotra, S. (2019). Distributionally robust optimization: A review. arXiv preprint arXiv:1908.05659.

Samangouei, P., Kabkab, M. and Chellappa, R. (2018). Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605.

Schmidhuber, J. (1987). Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook. Ph.D. thesis, Technische Universität München.

Schmidhuber, J. (1992). Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation.

Schmidhuber, J. (1993). In Neural Networks, 1993, IEEE International Conference on. IEEE.

Schulman, J., Levine, S., Abbeel, P., Jordan, M. and Moritz, P. (2015). Trust region policy optimization. In International Conference on Machine Learning.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. and Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

Tabacof, P. and Valle, E. (2016). Exploring the space of adversarial images. IEEE.

Taigman, Y., Yang, M., Ranzato, M. and Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Xiao, C., Li, B., Zhu, J.-Y., He, W., Liu, M. and Song, D. (2018). Generating adversarial examples with adversarial networks. arXiv preprint arXiv:1801.02610.

Younger, A. S., Hochreiter, S. and Conwell, P. R. (2001). Meta-learning with backpropagation. In Neural Networks, 2001. Proceedings. IJCNN'01. International Joint Conference on, vol. 3. IEEE.

Zagoruyko, S. and Komodakis, N. (2016). Wide residual networks. arXiv preprint arXiv:1605.07146.

Zhang, H., Cisse, M., Dauphin, Y. N. and Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.

Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E. and Jordan, M. I. (2019). Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573.

Zheng, S., Song, Y., Leung, T. and Goodfellow, I. (2016). Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Supplementary Materials
A Limiting Cycle
The limiting cycle is a well-known issue for bilevel machine learning problems [4, 5]. The reason behind the limiting cycle is that, different from minimization problems, a bilevel optimization problem is more complicated and can be highly nonconvex-nonconcave, so the inner problem cannot be solved exactly. Here we provide a simple bilevel problem example, which is convex-concave, but whose iterations still cannot converge due to the inexact solutions. Specifically, we consider the following optimization problem:

\[
\min_{x} \max_{y} \; f(x, y) = xy.
\]

Then, at the t-th iteration, the update direction is (−y_t, x_t). If we start from (1, 0) with a small step size, this update results in a limiting circle near x² + y² = 1 and never reaches the stable equilibrium (0, 0), as shown in Figure 8.

Figure 8: An example of the limiting circle: arrows denote the update directions.
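The following tiny numerical sketch (with an illustrative step size) reproduces this behavior: simultaneous gradient descent-ascent on f(x, y) = xy keeps circling the equilibrium instead of converging.

```python
# Simultaneous gradient descent-ascent on f(x, y) = x * y, starting from (1, 0).
x, y, eta = 1.0, 0.0, 0.1
for t in range(200):
    x, y = x - eta * y, y + eta * x   # update direction (-y_t, x_t)
# The iterates rotate around (0, 0) and slowly spiral outward; they never reach the equilibrium.
print(x, y, x**2 + y**2)
```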
B Black-box Attack

Under the black-box setting, we first train a surrogate model with the same architecture as the target model but a different random seed, and then attackers generate adversarial examples to attack the target model by querying gradients from the surrogate model.

The black-box attack relies heavily on transferability, i.e., the property that adversarial examples for one model are likely to fool others. However, the transferred attack is very unstable and often has a large variation in its effectiveness. Therefore, results under the black-box setting might not be reliable and effective. Thus we only present one result here to demonstrate the robustness of different models.
Table 5: Results of the black-box setting over CIFAR-10. We evaluate L2L methods with slim attacker networks.
Surrogate | Plain Net | | FGSM Net | | PGM Net |
 | FGSM | PGM10 | FGSM | PGM10 | FGSM | PGM10
Plain Net | 40.03 | 5.60 | 74.42 | 75.25 | 67.37 | 65.92
FGSM Net | 79.20 | 85.02 | | | |
Table 6: Experiments under the black-box setting over CIFAR-100. Note that here we only evaluate L2L methods using the slim attacker network.
Surrogate | Plain Net | | FGSM Net | | PGM Net |
 | FGSM | PGM10 | FGSM | PGM10 | FGSM | PGM10
Plain Net | 21.04 | 9.04 | 50.57 | 54.06 | 40.06 | 41.30
FGSM Net | 42.87 | 50.73 | | | |

C Slim Network
Table 7 presents another architecture that we used in L2L. In this network, the second convolutional layer uses downsampling, while the second-to-last deconvolutional layer uses upsampling. Due to the downsampling, this network is computationally cheap and fast: for example, the per-epoch running time of L2L with the slim attacker is 480, whereas L2L with the original architecture is 620. However, it loses some information of the input and is less stable than the original architecture (Table 1). Inspired by residual learning (He et al., 2016), we address the above issues by using a skip layer connection to ease the training of this network. Specifically, the last layer takes the concatenation of A_{f_θ}(x, y) and the output of the second-to-last layer as input. Figure 9 presents the architecture of the ResBlocks. PReLU is a special type of Leaky ReLU with a learnable slope parameter.

Table 7: Slim Attacker Network Architecture.
Conv: [k = 3 × 3, c = 128, s = 1, p = 1], BN + ReLU
ResBlocks: [channels = 256]
ResBlocks: [channels = 128], BN
DeConv: [k = 4 × 4, c = 16, s = 2, p = 1], BN + ReLU
Conv: [k = 3 × 3, c = 3, s = 1, p = 1], tanh
Figure 9:
An illustrative example of the architecture of the ResBlocks.
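A sketch of a ResBlock following Figure 9 might look as below; the 1×1 projection used on the skip path when the channel counts differ is our assumption.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """BN -> PReLU -> Conv -> BN -> PReLU -> Conv, added back to the (projected) input."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.PReLU(),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.PReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1x1 projection so the skip connection matches channels when in_ch != out_ch.
        self.skip = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.skip(x) + self.body(x)
```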
Table 8 shows the results of L2L with the architecture shown in Table 7.
Table 8: Results of L2L with the slim attacker under the white-box setting over CIFAR.
 | CIFAR-10 | | | | CIFAR-100 | | |
 | Clean | FGSM | PGM20 | CW | Clean | FGSM | PGM20 | CW
Naive L2L | 94.41 | 28.44 | 0.01 | 0.00 | 75.27 | 8.47 | 0.05 | 0.00
Grad L2L | 85.31 | 57.44 | 53.02 | 42.72 | 60.60 | 26.58 | 27.37 | 23.14
2-Step L2L | 75.36 | 60.19 | 46.12 | 40.82 | 60.23 | 25.92 | 20.23 | 22.70
D Robustness Evaluation Checklist
Recently, many works on robustness defense have been proven ineffective (Athalye et al., 2018; Carlini et al., 2019). Our work follows the most reliable and widely used robust-model approach, adversarial training, which finds a set of parameters that makes the model robust. We do not make any modification to the final classifier model. Unlike previous works (e.g., Defense-GAN, Samangouei et al. (2018)), our model does not take the attacker as a part of the final model and does not use shattered/obfuscated/masked gradients as a defense mechanism. We also demonstrate that the evaluation of the robustness of our proposed L2L method is trustworthy by verifying all items listed in Carlini et al. (2019).
D.1 Shattered/Obfuscated/Masked Gradient
In this section we verify that our proposed L2L method does not fall into the pitfall of shattered/obfuscated/masked gradients, which has been proven ineffective. To see this, we checked every item recommended in Section 3.1 of Athalye et al. (2018):
• One-step attacks perform better than iterative attacks: Figure 4 shows that the PGM attack is stronger with a larger number of iterations.
• Black-box attacks are better than white-box attacks: Appendix B shows that the black-box transfer attack is much weaker than white-box attacks.
• Unbounded attacks do not reach success: We evaluate the model robustness against attacks with extremely large perturbations to show that unbounded attacks do reach success. Specifically, we use the PGM-10 attack with various perturbation magnitudes ε. Figure 10 shows that the PGM attack eventually reaches success as the perturbation magnitude increases.
Figure 10: Robust accuracy against perturbation magnitudes of PGM over CIFAR-100.
• Random sampling finds adversarial examples: In Table 2, we show that random search is not better than gradient-based methods and is rather weak against our model.
• Increasing distortion bound does not increase success: Figure 4 shows that the PGM attack becomes stronger as the perturbation magnitude increases.