Adversarial Example Decomposition
Horace He, Aaron Lou, Qingxuan Jiang, Isay Katsman, Serge Belongie, Ser-Nam Lim
Abstract
Research has shown that widely used deep neural networks are vulnerable to carefully crafted adversarial perturbations. Moreover, these adversarial perturbations often transfer across models. We hypothesize that adversarial weakness is composed of three sources of bias: architecture, dataset, and random initialization. We show that one can decompose adversarial examples into an architecture-dependent component, a data-dependent component, and a noise-dependent component, and that these components behave intuitively. For example, noise-dependent components transfer poorly to all other models, while architecture-dependent components transfer better to retrained models with the same architecture. In addition, we demonstrate that these components can be recombined to improve transferability without sacrificing efficacy on the original model.
1. Introduction
Due to the recent successes of neural networks on a wide variety of tasks, they are now being widely applied in the real world. However, despite their major successes, recent works have shown that in the presence of adversarially perturbed input, they fail catastrophically (Szegedy et al., 2013; Goodfellow et al., 2014). Moreover, Szegedy et al. (2013) and Goodfellow et al. (2014) showed that inputs adversarially generated for one model often cause other models to misclassify images as well, a phenomenon commonly called transferability.

Our understanding of the causes of transferability is fairly limited. Tramèr et al. (2017) analyzes local similarity of decision boundaries to define a local decision boundary metric that determines how transferable adversarial examples between two models are likely to be. However, many questions are still open. The recent work Wu et al. (2018) hypothesized that adversarial perturbations could be decomposed into initialization-specific and data-dependent components. It also hypothesized that the data-dependent component is primarily what contributes to transfer. However, Wu et al. (2018) provides neither theoretical nor empirical evidence to justify this hypothesis.

*Equal contribution. Department of Computer Science, Cornell University, Ithaca, NY, USA; Facebook AI, New York, NY, USA. Correspondence to: Horace He <[email protected]>. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

Our work aims to examine this hypothesis in greater detail. We first augment the previous hypothesis to provide a decomposition into three parts: architecture-dependent, data-dependent, and noise-dependent components. Given this framework, our contributions are as follows:

• We propose a method for decomposing adversarial perturbations into noise-dependent and noise-reduced components.
• We also present a method to further decompose the noise-reduced component into architecture-dependent and data-dependent components.

• Extensive experiments are conducted on CIFAR-10 (Krizhevsky, 2009) using various architectures to show that the above two decompositions have the desired properties. Results from an ablation study are given to show the significance of the nontrivial choices made in our methodology.
2. Motivation and Approach
Motivated by the reviewers' comments on Wu et al. (2018), we seek to provide further evidence that an adversarial example can be decomposed into model-dependent and data-dependent portions. First, we augment our hypothesis to claim that an adversarial perturbation can be decomposed into architecture-dependent, data-dependent, and noise-dependent components. We note that these are the only factors that could contribute in some way to the adversarial example. An intuition for why noise-dependent components exist and would not transfer, despite working on the original dataset, is shown in Figure 2. Not drawn explicitly in the figure is the architecture-dependent component. As neural networks induce biases in the decision boundary, and specific network architectures induce specific biases, we would expect that an adversarial example could exploit these biases across all models with the same architecture.

Figure 1. Noise Vector Decomposition. Δx_noise, Δx_data, Δx_arch are as defined in Section 2.1. Note the orthogonality of Δx_noise and Δx_arch + Δx_data; though this is only an assumption, it is justified to be reasonable experimentally in the ablation study of Section 3.3.

Figure 2. Varying Decision Boundaries. In the above figure, Δx is the adversarial perturbation, and Δx_noise, Δx_nr are as defined in Section 2.1.

2.1. Notation

We denote A = {A_1, A_2, ..., A_k} to be the set of model architectures. Let M^i = {M^i_α} be a set of fully trained models of architecture A_i, each initialized with random noise. The superscript will be omitted when the architecture is clear. We define an attack A(x, y, M^i_j, L, Δx_0) = Δx, where x is an image, y its corresponding label, M^i_j is a neural network model as defined above, L is a loss function, Δx_0 the initial perturbation of x, and Δx a perturbation of x such that L(M^i_j(x + Δx), y) is maximal.

For fixed architecture A_i, model M_j, and attack A, we denote Δx_noise, Δx_arch, Δx_data to be the three components of Δx introduced above. Let Δx_noise-reduced = Δx_arch + Δx_data; we will use the shorthand Δx_nr. Let P_v(u) denote the projection of vector u onto vector v, and let x̂ = x/‖x‖ be the unit vector with the same direction as x.

2.2. Δx_noise and Δx_nr Decomposition
We fix our architecture A_1 and let {M_1, ..., M_n} be our set of trained models. Set L to be the cross-entropy loss and let Δx = A(x, y, M_1, L, 0) be the generated adversarial perturbation for M_1.

Proposition: Δx can be decomposed into Δx_noise + Δx_nr such that the attack Δx_noise is effective on M_1 but transfers poorly to M_2, ..., M_n, while Δx_nr transfers well on all models.

The equations for computing Δx_nr and Δx_noise are given in Equation 1 (see Appendix C for justification). The technique is illustrated in Figure 1.

$$\Delta x_{nr} \approx \mathcal{A}\Big(x, y, \frac{1}{n-1}\sum_{j=2}^{n} M_j, \mathcal{L}, 0\Big), \qquad \Delta x_{noise} \approx \Delta x - P_{\Delta x_{nr}}(\Delta x) \tag{1}$$

2.3. Δx_arch and Δx_data Decomposition
We reuse notation from the above section, except that we now consider a set of different architectures A = {A_1, A_2, ..., A_k}.

Proposition: Δx_nr can be decomposed into Δx_arch + Δx_data such that the attack Δx_arch is effective on A_1 but transfers poorly to A_2, ..., A_k, while Δx_data transfers well on all models.

The equations for computing Δx_arch and Δx_data are given in Equation 2, in which we set Δx_nr to be the noise-reduced perturbation generated on A_1 (see Appendix C for justification). We approximate the expectation for Δx_data by averaging across architectures.

$$\Delta x_{data} \approx \mathbb{E}_{A_i \in \mathcal{A}}\, \mathcal{A}\Big(x, y, \frac{1}{n-1}\sum_{j=2}^{n} M^{i}_j, \mathcal{L}, \Delta x_{nr}\Big), \qquad \Delta x_{arch} \approx \Delta x_{nr} - P_{\Delta x_{data}}(\Delta x_{nr}) \tag{2}$$
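The projection step shared by Equations 1 and 2 can be sketched numerically. In the sketch below, random vectors stand in for the attack outputs (in practice Δx, Δx_nr, and Δx_data would come from actual iFGSM runs), and `project` is a small helper implementing P_v(u); both names are illustrative, not from the paper's code.

```python
import numpy as np

def project(u, v):
    """P_v(u): the projection of vector u onto vector v."""
    v_hat = v / np.linalg.norm(v)
    return np.dot(u, v_hat) * v_hat

rng = np.random.default_rng(0)
dim = 3 * 32 * 32  # dimensionality of a flattened CIFAR-10 image

# Stand-ins for attack outputs: delta_x from the original model, and
# delta_x_nr from attacking the averaged ensemble (Equation 1, first line).
delta_x = rng.normal(size=dim)
delta_x_nr = rng.normal(size=dim)

# Equation 1, second line: strip the transferable direction out of delta_x.
delta_x_noise = delta_x - project(delta_x, delta_x_nr)

# Equation 2 applies the same projection step one level down, given a
# stand-in delta_x_data from the cross-architecture ensemble attack.
delta_x_data = rng.normal(size=dim)
delta_x_arch = delta_x_nr - project(delta_x_nr, delta_x_data)
```

Note that the residual of each projection is orthogonal to the direction projected onto by construction, which is exactly the orthogonality assumption depicted in Figure 1.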
3. Results
We empirically verify the approaches given in the motivation above and show that the isolated noise- and architecture-dependent perturbations exhibit the desired properties. Unless stated otherwise, all perturbations are generated on CIFAR-10 (Krizhevsky, 2009) (original images rescaled) using iFGSM (Kurakin et al., 2016) with 10 iterations, distance metric L∞, and a fixed ε. All experiments are run on the first 2000 CIFAR-10 test images. In addition, all models are trained for only 10 epochs due to computational constraints. All percentages reported are fooling ratios (Moosavi-Dezfooli et al., 2017). For results with other settings, see Appendix A.

Table 1. Δx_noise decomposition (ResNet18). Rows: Δx, Δx_nr, Δx_noise; columns: M_orig, {M_avg}, {M_test}.

Table 2. Δx_noise decomposition (DenseNet121). Rows: Δx, Δx_nr, Δx_noise; columns: M_orig, {M_avg}, {M_test}.

Tables 1–2. All numbers reported are fooling ratios. Observe that Δx_noise exhibits exceptionally low transferability.

3.1. Δx_noise and Δx_nr Decomposition
We start off with a set of 10 retrained ResNet18 (He et al., 2016) models {M_i}. We attack the first ResNet18 model M_1 (= M_orig) to get a perturbation Δx for a given x. We then follow the process outlined in Equation 1 to obtain Δx_nr from the other 9 retrained ResNet18 models {M_{i>1}} (= {M_avg}). We then test on an untouched set of 5 retrained ResNet18 models {M_test}. We also perform the same process for DenseNet121 (Huang et al., 2017) instead of ResNet18 and report the respective results in Tables 1 and 2.

We note that Δx_noise achieves a far lower transfer rate than either Δx_nr or Δx while still maintaining a relatively high error rate on the original model, providing evidence for the success of this decomposition. To the best of our knowledge, this is the first methodology that is able to construct adversarial examples with especially low transferability. Although this is of low practical use, it is theoretically interesting. We note that although we attempt to generate Δx_nr by multi-fooling across 9 retrained models, reducing noise in high dimensions is difficult, so we are unable to achieve a perfect decomposition of Δx_noise. Ablation studies in Appendix B suggest that we may be able to achieve a better decomposition with a larger set of retrained models.

3.1.1. Recombining Components
As the components Δx̂_noise and Δx̂_nr are linearly independent unit vectors, and by definition Δx is in the span of these vectors, we can find unique scalars a and b such that a·Δx̂_noise + b·Δx̂_nr = Δx. Measuring a and b experimentally under our setting, we note that, for our original perturbation, this is perhaps an undue amount of focus paid to the noise-specific perturbation.

We can now try setting a and b to different ratios, which correspond to how much we wish to emphasize attacking the original model versus transferability. As we are now able to set arbitrarily high a and b, allowing us to saturate the epsilon constraint, we sign-maximize (i.e., take sign(x)·ε, as motivated in Goodfellow et al. (2014)) to level the playing field. Table 3 shows the results of performing these experiments on ResNet18. We find that we are able to generate perturbations that perform equivalently with Δx on M_orig, while performing substantially better when transferring to {M_avg} and {M_test}.

Table 3.
Linear combinations of Δx̂_noise and Δx̂_nr. Columns: b : a, M_orig, {M_avg}, {M_test}; rows: Δx_nr, intermediate ratios, Δx. All numbers reported are fooling ratios. Observe that as the ratio b : a increases, we obtain better transferability with lowered effectiveness on M_orig. Also note that we are able to construct perturbations that are strictly superior to either Δx or Δx_nr.

3.2. Δx_arch and Δx_data Decomposition
To evaluate the decomposition into architecture- and data-specific components, we consider the four architectures ResNet18 (He et al., 2016), GoogLeNet (Szegedy et al., 2015), DenseNet121 (Huang et al., 2017), and SENet18 (Hu et al., 2017). Results are given in Table 4. In each experiment, we first fix a source architecture A_i and generate Δx_nr by attacking 4 retrained copies of A_i, denoted {M_source}. We then generate Δx_data by attacking four copies of each A_{j≠i}, for twelve models in total. We then test on another 4 retrained copies of A_i, called {M'_source}, as well as on {M'_other}, consisting of four copies of each of the other three architectures {A_{j≠i}}.

Table 4. Δx_arch decomposition. For each source architecture, rows Δx_nr, Δx_data, Δx_arch; columns {M_source}, {M_other}, {M'_source}, {M'_other}. All numbers reported are fooling ratios. Note that for all architectures, the adversarial decomposition holds. Namely, Δx_arch is more transferable to its specific architecture than to others, whereas Δx_data is equally transferable across architectures.

We see that for all four models, Δx_arch obtains a significantly higher error rate on {M'_source} than on {M'_other}. In addition, the relative error between {M'_source} and {M'_other} for Δx_arch is close to the relative error between {M_source} and {M_other} for Δx_nr when averaged across models, supporting the success of our decomposition.

3.3. Ablation Study

Orthogonality. We assume that the Δx_noise, Δx_arch, and Δx_data terms are orthogonal. We note that if these vectors had no relation to each other, then due to the properties of high-dimensional space, they would be approximately orthogonal with very high probability.

We vary orthogonality by modifying the method in Section 2.2 to generate Δx_noise as Δx − αP_{Δx_nr}(Δx). When α = 1, we recover the original algorithm, and when α = 0, Δx_noise = Δx. Experimentally varying the orthogonality of Δx_noise and Δx_nr produces the results in Table 5; note that we achieve the greatest difference in efficacy between the original model and transferred models when they are near-orthogonal, suggesting that the assumption we made is reasonable.

However, it is not true that orthogonal components achieve the best isolation: the peak difference occurs at a value of α slightly above 1. This suggests that our current method of decomposition may simply be an approximation of the true components, and that a more nuanced method may be necessary for better isolation.

Number of Models.
We find that the higher the number of models we use to approximate Δx_nr, the more successfully we are able to isolate Δx_noise. See Appendix B for full results.

Table 5.
Varying α. Columns: α, M_orig, {M_avg}, Difference; rows: Δx, intermediate α settings, Δx_nr. All percentages reported are fooling ratios. Note that the maximal difference is produced at a setting slightly different from the assumed orthogonality (α = 1.0).
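The α modification used in this ablation can be sketched numerically. Random vectors again stand in for real attack outputs, and the hypothetical helpers `project` and `cosine` measure how far Δx − αP_{Δx_nr}(Δx) is from being orthogonal to Δx_nr; at α = 1 the cosine vanishes by construction.

```python
import numpy as np

def project(u, v):
    """P_v(u): the projection of vector u onto vector v."""
    v_hat = v / np.linalg.norm(v)
    return np.dot(u, v_hat) * v_hat

def cosine(u, v):
    """Cosine similarity; 0 means the vectors are orthogonal."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(1)
delta_x = rng.normal(size=3072)     # stand-in for the original perturbation
delta_x_nr = rng.normal(size=3072)  # stand-in for the noise-reduced direction

# Sweep alpha in delta_x - alpha * P_{delta_x_nr}(delta_x):
# alpha = 0 recovers delta_x itself; alpha = 1 gives the orthogonal residual.
for alpha in [0.0, 0.5, 1.0, 1.5]:
    candidate = delta_x - alpha * project(delta_x, delta_x_nr)
    print(f"alpha={alpha:.1f}  cos(candidate, delta_x_nr)={cosine(candidate, delta_x_nr):+.3f}")
```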
4. Conclusion
We demonstrate that it is possible to decompose adversarial perturbations into noise-dependent and data-dependent components, a hypothesis reviewers thought was interesting but unsupported in Wu et al. (2018). We go even further by decomposing an adversarial perturbation into model-related, data-related, and noise-related perturbations. A major contribution here is a new method of analyzing adversarial examples; this creates many potential future directions for research. One interesting direction would be extending these decompositions to universal perturbations (Moosavi-Dezfooli et al., 2017; Poursaeed et al., 2017) and thus removing the dependence on individual data points. Another avenue to explore is analyzing various attacks and defenses and how they interplay with these various components.
References
Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, pp. 770–778, 2016.

Hu, J., Shen, L., and Sun, G. Squeeze-and-excitation networks. CoRR, abs/1709.01507, 2017.

Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. In CVPR, pp. 2261–2269, 2017.

Krizhevsky, A. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

Kurakin, A., Goodfellow, I. J., and Bengio, S. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.

Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. In CVPR, pp. 86–94, 2017.

Poursaeed, O., Katsman, I., Gao, B., and Belongie, S. J. Generative adversarial perturbations. CoRR, abs/1712.02328, 2017.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. In CVPR, pp. 1–9, 2015.

Tramèr, F., Kurakin, A., Papernot, N., Boneh, D., and McDaniel, P. D. Ensemble adversarial training: Attacks and defenses. CoRR, abs/1705.07204, 2017.

Wu, L., Zhu, Z., Tai, C., and E, W. Enhancing the transferability of adversarial examples with noise reduced gradient, 2018. URL https://openreview.net/forum?id=ryvxcPeAb.

Appendix
A. Different attack settings
To show that our decomposition is effective across a variety of attack settings, we perform the experiment of Section 3.1 with three different iFGSM settings, each using a different value of ε. Results are shown in Table 6.

Table 6. Varying ε. For each ε setting, rows Δx, Δx_nr, Δx_noise; columns {M_orig}, {M_avg}, {M_test}.

B. Varying number of models/iterations
We investigate the effectiveness of the Section 3.1 decomposition as we vary hyperparameters. Results for increasing iFGSM iterations are given in Table 7, and results for increasing the number of models are given in Table 8.
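The iFGSM attack whose iteration count is varied here can be sketched as the familiar sign-step-and-clip loop. The sketch below substitutes a linear softmax classifier for the paper's CIFAR-10 networks so the gradient can be written analytically and the example stays self-contained; `softmax_ce`, `ifgsm`, and the weight matrix `W` are illustrative stand-ins, not the paper's implementation.

```python
import numpy as np

def softmax_ce(W, x, y):
    """Cross-entropy loss of a linear softmax classifier at input x, label y."""
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[y])

def ifgsm(x, y, W, eps, iters):
    """Iterative FGSM under an L-infinity budget eps: take `iters` sign
    steps of size eps/iters along the input gradient of the loss, clipping
    back into the eps-ball around x after each step."""
    x_adv = x.copy()
    step = eps / iters
    for _ in range(iters):
        logits = W @ x_adv
        p = np.exp(logits - logits.max())
        p /= p.sum()
        onehot = np.zeros_like(p)
        onehot[y] = 1.0
        grad = W.T @ (p - onehot)  # d(cross-entropy)/dx for a linear model
        x_adv = np.clip(x_adv + step * np.sign(grad), x - eps, x + eps)
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3072))  # hypothetical linear classifier weights
x = rng.normal(size=3072)        # stand-in for a flattened CIFAR-10 image
y = 3
x_adv = ifgsm(x, y, W, eps=0.03, iters=10)
print("loss before:", softmax_ce(W, x, y), "after:", softmax_ce(W, x_adv, y))
```

Because the loss is convex in the input for a linear model, each sign step strictly increases it, so the final loss is guaranteed to exceed the clean loss while the perturbation stays within the ε budget.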
Table 7. Varying the number of iterations used for iFGSM. For each iteration setting, rows Δx, Δx_nr, Δx_noise; columns {M_orig}, {M_avg}.

C. Justification of Equations
Justification of Equations in 3.1
Recall that the equations are given by
$$\Delta x_{nr} \approx \mathcal{A}\Big(x, y, \frac{1}{n-1}\sum_{j=2}^{n} M_j, \mathcal{L}, 0\Big), \qquad \Delta x_{noise} \approx \Delta x - P_{\Delta x_{nr}}(\Delta x)$$

We assume that the expected value of the noise term Δx_noise over all random noise is 0. This is motivated by the fact that the random noise at initialization is drawn from a Gaussian distribution centered at 0, and it is reasonable to assume that the distribution of models follows a similar pattern.

Letting Δx_j = A(x, y, M_j, L, 0) over all random initializations j, we claim that E_j[Δx_j] = Δx_arch + Δx_data. Since Δx_arch and Δx_data are noise-independent, we have

$$\Delta x_j = \Delta x^{j}_{noise} + \Delta x_{arch} + \Delta x_{data},$$

where Δx^j_noise is the noise component corresponding to the noise of model M_j. Therefore, it follows that

$$\mathbb{E}_j[\Delta x_j] = \Delta x_{arch} + \Delta x_{data} + \mathbb{E}_j[\Delta x^{j}_{noise}] = \Delta x_{arch} + \Delta x_{data} = \Delta x_{nr}.$$

By the law of large numbers,

$$\lim_{n \to \infty} \frac{1}{n}\sum_{j=1}^{n} \Delta x_j = \Delta x_{nr},$$

so for sufficiently large n,

$$\frac{1}{n}\sum_{j=1}^{n} \Delta x_j \approx \Delta x_{nr}.$$

Since the cross-entropy loss L is additive over the ensemble and the attacks A that we examine are first-order differentiation methods, we have

$$\mathcal{L}\Big(\frac{1}{n-1}\sum_{j=2}^{n} M_j(x), y\Big) = \frac{1}{n-1}\sum_{j=2}^{n} \mathcal{L}(M_j(x), y)$$

and therefore

$$\mathcal{A}\Big(x, y, \frac{1}{n-1}\sum_{j=2}^{n} M_j, \mathcal{L}, 0\Big) \approx \frac{1}{n-1}\sum_{j=2}^{n} \mathcal{A}(x, y, M_j, \mathcal{L}, 0) \approx \Delta x_{nr}.$$

For the other claim, we have shown through empirical results and intuition that Δx_noise and Δx_nr are linearly independent, very close to orthogonal, and compose Δx. Therefore, taking the projection onto Δx_nr gives

$$\Delta x_{noise} + P_{\Delta x_{nr}}(\Delta x) \approx \Delta x \implies \Delta x_{noise} \approx \Delta x - P_{\Delta x_{nr}}(\Delta x)$$

up to a scaling constant.

Table 8. Varying the number of models used to approximate Δx_nr. For each setting, rows Δx, Δx_nr, Δx_noise; columns {M_orig}, {M_avg}, {M_test}.

Justification of Equations in 3.2
Recall that the equations are, given Δx_nr generated on A_1,

$$\Delta x_{data} \approx \mathbb{E}_{A_i \in \mathcal{A}}\, \mathcal{A}\Big(x, y, \frac{1}{n-1}\sum_{j=2}^{n} M^{i}_j, \mathcal{L}, \Delta x_{nr}\Big), \qquad \Delta x_{arch} \approx \Delta x_{nr} - P_{\Delta x_{data}}(\Delta x_{nr}).$$

We make two core assumptions:

• E_A[Δx_arch] = 0. This is a reasonable assumption since our generated architectures A should produce roughly symmetric error vectors Δx_arch.

• A(x, y, (1/(n−1)) Σ_{j=2}^{n} M_j, L, Δx′) is equivalent to A(x, y, (1/(n−1)) Σ_{j=2}^{n} M_j, L, 0), in the sense that the former produces a noise-reduced gradient closer to Δx′. This is reasonable because there are many adversarial perturbations (in different directions), and changing our start location will not cripple our search space. Furthermore, we use this to generate a Δx_nr close to Δx′.

We claim that E_A[Δx_nr] = Δx_data, where we take Δx_nr over architectures A. To see this, we note that

$$\mathbb{E}_{\mathcal{A}}[\Delta x_{nr}] = \mathbb{E}_{\mathcal{A}}[\Delta x_{arch}] + \mathbb{E}_{\mathcal{A}}[\Delta x_{data}] = \mathbb{E}_{\mathcal{A}}[\Delta x_{data}],$$

and so again we can approximate it via

$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} \Delta x^{i}_{nr} = \Delta x_{data},$$

where Δx^i_nr is the Δx_nr component generated for architecture A_i. For sufficiently large n, it follows that

$$\frac{1}{n}\sum_{i=1}^{n} \Delta x^{i}_{nr} \approx \Delta x_{data}.$$

Therefore we have

$$\Delta x_{data} = \mathbb{E}_{A_i \in \mathcal{A}}[\Delta x^{i}_{nr}] = \mathbb{E}_{A_i \in \mathcal{A}}\,\mathbb{E}_j[\mathcal{A}(x, y, M^{i}_j, \mathcal{L}, 0)] \approx \mathbb{E}_{A_i \in \mathcal{A}}\, \mathcal{A}\Big(x, y, \frac{1}{n-1}\sum_{j=2}^{n} M^{i}_j, \mathcal{L}, 0\Big),$$

and by our assumption this is roughly equivalent to

$$\mathbb{E}_{A_i \in \mathcal{A}}\, \mathcal{A}\Big(x, y, \frac{1}{n-1}\sum_{j=2}^{n} M^{i}_j, \mathcal{L}, \Delta x_{nr}\Big),$$

as desired. For the other claim, we use an argument analogous to the one above: we have shown that Δx_arch and Δx_data are orthogonal, and applying the same projection technique yields

$$\Delta x_{arch} + P_{\Delta x_{data}}(\Delta x_{nr}) \approx \Delta x_{nr} \implies \Delta x_{arch} \approx \Delta x_{nr} - P_{\Delta x_{data}}(\Delta x_{nr}).$$
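The law-of-large-numbers step used in both justifications can be illustrated with a toy simulation, under the assumed decomposition Δx_j = Δx^j_noise + Δx_arch + Δx_data with zero-mean Gaussian noise components. All quantities below are synthetic stand-ins, not actual attack outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3072  # dimensionality of a flattened CIFAR-10 image

# Stand-in for the shared transferable part delta_x_arch + delta_x_data.
common = rng.normal(size=dim)

def sample_perturbation():
    # Each "model" contributes the common part plus independent,
    # zero-mean noise, mirroring delta_x_j = noise_j + arch + data.
    return common + rng.normal(size=dim)

# Averaging over more models drives the noise term toward its mean of 0,
# so the average approaches the common part at the usual 1/sqrt(n) rate.
for n in [1, 10, 100, 1000]:
    avg = np.mean([sample_perturbation() for _ in range(n)], axis=0)
    rel_err = np.linalg.norm(avg - common) / np.linalg.norm(common)
    print(f"n={n:4d}  relative error of averaged perturbation: {rel_err:.3f}")
```

This mirrors why attacking an average of many retrained models isolates Δx_nr: the per-model noise components cancel while the shared component survives.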