Regional Image Perturbation Reduces L_p Norms of Adversarial Examples While Maintaining Model-to-Model Transferability
Utku Ozbulak, Jonathan Peck, Wesley De Neve, Bart Goossens, Yvan Saeys, Arnout Van Messem
Abstract
Regional adversarial attacks often rely on complicated methods for generating adversarial perturbations, making it hard to compare their efficacy against well-known attacks. In this study, we show that effective regional perturbations can be generated without resorting to complex methods. We develop a very simple regional adversarial perturbation attack method based on the cross-entropy sign, one of the most commonly used losses in adversarial machine learning. Our experiments on ImageNet with multiple models reveal that, on average, a large portion of the generated adversarial examples maintain model-to-model transferability when the perturbation is applied to local image regions. Depending on the selected region, these localized adversarial examples require significantly less L_p norm distortion (for p ∈ {0, 2, ∞}) compared to their non-local counterparts. These localized attacks therefore have the potential to undermine defenses that claim robustness under the aforementioned norms.
* Equal contribution. Affiliations: Department of Electronics and Information Systems, Ghent University, Ghent, Belgium; Center for Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium; Data Mining and Modeling for Biomedicine, VIB Inflammation Research Center, Ghent, Belgium; Department of Telecommunications and Information Processing, Ghent University - imec, Ghent, Belgium. Correspondence to: Utku Ozbulak <[email protected]>. Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning. Copyright 2020 by the author(s).

Figure 1. (Top) An input image and its adversarial counterpart created with IFGS. (Center and bottom) Perturbation localization grids illustrated with black-gray images, and adversarial examples generated by IFGS when the perturbation is only applied to the grey areas in the localization grids. L_p norms of the perturbation are provided under each image. All of the adversarial examples were generated using AlexNet and successfully transfer to ResNet-50.

1. Introduction

Recent advancements in the field of machine learning (ML), more specifically in deep learning (DL), have substantially increased the adoption rate of automated systems in everyday life (Krizhevsky et al., 2012; He et al., 2016; Xie et al., 2017). However, since their inception, these systems have been criticized for their lack of interpretability: it is often difficult or impossible to know precisely why an ML model produces a specific response for a given input, yet such information is highly relevant in many settings (Ghorbani et al., 2017; Kindermans et al., 2017). One manifestation of this shortcoming of current ML theory to understand DL models is the phenomenon of adversarial examples (Szegedy et al., 2013), which has recently received much attention in the research community. Adversarial examples are data points specifically crafted by an adversary in order to force machine learning models into making mistakes. Often, these artificial examples are visually indistinguishable from natural data points, making it almost impossible for humans to detect them and calling into question the generalization ability of deep neural networks (DNNs) (Schmidt et al., 2018; Ilyas et al., 2019). Formally, adversarial examples are usually defined as follows (Szegedy et al., 2013; Madry et al., 2017). Given an ML model f and an input X, an adversarial example X̃ satisfies (1) ‖X − X̃‖_p ≤ ε for some chosen L_p norm and perturbation budget ε > 0, and (2) f(X) ≠ f(X̃). In other words, the perturbed input X̃ must be "close" to the original input X as measured by an L_p norm, and the classifier f must output different labels for X and X̃. However, for sufficiently small values of ε, the two inputs are indistinguishable and should belong to the same class.
Hence, the existence of adversarial examples for very small perturbation budgets indicates a failure of DL models to accurately capture the data manifold. Interestingly, depending on the attack used, adversarial examples can be highly transferable: an adversarial sample X̃ that fools a certain classifier f can also fool completely different classifiers trained for the same task (Papernot et al., 2016; Cheng et al., 2019). This so-called transferability, i.e., the degree to which an adversarial sample can fool other models, is a popular metric for assessing the effectiveness of a particular attack.

Through generations of research in computer vision, it has been established that certain regions of images are more important for the identification of an object of interest than others (Moravec, 1981; Schmid & Mohr, 1997; Lowe, 2004; Sun et al., 2014; Springenberg et al., 2014; Selvaraju et al., 2016). As such, research on localized adversarial attacks also shows that adversarial perturbation applied to these important regions may change the prediction faster and with less L_p perturbation than attacks that apply the perturbation to the entire image (Su et al., 2017; Karmon et al., 2018; Xu et al., 2018; Zajac et al., 2019). However, analyses to prevent adversarial examples often do not evaluate robustness against such regional attacks. Adversarial defenses are often studied exclusively against well-understood attacks such as FGS (Goodfellow et al., 2014), JSMA (Papernot et al., 2015), IFGS (Kurakin et al., 2016), Carlini & Wagner's attack (Carlini & Wagner, 2016), PGD (Madry et al., 2017), and BPDA (Athalye et al., 2018), where these attacks apply their perturbations to the entire image based on the magnitude of the loss gradient for each pixel and according to the L_p norm constraints they set. We believe this lack of evaluation against regional attacks exists because (1) regional attacks are often studied in permissive white-box settings which do not represent real-world scenarios, and (2) the proposed attacks usually come with a completely new and complicated way of generating adversarial examples, making it not straightforward to apply these attacks to different datasets, especially not locally, as opposed to well-understood attacks.

In this work, we show that so-called "global" adversarial attacks can be easily modified to become localized attacks. As such, different from previous research efforts on localized perturbation, our study does not propose a novel attack. Instead, we introduce a general method for localizing the perturbations generated by existing non-localized attacks. We achieve this by multiplying the original perturbations by a simple binary mask (as shown in Figure 1), restricting the perturbation to specific image regions. We analyze both the transferability and the L_p norm properties of the crafted adversarial examples, finding that the localized examples are about as effective as the examples generated by the original attacks (in terms of transferability), with the localized versions often requiring significantly less L_p distortion. The implementation of the proposed method is publicly available at github.com/utkuozbulak/regional-adversarial-perturbation.

The finding that we can significantly reduce the required L_p distortion while maintaining similar levels of effectiveness potentially undermines many existing defenses, certified or not, since these usually guarantee robustness against specific L_p perturbation budgets (Wong & Kolter, 2017; Croce et al., 2018; Andriushchenko & Hein, 2019; Ghiasi et al., 2020).
Reducing the distortion required by attacks below such thresholds could render these defenses ineffective.
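To make the definition above concrete, the following minimal sketch checks both conditions for a single input. The torch-based tensors, the function name, and the example budget eps = 8/255 are illustrative assumptions rather than details taken from the paper.

```python
import torch

def is_adversarial(model, x, x_adv, p=float("inf"), eps=8 / 255):
    """Check the two conditions of the definition: (1) the perturbation
    stays within the chosen L_p budget eps, and (2) the predicted labels
    of the original and perturbed inputs differ."""
    within_budget = torch.norm((x - x_adv).flatten(), p=p) <= eps
    original_label = model(x.unsqueeze(0)).argmax(dim=1)
    adversarial_label = model(x_adv.unsqueeze(0)).argmax(dim=1)
    return bool(within_budget) and bool(original_label != adversarial_label)
```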
2. Framework
Data
Adversarial examples are mainly studied on MNIST (LeCun et al., 1998), CIFAR (Krizhevsky & Hinton, 2009), or ImageNet (Russakovsky et al., 2015). However, as the field of adversarial machine learning has evolved, the MNIST and CIFAR datasets are no longer deemed suitable for studies that aim to represent real-world scenarios where adversarial examples pose a threat, due to their shortcomings in terms of color channels and image sizes (Carlini & Wagner, 2017). Following this observation, we use images taken from the test set of the ImageNet dataset in order to generate adversarial examples.
Models
Although convolutional architectures were already used in the work of LeCun et al. (1998), it was the success of AlexNet in 2012 that popularized DNN architectures (Krizhevsky et al., 2012). Recent research in the field of adversarial robustness also revealed AlexNet to be one of the more robust architectures (Su et al., 2018). Following the success of AlexNet, VGG architectures were proposed with smaller convolutional kernel sizes (Simonyan & Zisserman, 2014). Thanks to their simple architecture, VGG networks are still popular today in many computer vision approaches. In order to overcome problems with vanishing gradients in deep architectures, He et al. (2016) proposed ResNet architectures, introducing the usage of residual layers. These residual architectures were later expanded upon and are currently some of the most frequently used architectures for solving a variety of problems in the field of deep learning (Xie et al., 2017; Hara et al., 2018). Given the history of the aforementioned architectures in the field of adversarial machine learning, as well as in other deep learning areas, we opted for the use of AlexNet, VGG-16, and ResNet-50 in our experiments.
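As a hedged illustration, the three architectures can be instantiated with pretrained ImageNet weights as follows; the use of torchvision and of the pretrained flag is an assumption about the setup, not a detail stated in the paper.

```python
import torchvision.models as models

# Pretrained ImageNet classifiers corresponding to the three architectures
# used in the experiments (loading them via torchvision is an assumption).
alexnet = models.alexnet(pretrained=True).eval()
vgg16 = models.vgg16(pretrained=True).eval()
resnet50 = models.resnet50(pretrained=True).eval()
```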
3. Experimental Setup
Generating adversarial examples
Carlini & Wagner (2017) demonstrated the fragility of adversarial examples generated by single-step attacks and argued that iterative attacks should be used for evaluating novel defenses. Iterative attacks calculate and add perturbation to the input iteratively according to the rule

$$X_{n+1} = X_n + P_n, \qquad (1)$$

where $X_n$ and $P_n$ represent the input and the perturbation generated at the $n$-th iteration, respectively. In this study, we generate the perturbation as follows:

$$P_n = \alpha \, \mathrm{sign}\big(\nabla_{X} J(g(\theta, X_n)_c)\big), \qquad (2)$$

where $\nabla_{X} J(g(\theta, X_n)_c)$ represents the gradient with respect to $X$ obtained with the cross-entropy loss $J$ when targeting the class $c$. We use a small constant perturbation multiplier α, which approximately corresponds to changing the pixel values of images by one intensity step at each iteration, and perform this attack for a fixed number of iterations. Typically, adversarial attacks such as FGS, IFGS, and PGD enforce a constraint on the magnitude ‖X − X̃‖_p of the perturbation. However, in order to make a valid comparison between adversarial examples in terms of L_0, L_2, and L_∞ norms, we only enforce a discretization constraint, thus ensuring that the produced adversarial examples can be represented as valid images (i.e., the pixel values of X̃ lie within the range [0, 1], as can be expected from regular images).
Localizing adversarial perturbation

In a previous research effort, we successfully used the Hadamard product to select target pixels for generating adversarial examples in the context of semantic segmentation (Ozbulak et al., 2019). In order to localize the perturbation to selected regions, we employ a similar approach in this research effort, making use of

$$X_{n+1} = X_n + P_n \odot L, \qquad (3)$$

where L is a localization mask, i.e., a binary tensor of the same shape as the input. In this tensor, regions where the perturbation needs to be applied are set to 1 while the remainder is set to 0.
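A minimal PyTorch-style sketch of the masked iterative attack of Eqs. (1)-(3) is given below. This is not the authors' released implementation; the default step size alpha, the iteration count n_iter, and the sign convention used for the targeted update are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_iterative_sign_attack(model, x, target_class, mask,
                                 alpha=1 / 255, n_iter=50):
    """Targeted iterative cross-entropy sign attack with a localization
    mask: the update is restricted to the pixels where `mask` equals 1,
    and the result is kept a valid image in [0, 1]."""
    x_adv = x.clone().detach()
    target = torch.tensor([target_class])
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv.unsqueeze(0)), target)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # Eq. (2): perturbation from the sign of the loss gradient.
            # We descend the target-class cross-entropy, hence the negation.
            perturbation = -alpha * grad.sign()
            # Eq. (3): Hadamard product with the binary localization mask.
            x_adv = x_adv + perturbation * mask
            # Discretization constraint: keep pixel values in [0, 1].
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```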
Perturbation regions

In this study, we evaluate the use of three different perturbation regions, each with three different settings. These regions are (1) randomly selected pixels, (2) center square pixels, and (3) outer frame pixels. For (1), we randomly select approximately 17%, 28%, or 45% of all pixels; these percentages are chosen to match the number of pixels covered by the center squares and outer frames used for (2) and (3), respectively. Thus, the number of selected pixels for all regions in each of the three different settings is virtually the same. Visual examples of the localization masks are provided in Figure 3 in Appendix A.

Calculating L_p distances

We calculate L_p distances (p = 0, 2, ∞) between genuine images and their adversarial counterparts, similar to the calculations in Papernot et al. (2015) and Carlini & Wagner (2017). A detailed description of these calculations for our settings is also provided in Appendix A.
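The three localization masks described above could be constructed along the following lines. The 224 × 224 input size, the helper name, and the exact side-length and frame-width rounding are illustrative assumptions, since the paper does not state these details explicitly.

```python
import math
import torch

def build_localization_mask(region, fraction, height=224, width=224):
    """Build a binary localization mask selecting roughly `fraction` of all
    pixels. `region` is one of 'center', 'frame', or 'random'."""
    mask = torch.zeros(height, width)
    if region == "center":
        # Side length of a centered square covering the requested fraction.
        side = int(round(math.sqrt(fraction * height * width)))
        top, left = (height - side) // 2, (width - side) // 2
        mask[top:top + side, left:left + side] = 1.0
    elif region == "frame":
        # Frame width w such that the outer band covers the requested
        # fraction (assuming a square input).
        w = int(round((height - math.sqrt((1 - fraction) * height * width)) / 2))
        mask[:w, :] = 1.0
        mask[-w:, :] = 1.0
        mask[:, :w] = 1.0
        mask[:, -w:] = 1.0
    elif region == "random":
        n_selected = int(round(fraction * height * width))
        indices = torch.randperm(height * width)[:n_selected]
        mask.view(-1)[indices] = 1.0
    else:
        raise ValueError(f"Unknown region type: {region}")
    # Broadcast the single-channel mask over the three color channels.
    return mask.unsqueeze(0).expand(3, height, width)
```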
4. Experiments
We first analyze model-to-model transferability (also called black-box transferability) for adversarial examples with localized perturbation. For each model-to-model pair, we generate a set of adversarial examples that transfer from the source model to the target model. Starting from the same initial images as these adversarial examples, we then apply perturbation to nine different regions (i.e., three region types, each with three different settings). In Figure 2, we present the percentage of adversarial examples that transfer from model to model when localized perturbation is applied, as opposed to performing the adversarial attack without any localization constraints. We see that a large portion of adversarial examples maintains model-to-model transferability when perturbation is applied to local regions.

For the adversarial examples that maintain model-to-model transferability, Table 1 provides exhaustive details on the mean and standard deviation of the L_2 and L_∞ properties of the produced adversarial examples. L_0 norms are omitted from this table because adversarial examples with regional perturbation almost always have reduced mean L_0 norms (Figure 5 and Table 2 in Appendix B). Adversarial perturbation applied to the center square of an image reduces the mean L_2 norm while it increases the mean L_∞ norm. However, with additional experiments, we discover that a considerable fraction of the individual adversarial examples with localized perturbation have lower L_∞ distances than their non-locally perturbed counterparts, showing that localized perturbation nevertheless reduces the L_∞ norm in a large number of cases. The detailed breakdown of this analysis can be found in Appendix B.

Another important observation we make is the difference in perturbation for different regions. As can be seen, not all regions are equally important when it comes to manipulating the prediction of a DNN with adversarial perturbation. We clearly observe adversarial perturbation applied to the center square being more influential than perturbation in other regions. Surprisingly, applying perturbation to randomly selected pixels requires less distortion than applying it to the frame of an image, further highlighting the differences between important and unimportant regions. Allowing perturbation in a more condensed area versus a more expanded area provides different results for the center square region and the other two regions. Increasing the number of selected pixels in the center square region increases the L_2 norm of the perturbation, while doing so for frame and random pixels reduces the aforementioned norm.
Figure 2. Percentage of adversarial examples with localized perturbation that transfer from the source model (generated from) to the target model (tested against) when 17%, 28%, and 45% of pixels are selected, respectively (combining all 3 localization approaches), for AlexNet, VGG-16, and ResNet-50 source-target pairs.
Table 1. Mean (standard deviation) L_2 and L_∞ distances calculated between genuine images and their adversarial counterparts, with the adversarial counterparts created by localization of perturbation (see the first column). Adversarial examples are created from the source models listed in the first row and transfer to the target models listed in the second row.
5. Conclusion and Future Directions
We have proposed a simple and general method for localizing perturbations generated by existing adversarial attacks to specific image regions. Our method is experimentally confirmed to be effective, maintaining high black-box transferability at distortion levels that are significantly lower than those required by existing attacks. The reduction in the amount of perturbation achieved by our method raises the concern that existing adversarial defenses may be undermined, since these are usually designed to be effective only against non-local attacks requiring larger perturbation budgets.

Our main priority for future work is (1) to investigate to what extent our localization method can fool state-of-the-art adversarial defenses, as well as (2) to more precisely identify regions of importance where this localized perturbation can be made more effective, linking the observations made in this study to the interpretability of DNNs.
References
Andriushchenko, M. and Hein, M. Provably Robust Boosted Decision Stumps and Trees Against Adversarial Attacks. In Advances in Neural Information Processing Systems, pp. 12997–13008, 2019.

Athalye, A., Carlini, N., and Wagner, D. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. CoRR, abs/1802.00420, 2018.

Carlini, N. and Wagner, D. A. Towards Evaluating the Robustness of Neural Networks. CoRR, abs/1608.04644, 2016.

Carlini, N. and Wagner, D. A. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. CoRR, abs/1705.07263, 2017.

Cheng, S., Dong, Y., Pang, T., Su, H., and Zhu, J. Improving Black-box Adversarial Attacks with a Transfer-based Prior. In Advances in Neural Information Processing Systems, pp. 10932–10942, 2019.

Croce, F., Andriushchenko, M., and Hein, M. Provable Robustness of ReLU Networks via Maximization of Linear Regions. CoRR, abs/1810.07481, 2018.

Ghiasi, A., Shafahi, A., and Goldstein, T. Breaking Certified Defenses: Semantic Adversarial Examples with Spoofed Robustness Certificates. CoRR, abs/2003.08937, 2020.

Ghorbani, A., Abid, A., and Zou, J. Interpretation of Neural Networks Is Fragile. CoRR, abs/1710.10547, 2017.

Goodfellow, I., Shlens, J., and Szegedy, C. Explaining and Harnessing Adversarial Examples. CoRR, abs/1412.6572, 2014.

Hara, K., Kataoka, H., and Satoh, Y. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546–6555, 2018.

He, K., Zhang, X., Ren, S., and Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., and Madry, A. Adversarial Examples Are Not Bugs, They Are Features. In Advances in Neural Information Processing Systems, pp. 125–136, 2019.

Karmon, D., Zoran, D., and Goldberg, Y. LaVAN: Localized and Visible Adversarial Noise. CoRR, abs/1801.02608, 2018.

Kindermans, P.-J., Hooker, S., Adebayo, J., Alber, M., Schütt, K. T., Dähne, S., Erhan, D., and Kim, B. The (Un)reliability of Saliency Methods. CoRR, abs/1711.00867, 2017.

Krizhevsky, A. and Hinton, G. Learning Multiple Layers of Features from Tiny Images. Technical report, Citeseer, 2009.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.

Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial Examples in the Physical World. CoRR, abs/1607.02533, 2016.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Lowe, D. G. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. CoRR, abs/1706.06083, 2017.

Moravec, H. P. Rover Visual Obstacle Avoidance. In Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'81, pp. 785–790, San Francisco, CA, USA, 1981. Morgan Kaufmann Publishers Inc.

Ozbulak, U., Van Messem, A., and De Neve, W. Impact of Adversarial Examples on Deep Learning Models for Biomedical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 300–308. Springer, 2019.

Papernot, N., McDaniel, P. D., Jha, S., Fredrikson, M., Celik, Z. B., and Swami, A. The Limitations of Deep Learning in Adversarial Settings. CoRR, abs/1511.07528, 2015.

Papernot, N., McDaniel, P. D., and Goodfellow, I. Transferability in Machine Learning: From Phenomena to Black-Box Attacks Using Adversarial Samples. CoRR, abs/1605.07277, 2016.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211–252, 2015.

Schmid, C. and Mohr, R. Local Grayvalue Invariants for Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530–535, 1997.

Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., and Madry, A. Adversarially Robust Generalization Requires More Data. In Advances in Neural Information Processing Systems, pp. 5014–5026, 2018.

Selvaraju, R. R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., and Batra, D. Grad-CAM: Why Did You Say That? Visual Explanations from Deep Networks via Gradient-Based Localization. CVPR 2016, 2016.

Simonyan, K. and Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556, 2014.

Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. Striving for Simplicity: The All Convolutional Net. CoRR, abs/1412.6806, 2014.

Su, D., Zhang, H., Chen, H., Yi, J., Chen, P.-Y., and Gao, Y. Is Robustness the Cost of Accuracy? A Comprehensive Study on the Robustness of 18 Deep Image Classification Models. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 631–648, 2018.

Su, J., Vargas, D. V., and Sakurai, K. One Pixel Attack for Fooling Deep Neural Networks. CoRR, abs/1710.08864, 2017.

Sun, Y., Wang, X., and Tang, X. Deep Learning Face Representation from Predicting 10,000 Classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891–1898, 2014.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing Properties of Neural Networks. CoRR, abs/1312.6199, 2013.

Wong, E. and Kolter, J. Z. Provable Defenses Against Adversarial Examples via the Convex Outer Adversarial Polytope. CoRR, abs/1711.00851, 2017.

Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500, 2017.

Xu, K., Liu, S., Zhao, P., Chen, P.-Y., Zhang, H., Fan, Q., Erdogmus, D., Wang, Y., and Lin, X. Structured Adversarial Attack: Towards General Implementation and Better Interpretability. CoRR, abs/1808.01664, 2018.

Zajac, M., Zołna, K., Rostamzadeh, N., and Pinheiro, P. O. Adversarial Framing for Image and Video Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 10077–10078, 2019.
A. Experimental Details
Figure 3. The localization masks used in this study (center square, random, and frame localization at approximately 17%, 28%, and 45% of pixels). The given percentages correspond to the number of selected pixels compared to the total number of available pixels.
Perturbation regions
In Figure 3, we provide visualizations of the selected localization masks, where the given percentages correspond to the proportion of selected pixels out of all available pixels. Note that for the perturbation localized on the image frame, unlike Zajac et al. (2019), we do not expand the size of the image. We simply exercise the perturbation on the selected outermost pixels.
Figure 4. L_0, L_2, and L_∞ distances between the initial images and their adversarial counterparts for adversarial examples that originate from the same initial image but that were perturbed using different localization methods. All of the adversarial examples successfully transfer to models they did not originate from.

Calculating L_p distances

Between initial images and their adversarial counterparts, we calculate L_0, L_2, and L_∞ distances as follows:

$$L_0(X, \tilde{X}) = \frac{1}{W \cdot H} \sum_{i} \sum_{j} \mathbb{1}\{X_{i,j} \neq \tilde{X}_{i,j}\}, \qquad (4)$$

$$L_2(X, \tilde{X}) = \|X - \tilde{X}\|_2, \qquad (5)$$

$$L_\infty(X, \tilde{X}) = \max\big(|X - \tilde{X}|\big), \qquad (6)$$

where X and X̃ represent an initial image and its adversarial counterpart, respectively, and W · H is the total number of pixels in the image. In this framework, an L_∞ norm of 1 means that the added perturbation changed a pixel from black to white (i.e., 0 to 1), or vice versa. An L_0 norm of 1 means all pixels are modified by the adversarial perturbation.
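The distances defined above can be computed along the following lines. Treating a pixel location as changed when any of its channels differs is our reading of Eq. (4) and an assumption, as is the tensor layout.

```python
import torch

def lp_distances(x, x_adv):
    """Compute the L_0, L_2, and L_inf distances of Eqs. (4)-(6) between an
    image `x` and its adversarial counterpart `x_adv`, both tensors of shape
    (channels, height, width) with pixel values in [0, 1]."""
    diff = x - x_adv
    # Eq. (4): fraction of pixel locations that changed in any channel.
    changed = (diff.abs().sum(dim=0) != 0).float()
    l0 = changed.mean().item()
    # Eq. (5): Euclidean norm of the full perturbation.
    l2 = diff.flatten().norm(p=2).item()
    # Eq. (6): largest absolute per-value change.
    linf = diff.abs().max().item()
    return l0, l2, linf
```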
Figure 5. The percentage of adversarial examples with regional perturbation that have less perturbation in terms of the L_0 norm (top), the L_2 norm (middle), and the L_∞ norm (bottom) compared to their counterparts with "global" perturbation, shown for 17%, 28%, and 45% of pixels selected. Percentages are calculated based on the adversarial examples with localized perturbation that transfer from the source model to the target model.

B. Additional Experimental Results
In Figure 4, we provide a number of qualitative examples, showing the L_0, L_2, and L_∞ norms of adversarial perturbation generated using various localization settings. All of the examples presented in Figure 4 are generated using AlexNet and transfer to ResNet-50.

For the experiments discussed in the main paper, Figure 5 provides the percentage of adversarial examples that have a lower L_p norm than their counterparts generated with "global" perturbation. Our experiments show that regional perturbation almost always leads to lower L_0 norms compared to non-regional perturbation, whereas in the case of the L_2 and L_∞ norms, this depends on the initial image-target class combination.

For the sake of completeness, in Table 2 we provide the exhaustive details of the L_0 norms of adversarial perturbations for the experiment described in the main paper. Since the perturbation region is what is controlled in this experiment, the resulting perturbations have much less L_0 deviation compared to L_2 or L_∞.
Table 2. Mean (standard deviation) L_0 distances calculated between genuine images and their adversarial counterparts, with the adversarial counterparts created by localization of perturbation (see the first column). Adversarial examples are created from the source models listed in the first row and transfer to the target models listed in the second row.