A study of the effect of JPG compression on adversarial images
Gintare Karolina Dziugaite
Department of Engineering, University of Cambridge

Zoubin Ghahramani
Department of Engineering, University of Cambridge

Daniel M. Roy
Department of Statistical Sciences, University of Toronto
Presented at the International Society for Bayesian Analysis (ISBA 2016) World Meeting, June 13–17, 2016.

Abstract
Neural network image classifiers are known to be vulnerable to adversarial images, i.e., natural images which have been modified by an adversarial perturbation specifically designed to be imperceptible to humans yet fool the classifier. Not only can adversarial images be generated easily, but these images will often be adversarial for networks trained on disjoint subsets of data or with different architectures. Adversarial images represent a potential security risk as well as a serious machine learning challenge: it is clear that vulnerable neural networks perceive images very differently from humans. Noting that virtually every image classification data set is composed of JPG images, we evaluate the effect of JPG compression on the classification of adversarial images. For Fast-Gradient-Sign perturbations of small magnitude, we found that JPG compression often reverses the drop in classification accuracy to a large extent, but not always. As the magnitude of the perturbations increases, JPG recompression alone is insufficient to reverse the effect.
Introduction

Neural networks are now widely used across machine learning, including image classification, where they achieve state-of-the-art accuracy on standard benchmarks [Rus+15; He+15]. However, neural networks have recently been shown to be vulnerable to adversarial examples [Sze+13], i.e., inputs to the network that have undergone imperceptible perturbations specifically optimized to cause the neural network to strongly misclassify.

Most neural networks trained for image classification are trained on images that have undergone JPG compression. Adversarial perturbations are unlikely to leave an image in the space of JPG images, and so this paper explores the idea that JPG (re)compression could remove some aspects of the adversarial perturbation. Our experiments show that JPG compression often succeeds in reversing the adversarial nature of images that have been modified by a small-magnitude perturbation produced by the Fast Gradient Sign method of Goodfellow, Shlens, and Szegedy [GSS14]. However, as the magnitude of the perturbation increases, JPG compression is unable to recover a non-adversarial image, and therefore JPG compression cannot, by itself, guard against the security risk of adversarial examples.

We begin by discussing related work, and in particular a recent preprint by Kurakin, Goodfellow, and Bengio [KGB16] reporting independent work showing that the effect of certain varieties of adversarial perturbations can even survive being printed on paper and recaptured by a digital camera. This same preprint also reports on the effect of JPG compression quality on adversarial perturbations. Our experiments are complementary, as we vary the magnitude of the perturbation.

Related work

Szegedy et al. [Sze+13] were the first to demonstrate adversarial examples: working within the context of image classification, they found the smallest additive perturbation η to an image x that caused the network to misclassify the image x + η. In their paper introducing the concept, they demonstrated the surprising phenomenon that adversarial examples generalize across neural networks trained on disjoint subsets of training data, as well as across neural networks with different architectures and initializations. Papernot et al. [Pap+16] exploited this property to demonstrate how one could construct adversarial examples for a network of an unknown architecture by training an auxiliary neural network on related data.

These findings highlight that adversarial examples pose a potential security risk in real-world applications of neural networks, such as autonomous car navigation and medical image analysis. Adversarial examples also pose a challenge for machine learning, because they expose an apparently large gap between the inductive biases of humans and machines. In part due to both challenges, there has been a flood of work following the original demonstration of adversarial examples that attempts to explain the phenomenon and protect systems.

Goodfellow, Shlens, and Szegedy [GSS14] argued that neural networks are vulnerable to adversarial perturbations due to the linear nature of neural networks, and presented some experimental evidence that neural network classifiers with non-linear activations are more robust. Tabacof and Valle [TV15] demonstrated empirically that adversarial examples are not isolated points and that neural networks are more robust to random noise than to adversarial noise. Billovits, Eric, and Agarwala [BEA16] visualized how adversarial perturbations change activations in a convolutional neural network. They also ran a number of experiments to better understand which images are more susceptible to adversarial perturbations, depending on the magnitude of the classifier's prediction on clean versions of the image.

Several authors have proposed solutions to adversarial examples, with mixed success [Pap+15; GR14]. Gu and Rigazio [GR14] proposed the use of an autoencoder (AE) to remove adversarial perturbations from inputs. While the AE could effectively remove adversarial noise, the combination of the AE and the neural network was even less robust to adversarial perturbations. They proposed to use a contractive AE instead, which increased the size of the perturbation needed to alter the classifier's predicted class.

While most of the work has been empirical, Fawzi, Fawzi, and Frossard [FFF15] gave a theoretical analysis of robustness to adversarial examples and random perturbations for binary linear and quadratic classifiers. They compute upper bounds on the robustness of linear and quadratic classifiers. The upper bounds suggest that quadratic classifiers are more robust to adversarial perturbations than linear ones.

A recent paper by Kurakin, Goodfellow, and Bengio [KGB16] makes several significant contributions to the understanding of adversarial images.
In addition to introducing several new methods for producing large adversarial perturbations that remain imperceptible, they demonstrate the existence of adversarial examples "in the physical world". To do so, Kurakin, Goodfellow, and Bengio compute adversarial images for the Inception classifier [Sze+15], print these adversarial images onto paper, and then recapture the images using a cell-phone camera. They demonstrate that, even after this process of printing and recapturing, a large fraction of the images remain adversarial. The authors also experimented with multiple transformations of adversarial images: changing brightness and contrast, adding Gaussian blur, and varying JPG compression quality. This last aspect of their work relates to the experiments we report here.

Adversarial examples and the data subspace

What is the nature of adversarial examples? Why do they exist? And why are they robust to changes in training data, network architecture, etc.?

Figure 1: The red dots represent the data and the grey line the data subspace. The solid blue arrow is the adversarial perturbation that moves the data point x away from the data subspace, and the dotted blue arrow is the projection back onto the subspace. In the case where the perturbation is approximately orthogonal to the JPG subspace, JPG compression brings the adversarial example back to the data subspace.

Adversarial perturbations are considered interesting because they are judged to be imperceptible by humans, yet they are (by definition) extremely perceptible to neural network classifiers, even across a wide variety of training regimes. A basic hypothesis underlying this work is that, in any challenging high-dimensional classification task where the inputs naturally live in (or near) a complex lower-dimensional data subspace, adversarial examples will lie outside this data subspace, taking advantage of the fact that the training objective for the neural network is essentially agnostic to the network's behavior outside the data subspace.

Even if individual neural network classifiers were not robust to imperceptible perturbations, we might settle for a measure of confidence/credibility reporting high uncertainty on adversarial examples. In theory, we would expect confidence intervals or credible sets associated with neural network classifiers to represent high uncertainty on adversarial images provided that, outside the data subspace, there was disagreement among the family of classifiers achieving, e.g., high likelihood/posterior probability. In practice, efficient computational methods may not be able to determine whether there is uncertainty. The field has a poor understanding of both issues. To date, no frequentist or Bayesian approach has demonstrated the ability to correctly classify or report high uncertainty on adversarial images. (One would hope that, even if individual neural networks achieving high posterior probability suffered from adversarial perturbations, networks sampled from a Bayesian posterior would disagree on the classification of an input outside the data subspace, representing uncertainty. However, our experiments with current scalable approximate Bayesian neural network methods, namely variants of stochastic gradient Langevin dynamics [WT11; Li+15], revealed that Bayesian neural networks report confident misclassifications on adversarial examples. It is worth evaluating other approximate inference frameworks.)

At the very least, adversarial examples reflect the fact that neural network classifiers are relying on properties of the data different from those used by humans. In theory, even a classifier trained on a data set of diverging size might fall prey to adversarial examples if the training data live on a subspace. Techniques such as data augmentation (e.g., by adding noise or adversarial perturbations) would be expected to remove a certain class of adversarial examples, but unless the notion of "perceptible perturbation" is exactly captured by the data augmentation scheme, it seems that there will always be space for adversarial examples to exist.

Natural image classification is an example of a high-dimensional classification task whose inputs have low intrinsic dimension. Indeed, we can be all but certain that if we were to randomly generate a bitmap, the result would not be a natural image. On the other hand, humans are not affected by adversarial perturbations or other perturbations such as random noise, and so we introduce the notion of the perceptual subspace: the space of bitmaps perceived by humans as being natural images with some corruption. (The extent to which humans are themselves susceptible to adversarial imagery is not well understood, at least by the machine learning community. Can small perturbations, e.g., in mean-squared error, cause human perception to change dramatically?) Empirical evidence suggests that neural networks learn to make accurate predictions inside the data subspace. Neural networks are also understood to be fairly resistant to random perturbations, as these perturbations are understood to cancel themselves out [GSS14]. Neural network classifiers work well, in part, due to their strong inductive biases. But this same bias means that a neural network may report strong predictions beyond the data subspace, where there is no training data. We cannot expect sensible predictions outside the data subspace from individual classifiers.

If we could project adversarial images back onto the data subspace, we could conceivably get rid of adversarial perturbations. Unfortunately, it is not clear whether it is possible to characterize or learn a suitable representation of the data subspace corresponding to natural images. We may, however, be able to find other lower-dimensional subspaces that contain the data subspace. To that end, note that most image classification data sets, like ImageNet [Rus+15], are built from JPG images. Call this set of images the JPG subspace, which necessarily contains the data subspace. Perturbations of natural images (by adding scaled white noise or randomly corrupting a small number of pixels) are almost certain to move an image out of the JPG subspace and therefore out of the data subspace. While we cannot project onto the data subspace, we can use JPG compression to "project" the perturbed images back onto the JPG subspace, as sketched below. We might expect JPG compression to reverse adversarial perturbations for several reasons. First, adversarial perturbations could be very sensitive, reversed by most image processing steps. (Our findings contradict this, as do the findings in [KGB16].) Second, adversarial perturbations might be "orthogonal" to the JPG subspace, in which case we would expect the modifications to be removed by JPG compression. (Our findings for small perturbations do not contradict this idea, though larger perturbations are not removed by JPG compression. It would be interesting to evaluate the discrete cosine transform of adversarial images to settle this hypothesis.) More study is necessary to explain our findings.
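To make this "projection" concrete, the following minimal Python sketch round-trips a bitmap through an in-memory JPG encode/decode. It uses Pillow and NumPy, which is our choice for illustration here, not the paper's tooling (the experiments used Torch7), and the quality setting is an illustrative assumption rather than the paper's setting.

    import io

    import numpy as np
    from PIL import Image

    def jpg_roundtrip(bitmap, quality=75):
        # "Project" an 8-bit RGB bitmap onto the JPG subspace by compressing
        # and immediately decompressing it in memory. quality=75 is an
        # illustrative assumption, not necessarily the paper's setting.
        buf = io.BytesIO()
        Image.fromarray(bitmap.astype(np.uint8)).save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        return np.asarray(Image.open(buf).convert("RGB"))

The residual jpg_roundtrip(x) - x is then one concrete reading of the quantity Δ(x) = JPG(x) − x used in Eq. (2) below.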
Experiments

We evaluated the effect of adversarial perturbations on the network's classification, and then studied how the classification was affected by a further JPG compression of the adversarial image. We measured the change at several different magnitudes of adversarial perturbation.

We used the pre-trained OverFeat network [Ser+13], which was trained on images from the 2012 ImageNet training set (1000 classes). The training images used to produce OverFeat underwent several preprocessing steps: they were scaled so that the smallest dimension was 256; then 5 random square crops of a fixed size were produced; finally, the set of images (viewed as vectors) were standardized to have zero mean and unit variance. (When we refer to standardization below, we are referring to the process of repeating precisely the same shift and scaling used to standardize the training data fed to OverFeat.) The OverFeat network is composed of ReLU activations and max pooling operations, 5 convolutional layers, and 3 fully connected layers.

For a (bitmap) image x, we will write JPG(x) to denote the JPG compression of x at a fixed quality level. For a network with weights w and input image x, let p_w(c | x) be the probability assigned to class c. Let ℓ_x = argmax_c p_w(c | x) be the class label assigned the highest probability (which we will assume is unique). Then p_w(ℓ_x | x) is the probability assigned to this label.

To generate adversarial examples, we used the Fast Gradient Sign method introduced by Goodfellow, Shlens, and Szegedy [GSS14]. Let w represent the pre-trained weights of the OverFeat network. The Fast Gradient Sign perturbation is calculated by scaling the element-wise sign of the gradient of the training objective J(x, w, y) with respect to the image x for the label y = ℓ_x, i.e.,

    η_ε(x) = ε · sign(∇_{x′} J(x′, w, y) |_{x′=x, y=ℓ_x}),  and thus  Adv_ε(x) = x + η_ε(x).    (1)

The image gradient ∇_{x′} J(x′, w, y) can be efficiently computed using backpropagation. In our experiments with the OverFeat network, we used ε ∈ {1, 5, 10}. See Fig. 2 for several examples of images after adversarial perturbations of increasing magnitudes.
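As a minimal sketch of Eq. (1), the following uses modern PyTorch rather than the Torch7 setup of the paper; model stands in for any differentiable image classifier returning logits (an assumption), and cross-entropy plays the role of the training objective J.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, eps):
        # Fast Gradient Sign perturbation, Eq. (1): Adv_eps(x) = x + eps * sign(grad_x J).
        # `model` is a hypothetical classifier mapping a standardized image batch
        # to logits; the label y is taken to be the model's own top prediction ell_x.
        x = x.clone().detach().requires_grad_(True)
        logits = model(x)
        y = logits.argmax(dim=1)             # y = ell_x
        loss = F.cross_entropy(logits, y)    # stands in for J(x, w, y)
        loss.backward()                      # gradient with respect to the input image
        eta = eps * x.grad.sign()            # eta_eps(x)
        return (x + eta).detach()            # Adv_eps(x)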
Figure 2: (first) Original image x, with label "agama" assigned 0.99 probability; (second) adversarial image Adv_ε(x), where ε = 1, with label "rock crab" assigned 0.93 probability and label "agama" assigned negligible probability; (third and fourth) adversarial images Adv_ε(x) with ε set to 5 and 10. Both assign probability ≈ 0 to "agama"; however, adversarial noise becomes apparent; (last) JPG compression of the adversarial image, JPG(Adv_ε(x)) with ε = 1, with label "agama" assigned 0.96 probability.

For each image x in the ImageNet validation set, we performed the following steps:

1. Scale x so that its smallest dimension is 256; crop to the centered square region at the network's input size; and then standardize;
2. Compute Adv_ε(x) using the Fast Gradient Sign method, with ε ∈ {1, 5, 10};
3. Compute JPG(Adv_ε(x)) using the save method from Torch7's image package;
4. Compute the OverFeat network predictions for all images: original x, adversarial Adv_ε(x), and compressed JPG(Adv_ε(x)).

For an image x, we will refer to p_w(ℓ_x | x) as its top-label probability and, more generally, for a transformation f acting on images, we will refer to p_w(ℓ_x | f(x)) as the top-label probability after transformation f; a sketch of this evaluation appears below.
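As a sketch of step 4 and of the top-label probability p_w(ℓ_x | f(x)), the following hypothetical helper (again PyTorch, an assumption; the experiments used OverFeat in Torch7) scores a transformed image against the clean image's top label.

    import torch
    import torch.nn.functional as F

    def top_label_probability(model, x_clean, x_transformed):
        # Returns p_w(ell_x | f(x)): the probability the network assigns to the
        # transformed image f(x) under the *clean* image's top label ell_x.
        with torch.no_grad():
            ell_x = model(x_clean).argmax(dim=1)            # ell_x from the clean image
            probs = F.softmax(model(x_transformed), dim=1)  # p_w(. | f(x))
        return probs.gather(1, ell_x.unsqueeze(1)).squeeze(1)

In this notation, the quantities plotted in Figs. 3 and 4 below are top_label_probability(model, x, f(x)) for the various transformations f.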
Fig. 3 gives a coarse summary of how JPG compression affects adversarial examples, while Fig. 4 gives a more detailed picture at the level of individual images for the case of perturbations of magnitude ε = 1. We will now explain these figures in turn.

Fig. 3 reports statistics on the top-label probability under various transformations for every image in the validation set. The first boxplot summarizes the distribution of the top-label probability for the validation images when no perturbations have been made. As we see, the network assigns, on average, 0.6 probability to the most probable label, and the interquartile range lies away from the extremes 0 and 1. While we might consider JPG (re)compression to be a relatively innocuous operation, the second boxplot reveals that JPG compression already affects the top-label probability negatively. The third boxplot summarizes the top-label probability under an adversarial transformation of magnitude ε = 1: the mean probability assigned to the top label ℓ_x drops sharply from approximately 0.6. The top-label probability after JPG compression of the adversarial images increases back towards the levels of JPG-compressed images, but falls short: the mean recovers to just over 0.4. Larger adversarial perturbations (of magnitude ε = 5 and ε = 10) cause more dramatic negative changes to the top-label probability. Moreover, JPG compression of these more perturbed images is not effective at reversing the adversarial perturbation: the top-label probability remains almost unchanged, improving only slightly.

Figure 3: The top-label probabilities, i.e., the predicted probability (y-axis) assigned to the most likely class ℓ_x, after various transformations x ↦ f(x). The red horizontal line in each box plot is the average top-label probability; the solid red line is the median, the box represents the interquartile range, and the whiskers represent the minimum and maximum values, excluding outliers. Labels along the bottom specify the transformation f(x) applied to the image x before measuring the top-label probability.

The scatter plots in Fig. 4 paint a more detailed picture for small adversarial perturbations (ε = 1). In every scatter plot, a point (p_1, p_2) specifies the top-label probability under a pair (f_1, f_2) of transformations, respectively. In the first plot, we see the effect of JPG compression on the top-label probability, which can be combined with the second boxplot in Fig. 3 to better understand the effect of JPG compression on a neural network's top-label probability assignments. In short, JPG compression can lower and raise the top-label probability, although the mean effect is negative, and JPG compression affects images with high top-label probabilities least. The bottom-left plot shows the strong negative effect of the adversarial perturbation on the top-label probability, which can be contrasted with the top-middle plot, where we see that the top-label probabilities recover almost to the level of the original images after JPG recompression. (Cf. boxplots 2 and 4 in Fig. 3.)

If JPG compression were a good surrogate for projection onto the data subspace, we would expect the top-label probabilities to recover to the level of the top-label probabilities for JPG(x). This is not quite the case, even for small perturbations (ε = 1), although the adversarial nature of these images is often significantly reduced. For larger perturbations, the effect of JPG compression is small. (This agrees with the finding by Kurakin, Goodfellow, and Bengio [KGB16] that Fast Gradient Sign perturbations are quite resilient to image transformations, including JPG compression.)

Does the improvement for small perturbations yielded by JPG compression depend on the specific structure of JPG compression, or could it be mimicked with noise sharing some similar statistics? To test this hypothesis, we studied the effect on top-label probabilities after adding a random permutation of the vector representing the effect of JPG compression. More precisely, let P be a random permutation matrix. We tested the effect of the perturbation

    η_JPG(x) = P Δ(Adv_ε(x)),  where  Δ(x′) = JPG(x′) − x′,    (2)

which we call JPG noise. Thus, we studied the top-label probabilities for images of the form
    JPG_noise(Adv_ε(x)) = Adv_ε(x) + η_JPG(x).    (3)

By construction, JPG noise shares every permutation-invariant statistic with JPG compression, but loses, e.g., information about the direction of the JPG compression modification. The last box plot in Fig. 3 shows that adversarial images remain adversarial after adding JPG noise: indeed, the average predicted probability for ℓ_x is even lower than for adversarial images (third box plot). A sketch of this construction appears below.
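The following is a minimal sketch of Eqs. (2) and (3), reusing the hypothetical jpg_roundtrip helper from the earlier sketch; applying the permutation to the flattened residual is one concrete reading of the permutation matrix P, and the seed and quality arguments are illustrative assumptions.

    import numpy as np

    def jpg_noise(adv_bitmap, quality=75, seed=0):
        # Eq. (2): Delta(x') = JPG(x') - x', then apply a random permutation P.
        # Eq. (3): return Adv_eps(x) + eta_JPG(x).
        # `adv_bitmap` is assumed to be an RGB array in 8-bit [0, 255] range.
        rng = np.random.default_rng(seed)
        delta = jpg_roundtrip(adv_bitmap, quality).astype(np.float64) - adv_bitmap
        eta = rng.permutation(delta.ravel()).reshape(delta.shape)  # P applied to Delta
        return adv_bitmap + eta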
Table 1 summarizes classification accuracy and mean top-label probabilities after various transformations applied to images in the ImageNet validation set. (Cf. Fig. 3.) Notice that the accuracy drops dramatically after adversarial perturbation. JPG compression increases the accuracy substantially for small perturbations (ε = 1); however, the accuracy is still lower than on clean images. For larger adversarial perturbations (ε ∈ {5, 10}), JPG compression does not increase accuracy enough to represent a practical solution to adversarial examples.

Table 1: Classification accuracy (Top-1) and mean top-label probability p_w(ℓ_x | f(x)) after various transformations f(x): the original image x, the adversarial images Adv_ε(x) and their compressions JPG(Adv_ε(x)) for ε ∈ {1, 5, 10}, and JPG_noise(Adv_ε(x)).

Figure 4: In every scatter plot, every validation image x is represented by a point (p_1, p_2), which specifies the top-label probabilities p_j = p_w(ℓ_x | f_j(x)) under a pair (f_1, f_2) of modifications of the image, respectively. All adversarial perturbations in these figures were generated with magnitude ε = 1. Along the top row, the x-axis represents the top-label probability for a clean image. (top left) The plot illustrates the effect of JPG compression of a natural image. The predictions do change, but on average they lie close to the diagonal and do not change the top-label probability appreciably; (top middle) if JPG compression of the adversarial image removed adversarial perturbations, we would expect this plot to look like the one to the left. While they are similar (most points lie around the diagonal), more images lie in the lower-right triangle, suggesting that the adversarial perturbations are sometimes not removed or only partially removed; (top right) adding JPG noise does not reverse the effect of adversarial perturbations: indeed, points lie closer to the lower axis than under a simple adversarial modification; (bottom left) the top-label probability after adversarial perturbation drops substantially on average; (bottom right) this plot complements the top-middle plot. Most of the points lie in the upper-left triangle, which suggests that JPG compression of an adversarial image increases the top-label probability and partially reverses the effect of many adversarial perturbations.

Discussion

Our experiments demonstrate that JPG compression can reverse small adversarial perturbations created by the Fast-Gradient-Sign method. However, if the adversarial perturbations are larger, JPG compression does not reverse the adversarial perturbation. In this case, the strong inductive bias of neural network classifiers leads to confident misclassifications. Even the largest perturbations that we evaluated are barely visible to an untrained human eye, and so JPG compression is far from a solution. We do not yet understand why JPG compression reverses small adversarial perturbations.

Acknowledgments
ZG acknowledges funding from the Alan Turing Institute, Google, Microsoft Research, and EPSRC Grant EP/N014162/1. DMR is supported in part by a Newton Alumni grant through the Royal Society.
References

[BEA16] C. Billovits, M. Eric, and N. Agarwala. Hitting Depth: Investigating Robustness to Adversarial Examples in Deep Convolutional Neural Networks. http://cs231n.stanford.edu/reports2016/119_Report.pdf. 2016.

[FFF15] A. Fawzi, O. Fawzi, and P. Frossard. “Analysis of classifiers’ robustness to adversarial perturbations”. arXiv:1502.02590 (2015).

[GR14] S. Gu and L. Rigazio. “Towards Deep Neural Network Architectures Robust to Adversarial Examples”. arXiv:1412.5068 (2014).

[GSS14] I. J. Goodfellow, J. Shlens, and C. Szegedy. “Explaining and Harnessing Adversarial Examples”. arXiv:1412.6572 (2014).

[He+15] K. He, X. Zhang, S. Ren, and J. Sun. “Deep Residual Learning for Image Recognition”. arXiv:1512.03385 (2015).

[KGB16] A. Kurakin, I. Goodfellow, and S. Bengio. “Adversarial examples in the physical world”. arXiv:1607.02533 (2016).

[Li+15] C. Li, C. Chen, D. Carlson, and L. Carin. “Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks”. arXiv:1512.07666 (2015).

[Pap+15] N. Papernot, P. D. McDaniel, X. Wu, S. Jha, and A. Swami. “Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks”. arXiv:1511.04508 (2015).

[Pap+16] N. Papernot, P. D. McDaniel, I. J. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. “Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples”. arXiv:1602.02697 (2016).

[Rus+15] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. “ImageNet Large Scale Visual Recognition Challenge”. International Journal of Computer Vision (IJCV) (2015).

[Ser+13] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. “OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks”. arXiv:1312.6229 (2013).

[Sze+13] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. “Intriguing properties of neural networks”. arXiv:1312.6199 (2013).

[Sze+15] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. “Rethinking the Inception Architecture for Computer Vision”. arXiv:1512.00567 (2015).

[TV15] P. Tabacof and E. Valle. “Exploring the Space of Adversarial Images”. arXiv:1510.05328 (2015).

[WT11] M. Welling and Y. W. Teh. “Bayesian Learning via Stochastic Gradient Langevin Dynamics”. ICML. 2011.