β-Variational Classifiers Under Attack

Marco Maggipinto∗, Matteo Terzi∗, Gian Antonio Susto∗∗

∗ Department of Information Engineering (DEI), University of Padova, Italy (e-mail: [email protected], [email protected])
∗∗ DEI and Human-Inspired Technology Center, University of Padova, Italy (e-mail: [email protected])

⋆ Part of this work was supported by MIUR (Italian Ministry of Education) under the initiative "Departments of Excellence" (Law 232/2016). We would also like to thank Nvidia for donating the Nvidia Titan V GPU used for this research.
Abstract:
Deep Neural Networks have gained a lot of attention in recent years thanks to the breakthroughs obtained in the field of Computer Vision. However, despite their popularity, it has been shown that they provide limited robustness in their predictions. In particular, it is possible to synthesise small adversarial perturbations that imperceptibly modify a correctly classified input, making the network confidently misclassify it. This has led to a plethora of different methods that try to improve robustness or detect the presence of these perturbations. In this paper, we perform an analysis of β-Variational Classifiers, a particular class of methods that not only solve a specific classification task, but also provide a generative component that is able to generate new samples from the input distribution. More specifically, we study their robustness and detection capabilities, together with some novel insights on the generative part of the model.
Keywords: Adversarial Training, Computer Vision, Deep Learning, Machine Learning, Robustness

1. INTRODUCTION

The astounding performance that Deep Neural Networks (DNNs) provide when dealing with large amounts of complex data has recently led to extensive research in Deep Learning (DL) technologies. Empirical evidence shows that, in contrast to standard Machine Learning (ML) methods, DNNs are able to generalize well in the over-parametrized regime (Belkin et al. (2018a,b)), i.e. when the number of parameters of the model is much higher than the number of samples used to train it; hence, there is basically no limit, other than the computational capabilities, to the complexity of the hypothesis class of functions that is of practical use. Despite this property, which is still not well understood and is the object of an entire line of research, the high complexity of the input-output relationship comes at a cost: the predictions provided by DNNs are not interpretable, making it difficult to understand what caused the model to take a particular decision; moreover, Szegedy et al. (2013) discovered that DNNs are susceptible to adversarial perturbations, small changes in the input space that result in large changes in the output space. This allows the creation of
Adversarial Examples (Goodfellow et al. (2014)): for example, in image classification, it is possible to synthesise artificial images that, while visually identical to a correctly classified sample for the human eye, are confidently misclassified. While such a problem is common in ML, it is emphasized in DNNs by the high dimensionality of the input space and the complexity of the function described by the DNN, which may be subject to high curvature directions that can be exploited, even
in a small neighborhood of a point, to significantly change the response of the network. This discovery started a completely new research trend that tries to understand the phenomenon (see Liu et al. (2016); Shaham et al. (2018)) or to find ways to defend against it. The most common approach to train robust networks is
Adversarial Training (Goodfellow et al. (2014); Madry et al. (2017); Terzi et al. (2020)), which consists in generating adversarial examples and feeding them to the network during training along with the correct label. While effective, such a method comes at the cost of reduced prediction accuracy (Tsipras et al. (2018)) and increased training time compared to standard models. Other approaches have been proposed to obtain robust models, such as gradient regularization (Ross and Doshi-Velez (2018)), Lipschitz regularization (Finlay et al. (2018)) and curvature regularization (Moosavi-Dezfooli et al. (2019)). A different research line focuses on developing methods to detect adversarial examples, without requiring the model to be robust. Feinman et al. (2017) propose to combine Kernel Density Estimation on the hidden layer of the network (Botev et al. (2010)) and MC-Dropout (Gal and Ghahramani (2016)), which provides an estimate of the prediction uncertainty of the network. The rationale behind the method is that adversarial examples should have lower likelihood according to the estimated density and higher prediction uncertainty. Gong et al. (2017) propose to train a binary classifier as a detection method: it is shown that such an approach is able to detect 99% of adversarial examples and that it is robust to a second attack that aims at fooling the detection method. Grosse et al. (2017) use the kernel-based two-sample test (Gretton et al. (2012)) to detect statistical differences between adversarial examples and normal data.
Fig. 1. Bayesian network of a VAE (left) and the Variational Classifier analyzed in this work (right).

Recently, Li et al. (2018) proposed a study on the robustness of Generative Classifiers and their detection capabilities. They propose three detection methods whose rejection policies are, respectively: 1) reject samples with input likelihood lower than a certain threshold; 2) reject samples with joint input/output likelihood lower than a certain threshold; 3) reject over/under-confident predictions. The methods have proven effective on object recognition tasks. Moreover, they show that models with lower capacity are more robust to adversarial examples. We build upon Li et al. (2018) to provide an analysis of β-Variational Classifiers, a similar approach but based on β-Variational Autoencoders (Higgins et al. (2017)), which are able to provide a disentangled representation of the input (Burgess et al. (2018)) and give an alternative method to control the model capacity. In particular, the contributions of our paper are as follows:

• We analyze the robustness of β-Variational Classifiers combined with sparse regularization;
• We analyze the detection capabilities of β-Variational Classifiers;
• We analyse the effects of adversarial perturbations on the decoder network.

The remainder of this paper is organized as follows: in Sections 2 and 3 we provide a description of β-Variational Autoencoders and β-Variational Classifiers. In Section 4 we explain the phenomenon of adversarial examples and the main method used to synthesise them. In Sections 5 and 6 we describe the experimental settings and outline the obtained results. Finally, in Section 7 conclusions and future works are reported.

2. β-VARIATIONAL AUTOENCODERS

Generative modeling aims at learning a parametrized model of the probability distribution underlying the data, in order to obtain new realistic samples from it. In this context, Variational Autoencoders (VAEs) (Kingma and Welling (2013)) are a well-known approach to develop a complex latent variable model that can be learned by Stochastic Gradient Descent (SGD).

Given a dataset of independent identically distributed samples {x_i}, i = 1, ..., N, with distribution p(x), we introduce hidden variables {z_i}, i = 1, ..., N, of dimension d, distributed as a multivariate Gaussian p(z) = N(0, I) with I ∈ R^{d×d} the identity matrix. VAEs model the joint distribution of the random variables X and Z as p_θ(x, z) = p_θ(x|z) p(z), corresponding to the Bayesian network in Figure 1 (left), where the mean of p_θ(x|z) is the output of a Decoder Neural Network D_θ(z) parametrized by θ; typical choices are Gaussian p_θ(x|z) = N(D_θ(z), I) for continuous output or Bernoulli p_θ(x|z) = B(D_θ(z)) for binary output. Since the z_i are unknown, we aim at finding the parameter values that maximize the marginal likelihood; however, this requires evaluating an intractable integral to compute p_θ(x) = E_z[p_θ(x|z)], which is also difficult to approximate by means of Monte Carlo methods due to the dimensionality of the hidden factors and the amount of data, often very high in Deep Learning settings. In such a scenario, approximate inference is typically very effective; more specifically, we introduce an approximate posterior distribution q_φ(z|x) = N(µ(x), Σ(x)), where µ(x) and Σ(x) are the outputs of an Encoder network.
We define the Evidence Lower Bound (ELBO) L(θ, φ, x) as:

L(θ, φ, x) = −D_KL(q_φ(z|x) || p(z)) + E_{q_φ(z|x)}[log p_θ(x|z)]   (1)

where D_KL is the Kullback-Leibler divergence. It always holds that (for a detailed proof see Bishop (2006)):

log p_θ(x) ≥ L(θ, φ, x)   (2)

Inequality (2) has important implications when the parametrized distributions are extremely complex, such as Neural Networks: we can in fact maximize the ELBO instead of the intractable marginal likelihood. In particular, if we analyze the expression in (1), the Decoder network is trained to minimize an expected reconstruction error, while the Encoder network distribution is pushed to be close to the prior; this acts as a regularizer, tuning the capacity of the Encoder. Such a regularizer impacts the type of representations that the Encoder can learn; in particular, Burgess et al. (2018) showed that, by controlling this term using the following modified ELBO:

L(θ, φ, x) = −β |D_KL(q_φ(z|x) || p(z)) − C| + E_{q_φ(z|x)}[log p_θ(x|z)]   (3)

the model is able to produce disentangled representations that are related to different characteristics of the image, e.g. color, shape, etc. Here β and C are hyperparameters: the first one is usually kept very high, around 1000, while the second directly controls the capacity and is linearly increased at training time from 0 to a predefined value (this procedure has been shown to provide better representations). This modified version is called β-VAE.

During optimization, E_{q_φ(z|x)}[log p_θ(x|z)] is computed using a Monte Carlo approximation E_{q_φ(z|x)}[log p_θ(x|z)] ≈ (1/M) Σ_{m=1}^{M} log p_θ(x|z^(m)), with {z^(m)}_{m=1}^{M} ∼ q_φ(z|x). Since it is not possible to back-propagate through the sampling operation, a reparametrization trick is used, i.e. z^(m) = µ(x) + Σ(x)^{1/2} ξ^(m) with ξ^(m) ∼ N(0, I).
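To make the objective in (3) concrete, the following Python (PyTorch-style) sketch shows one possible implementation of the modified ELBO together with the reparametrization trick; the encoder/decoder interfaces, the Bernoulli likelihood and the default hyperparameter values are assumptions made for illustration, not details taken from the paper.

import torch
import torch.nn.functional as F

def beta_vae_loss(encoder, decoder, x, beta=1000.0, capacity=10.0, n_samples=1):
    # The encoder is assumed to return the mean and log-variance of q_phi(z|x).
    mu, logvar = encoder(x)

    # Closed-form KL divergence between N(mu, diag(exp(logvar))) and the prior N(0, I).
    kl = 0.5 * torch.sum(torch.exp(logvar) + mu ** 2 - 1.0 - logvar, dim=1)

    # Monte Carlo estimate of E_q[log p_theta(x|z)] with the reparametrization trick
    # z = mu + sigma * xi, xi ~ N(0, I), so gradients flow through mu and logvar.
    rec = 0.0
    for _ in range(n_samples):
        xi = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * xi
        x_rec = decoder(z)  # assumed to output Bernoulli means in [0, 1]
        rec = rec + (-F.binary_cross_entropy(x_rec, x, reduction='none').flatten(1).sum(1))
    rec = rec / n_samples

    # Negative modified ELBO of Eq. (3): minimizing this maximizes the ELBO.
    return (beta * torch.abs(kl - capacity) - rec).mean()

In practice, capacity would be computed with the linear schedule described above (from 0 to the target value C) rather than kept fixed.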
3. β-VARIATIONAL CLASSIFIERS

β-VAEs provide an interesting method to perform variational inference in the presence of complex parametrized models of probability distributions with hidden factors. A similar optimization procedure can be employed to learn a more complex Bayesian model that includes a random variable Y representing the class to which the input belongs. The resulting model is called β-Variational Classifier (β-VAC). In this work, we focus our study on a particular β-VAC represented by the Bayesian network in Figure 1 (right): we assume that the dataset is composed of couples {x^(i), y^(i)}, i = 1, ..., n, where the y^(i) are the true labels (i.e. object classes) associated to the input, and we introduce the conditional distribution p_ω(y|z). The ELBO for such a model is:

L(θ, φ, x, y) = −D_KL(q_φ(z|x, y) || p(z)) + E_{q_φ(z|x,y)}[log p_θ(x|z) p_ω(y|z)]   (4)

From now on, we assume that q_φ(z|x, y) = q_φ(z|x), which states that all the information about z is contained in x. The resulting ELBO, including the modified regularizer of Section 2, can be expressed as follows:

L(θ, φ, x, y) = −β |D_KL(q_φ(z|x) || p(z)) − C| + E_{q_φ(z|x)}[log p_θ(x|z) p_ω(y|z)]   (5)

The optimization procedure to learn the model parameters is analogous to Algorithm 1.

Algorithm 1 Training Procedure
Input: {x^(i)}, i = 1, ..., n; B, M, β, C, n_iter
for j = 1 ... n_iter do
    Sample B examples {x^(i)}, i = 1, ..., B, from the training set
    Sample B · M latent variables z^(i)_m ∼ q_φ(z|x^(i))
    c = linearSchedule(C, j)
    Compute the gradients with respect to θ and Φ:
        δ_θ = ∇_θ (1/(B·M)) Σ_{i=1}^{B} Σ_{m=1}^{M} log p_θ(x^(i)|z^(i)_m)
        δ_Φ = ∇_Φ (1/B) Σ_{i=1}^{B} [ −β |D_KL(q_φ(z|x^(i)) || p(z)) − c| + (1/M) Σ_{m=1}^{M} log p_θ(x^(i)|z^(i)_m) ]
    Ascend the gradient: θ = ascendRule(θ, δ_θ)
    Ascend the gradient: Φ = ascendRule(Φ, δ_Φ)
end for
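As a rough counterpart of Eq. (5) and of the training step in Algorithm 1, the sketch below adds the classification term p_ω(y|z) to the previous β-VAE loss; parametrizing p_ω(y|z) as a categorical distribution over the classifier logits and using a single posterior sample are assumptions of this example.

import torch
import torch.nn.functional as F

def beta_vac_loss(encoder, decoder, classifier, x, y, beta=1000.0, capacity=10.0):
    mu, logvar = encoder(x)
    kl = 0.5 * torch.sum(torch.exp(logvar) + mu ** 2 - 1.0 - logvar, dim=1)

    # Single-sample Monte Carlo estimate with the reparametrization trick.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    # log p_theta(x|z): Bernoulli decoder, as in the beta-VAE sketch above.
    x_rec = decoder(z)
    log_px = -F.binary_cross_entropy(x_rec, x, reduction='none').flatten(1).sum(1)

    # log p_omega(y|z): categorical likelihood given by the classifier logits.
    log_py = -F.cross_entropy(classifier(z), y, reduction='none')

    # Negative modified ELBO of Eq. (5).
    return (beta * torch.abs(kl - capacity) - (log_px + log_py)).mean()

A training loop analogous to Algorithm 1 then simply descends this loss with SGD while linearly increasing the capacity c.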
4. ADVERSARIAL EXAMPLES

In object recognition, an adversarial example is an image that, while being visually indistinguishable from or very similar to a correctly classified input, is confidently misclassified by the model. The most common approach to synthesise an adversarial example x_adv is the Projected Gradient Descent (PGD) attack, where a normal input x is perturbed by following an ascending direction of the loss function while remaining in an ε-ball B_ε(x) = {x_adv s.t. ||x − x_adv||_p ≤ ε} centered at the original sample. More in detail, let L(x, y) be the value of the loss at x, where y is the correct label, ε > 0 the perturbation budget, k the number of iterations and α the step size; PGD works as described in Algorithm 2. Proj(x_j, B_ε(x)) is the projection operator, which varies depending on the chosen norm; typical choices are ℓ2 and ℓ∞.

Algorithm 2 PGD
Input: x, y, k, ε, α
x_0 = x
for j = 1 ... k do
    x_j = x_{j−1} + α ∇_x L(x_{j−1}, y)
    x_j = Proj(x_j, B_ε(x))
end for
x_adv = x_k
return x_adv

The value of ε identifies the strength of the applied perturbation. Typically, for simple tasks such as handwritten digit recognition, a high value is necessary to fool the network, while for more complex tasks the value can be much smaller. Figure 2 shows an adversarial example on a digit recognition task.

Fig. 2. Adversarial example on a digit recognition task: the 6 on the left is correctly classified while the one on the right is classified as a 4.
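A minimal ℓ∞ version of Algorithm 2 can be sketched as follows; taking the sign of the gradient, clamping to the valid pixel range and using the classifier cross-entropy as loss are common choices assumed here rather than details stated in the paper.

import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, alpha, k):
    # Iteratively ascend the loss while staying inside the eps-ball around x.
    x_adv = x.clone().detach()
    for _ in range(k):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Steepest ascent step for the l_inf norm uses the sign of the gradient.
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Projection onto B_eps(x) and onto the valid pixel range [0, 1].
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

Here model is assumed to map an image to class logits; for the β-VAC this could be, e.g., the classifier applied to the encoder output.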
5. EXPERIMENTAL SETTINGS

To analyse the robustness of the β-VAC to adversarial examples, we train the model on two popular Computer Vision datasets, one for handwritten digit classification (MNIST, http://yann.lecun.com/exdb/mnist) and one for clothes recognition (Fashion-MNIST or FMNIST by Zalando research, https://github.com/zalandoresearch/fashion-mnist). They are both composed of 60 thousand labeled images for the training set and 10 thousand for testing; the images are grayscale of size 28 × 28. The model is composed of the following networks:

• Encoder: an adapted Allcnn (Springenberg et al. (2014)) that provides a hidden representation of size 100, i.e. z ∈ R^100;
• Decoder: has a structure symmetric to the encoder, in order to provide as output an image of the same dimension as the input;
• Classifier: a 2-layer perceptron with 64 hidden neurons per layer and ReLU (Nair and Hinton (2010)) activations.

We train for 60 epochs using an SGD optimizer with momentum 0.9 and learning rate 0.01, which is decreased at the 10th and 30th epoch by a factor of 10. The strength of the momentum is a common choice in the literature, while the other attributes have been chosen in order to properly train the network. We employ a small weight decay of 1e-6 (typical values are around 1e-3 for the Allcnn) so that its influence on the capacity is limited and we are free to control it using the KL regularization term in the ELBO. We use PGD to compute the adversarial examples with variable strength: for MNIST we use ε ∈ [0, 0.3], while a smaller range is used for FMNIST; in both cases ε refers to the ℓ∞ norm.

6. RESULTS

In this section we outline the obtained results and, in particular, we analyze the effect of the capacity and of sparse regularization on the robustness and detection capabilities of the β-VAC described in the previous section. To conclude, we visually inspect the reconstructed images of adversarial examples, showing some interesting properties.

The decoder part of the model can be extremely useful to detect adversarial examples: by definition, they provide a high variation of the network output given small variations of the input; hence, if the reconstructed image is strongly affected by the adversarial perturbation, it will probably be very different from the input fed to the network. An example of this phenomenon is shown in Figures 3(a) and 4(a): it is noticeable how the reconstruction error considerably increases as the attack gets stronger. We can thus train a classifier on the reconstruction error to obtain an effective attack detection method. In the following, we will use a logistic classifier trained on adversarial examples computed on the training set, and we analyse its performance by computing adversarial examples on the test set and classifying both clean inputs and perturbed ones. We use the classification rate as performance metric, the dataset being well balanced due to the high effectiveness of adversarial attacks.
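The detection procedure just described can be sketched as follows: a per-sample reconstruction error is computed for clean and adversarial inputs and a logistic classifier is fit on it. Using the posterior mean for the reconstruction, the mean squared error as feature and scikit-learn for the logistic model are assumptions made for illustration.

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def reconstruction_error(encoder, decoder, x):
    # Reconstruct through the posterior mean and return the per-sample mean squared error.
    with torch.no_grad():
        mu, _ = encoder(x)
        x_rec = decoder(mu)
    return ((x_rec - x) ** 2).flatten(1).mean(1).cpu().numpy()

def fit_detector(encoder, decoder, x_clean, x_adv):
    # Label 0 for clean training inputs, 1 for their adversarial counterparts.
    err = np.concatenate([reconstruction_error(encoder, decoder, x_clean),
                          reconstruction_error(encoder, decoder, x_adv)])
    labels = np.concatenate([np.zeros(len(x_clean)), np.ones(len(x_adv))])
    return LogisticRegression().fit(err.reshape(-1, 1), labels)

At test time, the same feature is computed on clean and perturbed test samples and the classification rate of the detector is reported.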
In Figures 3(b) and 4(b) we report the robustness of the β-VAC, measured in terms of accuracy at classifying adversarial examples, as a function of the parameter C controlling the capacity of the encoder network. We do not notice any particular correlation between the capacity and the accuracy. On the FMNIST dataset, for small attack strength, there are some values of C that provide better robustness, i.e. 0.01 and 1.0; however, the first one has low accuracy on clean samples (ε = 0) due to the limited capacity of the model. In general, we cannot conclude that we are able to control the robustness by changing the capacity.

For the detection rate, Figures 3(c) and 4(c), the capacity seems to have a stronger effect: in particular, when its value is too low, such as 0.01, the reconstruction error is high also for normal samples, making it difficult to distinguish them from the adversarial ones. On the other hand, we do not see a positive correlation between capacity and detection rate, so the parameter has to be fine-tuned to get the best result. Overall, the detection accuracy is very good on the MNIST dataset, also due to the higher attack strength used; for FMNIST the detection rate is well above 70% with the right capacity for every attack strength.

The idea of adding sparse regularization is motivated by the fact that the encoder network of the β-VAE has been shown to provide disentangled representations. Hence, if the classifier selects only the features that are useful for the classification task and discards the others, it should be more robust to adversarial perturbations. In Tables 1 and 2 we show the effect of the ℓ1 regularization on the robustness of the classifier trained using the best performing capacity, i.e. 1.0 for MNIST and 10.0 for FMNIST. The ℓ1 regularization does not seem to provide improved robustness. This result provides us with the following insight: while it is easy to see that for linear models obtaining ℓ∞-robustness is equivalent to applying ℓ1-regularization, for non-linear models this relation does not hold. Thus, our results give evidence that the Encoder network is not providing robust latent embeddings.

Overall, the detection accuracy is lower than the one of the non-regularized method (Tables 3 and 4). This would be a fair price to pay in case of improved robustness but, from the results obtained, it is probably better to avoid using ℓ1 regularization and to choose the correct model complexity with C.
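For reference, one simple way to realize the sparse regularization discussed above is to add an ℓ1 penalty on the classifier weights to the training loss; where exactly the penalty is applied and with what strength is not specified here, so the following is only an assumed formulation.

def l1_penalty(classifier, strength):
    # Sum of absolute values of all classifier parameters (sparsity-inducing penalty).
    return strength * sum(p.abs().sum() for p in classifier.parameters())

# Assumed usage during training:
# loss = beta_vac_loss(encoder, decoder, classifier, x, y) + l1_penalty(classifier, 1e-4)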
We analyse here the effect of an adversarial perturbation on the decoded images of the autoencoder. Interestingly, from Figure 5 we notice that the decoder is fooled into reconstructing an image that seems to belong to the same incorrect class provided by the classifier. For example, in the first row the true class is 0, the mistaken class is 6 and the decoder produces an image similar to a 6. This suggests that a similar model may be used for other vision tasks such as conditional image generation and style transfer. We reserve this analysis for future work.

7. CONCLUSIONS

In this paper we proposed an analysis of β-VACs in the presence of adversarial perturbations. We have shown that the model does not provide increased robustness to adversarial examples; however, it is able to detect them effectively thanks to the reconstruction error of the decoder network. Sparse regularization of the classifier does not help in reducing the effects of adversarial perturbations on the classification. We have shown that the decoder, when fed with an adversarial example, tends to reconstruct an image that belongs to the class mistakenly selected by the classifier. As a future work, we want to investigate this aspect more deeply, as it may be exploited to perform conditional image generation and style transfer tasks. We also plan to extend these results to more complex vision datasets.

Fig. 3. Results for the FMNIST dataset: (a) reconstruction error, (b) model accuracy, (c) detection rate.
Fig. 4. Results for the MNIST dataset: (a) reconstruction error, (b) model accuracy, (c) detection rate.

Table 1. ℓ1 regularization effect on the robustness for the MNIST dataset.
Table 2. ℓ1 regularization effect on the robustness for the FMNIST dataset.
Table 3. ℓ1 regularization effect on the detection accuracy for the MNIST dataset.
Table 4. ℓ1 regularization effect on the detection accuracy for the FMNIST dataset.

Fig. 5. Example of the effect of adversarial perturbations on the decoder network. It is noticeable how the reconstruction belongs to the class mistakenly chosen by the classifier.

REFERENCES

Belkin, M., Ma, S., and Mandal, S. (2018a). To understand deep learning we need to understand kernel learning. arXiv:1802.01396.
Belkin, M., Rakhlin, A., and Tsybakov, A.B. (2018b). Does data interpolation contradict statistical optimality? arXiv:1806.09471.
Bishop, C.M. (2006).
Pattern Recognition and Machine Learning. Springer.
Botev, Z.I., Grotowski, J.F., Kroese, D.P., et al. (2010). Kernel density estimation via diffusion.
The Annals of Statistics, 38(5), 2916–2957.
Burgess, C.P., Higgins, I., Pal, A., Matthey, L., Watters, N., Desjardins, G., and Lerchner, A. (2018). Understanding disentangling in β-VAE. arXiv:1804.03599.
Feinman, R., Curtin, R.R., Shintre, S., and Gardner, A.B. (2017). Detecting adversarial samples from artifacts. arXiv:1703.00410.
Finlay, C., Calder, J., Abbasi, B., and Oberman, A. (2018). Lipschitz regularized deep neural networks generalize and are adversarially robust. arXiv:1808.09540.
Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In
ICML, 1050–1059.
Gong, Z., Wang, W., and Ku, W.S. (2017). Adversarial and clean data are not twins. arXiv:1704.04960.
Goodfellow, I.J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv:1412.6572.
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., and Smola, A. (2012). A kernel two-sample test.
Journal of Machine Learning Research, 13(Mar), 723–773.
Grosse, K., Manoharan, P., Papernot, N., Backes, M., and McDaniel, P. (2017). On the (statistical) detection of adversarial examples. arXiv:1702.06280.
Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017). beta-VAE: Learning basic visual concepts with a constrained variational framework.
ICLR.
Kingma, D.P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv:1312.6114.
Li, Y., Bradshaw, J., and Sharma, Y. (2018). Are generative classifiers more robust to adversarial attacks? arXiv:1802.06552.
Liu, Y., Chen, X., Liu, C., and Song, D. (2016). Delving into transferable adversarial examples and black-box attacks. arXiv:1611.02770.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083.
Moosavi-Dezfooli, S.M., Fawzi, A., Uesato, J., and Frossard, P. (2019). Robustness via curvature regularization, and vice versa. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9078–9086.
Nair, V. and Hinton, G.E. (2010). Rectified linear units improve restricted Boltzmann machines. In
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807–814.
Ross, A.S. and Doshi-Velez, F. (2018). Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In
Thirty-Second AAAI Conference on Artificial Intelligence.
Shaham, U., Yamada, Y., and Negahban, S. (2018). Understanding adversarial training: Increasing local stability of supervised models through robust optimization.
Neurocomputing, 307, 195–204.
Springenberg, J.T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv:1412.6806.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2013). Intriguing properties of neural networks. arXiv:1312.6199.
Terzi, M., Susto, G.A., and Chaudhari, P. (2020). Directional adversarial training for cost sensitive deep learning classification applications.
Engineering Applications of Artificial Intelligence, 91, 103550.
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and Madry, A. (2018). Robustness may be at odds with accuracy. arXiv:1805.12152.