Adversarially Trained Models with Test-Time Covariate Shift Adaptation
Jay Nandy, Sudipan Saha, Wynne Hsu, Mong Li Lee, Xiao Xiang Zhu
Abstract
Existing adversarially trained models typically perform inference on test examples independently of each other. This mode of testing cannot handle covariate shift in the test samples, and the performance of these models often degrades significantly as a result. In this paper, we show that a simple adaptive batch normalization (BN) technique, which re-estimates the batch-normalization statistics during inference, can significantly improve the robustness of these models to any random perturbation, including Gaussian noise. This simple finding enables us to transform adversarially trained models into randomized smoothing classifiers that produce certified robustness to ℓ2 noise. We show that we can achieve ℓ2 certified robustness even for models adversarially trained with ℓ∞-bounded adversarial examples. We further demonstrate that the adaptive BN technique significantly improves robustness against common corruptions, while often enhancing performance against adversarial attacks. This enables us to achieve both adversarial and corruption robustness with the same classifier.
1. Introduction
Deep neural network (DNN) based models achieve impeccable success in the independent and identically distributed (IID) setting, where the test examples are drawn from the same distribution as the training examples (Goodfellow et al., 2016). In practice, however, this assumption rarely holds for most real-world applications, as the test examples are often obtained from a different acquisition system or under different environmental settings (Cariucci et al., 2017; Madry et al., 2018; Hendrycks et al., 2019; Nandy et al., 2020a). Moreover, recent studies have shown that a DNN classifier that correctly classifies an image x can easily be fooled by an adversarial attack into misclassifying x + δ, where δ is a minor adversarial perturbation such that the changes between x and x + δ remain indistinguishable to the human eye (Szegedy et al., 2014). Further, DNN classifiers are also found to be sensitive to naturally occurring random corruptions (Hendrycks and Dietterich, 2019; Geirhos et al., 2018). Hence, building a classifier that is robust against both adversarial perturbations and common corruptions has emerged as an important research direction for enhancing the safety and trustworthiness of sensitive real-world applications (Gilmer et al., 2019).

Among the successful defense mechanisms against adversarial attacks, adversarial training achieves the best empirical robustness for a specific perturbation type (such as small ℓp noise) by training on adversarial examples of the same perturbation type (Madry et al., 2018; Tramèr and Boneh, 2019). However, including adversarial examples in training significantly reduces performance on clean test examples. A number of robustness certification techniques have been proposed for adversarial training frameworks, where the predictions for most test examples x can be verified to be constant within a neighborhood of x (Wong and Kolter, 2018; Wang et al., 2018b). However, these techniques typically do not scale to large networks (e.g., ResNet50) and datasets (e.g., ImageNet).

In contrast to adversarial training, the randomized smoothing technique provides scalable ℓ2-norm certification for any classification model that is robust against standard isotropic Gaussian noise (Cohen et al., 2019). It transforms an original base classifier f into a smoothed classifier g. For an input x, g(x) labels x with the class y that the base classifier is most likely to return under noisy corruption x + δ, that is,

g(x) = argmax_{y ∈ Y} P(f(x + δ) = y)    (1)

where Y is the set of class labels and δ ∼ N(0, σ²I) is sampled from an isotropic Gaussian distribution with standard deviation σ. However, this certification technique cannot be applied to adversarially trained models, as they are not robust against such Gaussian-augmented noise.
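To make Equation (1) concrete, the following is a minimal sketch of how the smoothed prediction can be approximated by a majority vote over Gaussian-perturbed copies of the input. The base classifier f, the sample count n, and the function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def smoothed_predict(f, x, sigma, n=100):
    """Approximate g(x) from Eq. (1) by a majority vote under Gaussian noise.

    f     : base classifier mapping a batch of inputs to integer labels
    x     : a single input (e.g., an image as a numpy array)
    sigma : standard deviation of the isotropic Gaussian noise
    n     : number of noisy samples for the Monte Carlo estimate
    """
    noise = sigma * np.random.randn(n, *x.shape)   # n draws of delta ~ N(0, sigma^2 I)
    labels = f(x[None, ...] + noise)               # classify the noisy copies
    return np.bincount(labels).argmax()            # most frequently returned class
```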
Further, Gilmer et al. (2019) recently demonstrated a close relation between adversarial robustness and corruption robustness. In practice, however, adversarially trained models often perform poorly, even compared to standard DNN-based models, when exposed to naturally occurring common corruptions (e.g., CIFAR10-C and ImageNet-C (Hendrycks and Dietterich, 2019)), limiting their practical purview. Hence, we need a better understanding of when such models can be useful for sensitive real-world applications.

Existing adversarially trained models typically perform inference on test examples independently of each other. This standard inference setup often underestimates their robustness in non-IID settings. For example, the distribution of test examples may change in medical imaging settings as we use a different data acquisition system, or in autonomous cars and satellite image analysis as the weather conditions change (Saha et al., 2019a,b). However, these external conditions do not change abruptly in most (but not all) real-world applications. Hence, we can expect to receive a potentially large number of test examples from the same distribution during inference. Furthermore, adversarial examples have also been shown to be distributionally shifted from the manifold of clean images (Stutz et al., 2019). Even under adversarial attacks with multiple iterations, a classifier is likely to observe a stream of unlabelled test examples. Hence, we can update the classifier to adapt to the distributional shift using these unlabelled test examples in an unsupervised manner.

Unsupervised adaptation mechanisms are well studied in the field of domain adaptation (DA), where the aim is to modulate the model parameters for a different target domain (Li et al., 2016). Recent works have explored self-supervised training (Sun et al., 2020) and batch normalization (BN) adaptation (Schneider et al., 2020; Nado et al., 2020) using test batches to improve the robustness of standard classifiers against common corruptions. In this paper, we investigate adapting the batch normalization (BN) statistics of adversarially trained models for both adversarial and corruption robustness. BN is a popular technique to speed up training and is used in almost all current state-of-the-art DNN models (Ioffe and Szegedy, 2015). BN estimates the statistics of the activations and uses them to modulate or standardize the activations for the following deeper layers, improving the training efficiency of DNN models. In the standard inference setup, the BN statistics of a DNN model are estimated during training and used without re-estimation on the test examples.

However, activation statistics obtained at training time do not reflect the statistics of the test examples under covariate shift. Here we apply adaptive BN, i.e., we re-estimate the BN statistics using the test images to mitigate such covariate shifts (Li et al., 2016). We demonstrate that this simple adaptation alone greatly improves the overall robustness of adversarially trained models.
We present results on the CIFAR-10 and ImageNet datasets to demonstrate the following contributions:

1. This work shifts the paradigm for adversarially trained models by improving their robustness to Gaussian noise, which in turn lets us transform them into smoothed classifiers that achieve certified robustness in the ℓ2 norm. We recall that achieving certified robustness for adversarially trained models has been a major challenge (Cohen et al., 2019; Salman et al., 2019a), and our finding is a step forward in this direction.

2. We further demonstrate that the adaptive BN technique frequently improves performance against adversarial attacks, in particular on the ImageNet dataset.

3. We also show that adversarially trained models with adaptive BN significantly improve performance against common perturbations and narrow the performance gap between corrupted and clean images, improving generalization for real-world image classification tasks. To the best of our knowledge, we are the first to show that both adversarial robustness and corruption robustness can be achieved with the same classifier.

We achieve all of the above with a simple adaptive batch normalization technique, whose simplicity is its major advantage: it can be applied to any network without any additional training or architectural overhead.
2. Background and Related Work
Several defense models have been proposed against adversarial attacks in recent years. These models can be broadly categorized into empirical and certified defenses.

Empirical defenses provide empirical robustness against adversarial attacks (Schott et al., 2019; Moosavi-Dezfooli et al., 2019; Nandy et al., 2020b). Adversarial training empirically achieves the best defense (Kurakin et al., 2016; Madry et al., 2018). To achieve robustness within an ε-bounded threat model for an ℓp norm, where the perturbations δ ∈ Δ are constrained as Δ = {δ : ||δ||_p ≤ ε}, adversarial training optimizes the following loss function for a DNN classifier f:

min_θ E_{(x,y)} [ max_{δ ∈ Δ} L(f_θ(x + δ), y) ]    (2)

where θ denotes the model parameters and L is the classification loss function.

The inner maximization problem can be solved by producing adversarial examples using strong attacks, such as the projected gradient descent (PGD) attack (Madry et al., 2018). However, Wong et al. (2020) have shown that even the single-step fast gradient sign method (FGSM) attack can improve the robustness of DNN models (Goodfellow et al., 2015). Several variations, such as TRADES (Zhang et al., 2019) and Adv-LLR (Qin et al., 2019), have been proposed to further improve this training framework. However, Rice et al. (2020) demonstrated that applying an early stopping criterion to standard adversarial training with the PGD attack produces the best robust model for a given perturbation type among models that do not incorporate additional training data (Carmon et al., 2019; Uesato et al., 2019).

Empirical models demonstrate robustness against existing "known" adversarial attacks without providing any robustness guarantees. In fact, several empirical defense models were later broken by stronger adversaries (Athalye et al., 2018; Uesato et al., 2018; Jalal et al., 2019). A number of certified defenses have been proposed to provide provable robustness guarantees for a specific perturbation type (Wong and Kolter, 2018; Wang et al., 2018a,b; Raghunathan et al., 2018; Singh et al., 2018; Salman et al., 2019b). Given an input x, they guarantee that the classifier's prediction is constant within a neighborhood of x. In general, however, these techniques do not scale to large networks (e.g., ResNet50) and datasets (e.g., ImageNet).
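For illustration, a minimal sketch of the inner maximization of Equation (2) with an ℓ∞ PGD attack is given below. The perturbation bound eps, step size alpha, iteration count, and the function name are illustrative assumptions; the pixel range is assumed to be [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximate max_{delta in Delta} L(f(x + delta), y) with l_inf PGD.

    Starts from a random point in the eps-ball and takes `steps` ascent
    steps on the classification loss, projecting back onto the ball.
    """
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()                 # gradient ascent step
            delta.clamp_(-eps, eps)                      # project onto the l_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)     # keep pixels in [0, 1]
    return (x + delta).detach()

# Adversarial training then minimizes the loss on these worst-case examples:
#   loss = F.cross_entropy(model(pgd_attack(model, x, y)), y)
```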
To scale to practical networks, randomized smoothing has been proposed as a probabilistically certified defense for the ℓ2 norm. Randomized smoothing was initially proposed as a heuristic defense (Cao and Gong, 2017; Liu et al., 2018) and later shown to be certifiable (Lecuyer et al., 2019; Li et al., 2019). Recently, Cohen et al. (2019) and later Salman et al. (2019a) presented a tight robustness guarantee for randomized smoothing, achieving the state of the art in ℓ2-norm certification, as follows. Suppose the base classifier f classifies N(x, σ²I) to return the most probable class c_A with probability p_A = P(f(x + δ) = c_A), and the "runner-up" class c_B with probability p_B = max_{y ≠ c_A} P(f(x + δ) = y). Then the smoothed classifier g is certifiably robust around x within an ℓ2 radius R:

R = (σ/2) (Φ⁻¹(p_A) − Φ⁻¹(p_B))    (3)

where Φ⁻¹ is the inverse of the standard Gaussian CDF. However, computing the exact values of p_A and p_B is not possible in practice. Hence, Cohen et al. (2019) used Monte Carlo sampling to estimate a lower bound p̲_A ≤ p_A and an upper bound p̄_B ≥ p_B that hold with arbitrarily high probability. The certified radius is then computed by replacing p_A and p_B with p̲_A and p̄_B, respectively.

However, adversarially trained models in the standard inference setup are not robust against Gaussian noise. Hence, we cannot transform them into smoothed classifiers to certify for the ℓ2 norm using this technique (Equation 1). Notably, Salman et al. (2020) recently demonstrated that any classifier can be converted into a smoothed classifier using a separate denoiser module as a pre-processor. However, this requires retraining the denoiser to generalize to different random perturbation types.
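As a concrete illustration, the following is a minimal sketch of the Monte Carlo certification step, assuming statsmodels is available for the Clopper-Pearson bound. The full procedure of Cohen et al. (2019) additionally draws a separate, smaller sample to select the top class, which this sketch omits; the function name and defaults are illustrative.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint

def certify(f, x, sigma, n=1000, alpha=0.001, num_classes=10):
    """Estimate a certified l2 radius around x, following Eq. (3).

    Uses the simplification p_B_upper = 1 - p_A_lower, under which the
    radius reduces to sigma * Phi^{-1}(p_A_lower). Returns (class, radius),
    or (None, 0.0) when certification must abstain.
    """
    noise = sigma * np.random.randn(n, *x.shape)
    labels = f(x[None, ...] + noise)                  # labels of n noisy copies
    counts = np.bincount(labels, minlength=num_classes)
    c_a = counts.argmax()
    # One-sided (1 - alpha) Clopper-Pearson lower bound on p_A.
    p_a_lo = proportion_confint(counts[c_a], n, alpha=2 * alpha, method="beta")[0]
    if p_a_lo <= 0.5:
        return None, 0.0                              # cannot certify at this confidence
    return c_a, sigma * norm.ppf(p_a_lo)              # certified l2 radius
```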
Furthermore, understanding the interplay between robustness to multiple perturbation types, in both the adversarial and the common corruption case, is a longstanding problem (Gilmer et al., 2019). The robustness of adversarially trained models does not generalize well to naturally occurring common corruptions such as CIFAR10-C and ImageNet-C (Gilmer et al., 2019; Yin et al., 2019; Hendrycks et al., 2020a). A number of recent works have focused on improving corruption robustness. However, the majority of them require special training protocols, involving additional time and resources. These include data augmentation with Gaussian noise (Gilmer et al., 2019), optimized mixtures of data augmentations in conjunction with a consistency loss (Hendrycks et al., 2020b), training on stylized images (Geirhos et al., 2018; Michaelis et al., 2019), or training with adversarially generated noise (Rusak et al., 2020). Other approaches tweak the architecture, e.g., by adding shift-equivariance with an anti-aliasing module (Zhang, 2019) or by assembling different training techniques (Lee et al., 2020).

Recently, test-time adaptation techniques using self-supervision (Sun et al., 2020) as well as adaptive batch normalization (Schneider et al., 2020; Nado et al., 2020) have been proposed in this context for standard DNN classifiers. However, none of these methods provide both adversarial robustness and corruption robustness for the same classifier. Notably, recent studies have also investigated DNN classifiers that reduce the number of BN layers (Galloway et al., 2019) or apply an additional auxiliary batch norm (Xie et al., 2020) to empirically improve adversarial robustness. In contrast, we explore the test-time adaptation of BN parameters for adversarially trained models to improve both adversarial and corruption robustness.
3. Adaptive Batch-Normalization
Batch normalization is a powerful tool to improve training stability and attain faster convergence (Ioffe and Szegedy, 2015). It normalizes the hidden feature activations to reduce the covariate shift between hidden layers. Batch normalization is inspired by the well-known observation that normalizing or whitening the input speeds up the training of DNN models (Wiesler and Ney, 2011; Huang et al., 2018), and extends this concept by normalizing the internal/hidden layers (Huang et al., 2018). Typically, modern DNN models incorporate BN after each convolutional or dense layer for complex machine learning tasks, including image classification (He et al., 2016a; Szegedy et al., 2015).

A BN layer estimates the mean and variance of the hidden activation maps across the channels. The feature activations are then normalized by subtracting the mean and dividing by the standard deviation, so that they approximately follow N(0, 1).

Suppose x ∈ X are the inputs and y ∈ Y are the class labels. We denote the training distribution as P_T : X × Y → R⁺ and the test distribution as P_t : X × Y → R⁺. Then there exists a covariate shift between the training and test distributions if P_T(y|x) = P_t(y|x) and P_T(x) ≠ P_t(x). Further, if the covariate shift only changes the first- and second-order moments of the feature activations f_h(x), we can remove it by re-estimating these statistics using the test batches, followed by normalization (Schneider et al., 2020):

P_T( (f_h(x) − E_T[f_h(x)]) / √V_T[f_h(x)] ) P_T(x) ≈ P_t( (f_h(x) − E_t[f_h(x)]) / √V_t[f_h(x)] ) P_t(x)    (4)

In other words, we can remove the covariate shift by correcting the first- and second-order moments as long as the change in the test distribution only causes the feature activations to scale and translate. In the context of domain adaptation (Li et al., 2016; Saha et al., 2018) and corruption robustness (Nado et al., 2020; Schneider et al., 2020), adaptive BN that corrects the BN statistics has been found to be effective as long as the semantics of the test images do not change.

Adaptive BN adapts the batch normalization statistics using unlabelled test batches in an unsupervised fashion. Given a set of test examples, we compute the BN statistics from the feature activations of the test batch, denoted µ_t and s_t, and blend them with the existing statistics µ_T and s_T already obtained from the training batches:

µ = ρ · µ_t + (1 − ρ) · µ_T,    s = ρ · s_t + (1 − ρ) · s_T    (5)

where ρ ∈ [0,
1] represents the momentum.

The choice of ρ = 0 rejects the statistics of the test examples; this is equivalent to the standard inference setup with a deterministic DNN classifier in the IID setting. In contrast, ρ = 1 completely ignores the pre-computed statistics from the training batches. Clearly, as we receive a larger test batch, we get a better estimate of the test distribution, and hence we should choose a larger value of ρ. In practice, however, the choice of ρ is often dictated by practical constraints, e.g., the available hardware resources, and if the test-batch size is too small, the estimated statistics will be unreliable. The appropriate value of ρ can therefore be determined empirically for each application, depending on the available test-batch size.

It is noteworthy that adaptively modulating the batch normalization statistics during inference incurs no overhead during training. In other words, the training process remains completely unaltered, and any model that has BN layers can apply adaptive BN during inference. The statistics are updated during inference in the forward pass; no gradient computation (i.e., backward pass) is involved. Batch normalization layers also have two additional trainable parameters that would require gradient computation to update, but adaptive BN does not update them. Thus, adaptive BN provides a straightforward mechanism to mitigate covariate shift during inference. A number of extensions of adaptive batch normalization have been proposed in the domain adaptation literature that require additional training and resources (Cariucci et al., 2017; Roy et al., 2019). Unlike those works, we use the adaptive BN technique without changing the DNN architecture or altering the training procedure, to achieve both adversarial and corruption robustness.
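A minimal PyTorch sketch of this test-time adaptation is given below, assuming a model with standard BatchNorm2d layers; it relies on the fact that PyTorch's BN momentum implements exactly the blend in Equation (5). The function name and the default ρ are illustrative.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn(model, test_batch, rho=0.5):
    """Re-estimate BN statistics from an unlabelled test batch, as in Eq. (5).

    PyTorch updates running BN statistics as
        running <- (1 - momentum) * running + momentum * batch_stat,
    which matches mu = rho * mu_t + (1 - rho) * mu_T when momentum = rho.
    Only the forward pass runs: no gradients, no weight updates.
    """
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()            # use and update batch statistics
            m.momentum = rho     # blend test-batch stats into running buffers
    model(test_batch)            # one forward pass re-estimates the statistics
    model.eval()                 # freeze the adapted statistics for inference
    return model
```

With rho=1, the training statistics are discarded entirely; with rho=0, the buffers stay unchanged and standard inference is recovered.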
Table 1: Top-1 accuracy of the Baseline, Adv∞, Adv2, and Rand_{σ=0.25} classifiers, with and without the adaptive BN technique, under Gaussian noise of σ = 0, 0.25, 0.5, and 0.75 added to the test images, on (a) ImageNet and (b) CIFAR-10. We randomly shuffle the test images and sample the noise to report mean ± sd over 5 different runs.
Figure 1: Visualization of loss-gradient images produced by adversarially trained classifiers as we apply different levels of Gaussian noise to the test images: (a) σ = 0 (clean), (b) σ = 0.25, (c) σ = 0.5, (d) σ = 0.75. The Gaussian-augmented test images are shown in the top row, followed by the loss-gradients corresponding to Adv∞, Adv∞ + adaptive BN, Adv2, and Adv2 + adaptive BN in the 2nd, 3rd, 4th, and 5th rows, respectively. See Appendix E.2 for more results.
4. Experiments
In this section, we demonstrate the effectiveness of the adaptive BN technique for adversarially trained models using four sets of experiments.
Experimental Setup.
We use the benchmark CIFAR-10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets, with a pre-activation ResNet18 model for CIFAR-10 and a ResNet50 model for ImageNet (He et al., 2016a,b). For CIFAR-10, we learn two adversarially trained models, Adv∞ and Adv2, for the ℓ∞ and ℓ2 threat models.

[Figure: certified accuracy as a function of ℓ2 radius for the Baseline (σ = 0.25), Adv (σ = 0.25), Adv + adaptive BN (σ = 0.5), and Rand_{σ=0.5} classifiers.]