[PDF] Adversarially Trained Models with Test-Time Covariate Shift Adaptation

Abstract

We empirically demonstrate that test-time adaptive batch normalization, which re-estimates the batch-normalization statistics during inference, can provide \ell_2-certification as well as improve the commonly occurring corruption robustness of adversarially trained models while maintaining their state-of-the-art empirical robustness against adversarial attacks. Furthermore, we obtain similar \ell_2-certification as the current state-of-the-art certification models for CIFAR-10 by learning our adversarially trained model using larger \ell_2-bounded adversaries. Therefore our work is a step towards bridging the gap between the state-of-the-art certification and empirical robustness. Our results also indicate that improving the empirical adversarial robustness may be sufficient as we achieve certification and corruption robustness as a by-product using test-time adaptive batch normalization.

Full PDF

AAdversarially Robust Classifier with Covariate Shift Adaptation

Adversarially Robust Classiﬁer withCovariate Shift Adaptation

Jay Nandy Sudipan Saha Wynne Hsu Mong Li Lee Xiao Xiang Zhu Abstract

Existing adversarially trained models typically perform inference on test examples inde-pendently from each other. This mode of testing is unable to handle covariate shift in thetest samples. Due to this, the performance of these models often degrades signiﬁcantly. Inthis paper, we show that simple adaptive batch normalization (BN) technique that involvesre-estimating the batch-normalization parameters during inference, can signiﬁcantly improvethe robustness of these models for any random perturbations, including the Gaussian noise.This simple ﬁnding enables us to transform adversarially trained models into randomizedsmoothing classiﬁers to produce certiﬁed robustness to (cid:96) noise. We show that we canachieve (cid:96) certiﬁed robustness even for adversarially trained models using (cid:96) ∞ -boundedadversarial examples. We further demonstrate that adaptive BN technique signiﬁcantlyimproves robustness against common corruptions, while often enhancing performance againstadversarial attacks. This enables us to achieve both adversarial and corruption robustnessfor the same classiﬁer.

1. Introduction

Deep neural network (DNN) based models achieve impeccable success in the independentand identical distribution (IID) settings where the test examples are obtained from thesame distribution as the training examples (Goodfellow et al., 2016). However in practice,this assumption rarely holds for most real-world applications as the test examples areoften obtained from a diﬀerent acquisition system or in a diﬀerent environmental settings(Cariucci et al., 2017; Madry et al., 2018; Hendrycks et al., 2019; Nandy et al., 2020a).Moreover, recent studies have shown that a DNN classiﬁer that correctly classiﬁes an image x , can be easily fooled by an adversarial attack to misclassify x + δ , where, δ is a minor adversarial perturbation such that the changes between x and x + δ remain indistinguishableto the human eye (Szegedy et al., 2014). Further, DNN classiﬁers are also found to besensitive to naturally occurred random corruptions (Hendrycks and Dietterich, 2019; Geirhoset al., 2018). Hence, building a robust classiﬁer against both adversarial perturbations andcommon corruptions has emerged as an important research direction to enhance the safetyand trustworthiness for sensitive real-world applications (Gilmer et al., 2019).Among the successful defense mechanisms against adversarial attacks, adversarial trainingachieves the best empirical robustness performance for a speciﬁc perturbation type (such as . School of Computing, National University of Singapore (NUS), Singapore Technical University of Munich (TUM), Munich, GermanyCorrespondence to: Jay Nandy < [email protected] > , Sudipan Saha < [email protected] > a r X i v : . [ c s . L G ] F e b small (cid:96) p -noise) by training on adversarial examples of the same perturbation types (Madryet al., 2018; Tram`er and Boneh, 2019). However, inclusion of the adversarial examplesfor training signiﬁcantly reduces their performance on clean test examples. A number ofrobustness certiﬁcation techniques have been proposed for adversarial training frameworks,where the predictions for most of the test examples x can be veriﬁed to be constant within aneighborhood of x (Wong and Kolter, 2018; Wang et al., 2018b). However, these techniquestypically do not scale for large networks (e.g., ResNet50) and datasets (e.g., ImageNet ).In contrast to adversarial training, randomized smoothing technique provides a scalable (cid:96) -norm certiﬁcation for any classiﬁcation model that are robust against standard isotropicGaussian noise (Cohen et al., 2019). It transforms an original base classiﬁer f into asmoothed classiﬁer g . For an input x , g ( x ) labels x as class y that the base classiﬁer is mostlikely to return under noisy corruption x + δ , that is, g ( x ) = arg max y ∈Y P ( f ( x + δ ) == y ) (1)where, Y is the set of class labels and δ ∼ N (0 , σ I ) is sampled from an isotropic Gaussiandistribution with standard deviation, σ . However, this certiﬁcation techniques cannot beapplied for adversarially trained models as they are not robust against such Gaussianaugmented noise.Further, Gilmer et al. (2019) recently demonstrated a close relation between adversarialrobustness and corruption robustness. However, in practice adversarially trained modelsoften lead to poor performance even compared to the standard DNN based models whenexposed to naturally occurring common corruptions, (e.g., CIFAR10-C and

ImageNet-C (Hendrycks and Dietterich, 2019)), limiting the practical purview of them. Hence, weneed a better understanding about when such models can be useful for sensitive real-worldapplications.Existing adversarially trained models typically perform inference on test examplesindependently from each other. This standard inference setup often underestimates theirrobustness in non-IID settings. For example, the distribution of test examples may change inthe medical imaging settings, as we use a diﬀerent data acquisition system or in autonomouscars, satellite image analysis as the weather condition changes Saha et al. (2019a,b). However,these external conditions do not change abruptly for most (but not all) of the real-worldapplications. Hence, we can expect to receive potentially a large number of test examplesfrom the same distribution during inference. Furthermore, adversarial examples are alsoshown to be distributionally shifted from the manifold of clean images (Stutz et al., 2019).Even under such adversarial attacks with multiple iterations, a classiﬁer is likely to observea stream of unlabelled test examples. Hence, we can update the classiﬁer to adapt thedistributional shift using these unlabelled test examples in an unsupervised manner.Unsupervised adaptation mechanisms are well studied in the ﬁeld of domain adaptation(DA), where the aim is to modulate the model parameters for a diﬀerent target domain (Liet al., 2016). Recent works have explored self-supervised training (Sun et al., 2020), batchnormalization (BN) adaptation (Schneider et al., 2020; Nado et al., 2020) using test batchesto improve the robustness against common corruptions for standard classiﬁers. In this paper,we investigate adapting batch normalization (BN) statistics for adversarially trained modelsfor both adversarial and common corruption robustness. BN is a popular technique to speed dversarially Robust Classifier with Covariate Shift Adaptation up training and is used in almost all current state-of-the-art DNN models (Ioﬀe and Szegedy,2015). BN estimates the statistics of activations and uses them to modulate or standardizethe activations for following deeper layers, improving the training eﬃciency of the DNNmodels. In the standard inference setup, BN statistics of the DNN models are estimatedduring training and used without re-estimation for the test examples.However, activation statistics obtained during training time do not reﬂect the statisticsof the test examples under covariate shifts. Here we apply adaptive BN, i.e., re-estimate theBN statistics using the test images to mitigate such covariate shifts (Li et al., 2016). Wedemonstrate that this simple adaptation alone greatly improves the overall robustness of theadversarially trained models.We present our results for CIFAR-10 and

ImageNet datasets to demonstrate thefollowing contributions:1. This work shifts the paradigm for adversarially trained models by improving theirrobustness to Gaussian noise, which in turn helps us to transform them into a smoothedclassiﬁer that can achieve certiﬁed robustness in (cid:96) norm. We recall that achievingcertiﬁed robustness for adversarially trained model has been a major challenge (Cohenet al., 2019; Salman et al., 2019a) and our ﬁnding is a step forward towards this.2. We further demonstrate that adaptive BN technique brings frequent improvementagainst adversarial attack, in particular for ImageNet dataset.3. We also show that adversarially trained models with adaptive BN signiﬁcantly improvesthe performance against common perturbations and mitigates the performance gapbetween common corruption images and clean images, improving the generalizationfor real-world image classiﬁcation tasks. To the best of our knowledge, we are the ﬁrstto show that both adversarial robustness and corruption robustness can be achievedusing the same classiﬁer.We achieve all of these above by simple adaptive batch normalization technique, thesimplicity of which is its major advantage, that it can be applied to any network withoutany additional training and architectural overhead.

2. Background and Related Work

Several defense models have been proposed against adversarial attacks in the recent years.These models can be broadly categorized into empirical and certiﬁed defenses.Empirical defenses provide empirical robustness against adversarial attacks (Schott et al.,2019; Moosavi Dezfooli et al., 2019; Nandy et al., 2020b). Adversarial training empiricallyachieves the best defense (Kurakin et al., 2016; Madry et al., 2018). To achieve robustnesswithin an (cid:15) -bounded threat model for an (cid:96) p norm, where the perturbations δ ∈ ∆ areconstrained as ∆ = { δ : || δ || p ≤ (cid:15) } , adversarial training optimizes the following loss functionfor a DNN classiﬁer, f : min θ E ( x,y ) (cid:104) max δ ∈ ∆ L ( f θ ( x + δ ) , y ) (cid:105) (2)where θ denotes the model parameters and L is the classiﬁcation loss function. he inner maximization problem can be solved by producing adversarial examples usingstrong attacks, such as projected gradient descent (PGD) attack (Madry et al., 2018).However, Wong et al. (2020) have shown that even single-step fast gradient sign method(FGSM) attack can also improve the robustness of the DNN models (Goodfellow et al.,2015). Several variations such as Trades (Zhang et al., 2019), Adv-LLR (Qin et al., 2019),etc have been proposed to further improve this training framework. However, Rice et al.(2020) demonstrated that applying early stopping criteria for standard adversarial trainingusing PGD attack produces the best robust model for a given perturbation type among themodels that do not incorporate additional training data (Carmon et al., 2019; Uesato et al.,2019).Empirical models demonstrate empirical robustness against the existing “known” adver-sarial attacks without providing any robustness guarantees. In fact, several empirical defensemodels were later broken by stronger adversaries (Athalye et al., 2018; Uesato et al., 2018;Jalal et al., 2019). A number of certiﬁed defenses have been proposed to provide a provablerobustness guarantees for a speciﬁc perturbation type (Wong and Kolter, 2018; Wang et al.,2018a,b; Raghunathan et al., 2018; Singh et al., 2018; Salman et al., 2019b). Given input x ,it guarantees that the classiﬁer’s prediction is constant within a neighborhood of x . Howeverin general, these techniques are not scalable to large networks (e.g., ResNet50) and datasets(e.g., ImageNet ).To scale up to practical networks, randomized smoothing has been proposed as aprobabilistically certiﬁed defense for (cid:96) norm. Randomized smoothing was initially proposedas a heuristic defense (Cao and Gong, 2017; Liu et al., 2018) and later shown to be certiﬁable(Lecuyer et al., 2019; Li et al., 2019). Recently, Cohen et al. (2019) and later Salman et al.(2019a) presented a tight robustness guarantee for randomized smoothing to achieve the stateof the art in (cid:96) -norm certiﬁcation technique as follows: Suppose the base classiﬁer f classiﬁes N ( x, σ I ) to return the most probable class, c A with probability p A = P ( f ( x + δ ) == c A )and the “runner-up” class c B with probability p B = max y (cid:54) = c A P ( f ( x + δ ) == y ). Then, thesmooth classiﬁer, g is certiﬁably robust around x within an (cid:96) radius of R : R = σ (cid:16) Φ − ( p A ) − Φ − ( p B ) (cid:17) (3)where Φ is the inverse of the standard Gaussian CDF.However, computing the exact values of p A and p B is not possible in practice. Hence,(Cohen et al., 2019) used Monte Carlo sampling to estimate p A and p B such that p A ≤ p A and p B ≥ p B with arbitrarily high probability. The certiﬁed radius is then computed byreplacing p A and p B with p A and p B respectively. However, adversarially trained modelsin the standard inference setup are not robust against Gaussian noise. Hence, we cannottransform them into smoothed classiﬁer to certify for (cid:96) norm using this technique (Equation1). Notably, Salman et al. (2020) recently demonstrate that any classiﬁer can be convertedto a smoothed classiﬁer using a separate denoiser module as a pre-processor. However, itrequires to retrain the denoiser to generalize for diﬀerent random perturbation types.Furthermore, understanding the interplay between multiple perturbation robustnessin the adversarial and common corruption case is a longstanding problem (Gilmer et al.,2019). The robustness of adversarially trained models do not generalize well to naturallyoccurring common corruptions such as CIFAR10 -C,

ImageNet -C etc (Gilmer et al., 2019; dversarially Robust Classifier with Covariate Shift Adaptation Yin et al., 2019; Hendrycks et al., 2020a). A number of recent works focused on improvingthe corruption robustness. However, majority of them require special training protocols,involving additional time and resources. These include data augmentation using Gaussiannoise (Gilmer et al., 2019), optimized mixtures of data augmentations in conjunction with aconsistency loss (Hendrycks et al., 2020b), training on stylized images (Geirhos et al., 2018;Michaelis et al., 2019) or using adversarially generated noises (Rusak et al., 2020). Otherapproaches tweak the architecture, e.g., by adding shift-equivariance with an anti-aliasingmodule (Zhang, 2019) or assembling diﬀerent training techniques (Lee et al., 2020).Recently, test time adaptation techniques using self-supervision (Sun et al., 2020) as wellas adaptive batch-normalization (Schneider et al., 2020; Nado et al., 2020) have been proposedin these context for standard DNN classiﬁers. However, none of these methods provide bothadversarial robustness and corruption robustness for the same classiﬁer. Notably, recentstudies have also investigated DNN classiﬁers by reducing number of BN layers (Gallowayet al., 2019) and applying additional auxiliary batch norm (Xie et al., 2020) to empiricallyimprove adversarial robustness. In contrast, we explore the test-time adaptation of BNparameters for adversarially trained models to improve both adversarial and corruptionrobustness.

3. Adaptive Batch-Normalization

Batch-Normalization is a powerful tool to improve the training stability and attain fasterconvergence (Ioﬀe and Szegedy, 2015). It normalizes the hidden feature activations toreduce the covariate shifts between the hidden layers. Batch normalization is inspired fromthe well-known concept that normalizing or whitening the input speeds up the training ofDNN models (Wiesler and Ney, 2011; Huang et al., 2018). Batch normalization for DNNmodels extends this concept by normalizing the internal/hidden layers (Huang et al., 2018).Typically, modern DNN models incorporate BN after each convolutional or dense layers forcomplex machine learning tasks including image classiﬁcation (He et al., 2016a; Szegedyet al., 2015).BN layer estimates the mean and variance of the hidden activation maps across thechannels. The feature activations are then normalized by subtracting the mean and dividingby the variance to normalize into N (0 , N (0 , x ∈ X are inputs and y ∈ Y are the class labels. We denote thetraining distribution as P T : X × Y → R + and test distribution as P t : X × Y → R + . Then, here exists covariate shift between training and test distribution if: P T ( y | x )= P t ( y | x ) and P T ( x ) (cid:54) = P t ( x ). Further, if the covariate shift only leads to change in the ﬁrst and secondorder moments of the feature activations f h ( x ), we can remove it by re-estimating thesestatistics using the test batches, followed by normalization (Schneider et al., 2020): P T (cid:32) f h ( x ) − E T [ f h ( x )] (cid:112) V T [ f h ( x )] (cid:33) P T ( x ) ≈ P t (cid:32) f h ( x ) − E t [ f h ( x )] (cid:112) V t [ f h ( x )] (cid:33) P t ( x ) (4)In other words, we can remove the covariate shift by correcting ﬁrst and second ordermoments as long as the change in the test distribution only leads the feature activations toscale and translate. In the context of domain adaptation (Li et al., 2016; Saha et al., 2018)and corruption robustness (Nado et al., 2020; Schneider et al., 2020), adaptive BN to correctthe BN statistics are found to be eﬀective as long as the semantics in the test images doesnot change.Adaptive BN adapts the batch normalization statistics using the unlabelled test batchesin unsupervised fashion. Given a set of test examples, we compute the BN statistics usingthe feature activations of the test batch, denoted as µ t , s t and adapt them with the existingstatistics, µ T , s T that are already obtained using the training batches as: µ = ρ · µ t + (1 − ρ ) · µ T s = ρ · s t + (1 − ρ ) · s T (5)where, ρ ∈ [0 ,

1] represents the momentum.The Choice of ρ = 0 rejects the statistics of the test examples. This is equivalent to thestandard inference setup with deterministic DNN classiﬁer in the IID settings. In contrast, ρ = 1 completely ignores the pre-calculated statistics from the training batches. Clearly, aswe receive a larger test-batch, we can get a better estimate of the test distribution. Hence,we should choose a larger value for ρ . However, the choice of ρ is often decided by variouspractical constraints, e.g., available hardware resources. In contrast, if the test-batch size istoo small, the estimated statistics would be unreliable. The appropriate value for ρ can bedetermined empirically for diﬀerent applications and depending on the available test batchsize.It is noteworthy that the mechanism of adaptively modulating batch normalizationstatistics during inference does not incur any overhead during training. In other words, thetraining process remains completely unaltered and any model that has BN layers can applyadaptive BN during inference. The updation of the statistics during inference is performedin the forward pass. It does not involve gradient computation (i.e., backward pass). Batchnormalization layers also have two additional trainable parameters that require gradientcomputation to update. However, adaptive BN does not require to update those parameters.Thus adaptive BN provides a straightforward mechanism to mitigate the covariate shiftduring inference. A number of extensions of adaptive batch normalization have been proposedin domain adaptation literature, which use additional training and resources (Cariucci et al.,2017; Roy et al., 2019). Unlike those works, we use adaptive BN technique without changingthe DNN architecture or altering the training procedure to achieve both adversarial andcorruption robustness. dversarially Robust Classifier with Covariate Shift Adaptation (a) ImageNet (b)

CIFAR-10

Model σ = 0 σ = 0 . σ = 0 . σ = 0 .

75 Model σ = 0 σ = 0 . σ = 0 . σ = 0 . ± . ± . ± . ± . Baseline 95.2 ± . ± . ± . ± . + adaptive BN 74.4 ± . ± . ± . ± . + adaptive BN 95.0 ± . ± . ± . ± . Adv ∞ ± . ± . ± . ± . Adv ∞ ± . ± . ± . ± . + adaptive BN 60.8 ± . ± . ± . ± . + adaptive BN 81.6 ± . ± . ± . ± . Adv ± . ± . ± . ± . Adv ± . ± . ± . ± . + adaptive BN 58.3 ± . ± . ± . ± . + adaptive BN 81.8 ± . ± . ± . ± . Rand σ =0 . ± . ± . ± . ± . Rand σ =0 . ± . ± . ± . ± . + adaptive BN 62.7 ± . ± . ± . ± . + adaptive BN 74.0 ± . ± . ± . ± . Table 1: Top-1 accuracy of diﬀerent classiﬁers with and without adaptive BN techniqueunder diﬀerent level of Gaussian noises, augmented to the test images. We randomly shuﬄethe test images and sample the noises to report ( mean ± × sd ) of 5 diﬀerent runs. . (a) σ = 0 (Clean) (b) σ = 0 .

25 (c) σ = 0 . σ = 0 . Figure 1:

Visualization of loss-gradient images produced by adversarially trained classiﬁers as weapply diﬀerent levels of Gaussian noises in the test images. We present the Gaussian augmented testimages at the top row, followed by the loss-gradients corresponding to Adv ∞ , Adv ∞ + adaptive BN,Adv and Adv + adaptive BN in 2 nd , 3 rd , 4 th and 5 th rows respectively. See Appendix E.2 for moreresults.

4. Experiments

In this section, we demonstrate the eﬀectiveness of adaptive BN technique for adversariallytrained models using four sets of experiments.

Experimental Setup.

We use benchmark

CIFAR-10 (Krizhevsky et al., 2009) and

ImageNet (Deng et al., 2009) datasets. We use pre-activation ResNet18 model for

CIFAR-10 , and ResNet50 model for

ImageNet (He et al., 2016a,b). For

CIFAR-10 , we learntwo adversarially trained models Adv ∞ and Adv , for (cid:96) ∞ and (cid:96) threat models with threat UDGLXV F H U W L I L HG D FF X U D F\ %DVHOLQHDW =0.25 $GY DW =0.25 $GY DGDSWLYH%1DW =0.5 $GY DW =0.25 $GY DGDSWLYH%1DW =0.5 5DQG =0.5 &RKHQDWDO UDGLXV F H U W L I L HG D FF X U D F\ %DVHOLQHDW =0.25 $GY DW =0.25 $GY DGDSWLYH%1DW =0.5 $GY DW =0.25 $GY DGDSWLYH%1DW =0.5 5DQG =0.5 &RKHQDWDO Figure 2: Certiﬁed accuracy attained by diﬀerent classiﬁers for

ImageNet and

CIFAR-10 with 99 .

9% probability.boundary of 8 /

255 and 1, respectively. For

ImageNet , the adversarially trained modelsAdv ∞ and Adv are learned for (cid:96) ∞ and (cid:96) threat models with threat boundary of 4 /

255 and3 respectively. . Unless otherwise speciﬁed, our adversarially trained models are obtainedusing early stopping criteria (Rice et al., 2020). We also use Baseline and Rand σ =0 . for ourcomparisons. These models are respectively trained using clean images and by augmentingrandom noise, sampled from isotropic Gaussian distribution N (0 , σ I ) with σ = 0 . Our ﬁrst objective is to demonstrate the improvement of robustness of Adv ∞ and Adv models at diﬀerent levels of Gaussian noises as we apply adaptive BN technique duringinference. In Table 1, we present the performance of diﬀerent models. We observe that whenthe test images are sampled from IID settings (i.e., σ = 0 for Baseline, Adv ∞ , and Adv and σ = 0 . σ =0 . ), performance of the models remain similar regardless of whetherBN adaptation is used. However, as we move away from the IID settings by increasing (ordecreasing) the distribution of Gaussian noise using σ , the performance of these modelssigniﬁcantly degrade in the standard inference setup. In contrast, by modulating the BNstatistics using adaptive BN technique, we can improve the performance for all classiﬁcationmodels.In particular, we observe that adversarially trained models achieve signiﬁcantly higherimprovements by applying adaptive BN. For example, at σ = 0 .

5, these models respectivelyachieve 0 . . .

9% top-1 accuracy for

ImageNet dataset (Table 1). By usingadaptive BN technique, Adv and Adv ∞ improves the top-1 accuracy to 44 . . σ = 0 . .

4% and 58 .

9% to only 15 .

9% and 11 .

0% respectively. In contrast, after correctingthe BN statistics, the Baseline achieves 7 .

7% top-1 accuracy, only slightly reducing the

1. For

ImageNet , we obtain Adv ∞ and Adv from https://github.com/locuslab/robust overﬁtting andBaseline and Rand σ =0 . models from https://github.com/locuslab/smoothing. dversarially Robust Classifier with Covariate Shift Adaptation (a) ImageNet (cid:96) Radius 0.25 0.5 0.75 1.0 1.25 1.5 1.75Baseline (at σ = 0 .

25) 7.8 4.8 3.0 0.0 0.0 0.0 0.0Adv ∞ (at σ = 0 .

25) 4.0 4.0 3.8 0.0 0.0 0.0 0.0+ adaptive BN (at σ = 0 .

5) 43.6 39.4 35.8 31.4 27.6 23.4 18.2Adv (at σ = 0 .

25) 10.8 10.4 10.4 0.0 0.0 0.0 0.0+ adaptive BN (at σ = 0 .

5) 47.0 43.0 39.0 36.4 32.8 30.8 27.0Rand σ =0 . (Cohen et al., 2019) 60.8 54.4 47.8 39.0 34.2 29.0 23.8(b) CIFAR-10 (cid:96) Radius 0.25 0.5 0.75 1.0 1.25 1.5 1.75Baseline (at σ = 0 .

25) 6.96 2.04 0.09 0.0 0.0 0.0 0.0Adv ∞ (at σ = 0 .

25) 35.95 29.44 23.51 0.0 0.0 0.0 0.0+ adaptive BN (at σ = 0 .

5) 53.65 42.91 32.58 22.68 14.24 7.88 2.94Adv (at σ = 0 .

25) 41.89 34.15 26.7 0.0 0.0 0.0 0.0+ adaptive BN (at σ = 0 .

5) 56.45 46.24 35.6 26.89 18.73 11.37 5.41Rand σ =0 . (Cohen et al., 2019) 51.68 40.38 30.25 20.81 13.36 7.71 3.38 Table 2: Certiﬁed top-1 accuracy at various (cid:96) radii for Imagenet using ResNet-50 (Top)and

CIFAR-10 using preactivation ResNet-18 (Bottom).performance gaps from 74 .

9% to 66 . CIFAR-10 dataset.To investigate the eﬀect of adaptive BN technique for adversarially trained models, wevisualize the loss gradients for individual pixels of an image as we increase the Gaussian noise,i.e., σ (see Figure 1). Loss-gradients reﬂect the most important input pixels that stronglyaﬀect the prediction of a classiﬁer. We scale, translate and clip the loss-gradient valuesfor visualization, without using any complex processing techniques. For clean test images(i.e., at σ = 0), gradients for adversarially trained models align properly with perceptuallyrelevant features, irrespective of whether BN adaptation is used (Tsipras et al., 2019; Etmannet al., 2019). The eﬀect of adaptive BN becomes prominent as σ is increased. We can seethat the overall loss gradient tends to become more noisy as we increase the noise level to σ =0 . σ =0 .

75 However, the models without BN adaptation leads to produce sharperloss gradients (i.e., providing higher importance) even for back-ground pixels. In contrast,models with BN adaptation produce sharper loss gradients for the pixels from the objectof interest while suppressing the back-ground pixels. Hence, classiﬁer is made availablewith required semantic information for classiﬁcation. It is interesting to note that Adv produces signiﬁcantly more human-aligned loss gradients compared to Adv ∞ at σ =0 . ∞ achieves33 .

7% top-1 accuracy at σ =0 .

75 while Adv achieves 39 .

8% (see Table 1).

A classiﬁer, f that achieves robustness against standard Gaussian noise, can be transformedinto a smoothed classiﬁer, g (Equation 1) to provide certiﬁed robustness against (cid:96) -norm(Equation 3). Since modulating BN statistics using adaptive BN technique signiﬁcantlyimproves robustness against Gaussian noise for both Adv and Adv ∞ , we can transform oth of these models into a smoothed classiﬁer to provide robustness certiﬁcation for (cid:96) norm.For this experiment, we ﬁrst modulate the BN statistics of the classiﬁers using adaptiveBN technique by feeding the Gaussian noise augmented test examples. The noises aresampled from the same isotropic Gaussian distribution N (0 , σ I ) as used for certiﬁcation.We then freeze the parameters for the whole certiﬁcation process. In other words, the baseclassiﬁer f remains the same to calculate the certiﬁcation radius R (Equation 3).The certiﬁed accuracy at radius R is deﬁned as the fraction of test examples that g classiﬁes correctly without abstaining, and certiﬁes as robust for atleast an (cid:96) radius of R .Similar to Cohen et al. (2019), we estimate the class label probabilities of the smoothedclassiﬁer g using Monte-Carlo sampling with 100 ,

000 noisy samples for each test images.We certify the test images with 99 .

9% probability i.e., there is at most a 0 .

1% chance thatthe certiﬁcation technique falsely certiﬁes a non-robust test example.We use the full

CIFAR-10 test set and a subsampled

ImageNet test set of 500 samples,as in (Cohen et al., 2019). We present the results of certiﬁed accuracy in Figure 2 andTable 2. Adaptive BN allows the ﬂexibility over the choice of σ to set the noise level for thecertiﬁcation process. Here, we use σ = 0 . ∞ and Adv models with BN adaptationfor both datasets. We use σ = 0 .

25 for the Baseline, Adv ∞ and Adv models with no BNadaption that are not robust against Gaussian noises (see Table 1). In Appendix C.1, wefurther show that, unlike existing works, BN adaptation technique allows the ﬂexibilityof choosing an appropriate σ to certify diﬀerent test examples during inference for theadversarially trained models. For our comparison, we also provide the result for standardrandomized smoothing classiﬁers using Rand σ =0 . as proposed in Cohen et al. (2019). Wecan see that both Adv ∞ and Adv models can be transformed using adaptive BN techniqueto provide certiﬁed robustness for (cid:96) norm for both datasets. Adv models consistentlyachieve better performance compared to Adv ∞ in terms of certiﬁed accuracy. It is interestingto note that both Adv ∞ and Adv models have outperformed Rand σ =0 . for CIFAR-10 .For

ImageNet , Adv achieves better certiﬁed accuracy compared to Rand σ =0 . beyond (cid:96) -radii of 1 .

5. In Appendix C.2, we also note that stronger adversarially trained model,learned using adversrial examples with higher threat boundaries, produces better certiﬁedaccuracy at diﬀerent (cid:96) radii.Since the (cid:96) ball of radius √ d contains the (cid:96) ∞ unit ball in R d , a robust model against (cid:96) perturbation of radius R is also robust against (cid:96) ∞ perturbation of norm R √ d (Salman et al.,2019a). Using this naive conversion, we can also certify Adv ∞ and Adv models with BNadaptation for (cid:96) ∞ perturbations. For CIFAR-10 at (cid:96) ∞ radius of 2 /

255 (i.e., (cid:96) certiﬁedaccuracy at 2 √ × / ≈ . ∞ and Adv models can provide certiﬁedaccuracy of 45 .

8% and 49 .

0, compared to 43 .

0% for Rand σ =0 . . Similarly, for ImageNet at (cid:96) ∞ radius of 1 /

255 (i.e (cid:96) certiﬁed accuracy at 2 √ × / ≈ . ∞ and Adv models can provide certiﬁed accuracy of 23 .

4% and 30 .

8, compared to 29 .

0% for Rand σ =0 . .So far we only focus on (cid:96) certiﬁcation for adversarially trained models using Gaussianaugmented noise. However, we can also adapt these models to any random noise distributionto provide non-trivial certiﬁcations for other (cid:96) p norms (e.g., uniform noise for (cid:96) threatmodels (Yang et al., 2020)), without any additional training (unlike (Salman et al., 2020)).As in Section 4.4, we demonstrate that adversarially trained models with adaptive BN dversarially Robust Classifier with Covariate Shift Adaptation Robust Test AccuracyDatasets Model ThreatBoundary Withoutadaptive BN Withadaptive BN Diﬀ

ImageNet

Adv ∞ || δ || ∞ ≤ /

255 34.2 41.1 ± . +6.9Adv || δ || ≤ . ± . +1.6 CIFAR-10

Adv ∞ || δ || ∞ ≤ /

255 51.9 52.2 ± . +0.3Adv || δ || ≤ . ± . +0.4 Table 3:

Performance under adversarial attacks. For models with BN adaptation, we randomlyshuﬄe the test images to report ( mean + 2 × sd ) of 3 diﬀerent runs. technique can signiﬁcantly mitigate the generalization gap between clean and any randomcorruption images. Adversarially trained models provide the best empirical defense against adversarial attacks.Hence, we do not need to transform them to smoothed classiﬁers for practical use. In thissection, we empirically evaluate the performance of adversarially trained models against thereal-world adversarial attacks as we modulate the BN statistics for each batch of adversarialexamples using adaptive BN technique.For a test image x with label y , the goal of an adversarial attack is to ﬁnd perturbation δ within a chosen perturbation (cid:15) for an (cid:96) p norm such that: max δ ∈ ∆ L ( f θ ( x + δ ) , y ) where ∆ = { δ : || δ || p ≤ (cid:15) } (6) We use projected gradient descent (PGD) attack (Kurakin et al., 2017; Madry et al.,2018) to evaluate Adv ∞ and Adv models in the standard inference setup (i.e., withoutusing adaptive BN). Starting with a random initial perturbation δ (0) , PGD attack iterativelyadjusts the perturbation following the gradient steps with step-size η and projects back ontothe (cid:96) p threat model: (cid:101) δ = δ ( t ) + η · Q p ( ∇ δ L ( f θ ( x + δ ( t ) ) , y )) δ ( t +1) = max(min( (cid:101) δ, (cid:15) ) , − (cid:15) ) (7)For (cid:96) ∞ , we use the sign of the gradient i.e., Q p ( z ) = sign( z ). For other norms such as (cid:96) ,we normalize the gradient as: Q p ( z ) = z/ || z || .In contrast to the standard inference setup, adaptive BN during inference consistentlychange the BN parameters, accommodating the inputs at each forward pass. Hence, we cannoteﬀectively compute the gradients, ∇ δ L ( f θ ( x + δ ) , y ), to produce the most eﬀective adversarialexamples. Here, we instead apply expectation over transformation (EoT) technique toapproximate the gradient (Athalye et al., 2017). For cross-entropy loss, we can write the lossas: ∇ δ L ( f θ ( x + δ ) , y ) = ∇ δ ( − log( f θ ( x + δ ) y )). To apply EoT, we select a set of m modelswith varying BN statistics and approximate the gradient as: ∇ δ [ − log E θ ∈ Θ f θ ( x + δ ) y ] ≈ ∇ δ [ − log 1 m m (cid:88) i =1 f θ ( i ) ( x + δ ) y ] (8) here, { f θ ( i ) } mi =1 are the classiﬁers with diﬀerent BN parameters.Each f θ ( i ) are obtained by applying adaptive BN technique on the original classiﬁer f θ using test batch, perturbed with randomly chosen δ ∈ ∆ from the threat model (as a proxyof the adversarial perturbations). Notably, We place log outside of the expectation as it wasfound to be more eﬀective (Salman et al., 2019a).We use the same threat boundaries for attacks as used during the adversarial training(Equation 2). The adversarial examples are generated using 100 iterations. We set m = 10to evaluate for the models using adaptive BN technique, as in (Athalye et al., 2018). We donot observe a signiﬁcant diﬀerence by choosing larger m (See Appendix D.1). Note that,generating adversarial examples for models with adaptive BN technique is signiﬁcantly moreexpensive in terms of computation. Hence, we use a subsampled test examples of 2000images for ImageNet (similar to (Athalye et al., 2018)) while use the full

CIFAR-10 testset.Table 3 compares the performance of the adversarially trained models as we apply adaptiveBN technique. For

CIFAR-10 models, adaptive BN technique does not signiﬁcantly impactthe performance. However, for

ImageNet models, we observe a consistent improvement. Inother words, adaptive BN often improves performance against adversarial attacks withoutany detrimental eﬀect. In Appendix D.2, we also note that adaptive BN tends to provide ahigher improvement in the robust accuracy when the model is obtained without using theearly stopping criteria.

Hendrycks and Dietterich (2019) recently studied robustness of classiﬁers to multiple com-mon (i.e., non-adversarial) visual corruptions. The authors introduced

ImageNet-C and

CIFAR10-C datasets by algorithmically generated corruptions from noise, blur, weather,or digital categories with 5 diﬀerent severity levels for each corruptions. While adversar-ially trained models are eﬀective against adversarial attacks, they often perform poorlyagainst such common corruptions, even compared to the standard neural networks in thestandard inference setup (Hendrycks et al., 2020a; Yin et al., 2019; Gilmer et al., 2019).Here, we demonstrate that by applying adaptive BN technique, we can greatly improve theperformance of adversarially trained models.The corruption robustness of a classiﬁcation model, f is typically measured using the meancorruption error ( mCE ) (Hendrycks and Dietterich, 2019). It is computed by normalizingthe model’s top-1 errors with the top-1 errors of AlexNet model (Krizhevsky et al., 2012)across the set of corruptions, K and corruption severities, I , as follows: mCE ( f ) = 1 K K (cid:88) k =1 (cid:80) Ii =1 err fk,i (cid:80) Ii =1 err A lexk,i (9)where, err fk,i and err A lexk,i denotes the top-1 error under corruption type k at severity i forthe classiﬁcation model, f and the AlexNet model respectively.However, mCE does not reﬂect the generalization gap between clean test images andthe corruption images. For example, a classiﬁer may withstand the corruption images,minimizing the generalization gaps from clean test images. However, it may produce ahigher mCE compared to a classiﬁer with a low clean error rate while the error rate spike dversarially Robust Classifier with Covariate Shift Adaptation (a) ImageNet-C mCE relative-mCEModel w/oadaptive BN ( ↓ ) withadaptive BN ( ↓ ) w/oadaptive BN ( ↓ ) withadaptive BN ( ↓ )Baseline 77.0 ± . ± . ± . ± . Adv ∞ ± . ± . ± . ± . Adv ± . ± . ± . ± . (b) CIFAR-10-C mCE relative-mCEModel w/oadaptive BN ( ↓ ) withadaptive BN ( ↓ ) w/oadaptive BN ( ↓ ) withadaptive BN ( ↓ )Baseline 72.8 ± . ± . ± . ± . Adv ∞ ± . ± . ± . ± . Adv ± . ± . ± . ± . Table 4: Comparison of mCE and relative-mCE ( rmCE ) scores for diﬀerent classiﬁers withand without using adaptive BN technique. For models with BN adaptation, we randomlyshuﬄe the test images to report ( mean + 2 × sd ) of 3 diﬀerent runs. For each column,down-arrow ( ↓ ) denotes the lower values are better and up-arrow ( ↑ ) denotes the highervalues are better. (a) ImageNet-C

Top-1 AccuracyModel w/o adaptive BN with adaptive BNClean ( ↑ ) Corrupt ( ↑ ) Gaps ( ↓ ) Clean ( ↑ ) Corrupt ( ↑ ) Gaps ( ↓ )Baseline 75.2 ± . ± . ± . ± . ∞ ± . ± . ± . ± . ± . ± . ± . ± . CIFAR-10-C

Top-1 AccuracyModel w/o adaptive BN with adaptive BNClean ( ↑ ) Corrupt ( ↑ ) Gaps ( ↓ ) Clean ( ↑ ) Corrupt ( ↑ ) Gaps ( ↓ )Baseline 95.2 ± . ± . ± . ± . ∞ ± . ± . ± . ± . ± . ± . ± . ± . Table 5: Comparing the performance gap in terms of top-1 Accuracy for clean test examplesversus average top-1 Accuracy for corrupted images as we apply adaptive BN technique fordiﬀerent classiﬁers. For models with BN adaptation, we randomly shuﬄe the test images toreport ( mean + 2 × sd ) of 3 diﬀerent runs. For each column, down-arrow ( ↓ ) denotes thelower values are better and up-arrow ( ↑ ) denotes the higher values are better. n the presence of corruptions. Hendrycks and Dietterich (2019) proposed another metric,called relative mean corruption error ( rmCE ) to address this. It computes a normalizedrelative change of top-1 error rate of the classiﬁer, f from the clean error rate across the setof corruptions, K and the corruption severity levels, I , as follows: rmCE ( f ) = 1 K K (cid:88) k =1 (cid:80) Ii =1 err fk,i − err f C lean (cid:80) Ii =1 err A lexk,i − err A lex C lean (10)where, err f C lean and err A lex C lean denote the clean error rate for f and the AlexNet modelrespectively.Since, adversarially trained models already produce high error rate for the clean testimages to achieve adversarial robustness, we argue that rmCE is a more appropriate metricfor comparison.In Table 4 presents the results of diﬀerent classiﬁers against the corruption images.We note that adversarially trained models already achieve lower rmCE scores. Further,by applying adaptive BN technique, these models consistently produce signiﬁcantly lower rmCE scores for both datasets, leading to a better classiﬁcation generalization.In Table 5, We additionally report the mean top-1 accuracy for clean and corruptedtest images and the performance gaps, along with mCE and rmCE scores. We can seethat adaptive BN technique signiﬁcantly reduces the gap in top-1 accuracy between cleanand corrupted test images for both datasets for adversarially trained models. Appendix E.1presents the performance under individual corruptions. Appendix E.2 also visualizes theeﬀect of adaptive BN technique for adversarially trained models on the loss gradients underdiﬀerent corruptions and analyzes from a Fourier perspective.

5. Conclusion

In this paper, we successfully demonstrate that adaptive BN technique signiﬁcantly improvesrobustness of adversarially trained models against Gaussian noise. This simple ﬁndingpushes forward the paradigm of these models by enabling us to turn them into a randomizedsmoothing classiﬁer to provide certiﬁed robustness for (cid:96) norm. Further, adaptive BNtechnique often improves robustness against adversarial attacks and signiﬁcantly improvesthe robustness against common corruptions for adversarially trained models. These powerfulrevelations push the boundary of their practical usability. References

Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robustadversarial examples. arXiv , 2017.Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false senseof security: Circumventing defenses to adversarial examples. In

ICML , 2018.Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networksvia region-based classiﬁcation. In

ACSAC , 2017. dversarially Robust Classifier with Covariate Shift Adaptation Fabio Maria Cariucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, and Samuel Rota Bulo.Autodial: Automatic domain alignment layers. In

ICCV , 2017.Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C Duchi.Unlabeled data improves adversarial robustness. arXiv , 2019.Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certiﬁed adversarial robustness viarandomized smoothing.

ICML , 2019.Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: Alarge-scale hierarchical image database. In

CVPR , 2009.Logan Engstrom, Andrew Ilyas, Hadi Salman, Shibani Santurkar, and Dimitris Tsipras.Robustness (python library), 2019. URL https://github.com/MadryLab/robustness .Christian Etmann, Sebastian Lunz, Peter Maass, and Carola-Bibiane Sch¨onlieb. On theconnection between adversarial robustness and saliency map interpretability. In

ICML ,2019.Angus Galloway, Anna Golubeva, Thomas Tanay, Medhat Moussa, and Graham W Taylor.Batch normalization is a cause of adversarial vulnerability. arXiv , 2019.Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann,and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shapebias improves accuracy and robustness. In

ICLR , 2018.Justin Gilmer, Nicolas Ford, Nicholas Carlini, and Ekin Cubuk. Adversarial examples are anatural consequence of test error in noise. In

ICML , 2019.Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessingadversarial examples. In

ICLR , 2015.Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

Deep Learning . MIT Press, 2016.Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for imagerecognition. In

IEEE CVPR , 2016a.Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deepresidual networks. In

ECCV , 2016b.Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to commoncorruptions and perturbations. In

ICLR , 2019.Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection withoutlier exposure. In

ICLR , 2019.Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo,Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, andJustin Gilmer. The many faces of robustness: A critical analysis of out-of-distributiongeneralization. arXiv , 2020a. an Hendrycks, Norman Mu, Ekin D Cubuk, Barret Zoph, Justin Gilmer, and BalajiLakshminarayanan. Augmix: A simple data processing method to improve robustnessand uncertainty. ICLR , 2020b.Lei Huang, Dawei Yang, Bo Lang, and Jia Deng. Decorrelated batch normalization. In

CVPR , 2018.Sergey Ioﬀe and Christian Szegedy. Batch normalization: Accelerating deep network trainingby reducing internal covariate shift. In

ICML , 2015.Ajil Jalal, Andrew Ilyas, Constantinos Daskalakis, and Alexandros G Dimakis. The robustmanifold defense: Adversarial training using generative models. arXiv , 2019.Alex Krizhevsky, Geoﬀrey Hinton, et al. Learning multiple layers of features from tinyimages. 2009.Alex Krizhevsky, Ilya Sutskever, and Geoﬀrey E Hinton. Imagenet classiﬁcation with deepconvolutional neural networks.

NeurIPS , 2012.Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236 , 2016.Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physicalworld. In

Workshop track, ICLR , 2017.Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana.Certiﬁed robustness to adversarial examples with diﬀerential privacy. In

IEEE SP , 2019.Jungkyu Lee, Taeryun Won, Tae Kwan Lee, Hyemin Lee, Geonmo Gu, and Kiho Hong.Compounding the performance improvements of assembled techniques in a convolutionalneural network. arXiv , 2020.Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certiﬁed adversarial robustnesswith additive noise.

NeurIPS , 2019.Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, and Xiaodi Hou. Revisiting batchnormalization for practical domain adaptation. arXiv , 2016.Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neuralnetworks via random self-ensemble. In

ECCV , 2018.Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and AdrianVladu. Towards deep learning models resistant to adversarial attacks. In

ICLR , 2018.Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann,Alexander S Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking robustness inobject detection: Autonomous driving when winter is coming. arXiv , 2019.Seyed Mohsen Moosavi Dezfooli, Alhussein Fawzi, Jonathan Uesato, and Pascal Frossard.Robustness via curvature regularization, and vice versa. In

CVPR , 2019. dversarially Robust Classifier with Covariate Shift Adaptation Zachary Nado, Shreyas Padhy, D Sculley, Alexander D’Amour, Balaji Lakshminarayanan,and Jasper Snoek. Evaluating prediction-time batch normalization for robustness undercovariate shift. arXiv , 2020.Jay Nandy, Wynne Hsu, and Mong-Li Lee. Towards maximizing the representation gapbetween in-domain & out-of-distribution examples. In

NeurIPS , 2020a.Jay Nandy, Wynne Hsu, and Mong-Li Lee. Approximate manifold defense against multipleadversarial perturbations. In

International Joint Conference on Neural Networks (IJCNN) ,2020b.Chongli Qin et al. Adversarial robustness through local linearization. In

NeurIPS , 2019.Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certiﬁed defenses against adversarialexamples.

ICLR , 2018.Leslie Rice, Eric Wong, and Zico Kolter. Overﬁtting in adversarially robust deep learning.In

ICML , 2020.Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Samuel Rota Bulo, Nicu Sebe, andElisa Ricci. Unsupervised domain adaptation using feature-whitening and consensus loss.In

CVPR , 2019.Evgenia Rusak, Lukas Schott, Roland Zimmermann, Julian Bitterwolf, Oliver Bringmann,Matthias Bethge, and Wieland Brendel. Increasing the robustness of dnns against imagecorruptions by playing the game of noise. arXiv , 2020.Sudipan Saha, Francesca Bovolo, and Lorenzo Bruzzone. Destroyed-buildings detection fromvhr sar images using deep features. In

Image and Signal Processing for Remote Sensing ,2018.Sudipan Saha, Francesca Bovolo, and Lorenzo Bruzzone. Unsupervised deep change vectoranalysis for multiple-change detection in vhr images.

IEEE Transactions on Geoscienceand Remote Sensing , 2019a.Sudipan Saha, Francesca Bovolo, and Lorenzo Bruzzone. Unsupervised multiple-change detec-tion in vhr multisensor images via deep-learning based adaptation. In

IEEE InternationalGeoscience and Remote Sensing Symposium , 2019b.Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck,and Greg Yang. Provably robust deep learning via adversarially trained smoothed classiﬁers.In

NeurIPS , 2019a.Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convexrelaxation barrier to tight robustness veriﬁcation of neural networks. In

NeurIPS , 2019b.Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor, and J Zico Kolter. Denoisedsmoothing: A provable defense for pretrained classiﬁers.

NeurIPS , 2020. teﬀen Schneider, Evgenia Rusak, Luisa Eck, Oliver Bringmann, Wieland Brendel, andMatthias Bethge. Improving robustness against common corruptions by covariate shiftadaptation. NeurIPS , 2020.Lukas Schott, Jonas Rauber, Matthias Bethge, and Wieland Brendel. Towards the ﬁrstadversarially robust neural network model on MNIST. In

ICLR , 2019.Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus P¨uschel, and Martin Vechev.Fast and eﬀective robustness certiﬁcation. In

NeurIPS , pages 10802–10813, 2018.David Stutz, Matthias Hein, and Bernt Schiele. Disentangling adversarial robustness andgeneralization. In

CVPR , 2019.Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei Efros, and Moritz Hardt. Test-timetraining with self-supervision for generalization under distribution shifts. In

ICML , 2020.Christian Szegedy, Vincent Vanhoucke, Sergey Ioﬀe, Jonathon Shlens, and Zbigniew Wojna.Rethinking the inception architecture for computer vision.

CVPR , 2015.Christian Szegedy et al. Intriguing properties of neural networks. In

ICLR , 2014.Florian Tram`er and Dan Boneh. Adversarial training and robustness for multiple perturba-tions. In

NeurIPS , 2019.Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and AleksanderMadry. Robustness may be at odds with accuracy. In

ICLR , 2019.Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adver-sarial risk and the dangers of evaluating against weak attacks.

ICML , 2018.Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi,and Pushmeet Kohli. Are labels required for improving adversarial robustness? arXiv ,2019.Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. Mixtrain: Scalable training ofveriﬁably robust neural networks. arXiv , 2018a.Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Eﬃcient formalsafety analysis of neural networks. In

NeurIPS , pages 6367–6377, 2018b.Simon Wiesler and Hermann Ney. A convergence analysis of log-linear training.

NeurIPS ,2011.Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convexouter adversarial polytope. In

ICML , 2018.Eric Wong, Leslie Rice, and J Zico Kolter. Fast is better than free: Revisiting adversarialtraining.

ICLR , 2020.Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan L Yuille, and Quoc V Le.Adversarial examples improve image recognition. In

IEEE CVPR , 2020. dversarially Robust Classifier with Covariate Shift Adaptation Greg Yang, Tony Duan, Edward Hu, Hadi Salman, Ilya Razenshteyn, and Jerry Li. Ran-domized smoothing of all shapes and sizes. arXiv , 2020.Dong Yin, Raphael Gontijo Lopes, Jon Shlens, Ekin Dogus Cubuk, and Justin Gilmer. Afourier perspective on model robustness in computer vision. In

NeurIPS , 2019.Hongyang Zhang et al. Theoretically principled trade-oﬀ between robustness and accuracy.In

ICML , 2019.Richard Zhang. Making convolutional networks shift-invariant again. In

ICML , 2019.

Appendix A. Appendix Organization

We organize the appendix as follows: • Section B presents the implementation details of training for our experiments and thehyper-parameter settings for adaptive BN technique during inference. • Section C presents two additional experiments for (cid:96) certiﬁcation as we vary σ , as wellas compare the performance when we use stronger adversarially trained models. • Section D provides two additional ablation studies on the adversarial attacks. • Section E present the Top-1 corruption accuracy and loss-gradients for individualcorruptions.

Appendix B. Experimental Setup

In this Section, we ﬁrst present the implementation details for training the classiﬁcationmodels. Next, we discuss the choice of test-time hyper-parameter settings for using adaptivebatch-normalization (BN) technique.

B.1 Implementation Details

We present our experiments on

CIFAR-10

Krizhevsky et al. (2009) and

ImageNet

Denget al. (2009) datasets.

CIFAR-10.

We use pre-activation ResNet18 models for our experiments on

CIFAR-10

He et al. (2016a). We train the classiﬁcation models using the SGD optimizer with a batchsize of 128. We execute a total of 200 training epochs and apply a step-wise learning ratedecay set initially at 0 . × − .We learn two adversarially trained models Adv ∞ and Adv , for (cid:96) ∞ and (cid:96) threat modelswith radius, (cid:15) = 8 /

255 and (cid:15) = 1, respectively. We use PGD attack to generate theadversarial examples for our training Madry et al. (2018). For Adv ∞ , we use 10 iterationsand (cid:96) ∞ step size of (cid:15)/ , we use 10 iterations and (cid:96) step size of (cid:15)/ . ,

000 images from the

CIFAR-10 testset for our validation. We perform adversarial attack using PGD adversary with the ame setup as training and save the best model using the early stopping criteria Rice et al.(2020).We also train Rand σ =0 . by training with augmented random noise, sampled from anisotropic Gaussian distribution N (0 , σ I ) with σ = 0 . ImageNet.

We use ResNet50 models for

ImageNet

He et al. (2016a). For Baseline andRand σ =0 . models, we obtain the weights from Cohen et al. (2019) that are trained usingGaussian augmented noises, sampled from isotropic Gaussian distribution N (0 , σ I ) with σ = 0 . σ = 0 . ∞ and Adv are learned for (cid:96) ∞ and (cid:96) threat models with threat boundary of 4 /

255 and 3, respectively. We use the modelsprovided by Rice et al. (2020) , which in turn is adopted from PGD-based adversarialtraining of the models provided by Engstrom et al. (2019) .We resize the input images to 256 ×

265 pixels and crop 224 ×

224 pixels from the middle.Note that, ImageNet-C images are provided after resizing and cropping. So, we do notchange those images. For our experiments on certiﬁcation, we use a subsample of 500 testimages by choosing at most 1 sample for each class. Also, we use a subsample of 2 ,

000 testimages, with exactly 2 samples per class, for our experiments on adversarial attack. (a)

ImageNet (b)

CIFAR-10 ρ Adv ∞ Adv ρ Adv ∞ Adv ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . ± . Table 6: Top-1 accuracy using ﬁxed test batch-size = 512 for adversarially trained modelsunder Gaussian augmented noise with σ = 0 . ρ duringinference. We randomly shuﬄe the test images to report ( mean + 2 × sd ) of 5 diﬀerent runs. B.2 Choice of hyper-parameters

During inference, adaptive BN technique is controlled using two hyper-parameters, i.e.,the test batch-size and momentum , ρ (see Equation 5) to update the statistics of thebatch-normalization layers. We assume that the images in the test batches are obtainedindependently from the same test distribution. We compute the statistics corresponding tothese images. The hyper-parameter ρ ∈ [0 ,

1] controls the tread-oﬀ between existing trainingstatistics and test statistics for updating the parameters. When a larger size of the testbatch is available, we can get a better estimation of the test distribution. Hence, we canchoose a larger value of ρ . However, the choice of test batch-size is constrained by diﬀerentfactors in practice, such as available hardware resources.

2. https://github.com/locuslab/smoothing3. https://github.com/locuslab/robust overﬁtting4. https://github.com/MadryLab/robustness dversarially Robust Classifier with Covariate Shift Adaptation (a) ImageNet (b)

CIFAR-10 ρ Adv ∞ Adv ρ Adv ∞ Adv w/o BN adapt 0.4 ± . ± . w/o BN adapt 16.1 ± . ± . ± . ± . ± . ± .

16 28.1 ± . ± .

16 60.2 ± . ± .

32 37.1 ± . ± .

32 61.5 ± . ± .

64 41.4 ± . ± .

64 62.3 ± . ± .

128 43.3 ± . ± .

128 62.7 ± . ± .

256 44.4 ± . ± .

256 62.7 ± . ± .

512 44.8 ± . ± .

512 62.4 ± . ± . Table 7: Top-1 accuracy using ﬁxed ρ = 1 for adversarially trained models under Gaussianaugmented noise with σ = 0 . mean + 2 × s.d. ) of 5 diﬀerent runs.Here, we analyze the eﬀects of these hyper-parameters for adversarially trained modelsusing adaptive BN technique during inference. For diﬀerent choices, we compare the top-1test accuracy under Gaussian augmented noise using σ = 0 .

5, to ﬁnd the appropriate ρ and the batch-size. Since, the standard DNN classiﬁers (i.e., Baseline) does not improverobustness under Gaussian noise at σ = 0 . Momentum, ρ . We ﬁrst investigate the eﬀect of momentum, ρ as we choose a largebatch size of 512. In Table 6, we present the performance of adversarially trained modelsas we choose diﬀerent values of ρ . Recall that, ρ = 1 denotes full-adaptation . Here, wecompletely ignore the existing statistics, computed during training and use the test statistics.In contrast, ρ = 0 represents no-adaptation , where we completely ignore the test statistics.This leads to the standard deterministic classiﬁcation model.As we can see in Table 6, the performance of these classiﬁers started improving even ρ = 0 . ρ to rely more on the currenttest batches. We observe that for ImageNet the performance started converging at ρ = 0 . CIFAR-10 , the convergence started even earlier at ρ = 0 . Batch Size.

Next, we investigate the minimum size of the test batches that would allowus to use ρ = 1 (i.e., full-adaptation). In Table 7, we ﬁx ρ = 1 and vary the test batchsizes as we evaluate the models. We observe that the performance of the models startedimproving even when we are using the test batches of size 8. The performance improves aswe choose larger sizes of test batches. We can see that their performance started convergingas we choose batch size of 64 for ImageNet . On the other hand, the convergence startedmuch earlier for

CIFAR-10 . Appendix C. Additional Results on Certiﬁcation

In this section, we ﬁrst demonstrate that BN adaptation technique allows the ﬂexibility ofchoosing appropriate σ during inference to certify test examples for the adversarially trained UDGLXV F H U W L I L HG D FF X U D F\ $GY DGDSWLYH%1DW =0.25 $GY DGDSWLYH%1DW =0.5 $GY DGDSWLYH%1DW =0.75 UDGLXV F H U W L I L HG D FF X U D F\ $GY DGDSWLYH%1DW =0.25 $GY DGDSWLYH%1DW =0.5 $GY DGDSWLYH%1DW =0.75 Figure 3:

ImageNet : Certiﬁed top-1 accuracy at various (cid:96) radii as we vary σ for BN adaptationand certiﬁcation for the same classiﬁcation model. Adaptive BN allows to choose diﬀerent values ofoptimum σ for certifying diﬀerent test examples. By selecting those optimum σ for each test example,we can obtain an upper envelope of these curves as our certiﬁed top-1 accuracy. ImageNet (cid:96) Radius 0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0 2.25 2.5 2.75Adv ∞ + adapt BN (at σ = 0 .

25) 50.0 46.4 41.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0Adv ∞ + adapt BN (at σ = 0 .

50) 43.6 39.4 35.8 31.4 27.6 23.4 18.2 0.0 0.0 0.0 0.0Adv ∞ + adapt BN (at σ = 0 .

75) 31.6 26.4 22.4 18.6 16.8 14.4 11.8 9.4 7.6 5.6 3.6Adv + adapt BN (at σ = 0 .

25) 53.2 50.2 46.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0Adv + adapt BN (at σ = 0 .

50) 47.0 43.0 39.0 36.4 32.8 30.8 27.0 0.0 0.0 0.0 0.0Adv + adapt BN (at σ = 0 .

75) 37.8 32.2 28.4 26.0 22.4 20.2 19.0 17.4 14.2 12.0 9.6

Table 8:

ImageNet : Certiﬁed top-1 accuracy at various (cid:96) radii as we vary σ for BN adap-tation and certiﬁcation for the same classiﬁcation model. We use ResNet50 for Imagenet . UDGLXV F H U W L I L HG D FF X U D F\ $GY DGDSWLYH%1DW =0.25 $GY DGDSWLYH%1DW =0.5 $GY DGDSWLYH%1DW =0.75 UDGLXV F H U W L I L HG D FF X U D F\ $GY DGDSWLYH%1DW =0.25 $GY DGDSWLYH%1DW =0.5 $GY DGDSWLYH%1DW =0.75 Figure 4:

CIFAR-10 : Certiﬁed top-1 accuracy at various (cid:96) radii as we vary σ for BN adaptationand certiﬁcation for the same classiﬁcation model. Adaptive BN allows to choose diﬀerent values ofoptimum σ for certifying diﬀerent test examples. By selecting those optimum σ for each test example,we can obtain an upper envelope of these curves as our certiﬁed top-1 accuracy. models. Next, we show that adversarially trained models, learned using adversarial examplesfrom larger threat boundaries, produce higher certiﬁcation accuracy at diﬀerent (cid:96) radii. dversarially Robust Classifier with Covariate Shift Adaptation CIFAR-10 (cid:96) Radius 0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0Adv ∞ + adapt BN (at σ = 0 .

25) 66.43 55.06 42.86 0.0 0.0 0.0 0.0 0.0Adv ∞ + adapt BN (at σ = 0 .

50) 53.65 42.91 32.58 22.68 14.24 7.88 2.94 0.0Adv ∞ + adapt BN (at σ = 0 .

75) 39.96 30.76 22.01 14.64 8.84 4.81 2.29 1.15Adv + adapt BN (at σ = 0 .

25) 68.02 58.54 46.98 0.0 0.0 0.0 0.0 0.0Adv + adapt BN (at σ = 0 .

50) 56.45 46.24 35.6 26.89 18.73 11.37 5.41 0.0Adv + adapt BN (at σ = 0 .

75) 43.04 33.08 24.81 17.68 11.39 6.6 3.57 1.95

Table 9:

CIFAR-10 : Certiﬁed top-1 accuracy at various (cid:96) radii as we vary σ for BNadaptation and certiﬁcation. We use ResNet50 for Imagenet (Top) and preactivationResNet18 for

CIFAR-10 (Bottom). UDGLXV F H U W L I L HG D FF X U D F\ $GY inf [ inf DGDSWLYH%1DW =0.5 $GY inf [ inf DGDSWLYH%1DW =0.5 $GY inf [ inf DGDSWLYH%1DW =0.5 5DQG =0.5 &RKHQDWDO UDGLXV F H U W L I L HG D FF X U D F\ $GY [ DGDSWLYH%1DW =0.5 $GY [ DGDSWLYH%1DW =0.5 $GY [ DGDSWLYH%1DW =0.5 5DQG =0.5 &RKHQDWDO Figure 5:

CIFAR-10 : Certiﬁed top-1 accuracy at various (cid:96) radii for adversarially trainedmodels learned using diﬀerent threat boundaries.. C.1 Flexibility of choosing σ during inference for certiﬁcation using BNAdaptation The choice of σ controls the tread-oﬀ between certiﬁed robust accuracy and certiﬁcationboundary Cohen et al. (2019); Salman et al. (2019a); Li et al. (2019). When σ is small, wecan certify a smaller radii with high accuracy. However, we cannot certify for large (cid:96) radiiat all. In contrast, as we choose a higher value for σ , larger radii can be certiﬁed, but we geta lower certiﬁcation accuracy at (cid:96) radii.However, existing works require to ﬁx an appropriate value of σ during training to learntheir base classiﬁer Cohen et al. (2019); Salman et al. (2019a). In contrast, adaptive BNtechnique provides the ﬂexibility to choose an appropriate σ during inference. In other words,we can choose diﬀerent values of σ for diﬀerent test examples for the certiﬁcation process.However, this process of choosing an appropriate value of σ for diﬀerent test examples forcertiﬁcation is extremely expensive. For each example, it would require to apply the BNadaptation on the classiﬁer and ﬁnd the certiﬁcation boundaries diﬀerent values of σ followedby choosing the largest (cid:96) certiﬁcation boundary.Here, we choose three diﬀerent values σ i.e., { . , . , . } for our experiments. Figure3 and Table 9 presents the results for ImageNet and Figure 4 and Table 8 presents theresults for

CIFAR-10 . We use the same Adv ∞ and Adv models as described in Section B.Note that, by choosing the appropriate values of σ for the certiﬁcation process, we can get IFAR-10 (cid:96) Radius 0.25 0.5 0.75 1.0 1.25 1.5 1.75Adv ∞ [ (cid:96) ∞ ≤ / σ = 0 .

5) 47.34 31.83 18.78 9.98 4.44 1.62 0.28Adv ∞ [ (cid:96) ∞ ≤ / σ = 0 .

5) 53.65 42.91 32.58 22.68 14.24 7.88 2.94Adv ∞ [ (cid:96) ∞ ≤ / σ = 0 .

5) 51.53 43.94 36.41 28.69 21.25 14.53 8.03Adv [ (cid:96) ≤ .

5] + adapt BN (at σ = 0 .

5) 48.81 33.82 20.95 11.5 5.64 2.29 0.62Adv [ (cid:96) ≤ .

0] + adapt BN (at σ = 0 .

5) 56.45 46.24 35.6 26.89 18.73 11.37 5.41Adv [ (cid:96) ≤ .

25] + adapt BN (at σ = 0 .

5) 57.73 48.8 39.64 31.07 22.61 15.82 8.96Rand σ =0 . Cohen et al. (2019) 51.68 40.38 30.25 20.81 13.36 7.71 3.38

Table 10:

CIFAR-10 : Certiﬁed top-1 accuracy at various (cid:96) radii for adversarially trainedmodels learned using diﬀerent threat boundaries. Adversarially trained models that arelearned using larger (cid:96) ∞ and (cid:96) threat boundaries, produces higher certiﬁed top-1 accuracyat various (cid:96) radii. We use σ = 0 . Robust Test AccuracyModel ThreatBoundary Withoutadapt BN With adaptive BNm=1 m=5 m=10Adv ∞ || δ || ∞ ≤ /

255 34.2 41.5 ± . ± . ± . Adv || δ || ≤ ± . ± . ± . Table 11:

ImageNet : Performance under adversarial attacks as we vary the number ofmodels ( m ) to approximate the loss gradients for the adversarial attack. For models withBN adaptation, we randomly shuﬄe the test images to report ( mean + 2 × sd ) of 3 diﬀerentruns.the upper envelopes of these curves as our certiﬁed top-1 accuracy in Figure 3 and Figure 4for ImageNet and

CIFAR-10 respectively.Similar to Section 4.2, we estimate the class label probabilities using Monte-Carlosampling with 100 ,

000 noisy samples and certify the test examples with 99 .

9% probability.

C.2 Eﬀect of threat boundaries for learning adversaially trained models

Next, we compare the certiﬁcation accuracy of diﬀerent adversarially trained models for

CIFAR-10 dataset, as we choose diﬀerent threat boundaries for learning. For this experiment,we train 3 diﬀerent adversarially trained classiﬁers using both (cid:96) ∞ and (cid:96) -bounded adversarieswith diﬀerent boundaries. We use diﬀerent threat boundaries while keeping the same setupas described in Section B. For Adv ∞ models, we choose (cid:96) ∞ threat boundaries as 4 / /

255 and 12 / models, we choose (cid:96) threat boundaries as 0 .

5, 1 and 1 . (cid:96) radii for both (cid:96) ∞ and (cid:96) threat models. Appendix D. Additional Results under Adversarial Attacks

In this section, we ﬁrst compare the strength of our adaptive attack as we increase thenumber of models to approximate the gradients (recall Equation 8). Next, we study theeﬀect of adaptive BN technique under adversarial attacks when the adversarially trained dversarially Robust Classifier with Covariate Shift Adaptation Robust Test AccuracyModel ThreatBoundary Withoutadapt BN With adaptive BNm=1 m=5 m=10 m=25Adv ∞ || δ || ∞ ≤ /

255 51.9 52.7 ± . ± . ± . ± . Adv || δ || ≤ ± . ± . ± . ± . Table 12:

CIFAR-10 : Performance under adversarial attacks as we vary the number ofmodels ( m ) to approximate the loss gradients for the adversarial attack. For models withBN adaptation, we randomly shuﬄe the test images to report ( mean + 2 × sd ) of 3 diﬀerentruns. Robust Test AccuracyModel ThreatBoundary Criteria Withoutadapt BN Withadapt BN DiﬀAdv ∞ || δ || ∞ ≤ Early stopping 51.9 52.2 ± . +0.3w/o Early stopping 44.2 45.4 ± . +1.2Adv || δ || ≤ . ± . +0.4w/o Early stopping 40.8 42.0 ± . +1.2 Table 13: Performance under adversarial attacks on

CIFAR-10 dataset. ”Final” denotesthe adversarially trained model that is saved at the end of 200 training epochs. For modelswith BN adaptation, we randomly shuﬄe the test images to report ( mean + 2 × sd ) of 3diﬀerent runs.models are over-ﬁtted models, i.e., early stopping criteria is not applied to obtain the bestmodels. D.1 Eﬀect of the EoT hyper-parameter ( m ) We recall from Equation 8 that for adversarially trained models with BN adaptation, weneed to approximate the gradients during back-propagation at each steps of the adaptiveadversarial attack using Equation 8. We select a set of m diﬀerent classiﬁers, { f θ ( i ) } mi =1 withdiﬀerent BN parameters for this approximation. We obtain each f θ ( i ) by applying adaptiveBN technique on the original classiﬁer f θ using test batch, perturbed with randomly chosen δ ∈ ∆ from the threat model. For our experiments in Table 3, we choose m = 10 for both CIFAR-10 and

ImageNet datasets.Here, we compare the strength of our adaptive attack as we vary the number of models(i.e., m ) to approximate the gradients. Table 12 and Table 11 present the results for ImageNet and

CIFAR-10 respectively. We observe that the performance of the adversarialattacks do not change signiﬁcantly as we vary m to approximate the gradients. This isbecause we obtain set of classiﬁer { f θ ( i ) } mi =1 by applying adaptive BN technique on the originalclassiﬁer f θ using a large test batch-size of 500 images, perturbed with randomly samplednoise from a small threat boundary (as a proxy for the adversarial perturbations). Thisresults in producing the classiﬁers, f θ ( i ) with only small changes in their batch-normalizationparameters. Hence, for a given test image, these models produce almost the same lossgradients with respect to the pixels. As a result, we cannot signiﬁcantly improve the strengthof the adversarial attacks by choosing a larger m . .2 Adversarial Attack on over-ﬁtted models Rice et al. (2020) recently demonstrated that the performance of adversarially trained modelssigniﬁcantly degrades by overﬁtting to the training set. By applying early-stopping criteriafor training to obtain the best model, we can signiﬁcantly improve their performance. Here,we study the eﬀect of adaptive BN technique as we obtain the models with and withoutusing early stopping criteria.We investigate the performance on

CIFAR-10 and present the results in Table 13. Weobserve that adaptive BN technique produces a slightly higher performance gain for theover-ﬁtted adversarially trained models i.e., without using early stopping criteria.

Appendix E. Additional Results on Common Corruptions

In this section, we ﬁrst present the accuracy attained by diﬀerent classiﬁers under diﬀerentcommon perturbations. Next, we visualize the loss gradients of adversarially trained modelsusing adaptive BN technique under diﬀerent corruptions from

ImageNet-C dataset.

E.1 Performance under diﬀerent corruptions

Hendrycks and Dietterich (2019) introduced

ImageNet-C and

CIFAR-10-C datasets byalgorithmically generated corruptions from noise, blur, weather, or digital categories with I = 5 diﬀerent severities for each corruptions. In Table 14 and Table 15, we present thetop-1 accuracy of ImageNet and

CIFAR-10 classiﬁers under each corruption, averagedover their 5 severity levels:Top-1 acc.( f ) k = (cid:16) − err k (cid:17) = (cid:16) − err fk,i I (cid:17) (11)where, f denotes the classiﬁer, k is the corruption type and i is the severity level.Recall that, we use the top-1 error rate for individual corruptions, normalized usingAlexNet error rates to measure the corruption robustness i.e mCE (Equation 9) and rmCE (Equation 10) scores. For our experiments on CIFAR-10-C , we train an AlexNet modelKrizhevsky et al. (2012). The model is trained for 150 epochs with batch-size of 128 withoutany data augmentation. We use the SGD optimizer and a constant learning rate of 0 . ImageNet-C , Hendrycks and Dietterich (2019) already provided the performance underdiﬀerent corruptions to compute the mCE and rmCE scores.In Table 14 and Table 15, we present the results for

ImageNet and

CIFAR-10 datasets,respectively. We observe, in particular for

ImageNet dataset, that the adversariallytrained models, without BN adaptation technique, signiﬁcantly dropped the accuracy undercorruptions from noise, blur and weather category for

ImageNet-C , compared to their cleantest accuracy. Infact, these models often achieves signiﬁcantly lower accuracy compared toBaseline model without BN adaptation. For examples, Adv ∞ and Adv achieves an averagetop-1 accuracy of 26 .

5% and 24 . .

0% for Baseline model without BNadaptation. However, as we apply BN adaptation, the performance of Adv ∞ and Adv signiﬁcantly improved by approximately 20% in top-1 accuracy for ImageNet-C . dversarially Robust Classifier with Covariate Shift Adaptation E.2 Loss Gradients under diﬀerent perturbations

Previous studies have shown that the loss gradients for adversarially trained models alignproperly with perceptually relevant features for clean images, without the perturbationsTsipras et al. (2019); Etmann et al. (2019). In this section, we investigate the loss gradientsof these models in presence of random perturbations from diﬀerent categories of corruptions.In Section 4.1, we have already visualized that in presence of high Gaussian noise,adversarially trained models without BN adaptation produces noisy loss gradients withhigher values for the back-ground pixels. Adaptive BN technique mitigates this eﬀect.However, in Figure 6 we observe that this behavior is not generalized for all types of randomcorruptions. Here, we visualize the Fourier spectrum of the corruption images along with theloss gradients values for Adv ∞ model with and without applying BN adaptation techniques . We also visualize the loss gradients with respect to input image pixels for both Adv ∞ and Adv models using and without using adaptive BN technique during inference in Figure7 to Figure 12.We note that, when the corruption images have higher concentration for lower Fourierfrequencies (called “low” frequency corruption), the adversarially trained models withoutBN adaptation tends to produce blurry loss-gradients (see the ﬁrst column in Figure 6).In other words, the classiﬁer provides almost uniform importance to all input pixels for topredict these corrupted images. Generally, we observe this pattern for corruption imagesfrom blur and weather categories.In contrast, for corruption images with larger concentration for higher Fourier frequencies,adversarially trained models without BN adaptation produces sharper loss-gradients. Asthe concentration for higher Fourier frequencies increase, it produces high loss-gradientvalues even for back-ground pixels (see the third column in Figure 6). As a consequence,the loss-gradients become noisy, obfuscating the pixels from object-of-interest. We observethis pattern for corruption images from noisy category.Finally, we can see that adaptive BN technique mitigates the blurry patterns for corruptionimages from blur and weather categories as well as the noisy patterns for the corruptionimages from the noise category. In other words, it mitigates the eﬀect of any randomcorruption to restore the alignment with perceptually relevant features. This allows theadversarially trained models to produce signiﬁcantly higher Top-1 classiﬁcation accuracy forrandom corruptions (Table 14 and Table 15).

5. Please refer to Yin et al. (2019) (see Figure 2) for more details of Fourier perspective on randomcorruptions. a) ImageNet-C

Category Corruption Top-1 AccuracyAlexNet Baseline (ResNet50) Adv ∞ Adv w/oadapt BN withadapt BN w/oadapt BN withadapt BN w/oadapt BN withadapt BNClean - 56.5 ± . ± . ± . ± . ± . ± . ± . Noise Gaussian Noise 11.4 ± . ± . ± . ± . ± . ± . ± . Shot Noise 10.6 ± . ± . ± . ± . ± . ± . ± . Impulse Noise 7.7 ± . ± . ± . ± . ± . ± . ± . Average. 9.9 ± . ± . ± . ± . ± . ± . ± . Blur Defocus Blur 18.0 ± . ± . ± . ± . ± . ± . ± . Glass Blur 17.4 ± . ± . ± . ± . ± . ± . ± . Motion Blur 21.4 ± . ± . ± . ± . ± . ± . ± . Zoom Blur 20.2 ± . ± . ± . ± . ± . ± . ± . Average. 19.3 ± . ± . ± . ± . ± . ± . ± . Weather Snow 13.3 ± . ± . ± . ± . ± . ± . ± . Fog 18.1 ± . ± . ± . ± . ± . ± . ± . Frost 17.3 ± . ± . ± . ± . ± . ± . ± . Brightness 43.5 ± . ± . ± . ± . ± . ± . ± . Contrast 14.7 ± . ± . ± . ± . ± . ± . ± . Average. 21.4 ± . ± . ± . ± . ± . ± . ± . Digital Elastic Transform 35.4 ± . ± . ± . ± . ± . ± . ± . Pixelate 28.2 ± . ± . ± . ± . ± . ± . ± . JPEG Compression 39.3 ± . ± . ± . ± . ± . ± . ± . Average. 34.2 ± . ± . ± . ± . ± . ± . ± . Hold-out Noise Speckle Noise 15.5 ± . ± . ± . ± . ± . ± . ± . Hold-out Blur Gaussian Blur 21.3 ± . ± . ± . ± . ± . ± . ± . Hold-out Weather Spatter 28.2 ± . ± . ± . ± . ± . ± . ± . Hold-out Digital Saturate 34.2 ± . ± . ± . ± . ± . ± . ± . Table 14:

ImageNet-C : Top-1 Accuracy achieved by diﬀerent classiﬁcation models undereach individual corruption. For models with BN adaptation, we randomly shuﬄe the testimages to report ( mean + 2 × sd ) of 3 diﬀerent runs. dversarially Robust Classifier with Covariate Shift Adaptation (a) CIFAR-10-C

Category Corruption Top-1 AccuracyAlexNet Baseline (ResNet50) Adv ∞ Adv w/oadapt BN withadapt BN w/oadapt BN withadapt BN w/oadapt BN withadapt BNClean - 81.4 ± . ± . ± . ± . ± . ± . ± . Noise Gaussian Noise 66.4 ± . ± . ± . ± . ± . ± . ± . Shot Noise 72.8 ± . ± . ± . ± . ± . ± . ± . Impulse Noise 56.3 ± . ± . ± . ± . ± . ± . ± . Average. 65.1 ± . ± . ± . ± . ± . ± . ± . Blur Defocus Blur 49.3 ± . ± . ± . ± . ± . ± . ± . Glass Blur 62.0 ± . ± . ± . ± . ± . ± . ± . Motion Blur 76.8 ± . ± . ± . ± . ± . ± . ± . Zoom Blur 67.2 ± . ± . ± . ± . ± . ± . ± . Average. 63.8 ± . ± . ± . ± . ± . ± . ± . Weather Snow 67.1 ± . ± . ± . ± . ± . ± . ± . Fog 71.3 ± . ± . ± . ± . ± . ± . ± . Frost 64.8 ± . ± . ± . ± . ± . ± . ± . Brightness 81.4 ± . ± . ± . ± . ± . ± . ± . Contrast 76.5 ± . ± . ± . ± . ± . ± . ± . Average. 72.2 ± . ± . ± . ± . ± . ± . ± . Digital Elastic Transform 70.8 ± . ± . ± . ± . ± . ± . ± . Pixelate 63.9 ± . ± . ± . ± . ± . ± . ± . JPEG Compression 54.3 ± . ± . ± . ± . ± . ± . ± . Average. 63.0 ± . ± . ± . ± . ± . ± . ± . Hold-out Noise Speckle Noise 71.2 ± . ± . ± . ± . ± . ± . ± . Hold-out Blur Gaussian Blur 65.8 ± . ± . ± . ± . ± . ± . ± . Hold-out Weather Spatter 68.2 ± . ± . ± . ± . ± . ± . ± . Hold-out Digital Saturate 74.9 ± . ± . ± . ± . ± . ± . ± . Table 15:

CIFAR-10-C : Top-1 Accuracy achieved by diﬀerent classiﬁcation models undereach individual corruption. For models with BN adaptation, we randomly shuﬄe the testimages to report ( mean + 2 × sd ) of 3 diﬀerent runs. lean ImagesContrastDefocus Blur FogFrost Gaussian NoiseShot NoiseFigure 6: Corruption images with higher concentration of Fourier frequencies produce blurry loss-gradients, while, corruption images with higher concentration of Fourier frequencies lead to noisyloss-gradient patterns. Adaptive BN signiﬁcantly adaptive BN technique mitigates these eﬀect torestore the perceptual alignment. 30 dversarially Robust Classifier with Covariate Shift Adaptation (a) Severity Level:= 1 (b) Severity Level:= 3 (c) Severity Level:= 5

Figure 7:

Visualization of loss-gradients produced by adversarially trained classiﬁers on the testexamples that are perturbed using Brightness and Contrast corruptions from weather category.31a) Severity Level:= 1 (b) Severity Level:= 3 (c) Severity Level:= 5

Figure 8:

Visualization of loss-gradients produced by adversarially trained classiﬁers on the testexamples that are perturbed using Snow, Frost and Fog corruptions from Weather category.32 dversarially Robust Classifier with Covariate Shift Adaptation (a) Severity Level:= 1 (b) Severity Level:= 3 (c) Severity Level:= 5

Figure 9:

Visualization of loss-gradient images produced by adversarially trained classiﬁers on thetest examples that are perturbed using Defocus-blur and Glass-blur corruptions from Blur category.33a) Severity Level:= 1 (b) Severity Level:= 3 (c) Severity Level:= 5

Figure 10:

Visualization of loss-gradient images produced by adversarially trained classiﬁers on thetest examples that are perturbed using Zoom-blur and Motion-blur corruptions from Blur category.34 dversarially Robust Classifier with Covariate Shift Adaptation (a) Severity Level:= 1 (b) Severity Level:= 3 (c) Severity Level:= 5

Figure 11:

Visualization of loss-gradient images produced by adversarially trained classiﬁers onthe test examples that are perturbed using Elastic transformation, Pixelate and JPEG compressioncorruptions from Digital category. 35a) Severity Level:= 1 (b) Severity Level:= 3 (c) Severity Level:= 5