Efficient Certified Defenses Against Patch Attacks on Image Classifiers
Published as a conference paper at ICLR 2021
Jan Hendrik Metzen & Maksym Yatsura
Bosch Center for Artificial Intelligence, Robert Bosch GmbH
Robert-Bosch-Campus 1, 71272 Renningen, Germany
{janhendrik.metzen,maksym.yatsura}@de.bosch.com

Abstract
Adversarial patches pose a realistic threat model for physical-world attacks on autonomous systems via their perception component. Autonomous systems in safety-critical domains such as automated driving should thus contain a fail-safe fallback component that combines certifiable robustness against patches with efficient inference while maintaining high performance on clean inputs. We propose BagCert, a novel combination of model architecture and certification procedure that allows efficient certification. We derive a loss that enables end-to-end optimization of certified robustness against patches of different sizes and locations. On CIFAR10, BagCert certifies 10,000 examples in 43 seconds on a single GPU and obtains 86% clean and 60% certified accuracy against 5×5 patches.

1 Introduction
Adversarial patches (Brown et al., 2017) are one of the most relevant threat models for attacks on autonomous systems such as highly automated cars or robots. In this threat model, an attacker can freely control a small subregion of the input (the "patch") but needs to leave the rest of the input unchanged. This threat model is relevant because it corresponds to a physically realizable attack (Lee & Kolter, 2019): an attacker can print the adversarial patch pattern, place it in the physical world, and it will become part of the input of any system whose field of view overlaps with the physical patch. Moreover, once an attacker has generated a successful patch pattern, this pattern can be easily shared, will be effective against all systems using the same perception component, and an attack can be conducted without requiring access to the individual system. This makes, for instance, attacking an entire fleet of cars of the same vendor feasible.

While several empirical defenses have been proposed (Hayes, 2018; Naseer et al., 2019; Selvaraju et al., 2019; Wu et al., 2020), these only offer robustness against known attacks but not necessarily against more effective attacks that may be developed in the future (Chiang et al., 2020). In contrast, certified defenses for the patch threat model (Chiang et al., 2020; Levine & Feizi, 2020; Zhang et al., 2020; Xiang et al., 2020) guarantee robustness against all possible attacks within the given threat model. Ideally, a certified defense should combine high certified robustness with efficient inference while maintaining strong performance on clean inputs. Moreover, the training objective should be based on the certification problem to avoid post-hoc calibration of the model for certification. Existing defenses do not satisfy all of these conditions: Chiang et al. (2020) proposed an approach that extends interval-bound propagation (Gowal et al., 2019) to the patch threat model.
In this approach, there is a clear connection between training objective and certification problem. However, certified accuracy is relatively low and clean performance is severely affected on CIFAR10. Moreover, inference requires separate forward passes for all possible patch positions and is thus computationally very expensive. Derandomized smoothing (Levine & Feizi, 2020) achieves much higher certified and clean performance on CIFAR10 and even scales to ImageNet. However, inference is computationally expensive since it is based on separately propagating many differently ablated versions of a single input. Moreover, training and certification are disconnected, and a separate tuning of parameters of the post-hoc certification procedure on some hold-out data is required, a drawback shared also by Clipped BagNet (Zhang et al., 2020) and PatchGuard (Xiang et al., 2020).

In this work, we propose BagCert, which combines high certified accuracy (60% on CIFAR10 for 5×5 patches) and clean performance (86% on CIFAR10), efficient inference (43 seconds on a single GPU for the 10,000 CIFAR10 test samples), and end-to-end training for robustness against patches of varying size, aspect ratio, and location. BagCert is based on the following contributions:

• We propose three different conditions that can be checked for certifying robustness. One of these corresponds to the condition proposed by Levine & Feizi (2020). However, we show that an alternative condition improves certified accuracy of the same model, typically by roughly 3 percentage points, while remaining broadly applicable.

• We derive a loss function that directly optimizes for certified accuracy against a uniform distribution of patch sizes at arbitrary positions.
This loss corresponds to a specific instance of the well-known class of margin losses.

• Similarly to Levine & Feizi (2020), we classify images via a majority vote over a large number of predictions that are based on small local regions of a single input. However, the proposed model achieves this via a single forward pass on the unmodified input, by utilizing a neural network architecture with very small receptive fields, similar to BagNets (Brendel & Bethge, 2019). This enables efficient inference with surprisingly high clean accuracy and was concurrently proposed by Zhang et al. (2020) and Xiang et al. (2020).

2 Related Work
Adversarial Patch Attacks
The vulnerability of image classifiers to adversarial patch attacks was first demonstrated by Brown et al. (2017). They show that a specifically crafted physical adversarial patch is able to fool multiple ImageNet models into predicting the wrong class with high confidence. Numerous patch attacks were proposed for object detection (Liu et al., 2019; Lee & Kolter, 2019; Thys et al., 2019; Huang et al., 2019) and optical flow estimation (Ranjan et al., 2019). These attacks are versatile: they are effective in the black-box setting (Croce et al., 2020) and can suppress detected objects in a scene without overlapping any of them (Lee & Kolter, 2019). Following Athalye et al. (2018b), adversarial patches can be printed out and placed in the physical world to fool different models independently of scaling, rotation, brightness, and other visual transformations. These factors make adversarial patch attacks a non-negligible threat for safety-critical perception systems (Thys et al., 2019).
Heuristic Defenses Against Patch Attacks
Several heuristic defenses against adversarial patches, such as digital watermarking (Hayes, 2018) or local gradient smoothing (Naseer et al., 2019), have been proposed. However, similarly to the results obtained for norm-bounded adversarial attacks (Athalye et al., 2018a), it was demonstrated that these defenses can be easily broken by white-box attacks which account for the pre-processing steps in the optimization procedure (Chiang et al., 2020). The role of spatial context in object detection algorithms, which makes them vulnerable to patch attacks, was investigated by Saha et al. (2019), and an empirical defense based on Grad-CAM (Selvaraju et al., 2019) was proposed. Existing augmentation techniques based on adding a Gaussian noise patch (Lopes et al., 2019) or a patch from a different image (Yun et al., 2019) increase robustness against occlusions caused by adversarial patches. Wu et al. (2020) propose a defense that uses adversarial training to increase robustness against occlusion attacks.
Certified Defenses
Evaluating defense methods by their performance against empirical attacks can lead to a false sense of security, since stronger adversaries might be developed in the future that break the defenses (Athalye et al., 2018a; Uesato et al., 2018). Therefore, it is important to have guarantees of robustness. Numerous works have been proposed in the field of certified robustness, ranging from complete verifiers that find the worst-case adversarial examples exactly (Huang et al., 2017; Tjeng & Tedrake, 2017) to faster but less accurate incomplete methods that provide an upper bound on the robust error (Gehr et al., 2018; Wong & Kolter, 2018; Wong et al., 2018; Gowal et al., 2019). Another line of work is based on Randomized Smoothing (Lecuyer et al., 2019; Li et al., 2019; Cohen et al., 2019), which exhibits strong empirical results and scales to ImageNet, however at the cost of increasing inference time by orders of magnitude. Certified defenses crafted for patch attacks were first proposed by Chiang et al. (2020). They adapt the IBP method (Gowal et al., 2019) to the patch threat model. Although their approach makes it possible to obtain robustness guarantees, it only scales to small patches and causes a significant drop in clean accuracy.

Figure 1: Illustration of BagCert training for a 1D input and two classes. An input X is processed by region scorer f_θ, consisting of a 3-layer CNN with kernel sizes 3, 1, and 3. The resulting continuous region scores are passed through a Heaviside step function (replaced by a sigmoid in the backward pass) to obtain binary region scores s for every class. The differences Δ between true and non-true class scores are then processed by spatial aggregation g, in this case simply summing them via g = g_Σ. The resulting value is maximized by passing it into the margin loss L.

Levine & Feizi (2020) proposed (de)randomized smoothing: they train a base classifier for classifying images where all but a small local region is ablated. At inference time, many (or even all) possible ablations are classified and a majority vote determines the final classification. If this majority vote holds with sufficient margin, the decision is provably robust against patch attacks because a patch will be completely ablated in most of the inputs and can thus only influence a minority of the votes. This method provides a significant accuracy improvement compared to Chiang et al. (2020) and allows training and certifying ImageNet models. However, its inference in block-smoothing mode is computationally expensive. A last line of work is based on using models with small receptive fields such as BagNets (Brendel & Bethge, 2019): Zhang et al. (2020) apply a clipping function while Xiang et al. (2020) apply a "detect-and-mask" filter to the logits of pretrained BagNets before global averaging. Using small receptive fields limits the number of region scores affected by a local patch, while clipping and masking ensure that a few very large region scores cannot dominate the global average. We note that these approaches do not train models directly for certified robustness but rather achieve it by applying post-hoc procedures that come with additional hyperparameters requiring careful tuning.

3 Method
We introduce BagCert, a framework which consists of novel conditions for certifying robustness, a specific model architecture, and a new end-to-end training procedure. BagCert allows end-to-end training of classifiers whose robustness against adversarial patch attacks can be certified efficiently. We outline our approach for the task of image classification but note that it can be extended to other tasks with grid-structured inputs. We refer to Figure 1 for an illustration of the training phase and to Figure 5 in the supplementary material for an illustration of certification of BagCert.

Threat Model
We consider a threat model in which an attacker can conduct an image-dependent patch attack. Let x ∈ [0, 1]^{w_in × h_in × c_in} be an input image of resolution w_in × h_in with c_in channels. Let p be a patch and l be a region of an image x having the same size as patch p. We denote the set of feasible regions l as L. For example, for a patch p ∈ [0, 1]^{n × c_in} consisting of n pixels, L could be the set of all w_p × h_p rectangular regions l of an image x with w_p · h_p = n. We define an operator A such that A(x, p, l) is the result of placing a patch p onto an image x over a region l. We assume that the attacker has white-box knowledge of the model and conducts an input-dependent attack, that is, attack region l and inserted patch p can be chosen for every input independently.

3.1 Certification
We base our method for certification on assuming a certain structure of the classifier. More specifically, we decompose the classifier into two components:

• A region scorer f_θ that maps from inputs x to region scores s ∈ {0, 1}^{w_out × h_out × c_out}, where w_out × h_out is the output resolution, c_out is the number of classes, and θ are trainable parameters. Please note that we allow Σ_c s_{i,j,c} ≠ 1.

• A spatial aggregator g that maps from region scores s to (global) class scores S ∈ [0, 1]^{c_out}. In this work, we restrict g to be monotonically increasing, that is: for two region score maps s^(1) and s^(2) with s^(1)_{i,j,c} ≥ s^(2)_{i,j,c} ∀ i, j, we require g(s^(1))_c ≥ g(s^(2))_c ∀ c.

Generally, we base certification on upper bounding the effect of an actual attack in the threat model. For this, we only exploit architectural properties of f and g that are valid for any choice of model parameters θ. More specifically, we only exploit the output dependency map R of f, which we define as R(l) = {(i, j) | ∃ x, θ, p : f_θ(A(x, p, l))_{i,j} ≠ f_θ(x)_{i,j}}. Informally, R(l) is the set of all indices of the score map that can be affected by a patch applied at region l, for any choice of input x, patch p, and parameters θ. That is: the set of all outputs of f_θ whose receptive fields overlap with l. We discuss options for f and the resulting R in Section 3.2.

For input x with class label c_t and s = f_θ(x), we define the "worst-case" score map s^wc(s, l) as

s^wc_{i,j,c}(s, l) = s_{i,j,c}  if (i, j) ∉ R(l),
s^wc_{i,j,c}(s, l) = 1          if (i, j) ∈ R(l) ∧ c ≠ c_t,
s^wc_{i,j,c}(s, l) = 0          if (i, j) ∈ R(l) ∧ c = c_t.

Moreover, we define Δ_{i,j,c} = s_{i,j,c_t} − s_{i,j,c} and similarly Δ^wc_{i,j,c} = s^wc_{i,j,c_t} − s^wc_{i,j,c}.
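As an illustration, the worst-case score map above can be constructed with a few lines of array code. This is a hypothetical sketch under our own conventions (the function name, the (H, W, C) array layout, and the boolean-mask encoding of R(l) are ours, not the paper's):

```python
import numpy as np

def worst_case_scores(s, region_mask, c_t):
    """Build the worst-case score map s^wc from Section 3.1.

    s           -- binary region scores of shape (H, W, C), values in {0, 1}
    region_mask -- boolean (H, W) array marking R(l), the output positions
                   a patch placed at region l can influence
    c_t         -- index of the true class

    Inside R(l) the attacker is assumed maximally powerful: every non-true
    class score rises to 1 and the true-class score drops to 0; outside
    R(l) the scores are unchanged.
    """
    s_wc = s.astype(float).copy()
    s_wc[region_mask, :] = 1.0     # non-true classes -> 1 inside R(l)
    s_wc[region_mask, c_t] = 0.0   # true class       -> 0 inside R(l)
    return s_wc
```

Consistent with the identities in the text, the resulting Δ^wc = s_wc[..., c_t] − s_wc[..., c] equals Δ outside R(l) and −1 inside R(l) for every c ≠ c_t.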
It follows directly that Δ^wc_{i,j,c} = Δ_{i,j,c} ∀ (i, j) ∉ R(l) and Δ^wc_{i,j,c} = −1 ∀ (i, j) ∈ R(l), c ≠ c_t.

For certifying robustness in the threat model for input x with class label c_t, we need to show g(f_θ(A(x, p, l)))_{c_t} > g(f_θ(A(x, p, l)))_c ∀ c ≠ c_t, ∀ l ∈ L, ∀ p. For this, it suffices to check

Condition 3.1. g(s^wc(s, l))_{c_t} > g(s^wc(s, l))_c ∀ c ≠ c_t, ∀ l ∈ L

Proof.
Consider arbitrary l ∈ L and p, and let s^adv = f_θ(A(x, p, l)). With s^adv_{i,j,c} ∈ {0, 1} we obtain

s^adv_{i,j,c} = s^wc_{i,j,c}(s, l) = s_{i,j,c}  if (i, j) ∉ R(l),
s^adv_{i,j,c} ≤ s^wc_{i,j,c}(s, l) = 1          if (i, j) ∈ R(l) ∧ c ≠ c_t,
s^adv_{i,j,c} ≥ s^wc_{i,j,c}(s, l) = 0          if (i, j) ∈ R(l) ∧ c = c_t.

With g being monotonically increasing, we obtain g(s^adv)_{c_t} ≥ g(s^wc(l))_{c_t} and, for all c ≠ c_t, g(s^wc(l))_c ≥ g(s^adv)_c. Condition 3.1 thus implies g(s^adv)_{c_t} > g(s^adv)_c ∀ c ≠ c_t.

Checking Condition 3.1 requires one forward pass through f_θ to obtain s = f_θ(x) and |L| times the construction of s^wc(s, l) and the evaluation of g. We now consider a special case where this can be implemented very efficiently.

3.1.1 Spatial Sum Aggregation
For the case g = g_Σ(s) = Σ_{i=1,j=1}^{w_out,h_out} s_{i,j}, Condition 3.1 simplifies to

Condition 3.2. min_{c ≠ c_t} Σ_{(i,j) ∉ R(l)} Δ_{i,j,c} > |R(l)| ∀ l ∈ L

We would like to note that these are "trivial" lower and upper bounds for s^adv, and we see the potential to improve upon these bounds in future work, for instance by relaxing s ∈ [0, 1]^{w_out × h_out × c_out} and applying interval bound propagation (Gowal et al., 2019). However, the proposed simple bounds have the advantage of not requiring additional forward passes through the model and thus being computationally efficient.

Proof.
For all c ≠ c_t, we exploit ∀ (i, j) ∈ R(l): Δ^wc_{i,j,c} = −1. With Condition 3.2, we obtain

g_Σ(s^wc(l))_{c_t} − g_Σ(s^wc(l))_c
  = Σ_{i,j} s^wc_{i,j,c_t} − Σ_{i,j} s^wc_{i,j,c}
  = Σ_{i,j} Δ^wc_{i,j,c}
  = Σ_{(i,j) ∉ R(l)} Δ^wc_{i,j,c} + Σ_{(i,j) ∈ R(l)} Δ^wc_{i,j,c}
  = Σ_{(i,j) ∉ R(l)} Δ_{i,j,c} − |R(l)| > 0.

We note that Σ_{(i,j) ∉ R(l)} Δ_{i,j,c} = Σ_{i,j} Δ_{i,j,c} − Σ_{(i,j) ∈ R(l)} Δ_{i,j,c}. For the special case that all R(l) are rectangular, Σ_{(i,j) ∈ R(l)} Δ_{i,j,c} can be computed efficiently for all l ∈ L simultaneously via integral images/summed-area tables (Crow, 1984). For instance, R(l) is rectangular for l being rectangular input patches and the R resulting from a CNN with grid-aligned kernels.

For the case that the R(l) are not all rectangular and |L| becomes large, checking Condition 3.2 can become prohibitively expensive. For this case, we derive a condition that corresponds to an upper bound on Condition 3.2 and can be evaluated in constant time with respect to |L|:

Condition 3.3. min_{c ≠ c_t} Σ_{i=1,j=1}^{w_out,h_out} Δ_{i,j,c} > 2 R_max(L) with R_max(L) = max_{l ∈ L} |R(l)|

Proof. Δ_{i,j,c} ≤ 1 implies Σ_{(i,j) ∈ R(l)} Δ_{i,j,c} ≤ |R(l)| ≤ R_max(L). For all c ≠ c_t, using Condition 3.3: Σ_{(i,j) ∉ R(l)} Δ_{i,j,c} = Σ_{i,j} Δ_{i,j,c} − Σ_{(i,j) ∈ R(l)} Δ_{i,j,c} > 2 R_max(L) − R_max(L) ≥ |R(l)|.

We note that Condition 3.3 corresponds to the condition proposed by Levine & Feizi (2020). It is, however, a strictly weaker condition than Condition 3.2. Thus, Condition 3.2 is preferable if all R(l) are rectangular or |L| is of moderate size. We refer to Figure 5 in the supplementary material for an illustration of Conditions 3.2 and 3.3.

3.2 Model
Crucially, the quality of the certification depends on R_max(L) = max_{l ∈ L} |R(l)|: the larger this quantity becomes, the larger the left-hand side of Condition 3.2 or Condition 3.3 needs to be to fulfill the condition. We focus on the specific case where f_θ is realized by a convolutional neural network (CNN). In that case, |R(l)| is determined fully by l and the receptive field of the CNN. More specifically, we obtain R(l) = {(i, j) | ∃ (ĩ, j̃) ∈ l : |i − ĩ| ≤ ⌊w_rf/2⌋ ∧ |j − j̃| ≤ ⌊h_rf/2⌋} for a receptive field size of w_rf × h_rf, ignoring operation strides.

Receptive field sizes of CNNs are determined by the shapes of the convolutional kernels as well as operation strides. We propose using standard CNN architectures such as ResNets but replacing most 3×3 convolutions by 1×1 convolutions, using stride 1 in (nearly) all operations, and removing all dense layers. This results in a network with very small receptive field sizes and thus small R(l). We note that the proposed architecture is similar to BagNets (Brendel & Bethge, 2019), and using this type of model was concurrently proposed for certifying robustness against patch attacks by Zhang et al. (2020) and Xiang et al. (2020). BagNets obtain surprisingly high classification accuracy despite small receptive field sizes (Brendel & Bethge, 2019). Importantly, in contrast to BagNets, we do not apply global average pooling on the final feature layer. This results in a dense output of shape w_out × h_out × c_out. The ratios w_in/w_out and h_in/h_out depend on the strides applied in the network and mostly control the computational overhead. We note that the cost of forward/backward passes in BagNets is of the same order of magnitude as that of a corresponding residual network. Because of the small receptive fields of BagNets, |R(l)| is small if l is a small contiguous region of the input, such as a rectangular patch.
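The receptive-field formula for R(l) above and the integral-image check of Condition 3.2 from Section 3.1.1 can be sketched together as follows. This is an illustrative reimplementation under our own conventions (stride 1, patch coordinates given directly in the output grid, and one non-true class at a time), not the authors' code:

```python
import numpy as np

def region_mask(patch_rows, patch_cols, rf, out_shape):
    """Boolean mask of R(l) for a rectangular patch l and receptive field rf.

    Implements R(l) = {(i, j) | exists (i~, j~) in l :
        |i - i~| <= floor(w_rf / 2) and |j - j~| <= floor(h_rf / 2)},
    i.e. all output positions whose receptive field overlaps the patch.
    patch_rows / patch_cols are half-open (start, stop) ranges of the patch
    in the output grid; operation strides are ignored, as in the formula.
    """
    h, w = out_shape
    rh, rw = rf[0] // 2, rf[1] // 2
    mask = np.zeros(out_shape, dtype=bool)
    r0, r1 = max(0, patch_rows[0] - rh), min(h, patch_rows[1] + rh)
    c0, c1 = max(0, patch_cols[0] - rw), min(w, patch_cols[1] + rw)
    mask[r0:r1, c0:c1] = True
    return mask

def condition_3_2_holds(delta, rf, patch_shape):
    """Check Condition 3.2 for one non-true class and all patch placements.

    delta: (H, W) array of margins delta_{i,j,c} = s_{i,j,c_t} - s_{i,j,c}.
    For rectangular R(l), the per-placement sums over R(l) come from a
    single summed-area table (Crow, 1984); border clipping is ignored here,
    which makes the check slightly conservative near image edges.
    """
    H, W = delta.shape
    rh = min(patch_shape[0] + 2 * (rf[0] // 2), H)  # footprint of R(l)
    rw = min(patch_shape[1] + 2 * (rf[1] // 2), W)
    total = delta.sum()
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = delta.cumsum(axis=0).cumsum(axis=1)
    win = ii[rh:, rw:] - ii[:-rh, rw:] - ii[rh:, :-rw] + ii[:-rh, :-rw]
    # Condition 3.2: sum outside R(l) exceeds |R(l)| for every placement.
    return bool(np.all(total - win > rh * rw))
```

Full certification would take the minimum of this check over all non-true classes c; Condition 3.3 instead compares the single global sum against 2·R_max(L), which is constant-time in |L|.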
We apply a Heaviside step function H(x) = 0 for x < 0, 1 for x ≥ 0 as the final layer of f_θ, which ensures f_θ(X) ∈ {0, 1}^{w_out × h_out × c_out}. Similar to clipping (Zhang et al., 2020) and masking (Xiang et al., 2020), this also ensures that a patch cannot flip the global classification by perturbing a local score so strongly that it dominates the globally aggregated score. However, since H is constant nearly everywhere, it does not provide useful gradient information and thereby precludes end-to-end training. We address this by applying a "straight-through" type trick (Bengio et al., 2013) where we replace H in the backward pass by its smooth approximation, the logistic sigmoid function s(x) = 1/(1 + e^{−x}). That is, we use H(x) in the forward pass but replace the true gradient of H with H′(x) := s′(x) = s(x)(1 − s(x)). We explore alternatives to the Heaviside step function in Section A.3 in the appendix.

While the proposed model computes f_θ(X) in a single forward pass and controls |R(l)| indirectly via the architecture of f, we note that alternative models are compatible with BagCert. For instance, one could compute every element of the output s_{i,j} via a separate forward pass of an arbitrary model on an ablated (Levine & Feizi, 2020) or cropped version of the input, similar to Mask-DS-ResNet (Xiang et al., 2020). This also ensures that a specific element of the output depends only on the cropped/non-ablated part of the input. While these works are more flexible in terms of model architecture, they require a number of forward passes proportional to the resolution of the output s, which would make inference (and end-to-end training) computationally much more expensive.

3.3 End-to-End Training
Having derived conditions that can be used for certifying robustness against patch attacks in Section 3.1, as well as a differentiable model for the region scorer f in Section 3.2, we now define a loss function for end-to-end training. We restrict ourselves to the case of a spatial sum aggregation g_Σ. We recall Condition 3.3: min_{c ≠ c_t} Σ_{i=1,j=1}^{w_out,h_out} Δ_{i,j,c} > 2 R_max(L). The corresponding loss can be defined as L_H(Δ, c_t, R_max) = H(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c} ≤ 2 R_max), that is: the loss is 1 if there is a target class c such that Σ_{i,j} Δ_{i,j,c} becomes smaller than or equal to two times the size of the maximum affected patch score region. However, this requires choosing L and the resulting R_max(L) before training, which is undesirable. Instead, we stay agnostic with respect to the specific L and simply assume a uniform distribution for R_max(L), that is, R_max(L) ∼ U(0, R). Here, R corresponds to the maximum patch size (in region score space) we consider. This results in the loss

L_R(Δ, c_t) = ∫_0^R p(R̃) L_H(Δ, c_t, R̃) dR̃
            = (1/R) ∫_0^R H(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c} ≤ 2 R̃) dR̃
            = 1 − (1/R) ∫_0^R H(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c} > 2 R̃) dR̃
            = 1 − (1/R) min((1/2) min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c}, R)
            = 1 − (1/(2R)) min(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c}, 2R).

In practice, we minimize L̃_R(Δ, c_t) = −min(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c} / (w_out · h_out), M) with M = 2R / (w_out · h_out). This loss can be interpreted as a margin loss with margin M, where the margin corresponds to twice the maximum patch size in region score space against which we want to become certifiably robust.
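A minimal sketch of the training pieces from Sections 3.2 and 3.3 — the Heaviside forward pass with the sigmoid-gradient straight-through backward pass, and the margin loss L̃_R — written in plain NumPy under our own naming. An actual implementation would wire this into an autograd framework (for example, as a custom backward function); this is an illustration of the math, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def heaviside_forward(x):
    """Forward pass: H(x) = 1 for x >= 0, else 0 (binary region scores)."""
    return (x >= 0).astype(float)

def heaviside_backward(x, grad_out):
    """Straight-through backward pass (Bengio et al., 2013): the true
    gradient of H is zero almost everywhere, so the sigmoid derivative
    s(x) * (1 - s(x)) is substituted as a surrogate."""
    s = sigmoid(x)
    return grad_out * s * (1.0 - s)

def margin_loss(s, c_t, M):
    """Margin loss ~L_R: the negated, clipped, normalized worst-class margin.

    s   -- binary region scores (H, W, C) from heaviside_forward
    c_t -- true class index
    M   -- margin, M = 2R / (H * W) for maximum patch footprint R
    """
    H, W, C = s.shape
    delta = s[..., [c_t]] - s                 # delta_{i,j,c} per class
    sums = delta.sum(axis=(0, 1)) / (H * W)   # normalized per-class sums
    worst = np.min(np.delete(sums, c_t))      # min over c != c_t
    return -min(worst, M)
```

Once the worst-class margin exceeds M, the loss plateaus at −M, so training pushes the margin only up to twice the largest patch footprint we want to certify against.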
While we do not strictly enforce Σ_c s_{i,j,c} = 1, we sometimes found it beneficial to add a term to the loss that encourages S = g_Σ(s) to be approximately "one-hot", that is, L_oh(S) = max_{c ≠ c_max} S_c − S_{c_max} with c_max = arg max_c S_c. Since S_c ∈ [0, 1], it holds that L_oh(S) ∈ [−1, 1] and L_oh(S) = −1 iff S_{c_max} = 1 and S_c = 0 ∀ c ≠ c_max. We note that other choices than the uniform distribution would be an interesting direction for future work, in particular if the defender has prior knowledge about more likely patch sizes and shapes. The term L_oh(S)
prevents training from prematurely converging to a solution where s_{i,j,c} is approximately constant for all i, j, c, which we observed otherwise for tasks with many classes (e.g., ImageNet). The total loss becomes L_total = L̃_R(Δ, c_t) + σ L_oh(S), where σ controls the strength of this one-hot penalty.

Figure 2: Clean versus certified accuracy on CIFAR10 and ImageNet for BagCert with different receptive fields and train margins (M ∈ {0.25, 0.5, 0.75, 1.0} for CIFAR10, a single margin for ImageNet) when certifying via Condition 3.3 (circles) and Condition 3.2 (stars); the same setting is connected by a thin line. Smaller M generally corresponds to larger clean accuracy for CIFAR10. Baselines are Derandomized Smoothing (DS) (Levine & Feizi, 2020), Masked BagNet (Mask-BN) and Masked DS-ResNet (Mask-DS) (Xiang et al., 2020), and Clipped BagNet (CBN) (Zhang et al., 2020). Results for these baselines are taken from the respective papers.

4 Experiments
We perform an empirical evaluation of BagCert on CIFAR10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015). We report clean and certified accuracy and compare to Interval Bound Propagation (IBP) (Chiang et al., 2020), Derandomized Smoothing (DS) (Levine & Feizi, 2020), Clipped BagNet (CBN) (Zhang et al., 2020), and PatchGuard (Xiang et al., 2020). For DS we focus on block smoothing, and for PatchGuard we focus on the masked BagNet (Mask-BN), because column smoothing for DS (and the derived Mask-DS for PatchGuard) perform poorly for non-square patches that are "short-but-wide" (see Figure 4). We note that column smoothing and Mask-DS perform better than block smoothing and Mask-BN against square patches; however, there is no reason an attacker should prefer square over non-square rectangular patches. Details on the BagCert model architecture and training can be found in Appendix A.1. Moreover, we focus on certified accuracy, a lower bound on the actual robustness of a model. Results for accuracy against a strong adversarial patch attack, corresponding to an upper bound on actual robustness, are discussed in Section A.2 in the appendix.

Figure 2 shows results for different methods against 5×5 patches for CIFAR10 (about 2.4% of the image size) and patches of 2% of the image size for ImageNet. For CIFAR10, when certifying accuracy via Condition 3.3, the Pareto frontier of BagCert closely follows the one reported for DS with block smoothing and θ = 0.3. This is somewhat surprising given that both model and training procedure are very different and only the condition for certifying robustness is identical. We hypothesize that both approaches have reached close to optimal Pareto frontiers when certifying robustness via Condition 3.3. However, as Table 1 shows, BagCert requires (depending on its receptive field size) only tens of seconds for certifying all 10,000 test examples on a single Tesla V100 SXM2 GPU, while DS with block smoothing is considerably slower.
BagCert also clearly dominates Mask-BN and CBN, which utilize a similar model architecture, as well as IBP (not shown), which reaches substantially lower clean and certified accuracy. Moreover, when applying Condition 3.2 for certification, certified accuracy is increased by approximately 3 percentage points without changes in clean accuracy or any noticeable increase in certification time.

Figure 3: Certified accuracy against square patches of different sizes on CIFAR10. Shown is the performance for different receptive fields of BagCert (left) and train margins (right). Lines correspond to the same model (without retraining), evaluated against patches of different size.

In summary, the strongest BagCert model can certify all 10,000 test examples in 43 seconds, reaching a clean accuracy of 86% and a certified accuracy of 60%. On ImageNet, BagCert also dominates all baselines in terms of certified accuracy, both when certifying via Condition 3.3 and via Condition 3.2. Running certification for the entire validation set of 50,000 images takes a matter of minutes.

Figure 3 shows accuracy of BagCert certified via Condition 3.2 for square patches of different sizes on CIFAR10. Again, the baselines are dominated across patch sizes. Moreover, a single configuration of BagCert with receptive field size 7 and margin M = 0.5 performs close to optimal for all patch sizes and can certify non-trivial performance for patches considerably larger than 5×5. This implies that a single model can be used for a broad range of threat models. Figure 4 shows a similar analysis for non-square patches of a total size of 24 pixels. While BagCert with the same configuration as above achieves non-trivial certified accuracy for any patch aspect ratio, the performance of DS with column smoothing varies greatly with aspect ratio. In particular, "short-but-wide" patches reduce the certified accuracy of column smoothing close to 0%. Since there is no reason to assume attackers will restrict themselves to square patches, we do not consider DS with column smoothing or Mask-DS (Xiang et al., 2020) general patch defenses, despite good performance for square patches and efficient certification according to Table 1.

5 Conclusion and Outlook
We have introduced BagCert, a novel framework that combines efficient certification with end-to-end training for certified robustness. The main contributions are a model architecture based on a CNN with small receptive fields, certification conditions that are applicable to a broad range of models, and a margin-loss-based objective that is derived from the certification condition. The resulting model achieves high certified robustness against patches with a broad range of sizes, aspect ratios, and locations on CIFAR10 and ImageNet. Promising directions for future work are the exploration of other choices for the spatial aggregation function g (such as ones using the "detect-and-mask" mechanism from PatchGuard (Xiang et al., 2020)) and corresponding certification conditions and losses that can be used for end-to-end training. Moreover, the development of alternative choices for models with small receptive fields could be promising, such as ones based on learnable receptive fields or on self-attention. Finally, applying BagCert to other modalities than images would be an exciting avenue.

Figure 4: Certified accuracy against non-square patches of total size 24 pixels on CIFAR10. Shown is the performance for different receptive fields of BagCert (left) and train margins (right) compared to Derandomized Smoothing with column smoothing (Levine & Feizi, 2020). Lines correspond to the same model (without retraining), evaluated against patches of different aspect ratios.

References
Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018a. URL http://proceedings.mlr.press/v80/athalye18a.html.

Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In International Conference on Machine Learning (ICML), 2018b. URL http://arxiv.org/abs/1707.07397.

Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv e-prints, art. arXiv:1308.3432, August 2013.

Wieland Brendel and Matthias Bethge. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=SkfMWhAqYQ.

Tom Brown, Dandelion Mane, Aurko Roy, Martin Abadi, and Justin Gilmer. Adversarial patch. In Conference on Neural Information Processing Systems (NIPS), 2017. URL https://arxiv.org/pdf/1712.09665.pdf. arXiv: 1712.09665.

Ping-yeh Chiang, Renkun Ni, Ahmed Abdelkader, Chen Zhu, Chris Studor, and Tom Goldstein. Certified defenses for adversarial patches. In International Conference on Learning Representations, 2020.

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Proceedings of Machine Learning Research, volume 97, pp. 1310–1320, Long Beach, California, USA, 09–15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/cohen19c.html.

Francesco Croce, Maksym Andriushchenko, Naman D. Singh, Nicolas Flammarion, and Matthias Hein. Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks. CoRR, 2020. URL https://arxiv.org/abs/2006.12834.

Franklin C. Crow. Summed-area tables for texture mapping. In Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '84, pp. 207–212, New York, NY, USA, 1984. Association for Computing Machinery. ISBN 0897911385. doi: 10.1145/800031.808600. URL https://doi.org/10.1145/800031.808600.

Xavier Gastaldi. Shake-Shake regularization. arXiv e-prints, art. arXiv:1705.07485, May 2017.

Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin T. Vechev. AI2: safety and robustness certification of neural networks with abstract interpretation. In IEEE Symposium on Security and Privacy (S&P), 2018. URL https://doi.org/10.1109/SP.2018.00058.

Sven Gowal, Krishnamurthy (Dj) Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In The IEEE International Conference on Computer Vision (ICCV), October 2019.

Jamie Hayes. On visible adversarial perturbations & digital watermarking. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR), 2018. URL http://openaccess.thecvf.com/content_cvpr_2018_workshops/w32/html/Hayes_On_Visible_Adversarial_CVPR_2018_paper.html.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition (CVPR), 2016. URL https://arxiv.org/abs/1512.03385.

Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In Computer Aided Verification - 29th International Conference (CAV), 2017. URL https://doi.org/10.1007/978-3-319-63387-9_1.

Yi Huang, Adams Wai Kin Kong, and Kwok-Yan Lam. Adversarial signboard against object detector. In British Machine Vision Conference (BMVC), 2019.

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML'15, pp. 448–456. JMLR.org, 2015.

Alex Krizhevsky. Learning multiple layers of features from tiny images. University of Toronto, 05 2009.

M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana. Certified robustness to adversarial examples with differential privacy. In IEEE Symposium on Security and Privacy (S&P), pp. 656–672, 2019.

Mark Lee and J. Zico Kolter. On physical adversarial patches for object detection. International Conference on Machine Learning (Workshop), 2019. URL http://arxiv.org/abs/1906.11897.

Alexander Levine and Soheil Feizi. (De)Randomized Smoothing for Certifiable Defense against Patch Attacks. arXiv e-prints, art. arXiv:2002.10733, February 2020.

Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. In Advances in Neural Information Processing Systems 32, pp. 9464–9474. Curran Associates, Inc., 2019. URL http://papers.nips.cc/paper/9143-certified-adversarial-robustness-with-additive-noise.pdf.

Xin Liu, Huanrui Yang, Ziwei Liu, Linghao Song, Yiran Chen, and Hai Li. DPATCH: an adversarial patch attack on object detectors. In Workshop on Artificial Intelligence Safety 2019, co-located with the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019. URL http://ceur-ws.org/Vol-2301/paper_5.pdf.

Raphael Gontijo Lopes, Dong Yin, Ben Poole, Justin Gilmer, and Ekin D. Cubuk. Improving robustness without sacrificing accuracy with patch Gaussian augmentation. CoRR, 2019. URL http://arxiv.org/abs/1906.02611.

I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations (ICLR) 2017 Conference Track, April 2017.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://arxiv.org/abs/1706.06083.

Muzammal Naseer, Salman Khan, and Fatih Porikli. Local gradients smoothing: Defense against localized adversarial attacks. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2019. URL https://doi.org/10.1109/WACV.2019.00143.

Anurag Ranjan, Joel Janai, Andreas Geiger, and Michael J. Black. Attacking optical flow. In International Conference on Computer Vision (ICCV), 2019.

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi: 10.1007/s11263-015-0816-y.

Aniruddha Saha, Akshayvarun Subramanya, Koninika Patil, and Hamed Pirsiavash. Adversarial Patches Exploiting Contextual Reasoning in Object Detection. arXiv e-prints, art. arXiv:1910.00068, Sep 2019.

Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. International Journal of Computer Vision, October 2019. ISSN 1573-1405.

Simen Thys, Wiebe Van Ranst, and Toon Goedemé. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR), 2019.

Vincent Tjeng and Russ Tedrake. Verifying neural networks with mixed integer programming. CoRR, 2017. URL http://arxiv.org/abs/1711.07356.

Jonathan Uesato, Brendan O'Donoghue, Pushmeet Kohli, and Aäron van den Oord. Adversarial risk and the dangers of evaluating against weak attacks. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018. URL http://proceedings.mlr.press/v80/uesato18a.html.

Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Jennifer G. Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning (ICML), 2018. URL http://proceedings.mlr.press/v80/wong18a.html.

Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 (NeurIPS), 2018. URL http://papers.nips.cc/paper/8060-scaling-provable-adversarial-defenses.

Tong Wu, Liang Tong, and Yevgeniy Vorobeychik. Defending against physically realizable attacks on image classification. In International Conference on Learning Representations (ICLR), 2020. URL https://arxiv.org/abs/1909.09552.

Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, and Prateek Mittal. PatchGuard: Provable Defense against Adversarial Patches Using Masks on Small Receptive Fields. arXiv e-prints, art. arXiv:2005.10884, May 2020.

Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. CutMix: Regularization strategy to train strong classifiers with localizable features. In The IEEE International Conference on Computer Vision (ICCV), October 2019.

Zhanyuan Zhang, Benson Yuan, Michael McCoyd, and David Wagner. Clipped BagNet: Defending Against Sticker Attacks with Clipped Bag-of-features, 2020.
A APPENDIX

A.1 EXPERIMENTAL DETAILS
A.1.1 CIFAR10

We use the following class of models for CIFAR10: a ResNet (He et al., 2016) base architecture, consisting of a single 3x3 convolution stem followed by 8 residual blocks. We use stride one in all operations (that is, the output resolution is 32 × 32) and a constant width throughout the network. The last layer consists of a 1 × 1 convolution with 10 outputs. All layers use batch normalization (Ioffe & Szegedy, 2015) and ReLU. Each block is assigned a kernel size of either 1 or 3, depending on the desired receptive field of the network. The following table summarizes the kernel sizes of the different blocks used in the experiments:

RF of BAGCERT | stem | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8
      5       |   3  |  3 |  1 |  1 |  1 |  1 |  1 |  1 |  1
      7       |   3  |  3 |  1 |  3 |  1 |  1 |  1 |  1 |  1
      9       |   3  |  3 |  1 |  3 |  1 |  3 |  1 |  1 |  1
     11       |   3  |  3 |  1 |  3 |  1 |  3 |  1 |  3 |  1
     13       |   3  |  3 |  3 |  3 |  1 |  3 |  1 |  3 |  1

Residual blocks use shake-shake regularization (Gastaldi, 2017) in the batch-wise mode. For residual blocks with kernel size 3, a special form of shake-shake regularization is used: the first residual path applies a 1 × 3 convolution followed by a 3 × 1 convolution, while the second residual path applies a 3 × 1 convolution followed by a 1 × 3 convolution. This increases the diversity of paths without changing the total receptive field of the network. Beyond this, no additional regularization is applied (in particular, no weight decay), and the one-hot penalty is set to a small value σ. For training, we use the Adam optimizer with a cosine decay learning rate schedule (Loshchilov & Hutter, 2017) including a warmup phase. Moreover, we apply random horizontal flips and random crops with padding 4 for data augmentation.

A.1.2 IMAGENET

We work on downscaled inputs, which are extracted by rescaling the shorter side of the image and taking a random crop (training phase) or center crop (test phase). Note that this input resolution differs from the 224 × 224 resolution used by Derandomized Smoothing (Levine & Feizi, 2020).
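The receptive fields in the tables of this appendix follow from standard receptive-field arithmetic: each convolution with kernel size k widens the receptive field by (k − 1) times the product of all earlier strides, and each kernel-size-3 residual block counts like a single 3×3 convolution (the 1×3/3×1 pair has the same total receptive field). The helper below is our own illustrative sketch, not code from the paper; the block configurations are read off the kernel-size tables.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of a chain of convolutions (stride-aware)."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the RF in input pixels
        jump *= s              # stride increases the step between outputs
    return rf

# CIFAR10 variants (stem, b1..b8): all strides are 1.
cifar_configs = {
    5:  [3, 3, 1, 1, 1, 1, 1, 1, 1],
    7:  [3, 3, 1, 3, 1, 1, 1, 1, 1],
    9:  [3, 3, 1, 3, 1, 3, 1, 1, 1],
    11: [3, 3, 1, 3, 1, 3, 1, 3, 1],
    13: [3, 3, 3, 3, 1, 3, 1, 3, 1],
}
for rf, kernels in cifar_configs.items():
    assert receptive_field(kernels) == rf

# ImageNet variants use stride 2 in blocks 1 and 3 (see A.1.2).
imagenet_strides = [1, 2, 1, 2, 1, 1, 1, 1, 1]
assert receptive_field([3, 3, 1, 3, 1, 3, 1, 1, 1], imagenet_strides) == 17
```

The same function reproduces the ImageNet receptive fields of 25 and 29 for the two larger configurations.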
In order to achieve comparable results, we evaluate against patches whose size relative to our input resolution approximately matches the relative patch size used in the DS evaluation.

We use the following class of models for ImageNet: a ResNet base architecture, consisting of a single 3x3 convolution stem followed by 8 residual blocks. We use stride 2 in blocks 1 and 3 and stride 1 otherwise (that is, the output resolution is a quarter of the input resolution in each dimension). We use one width in the stem and the first two blocks, a second width in blocks 3 and 4, a third in blocks 5 and 6, and a fourth in blocks 7 and 8. The last layer consists of a 1 × 1 convolution with 1000 outputs. All layers use batch normalization and ReLU. Each block is assigned a kernel size of either 1 or 3, depending on the desired receptive field of the network. The following table summarizes the kernel sizes of the different blocks used in the experiments:

RF of BAGCERT | stem | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8
     17       |   3  |  3 |  1 |  3 |  1 |  3 |  1 |  1 |  1
     25       |   3  |  3 |  1 |  3 |  1 |  3 |  1 |  3 |  1
     29       |   3  |  3 |  3 |  3 |  1 |  3 |  1 |  3 |  1

We apply neither shake-shake regularization nor weight decay; however, we set the one-hot penalty to σ = 1.0. For training, we use the Adam optimizer with a cosine decay learning rate schedule including a warmup phase. Moreover, we apply random horizontal flips for data augmentation.

Figure 5: Illustration of BAGCERT certification for a 1D input and two classes. We assume for this example that L consists of only a single element l; that is, the attacker can only place a patch at this one location, shown by the checkerboard pattern in the input. The resulting R(l) consists of the three top elements in the region score space s (again shown by a checkerboard pattern). Accordingly, R_max(L) = 3. (Top) Certification via Condition 3.3: the regular network output +5 is compared to 2 · R_max(L) = +6. Since +5 ≤ +6, the robustness of the prediction cannot be certified. (Bottom) Certification via Condition 3.2: the region scores s are replaced by s_wc based on R(l).
The resulting network output is +1, which is greater than 0. Thus, the robustness of the prediction can be certified.
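The numbers in the Figure 5 example can be reproduced with a small toy calculation. The sketch below is our own illustration, not the paper's code: it assumes ±1 region scores with a spatial sum aggregation, so a patch covering R_max(L) positions can change the output by at most 2 per position.

```python
import numpy as np

# Per-region scores s for a 1-D input: nine positions, margin +1 or -1 each.
scores = np.array([+1, +1, -1, +1, +1, +1, +1, +1, -1])
R_l = [0, 1, 2]        # positions influenced by the single patch location l
R_max = len(R_l)       # R_max(L) = 3

output = scores.sum()  # clean network output: +5

# Condition 3.3 (coarse): certify if output > 2 * R_max(L), since flipping
# one +1 score to -1 changes the sum by at most 2.
cond_33 = output > 2 * R_max   # 5 > 6 is False -> cannot certify

# Condition 3.2 (tighter): replace the scores inside R(l) by their
# worst case (-1) and check whether the sum stays positive.
worst = scores.copy()
worst[R_l] = -1
cond_32 = worst.sum() > 0      # +1 > 0 is True -> certified
```

This reproduces the figure: the coarse condition fails (+5 ≤ +6) while the worst-case substitution still certifies the prediction (+1 > 0).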
Figure 6: Illustration of an adversarial patch attack and its effect on region scores. The top row corresponds to the clean image (left), the resulting score maps for the true class (middle), and the score maps for the chosen target class (right). The bottom row shows the same for the image with an adversarial patch inserted at the chosen region l. The red rectangle corresponds to R(l).

A.2 ROBUSTNESS AGAINST HEURISTIC PATCH ATTACK
While the certification conditions proposed in Section 3.1 allow computing a lower bound on a model's robustness against a specific type of patch attack, the model's true robustness can lie anywhere between this lower bound and the clean accuracy. In order to determine an upper bound on robustness that is tighter than the clean accuracy, we perform a heuristic adversarial patch attack and evaluate the model's accuracy on the inputs modified by the attacker. Our threat model from Section 3 allows an attacker to place an arbitrary patch p ∈ [0, 1]^(n × n × c_in) at an arbitrary region l ∈ L. We employ the following approach: we first select a region l* ∈ L and a target class c*, keep both fixed, and then optimize the patch p accordingly. Note that there is no guarantee that the best region or the best patch is found; the resulting adversarial accuracy is therefore only an upper bound.

Specifically, we focus in this evaluation on square patches on CIFAR-10, so that L consists of all possible patch-sized subregions of a 32 × 32 input. Ideally, one would perform independent attacks at all possible regions l ∈ L; however, this quickly becomes computationally intractable. Instead, we exploit specific design choices of BAGCERT to select one region and target class that may be particularly problematic for the model on a given input, assuming a spatial sum aggregation. For this, we make direct use of Condition 3.2 and choose l*, c* = argmin_{l,c} Σ_{(i,j) ∉ R(l)} Δ_{i,j,c}. This choice corresponds to assuming a maximally effective patch attack that achieves Δ_{i,j,c*} = −1 ∀ (i,j) ∈ R(l). A practical patch attack may not achieve this ideal outcome (see also Figure 6), and thus l* and c* are not necessarily optimal.
However, they are reasonable choices that can be determined efficiently. Once l* and c* are fixed, we perform a PGD attack (Madry et al., 2018) that maximizes the loss L̃_R from Section 3.3 with margin M = 1. An illustration of such an attack is shown in Figure 6.

Figure 7 shows scatter plots of clean versus adversarial accuracy (left) and certified versus adversarial accuracy (right) for the BAGCERT models also shown in Figure 2. Interestingly, while clean and adversarial accuracy are highly correlated, the same does not hold for certified and adversarial accuracy. In particular, adversarial accuracy seems to favor slightly larger receptive fields than certified accuracy. A potential reason for this can be seen in Figure 6: while a patch attack is typically effective at flipping the scores of true and target class in the inner part of R(l), it appears much harder to also flip scores close to the boundary of R(l). For larger receptive fields of the model, this boundary effect seems to be amplified, since the patch is smaller relative to the receptive field size.
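The two-step attack above (region/target selection, then projected gradient steps on the patch pixels) can be sketched as follows. This is our own toy illustration, not the paper's code: the margin map `delta`, the linear surrogate loss, and all sizes are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 3, 3                  # score-map size, classes, region size
delta = rng.normal(size=(H, W, C))       # hypothetical margins Delta[i,j,c]

# Step 1: l*, c* = argmin_{l,c} sum over (i,j) not in R(l) of Delta[i,j,c],
# by brute force over all patch positions.
total = delta.sum(axis=(0, 1))           # per-class sum over all positions
best = None
for i in range(H - r + 1):
    for j in range(W - r + 1):
        outside = total - delta[i:i + r, j:j + r].sum(axis=(0, 1))
        c = int(outside.argmin())
        if best is None or outside[c] < best[0]:
            best = (outside[c], (i, j), c)
_, (pi, pj), c_star = best

# Step 2: signed-gradient (PGD-style) steps on the patch pixels only,
# projected back into the valid pixel range [0, 1].
x = rng.uniform(size=(H, W))             # toy single-channel input
w = rng.normal(size=(H, W))              # gradient of a toy linear loss <w, x>
mask = np.zeros((H, W), dtype=bool)
mask[pi:pi + r, pj:pj + r] = True        # region l*: pixels the attacker owns

x_adv = x.copy()
for _ in range(20):                      # PGD iterations, step size 0.1
    x_adv[mask] = np.clip(x_adv[mask] + 0.1 * np.sign(w[mask]), 0.0, 1.0)

assert np.all(x_adv[~mask] == x[~mask])  # pixels outside the patch untouched
```

In the real attack the gradient would come from backpropagating L̃_R through the network rather than from a fixed linear surrogate; the selection step could also be accelerated with summed-area tables (Crow, 1984) instead of the brute-force loop.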
Figure 7: Scatter plots of clean versus adversarial accuracy (left) and certified versus adversarial accuracy (right). Color encodes different receptive field sizes.

Figure 8: Comparison of the Heaviside step function with alternative choices such as the element-wise sigmoid or the channel-wise softmax for five independent runs with different random seeds.

We consider reducing this gap between certified and adversarial accuracy further (that is, making the bounds tighter) important future work. This will require both developing more effective attacks and improving certification procedures.

A.3 EXPLORING ALTERNATIVES TO THE HEAVISIDE STEP FUNCTION
In this section, we explore alternatives to the Heaviside step function as the last layer in the region scorer f_θ (see Section 3.2). For this, we relax the region scores s to arbitrary values between 0 and 1, that is, s = f_θ(x) ∈ [0, 1]^(w_out × h_out × c_out). More specifically, we explore the element-wise sigmoid function si(x_{i,j,c}) = 1 / (1 + e^{−x_{i,j,c}}) and the channel-wise softmax function sm(x_{i,j,c}) = e^{x_{i,j,c}} / Σ_{c'} e^{x_{i,j,c'}}. Note that for the channel-wise softmax, Σ_c s_{i,j,c} = 1, while this is not enforced by the element-wise sigmoid or the Heaviside step function.

Figure 8 shows clean and certified accuracy (via Condition 3.2) on validation data during model training for five independent runs with the same configuration but different random seeds. The Heaviside step function ensures stable convergence to high clean and certified accuracy in all five runs (note that performance is slightly lower than in Figure 2 because training was stopped after 75 epochs). The channel-wise softmax also shows consistent convergence, albeit to a lower level of performance. We attribute this to the hard constraint Σ_c s_{i,j,c} = 1.
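The three score functions compared above can be written down directly. The sketch below is our own illustration (shapes and logit values are hypothetical); it highlights the property discussed in the text, namely that only the channel-wise softmax enforces Σ_c s_{i,j,c} = 1.

```python
import numpy as np

def heaviside(x):
    return (x > 0).astype(float)                   # hard 0/1 region scores

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                # soft scores in (0, 1)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # stabilized over classes
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[[2.0, -1.0, 0.5]]])            # one position, three classes
# Only the channel-wise softmax sums to 1 over the class channel.
assert np.allclose(softmax(logits).sum(axis=-1), 1.0)
assert heaviside(logits).sum() == 2.0              # two positive logits
```

Heaviside and sigmoid scores may sum to anything between 0 and the number of classes per position, which is precisely the additional freedom the softmax variant gives up.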