Efficient Certified Defenses Against Patch Attacks on Image Classifiers
Published as a conference paper at ICLR 2021
Jan Hendrik Metzen & Maksym Yatsura
Bosch Center for Artificial Intelligence, Robert Bosch GmbH
Robert-Bosch-Campus 1, 71272 Renningen, Germany
{janhendrik.metzen,maksym.yatsura}@de.bosch.com

Abstract
Adversarial patches pose a realistic threat model for physical-world attacks on autonomous systems via their perception component. Autonomous systems in safety-critical domains such as automated driving should thus contain a fail-safe fallback component that combines certifiable robustness against patches with efficient inference while maintaining high performance on clean inputs. We propose BagCert, a novel combination of model architecture and certification procedure that allows efficient certification. We derive a loss that enables end-to-end optimization of certified robustness against patches of different sizes and locations. On CIFAR10, BagCert certifies 10,000 examples in 43 seconds on a single GPU and obtains 86% clean and 60% certified accuracy against 5×5 patches.

1 Introduction
Adversarial patches (Brown et al., 2017) are one of the most relevant threat models for attacks on autonomous systems such as highly automated cars or robots. In this threat model, an attacker can freely control a small subregion of the input (the "patch") but needs to leave the rest of the input unchanged. This threat model is relevant because it corresponds to a physically realizable attack (Lee & Kolter, 2019): an attacker can print the adversarial patch pattern, place it in the physical world, and it will become part of the input of any system whose field of view overlaps with the physical patch. Moreover, once an attacker has generated a successful patch pattern, this pattern can be easily shared, will be effective against all systems using the same perception component, and an attack can be conducted without requiring access to the individual system. This makes, for instance, attacking an entire fleet of cars of the same vendor feasible.

While several empirical defenses have been proposed (Hayes, 2018; Naseer et al., 2019; Selvaraju et al., 2019; Wu et al., 2020), these only offer robustness against known attacks but not necessarily against more effective attacks that may be developed in the future (Chiang et al., 2020). In contrast, certified defenses for the patch threat model (Chiang et al., 2020; Levine & Feizi, 2020; Zhang et al., 2020; Xiang et al., 2020) guarantee robustness against all possible attacks within the given threat model. Ideally, a certified defense should combine high certified robustness with efficient inference while maintaining strong performance on clean inputs. Moreover, the training objective should be based on the certification problem to avoid post-hoc calibration of the model for certification. Existing defenses do not satisfy all of these conditions: Chiang et al. (2020) proposed an approach that extends interval-bound propagation (Gowal et al., 2019) to the patch threat model.
In this approach, there is a clear connection between training objective and certification problem. However, certified accuracy is relatively low and clean performance is severely affected on CIFAR10. Moreover, inference requires separate forward passes for all possible patch positions and is thus computationally very expensive. Derandomized smoothing (Levine & Feizi, 2020) achieves much higher certified and clean performance on CIFAR10 and even scales to ImageNet. However, inference is computationally expensive since it is based on separately propagating many differently ablated versions of a single input. Moreover, training and certification are disconnected, and a separate tuning of parameters of the post-hoc certification procedure on some hold-out data is required, a drawback shared also by Clipped BagNet (Zhang et al., 2020) and PatchGuard (Xiang et al., 2020).

In this work, we propose BagCert, which combines high certified accuracy (60% on CIFAR10 for 5×5 patches) and clean performance (86% on CIFAR10), efficient inference (43 seconds on a single GPU for the 10,000 CIFAR10 test samples), and end-to-end training for robustness against patches of varying size, aspect ratio, and location. BagCert is based on the following contributions:

• We propose three different conditions that can be checked for certifying robustness. One of these corresponds to the condition proposed by Levine & Feizi (2020). However, we show that an alternative condition improves certified accuracy of the same model, typically by roughly 3 percentage points, while remaining broadly applicable.

• We derive a loss function that directly optimizes for certified accuracy against a uniform distribution of patch sizes at arbitrary positions.
This loss corresponds to a specific instance of the well-known class of margin losses.

• Similarly to Levine & Feizi (2020), we classify images via a majority vote over a large number of predictions that are based on small local regions of a single input. However, the proposed model achieves this via a single forward pass on the unmodified input, by utilizing a neural network architecture with very small receptive fields, similar to BagNets (Brendel & Bethge, 2019). This enables efficient inference with surprisingly high clean accuracy and was concurrently proposed by Zhang et al. (2020) and Xiang et al. (2020).

2 Related Work
Adversarial Patch Attacks
The vulnerability of image classifiers to adversarial patch attacks was first demonstrated by Brown et al. (2017). They show that a specifically crafted physical adversarial patch is able to fool multiple ImageNet models into predicting the wrong class with high confidence. Numerous patch attacks were proposed for object detection (Liu et al., 2019; Lee & Kolter, 2019; Thys et al., 2019; Huang et al., 2019) and optical flow estimation (Ranjan et al., 2019). These attacks are versatile: they are effective in the black-box setting (Croce et al., 2020) and can suppress detected objects in a scene without overlapping any of them (Lee & Kolter, 2019). Following Athalye et al. (2018b), adversarial patches can be printed out and placed in the physical world to fool different models independently of scaling, rotation, brightness, and other visual transformations. These factors make adversarial patch attacks a non-negligible threat for safety-critical perception systems (Thys et al., 2019).
Heuristic Defenses Against Patch Attacks
Several heuristic defenses against adversarial patches, such as digital watermarking (Hayes, 2018) or local gradient smoothing (Naseer et al., 2019), have been proposed. However, similarly to the results obtained for norm-bounded adversarial attacks (Athalye et al., 2018a), it was demonstrated that these defenses can be easily broken by white-box attacks which account for the pre-processing steps in the optimization procedure (Chiang et al., 2020). The role of spatial context in object detection algorithms, which makes them vulnerable to patch attacks, was investigated by Saha et al. (2019), and an empirical defense based on Grad-CAM (Selvaraju et al., 2019) was proposed. Existing augmentation techniques based on adding a Gaussian noise patch (Lopes et al., 2019) or a patch from a different image (Yun et al., 2019) increase robustness against occlusions caused by adversarial patches. Wu et al. (2020) propose a defense that uses adversarial training to increase robustness against occlusion attacks.
Certified Defenses
Evaluating defense methods by their performance against empirical attacks can lead to a false sense of security, since stronger adversaries might be developed in the future that break the defenses (Athalye et al., 2018a; Uesato et al., 2018). Therefore, it is important to have guarantees of robustness. Numerous works have been proposed in the field of certified robustness, ranging from complete verifiers that find the worst-case adversarial examples exactly (Huang et al., 2017; Tjeng & Tedrake, 2017) to faster but less accurate incomplete methods that provide an upper bound on the robust error (Gehr et al., 2018; Wong & Kolter, 2018; Wong et al., 2018; Gowal et al., 2019). Another line of work is based on Randomized Smoothing (Lecuyer et al., 2019; Li et al., 2019; Cohen et al., 2019), which exhibits strong empirical results and scales to ImageNet, however at the cost of increasing inference time by orders of magnitude. Certified defenses crafted for patch attacks were first proposed by Chiang et al. (2020). They adapt the IBP method (Gowal et al., 2019) to the patch threat model. Although their approach makes it possible to obtain robustness guarantees, it only scales to small patches and causes a significant drop in clean accuracy.

Figure 1: Illustration of BagCert training for a 1D input and two classes. An input X is processed by region scorer f_θ, consisting of a 3-layer CNN with kernel sizes 3, 1, and 3. The resulting continuous region scores are passed through a Heaviside step function (replaced by a sigmoid in the backward pass) to obtain binary region scores s for every class. The differences Δ between true and non-true class scores are then processed by spatial aggregation g, in this case simply summing them via g = g_Σ. The resulting value is maximized by passing it into the margin loss L.

Levine & Feizi (2020) proposed (de)randomized smoothing: they train a base classifier for classifying images where all but a small local region is ablated. At inference time, many (or even all) possible ablations are classified and a majority vote determines the final classification. If this majority vote holds with sufficient margin, the decision is provably robust against patch attacks because a patch will be completely ablated in most of the inputs and can thus only influence a minority of the votes. This method provides a significant accuracy improvement compared to Chiang et al. (2020) and allows training and certifying ImageNet models. However, its inference in block-smoothing mode is computationally expensive. A last line of work is based on using models with small receptive fields such as BagNets (Brendel & Bethge, 2019): Zhang et al. (2020) apply a clipping function while Xiang et al. (2020) apply a "detect-and-mask" filter to the logits of pretrained BagNets before global averaging. Using small receptive fields limits the number of region scores affected by a local patch, while clipping and masking ensure that a few very large region scores cannot dominate the global average. We note that these approaches do not train models directly for certified robustness but rather achieve it by applying post-hoc procedures that come with additional hyperparameters requiring careful tuning.

3 Method
We introduce BagCert, a framework which consists of novel conditions for certifying robustness, a specific model architecture, and a new end-to-end training procedure. BagCert allows end-to-end training of classifiers whose robustness against adversarial patch attacks can be certified efficiently. We outline our approach for the task of image classification but note that it can be extended to other tasks with grid-structured inputs. We refer to Figure 1 for an illustration of the training phase and to Figure 5 in the supplementary material for an illustration of certification of BagCert.

Threat Model
We consider a threat model in which an attacker can conduct an image-dependent patch attack. Let x ∈ [0, 1]^{w_in × h_in × c_in} be an input image of resolution w_in × h_in with c_in channels. Let p be a patch and l be a region of an image x having the same size as patch p. We denote the set of feasible regions l as L. For example, for a patch p ∈ [0, 1]^{n × c_in} consisting of n pixels, L could be the set of all w_p × h_p rectangular regions l of an image x with w_p · h_p = n. We define an operator A such that A(x, p, l) is the result of placing a patch p onto an image x over a region l. We assume that the attacker has white-box knowledge of the model and conducts an input-dependent attack, that is, attack region l and inserted patch p can be chosen for every input independently.

3.1 Certification
We base our method for certification on assuming a certain structure of the classifier. More specifically, we decompose the classifier into two components:

• A region scorer f_θ that maps from inputs x to region scores s ∈ {0, 1}^{w_out × h_out × c_out}, where w_out × h_out is the output resolution, c_out is the number of classes, and θ are trainable parameters. Please note that we allow Σ_c s_{i,j,c} ≠ 1.

• A spatial aggregator g that maps from region scores s to (global) class scores S ∈ [0, 1]^{c_out}. In this work, we restrict g to be monotonically increasing, that is: for two region score maps s^(1) and s^(2) with s^(1)_{i,j,c} ≥ s^(2)_{i,j,c} ∀ i, j, we require g(s^(1))_c ≥ g(s^(2))_c ∀ c.

Generally, we base certification on upper bounding the effect of an actual attack in the threat model. For this, we only exploit architectural properties of f and g that are valid for any choice of model parameters θ. More specifically, we only exploit the output dependency map R of f, which we define as R(l) = {(i, j) | ∃ x, θ, p : f_θ(A(x, p, l))_{i,j} ≠ f_θ(x)_{i,j}}. Informally, R(l) is the set of all indices of the score map that can be affected by a patch applied at region l, for any choice of input x, patch p, and parameters θ. That is: the set of all outputs of f_θ whose receptive fields overlap with l. We discuss options for f and the resulting R in Section 3.2.

For input x with class label c_t and s = f_θ(x), we define the "worst-case" score map s^wc(s, l) as

s^wc_{i,j,c}(s, l) = s_{i,j,c}  if (i, j) ∉ R(l),
s^wc_{i,j,c}(s, l) = 1          if (i, j) ∈ R(l) ∧ c ≠ c_t,
s^wc_{i,j,c}(s, l) = 0          if (i, j) ∈ R(l) ∧ c = c_t.

Moreover, we define Δ_{i,j,c} = s_{i,j,c_t} − s_{i,j,c} and similarly Δ^wc_{i,j,c} = s^wc_{i,j,c_t} − s^wc_{i,j,c}.
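As an illustration, the worst-case score map above can be constructed with a few lines of array code. This is a hypothetical sketch under our own conventions (the function name, the (H, W, C) array layout, and the boolean-mask encoding of R(l) are ours, not the paper's):

```python
import numpy as np

def worst_case_scores(s, region_mask, c_t):
    """Build the worst-case score map s^wc from Section 3.1.

    s           -- binary region scores of shape (H, W, C), values in {0, 1}
    region_mask -- boolean (H, W) array marking R(l), the output positions
                   a patch placed at region l can influence
    c_t         -- index of the true class

    Inside R(l) the attacker is assumed maximally powerful: every non-true
    class score rises to 1 and the true-class score drops to 0; outside
    R(l) the scores are unchanged.
    """
    s_wc = s.astype(float).copy()
    s_wc[region_mask, :] = 1.0     # non-true classes -> 1 inside R(l)
    s_wc[region_mask, c_t] = 0.0   # true class       -> 0 inside R(l)
    return s_wc
```

Consistent with the identities in the text, the resulting Δ^wc = s_wc[..., c_t] − s_wc[..., c] equals Δ outside R(l) and −1 inside R(l) for every c ≠ c_t.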
It follows directly that Δ^wc_{i,j,c} = Δ_{i,j,c} ∀ (i, j) ∉ R(l) and Δ^wc_{i,j,c} = −1 ∀ (i, j) ∈ R(l), c ≠ c_t.

For certifying robustness in the threat model for input x with class label c_t, we need to show g(f_θ(A(x, p, l)))_{c_t} > g(f_θ(A(x, p, l)))_c ∀ c ≠ c_t, ∀ l ∈ L, ∀ p. For this, it suffices to check

Condition 3.1. g(s^wc(s, l))_{c_t} > g(s^wc(s, l))_c ∀ c ≠ c_t, ∀ l ∈ L

Proof.
Consider arbitrary l ∈ L and p, and let s^adv = f_θ(A(x, p, l)). With s^adv_{i,j,c} ∈ {0, 1} we obtain

s^adv_{i,j,c} = s^wc_{i,j,c}(s, l) = s_{i,j,c}  if (i, j) ∉ R(l),
s^adv_{i,j,c} ≤ s^wc_{i,j,c}(s, l) = 1          if (i, j) ∈ R(l) ∧ c ≠ c_t,
s^adv_{i,j,c} ≥ s^wc_{i,j,c}(s, l) = 0          if (i, j) ∈ R(l) ∧ c = c_t.

With g being monotonically increasing, we obtain g(s^adv)_{c_t} ≥ g(s^wc(l))_{c_t} and, for all c ≠ c_t, g(s^wc(l))_c ≥ g(s^adv)_c. Condition 3.1 thus implies g(s^adv)_{c_t} > g(s^adv)_c ∀ c ≠ c_t.

Checking Condition 3.1 requires one forward pass through f_θ to obtain s = f_θ(x) and |L| times the construction of s^wc(s, l) and the evaluation of g. We now consider a special case where this can be implemented very efficiently.

3.1.1 Spatial Sum Aggregation
For the case g = g_Σ(s) = Σ_{i=1,j=1}^{w_out,h_out} s_{i,j}, Condition 3.1 simplifies to

Condition 3.2. min_{c ≠ c_t} Σ_{(i,j) ∉ R(l)} Δ_{i,j,c} > |R(l)| ∀ l ∈ L

We would like to note that these are "trivial" lower and upper bounds for s^adv, and we see the potential to improve upon these bounds in future work, for instance by relaxing s ∈ [0, 1]^{w_out × h_out × c_out} and applying interval bound propagation (Gowal et al., 2019). However, the proposed simple bounds have the advantage of not requiring additional forward passes through the model and thus being computationally efficient.

Proof.
For all c ≠ c_t, we exploit ∀ (i, j) ∈ R(l): Δ^wc_{i,j,c} = −1. With Condition 3.2, we obtain

g_Σ(s^wc(l))_{c_t} − g_Σ(s^wc(l))_c
  = Σ_{i,j} s^wc_{i,j,c_t} − Σ_{i,j} s^wc_{i,j,c}
  = Σ_{i,j} Δ^wc_{i,j,c}
  = Σ_{(i,j) ∉ R(l)} Δ^wc_{i,j,c} + Σ_{(i,j) ∈ R(l)} Δ^wc_{i,j,c}
  = Σ_{(i,j) ∉ R(l)} Δ_{i,j,c} − |R(l)| > 0.

We note that Σ_{(i,j) ∉ R(l)} Δ_{i,j,c} = Σ_{i,j} Δ_{i,j,c} − Σ_{(i,j) ∈ R(l)} Δ_{i,j,c}. For the special case that all R(l) are rectangular, Σ_{(i,j) ∈ R(l)} Δ_{i,j,c} can be computed efficiently for all l ∈ L simultaneously via integral images/summed-area tables (Crow, 1984). For instance, R(l) is rectangular for l being rectangular input patches and the R resulting from a CNN with grid-aligned kernels.

For the case that the R(l) are not all rectangular and |L| becomes large, checking Condition 3.2 can become prohibitively expensive. For this case, we derive a condition that corresponds to an upper bound on Condition 3.2 and can be evaluated in constant time with respect to |L|:

Condition 3.3. min_{c ≠ c_t} Σ_{i=1,j=1}^{w_out,h_out} Δ_{i,j,c} > 2 R_max(L) with R_max(L) = max_{l ∈ L} |R(l)|

Proof. Δ_{i,j,c} ≤ 1 implies Σ_{(i,j) ∈ R(l)} Δ_{i,j,c} ≤ |R(l)| ≤ R_max(L). For all c ≠ c_t, using Condition 3.3: Σ_{(i,j) ∉ R(l)} Δ_{i,j,c} = Σ_{i,j} Δ_{i,j,c} − Σ_{(i,j) ∈ R(l)} Δ_{i,j,c} > 2 R_max(L) − R_max(L) ≥ |R(l)|.

We note that Condition 3.3 corresponds to the condition proposed by Levine & Feizi (2020). It is, however, a strictly weaker condition than Condition 3.2. Thus, Condition 3.2 is preferable if all R(l) are rectangular or |L| is of moderate size. We refer to Figure 5 in the supplementary material for an illustration of Conditions 3.2 and 3.3.

3.2 Model
Crucially, the quality of the certification depends on R_max(L) = max_{l ∈ L} |R(l)|: the larger this quantity becomes, the larger the left-hand side of Condition 3.2 or Condition 3.3 needs to be to fulfill the condition. We focus on the specific case where f_θ is realized by a convolutional neural network (CNN). In that case, |R(l)| is determined fully by l and the receptive field of the CNN. More specifically, we obtain R(l) = {(i, j) | ∃ (ĩ, j̃) ∈ l : |i − ĩ| ≤ ⌊w_rf/2⌋ ∧ |j − j̃| ≤ ⌊h_rf/2⌋} for a receptive field size of w_rf × h_rf, ignoring operation strides.

Receptive field sizes of CNNs are determined by the shapes of the convolutional kernels as well as operation strides. We propose using standard CNN architectures such as ResNets but replacing most 3×3 convolutions by 1×1 convolutions, using stride 1 in (nearly) all operations, and removing all dense layers. This results in a network with very small receptive field sizes and thus small R(l). We note that the proposed architecture is similar to BagNets (Brendel & Bethge, 2019), and using this type of model was concurrently proposed for certifying robustness against patch attacks by Zhang et al. (2020) and Xiang et al. (2020). BagNets obtain surprisingly high classification accuracy despite small receptive field sizes (Brendel & Bethge, 2019). Importantly, in contrast to BagNets, we do not apply global average pooling on the final feature layer. This results in a dense output of shape w_out × h_out × c_out. The ratios w_in/w_out and h_in/h_out depend on the strides applied in the network and mostly control the computational overhead. We note that the cost of forward/backward passes in BagNets is of the same order of magnitude as that of a corresponding residual network. Because of the small receptive fields of BagNets, |R(l)| is small if l is a small contiguous region of the input, such as a rectangular patch.
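The receptive-field formula for R(l) above and the integral-image check of Condition 3.2 from Section 3.1.1 can be sketched together as follows. This is an illustrative reimplementation under our own conventions (stride 1, patch coordinates given directly in the output grid, and one non-true class at a time), not the authors' code:

```python
import numpy as np

def region_mask(patch_rows, patch_cols, rf, out_shape):
    """Boolean mask of R(l) for a rectangular patch l and receptive field rf.

    Implements R(l) = {(i, j) | exists (i~, j~) in l :
        |i - i~| <= floor(w_rf / 2) and |j - j~| <= floor(h_rf / 2)},
    i.e. all output positions whose receptive field overlaps the patch.
    patch_rows / patch_cols are half-open (start, stop) ranges of the patch
    in the output grid; operation strides are ignored, as in the formula.
    """
    h, w = out_shape
    rh, rw = rf[0] // 2, rf[1] // 2
    mask = np.zeros(out_shape, dtype=bool)
    r0, r1 = max(0, patch_rows[0] - rh), min(h, patch_rows[1] + rh)
    c0, c1 = max(0, patch_cols[0] - rw), min(w, patch_cols[1] + rw)
    mask[r0:r1, c0:c1] = True
    return mask

def condition_3_2_holds(delta, rf, patch_shape):
    """Check Condition 3.2 for one non-true class and all patch placements.

    delta: (H, W) array of margins delta_{i,j,c} = s_{i,j,c_t} - s_{i,j,c}.
    For rectangular R(l), the per-placement sums over R(l) come from a
    single summed-area table (Crow, 1984); border clipping is ignored here,
    which makes the check slightly conservative near image edges.
    """
    H, W = delta.shape
    rh = min(patch_shape[0] + 2 * (rf[0] // 2), H)  # footprint of R(l)
    rw = min(patch_shape[1] + 2 * (rf[1] // 2), W)
    total = delta.sum()
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = delta.cumsum(axis=0).cumsum(axis=1)
    win = ii[rh:, rw:] - ii[:-rh, rw:] - ii[rh:, :-rw] + ii[:-rh, :-rw]
    # Condition 3.2: sum outside R(l) exceeds |R(l)| for every placement.
    return bool(np.all(total - win > rh * rw))
```

Full certification would take the minimum of this check over all non-true classes c; Condition 3.3 instead compares the single global sum against 2·R_max(L), which is constant-time in |L|.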
We apply a Heaviside step function H(x) = 0 for x < 0, 1 for x ≥ 0 as the final layer of f_θ, which ensures f_θ(X) ∈ {0, 1}^{w_out × h_out × c_out}. Similar to clipping (Zhang et al., 2020) and masking (Xiang et al., 2020), this also ensures that a patch cannot flip the global classification by perturbing a local score so strongly that it dominates the globally aggregated score. However, since H is constant nearly everywhere, it does not provide useful gradient information and thereby precludes end-to-end training. We address this by applying a "straight-through" type trick (Bengio et al., 2013) where we replace H in the backward pass by its smooth approximation, the logistic sigmoid function s(x) = 1/(1 + e^{−x}). That is, we use H(x) in the forward pass but replace the true gradient of H with H′(x) := s′(x) = s(x)(1 − s(x)). We explore alternatives to the Heaviside step function in Section A.3 in the appendix.

While the proposed model computes f_θ(X) in a single forward pass and controls |R(l)| indirectly via the architecture of f, we note that alternative models are compatible with BagCert. For instance, one could compute every element of the output s_{i,j} via a separate forward pass of an arbitrary model on an ablated (Levine & Feizi, 2020) or cropped version of the input, similar to Mask-DS-ResNet (Xiang et al., 2020). This also ensures that a specific element of the output depends only on the cropped/non-ablated part of the input. While these works are more flexible in terms of model architecture, they require a number of forward passes proportional to the resolution of the output s, which would make inference (and end-to-end training) computationally much more expensive.

3.3 End-to-End Training
Having derived conditions that can be used for certifying robustness against patch attacks in Section 3.1, as well as a differentiable model for the region scorer f in Section 3.2, we now define a loss function for end-to-end training. We restrict ourselves to the case of a spatial sum aggregation g_Σ. We recall Condition 3.3: min_{c ≠ c_t} Σ_{i=1,j=1}^{w_out,h_out} Δ_{i,j,c} > 2 R_max(L). The corresponding loss can be defined as L_H(Δ, c_t, R_max) = H(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c} ≤ 2 R_max), that is: the loss is 1 if there is a target class c such that Σ_{i,j} Δ_{i,j,c} becomes smaller than or equal to two times the size of the maximum affected patch score region. However, this requires choosing L and the resulting R_max(L) before training, which is undesirable. Instead, we stay agnostic with respect to the specific L and simply assume a uniform distribution for R_max(L), that is, R_max(L) ∼ U(0, R). Here, R corresponds to the maximum patch size (in region score space) we consider. This results in the loss

L_R(Δ, c_t) = ∫_0^R p(R̃) L_H(Δ, c_t, R̃) dR̃
            = (1/R) ∫_0^R H(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c} ≤ 2 R̃) dR̃
            = 1 − (1/R) ∫_0^R H(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c} > 2 R̃) dR̃
            = 1 − (1/R) min((1/2) min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c}, R)
            = 1 − (1/(2R)) min(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c}, 2R).

In practice, we minimize L̃_R(Δ, c_t) = −min(min_{c ≠ c_t} Σ_{i,j} Δ_{i,j,c} / (w_out · h_out), M) with M = 2R / (w_out · h_out). This loss can be interpreted as a margin loss with margin M, where the margin corresponds to twice the maximum patch size in region score space against which we want to become certifiably robust.
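A minimal sketch of the training pieces from Sections 3.2 and 3.3 — the Heaviside forward pass with the sigmoid-gradient straight-through backward pass, and the margin loss L̃_R — written in plain NumPy under our own naming. An actual implementation would wire this into an autograd framework (for example, as a custom backward function); this is an illustration of the math, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def heaviside_forward(x):
    """Forward pass: H(x) = 1 for x >= 0, else 0 (binary region scores)."""
    return (x >= 0).astype(float)

def heaviside_backward(x, grad_out):
    """Straight-through backward pass (Bengio et al., 2013): the true
    gradient of H is zero almost everywhere, so the sigmoid derivative
    s(x) * (1 - s(x)) is substituted as a surrogate."""
    s = sigmoid(x)
    return grad_out * s * (1.0 - s)

def margin_loss(s, c_t, M):
    """Margin loss ~L_R: the negated, clipped, normalized worst-class margin.

    s   -- binary region scores (H, W, C) from heaviside_forward
    c_t -- true class index
    M   -- margin, M = 2R / (H * W) for maximum patch footprint R
    """
    H, W, C = s.shape
    delta = s[..., [c_t]] - s                 # delta_{i,j,c} per class
    sums = delta.sum(axis=(0, 1)) / (H * W)   # normalized per-class sums
    worst = np.min(np.delete(sums, c_t))      # min over c != c_t
    return -min(worst, M)
```

Once the worst-class margin exceeds M, the loss plateaus at −M, so training pushes the margin only up to twice the largest patch footprint we want to certify against.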
While we do not strictly enforce Σ_c s_{i,j,c} = 1, we sometimes found it beneficial to add a term to the loss that encourages S = g_Σ(s) to be approximately "one-hot", that is, L_oh(S) = max_{c ≠ c_max} S_c − S_{c_max} with c_max = arg max_c S_c. Since S_c ∈ [0, 1], it holds that L_oh(S) ∈ [−1, 1] and L_oh(S) = −1 iff S_{c_max} = 1 and S_c = 0 ∀ c ≠ c_max. We note that other choices than the uniform distribution would be an interesting direction for future work, in particular if the defender has prior knowledge about more likely patch sizes and shapes. The term L_oh(S)
prevents training from prematurely converging to a solution where s_{i,j,c} is approximately constant for all i, j, c, which we observed otherwise for tasks with many classes (e.g., ImageNet). The total loss becomes L_total = L̃_R(Δ, c_t) + σ L_oh(S), where σ controls the strength of this one-hot penalty.

Figure 2: Clean versus certified accuracy on CIFAR10 and ImageNet for BagCert with different receptive fields and train margins (M ∈ {0.25, 0.5, 0.75, 1.0} for CIFAR10, a single margin for ImageNet) when certifying via Condition 3.3 (circles) and Condition 3.2 (stars); the same setting is connected by a thin line. Smaller M generally corresponds to larger clean accuracy for CIFAR10. Baselines are Derandomized Smoothing (DS) (Levine & Feizi, 2020), Masked BagNet (Mask-BN) and Masked DS-ResNet (Mask-DS) (Xiang et al., 2020), and Clipped BagNet (CBN) (Zhang et al., 2020). Results for these baselines are taken from the respective papers.

4 Experiments
We perform an empirical evaluation of BagCert on CIFAR10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015). We report clean and certified accuracy and compare to Interval Bound Propagation (IBP) (Chiang et al., 2020), Derandomized Smoothing (DS) (Levine & Feizi, 2020), Clipped BagNet (CBN) (Zhang et al., 2020), and PatchGuard (Xiang et al., 2020). For DS we focus on block smoothing, and for PatchGuard we focus on the masked BagNet (Mask-BN), because column smoothing for DS (and the derived Mask-DS for PatchGuard) perform poorly for non-square patches that are "short-but-wide" (see Figure 4). We note that column smoothing and Mask-DS perform better than block smoothing and Mask-BN against square patches; however, there is no reason an attacker should prefer square over non-square rectangular patches. Details on the BagCert model architecture and training can be found in Appendix A.1. Moreover, we focus on certified accuracy, a lower bound on the actual robustness of a model. Results for accuracy against a strong adversarial patch attack, corresponding to an upper bound on actual robustness, are discussed in Section A.2 in the appendix.

Figure 2 shows results for different methods against 5×5 patches for CIFAR10 (about 2.4% of the image size) and patches of 2% of the image size for ImageNet. For CIFAR10, when certifying accuracy via Condition 3.3, the Pareto frontier of BagCert closely follows the one reported for DS with block smoothing and θ = 0.3. This is somewhat surprising given that both model and training procedure are very different and only the condition for certifying robustness is identical. We hypothesize that both approaches have reached close to optimal Pareto frontiers when certifying robustness via Condition 3.3. However, as Table 1 shows, BagCert requires (depending on its receptive field size) only tens of seconds for certifying all 10,000 test examples on a single Tesla V100 SXM2 GPU, while DS with block smoothing is considerably slower.
BagCert also clearly dominates Mask-BN and CBN, which utilize a similar model architecture, as well as IBP (not shown), which reaches substantially lower clean and certified accuracy. Moreover, when applying Condition 3.2 for certification, certified accuracy is increased by approximately 3 percentage points without changes in clean accuracy or any noticeable increase in certification time.

Figure 3: Certified accuracy against square patches of different sizes on CIFAR10. Shown is the performance for different receptive fields of BagCert (left) and train margins (right). Lines correspond to the same model (without retraining), evaluated against patches of different size.

In summary, the strongest BagCert model can certify all 10,000 test examples in 43 seconds, reaching a clean accuracy of 86% and a certified accuracy of 60%. On ImageNet, BagCert also dominates all baselines in terms of certified accuracy, both when certifying via Condition 3.3 and via Condition 3.2. Running certification for the entire validation set of 50,000 images takes a matter of minutes.

Figure 3 shows accuracy of BagCert certified via Condition 3.2 for square patches of different sizes on CIFAR10. Again, the baselines are dominated across patch sizes. Moreover, a single configuration of BagCert with receptive field size 7 and margin M = 0.5 performs close to optimal for all patch sizes and can certify non-trivial performance for patches considerably larger than 5×5. This implies that a single model can be used for a broad range of threat models. Figure 4 shows a similar analysis for non-square patches of a total size of 24 pixels. While BagCert with the same configuration as above achieves non-trivial certified accuracy for any patch aspect ratio, the performance of DS with column smoothing varies greatly with aspect ratio. In particular, "short-but-wide" patches reduce the certified accuracy of column smoothing close to 0%. Since there is no reason to assume attackers will restrict themselves to square patches, we do not consider DS with column smoothing or Mask-DS (Xiang et al., 2020) general patch defenses, despite good performance for square patches and efficient certification according to Table 1.

5 Conclusion and Outlook
We have introduced BagCert, a novel framework that combines efficient certification with end-to-end training for certified robustness. The main contributions are a model architecture based on a CNN with small receptive fields, certification conditions that are applicable to a broad range of models, and a margin-loss-based objective that is derived from the certification condition. The resulting model achieves high certified robustness against patches with a broad range of sizes, aspect ratios, and locations on CIFAR10 and ImageNet. Promising directions for future work are the exploration of other choices for the spatial aggregation function g (such as ones using the "detect-and-mask" mechanism from PatchGuard (Xiang et al., 2020)) and corresponding certification conditions and losses that can be used for end-to-end training. Moreover, the development of alternative choices for models with small receptive fields could be promising, such as ones based on learnable receptive fields or on self-attention. Finally, applying BagCert to other modalities than images would be an exciting avenue.

Figure 4: Certified accuracy against non-square patches of total size 24 pixels on CIFAR10. Shown is the performance for different receptive fields of BagCert (left) and train margins (right) compared to Derandomized Smoothing with column smoothing (Levine & Feizi, 2020). Lines correspond to the same model (without retraining), evaluated against patches of different aspect ratios.

References
Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018a. URL http://proceedings.mlr.press/v80/athalye18a.html.

Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In International Conference on Machine Learning (ICML), 2018b. URL http://arxiv.org/abs/1707.07397.

Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv e-prints, art. arXiv:1308.3432, August 2013.

Wieland Brendel and Matthias Bethge. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=SkfMWhAqYQ.

Tom Brown, Dandelion Mane, Aurko Roy, Martin Abadi, and Justin Gilmer. Adversarial patch. In Conference on Neural Information Processing Systems (NIPS), 2017. URL https://arxiv.org/pdf/1712.09665.pdf. arXiv: 1712.09665.

Ping-yeh Chiang, Renkun Ni, Ahmed Abdelkader, Chen Zhu, Chris Studor, and Tom Goldstein. Certified defenses for adversarial patches. In International Conference on Learning Representations, 2020.

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Proceedings of Machine Learning Research, volume 97, pp. 1310–1320, Long Beach, California, USA, 09–15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/cohen19c.html.

Francesco Croce, Maksym Andriushchenko, Naman D. Singh, Nicolas Flammarion, and Matthias Hein. Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks. CoRR, 2020. URL https://arxiv.org/abs/2006.12834.

Franklin C. Crow. Summed-area tables for texture mapping. In Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '84, pp. 207–212, New York, NY, USA, 1984. Association for Computing Machinery. ISBN 0897911385. doi: 10.1145/800031.808600. URL https://doi.org/10.1145/800031.808600.

Xavier Gastaldi. Shake-Shake regularization. arXiv e-prints, art. arXiv:1705.07485, May 2017.

Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin T. Vechev. AI2: safety and robustness certification of neural networks with abstract interpretation. In IEEE Symposium on Security and Privacy (S&P), 2018. URL https://doi.org/10.1109/SP.2018.00058.

Sven Gowal, Krishnamurthy (Dj) Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In The IEEE International Conference on Computer Vision (ICCV), October 2019.

Jamie Hayes. On visible adversarial perturbations & digital watermarking. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR), 2018. URL http://openaccess.thecvf.com/content_cvpr_2018_workshops/w32/html/Hayes_On_Visible_Adversarial_CVPR_2018_paper.html.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition (CVPR), 2016. URL https://arxiv.org/abs/1512.03385.

Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In Computer Aided Verification - 29th International Conference (CAV), 2017. URL https://doi.org/10.1007/978-3-319-63387-9_1.

Yi Huang, Adams Wai Kin Kong, and Kwok-Yan Lam. Adversarial signboard against object detector. In British Machine Vision Conference (BMVC), 2019.

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML'15, pp. 448–456. JMLR.org, 2015.

Alex Krizhevsky. Learning multiple layers of features from tiny images. University of Toronto, 05 2009.

M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana. Certified robustness to adversarial examples with differential privacy. In IEEE Symposium on Security and Privacy (S&P), pp. 656–672, 2019.

Mark Lee and J. Zico Kolter. On physical adversarial patches for object detection. International Conference on Machine Learning (Workshop), 2019. URL http://arxiv.org/abs/1906.11897.

Alexander Levine and Soheil Feizi. (De)Randomized Smoothing for Certifiable Defense against Patch Attacks. arXiv e-prints, art. arXiv:2002.10733, February 2020.

Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. In Advances in Neural Information Processing Systems 32, pp. 9464–9474. Curran Associates, Inc., 2019. URL http://papers.nips.cc/paper/9143-certified-adversarial-robustness-with-additive-noise.pdf.

Xin Liu, Huanrui Yang, Ziwei Liu, Linghao Song, Yiran Chen, and Hai Li. DPATCH: an adversarial patch attack on object detectors. In Workshop on Artificial Intelligence Safety 2019, co-located with the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019. URL http://ceur-ws.org/Vol-2301/paper_5.pdf.

Raphael Gontijo Lopes, Dong Yin, Ben Poole, Justin Gilmer, and Ekin D. Cubuk. Improving robustness without sacrificing accuracy with patch Gaussian augmentation. CoRR, 2019. URL http://arxiv.org/abs/1906.02611.

I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations (ICLR) 2017 Conference Track, April 2017.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://arxiv.org/abs/1706.06083.

Muzammal Naseer, Salman Khan, and Fatih Porikli. Local gradients smoothing: Defense against localized adversarial attacks. In IEEE Winter Conference on Applications of Computer Vision (WACV), 2019. URL https://doi.org/10.1109/WACV.2019.00143.

Anurag Ranjan, Joel Janai, Andreas Geiger, and Michael J. Black. Attacking optical flow. In International Conference on Computer Vision (ICCV), 2019.

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi: 10.1007/s11263-015-0816-y.

Aniruddha Saha, Akshayvarun Subramanya, Koninika Patil, and Hamed Pirsiavash. Adversarial Patches Exploiting Contextual Reasoning in Object Detection. arXiv e-prints, art. arXiv:1910.00068, Sep 2019.

Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. International Journal of Computer Vision, October 2019. ISSN 1573-1405.

Simen Thys, Wiebe Van Ranst, and Toon Goedemé. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR), 2019.

Vincent Tjeng and Russ Tedrake. Verifying neural networks with mixed integer programming. CoRR, 2017. URL http://arxiv.org/abs/1711.07356.

Jonathan Uesato, Brendan O'Donoghue, Pushmeet Kohli, and Aäron van den Oord. Adversarial risk and the dangers of evaluating against weak attacks. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018. URL http://proceedings.mlr.press/v80/uesato18a.html.

Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Jennifer G. Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning (ICML), 2018. URL http://proceedings.mlr.press/v80/wong18a.html.

Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 (NeurIPS), 2018. URL http://papers.nips.cc/paper/8060-scaling-provable-adversarial-defenses.

Tong Wu, Liang Tong, and Yevgeniy Vorobeychik. Defending against physically realizable attacks on image classification. In International Conference on Learning Representations (ICLR), 2020. URL https://arxiv.org/abs/1909.09552.

Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, and Prateek Mittal. PatchGuard: Provable Defense against Adversarial Patches Using Masks on Small Receptive Fields. arXiv e-prints, art. arXiv:2005.10884, May 2020.

Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. CutMix: Regularization strategy to train strong classifiers with localizable features. In The IEEE International Conference on Computer Vision (ICCV), October 2019.

Zhanyuan Zhang, Benson Yuan, Michael McCoyd, and David Wagner. Clipped BagNet: Defending Against Sticker Attacks with Clipped Bag-of-features, 2020.
A APPENDIX

A.1 EXPERIMENTAL DETAILS
A.1.1 CIFAR10

We use the following class of models for CIFAR10: a ResNet (He et al., 2016) base architecture, consisting of a single 3x3 convolution stem followed by 8 residual blocks. We use stride one in all operations (that is, the output resolution is 32 × 32) and a constant width throughout the network. The last layer consists of a 1 × 1 convolution with 10 outputs. All layers use batch normalization (Ioffe & Szegedy, 2015) and ReLU. Each block is assigned a kernel size of either 1 or 3, depending on the desired receptive field of the network. The following table summarizes the kernel sizes of the different blocks used in the experiments:

RF of BAGCERT | stem | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8
      5       |   3  |  3 |  1 |  1 |  1 |  1 |  1 |  1 |  1
      7       |   3  |  3 |  1 |  3 |  1 |  1 |  1 |  1 |  1
      9       |   3  |  3 |  1 |  3 |  1 |  3 |  1 |  1 |  1
     11       |   3  |  3 |  1 |  3 |  1 |  3 |  1 |  3 |  1
     13       |   3  |  3 |  3 |  3 |  1 |  3 |  1 |  3 |  1

Residual blocks use shake-shake regularization (Gastaldi, 2017) in the batch-wise mode. For residual blocks with kernel size 3, a special form of shake-shake regularization is used: the first residual path applies a 1 × 3 convolution followed by a 3 × 1 convolution, while the second residual path applies a 3 × 1 convolution followed by a 1 × 3 convolution. This increases the diversity of paths without changing the total receptive field of the network. Beyond this, no additional regularization is applied (in particular, no weight decay), and the one-hot penalty is set to a small value σ. For training, we use the Adam optimizer with a cosine decay learning rate schedule (Loshchilov & Hutter, 2017) including a warmup phase. Moreover, we apply random horizontal flips and random crops with padding 4 for data augmentation.

A.1.2 IMAGENET

We work on downscaled inputs, which are extracted by rescaling the shorter side of the image and taking a random crop (training phase) or center crop (test phase). Note that this input resolution differs from the 224 × 224 resolution used by Derandomized Smoothing (Levine & Feizi, 2020).
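The receptive fields in the tables of this appendix follow from standard receptive-field arithmetic: each convolution with kernel size k widens the receptive field by (k − 1) times the product of all earlier strides, and each kernel-size-3 residual block counts like a single 3×3 convolution (the 1×3/3×1 pair has the same total receptive field). The helper below is our own illustrative sketch, not code from the paper; the block configurations are read off the kernel-size tables.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of a chain of convolutions (stride-aware)."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the RF in input pixels
        jump *= s              # stride increases the step between outputs
    return rf

# CIFAR10 variants (stem, b1..b8): all strides are 1.
cifar_configs = {
    5:  [3, 3, 1, 1, 1, 1, 1, 1, 1],
    7:  [3, 3, 1, 3, 1, 1, 1, 1, 1],
    9:  [3, 3, 1, 3, 1, 3, 1, 1, 1],
    11: [3, 3, 1, 3, 1, 3, 1, 3, 1],
    13: [3, 3, 3, 3, 1, 3, 1, 3, 1],
}
for rf, kernels in cifar_configs.items():
    assert receptive_field(kernels) == rf

# ImageNet variants use stride 2 in blocks 1 and 3 (see A.1.2).
imagenet_strides = [1, 2, 1, 2, 1, 1, 1, 1, 1]
assert receptive_field([3, 3, 1, 3, 1, 3, 1, 1, 1], imagenet_strides) == 17
```

The same function reproduces the ImageNet receptive fields of 25 and 29 for the two larger configurations.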
In order to achieve comparable results, we evaluate against patches whose size relative to our input resolution approximately matches the relative patch size used in the DS evaluation.

We use the following class of models for ImageNet: a ResNet base architecture, consisting of a single 3x3 convolution stem followed by 8 residual blocks. We use stride 2 in blocks 1 and 3 and stride 1 otherwise (that is, the output resolution is a quarter of the input resolution in each dimension). We use one width in the stem and the first two blocks, a second width in blocks 3 and 4, a third in blocks 5 and 6, and a fourth in blocks 7 and 8. The last layer consists of a 1 × 1 convolution with 1000 outputs. All layers use batch normalization and ReLU. Each block is assigned a kernel size of either 1 or 3, depending on the desired receptive field of the network. The following table summarizes the kernel sizes of the different blocks used in the experiments:

RF of BAGCERT | stem | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8
     17       |   3  |  3 |  1 |  3 |  1 |  3 |  1 |  1 |  1
     25       |   3  |  3 |  1 |  3 |  1 |  3 |  1 |  3 |  1
     29       |   3  |  3 |  3 |  3 |  1 |  3 |  1 |  3 |  1

We apply neither shake-shake regularization nor weight decay; however, we set the one-hot penalty to σ = 1.0. For training, we use the Adam optimizer with a cosine decay learning rate schedule including a warmup phase. Moreover, we apply random horizontal flips for data augmentation.

Figure 5: Illustration of BAGCERT certification for a 1D input and two classes. We assume for this example that L consists of only a single element l; that is, the attacker can only place a patch at this one location, shown by the checkerboard pattern in the input. The resulting R(l) consists of the three top elements in the region score space s (again shown by a checkerboard pattern). Accordingly, R_max(L) = 3. (Top) Certification via Condition 3.3: the regular network output +5 is compared to 2 · R_max(L) = +6. Since +5 ≤ +6, the robustness of the prediction cannot be certified. (Bottom) Certification via Condition 3.2: the region scores s are replaced by s_wc based on R(l).
The resulting network output is +1, which is greater than 0. Thus, the robustness of the prediction can be certified.
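The numbers in the Figure 5 example can be reproduced with a small toy calculation. The sketch below is our own illustration, not the paper's code: it assumes ±1 region scores with a spatial sum aggregation, so a patch covering R_max(L) positions can change the output by at most 2 per position.

```python
import numpy as np

# Per-region scores s for a 1-D input: nine positions, margin +1 or -1 each.
scores = np.array([+1, +1, -1, +1, +1, +1, +1, +1, -1])
R_l = [0, 1, 2]        # positions influenced by the single patch location l
R_max = len(R_l)       # R_max(L) = 3

output = scores.sum()  # clean network output: +5

# Condition 3.3 (coarse): certify if output > 2 * R_max(L), since flipping
# one +1 score to -1 changes the sum by at most 2.
cond_33 = output > 2 * R_max   # 5 > 6 is False -> cannot certify

# Condition 3.2 (tighter): replace the scores inside R(l) by their
# worst case (-1) and check whether the sum stays positive.
worst = scores.copy()
worst[R_l] = -1
cond_32 = worst.sum() > 0      # +1 > 0 is True -> certified
```

This reproduces the figure: the coarse condition fails (+5 ≤ +6) while the worst-case substitution still certifies the prediction (+1 > 0).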
Figure 6: Illustration of an adversarial patch attack and its effect on region scores. The top row corresponds to the clean image (left), the resulting score maps for the true class (middle), and the score maps for the chosen target class (right). The bottom row shows the same for the image with an adversarial patch inserted at the chosen region l. The red rectangle corresponds to R(l).

A.2 ROBUSTNESS AGAINST HEURISTIC PATCH ATTACK
While the certification conditions proposed in Section 3.1 allow computing a lower bound on a model's robustness against a specific type of patch attack, the model's true robustness can lie anywhere between this lower bound and the clean accuracy. In order to determine an upper bound on robustness that is tighter than the clean accuracy, we perform a heuristic adversarial patch attack and evaluate the model's accuracy on the inputs modified by the attacker. Our threat model from Section 3 allows an attacker to place an arbitrary patch p ∈ [0, 1]^(n × n × c_in) at an arbitrary region l ∈ L. We employ the following approach: we first select a region l* ∈ L and a target class c*, keep both fixed, and then optimize the patch p accordingly. Note that there is no guarantee that the best region or the best patch is found; the resulting adversarial accuracy is therefore only an upper bound.

Specifically, we focus in this evaluation on square patches on CIFAR-10, so that L consists of all possible patch-sized subregions of a 32 × 32 input. Ideally, one would perform independent attacks at all possible regions l ∈ L; however, this quickly becomes computationally intractable. Instead, we exploit specific design choices of BAGCERT to select one region and target class that may be particularly problematic for the model on a given input, assuming a spatial sum aggregation. For this, we make direct use of Condition 3.2 and choose l*, c* = argmin_{l,c} Σ_{(i,j) ∉ R(l)} Δ_{i,j,c}. This choice corresponds to assuming a maximally effective patch attack that achieves Δ_{i,j,c*} = −1 ∀ (i,j) ∈ R(l). A practical patch attack may not achieve this ideal outcome (see also Figure 6), and thus l* and c* are not necessarily optimal.
However, they are reasonable choices that can be determined efficiently. Once l* and c* are fixed, we perform a PGD attack (Madry et al., 2018) that maximizes the loss L̃_R from Section 3.3 with margin M = 1. An illustration of such an attack is shown in Figure 6.

Figure 7 shows scatter plots of clean versus adversarial accuracy (left) and certified versus adversarial accuracy (right) for the BAGCERT models also shown in Figure 2. Interestingly, while clean and adversarial accuracy are highly correlated, the same does not hold for certified and adversarial accuracy. In particular, adversarial accuracy seems to favor slightly larger receptive fields than certified accuracy. A potential reason for this can be seen in Figure 6: while a patch attack is typically effective at flipping the scores of true and target class in the inner part of R(l), it appears much harder to also flip scores close to the boundary of R(l). For larger receptive fields of the model, this boundary effect seems to be amplified, since the patch is smaller relative to the receptive field size.
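The two-step attack above (region/target selection, then projected gradient steps on the patch pixels) can be sketched as follows. This is our own toy illustration, not the paper's code: the margin map `delta`, the linear surrogate loss, and all sizes are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 3, 3                  # score-map size, classes, region size
delta = rng.normal(size=(H, W, C))       # hypothetical margins Delta[i,j,c]

# Step 1: l*, c* = argmin_{l,c} sum over (i,j) not in R(l) of Delta[i,j,c],
# by brute force over all patch positions.
total = delta.sum(axis=(0, 1))           # per-class sum over all positions
best = None
for i in range(H - r + 1):
    for j in range(W - r + 1):
        outside = total - delta[i:i + r, j:j + r].sum(axis=(0, 1))
        c = int(outside.argmin())
        if best is None or outside[c] < best[0]:
            best = (outside[c], (i, j), c)
_, (pi, pj), c_star = best

# Step 2: signed-gradient (PGD-style) steps on the patch pixels only,
# projected back into the valid pixel range [0, 1].
x = rng.uniform(size=(H, W))             # toy single-channel input
w = rng.normal(size=(H, W))              # gradient of a toy linear loss <w, x>
mask = np.zeros((H, W), dtype=bool)
mask[pi:pi + r, pj:pj + r] = True        # region l*: pixels the attacker owns

x_adv = x.copy()
for _ in range(20):                      # PGD iterations, step size 0.1
    x_adv[mask] = np.clip(x_adv[mask] + 0.1 * np.sign(w[mask]), 0.0, 1.0)

assert np.all(x_adv[~mask] == x[~mask])  # pixels outside the patch untouched
```

In the real attack the gradient would come from backpropagating L̃_R through the network rather than from a fixed linear surrogate; the selection step could also be accelerated with summed-area tables (Crow, 1984) instead of the brute-force loop.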
Figure 7: Scatter plots of clean versus adversarial accuracy (left) and certified versus adversarial accuracy (right). Color encodes different receptive field sizes.

Figure 8: Comparison of the Heaviside step function with alternative choices such as the element-wise sigmoid or the channel-wise softmax for five independent runs with different random seeds.

We consider reducing this gap between certified and adversarial accuracy further (that is, making the bounds tighter) important future work. This will require both developing more effective attacks and improving certification procedures.

A.3 EXPLORING ALTERNATIVES TO THE HEAVISIDE STEP FUNCTION
In this section, we explore alternatives to the Heaviside step function as the last layer in the region scorer f_θ (see Section 3.2). For this, we relax the region scores s to arbitrary values between 0 and 1, that is, s = f_θ(x) ∈ [0, 1]^(w_out × h_out × c_out). More specifically, we explore the element-wise sigmoid function si(x_{i,j,c}) = 1 / (1 + e^{−x_{i,j,c}}) and the channel-wise softmax function sm(x_{i,j,c}) = e^{x_{i,j,c}} / Σ_{c'} e^{x_{i,j,c'}}. Note that for the channel-wise softmax, Σ_c s_{i,j,c} = 1, while this is not enforced by the element-wise sigmoid or the Heaviside step function.

Figure 8 shows clean and certified accuracy (via Condition 3.2) on validation data during model training for five independent runs with the same configuration but different random seeds. The Heaviside step function ensures stable convergence to high clean and certified accuracy in all five runs (note that performance is slightly lower than in Figure 2 because training was stopped after 75 epochs). The channel-wise softmax also shows consistent convergence, albeit to a lower level of performance. We attribute this to the hard constraint Σ_c s_{i,j,c} = 1.
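The three score functions compared above can be written down directly. The sketch below is our own illustration (shapes and logit values are hypothetical); it highlights the property discussed in the text, namely that only the channel-wise softmax enforces Σ_c s_{i,j,c} = 1.

```python
import numpy as np

def heaviside(x):
    return (x > 0).astype(float)                   # hard 0/1 region scores

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                # soft scores in (0, 1)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # stabilized over classes
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[[2.0, -1.0, 0.5]]])            # one position, three classes
# Only the channel-wise softmax sums to 1 over the class channel.
assert np.allclose(softmax(logits).sum(axis=-1), 1.0)
assert heaviside(logits).sum() == 2.0              # two positive logits
```

Heaviside and sigmoid scores may sum to anything between 0 and the number of classes per position, which is precisely the additional freedom the softmax variant gives up.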