DetectorGuard: Provably Securing Object Detectors against Localized Patch Hiding Attacks
Chong Xiang, Princeton University, [email protected]
Prateek Mittal, Princeton University, [email protected]
Abstract
State-of-the-art object detectors are vulnerable to localized patch hiding attacks, where an adversary introduces a small adversarial patch to make detectors miss the detection of salient objects. In this paper, we propose DetectorGuard, the first general framework for building provably robust detectors against the localized patch hiding attack. To start with, we propose a general approach for transferring robustness from image classifiers to object detectors, which builds a bridge between robust image classification and robust object detection. We apply a provably robust image classifier to a sliding window over the image and aggregate robust window classifications at different locations into a robust object detection output. Second, in order to mitigate the notorious trade-off between clean performance and provable robustness, we use a prediction pipeline in which we compare the outputs of a conventional detector and a robust detector to catch an ongoing attack. When no attack is detected, DetectorGuard outputs the precise bounding boxes predicted by the conventional detector to achieve high clean performance; otherwise, DetectorGuard triggers an attack alert for security. Notably, our prediction strategy ensures that the robust detector incorrectly missing objects will not hurt the clean performance of DetectorGuard. Moreover, our approach allows us to formally prove the robustness of DetectorGuard on certified objects, i.e., it either detects the object or triggers an alert, against any patch hiding attacker. Our evaluation on the PASCAL VOC and MS COCO datasets shows that DetectorGuard has almost the same clean performance as conventional detectors, and more importantly, that DetectorGuard achieves the first provable robustness against localized patch hiding attacks.
While object detection is widely deployed in critical applications like autonomous driving, video surveillance, and identity verification, conventional detectors have been shown to be vulnerable to a number of real-world adversarial attacks [7, 14, 47, 53, 56]. Eykholt et al. [14] and Chen et al. [7] demonstrate successful physical attacks against YOLOv2 [40] and Faster R-CNN [42] detectors for traffic sign recognition. Wu et al. [53] and Xu et al. [56] succeed in evading object detection by wearing a T-shirt printed with adversarial perturbations. Unfortunately, securing object detectors is extremely challenging: only a limited number of defenses [8, 43, 59] have been proposed, and they all suffer from at least one of the following issues: limited clean performance, lack of provable robustness, and inability to adapt to localized patch attacks (see Section 7).

In this paper, we investigate countermeasures against the localized patch hiding attack in object detection. The localized patch attacker can arbitrarily modify image pixels within a restricted region and easily mount a physical-world attack by printing and attaching the adversarial patch to the object. The practical nature of patch attacks has made them the first choice for physical-world attacks against object detectors [7, 14, 47, 53, 56]. The focus of our work is on hiding attacks that aim to make the object detector fail to detect the victim object. This attack can cause serious consequences in scenarios like an autonomous vehicle missing an upcoming car and ending up in a car crash. To secure real-world object detectors against these threats, we propose DetectorGuard as the first general framework for building provably robust object detectors against localized patch hiding attacks. We design DetectorGuard with the following two key insights.
Insight I: Transferring robustness from image classifiers to object detectors.
There has been significant advancement in robust image classification research in recent years [9, 10, 16, 20, 21, 30, 33, 34, 38, 44, 52, 54, 60] while object detectors remain vulnerable to attacks. In DetectorGuard, we aim to make use of well-studied robust image classifiers and transfer their robustness to object detectors. To achieve this, we leverage a key observation: almost all state-of-the-art image classifiers and object detectors use Convolutional Neural Networks (CNNs) as their backbone for feature extraction. The major difference lies in that an image classifier makes a prediction based on all extracted features (or all image pixels) while an object detector predicts each object using a small portion of features (or image pixels) at each location. This observation suggests that we can build a robust object detector by doing robust image classification on every subset of extracted features (or image pixels). Towards this end, we build an Objectness Predictor by using a sliding window over the whole image or feature map and applying a provably robust image classifier for robust window classification at each location. We then securely aggregate and post-process all window classifications to generate a robust objectness map, in which each element indicates the objectness at its corresponding location. In Section 4.2, we prove the robustness of Objectness Predictor using the provable analysis of the robust image classifier.

Figure 1: DetectorGuard Overview. Base Detector predicts precise bounding boxes on clean images, and Objectness Predictor outputs a robust objectness feature map. Detection Matcher compares the outputs of Base Detector and Objectness Predictor to determine the final output. In the clean setting (left figure), the dog on the left is detected by both Base Detector and Objectness Predictor. This leads to a match, and DetectorGuard outputs the bounding box predicted by Base Detector. In the meantime, the dog on the right is only detected by Base Detector. Detection Matcher considers this a benign mismatch, and DetectorGuard trusts Base Detector in this case by outputting the predicted bounding box from Base Detector. In the adversarial setting (right figure), a patch makes Base Detector fail to detect any object while Objectness Predictor still robustly outputs high activation. Detection Matcher detects a malicious mismatch and triggers an attack alert.
Insight II: Mitigating the trade-off between clean performance and provable robustness.
The robustness of security-critical systems usually comes at the cost of clean performance, making defense deployment less appealing. To mitigate this common trade-off, we design DetectorGuard in a manner such that our defense achieves substantial provable robustness and also maintains clean performance that is close to state-of-the-art detectors. We provide our defense overview in Figure 1. DetectorGuard has three modules: Base Detector, Objectness Predictor, and Detection Matcher. Base Detector can be any state-of-the-art object detector that makes precise predictions on clean images but is vulnerable to patch hiding attacks. We build Objectness Predictor on top of a provably robust image classifier and use it for robust objectness predictions. We then use Detection Matcher to compare the outputs of Base Detector and Objectness Predictor, which will trigger an attack alert if and only if Objectness Predictor detects an object that Base Detector misses. When no attack is detected, DetectorGuard outputs the predictions of Base Detector and thus has high clean performance. When a hiding attack occurs, Base Detector could miss the object while Objectness Predictor can still robustly output high objectness activation. This mismatch will trigger an attack alert, and DetectorGuard will abstain from making predictions. Our design ensures that Objectness Predictor incorrectly missing objects (false negatives) will not hurt the clean performance of DetectorGuard (Figure 1, left), while Objectness Predictor robustly detecting objects provides the provable security guarantee of DetectorGuard (Figure 1, right). This approach mitigates the trade-off between clean performance and provable robustness. In Section 4, we rigorously show that DetectorGuard can achieve a similarly high clean performance as conventional detectors, and we prove the robustness of DetectorGuard on certified objects against any patch hiding attack considered in our threat model.
Desirable properties of DetectorGuard.
DetectorGuard is the first provably robust defense for object detection against localized patch hiding attacks. Notably, DetectorGuard has four desirable properties. First, DetectorGuard has high detection performance in the clean setting because its clean predictions come from state-of-the-art detectors (when no false alert is triggered); in contrast, the clean performance of traditional attack-detection-based defenses [30, 57] is bottlenecked by the errors of the defense module. Second, DetectorGuard is agnostic to attack algorithms and can provide strong provable robustness against any adaptive attack considered in our threat model. Third, DetectorGuard is agnostic to the design of Base Detector and is therefore compatible with any conventional object detector. Fourth, DetectorGuard is compatible with any robust image classification technique and can benefit from any progress in that line of research.

We evaluate DetectorGuard on the PASCAL VOC [13] and MS COCO [23] datasets. In our evaluation, we instantiate Base Detector with a hypothetical perfect clean detector, YOLOv4 [2, 49], and Faster R-CNN [42].
In this section, we first introduce the object detection task, followed by the localized patch hiding attack and defense formulation.
Detection objective.
The goal of object detection is to predict a list of bounding boxes for all objects in the input image x ∈ [0, 1]^{W×H×C}, where pixel values are rescaled into [0, 1], and W, H, and C are the width, height, and number of channels of the image, respectively. Each bounding box b is represented as a tuple (x_min, y_min, x_max, y_max, l), where x_min, y_min, x_max, y_max together specify the coordinates of the bounding box, and l ∈ L = {0, 1, ..., N − 1} denotes the predicted object label (N is the number of object classes).
Conventional detector. Object detection models can be categorized into two-stage and one-stage detectors depending on their detection pipelines. A two-stage detector first generates proposals for regions that might contain objects and then uses the proposed regions for object classification and bounding-box regression. Representative examples include Faster R-CNN [42] and Mask R-CNN [18]. On the other hand, a one-stage detector performs detection directly on the input image without any region proposal step. SSD [26], YOLO [2, 39-41, 49], RetinaNet [22], and EfficientDet [46] are representative one-stage detectors. (Conventional object detectors usually output an objectness score and a prediction confidence as well; we discard them in our notation for simplicity.)

Conventionally, a detection is considered correct when 1) the predicted label matches the ground truth and 2) the overlap between the predicted bounding box and the ground-truth box, measured by Intersection over Union (IoU), exceeds a certain threshold τ. We term a correct detection a true positive (TP). Any predicted bounding box that fails to satisfy both TP criteria is considered a false positive (FP). Finally, if a ground-truth object is not detected by any TP bounding box, it is a false negative (FN). Research on object detection aims to minimize FP and FN errors.
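To make the TP criterion concrete, the following Python sketch (illustrative only, not part of our implementation) checks a predicted box against a ground-truth box; the default IoU threshold value used here is an assumption for the example.

def iou(box_a, box_b):
    # Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max, label).
    ax0, ay0, ax1, ay1 = box_a[:4]
    bx0, by0, bx1, by1 = box_b[:4]
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred, gt, tau=0.5):
    # TP criteria: the labels match and the IoU exceeds the threshold tau (tau=0.5 is an assumed value).
    return pred[4] == gt[4] and iou(pred, gt) > tau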
Attack objective. The hiding attack, also referred to as the false-negative (FN) attack, aims to make object detectors miss the detection of certain objects (which increases FN). The hiding attack can cause serious consequences in scenarios like an autonomous vehicle missing a pedestrian. Therefore, defending against patch hiding attacks is of great importance.
Attacker capability.
The localized adversary is allowed to arbitrarily manipulate pixels within one restricted region. Formally, we can use a binary pixel mask pm ∈ {0, 1}^{W×H} to represent this restricted region, where the pixels within the region are set to 1. The adversarial image can then be represented as x′ = (1 − pm) ⊙ x + pm ⊙ x″, where ⊙ denotes the element-wise product operator and x″ ∈ [0, 1]^{W×H×C} is the content of the adversarial patch. pm is a function of the patch size and patch location. The patch size should be limited such that the object is recognizable by a human (otherwise, the attack is meaningless). For patch locations, we consider three different threat models: over-patch, close-patch, and far-patch, where the patch is over, close to (partial overlap), or far away from (no overlap) the victim object, respectively. Previous works [27, 43] have shown that attacks against object detectors can succeed even when the patch is far away from the victim object. Therefore, defending against all three threat models is of interest to us.
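For concreteness, the composition x′ = (1 − pm) ⊙ x + pm ⊙ x″ can be expressed in a few lines of NumPy. The sketch below only illustrates the threat model; the patch content and location are arbitrary assumptions, and this is not an attack algorithm.

import numpy as np

def apply_patch(x, patch, top, left):
    # x:     clean image of shape (W, H, C) with values in [0, 1]
    # patch: adversarial content x'' of shape (p_x, p_y, C)
    # (top, left): upper-left corner of the restricted region
    pm = np.zeros(x.shape[:2], dtype=x.dtype)          # binary pixel mask
    p_x, p_y = patch.shape[:2]
    pm[top:top + p_x, left:left + p_y] = 1.0
    x_patch = np.zeros_like(x)                         # x'' placed on a full-size canvas
    x_patch[top:top + p_x, left:left + p_y] = patch
    return (1.0 - pm[..., None]) * x + pm[..., None] * x_patch

# Example with arbitrary content: a 32x32 patch on a 416x416 image.
x = np.random.rand(416, 416, 3)
x_adv = apply_patch(x, np.random.rand(32, 32, 3), top=100, left=200)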
Defense objective. We focus on defenses against patch hiding attacks. We consider our defense to be robust if 1) its detection on the clean image is correct and 2) the defense can detect part of the object or send out an attack alert on the adversarial image. Crucially, we design our defense to be provably robust: our defense can either detect the certified object or issue an attack alert against any patch hiding attacker within our threat model.

Three remarks are in order. First, we use "hiding attack" and "FN attack" interchangeably in this paper. Second, provably robust defenses against one single patch are currently an open/unsolved problem, and hence the focus of this paper; in Appendix C, we justify our one-patch threat model and discuss the implications of multiple patches. Third, in the adversarial setting, we only require the predicted bounding box to cover part of the object, because it is likely that only a small part of the object is recognizable due to the adversarial patch (e.g., the left dog in the right part of Figure 1). We provide additional justification for our defense objective in Appendix E.
Remark: primary focus on hiding attacks.
In this paper, we focus on the hiding attack because it is the most fundamental and notorious attack against object detectors. We can visualize dividing the object detection task into two steps: 1) detecting the object bounding box and then 2) classifying the detected object. If the first step is compromised by the hiding attack, there is no hope for robust object detection. On the other hand, securing the first step against the patch hiding attack lays a foundation for robust object detection; we can design effective remediation for the second step if needed. Take the application domain of autonomous vehicles (AVs) as an example: an AV missing the detection of an upcoming car could end up in a serious car accident. However, if the AV detects the upcoming object but predicts an incorrect class label (e.g., mistaking a car for a pedestrian), it can still make the correct decision of stopping and avoiding the collision. Moreover, in challenging application domains where the predicted class label is of great importance (e.g., traffic sign recognition), we can feed the detected bounding box to an auxiliary image classifier to re-determine the class label. The defense problem is then reduced to robust image classification, which has been studied by several previous works [21, 33, 54, 60]. Therefore, we make the hiding attack the primary focus of this paper and will discuss the extension of DetectorGuard against other attacks in Section 6.
In this section, we first introduce the key insights and overview of DetectorGuard. We then detail the design of our defense components (Objectness Predictor, Detection Matcher) and our choice of the underlying robust image classifier.
We leverage two key insights to design the DetectorGuard framework.
Insight I: exploiting the tight connection between image classification and object detection tasks to transfer the robustness from classifiers to detectors.
We observe that almost all state-of-the-art image classifiers and object detectors use CNNs as their backbone for feature extraction. An image classifier makes a prediction based on all extracted features (or image pixels) while an object detector predicts each object using a partial feature map (or image pixels) at different locations. This observation motivates our design of a robust object detector built from a robust image classifier. We use a sliding window over the entire image or feature map and perform robust classification on each window to determine whether there is an object. We then securely aggregate all window classifications for a robust object detection output. Our general approach transfers the robustness of image classifiers to object detectors so that robust object detection can also benefit from ongoing advances in robust image classification.
Insight II: using an ensemble prediction strategy to mitigate the trade-off between clean performance and provable robustness.
It is well known that the robustness of machine-learning-based systems usually comes at the cost of clean performance (measured by TP, FP, and FN in object detection, as introduced in Section 2.1). To mitigate this common trade-off, we propose an ensemble prediction strategy that uses a robust detector and a state-of-the-art conventional object detector to catch an ongoing attack. We use the conventional detector to make precise predictions when no attack is detected, and use the robust detector to provide substantial robustness in the adversarial setting. The clean performance of this ensemble is maintained close to state-of-the-art detectors and can also be improved given any advances in benign/conventional object detection research.
DetectorGuard design.
Recall that Figure 1 provides an overview of DetectorGuard, which will either output a list of bounding box predictions (left figure; clean setting) or an attack alert (right figure; adversarial setting). There are three major modules in DetectorGuard: Base Detector, Objectness Predictor, and Detection Matcher. Base Detector is responsible for making precise detections in the clean setting and can be any popular high-performance object detector such as YOLOv4 [2, 49] and Faster R-CNN [42]. Objectness Predictor is built on our first insight and aims to output a robust objectness feature map in the adversarial environment; the robustness is derived from its building block, a robust image classifier. Detection Matcher leverages the detection outputs of Base Detector and Objectness Predictor to catch a malicious attack using defined rules. When no attack is detected, DetectorGuard outputs the detection results of Base Detector (i.e., a conventional detector), so that our clean performance is close to state-of-the-art detectors. When a patch hiding attack occurs, Base Detector can miss the object while Objectness Predictor is likely to robustly detect the presence of an object. This malicious mismatch will be caught by Detection Matcher, and DetectorGuard will send out an attack alert.
Algorithm Pseudocode.
We provide the pseudocode of DetectorGuard in Algorithm 1. The main procedure DG(·) has three sub-procedures: BaseDetector(·), ObjPredictor(·), and DetMatcher(·). The sub-procedure BaseDetector(·) can be any off-the-shelf detector as discussed previously. We introduce the remaining two sub-procedures in the following subsections. All tensors/arrays are represented with bold symbols and scalars are in italic. All tensor/array indices start from zero; tensor/array slicing is in Python style (e.g., [i : j] means all indices k satisfying i ≤ k < j). We assume that the "background" class corresponds to the largest class index. We give a summary of important notation in Table 1.

Table 1: Summary of important notation

  x            input image
  b            bounding box
  om           objectness map
  v            classification logits
  l            classification label
  N            number of object classes
  (w_x, w_y)   window size
  (p_x, p_y)   patch size
  T            binarizing threshold
  D            detection results
  u, l         upper/lower bound of the classification logit values of each class

Objectness Predictor is built using our Insight I and aims to output a robust objectness prediction map in an adversarial environment. In doing so, we use a sliding window over the image (or feature map) to make robust window classifications, and then post-process the window classifications to generate the objectness map. Objectness Predictor is designed to be provably robust against patch hiding attacks. We introduce this prediction pipeline in this subsection and analyze its provable robustness in Section 4.2.
Robust window classification.
The pseudocode of Objectness Predictor is presented as ObjPredictor(·) in Algorithm 1. The key operation is to use a sliding window and make window classifications at different locations. Each window classification aims to predict the object class or "background" based on all pixels (or features) within the window. To make the window classification robust even when some pixels (or features) are corrupted by the adversarial patch, we apply a robust classification technique (Line 16). For each window location, represented as (i, j), we feed the corresponding window x[i : i + w_x, j : j + w_y] to the robust classification sub-procedure RC(·) to get the classification label l and the classification logits v ∈ R^{N+1} for the N object classes and the "background" class. DetectorGuard is compatible with any robust classification technique, and we treat RC(·) as a black-box procedure in DetectorGuard. We postpone the discussion of RC(·) until Section 3.4 for ease of presentation.

Objectness map generation.
Given the robust window classification results, we aim to output an objectness map that indicates the objectness (i.e., a confidence score indicating the likelihood of the presence of an object) at each location. First, we generate an all-zero array ōm for holding the objectness scores (Line 14); each objectness vector in ōm has N + 1 entries. For each window, we then add the classification logits v to every objectness vector located within the window (Line 17). After accumulating objectness scores from all sliding windows, we binarize ōm to obtain the binary objectness map om ∈ {0, 1}^{X×Y} as the final output (Line 19). In Binarize(·), we examine each location in ōm: if the maximum objectness score over the non-background classes at that location is larger than the threshold T · w_x · w_y, we set the objectness score in om to one; otherwise, it is set to zero. We note that we discard the information of the classification label l in this binarization operation. This helps reduce FPs when the model correctly detects the object but fails to predict the correct label, which could happen frequently between similar object classes like bicycle-vs-motorbike. (We note that the sliding window can be either in the pixel space or in the feature space; we abuse the notation of x to let it represent either an input image or an extracted feature map in ObjPredictor(·). A discussion of pixel-space and feature-space windows is available in Appendix F.)

Algorithm 1: DetectorGuard

Input: input image x, window size (w_x, w_y), binarizing threshold T, Base Detector BaseDetector(·), robust classification procedure RC(·), cluster detection procedure DetCluster(·)
Output: robust detection D* or ALERT

 1: procedure DG(x, w_x, w_y, T)
 2:     D ← BaseDetector(x)                               ▷ Conventional detection
 3:     om ← ObjPredictor(x, w_x, w_y, T)                  ▷ Objectness map
 4:     a ← DetMatcher(D, om)                              ▷ Detect hiding attacks
 5:     if a == True then                                  ▷ Malicious mismatch
 6:         D* ← ALERT                                     ▷ Trigger an alert
 7:     else
 8:         D* ← D                                         ▷ Return Base Detector's predictions
 9:     end if
10:     return D*
11: end procedure

12: procedure ObjPredictor(x, w_x, w_y, T)
13:     X, Y, _ ← Shape(x)
14:     ōm ← ZeroArray[X, Y, N + 1]                        ▷ Initialization
15:     for each valid (i, j) do                           ▷ Every window location
16:         l, v ← RC(x[i : i + w_x, j : j + w_y])         ▷ Robust window classification
17:         ōm[i : i + w_x, j : j + w_y] ← ōm[i : i + w_x, j : j + w_y] + v    ▷ Add classification logits
18:     end for
19:     om ← Binarize(ōm, T · w_x · w_y)                   ▷ Binarization
20:     return om
21: end procedure

22: procedure DetMatcher(D, om)                            ▷ Match each detected box to the objectness map
23:     for i ∈ {0, 1, ..., |D| − 1} do
24:         x_min, y_min, x_max, y_max, l ← b ← D[i]
25:         if Sum(om[x_min : x_max, y_min : y_max]) > 0 then
26:             om[x_min : x_max, y_min : y_max] ← 0
27:         end if
28:     end for
29:     if DetCluster(om) is None then
30:         return False                                   ▷ All objectness explained
31:     else
32:         return True                                    ▷ Unexplained objectness
33:     end if
34: end procedure
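For readers who prefer runnable code over pseudocode, the following NumPy sketch mirrors the accumulation and binarization steps of ObjPredictor(·) in Algorithm 1. The callable rc is a stand-in for the provably robust window classifier RC(·); the stride-one sliding window and the array shapes are simplifying assumptions.

import numpy as np

def obj_predictor(fmap, rc, w_x, w_y, T, num_classes):
    # fmap: input image or feature map of shape (X, Y, ...)
    # rc:   callable returning (label, logits); logits has num_classes + 1 entries,
    #       where the last entry is the "background" class
    X, Y = fmap.shape[0], fmap.shape[1]
    om_acc = np.zeros((X, Y, num_classes + 1))
    for i in range(X - w_x + 1):                      # every valid window location
        for j in range(Y - w_y + 1):
            _, logits = rc(fmap[i:i + w_x, j:j + w_y])
            om_acc[i:i + w_x, j:j + w_y] += logits    # add logits to all covered cells
    # Binarization: keep a location if its best non-background score exceeds T * w_x * w_y
    non_background = om_acc[:, :, :num_classes].max(axis=-1)
    return (non_background > T * w_x * w_y).astype(np.uint8)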
Remark: Limitation of Objectness Predictor. We note that the underlying robust image classifier RC(·) in Objectness Predictor usually suffers from a trade-off between robustness and clean performance; therefore, Objectness Predictor can sometimes be imprecise on clean images (e.g., missing objects). However, as we discuss next, this limitation will not significantly hurt the clean performance of DetectorGuard due to our special ensemble structure inspired by Insight II.

Detection Matcher leverages our Insight II to mitigate the trade-off between provable robustness and clean performance. It takes as inputs the predicted bounding boxes of Base Detector and the generated objectness map of Objectness Predictor, and tries to match each predicted bounding box to a high-activation region in the objectness map. Detection Matcher labels each matching attempt as either a match, a malicious mismatch, or a benign mismatch. The matching results determine the final prediction of DetectorGuard. We first introduce the high-level matching rules and then elaborate on the matching algorithm.
Matching rules. A match corresponds to both Base Detector and Objectness Predictor detecting an object at a certain location, while a mismatch corresponds to only one of them detecting an object. There are three possible matching outcomes, each leading to a different prediction strategy:

• A match happens when Base Detector and Objectness Predictor reach a consensus on an object at a specific location. In this simplest case, our defense assumes the detection is correct and outputs the precise bounding box predicted by Base Detector.

• A malicious mismatch is flagged when only Objectness Predictor detects the object. This is most likely to happen when a hiding attack succeeds in fooling the conventional detector into missing the object while our Objectness Predictor still makes robust predictions. In this case, our defense sends out an attack alert.

• A benign mismatch occurs when only Base Detector detects the object. This can happen when Objectness Predictor incorrectly misses the object due to its limitations (recall the trade-off between robustness and clean performance). In this case, we trust Base Detector and output its predicted bounding box. We note that this mismatch can also be caused by other attacks that are orthogonal to the focus of this paper (we focus on the hiding attack). We will discuss strategies for defending against other attacks in Section 6.

Next, we discuss the concrete procedure for determining matching outcomes and applying the corresponding prediction strategies.

Processing detected bounding boxes.
Lines 23-28 of Algorithm 1 demonstrate the matching process for each detected bounding box. For each box b, we get its coordinates x_min, y_min, x_max, y_max, and calculate the sum of objectness scores within the same box on the objectness map. If the objectness sum is larger than zero, we assume that the bounding box b correctly matches the objectness map om. Next, we zero out the corresponding region in om to indicate that this region of objectness has been explained by the detected bounding box. On the other hand, if all objectness scores are zero, we assume it is a benign mismatch, and the algorithm does nothing.

Processing the objectness map.
The final step of the matching is to analyze the objectness map om. We use the sub-procedure DetCluster(·) to determine if any non-zero points in om form a large cluster. Specifically, we choose DBSCAN [12] as the cluster detection algorithm, which assigns each point to a certain cluster or labels it as an outlier based on the point density in its neighborhood. If DetCluster(om) returns None, it means that no cluster is found and that all objectness activations predicted by Objectness Predictor are explained by the predicted bounding boxes of Base Detector, and DetMatcher(·) returns False. On the other hand, receiving a non-empty cluster set indicates that there are clusters of unexplained objectness activations in om (i.e., Base Detector misses an object that Objectness Predictor detects). Detection Matcher regards this as a sign of a patch hiding attack and returns True.
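DetCluster(·) can be prototyped directly with scikit-learn's DBSCAN implementation, as in the hedged sketch below. Note that scikit-learn names the min_points parameter min_samples, and the parameter values shown are placeholders rather than our exact settings.

import numpy as np
from sklearn.cluster import DBSCAN

def det_cluster(om, eps=3.0, min_samples=28):
    # om: binary objectness map; returns None if no dense cluster of activated points exists
    points = np.argwhere(om > 0)
    if len(points) == 0:
        return None
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(points).labels_
    return labels if (labels >= 0).any() else None    # -1 marks outliers in DBSCAN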
Final output. Lines 5-10 demonstrate the strategy for the final prediction. If the alert flag a is True (i.e., a malicious mismatch is detected), DetectorGuard returns D* = ALERT. In other cases, DetectorGuard returns the detection D* = D.

In this subsection, we discuss the design choice of the robust classifier in Objectness Predictor. Our approach is compatible with any image classifier that is provably robust against adversarial patch attacks. In this paper, we follow PatchGuard [54] to build the robust image classifier RC(·), as it is a general defense framework and subsumes several defense instances [21, 33, 54, 60] that have state-of-the-art provable robustness and clean accuracy.

PatchGuard: backbone CNNs with small receptive fields.
The PatchGuard framework [54] proposes to use a CNN with small receptive fields to limit the impact of a localized adversarial patch. The receptive field of a CNN is the region of input pixels that each extracted feature looks at, or is affected by. If the receptive field of a CNN is too large, then a small adversarial patch has the potential to corrupt most extracted features and easily manipulate the model behavior [27, 43, 54]. There are two main design choices for CNNs with small receptive fields: the BagNet architecture [3] and an ensemble architecture using small pixel patches [21]. In our evaluation, we select BagNet as the backbone CNN for our Objectness Predictor since it achieves state-of-the-art performance on high-resolution images and is also more efficient [54].
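As a back-of-the-envelope illustration of why small receptive fields limit the patch's influence (our own estimate, not a result taken from PatchGuard or this paper): a feature can be corrupted only if its receptive field overlaps the patch, so along each axis at most roughly ⌈(p + r − 1)/s⌉ features are affected for patch size p, receptive field size r, and feature stride s.

import math

def max_affected_features(p, r, s):
    # Features along one axis whose receptive field (size r, stride s)
    # can overlap a patch of size p (illustrative upper bound).
    return math.ceil((p + r - 1) / s)

# Assumed example values: 32-pixel patch, 33x33 receptive field, stride 8.
print(max_affected_features(32, 33, 8))   # -> 8 features per axis (vs. the whole map for large receptive fields)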
PatchGuard: secure feature aggregation.
The use of BagNet ensures that a small adversarial patch can corrupt only a small number of extracted features. The second step in PatchGuard is to perform secure aggregation on the extracted features; design choices include clipping [54, 60], masking [54], and majority voting [21, 33]. In this paper, we use robust masking due to its state-of-the-art provable robustness for high-resolution image classification [54]. We provide more details of robust masking as well as its provable classification analysis in Appendix H. We will also discuss and implement other aggregation techniques in Appendix B to demonstrate the generality of our framework. Next, we discuss how to adapt and train these building blocks in the context of object detection.
Training image classifiers with object detection datasets.
Each image in an object detection dataset has multiple objects with different class labels. To train an image classifier given a list of bounding boxes and labels, we first map pixel-space bounding boxes to the feature space and get a list of cropped feature maps and labels (details of the box mapping are in Appendix F). We then teach BagNet to make a correct prediction on each cropped feature map by minimizing the cross-entropy loss between the aggregated feature prediction and the one-hot encoded label vector. In addition, we aggregate all features outside any feature box as the "negative" feature vector for the "background" classification.
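The following PyTorch-style sketch illustrates the per-box training objective described above. The feature extractor, the box-to-feature-space mapping, and the mean aggregation are simplifying assumptions; this is not our exact training code.

import torch
import torch.nn.functional as F

def box_classification_loss(feature_map, feature_boxes, labels, classifier, background_idx):
    # feature_map:   (C, H, W) features from the small-receptive-field backbone
    # feature_boxes: list of (x0, y0, x1, y1) boxes already mapped to feature-space coordinates
    # labels:        object class index for each box
    # classifier:    head mapping a C-dimensional aggregated feature to N + 1 logits
    losses = []
    outside = torch.ones(feature_map.shape[1:], dtype=torch.bool)
    for (x0, y0, x1, y1), y in zip(feature_boxes, labels):
        crop = feature_map[:, y0:y1, x0:x1]                      # features inside the box
        logits = classifier(crop.mean(dim=(1, 2)))               # aggregate, then classify
        losses.append(F.cross_entropy(logits.unsqueeze(0), torch.tensor([y])))
        outside[y0:y1, x0:x1] = False
    if outside.any():                                            # features outside all boxes -> "background"
        bg_logits = classifier(feature_map[:, outside].mean(dim=1))
        losses.append(F.cross_entropy(bg_logits.unsqueeze(0), torch.tensor([background_idx])))
    return torch.stack(losses).mean()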
In this section, we theoretically analyze the defense performance in the clean and adversarial settings. In the clean setting, we analyze the impact of false positives and false negatives in the Objectness Predictor module, and show how DetectorGuard can achieve clean performance that is only slightly lower than state-of-the-art detectors. In the adversarial setting, we formally show that DetectorGuard achieves certified/provable robustness against patch hiding attacks.
Here, we analyze the performance of the defense in the clean setting. Recall that DetectorGuard is an ensemble of Base Detector and Objectness Predictor. When we instantiate Base Detector with a state-of-the-art object detector that rarely makes mistakes on clean images (i.e., D is typically correct), Objectness Predictor becomes the major source of errors in DetectorGuard.

A false negative (FN) of Objectness Predictor will not hurt the clean performance of DetectorGuard.
Objectness Predictor has an FN when it fails to output high objectness activation for certain objects. Fortunately, an FN of Objectness Predictor will not hurt the performance of DetectorGuard because our defense will label it as a benign mismatch and trust the high-performance Base Detector by taking D as the final output (as introduced in Section 3.3).

A false positive (FP) of Objectness Predictor will trigger a false alert of DetectorGuard.
Objectness Predictor has an FP when it incorrectly outputs high objectness activation for regions that do not contain any real object. The FP will result in unexplained objectness activation in Detection Matcher and cause a false alert. Let tp, fp, fn be the TP, FP, FN of Base Detector (i.e., the vanilla undefended object detector), and let fa be the number of objects within clean images on which DetectorGuard triggers false alerts. The TP, FP, and FN of DetectorGuard then satisfy tp′ ≥ tp − fa, fp′ ≤ fp, and fn′ ≤ fn + fa. Therefore, we aim to optimize for a low fa in DetectorGuard, or equivalently a low FP in Objectness Predictor, which can be achieved with properly chosen hyper-parameters, as will be shown in Section 5.5.

In summary, DetectorGuard has a slightly lower clean performance compared with state-of-the-art detectors when we optimize for a low FP in Objectness Predictor (resulting in few false alerts in DetectorGuard). This small clean performance drop is worthwhile given the provable robustness of DetectorGuard, which we discuss in the next subsection.
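As a small numeric illustration of this accounting (our own example, not a measurement from the paper): suppose Base Detector obtains tp = 90, fp = 5, fn = 10 on a set of clean images and DetectorGuard false-alerts on fa = 2 of the correctly detected objects; the bounds above then guarantee the following worst case.

# Worst-case bookkeeping for DetectorGuard's clean performance (illustrative numbers).
tp, fp, fn = 90, 5, 10     # vanilla Base Detector on clean images
fa = 2                     # objects lost to DetectorGuard's false alerts

tp_dg_min = tp - fa        # tp' >= tp - fa  -> at least 88 true positives remain
fp_dg_max = fp             # fp' <= fp       -> at most 5 false positives
fn_dg_max = fn + fa        # fn' <= fn + fa  -> at most 12 false negatives
print(tp_dg_min, fp_dg_max, fn_dg_max)   # 88 5 12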
Recall that we consider DetectorGuard to be provably robust for a given object (in a given image) when it makes a correct detection on the clean image and will either detect part of the object or issue an alert in the presence of any patch hiding attacker within our threat model. In this subsection, we first present the sufficient condition for the provable robustness of DetectorGuard, then present our provable analysis algorithm, and finally prove its soundness.
Sufficient condition for DetectorGuard’s robustness.
First, we show in Lemma 1 that the robustness of Objectness Predictor implies the robustness of DetectorGuard. We abuse the notation "∈" by letting b ∈ D denote that one predicted box b̄ in D matches the ground-truth box b, and letting b ∈ om denote that the objectness map om has high objectness activation that matches b.

Lemma 1. Consider a given object in an image, which is represented as a bounding box b and can be correctly detected by DetectorGuard in a clean image x. DetectorGuard has provable robustness to any valid adversarial image x′, i.e., b ∈ D* or D* = ALERT for D* = DG(x′), if Objectness Predictor is robust to any valid adversarial image x′, i.e., b ∈ om = ObjPredictor(x′).

Proof. We prove by contradiction. Suppose that DetectorGuard is vulnerable to an adversarial image x′. Then we have that 1) D* ≠ ALERT and 2) b ∉ D*. From b ∈ om = ObjPredictor(x′) and D* ≠ ALERT, we must have b ∈ D = BaseDetector(x′) to avoid an ALERT. Since no alert is triggered, DG(·) returns D* = D. We then have b ∈ D = D*, which contradicts condition 2) b ∉ D*. Thus, DetectorGuard cannot be vulnerable to any adversarial image x′ when Objectness Predictor is robust.

Provable robustness of DetectorGuard.
We will use the provable analysis of the robust image classifier, denoted RC-PA(·), as the building block to prove the robustness of DetectorGuard. Given the provable analysis procedure RC-PA(·), we can reason about the objectness map output by Objectness Predictor: if its worst-case output still has high objectness activation, we can certify the provable robustness of Objectness Predictor. Finally, using Lemma 1, we can derive the robustness of DetectorGuard.

We present the provable analysis of DetectorGuard in Algorithm 2. The algorithm takes a clean image x, a ground-truth object bounding box b, and a set of valid patch locations P as inputs, and determines whether the object in bounding box b in the image x has provable robustness against any patch at any location in P. We state the correctness of Algorithm 2 in Theorem 1 and explain the algorithm details by proving the theorem.

Algorithm 2: Provable Analysis of DetectorGuard

Input: input image x, window size (w_x, w_y), matching threshold T, the set of patch locations P, the object bounding box b, provable analysis of the robust classifier RC-PA(·), cluster detection procedure DetCluster(·)
Output: whether the object b in x has provable robustness

 1: procedure DG-PA(x, w_x, w_y, T, P, b)
 2:     if b ∉ DG(x, w_x, w_y, T) then
 3:         return False                               ▷ Clean detection is incorrect
 4:     end if
 5:     for each p ∈ P do                              ▷ Check every patch location
 6:         x0, y0, p_x, p_y ← p
 7:         r ← DG-PA-One(x, x0, y0, w_x, w_y, p_x, p_y, b, T)
 8:         if r == False then
 9:             return False                           ▷ Possibly vulnerable
10:         end if
11:     end for
12:     return True                                    ▷ Provably robust
13: end procedure

14: procedure DG-PA-One(x, x0, y0, w_x, w_y, p_x, p_y, b, T)
15:     X, Y, _ ← Shape(x)
16:     ōm* ← ZeroArray[X, Y, N + 1]                   ▷ Initialization
17:     for each valid (i, j) do                       ▷ Generate the worst-case objectness map
18:         u, l ← RC-PA(x, x0 − i, y0 − j, p_x, p_y, m_x, m_y)
19:         ōm*[i : i + w_x, j : j + w_y] ← ōm*[i : i + w_x, j : j + w_y] + l    ▷ Add worst-case (lower-bound) logits
20:     end for
21:     om* ← Binarize(ōm*, T · w_x · w_y)             ▷ Binarization
22:     x_min, y_min, x_max, y_max, l ← b
23:     if DetCluster(om*[x_min : x_max, y_min : y_max]) is None then
24:         return False                               ▷ No high objectness left
25:     else
26:         return True                                ▷ High worst-case objectness
27:     end if
28: end procedure

Theorem 1. Given an object bounding box b in a clean image x, a set of patch locations P, window size (w_x, w_y), and binarizing threshold T (used in DG(·)), if Algorithm 2 returns True, i.e., DG-PA(x, w_x, w_y, T, b, P) = True, then DetectorGuard has provable robustness for the object b against any patch hiding attack using any patch location in P.

Proof. DG-PA(·) first calls DG(·) of Algorithm 1 to determine if DetectorGuard can detect the object bounding box b on the clean image x. The algorithm proceeds only when the clean detection is correct (Lines 2-4). Next, we iterate over each patch location in P and call the sub-procedure DG-PA-One(·), which analyzes the worst-case behavior over all possible adversarial strategies, to determine the model robustness. If any call of DG-PA-One(·) returns False, the algorithm returns False, indicating that at least one patch location might bypass our defense. On the other hand, if the algorithm tries all valid patch locations and never returns False, DetectorGuard is provably robust to all patch locations in P, and the algorithm returns True.

In the sub-procedure DG-PA-One(·), we analyze the robustness of Objectness Predictor against the given patch location. We use the provable analysis of the robust image classifier (i.e., RC-PA(·)) to determine the lower/upper bounds of the classification logits for each window. If the aggregated worst-case (i.e., lower-bound) objectness map still has high activation for the object of interest, we can certify the robustness of Objectness Predictor and hence of DetectorGuard (by Lemma 1). As shown in the DG-PA-One(·) pseudocode, we first initialize a zero array ōm* to hold the worst-case objectness scores. We then iterate over each sliding window and call RC-PA(·), which takes the image x (or the feature map, as discussed in Section 3.2), the relative patch coordinates (x0 − i, y0 − j), and the patch size (p_x, p_y) as inputs, and outputs the upper bound u and the lower bound l of the classification logits. (We treat RC-PA(·) as a black-box sub-procedure in Algorithm 2; more details of RC-PA(·) are available in Appendix H.) Since the goal of the hiding attack is to minimize the objectness scores, we add the lower bound of the classification logits to ōm*. After we analyze all valid windows, we call Binarize(·) on the worst-case objectness map ōm* (recall that the logits value for "background" is discarded in binarization). We then take the cropped objectness map that corresponds to the object of interest (i.e., om*[x_min : x_max, y_min : y_max]) and feed it to the cluster detection algorithm DetCluster(·). If None is returned, a hiding attack using this patch location might succeed, and the sub-procedure returns False. Otherwise, Objectness Predictor has high worst-case objectness activation and is thus robust to any attack using this patch location. This implies the provable robustness, and the sub-procedure returns True.

In this section, we provide a comprehensive evaluation of DetectorGuard on the PASCAL VOC [13] and MS COCO [23] datasets. We first introduce the datasets and models used in our evaluation, followed by our evaluation metrics. We then report our main evaluation results on different models and datasets, and finally discuss the effect of hyper-parameters.
Dataset: PASCAL VOC [13].
The detection challenge of the PASCAL Visual Object Classes (VOC) project is a popular object detection benchmark with annotations for 20 different classes. We take trainval2007 (5k images) and trainval2012 (11k images) as our training set and evaluate our defense on test2007 (5k images), which is a conventional usage of the PASCAL VOC dataset [26, 59].
Dataset: MS COCO [23].
The Microsoft Common Objects in COntext (COCO) dataset is an extremely challenging object detection dataset with 80 annotated common object categories. We use the training and validation sets of COCO2017 for our experiments. The training set has 117k images, and the validation set has 5k images.
Base Detector model: YOLOv4 [2, 49].
YOLOv4 [2] is a state-of-the-art one-stage detector that achieves optimal speed and accuracy for object detection. We choose Scaled-YOLOv4-P5 [49] in our evaluation. We adopt the same image pre-processing pipeline and network architecture as proposed in the original paper. For MS COCO, we use the pre-trained model. For PASCAL VOC, we perform transfer learning by fine-tuning the model previously trained on MS COCO.
Base Detector model: Faster R-CNN [42].
Faster R-CNN is a representative two-stage detector. We use ResNet101-FPN as its backbone network. The image pre-processing and model architecture follow the original paper. We use a pre-trained model for MS COCO and perform transfer learning to train a PASCAL VOC detector.
Base Detector model: a perfect clean detector (PCD).
We use the ground-truth annotations to simulate a perfect clean detector. The perfect clean detector always makes correct detections in the clean setting but is assumed to be vulnerable to patch hiding attacks. This hypothetical detector ablates the errors of Base Detector and helps us better understand the behavior of Objectness Predictor and Detection Matcher.
Objectness Predictor model: BagNet-33 [3].
We use BagNet-33, which has a 33 × 33 receptive field, as the backbone network of Objectness Predictor. We zero-pad each image to a square and resize it to 416 × 416 before feeding it to BagNet. We take a BagNet model that is pre-trained on ImageNet [11] and fine-tune it on our detection datasets.
Default hyper-parameters.
In Objectness Predictor, we choose to use a sliding window in the feature space, and we set the default feature-space window size to 14. We discuss the mapping between the pixel space and the feature space in Appendix F. In Detection Matcher, we set the default binarizing threshold to 10. In DetCluster(·), we use the DBSCAN [12] algorithm with eps = , min_points = 28. We analyze the effect of different hyper-parameters in Section 5.5. We will also release our source code upon publication.
Clean performance: precision and recall.
We calculate precision as TP/(TP+FP) and recall as TP/(TP+FN). For the clean images without a false alert, we follow previous works [8, 59] in setting the IoU threshold τ.

Clean performance: average precision (AP).
To remove the dependence on the confidence threshold and to have a global view of model performance, we also report AP as done in object detection research [13, 23]. We vary the confidence threshold from 0 to 1, record the precision and recall at different thresholds, and calculate AP as the averaged precision at different recall levels.
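The precision, recall, and AP computations can be summarized with a short Python sketch. This is illustrative only; it follows the simple averaging described above rather than a specific benchmark toolkit.

def precision_recall(tp, fp, fn):
    # Precision = TP / (TP + FP); Recall = TP / (TP + FN)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

def average_precision(precision_at_recall):
    # precision_at_recall: {recall_level: precision} recorded while sweeping the
    # confidence threshold; AP is the mean precision over the recorded recall levels.
    return sum(precision_at_recall.values()) / len(precision_at_recall)

# Example: precision recorded at three recall levels for one class.
print(average_precision({0.5: 0.95, 0.7: 0.90, 0.9: 0.60}))   # -> 0.8166...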
Clean performance: false alert rate (FAR@0.x).
FAR is defined as the percentage of clean images on which DetectorGuard triggers a false alert. We note that FAR is closely tied to the confidence threshold of Base Detector: a higher confidence threshold leads to fewer predicted bounding boxes, more unexplained high objectness activation, and finally a higher FAR. We report FAR at different recall levels for a global evaluation, and use FAR@0.x to denote FAR at a clean recall of 0.x.
Provable robustness: certified recall (CR@0.x).
We use certified recall (CR) as the robustness metric against patch hiding attacks. The certified recall is defined as the percentage of ground-truth objects that have provable robustness against any patch hiding attack. Recall that an object has provable robustness when DetectorGuard can detect the object in the clean setting and Objectness Predictor can output high objectness activation in the worst case (as discussed in Sections 2.3 and 4.2). Note that CR is affected by the performance of Base Detector (e.g., the confidence threshold), and we use CR@0.x to denote the certified recall at a clean recall of 0.x.

Table 2: Clean performance of DetectorGuard

                          PASCAL VOC                                    MS COCO
                          AP w/o defense  AP w/ defense  FAR@0.8        AP w/o defense  AP w/ defense  FAR@0.6
Perfect clean detector    100%            98.3%          1.5%           100%            96.3%          3.8%
YOLOv4                    92.6%           91.3%          4.1%           73.4%           71.2%          4.1%
Faster R-CNN              90.0%           88.7%          2.7%           66.7%           64.7%          3.5%

Figure 2: Clean performance of DetectorGuard on PASCAL VOC (V – vanilla; DG – DetectorGuard; PCD – perfect clean detector; FRCNN – Faster R-CNN)
In this subsection, we evaluate the clean performance of DetectorGuard with three different base detectors and two datasets. In Table 2, we report the AP of the vanilla Base Detector (AP w/o defense), the AP of DetectorGuard (AP w/ defense), and the FAR at a clean recall of 0.8 or 0.6 (FAR@0.8 or FAR@0.6). We also plot the precision-recall and FAR-recall curves for PASCAL VOC in Figure 2 for detailed model analysis; a similar plot for MS COCO is in Appendix D.
DetectorGuard has a low FAR and a high AP.
We can see from Table 2 that DetectorGuard has a low FAR of 1.5% and a high AP of 98.3% on PASCAL VOC when we use a perfect clean detector as Base Detector. The result shows that DetectorGuard has only a minimal impact on the clean performance.
DetectorGuard is highly compatible with different conventional detectors.
From Table 2 and Figure 2, we can see that when we use YOLOv4 or Faster R-CNN as Base Detector, the clean AP as well as the precision-recall curve of DetectorGuard is close to that of its vanilla Base Detector. Furthermore, FAR@0.8 for PASCAL VOC is as low as 4.1% for YOLOv4 and 2.7% for Faster R-CNN. These results show that DetectorGuard is highly compatible with different conventional detectors.
DetectorGuard works well across different datasets.
We can see that the observation of high clean performance holds across the two datasets: DetectorGuard achieves a low FAR and an AP similar to the vanilla Base Detector on both PASCAL VOC and MS COCO (the precision-recall plot for MS COCO is available in Appendix D). These similar results show that DetectorGuard is a general approach and can be used for both easier and more challenging detection tasks.
In this subsection, we first introduce the robustness evaluation setup and then report the provable robustness of our defense against any patch hiding attack within our threat model.
Setup.
We use a 32 × 32 adversarial pixel patch on the re-scaled and padded 416 × 416 images to evaluate the provable robustness. We consider all possible image locations as candidate locations for the adversarial patch. We categorize our results into three categories depending on the distance between an object and the patch location: when the patch is entirely over the object, we consider it over-patch; when the patch partially overlaps with the object, we consider it close-patch; the other patch locations are considered far-patch. For each patch location and each object, we use Algorithm 2 to determine the robustness. We note that this algorithm already accounts for all possible adaptive attacks (attacker strategies) within our threat model. We use CR@0.x as the robustness metric, and we also report the percentage of objects that can be detected by Objectness Predictor in the clean setting as Max-CR; we call it Max-CR because DetectorGuard can only certify the robustness of objects that are detected by Objectness Predictor. Given the large number of possible patch locations, we only use a 400-image subset of the test/validation datasets for evaluation (due to computational constraints).
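The robustness evaluation loop can be summarized with the hedged sketch below: an object is counted as certified for a threat model only if the per-location check (a stand-in for Algorithm 2's DG-PA-One(·)) succeeds at every candidate patch location in that category. All helper names are assumptions.

def certified_recall(objects, patch_locations, dg_pa_one, category_of):
    # objects:         ground-truth boxes that are correctly detected on clean images
    # patch_locations: candidate patch placements (e.g., every valid 32x32 position)
    # dg_pa_one:       returns True iff the object is provably robust to one location
    # category_of:     maps (object, location) to 'far', 'close', or 'over'
    certified = {'far': 0, 'close': 0, 'over': 0}
    for obj in objects:
        robust = {'far': True, 'close': True, 'over': True}
        for loc in patch_locations:
            if not dg_pa_one(obj, loc):
                robust[category_of(obj, loc)] = False
        for cat in certified:
            certified[cat] += int(robust[cat])
    return {cat: count / len(objects) for cat, count in certified.items()}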
DetectorGuard achieves the first non-trivial provable robustness against patch hiding attacks.
We report the certified recall at a clean recall of 0.8 or 0.6 (CR@0.8 or CR@0.6) in Table 3. As shown in Table 3, DetectorGuard can certify the robustness of around 30% of PASCAL VOC objects when the patch is far away from the object, which means that no attack within our threat model can successfully attack these certified objects. We also plot the CR-recall curve for PASCAL VOC in Figure 3 (a similar plot for MS COCO is in Appendix D). The figures show that the provable robustness improves as the clean recall increases, and the performance of YOLOv4 and Faster R-CNN is close to that of a perfect clean detector when the recall is close to one. (DPatch [27] demonstrates that even a 20 × 20 adversarial patch at the image corner can have a malicious effect. In Appendix A, we show that more than 15% of PASCAL VOC objects and 44% of MS COCO objects are smaller than a 32 × 32 patch. We also provide robustness results for different patch sizes as well as visualizations in Appendix A.)

Table 3: Provable robustness of DetectorGuard

                          PASCAL VOC (CR@0.8)                       MS COCO (CR@0.6)
                          far-patch  close-patch  over-patch        far-patch  close-patch  over-patch
Perfect clean detector    29.6%      21.9%        7.4%              9.5%       4.9%         2.4%
YOLOv4                    26.6%      19.9%        7.1%              8.0%       4.7%         2.4%
Faster R-CNN              27.9%      21.2%        6.7%              8.6%       4.9%         2.4%

DetectorGuard is especially effective when the patch is far away from the objects.
From Table 3 and Figure 3, we can clearly see that the provable robustness of DetectorGuard is especially good when the patch is far away from the object. This model behavior aligns with our intuition that a localized adversarial patch should only have a spatially constrained adversarial effect. Moreover, this observation shows that DetectorGuard has made the attack much more difficult: to have a chance of bypassing DetectorGuard, the adversary has to put the patch close to or even over the victim object, which is not always feasible in real-world scenarios. We also note that in the over-patch threat model, we allow the patch to be anywhere over the object. This means that the patch can be placed over the most salient part of the object (e.g., the face of a person), which makes robust detection extremely difficult.
Larger objects are more robust than small objects in DetectorGuard.
To better understand DetectorGuard's provable robustness, we plot the histogram of object sizes for PASCAL VOC in Figure 4. We categorize all objects into three groups: 1) objects that are missed by Objectness Predictor in the clean setting (missed); 2) objects that are detected by Objectness Predictor but are not provably robust (vulnerable); 3) objects that are provably robust (robust). As shown in the figure, most of the missed and vulnerable objects are small. This is expected behavior because it is hard even for humans to perfectly detect all small objects. Moreover, considering that missing a large object is much more serious than missing a small object in real-world applications, we believe that DetectorGuard has strong foundational potential.
In this subsection, we take the hypothetical perfect clean detector (PCD) as Base Detector and use the PASCAL VOC dataset to analyze the performance of DetectorGuard under different hyper-parameter settings. Note that using PCD helps us focus on the behavior of Objectness Predictor, which is the most important component in this paper.
Effect of the binarizing threshold.
Figure 3: Provable robustness of DetectorGuard on PASCAL VOC

Figure 4: Histograms of object sizes for PASCAL VOC (close-patch; results for far-patch and over-patch are in Appendix D)

We first vary the binarizing threshold T in ObjPredictor(·) to see how the model performance changes. For each threshold, we report the CR for the three patch threat models as well as the Max-CR. We also include AP and 1-FAR to understand the effect of the threshold on clean performance. We report these results in the leftmost sub-figure of Figure 5. We can see that when the binarizing threshold is low, the CR is high because more objectness is retained after the binarization. However, more objectness also makes it more likely to trigger a false alert in the clean setting, and we can see that both AP and 1-FAR are greatly affected as we decrease the threshold T. Therefore, we need to balance the trade-off between clean performance and provable robustness. In our default parameter setting, we set T = 10 to have a FAR lower than 2% while maintaining decent provable robustness.
Effect of window size.
We consider the effect of using different window sizes in the second sub-figure of Figure 5. The figure demonstrates a similar trade-off between provable robustness and clean performance. As we increase the window size, each window receives more information from the input, and therefore the clean performance (AP and 1-FAR) improves. However, a larger window size increases the number of windows that are affected by the small adversarial patch, and the provable robustness drops. In our default setting, we set the window size to 14 to have a low FAR and good CR.

Figure 5: Effect of different hyper-parameters (left to right: binarizing threshold, window size, DBSCAN ε, DBSCAN min_points)

Effect of DBSCAN parameters.
We also analyze the effect of the DBSCAN parameters in DetCluster(·). DBSCAN has two parameters, ε and min_points. A point is labeled as a core point when there are at least min_points points within distance ε of it; all core points and their neighbors will be detected as clusters. We plot the effect of ε and min_points in the right two sub-figures of Figure 5. As we increase ε or min_points, it becomes more difficult to form clusters. As a result, the clean performance improves because of fewer detected clusters and fewer false alerts. However, the provable robustness (CR) drops due to fewer detected clusters in the worst-case objectness map. We note that though fewer detected clusters and fewer false alerts lead to a slight increase in Max-CR, they do not result in a higher CR.

In this section, we discuss the limitations, extensions, and future work of DetectorGuard.
In this section, we discuss the limitations, extensions, and future work of DetectorGuard.

Robust object detection without abstention.
In this paper, we have tailored DetectorGuard for attack detection: when no attack is detected, the model makes predictions using conventional detectors; when an attack is detected, the model alerts and abstains from making predictions. This type of defense is useful in application scenarios like autonomous vehicles, which can give control back to the driver upon detecting an attack. However, the most desirable notion of robustness is to always make correct predictions without any abstention. How to extend DetectorGuard for robust object detection without abstention is an interesting direction for future work.
Robust image classifier.
DetectorGuard can be built upon any robust image classifier. In this paper, we use the robust masking of PatchGuard [54], the state-of-the-art provably robust image classifier against adversarial patch attacks, as a black-box sub-procedure. As a result, DetectorGuard inherits the limitations of its underlying image classifier. We note that robust masking only has limited robustness against multiple patches. Interestingly, we show in Appendix C that this limitation is less serious for DetectorGuard. We also emphasize that DetectorGuard can benefit from any future advances in robust image classification research.

Algorithm 3 Auxiliary predictor of DetectorGuard
procedure AuxPredictor(D, x)
    D̂ ← {}
    for i ∈ {0, 1, · · · , num_detection − 1} do
        (x_min, y_min, x_max, y_max, l) ← b ← D[i]
        l′ ← AuxClassifier(x[x_min : x_max, y_min : y_max])
        if l′ ≠ "background" then
            D̂ ← D̂ ∪ {(x_min, y_min, x_max, y_max, l′)}
        end if
    end for
    return D̂
end procedure

Defense against other attacks.
In this paper, we propose DetectorGuard as a provably robust defense against patch hiding, or false-negative (FN), attacks. Here, we discuss how to extend DetectorGuard to defend against FP attacks. The FP attack aims to introduce incorrect bounding boxes into the predictions of detectors to increase FP. We can view FP attacks as a misclassification problem (i.e., a bounding box is given an incorrect label), and thus this attack can be mitigated if we have a "perfect" auxiliary image classifier to re-classify the detected bounding boxes. If the auxiliary classifier predicts a different label, we consider it an FP attack and can easily correct or filter out the FP boxes. We provide the pseudocode for using an auxiliary classifier (which can be any robust image classifier) against FP attacks in Algorithm 3. The algorithm re-classifies each detected bounding box in D as label l′ (Line 5). We trust the auxiliary classifier and add bounding boxes with non-background labels to D̂ (Line 7). Finally, the algorithm returns the filtered detection D̂, and we can replace the D in Line 8 of Algorithm 1 with D̂ to extend the original DetectorGuard design; a minimal Python sketch of this filtering logic follows.
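The sketch below mirrors Algorithm 3 under the assumption of a hypothetical aux_classifier callable that maps an image crop to a label string (or "background"); the data structures are illustrative and not the exact implementation.

def aux_predictor(detections, image, aux_classifier):
    """Re-classify each detected box with an auxiliary (ideally provably
    robust) classifier and keep only boxes not labeled as background.

    detections: list of (x_min, y_min, x_max, y_max, label) tuples.
    image: array indexed as image[x, y] to match the paper's notation.
    aux_classifier: callable mapping an image crop to a label string."""
    filtered = []
    for (x_min, y_min, x_max, y_max, _label) in detections:
        crop = image[x_min:x_max, y_min:y_max]
        new_label = aux_classifier(crop)
        if new_label != "background":
            # Trust the auxiliary classifier's label for this box.
            filtered.append((x_min, y_min, x_max, y_max, new_label))
    return filtered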
Other improvements.

We propose DetectorGuard as a general framework and also a starting point for provably robust defenses against patch hiding attacks. We hope that other ideas, such as "co-designing" the entire system instead of using robust classifiers as a black-box procedure, can lead to further robustness improvements. Finally, we note that despite the low absolute numbers of certified recall in Section 5, the notion of provable robustness is strong, as it considers all possible patch locations and attack strategies within our threat model.
Image Classification.
Unlike most adversarial examples, which introduce a global perturbation under an L_p-norm constraint, localized adversarial patch attacks only allow (arbitrary) perturbations within a restricted region. Brown et al. [4] introduced the first adversarial patch attack against image classification and successfully realized a real-world attack by attaching a patch to the victim object. A few follow-up papers have studied variants of localized attacks against image classifiers under different threat models [19, 24, 25].

Object Detection.
Localized patch attacks against object detection have also received much attention in the past few years. Liu et al. [27] proposed DPatch as the first patch attack against object detectors in the digital domain. Lu et al. [28], Chen et al. [7], Eykholt et al. [14], and Zhao et al. [61] proposed different physical attacks against object detectors for traffic sign recognition. Thys et al. [47] proposed using a rigid physical patch to evade human detection, while Xu et al. [56] and Wu et al. [53] successfully generated non-rigid perturbations on T-shirts to evade detection.
Image Classification.
Digital Watermark (DW) [17] and Local Gradient Smoothing (LGS) [35] were the first two heuristic defenses against adversarial patch attacks. Unfortunately, these defenses are vulnerable to an adaptive attacker with knowledge of the defense. A few certified defenses [9, 21, 30, 33, 54, 60] have been proposed to provide strong provable robustness guarantees against any adaptive attacker. Notably, PatchGuard [54] achieves state-of-the-art provable robustness and clean accuracy by using CNNs with small receptive fields and secure aggregation. In contrast, DetectorGuard aims to transfer robustness from image classifiers to object detectors, building a bridge between these two domains.
Object Detection.
How to secure object detection is a much less studied area due to the complexity of the task. Saha et al. [43] demonstrated that YOLOv2 [40] was vulnerable to adversarial patches because detectors use spatial context for their predictions, and then proposed a new training loss to limit the use of context information. To the best of our knowledge, this is the only prior attempt to secure object detectors from patch attacks. However, this defense is based on heuristics and thus does not have any provable robustness. Moreover, both the attack and defense target YOLOv2 only, and it is unclear whether the defense generalizes to other detectors. In contrast, our defense has provable robustness against any patch hiding attack considered in our threat model and is compatible with any state-of-the-art object detector.
Image Classification.
Attacks and defenses for classic L_p-bounded adversarial examples [6, 15, 45] have been extensively studied. Many empirical defenses [29, 31, 32, 37, 57] were proposed to mitigate the threat of adversarial examples, but were later found vulnerable to adaptive attackers [1, 5, 48]. The fragility of empirical defenses has inspired certified defenses that are robust to any attacker considered in the threat model [10, 16, 20, 34, 38, 44, 52]. We refer interested readers to survey papers [36, 58] for a more detailed background.

Object Detection.
Global perturbations against object detectors were first studied by Xie et al. [55] and followed by many researchers [50, 51] in different applications. Defenses against global L_p perturbations are also very challenging. Zhang et al. [59] used adversarial training (AT) to improve empirical model robustness, while Chiang et al. [8] proposed the use of randomized median smoothing (RMS) for building certifiably robust object detectors. Both defenses suffer from poor clean performance, while DetectorGuard's clean performance is close to that of state-of-the-art object detectors. On PASCAL VOC, AT incurs a ~26% clean AP drop while DetectorGuard only incurs a ~1% drop (RMS [8] did not report results for PASCAL VOC). On MS COCO, both AT and RMS have a clean AP drop larger than 10%, while ours is only ~2%. We note that we do not compare robustness performance because these two works focus on global perturbations and cannot generalize to the localized patch threat model.
In this paper, we propose DetectorGuard, the first general framework for building provably robust object detectors against patch hiding attacks. DetectorGuard includes a general approach for transferring robustness from image classifiers to object detectors. Furthermore, DetectorGuard uses a detection pipeline to achieve a clean performance that is close to state-of-the-art detectors, mitigating the trade-off between clean performance and provable robustness. Our evaluation on the PASCAL VOC and MS COCO datasets demonstrates that DetectorGuard has a high clean performance that is close to state-of-the-art detectors and also achieves the first provable robustness against patch hiding attacks.

References

[1] Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 274–283, 2018.
[2] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.
[3] Wieland Brendel and Matthias Bethge. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In International Conference on Learning Representations (ICLR), 2019.
[4] Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. In Advances in Neural Information Processing Systems Workshops (NeurIPS Workshops), 2017.
[5] Nicholas Carlini and David A. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (AISec@CCS), pages 3–14, 2017.
[6] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (S&P), pages 39–57, 2017.
[7] Shang-Tse Chen, Cory Cornelius, Jason Martin, and Duen Horng Polo Chau. ShapeShifter: Robust physical adversarial attack on Faster R-CNN object detector. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 52–68. Springer, 2018.
[8] Ping-yeh Chiang, Michael Curry, Ahmed Abdelkader, Aounon Kumar, John Dickerson, and Tom Goldstein. Detection as regression: Certified object detection with median smoothing. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, 2020.
[9] Ping-Yeh Chiang, Renkun Ni, Ahmed Abdelkader, Chen Zhu, Christoph Studor, and Tom Goldstein. Certified defenses for adversarial patches. In International Conference on Learning Representations (ICLR), 2020.
[10] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 1310–1320, 2019.
[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009.
[12] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231, 1996.
[13] Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John M. Winn, and Andrew Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
[14] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Florian Tramer, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Physical adversarial examples for object detectors. In USENIX Workshop on Offensive Technologies (WOOT), 2018.
[15] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
[16] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Arthur Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 4841–4850, 2019.
[17] Jamie Hayes. On visible adversarial perturbations & digital watermarking. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), pages 1597–1604, 2018.
[18] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. Mask R-CNN. In IEEE International Conference on Computer Vision (ICCV), pages 2980–2988. IEEE Computer Society, 2017.
[19] Danny Karmon, Daniel Zoran, and Yoav Goldberg. LaVAN: Localized and visible adversarial noise. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 2512–2520, 2018.
[20] Mathias Lécuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In IEEE Symposium on Security and Privacy (S&P), pages 656–672, 2019.
[21] Alexander Levine and Soheil Feizi. (De)randomized smoothing for certifiable defense against patch attacks. arXiv preprint arXiv:2002.10733, 2020.
[22] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In IEEE International Conference on Computer Vision (ICCV), pages 2999–3007. IEEE Computer Society, 2017.
[23] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), volume 8693, pages 740–755. Springer, 2014.
[24] Aishan Liu, Xianglong Liu, Jiaxin Fan, Yuqing Ma, Anlan Zhang, Huiyuan Xie, and Dacheng Tao. Perceptual-sensitive GAN for generating adversarial patches. In The 33rd AAAI Conference on Artificial Intelligence (AAAI), pages 1028–1035. AAAI Press, 2019.
[25] Aishan Liu, Jiakai Wang, Xianglong Liu, Bowen Cao, Chongzhi Zhang, and Hang Yu. Bias-based universal adversarial patch attack for automatic check-out. In European Conference on Computer Vision (ECCV), volume 12358, pages 395–410. Springer, 2020.
[26] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV), volume 9905, pages 21–37. Springer, 2016.
[27] Xin Liu, Huanrui Yang, Ziwei Liu, Linghao Song, Yiran Chen, and Hai Li. DPatch: An adversarial patch attack on object detectors. In AAAI Conference on Artificial Intelligence Workshops (AAAI Workshops), volume 2301, 2019.
[28] Jiajun Lu, Hussein Sibai, and Evan Fabry. Adversarial examples that fool detectors. arXiv preprint arXiv:1712.02494, 2017.
[29] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.
[30] Michael McCoyd, Won Park, Steven Chen, Neil Shah, Ryan Roggenkemper, Minjune Hwang, Jason Xinyu Liu, and David Wagner. Minority reports defense: Defending against adversarial patches. arXiv preprint arXiv:2004.13799, 2020.
[31] Dongyu Meng and Hao Chen. MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 135–147, 2017.
[32] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations (ICLR), 2017.
[33] Jan Hendrik Metzen and Maksym Yatsura. Efficient certified defenses against patch attacks on image classifiers. In International Conference on Learning Representations (ICLR), 2021.
[34] Matthew Mirman, Timon Gehr, and Martin T. Vechev. Differentiable abstract interpretation for provably robust neural networks. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 3575–3583, 2018.
[35] Muzammal Naseer, Salman Khan, and Fatih Porikli. Local gradients smoothing: Defense against localized adversarial attacks. In IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1300–1307, 2019.
[36] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael P. Wellman. SoK: Security and privacy in machine learning. In IEEE European Symposium on Security and Privacy (EuroS&P), pages 399–414, 2018.
[37] Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (S&P), pages 582–597, 2016.
[38] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations (ICLR), 2018.
[39] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.
[40] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7263–7271, 2017.
[41] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
[42] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 91–99, 2015.
[43] Aniruddha Saha, Akshayvarun Subramanya, Koninika Patil, and Hamed Pirsiavash. Role of spatial context in adversarial robustness for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), pages 784–785, 2020.
[44] Hadi Salman, Jerry Li, Ilya P. Razenshteyn, Pengchuan Zhang, Huan Zhang, Sébastien Bubeck, and Greg Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems (NeurIPS), pages 11289–11300, 2019.
[45] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
[46] Mingxing Tan, Ruoming Pang, and Quoc V. Le. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10781–10790, 2020.
[47] Simen Thys, Wiebe Van Ranst, and Toon Goedemé. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), 2019.
[48] Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. arXiv preprint arXiv:2002.08347, 2020.
[49] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. Scaled-YOLOv4: Scaling cross stage partial network. arXiv preprint arXiv:2011.08036, 2020.
[50] Derui Wang, Chaoran Li, Sheng Wen, Xiaojun Chang, Surya Nepal, and Yang Xiang. Daedalus: Breaking non-maximum suppression in object detection via adversarial examples. arXiv preprint, 2019.
[51] Xingxing Wei, Siyuan Liang, Ning Chen, and Xiaochun Cao. Transferable adversarial attacks for image and video object detection. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 954–960, 2019.
[52] Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 5283–5292, 2018.
[53] Zuxuan Wu, Ser-Nam Lim, Larry S. Davis, and Tom Goldstein. Making an invisibility cloak: Real world adversarial attacks on object detectors. In European Conference on Computer Vision (ECCV), volume 12349, pages 1–17, 2020.
[54] Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, and Prateek Mittal. PatchGuard: Provable defense against adversarial patches using masks on small receptive fields. arXiv preprint arXiv:2005.10884, 2020.
[55] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan L. Yuille. Adversarial examples for semantic segmentation and object detection. In IEEE International Conference on Computer Vision (ICCV), pages 1378–1387. IEEE Computer Society, 2017.
[56] Kaidi Xu, Gaoyuan Zhang, Sijia Liu, Quanfu Fan, Mengshu Sun, Hongge Chen, Pin-Yu Chen, Yanzhi Wang, and Xue Lin. Adversarial T-shirt! Evading person detectors in a physical world. In European Conference on Computer Vision (ECCV), volume 12350, pages 665–681, 2020.
[57] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In Network and Distributed System Security Symposium (NDSS), 2018.
[58] Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 30(9):2805–2824, 2019.
[59] Haichao Zhang and Jianyu Wang. Towards adversarially robust object detection. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 421–430. IEEE, 2019.
[60] Zhanyuan Zhang, Benson Yuan, Michael McCoyd, and David Wagner. Clipped BagNet: Defending against sticker attacks with clipped bag-of-features. In IEEE Security and Privacy Workshops (Deep Learning and Security), 2020.
[61] Yue Zhao, Hong Zhu, Ruigang Liang, Qintao Shen, Shengzhi Zhang, and Kai Chen. Seeing isn't believing: Towards more robust adversarial attack against real world object detectors. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 1989–2004, 2019.
Figure 6: Histogram of PASCAL VOC object sizes
Figure 7: Histogram of MS COCO object sizes
A Object Size and Patch Size
Recall that in Section 5.4, we use a 32 × 32 patch on 416 × 416 images.

Small objects are the majority of both datasets.
In Figure 6 and Figure 7, we plot the histograms of object size (as a percentage of image pixels) for the test sets of PASCAL VOC and MS COCO. As shown in the plots, small, or even tiny, objects are the majority of both datasets. A 32 × 32 patch takes up about 0.6% of the pixels of a 416 × 416 image, and our further analysis shows that 15.2% of PASCAL VOC objects and 44.5% of MS COCO objects are smaller than 0.6% of the image pixels. Moreover, more than 36.5% of PASCAL VOC objects and more than 66.3% of MS COCO objects are smaller than a 64 × 64 patch. These numbers explain why the absolute numbers of certified recall in Table 3 are low. In Figure 8, we further provide visualizations of a 32 × 32 patch on 416 × 416 images to demonstrate the challenge of perfect robust detection even when only a small patch is present. In the left two examples, the person and the cow are completely blocked by the adversarial patch and are thus unrecognizable. In the rightmost example, the head of the dog is patched, and it is hard even for humans to determine whether it is a dog or a cat.
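For concreteness, the quoted coverage percentages follow directly from the stated patch and image dimensions:

\frac{32 \times 32}{416 \times 416} = \frac{1024}{173056} \approx 0.59\% \approx 0.6\%, \qquad
\frac{64 \times 64}{416 \times 416} = \frac{4096}{173056} \approx 2.4\%.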
Additional evaluation results for different patch sizes.
In Figure 9, we vary the patch size to see how the provable robustness is affected by different attacker capabilities (i.e., patch sizes). If we consider a smaller patch of 8 × 8 pixels, the CR is considerably higher than with the default 32 × 32 patch. From Figure 9, we can also see that the CR decreases as the patch size increases. This analysis demonstrates the limits of DetectorGuard as well as the challenge of robust object detection with larger patch sizes. We aim to push this limit further in future work.

Figure 8: Visualization of patches on small objects (upper: original 416 × 416 images; lower: images with a 32 × 32 black patch)

Figure 9: Effect of patch size on provable robustness of DetectorGuard with a perfect clean detector
B Discussion on Secure Aggregation
In Section 3.4, we follow PatchGuard [54] to build the robust image classifier. The second step of PatchGuard is to use a secure feature aggregation mechanism; design choices include clipping, robust masking, and majority voting [54]. In this section, we implement the clipping defense to demonstrate the generality of our framework.
Clipping for secure aggregation.
From Algorithm 2 in Section 4.2, we can see that we use the lower bound of the classification logits to perform the provable analysis of DetectorGuard. Therefore, we need a secure aggregation mechanism that imposes a lower bound on the logits values. Towards this end, we can clip all feature values into [0, ∞) so that an adversarial patch cannot significantly decrease the values of object classes. (Majority voting can also be regarded as a special case of clipping, i.e., clipping features to [0, 1] and then summing.) In the provable analysis, we zero out all features within the patch location(s) and then aggregate the remaining features to obtain the lower bound of the classification logits. We compare the performance of defenses with clipping-based and masking-based secure aggregation in Table 4. As we can see from the table, the clipping-based defense achieves a similar performance to robust masking, demonstrating that DetectorGuard is compatible with any provably robust image classifier.

Table 4: Comparison between masking-based and clipping-based defenses of DetectorGuard (using a perfect clean detector)

                                 PASCAL VOC                               MS COCO
                                 AP     FAR    CR-far  CR-close  CR-over  AP     FAR    CR-far  CR-close  CR-over
masking-based DetectorGuard      98.3%  1.5%   29.6%   21.9%     7.4%     96.3%  3.8%   9.5%    4.9%      2.4%
clipping-based DetectorGuard     98.4%  1.5%   27.6%   19.2%     8.1%     98.0%  2.4%   9.1%    3.9%      2.1%
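To illustrate the idea, the following is a minimal NumPy sketch of clipping-based aggregation. It assumes that class evidence is obtained by summing a per-location logits map and that the patch's feature-space footprint is known during the provable analysis; the shapes and function names are illustrative, not the exact implementation.

import numpy as np

def clipped_logits(feature_logits):
    """feature_logits: (H, W, num_classes) per-location class evidence.
    Clip to [0, +inf) so a patch cannot push class evidence below zero,
    then sum over all locations to get the aggregated classification logits."""
    return np.clip(feature_logits, 0.0, None).sum(axis=(0, 1))

def lower_bound_logits(feature_logits, patch_mask):
    """patch_mask: (H, W) boolean map of feature locations the patch may corrupt.
    Zeroing out those locations lower-bounds every class's logits, because all
    remaining (clipped) contributions are non-negative."""
    clipped = np.clip(feature_logits, 0.0, None)
    clipped[patch_mask] = 0.0
    return clipped.sum(axis=(0, 1))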
C Additional Discussion on Multiple Patches

One major limitation of robust masking in PatchGuard [54] is that it only focuses on defending against a single adversarial patch. Since our implementation in this paper takes robust masking from PatchGuard as the building-block robust image classifier, DetectorGuard also inherits this limitation. However, we note that this limitation is less serious in the context of object detection.
Multiple patches need to be close to each other and to the victim object for stronger malicious effects.
Unlike image classification, where the classifier makes predictions based on all image pixels (or extracted features), an object detector predicts each object largely based on the pixels (or features) around that object. As a result, patches that are far away from the object have only limited malicious effects; this claim is supported by our evaluation results in Section 5.4 (i.e., DetectorGuard is more effective against the far-patch threat model). Therefore, multiple patches should be close to the victim object, and hence close to each other, for a more effective attack. In this case, the multiple-patch threat model becomes similar to the one-patch model, since patches that are close to each other can merge into one single patch. That is, we can use one single patch of a larger size to cover all perturbations in multiple small patches.
Quantitative analysis of clipping-based DetectorGuard against multiple patches.
As shown in Appendix B, DetectorGuard is also compatible with a clipping-based robust image classifier. One advantage of the clipping-based robust classifier is its robustness against multiple patches. As long as the sub-procedure RC-PA(·) of the clipping-based robust classifier can return non-trivial bounds on the classification logits, we can directly plug the sub-procedure into our Algorithm 2 to analyze the robustness against multiple patches. In this section, we use the clipping-based DetectorGuard variant to demonstrate our robustness against multiple patches.

We note that despite the theoretical possibility of defending against attacks with multiple patches, the quantitative evaluation of provable robustness is extremely expensive due to the large number of possible combinations of multiple patch locations. Consider a 32 × 32 patch on a 416 × 416 image: there are about 148k possible patch locations (or 1.6k feature-space locations). If we use two patches of the same size, the number of location combinations becomes higher than 10^10 (or about 1.4M feature-space location combinations); a short worked calculation is given at the end of this appendix.

In order to provide a proof-of-concept for defense against multiple patches, we perform an evaluation on 50 PASCAL VOC images using a subset of patch locations (1/16 of all location combinations). The results are reported in Table 5. As shown in the table, DetectorGuard is able to defend against multiple patches. Moreover, if we compare the provable robustness against one 32 × 32 patch (1024 px) and two 24 × 24 patches (1152 px), which have a similar number of pixels, we find that using two smaller patches (two 24 × 24 patches) is only more effective for the over-patch threat model but not for the far-patch and close-patch threat models. This observation supports our earlier theoretical analysis.

Table 5: Provable robustness (CR) of DetectorGuard (using a perfect clean detector) against multiple patches (evaluated on 50 PASCAL VOC images with a subset of patch locations)

                               far-patch  close-patch  over-patch
one 32 × 32 patch (1024 px)    33.3%      27.8%        11.8%
two 32 × 32 patches (2048 px)  33.3%      22.9%        1.4%
two 24 × 24 patches (1152 px)  33.3%      26.4%        7.6%
two 16 × 16 patches (512 px)   32.6%      27.7%        9.0%
D Additional Experiment Results
In this section, we include additional evaluation results on the histograms of object sizes and plots of DetectorGuard's clean and provable performance on MS COCO. The observations are similar to those in Section 5.
Histogram of object sizes.
In Figure 10, we provide additional evaluation results for the histograms of object sizes. The observation is similar to Figure 4 in Section 5.
Additional plots for MS COCO.
We plot the clean performance and the provable robustness on MS COCO in Figure 11 and Figure 12. The observations are similar to those on PASCAL VOC (Figure 2 and Figure 3).

Figure 10: Histograms of object sizes for PASCAL VOC (left: far-patch; middle: close-patch; right: over-patch)

Figure 11: Clean performance of DetectorGuard on MS COCO

Figure 12: Provable robustness of DetectorGuard on MS COCO
E Justification for Defense Objective
In Section 2.3, we allow DetectorGuard to either detect only part of the object or trigger an attack alert on adversarial images. In this section, we discuss why this is a reasonable defense objective and how to extend DetectorGuard for a stronger notion of robustness.
Partially detected bounding box.
We note that we allow the patch to be anywhere, even over the salient object. As a result, the patch may cover a large portion of the object (visual examples include the right part of Figure 1 and Figure 8; see Appendix A for more details on object sizes and patch sizes). Therefore, it is reasonable to allow the model to output a smaller bounding box. If we consider the application scenario of autonomous vehicles (AVs), partially detecting a pedestrian or a car is already sufficient for an AV to make a correct decision.

Moreover, we can tune hyper-parameters such as the binarizing threshold T to increase the objectness in the output of Objectness Predictor. More objectness will force the adversary to let Base Detector predict a larger bounding box in order to reduce the unexplained objectness that would otherwise lead to an attack alert. However, we note that more objectness also makes it more likely for DetectorGuard to trigger a false alert on clean images. This trade-off between robustness and clean performance should be carefully balanced.

F Pixel-space and Feature-space Windows
Recall that in Section 3.2, we discussed that the sliding window can be either in the feature space or the pixel space. In this section, we compare pixel-space and feature-space windows and discuss how to map bounding boxes from the pixel space to the feature space.
Pixel-space and feature-space window.
As discussed in Section 3.1, almost all state-of-the-art image classifiers and object detectors use CNNs as their backbone. The convolution operation in a CNN preserves spatial relationships: nearby CNN features are extracted from nearby input pixels. This property enables us to use a sliding window in either the pixel space or the feature space. Using a feature-space window allows us to reuse the extracted feature map and reduce computational overhead, and this idea is therefore widely used in state-of-the-art object detectors. In our implementation and experiments, we also choose a feature-space sliding window for efficiency. However, in order to bound the number of corrupted features, we need to use a CNN with small receptive fields to achieve robustness. In this paper, we use BagNet-33 [3] for feature extraction. Next, we discuss how to map bounding boxes from the pixel space to the feature space.
Box mapping.
For each pixel-space box (x_min, y_min, x_max, y_max), we calculate the feature-space coordinates as x'_min = ⌊(x_min − r)/s⌋, y'_min = ⌊(y_min − r)/s⌋, x'_max = ⌈x_max/s⌉, and y'_max = ⌈y_max/s⌉, where r and s are the size and stride of the receptive field. The new feature-space coordinates indicate all features that are affected by the pixels within the pixel-space bounding box. We note that the mapping equation might differ slightly for different implementations of CNNs with small receptive fields. In our BagNet-33 implementation, we have r = 33 and s = 8.
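A minimal sketch of this mapping is given below. The receptive-field size r and stride s are passed as parameters since they depend on the particular backbone; the function name is illustrative.

import math

def box_to_feature_space(box, r, s):
    """Map a pixel-space box to the block of feature-map coordinates whose
    receptive fields overlap the box (r: receptive-field size, s: its stride).
    In practice the results would also be clamped to the feature-map bounds."""
    x_min, y_min, x_max, y_max = box
    return (math.floor((x_min - r) / s),
            math.floor((y_min - r) / s),
            math.ceil(x_max / s),
            math.ceil(y_max / s))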
G Details of BagNet

BagNet [3] was originally proposed for interpretable machine learning. It inherits the high-level architecture of ResNet-50 and replaces 3×3 convolution kernels with 1×1 kernels to reduce the receptive field size. The authors designed three BagNet architectures with small receptive fields of 9 × 9, 17 × 17, and 33 × 33, in contrast to ResNet-50, whose receptive field is 483 × 483. BagNet with the 17 × 17 receptive field can achieve a similar top-5 accuracy to AlexNet [3]. In recent works [54, 60] on adversarial patch defense, BagNet has been adopted to bound the number of corrupted features and thereby enhance robustness.
H Details of PatchGuard
Recall that Objectness Predictor is built upon a robust image classifier, and we take the robust masking of PatchGuard [54] as an instance in this paper. The robust masking algorithm first uses a CNN with small receptive fields to extract the feature map. The small receptive field limits the number of corrupted features and turns the robust classification problem into one of performing secure aggregation on a partially corrupted feature map. Due to the limited number of corrupted features, the adversary tends to create large malicious feature values (or logits values) to dominate the final aggregated prediction. To limit the influence of malicious values, the robust masking algorithm slides a mask over the extracted feature map of BagNet and masks out the region with the highest class evidence for each class. Xiang et al. [54] prove that this masking operation imposes an upper bound and a lower bound on the class evidence of each class. If the upper bound of every wrong class's evidence is no larger than the lower bound of the true class's evidence, PatchGuard can certify the robustness of the image classification. We refer interested readers to the original paper [54] for more details of the robust masking design. We note that in this paper we use the tighter provable analysis in Appendix E of the PatchGuard paper [54]. Finally, we wish to emphasize that