PredCoin: Defense against Query-based Hard-label Attack

Junfeng Guo, Yaswanth Yadlapalli, Lothar Thiele, Ang Li, and Cong Liu
UT Dallas    ETH Zurich    Google DeepMind
Abstract
Many adversarial attacks and defenses have recently been proposed for Deep Neural Networks (DNNs). While most of them are in the white-box setting, which is impractical, a new class of query-based hard-label (QBHL) black-box attacks poses a significant threat to real-world applications (e.g., Google Cloud, Tencent API). Till now, no generalizable and practical approach has been proposed to defend against such attacks. This paper proposes and evaluates PredCoin, a practical and generalizable method for providing robustness against QBHL attacks. PredCoin poisons the gradient estimation step, an essential component of most QBHL attacks. PredCoin successfully identifies gradient estimation queries crafted by an attacker and introduces uncertainty into the output. Extensive experiments show that PredCoin successfully defends against four state-of-the-art QBHL attacks across various settings and tasks while preserving the target model's overall accuracy. PredCoin is also shown to be robust and effective against several defense-aware attacks, which may have full knowledge regarding the internal mechanisms of
PredCoin.

Introduction

In recent years, Deep Neural Networks (DNNs) have been pervasively applied to many domains, such as facial recognition [37, 45], autonomous driving [41, 47, 48, 58], natural language processing [14, 17, 18, 56], binary code analysis [33, 43, 55, 59], bug detection [4, 32, 40, 53], etc. However, a series of vulnerabilities have been discovered by researchers from both academia and industry [5, 8, 10, 20, 23, 29, 31, 35, 42, 52], which prevent DNN deployment in safety-critical applications (e.g., autonomous driving [2, 29, 47, 48, 58], health robotics [26, 51]). Among the existing vulnerabilities, generating adversarial examples [5, 8, 10, 20, 29, 31, 52] is one of the most critical issues, attracting attention from both academia and industry [6, 8, 36, 50, 52, 57].

Adversarial attacks are broadly categorized into white-box attacks [5, 8, 10] and black-box attacks [3, 9, 12, 13, 25, 31, 38]. White-box attacks assume that the attacker has full knowledge of the target model (i.e., architecture and parameters), enabling the attacker to craft adversarial samples through optimization methods using the known target model's weights. In contrast, the black-box setting assumes the attacker is restricted to sending queries and observing their predictions from the target model [3, 9, 12, 13, 25, 31, 38]. Depending on the information obtained from the target model's prediction, black-box attacks can be further divided into soft-label [25, 31] and hard-label attacks [9, 12, 13, 38]. Soft-label attacks [25, 31] assume the attacker can obtain the probability distribution across classes for each given input, while hard-label attacks [9, 12, 13, 38] assume that the attacker only gets the final decision. We focus on hard-label attacks in this paper, which are more realistic in practice and may pose a significant threat to real-world applications due to their minimal knowledge requirements.

Till now, only two types of hard-label attacks have been proposed.
The first type utilizes surrogate models [38] but was proven to be impractical by recent studies [34, 50]. The second type, the Query-Based Hard-Label attack (shortened as QBHL attack throughout the paper), utilizes outputs from a set of carefully crafted queries. QBHL attacks have been empirically shown (see related works in Sec. 2) to be effective in many real-world applications, such as Google Cloud [21] and the Tencent Image Classification API [46].

Compared to the rapid development of QBHL attack techniques, defensive methods against QBHL attacks remain underexplored. There are very few works on defense against QBHL attacks in the literature. Two existing defenses [11, 30] against black-box QBHL attacks exhibit inherent shortcomings with several impractical assumptions, restricting their applicability (see related works in Sec. 2). On the other hand, most defenses against white-box attacks can be bypassed by the latest QBHL black-box attack [12]. Empirically, the most robust existing defense methods against QBHL attacks are the optimization-based approaches (e.g., adversarial training [36] and TRADES [57]). However, these approaches are not scalable in practice due to extensive retraining overheads, limiting their applicability.

This paper proposes PredCoin, a practical and generalizable framework for providing robustness against QBHL black-box attacks by exploring the properties of the gradient estimation step, an essential step within the most successful QBHL attacks. The key idea behind PredCoin is to identify when an attacker sends adversarial queries to estimate the gradient of the decision boundary in the target model. When an input is identified as an adversarial query, PredCoin introduces non-determinism into the output, ensuring that the underlying gradient is not inferable. Supported by a set of analytical reasoning (detailed in Sec. 4), PredCoin can preserve the overall accuracy of the target model while providing robustness against QBHL attacks.

We evaluate PredCoin on 4 state-of-the-art QBHL attacks: Boundary Attack (BA) [3], Sign-OPT Attack (Sign-OPT) [13], HopSkipJumpAttack (HSJA) [9], and Sign-based Flip Attack (SFA) [12]. For each attack, we evaluate the robustness of PredCoin on four tasks: MNIST, CIFAR-10, GTSRB, and ImageNet, in both ℓ2 and ℓ∞ settings. Experimental results show that PredCoin is effective against the state-of-the-art QBHL attacks, even under excessive query budgets. Importantly,
PredCoin is shown to be robust against defense-aware attacks, even considering the worst-case scenario where an attacker knows the detailed internal mechanism of PredCoin. Moreover, PredCoin is shown to be superior compared to existing defensive methods against QBHL attacks. Additionally, compared to optimization-based methods against black-box QBHL attacks, PredCoin is shown to be more scalable, as it incurs much lower training time and accuracy loss.

In summary, our work contributes in the following ways:

• We propose a practical and generalizable defensive framework against black-box query-based hard-label attacks via invalidating the essential gradient estimation step. PredCoin is a certifiable defense, fundamentally supported by analytical reasoning on this invalidation process.

• We perform extensive experiments under various tasks and settings to evaluate PredCoin against four state-of-the-art QBHL attacks. Results show PredCoin significantly improves the robustness of the model against QBHL attacks in both ℓ2 and ℓ∞ settings, even assuming an excessive query budget (50K queries).

• PredCoin is shown to be robust and effective against defense-aware attacks where the attacker may know PredCoin's internal mechanisms. Also, interestingly, PredCoin can be combined with state-of-the-art white-box defensive methods to further enhance the robustness against query-based hard-label attacks.
Background

In the context of single-label classification, the inference of a DNN can be formulated as y = F(x; θ). Specifically, a DNN F(x; θ): R^d → R^m maps an input x ∈ [0, 1]^d to y ∈ {y ∈ [0, 1]^m | Σ_{i=1}^{m} y_i = 1} through a series of computations. Each y_i = F_i(x; θ) represents the predictive confidence score for x belonging to label i in the label set [m] := {1, ..., m}. Typically, a DNN-based classifier C will assign input x to the class c, i.e., C(x) = c: {c ∈ [m] | c = argmax_i F_i(x; θ)}. A DNN model may either give the user y ∈ [0, 1]^m (soft-label) or a class c ∈ [m] (hard-label) for each input x. In this paper, we focus on the hard-label scenario.

Definition of Query-Based Hard-Label Attack.
In the context of adversarial machine learning [8, 9, 12, 20, 52], evasion attacks refer to the task of producing an adversarial example x_t given an input x∗ with its correct label c∗. According to the goal of the attacker, evasion attacks can be further categorized into targeted and untargeted attacks. The goal of an untargeted attack is to craft an x_t which would be misclassified as any c ∈ [m] \ c∗ by the target model, whereas the goal of a targeted attack is to craft an x_t which would be misclassified as a certain pre-specified c_t ∈ [m] \ c∗. Consistent with prior defensive methods, we focus on defending against the untargeted attack, since the untargeted attack is harder for the defender [8, 27, 36].

We adopt the two most common conditions an adversarial sample x_t has to satisfy in the context of an untargeted attack:

• S_{x∗}(x_t) ≥ 0, where the function S_{x∗}: R^d → R is defined as:

    S_{x∗}(x_t) = max_{c ≠ c∗} F_c(x_t; θ) − F_{c∗}(x_t; θ);    (1)

• ℓ_p = ‖x_t − x∗‖_p ≤ ε for p ∈ {2, ∞}, where ε is an arbitrarily small constant which ensures x_t is visually indistinguishable from x∗.

To indicate the success of x_t against input x∗, a new Boolean-valued function φ_{x∗}: [0, 1]^d → {−1, 1} is defined:

    φ_{x∗}(x_t) = sign(S_{x∗}(x_t)) = { 1, if S_{x∗}(x_t) > 0; −1, otherwise. }    (2)

Evasion attacks are categorized into white-box and black-box attacks, based on the attacker's knowledge specifications. A white-box attacker has access to the target model C as well as its learned parameters θ [5, 8, 10]. As a fundamental step to explore the vulnerabilities within DNNs, numerous white-box attacks have been proposed [5, 8, 10, 29]. Meanwhile, a black-box attacker is blind to the target model's structure and parameters. The attacker can only obtain the target model's output for a given input. Moreover, in the hard-label setting, the attacker can only get the final predicted label c ∈ [m].

Related Works.
Prior works generate adversarial samples by observing the target model's outputs for a set of cleverly crafted queries [3, 9, 12, 13, 38]. Papernot et al. [38] present a practical black-box attack (PA), which crafts transferable adversarial samples by constructing (using queries) a surrogate model whose decision boundaries are close to the target model's. However, Chen et al. [34] report PA to be ineffective and query-inefficient for large-scale datasets (CIFAR-10, ImageNet). Li et al. [25] present an approach that can effectively estimate the target model's gradient in the soft-label setting but is ineffective and query-inefficient in the hard-label setting. More recent works [3, 9, 12, 13] propose methods to improve the query-efficiency of hard-label attacks. Brendel et al. [3] proposed the decision boundary attack (BA), which first initializes the adversarial image in the target class. Iteratively, the distance between the adversarial image and the original input is reduced by sampling a perturbation from a Gaussian or uniform distribution. BA returns the adversarial image, which is misclassified by the target model. Cheng et al. [13] introduced the Sign-OPT attack, which finds a direction vector pointing towards the nearest decision boundary points from x∗. Chen et al. [9] proposed HopSkipJumpAttack (HSJA), which improves upon the decision-boundary attack by introducing a novel gradient direction estimation, achieving the best-performing hard-label attack in both ℓ2 and ℓ∞ settings with a query budget of 5K. Chen et al. [12] propose the Sign Flip attack (SFA), which utilizes an evolutionary algorithm [15] to improve query-efficiency. In this work, we focus on these four state-of-the-art query-based hard-label (QBHL) attacks: BA [3], Sign-OPT [13], HSJA [9], and SFA [12].

Unfortunately, there is no generalizable and practical defense approach against hard-label black-box attacks.
Recent related works [11, 30] focus on detecting black-box attacks by storing all the queries sent by an attacker and classifying a user as malicious when similar queries are seen. These methods have two serious disadvantages. On the one hand, there is no evidence showing that an average user will not ask such queries. On the other hand, they fail to consider scenarios where an attacker could own many user/zombie accounts. HSJA [9] proposed a potential defense method that assigns an "Unknown" class to low-confidence inputs, which is effective against targeted attacks but fails in the case of untargeted attacks. Finally, defensive methods for white-box attacks are ineffective against some query-based hard-label attacks [9, 12]. Other effective defensive methods have several practical issues [36, 50]. To the best of our knowledge, our work is the first generalizable and practical approach that provides robustness against black-box query-based attacks.

Threat Model

We consider a threat model consistent with the recent set of state-of-the-art hard-label attacks [9], where the attacker aims to generate an adversarial sample x_t from a given input image x∗. The attacker can only obtain the hard-label prediction, over a limited number of queries sent to C.

Defender Goal.
For each given DNN model C, the defender aims to design a mechanism that prevents the attacker from crafting adversarial samples under the attack model, while having a negligible effect on the performance (accuracy and computational cost) of C.

Defender Capability. We assume the defender can only access the current query to achieve its goal. Furthermore, the defender has access to a validation set of images and their corresponding labels.
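The attacker's interface under this threat model can be sketched as a hard-label oracle that returns only the final decision and counts queries. The class below, the toy linear scorer `W`, and the budget value are illustrative assumptions for exposition, not part of the paper's setup:

```python
import numpy as np

class HardLabelOracle:
    """Attacker-facing interface under the QBHL threat model: each query
    returns only the final decision argmax_i F_i(x; theta), never the
    confidence vector, and the total number of queries is bounded."""

    def __init__(self, predict_probs, query_budget=50_000):
        self._predict_probs = predict_probs  # the hidden model F(x; theta)
        self._budget = query_budget
        self.queries = 0

    def __call__(self, x):
        if self.queries >= self._budget:
            raise RuntimeError("query budget exhausted")
        self.queries += 1
        # Hard label only: the attacker never sees the scores themselves.
        return int(np.argmax(self._predict_probs(x)))

# Toy hidden model: a 3-class linear scorer.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
oracle = HardLabelOracle(lambda x: W @ x, query_budget=10)
label = oracle([2.0, 0.5])  # class 0 wins: scores are [2.0, 0.5, -2.5]
```

Everything the QBHL attacks in this paper do is built on top of such an interface: they must reconstruct geometric information about the decision boundary from these one-label answers alone.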
PredCoin
This section describes our analysis of the gradient estimation method, which is standard across the state-of-the-art QBHL attacks [3, 9, 12, 13]. In addition, we prove why PredCoin can provide certified robustness against attacks such as HSJA [9], the Boundary Attack [3], and the Sign-OPT attack [13] while preserving the overall accuracy of the target model.
Overview of state-of-the-art QBHL attacks.
Fig. 1 illustrates several QBHL techniques [3, 9, 13], which contain various procedures (e.g., binary search, rejection sampling, gradient estimation) to ensure their effectiveness. Take the example of HSJA, which starts with a binary search to force the adversarial sample x_t to approach a decision boundary and then estimates the normalized gradient ∇S_{x∗}(x_t)/‖∇S_{x∗}(x_t)‖ through Monte Carlo sampling. Other QBHL techniques (BA, Sign-OPT) exhibit workflows that contain similar gradient estimations, which are essential for their efficacy. Intuitively, if we could invalidate the gradient estimation step, then the most successful QBHL techniques are mitigated.

As for HSJA, through theoretical analysis, we find that Monte Carlo sampling requires two critical conditions to be true for accurately estimating the gradient. As shown in the dashed box in Fig. 1, breaking either of the two conditions hinders the gradient estimation step.
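The Monte Carlo gradient-direction estimation that these attacks share can be sketched in NumPy. The function below is a simplified illustration, not the authors' implementation; `phi` stands for the hard-label success indicator φ returning +1/−1, and the hyperplane oracle is a toy case where the true gradient direction is known:

```python
import numpy as np

def estimate_gradient_direction(phi, x_t, delta=0.1, B=100, rng=None):
    """Monte Carlo estimate of the gradient direction at a boundary point x_t.

    phi: callable returning +1 (adversarial) or -1 (not adversarial) for a
         perturbed query; this is the only signal a hard-label attacker sees.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x_t.size
    # i.i.d. directions drawn uniformly from the unit sphere in R^d
    u = rng.standard_normal((B, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # One hard-label query per draw: phi(x_t + delta * u_b)
    signs = np.array([phi(x_t + delta * ub) for ub in u])
    grad = (signs[:, None] * u).mean(axis=0)  # average of sign-weighted directions
    return grad / (np.linalg.norm(grad) + 1e-12)

# Toy oracle: the decision boundary is the hyperplane w.x = 0, so the true
# gradient direction of S at the boundary is w itself.
w = np.array([1.0, 2.0, -1.0, 0.5])
phi = lambda x: 1.0 if w @ x > 0 else -1.0
x_boundary = np.zeros(4)  # a point lying on the boundary
est = estimate_gradient_direction(phi, x_boundary, B=2000,
                                  rng=np.random.default_rng(0))
cosine = est @ (w / np.linalg.norm(w))  # close to 1 at a boundary point
```

With enough queries, the cosine between the estimate and the true gradient approaches 1, which is exactly the property PredCoin sets out to destroy.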
Analysis of Gradient Estimation.
Without loss of generality, we analyze the gradient estimation step used in HSJA, as it is a state-of-the-art generalized QBHL technique useful in both ℓ2 and ℓ∞ settings. Technically, HSJA is incremental upon existing attacks [3], with an improved gradient estimation that utilizes binary information at the nearest decision boundary [9]. Intuitively, if PredCoin could defend against HSJA by invalidating its gradient estimation, it would likely be resilient to other QBHL methods. As shown in Fig. 1, HSJA's gradient estimation step aims to determine the direction of ∇S_{x∗}(x_t). (Note that the rejection sampling step in the Boundary Attack (BA), which seeks the minimal δ for x_t, has the same goal as gradient estimation.)
[Fig. 1 diagram: HopSkipJump Attack: Binary Search → Gradient Estimation (powered by Monte Carlo sampling; the step we can break) → Update. Boundary Attack / Sign-OPT Attack: Rejection Sampling → Gradient Estimation → Update.]
Fig. 1: An overview illustration of the Boundary Attack, Sign-OPT, and HSJA.

HSJA estimates this direction through Monte Carlo sampling:

    ∇̃S_{x∗}(x_t, δ) = (1/B) Σ_{b=1}^{B} φ_{x∗}(x_t + δ u_b) u_b,    (3)

where {u_b}_{b=1}^{B} are i.i.d. draws from a uniform distribution over a d-dimensional sphere and δ is a small positive parameter. Theorem 2 from the HSJA paper [9] proves that if x_t is a boundary point of the decision function, then the following holds:

    lim_{δ→0} cos∠(E[∇̃S_{x∗}(x_t, δ)], ∇S_{x∗}(x_t)) = 1.    (4)

The proof of Eq. 4 is given in Appendix A for reference. We analyze the steps in this proof to figure out a feasible defense methodology that could disrupt the computation of Eq. 3. Broadly, the gradient estimation works because E[|β|] satisfies the following equations:

    ‖∇̃S_{x∗}(x_t, δ) − E[|β| v]‖ ≤ q,    (5)

    E[|β| v] = (1/B) Σ_{b=1}^{B} |β_b| ∇S_{x∗}(x_t)/‖∇S_{x∗}(x_t)‖ = (∇S_{x∗}(x_t)/‖∇S_{x∗}(x_t)‖) E[|β|],    (6)

where β, v, and q are as defined in Appendix A. Combining Eq. 5 and Eq. 6, we have a bound on the accuracy of the gradient estimate:

    cos∠(∇̃S_{x∗}(x_t, δ), ∇S_{x∗}(x_t)) ≥ 1 − (q / E[|β|]).    (7)

Through further exploring the distribution of q, this bound can be shown to approach 1, as proved in Theorem 1 given in the Appendix. Thus, to ensure that the accuracy of the gradient estimation is no longer bounded, we can disrupt Eq. 5 and/or Eq. 6. We find it is much easier to manipulate Eq. 6, details of which are in the next section.

PredCoin on ∇̃S_{x∗}(x_t)
As previously stated, PredCoin transforms the model's behavior when the attacker tries to perform gradient estimation (i.e., compute Eq. 3). We observe that when a gradient estimation query x_t + δu_b is sent to the model, if we return φ̃_{x∗}(x_t + δu_b), defined as

    φ̃_{x∗}(x_t + δu_b) = { φ_{x∗}(x_t + δu_b) with 0.5 probability; −φ_{x∗}(x_t + δu_b) with 0.5 probability, }    (8)

instead of φ_{x∗}(x_t + δu_b) to the attacker, then the accuracy of the gradient estimation drops significantly. Plugging in S̃_{x∗} (corresponding to φ̃_{x∗}) in place of S_{x∗} into Eq. 6, we have:

    E[|β| v] = (1/B) Σ_{b=1}^{B} |β_b| ∇S̃_{x∗}(x_t)/‖∇S̃_{x∗}(x_t)‖ → 0.    (9)

Here ∇S̃_{x∗}(x_t) denotes a random variable that reflects the non-deterministic behavior introduced by Eq. 8. Intuitively, and through empirical observations, since φ̃_{x∗} flips with a probability of 50%, the sum of ∇S̃_{x∗}(x_t) over randomly sampled queries should approach zero. Hence, by applying our defense approach, E[|β| v] becomes irrelevant to the actual ∇S_{x∗}(x_t) and E[|β|] → 0. This ensures that the accuracy of the gradient estimate is no longer bounded (Eq. 7), so the computation in Eq. 4 results in inaccurate gradient estimates (i.e., the guarantee degenerates to the trivial cos∠(∇̃S_{x∗}(x_t), ∇S_{x∗}(x_t)) ≥ −1).

However, replacing φ_{x∗} with φ̃_{x∗} on every query also affects the accuracy of the target model. Hence, the mechanism for detecting these queries needs to be very accurate. To address this issue, we use a second neural network F_Q as a classifier to help distinguish between regular inputs (x∗) and queries for gradient estimation (x_t + δu_b or x_t). From our evaluations (see Sec. 5.2), we know that such an F_Q can distinguish between normal inputs and adversarial samples with an accuracy of s. The expected accuracy loss (Δacc) of the target model can then be bounded as follows:

    E[Δacc] ≤ 1 − s.    (10)

As s → 1, PredCoin will preserve the target model's accuracy.
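A minimal sketch of this randomized response rule, assuming the detector's verdict (`is_grad_query`, standing in for F_Q's output) is already available; the function name and interface are ours, not the paper's:

```python
import numpy as np

def predcoin_predict(probs, is_grad_query, rng=None):
    """PredCoin-style response rule (sketch).

    probs:         softmax vector F(x; theta) from the target model C.
    is_grad_query: verdict that x looks like a gradient-estimation query
                   (in PredCoin this comes from the classifier F_Q).

    For a suspected query, the top label is swapped with the runner-up
    with probability 0.5, so the sign phi(x_t + delta*u_b) observed by
    the attacker is uninformative on average (Eq. 8).
    """
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(probs)[::-1]  # labels sorted by descending confidence
    if is_grad_query and rng.random() < 0.5:
        return int(order[1])         # label with second-largest confidence
    return int(order[0])             # normal hard-label answer

probs = np.array([0.1, 0.7, 0.2])
rng = np.random.default_rng(1)
# Suspected queries get the top label only about half the time...
rate = np.mean([predcoin_predict(probs, True, rng) == 1 for _ in range(10_000)])
# ...while benign inputs are always answered normally.
clean = predcoin_predict(probs, False, rng)
```

Averaged over many queries, the flipped answers cancel out, which is precisely why the sum in Eq. 9 tends to zero while benign traffic is untouched.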
PredCoin on Other Components
In the case of HSJA, PredCoin also affects steps other than gradient estimation, namely the binary searches and the geometric progression, since they also involve sending queries to the target model. While those steps ensure HSJA's efficacy, they are not as essential as gradient estimation, as evidenced by its prevalence in other QBHL techniques. SFA [12] utilizes an evolutionary algorithm [15] to improve its query-efficiency and is very effective in the ℓ∞ setting. Still, our approach provides significant robustness against SFA, as shown in Sec. 6.

PredCoin
Overview
In this section, we describe the design of PredCoin. PredCoin is an inference-phase defensive method consisting of two DNN models: F_Q, which determines if an input (x) is adversarial (x_t), and C, which is the same as the target model. The architecture of PredCoin is illustrated in Fig. 2. For each input x, C is used to calculate F(x; θ). Next, F_Q uses F(x; θ) to determine if x is adversarial. Finally, if x is adversarial, with a 50% probability we replace C(x) with the label having the second-largest confidence score within F(x; θ); otherwise, we predict the label of x as C(x). Note that all the above operations can be done in parallel on a GPU for batches of inputs without I/O between devices, ensuring low inference computational cost. F_Q, as supported by our analysis in Sec. 4.2, needs to be very accurate. In the next section, we describe how to create such an effective F_Q.

F_Q

In the following, we investigate two options to construct the detection network F_Q. As we will see, Option 2 is superior due to its detection capabilities and independence of C.

Option 1: Taking x as input. A straightforward way to build F_Q, similar to current methods for detecting adversarial samples [7], is to take x as input and output a binary variable that determines if it is adversarial. Here, F_Q manifests as a DNN with a similar structure as C except for the final layer. We first create D′_tr: {G(x) | x ∈ D_tr} and D′_te: {G(x) | x ∈ D_te} for the training phase, where G(x) represents the Monte Carlo sampling procedure in HSJA: it takes a regular input x∗ and returns a randomly selected adversarial sample x_t. D_tr and D_te are the training and testing datasets for C.

Layer Type                      Model Size
Fully Connected (Input Layer)   the size of F(x; θ)
Fully Connected + ReLU          64
Fully Connected + ReLU          64
Fully Connected + ReLU          32
Softmax (Output Layer)          2

Table 1: DNN Structure of F^i_Q

We label D_tr, D_te as x∗, and D′_tr, D′_te as x_t, and train F_Q with D_tr ∪ D′_tr. During the validation procedure, we observe that F_Q has low accuracy on D_te ∪ D′_te, likely due to the similarity between G(x) and x, especially under an untargeted attack scenario, which may lead to overfitting. Moreover, similar to related works [19, 22], this approach requires a large amount of training data that belongs to the same distribution as D_tr, thus limiting its real-world applicability.

Option 2: Taking F(x; θ) as input. To address the challenge of distinguishing x_t from x∗, we infer additional information from the given input. The Membership Inference Attack (MIA) [44] successfully identifies the distribution of an input based on its prediction vector F(x; θ) from the target model C. Motivated by this approach, we build F′_Q through shadow models F^i_Q (i ∈ [1, m]): R^m → R², one for each label i of C, which take F(x; θ) as input and give a score vector (y_{x_t}, y_{x∗}), where y_{x_t} and y_{x∗} represent the confidence of x being predicted as x_t and x∗ respectively, such that y_{x_t} + y_{x∗} = 1 and y_{x_t}, y_{x∗} ∈ (0, 1). The DNN structure of F^i_Q is shown in Table 1. We define:

    F′_Q(F(x; θ)) = F^i_Q(F(x; θ)), where i = C(x).

To validate our approach, we follow the MIA training configurations and define the following four sets:

    F′^i_tr: {F(x; θ) | x ∈ D′_tr ∧ i = C(x)},    F^i_tr: {F(x; θ) | x ∈ D_tr ∧ i = C(x)},
    F′^i_te: {F(x; θ) | x ∈ D′_te ∧ i = C(x)},    F^i_te: {F(x; θ) | x ∈ D_te ∧ i = C(x)},

where D′_tr and D′_te are obtained in the same way as in Option 1. In each iteration, we train and validate each F^i_Q ∈ F′_Q with F′^i_tr ∪ F^i_tr and F′^i_te ∪ F^i_te, respectively. We label {F(x; θ) | x ∈ x∗} and {F(x; θ) | x ∈ x_t} as x∗ and x_t, respectively.
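The shadow model F^i_Q described above (structure in Table 1) is a small multilayer perceptron over the prediction vector. Below is a forward-pass-only sketch with randomly initialized weights; training is omitted, and the class and helper names are illustrative, not the authors' code:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class FQShadow:
    """F_Q^i per Table 1: FC(m) -> FC+ReLU(64) -> FC+ReLU(64) -> FC+ReLU(32)
    -> Softmax(2). In PredCoin, each F_Q^i is trained on prediction vectors
    F(x; theta) of clean inputs (label x*) vs. adversarial inputs (label x_t);
    here the weights are just random placeholders."""

    def __init__(self, m, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        sizes = [m, 64, 64, 32, 2]
        self.layers = [
            (rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def forward(self, f_x):
        h = f_x
        for k, (W, b) in enumerate(self.layers):
            h = h @ W + b
            if k < len(self.layers) - 1:  # ReLU on all hidden layers
                h = relu(h)
        return softmax(h)  # (y_{x_t}, y_{x*}): two scores summing to 1

fq = FQShadow(m=10, rng=np.random.default_rng(0))
scores = fq.forward(np.full(10, 0.1))  # a maximally flat 10-class prediction
```

The key point is the input: F^i_Q never sees the image itself, only the m-dimensional confidence vector produced by C, which is what makes its cost independent of the input resolution.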
We use two metrics to evaluate the performance of the MIA-based approach:

• False positive rate (FP).
The number of times x∗ is incorrectly identified as x_t by F′_Q, which influences Δacc.

[Fig. 2: The Architecture of PredCoin. Step 1: the target model C computes the prediction vector F(x; θ) for an input (e.g., Panda: 0.7, Monkey: 0.2, Bird: 0.1, ...). Step 2: F_Q inspects F(x; θ) to decide whether the query is adversarial. Step 3: the returned label is either the top label (Panda) or, for a detected adversarial query with 50% probability, the runner-up (Monkey).]
Attack Method   FP (%)   FN (%)   Accuracy (%)
BA              2.16     8.82     94.51
Sign-OPT        2.16     2.42     97.71
HSJA            2.16     2.41     97.71
SFA             2.16     2.47     97.69

Table 2: False positive and false negative rates of F_Q and accuracy of PredCoin under various attack techniques.

• False negative rate (FN). The number of times x_t is misclassified as x∗, which influences the defense efficacy.

We observe that each F′_Q achieves FP and FN rates of only a few percent on ResNet for the CIFAR-10 dataset, with high overall accuracy. Moreover, training data of size only 0.05 times the original size of D_tr for C is sufficient to achieve the above results.

However, the computation cost of the MIA-based approach (F′_Q) grows proportionally to m at training time. To reduce this cost, instead of using shadow models like F^i_Q, which depend upon C(x), we propose to directly build a single DNN F_Q that takes a modified version of F(x; θ), denoted F̄(x; θ), as input and outputs the combined scores (y_Q, 1 − y_Q) directly, where y_Q is the confidence of x being x_t. F_Q's design and computation cost are independent of C. From our experiments, this new F_Q achieves performance similar to F^i_Q.

By observation, we find that in most cases the sum of the largest three F_i(x; θ) is close to 1. So we craft F̄(x; θ) by selecting the largest three F_i(x; θ) and re-arranging them in descending order to replace F(x; θ) as the input for F_Q. Thus, we have F_Q: R³ → R², with F_Q(F̄(x; θ)) = (y_Q, 1 − y_Q). We train and test the proposed F_Q with F_tr: {F̄(x; θ) | x ∈ D_tr} ∪ F′_tr: {F̄(x; θ) | x ∈ D′_tr} and F_te: {F̄(x; θ) | x ∈ D_te} ∪ F′_te: {F̄(x; θ) | x ∈ D′_te} to validate our approach. The structure of F_Q is the same as that of F^i_Q (shown in Table 1), except for the input layer size, which becomes 3.

Finally, we test the improved version of F_Q on validation data generated from the other attack techniques for the CIFAR-10 task, as shown in Table 2. F_Q performs similarly for the Sign-OPT attack, SFA, and HSJA, while having a higher FN rate for BA (≈ 8.8%).

[Fig. 3: Behavior of Δacc and median ℓ2 distance with γ.]

F_Q nevertheless provides significant robustness against all four attack techniques. Note that, since we use the same base images to evaluate each QBHL attack, the FP rate under each attack is the same (i.e., 2.16%). Moreover, to better balance the trade-off between FP and FN rates for various tasks, we introduce a threshold value γ for y_Q.
Specifically, if y_Q ≥ γ, x will be predicted as x_t. In Sec. 6.2, we further investigate the trade-off between Δacc and the robustness of PredCoin by adjusting γ.

To summarize, the improved version of F_Q described in Option 2 has two advantages: (i) it is rather effective in distinguishing x_t and x∗, and (ii) it yields low and model-agnostic computation costs both at the training and inference phases.

Evaluation

This section evaluates the overall performance of
PredCoin against four existing hard-label attacks on four prevalent tasks. Then, we compare the efficacy of PredCoin against several existing defense or detection mechanisms. We perform all experiments using a system with an Intel(R) Xeon(R) E5-2650 CPU and an NVIDIA 2080 Ti GPU, and report results as averages over ten runs.

[Table 3: Detailed dataset information and model architecture for each task.]

Algorithm 1: Binary Search for an Appropriate γ
    input: target model C, detection model F_Q, tolerable accuracy loss Δacc, validation data set
    Set γ_low = 0; Set γ_high = 1
    while |γ_high − γ_low| ≥ τ (a small tolerance) do
        Set γ ← (γ_high + γ_low) / 2
        if the accuracy loss measured at γ is ≤ Δacc then
            Set γ_low ← γ
        else
            Set γ_high ← γ
    return γ

Experimental Datasets.
We evaluate
PredCoin against existing attacks with four prevalent tasks: Hand-written Digit Recognition (MNIST), Object Classification (CIFAR-10), Traffic Sign Recognition (GTSRB), and Object Classification (ImageNet). We describe each task and its corresponding datasets below, with a summary in Table 3. We include the detailed model architectures and corresponding training configurations for each task in Tables 10, 11, and 12 in the Appendix due to space constraints.

• Hand-written Digit Recognition (MNIST). Recognize handwritten gray-scale images of 10 digits (0-9) [16]. The dataset contains 60,000 training samples and 10,000 testing samples. This task's test model is a plain convolution network consisting of 3 convolution layers and 2 fully connected layers.

• Object Classification (CIFAR-10). Classify various images into 10 different classes (e.g., plane, truck). The dataset contains 50,000 training images and 10,000 testing images. We implement the state-of-the-art model ResNet50 and crop every image to 32x32x3 for this task.

• Traffic Sign Recognition (GTSRB). Recognize 43 German traffic signs in color images. The dataset contains 35,288 training images and 12,600 testing images. Every image is cropped to 32x32x3, and an 8-layer convolution network is used as the test model.

• Object Classification (ImageNet). ImageNet is a large-scale image classification task, which contains 1,281,167 training inputs across 1,000 classes and 50,000 test inputs. For the ImageNet task, we implement ResNet50 with the input size as 224x224x3 to evaluate our approach.
Attack Configurations.
We implement the QBHL methods following the prior works [3, 9, 12, 13]. Notably, the Sign-OPT attack, whose design is specialized for the ℓ2 setting, is ineffective in the ℓ∞ setting (success rate ≤ 21% under the evaluated ‖ε‖∞ budget); hence, we do not evaluate PredCoin against Sign-OPT in the ℓ∞ setting. We evaluate our approach with 1,000 correctly classified test inputs for each attack, sampled randomly from the original dataset and evenly distributed across different classes. We set the query budgets to 30K and 50K for each attack, since previous works [3, 9, 12, 13] show that 20K queries are sufficient for each attack.

Evaluation Metrics.
To evaluate the robustness and practicality of PredCoin against various QBHL attacks, we adopt the following metrics:

• Median ℓ_p Distance.
Consistent with prior work (e.g., the C&W attack [8], HSJA [9]), we use the median ℓ_p norm distance between x∗ and its corresponding perturbed image x_t as a metric for evaluating robustness. Under the same conditions, a larger median ℓ_p value implies higher robustness.

• Attack Success Rate (ASR). Aligned with prior work [9], we compute the attack success rate (ASR) as:
    ASR := (number of evaluated inputs for which ‖x_t − x∗‖_p ≤ ε) / (total number of evaluated inputs).    (11)

The attack success rate directly measures a model's robustness under certain query budgets and perturbation magnitudes.

             30K queries                               50K queries
Task         BA        Sign-OPT  SFA       HSJA       BA        Sign-OPT  SFA       HSJA
ℓ2
MNIST        (1.780)   (1.341)   (2.231)   (1.510)    (1.603)   (1.339)   (2.29)    (1.510)
CIFAR-10     (0.1695)  (0.106)   (0.142)   (0.098)    (0.132)   (0.102)   (0.134)   (0.094)
GTSRB        (0.667)   (0.494)   (0.743)   (0.462)    (0.624)   (0.374)   (0.701)   (0.4511)
ImageNet     (2.716)   (1.134)   (5.111)   (1.019)    (2.709)   (0.912)   (2.295)   (0.842)
ℓ∞
MNIST        (0.37)    -         (0.251)   (0.193)    (0.351)   -         (0.251)   (0.186)
CIFAR-10     (0.018)   -         (0.003)   (0.006)    (0.0166)  -         (0.003)   (0.006)
GTSRB        (0.080)   -         (0.018)   (0.035)    (0.072)   -         (0.017)   (0.031)
ImageNet     (0.030)   -         (0.003)   (0.007)    (0.030)   -         (0.0027)  (0.006)

Table 4: Median ℓ_p distances under various query budgets. The bold values are results with our approach incorporated, and the values inside the brackets are results for the standalone target model.
[Fig. 4 panels (a) through (l): ASR versus ℓ2/ℓ∞ perturbation distance for MNIST, CIFAR-10, and ImageNet under HSJA and SFA, comparing the basic model and our defense at 30K and 50K query budgets.]
Fig. 4: ASR of the SFA and HSJA attacks across various settings with different thresholds for the perturbation magnitude, against the target models for MNIST, CIFAR-10, and ImageNet, with and without PredCoin.

• Conventional Accuracy Loss (Δacc). The conventional accuracy loss is computed as the change in accuracy (on the validation set) between the target model with and without the defensive method. It is used to check whether a defensive method can preserve the target model's overall accuracy.
•
Inference Time Ratio. The inference time ratio is the ratio between the inference times of the target model with and without a defensive method. It is used to evaluate the computational impact of the defensive method on various target models, and also to evaluate the defensive method's scalability by comparing this metric across target models of varying computational complexity.

[Figure 5 panels: (a) basic image; (b) adversarial image without applying PredCoin; (c) adversarial image with applying PredCoin.]
Fig. 5: An example original sample from ImageNet and the corresponding adversarial samples crafted by SFA in the ℓ∞ setting with and without applying PredCoin. The ℓ∞ distance between the original image and the generated images is: (b) 0.004, (c) 0.2031.

Task        Δacc (%)
MNIST       0.17
CIFAR-10    1.16
GTSRB       0.36
ImageNet    6.31
Table 5: Accuracy loss incurred due to PredCoin in each task on the test dataset.

Impact of the Threshold γ. As discussed in Sec. 5.2, we need a threshold γ in F_Q to identify x_t. Thus, before evaluating our approach, we first investigate how γ affects PredCoin's robustness and Δacc. We evaluate PredCoin on the CIFAR-10 dataset with varying γ in the ℓ2 setting using 100 randomly selected inputs. The results are shown in Fig. 3: a very small γ makes F_Q identify almost every input x as x_t, resulting in low accuracy; Δacc drops rapidly as γ increases toward a specific value; and once γ becomes large enough, robustness degrades, as F_Q misses more x_t. Therefore, we use a binary search to find an appropriate value for γ efficiently, as shown in Algorithm 1.

This section investigates the overall performance of
PredCoin against four QBHL attacks on four tasks. Table 4 summarizes the overall median ℓp distances across various models, settings, and attack methods under 30K and 50K query budgets. We find that PredCoin increases the median perturbation significantly in both the ℓ2 (1.7X to 18X) and ℓ∞ (2.2X to 18X) settings across the various models. Note that, for the MNIST task, PredCoin is less robust (implied by a relatively smaller increase) compared to the other tasks. This might be because the inputs for MNIST are much simpler gray-scale images; DNNs for MNIST are thus inherently robust against adversarial samples (implied by the larger-magnitude perturbations required by each attack), which is consistent with observations from previous works [9, 13]. Due to space constraints, Fig. 4 only illustrates the ASR for the two most recent attacks (SFA and HSJA) on MNIST, CIFAR-10, and ImageNet; the remaining ASR results are included in the Appendix.

As seen in Fig. 4, even with an excessive number of queries, PredCoin can still provide significant robustness, which implies that PredCoin can prevent the attacker from crafting visually indistinguishable adversarial samples in a black-box setting. An interesting observation is that, unlike the other attack methods, HSJA is more sensitive to the query budget (its ASR increases as the number of queries increases). This is because HSJA requires more samples for its gradient estimation step; hence a larger query budget (more gradient estimation steps) allows HSJA to approach closer to its optimization point. Another interesting observation is that HSJA is the most robust attack method against PredCoin (inferred by its low average increase in distance from the base model to PredCoin) compared to the other attack methods. Additionally, SFA outperforms HSJA in the ℓ∞ settings but severely underperforms in the ℓ2 settings.

Fig. 5 shows a visual comparison of the original image and the adversarial samples generated by the SFA attack in the ℓ∞ setting, with and without PredCoin. Due to space restrictions, we include a detailed demonstration of adversarial images across various settings in Fig. 13 in the Appendix. We observe that PredCoin enhances the visual differences between the original image x∗ and the crafted adversarial sample x_t for various attacks under both the ℓ2 and ℓ∞ settings, especially the ℓ∞ settings. Furthermore, in Table 5 we evaluate the Δacc of PredCoin for each task; PredCoin has an acceptable impact on the model's accuracy, with a Δacc of at most 6.31% (on ImageNet).

We compare
PredCoin with existing state-of-the-art defense mechanisms designed for white-box attacks. We choose five defense mechanisms for comparison: defensive distillation [39], region-based classification [6], TRADES [57], BitDepth [54], and adversarial training (ADV-TRAINING) [36]. For these defense mechanisms, we test against the best-performing attacks, HSJA and SFA, which are robust against several defensive methods [9, 12]. Since most state-of-the-art defensive methods support the CIFAR-10 dataset only, we conduct this study on CIFAR-10. Additionally, the unscalable nature of state-of-the-art robust optimization-based approaches (ADV-TRAINING, TRADES) makes them impractical for complicated tasks (e.g., ImageNet). Moreover, previous work [12] shows that the existing defensive methods that apply to ImageNet (e.g., BitDepth) are more vulnerable than the defensive methods for CIFAR-10. We implement each defense mechanism following the configurations of previous works [9, 12] and use the publicly available models provided by [1, 49] to evaluate TRADES [57] and ADV-TRAINING [36].
Defense Performance.
Fig. 6 compares the performance of PredCoin and the existing defense mechanisms in the ℓ∞ and ℓ2 settings. In the ℓ∞ setting, against SFA, PredCoin achieves a similar ASR to the best state-of-the-art white-box defensive method, TRADES, while ℓ∞ ≤ 0.1; beyond that, PredCoin outperforms TRADES. We also observe that the inference-phase defensive methods (BitDepth, region-based classification) fail against the SFA attack; only the robust optimization-based approaches (i.e., TRADES, ADV-TRAINING) hinder SFA. As for HSJA, consistent with the observations of previous work [12], we find that HSJA is much more sensitive to white-box defensive mechanisms. Hence, PredCoin performs a little worse than the white-box defensive methods when ℓ∞ is smaller; as ℓ∞ increases, PredCoin starts to outperform the other state-of-the-art methods.

In the ℓ2 settings, for the HSJA attack, PredCoin has a significant impact on the ASR but performs worse than the TRADES, ADV-TRAINING, and region-based classification methods. In the case of SFA, PredCoin outperforms all existing defensive methods. Interestingly, we find that BitDepth can enhance SFA in the ℓ2 settings and make the target model more vulnerable, implying a potential new vulnerability introduced by BitDepth. Moreover, similar to the ℓ∞ settings, SFA is more robust to the set of inference-phase defensive methods (i.e., region-based classification and BitDepth).

While TRADES performs much better in the ℓ2 setting against HSJA and is comparable to our approach in the other settings, PredCoin is the first successful, scalable defensive method. As discussed earlier (also see the evaluation of scalability later in this section), robust optimization-based approaches (like TRADES) suffer from unnecessary and rather expensive training costs, making them inapplicable to more complex tasks such as ImageNet. Moreover, PredCoin can be used in conjunction with TRADES, enhancing its effectiveness as shown in Sec. 6.5.
Inference Efficiency.
We also evaluate the inference time ratio for each defense mechanism on the CIFAR-10 and ImageNet datasets. We feed each defensive model and its corresponding target model a batch of 256 inputs and compute the ratio between their inference times. The results are shown in Table 6: PredCoin suffers only a 1.21x and 1.08x slowdown for the CIFAR-10 and ImageNet datasets, respectively, which is comparable to the robust optimization-based approaches (ADV-TRAINING, TRADES) and surpasses the other inference-phase defensive mechanisms (region-based classification and BitDepth). Such a low inference cost demonstrates the practicality and suitability of PredCoin for time-sensitive settings (e.g., embedded systems).

Task       Approach                      Inference Time Ratio
CIFAR-10   PredCoin (Ours)               1.219
           Region-based Classification   28.978
           Defensive Distillation        1.000
           TRADES                        1.000
           ADV-TRAINING                  1.000
           BitDepth                      1.544
ImageNet   PredCoin (Ours)               1.085
           Region-based Classification   446.538
           BitDepth                      1.508
Table 6: Inference time ratio of various defense mechanisms for the image classification task.

Task       Approach                      Δacc (%)
CIFAR-10   PredCoin (Ours)               1.16
           Region-based Classification   0
           Defensive Distillation        1.…
ImageNet   PredCoin (Ours)               6.31
           Region-based Classification   0.3
           BitDepth                      7.30
Table 7: Δacc (%) for each defensive method under CIFAR-10 and ImageNet.
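The inference time ratio measurement described above (feed both models the same batch, compare wall-clock time) can be sketched as follows. The models here are hypothetical stand-ins; a real measurement would also include warm-up runs and, for GPU models, device synchronization:

```python
import time

def inference_time_ratio(defended_infer, base_infer, batch, repeats=10):
    """Ratio of the defended model's inference time to the base model's.

    defended_infer / base_infer: callables mapping a batch to predictions.
    batch: one batch of inputs (the paper uses batches of 256 images).
    """
    def timed(fn):
        start = time.perf_counter()
        for _ in range(repeats):
            fn(batch)
        return time.perf_counter() - start

    return timed(defended_infer) / timed(base_infer)

# Toy stand-ins: a base model, and a defended model with extra per-batch work.
def base_model(batch):
    return [x * 2 for x in batch]

def defended_model(batch):
    time.sleep(0.001)          # simulate defense overhead per batch
    return base_model(batch)

ratio = inference_time_ratio(defended_model, base_model, list(range(256)))
print(ratio > 1.0)  # True: the defended model is slower
```

A ratio close to 1.0, as reported for PredCoin, means the defense adds little latency on top of the base model.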
Accuracy Loss.
Table 7 reports each defensive mechanism's overall accuracy on regular test inputs for the CIFAR-10 and ImageNet datasets. We calculate Δacc (%) between each model incorporating a defensive method and its corresponding target model. Compared to the robust optimization-based approaches (i.e., TRADES, ADV-TRAINING), PredCoin has a negligible effect (a 1.16% Δacc) for the CIFAR-10 dataset. As for ImageNet, PredCoin has a smaller Δacc (%) than BitDepth but underperforms relative to region-based classification.

Scalability.
Unfortunately, most existing defensive methods may be impractical due to their low scalability. For instance, the most potent defensive methods, the robust optimization-based approaches (e.g., TRADES, ADV-TRAINING), typically require the defender to build the model from scratch using sufficient training data with data augmentation. Such training procedures require at least dozens of hours per model, even in environments with multiple GPUs. Hence, robust optimization-based approaches do not scale to complex datasets (e.g., ImageNet). Moreover, in addition to the computational cost, optimization-based approaches cannot be built directly upon existing models, which significantly constrains their application in many real-world scenarios. For instance, consider a scenario where the defender can only obtain the model C from a third party; they may not have enough computational resources or adequate training data for robust training. Under such a scenario, a company may have to replace its whole model with new architectures and parameters; it is impractical to conduct costly robust optimization for each update.

[Figure 6: four panels plotting ASR against the ℓ∞ and ℓ2 norm distance on CIFAR-10, for HSJA and SFA, comparing our approach with region-based classification, defensive distillation, TRADES, ADV-TRAINING, and BitDepth.]
Fig. 6: ASR of PredCoin and state-of-the-art defense methods against QBHL (SFA and HSJA) attacks, across different settings and tasks.

Attack  Model          ℓ∞ Distance   ℓ2 Distance
SFA     DenseNet121    (0.003)       (2.103)
        ResNet.V2.50   (0.003)       (2.145)
        ResNet.V2.101  (0.003)       (2.209)
        VGG16          (0.003)       (1.716)
        Inception.V3   (0.003)       (2.276)
        Xception       (0.003)       (2.743)
HSJA    DenseNet121    (0.006)       (0.89)
        ResNet.V2.152  (0.006)       (0.83)
        ResNet.V2.101  (0.007)       (0.83)
        VGG16          (0.006)       (0.82)
        Inception.V3   (0.062)       (1.145)
        Xception       (0.067)       (1.103)
Table 8: Summary of the median ℓp distances of PredCoin under various publicly available models for SFA and HSJA with a query budget of 50K. The values outside the parentheses represent the median perturbation under the model incorporating our approach, while the values inside the parentheses represent that under the basic model.

Model          Accuracy Loss (%)   Inference Time Ratio
DenseNet121    6.37                1.034
ResNet.V2.50   6.31                1.087
ResNet.V2.101  6.33                1.034
VGG16          6.27                1.056
Xception       6.38                1.037
InceptionV3    6.35                1.036
Table 9: Summary of the accuracy loss and inference time ratio of PredCoin on a set of publicly available DNN models.
[Figure 7: four panels plotting ASR against the ℓ∞ and ℓ2 norm distance on CIFAR-10, for HSJA and SFA, comparing our approach, TRADES, and the combination of TRADES and our approach.]
Fig. 7: ASR comparison between TRADES, our approach, and the combination of TRADES and our approach.

In contrast,
PredCoin can be built upon existing publicly shared models with a training time of under 30 minutes using a single GPU. To further illustrate our approach's scalability and generalizability, we evaluate
PredCoin on six publicly available state-of-the-art pre-trained models for ImageNet provided by the Keras team [28]. The results, shown in Tables 8 and 9, indicate that PredCoin provides significant robustness against state-of-the-art QBHL attacks while having a low impact on their accuracy and inference times.
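The Δacc numbers reported in these tables are simple to reproduce for any pair of base and defended models. A minimal sketch, with toy predictors standing in for the real models (the callables and data here are illustrative, not the paper's pipeline):

```python
def accuracy(predict, inputs, labels):
    """Top-1 accuracy of a predict(inputs) -> predictions callable."""
    preds = predict(inputs)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)

def accuracy_loss(base_predict, defended_predict, inputs, labels):
    """Delta-acc: drop in validation accuracy caused by the defense, in %."""
    return 100.0 * (accuracy(base_predict, inputs, labels)
                    - accuracy(defended_predict, inputs, labels))

# Toy example: the base model is always right; the defended model
# misclassifies one of ten validation inputs.
inputs = list(range(10))
labels = list(range(10))
base = lambda xs: xs                                    # perfect predictions
defended = lambda xs: [x if x != 0 else 1 for x in xs]  # one mistake
print(round(accuracy_loss(base, defended, inputs, labels), 6))  # 10.0
```

Running this over each pre-trained model with and without the defense, on the same validation set, yields per-model Δacc values like those in Table 9.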
Since
PredCoin is an inference-phase defensive method, we investigate whether our approach can be used in conjunction with methods like TRADES [49], which conduct robust optimization during the target model's training phase. We conduct our experiments on the CIFAR-10 dataset. (For models with different pre-processing procedures, we intentionally normalize the perturbation into [0, 1].) PredCoin significantly enhances TRADES under both HSJA and SFA in the ℓ2 and ℓ∞ settings. Moreover, the accuracy loss compared to the adversarially trained model produced by TRADES is merely about 3%. Hence, PredCoin can be used in conjunction with state-of-the-art white-box defensive methods to achieve a significantly more robust model.
This section investigates whether PredCoin is robust to adaptive attacks. We evaluate the robustness of PredCoin on the CIFAR-10 dataset under the worst-case scenario, where the attacker is fully aware of the details of PredCoin and the parameters of F_Q. Since PredCoin has two main steps, (i) the detection of x_t by F_Q and (ii) the non-deterministic mechanism, we modify HSJA and SFA to produce two types of adaptive attacks, one bypassing each step: (i) bypassing the detection of F_Q, and (ii) conducting an uncertainty-aware attack.

Bypassing the Detection of F_Q. The effectiveness of PredCoin critically depends on the accuracy of F_Q. While computing the gradient estimation, an attacker could bypass F_Q by producing queries that are adversarial to F_Q. Given some boundary point x_t and directions {u_b}, b = 1, ..., B, to evaluate the robustness of PredCoin against such an adaptive attack, we formulate an optimization problem that generates gradient estimation queries x_t + δ u_b. With y_Q denoting the detection score F_Q(F(x_t + δ u_b; θ)),

    δ_b = min { δ | y_Q < γ and x_t + δ u_b ∈ [0, 1]^d },   (12)

recalling that y_Q ≥ γ is the condition for detection in the proposed F_Q. We test 1,000 random boundary points x_t computed using the CIFAR-10 dataset. Interestingly, since F_Q takes the output of C(F(x_t; θ)) as input, for the adaptive attack to succeed, the gradient estimation queries have to bypass both C and F_Q simultaneously, which our experiments show is very difficult. On average, for a boundary point x_t, a valid δ_b (i.e., a query that bypasses F_Q) is found for only 6% of the samples x_t + δ_b u_b.

The adaptive attack performs much worse than the original attack in most cases, except for SFA (ℓ2), as shown in Fig. 8. This observation can be attributed to two reasons. First, there are very few queries x_t + δ_b u_b (around 6%) that can bypass F_Q, limiting the number of distinct queries sent for gradient estimation and leading to more flawed estimates. Second, from Eq. 4, accurate gradient estimates require a smaller δ, but a large δ is required for bypassing F_Q, again ensuring inaccurate gradient estimates.
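Eq. 12 can be approximated with a simple line search over δ. The sketch below uses a toy distance-based detector as a stand-in for F_Q (an assumption for illustration only); it exhibits the tension noted above, where only large steps evade detection:

```python
import numpy as np

def smallest_bypassing_delta(x_t, u_b, detector_score, gamma,
                             deltas=np.linspace(1e-3, 1.0, 1000)):
    """Smallest step delta such that the query x_t + delta * u_b stays in
    [0, 1]^d and its detector score falls below the threshold gamma
    (i.e., the query evades detection). Returns None if no delta qualifies.
    """
    for delta in deltas:
        query = x_t + delta * u_b
        if query.min() < 0.0 or query.max() > 1.0:
            continue  # out of the valid input domain
        if detector_score(query) < gamma:
            return float(delta)
    return None

# Toy detector: scores queries by their closeness to x_t, so only
# sufficiently large steps evade it.
x_t = np.full(4, 0.5)                    # a boundary point
u_b = np.array([1.0, 0.0, 0.0, 0.0])     # a unit sampling direction
score = lambda q: 1.0 - np.linalg.norm(q - x_t)  # high score near x_t
delta = smallest_bypassing_delta(x_t, u_b, score, gamma=0.75)
print(delta is not None and 0.24 < delta < 0.26)  # True: roughly delta > 0.25 evades
```

Because the bypassing δ is large, the finite-difference gradient estimate built from such queries is inaccurate, matching the second reason given above.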
[Figure 8: four panels plotting ASR against the ℓ∞ and ℓ2 norm distance on CIFAR-10, for SFA and HSJA, comparing the original attack with the adaptation-aware attack.]
Fig. 8: ASR of PredCoin under adaptive attacks bypassing the detection of F_Q.
[Figure 9: four panels plotting ASR against the ℓ∞ and ℓ2 norm distance on CIFAR-10, for SFA and HSJA, comparing the attack without defense, the original attack with defense, and uncertainty-aware attacks with k = 3, 5, 7.]
Fig. 9: ASR of PredCoin against uncertainty-aware attacks with various values of k.

Uncertainty-Aware Attack. Another option for the attacker is to exploit
PredCoin's non-deterministic nature. Specifically, the attacker can send k (k ≥ 2) identical samples in the gradient estimation step, making F(x_t; θ) approach F_Q's decision boundary, which causes C to misclassify x_t. The algorithm is detailed as Algorithm 2 in the Appendix.

[Figure 10: four panels plotting ASR against the ℓ∞ and ℓ2 norm distance on CIFAR-10, for SFA and HSJA, comparing the attack without defense, the original attack with defense, our countermeasure, and uncertainty-aware attacks with k = 3, 5, 7.]
Fig. 10: ASR of our countermeasure and the original PredCoin against the original and uncertainty-aware attacks with various k.
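The repeated-query probe above can be simulated. Under the simplifying assumption that the defense answers each detected query independently, flipping the hard label with probability 1/2 (a toy model of the non-deterministic mechanism, not PredCoin's exact implementation), the chance that all k repeated answers agree, so the attacker never notices the randomization, is (1/2)^(k−1):

```python
import random

def defended_oracle(true_label):
    """Toy stand-in for the defended hard-label model: for a detected
    gradient-estimation query, flip the binary label with probability 1/2."""
    return true_label if random.random() < 0.5 else 1 - true_label

def all_answers_agree(k, true_label=1):
    """Send k identical queries; report whether every answer matched."""
    answers = {defended_oracle(true_label) for _ in range(k)}
    return len(answers) == 1

def estimate_agreement_prob(k, trials=100_000):
    random.seed(0)  # deterministic toy experiment
    hits = sum(all_answers_agree(k) for _ in range(trials))
    return hits / trials

# With independent fair coin flips, all k answers agree with probability
# (1/2)**(k-1) -- the chance the randomization goes unnoticed.
for k in (2, 3, 5):
    print(k, estimate_agreement_prob(k), 0.5 ** (k - 1))
```

As k grows, the attacker almost always detects the randomization (or, by majority voting, averages it out), which is why a deterministic, input-induced countermeasure is needed instead.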
Fig. 11: ASR comparison between TRADES, PredCoin, our countermeasure, the combination of TRADES and PredCoin, and the combination of TRADES and our countermeasure.

We evaluate the uncertainty-aware attack with various values of k, as shown in Fig. 9. We observe that as k increases, PredCoin gradually becomes invalid. This is because, under our approach, each inference's uncertainty is associated with (1/2)^(k−1); thus, once k is large enough, PredCoin cannot cause enough uncertainty to disrupt the gradient estimation step. However, such adaptive attacks have two main issues: first, multiple passes of the same input sent simultaneously can easily be detected; second, the adaptive attack's adversarial samples have only a 50% success rate.

We propose a countermeasure against such uncertainty-aware attacks. Instead of adding a probabilistic nature to φ_x∗(x_t), we can add input-induced behavior to ensure E[|β|_v] → 0, securing the validity of Eq. 9. This removes the uncertainty in PredCoin, thwarting these uncertainty-aware attacks. Specifically, for each detected x_t, we leverage the last digit f of the sum of Conv(x_t), where Conv(x_t) is the output of the first convolutional layer of the base DNN. If f is an even (odd) number, we change φ_x∗(x_t); otherwise, we do not change it. Intuitively, the high complexity and non-linearity of DNNs ensure that f is even or odd with a probability close to 50%. Furthermore, empirical observations show that the countermeasure causes an accuracy loss similar to that of the original approach (e.g., 6.71% for ImageNet).

Fig. 10 shows our countermeasure's results: its performance is comparable to our approach against HSJA (ℓ2, ℓ∞) and SFA (ℓ∞). Like the adaptive attack in Sec. 7.1, the uncertainty-aware attack performs well with SFA in the ℓ2 setting, indicating that the gradient estimation step in SFA (ℓ2) is more effective than in the other settings. Also, the adaptive attacks fail for all k values against our countermeasure. The downside of our countermeasure is the additional cost of computing the sum of Conv(x_t). We can easily combine our original approach and the countermeasure to provide additional robustness against QBHL attacks; specifically, we can arbitrarily choose to apply the countermeasure for every set number of inputs.

Similar to Sec. 6.5, we investigate whether our countermeasure is also viable in conjunction with robust optimization-based approaches. As seen in Fig. 11, our countermeasure enhances the performance of TRADES significantly.

This paper presents
PredCoin, the first practical and generalizable framework to provide robustness for DNN models against hard-label query-based attacks. Inspired by a theoretical analysis of gradient estimation, the most critical procedure within most hard-label query-based attacks, PredCoin defends against such attacks by breaking the underlying condition for gradient estimation. PredCoin leverages a second DNN model to distinguish between normal and adversary queries, and then introduces uncertainty into the predictions for the adversary queries. PredCoin provides significant robustness against state-of-the-art hard-label query-based attacks under practical scenarios, with minimal accuracy loss and computational cost. PredCoin can also enhance state-of-the-art white-box defensive methods against query-based hard-label attacks. Finally,
PredCoin is still robust against a set of adaptive attacks that have full knowledge of its internal mechanisms.

References

[1] Adv-training model. https://github.com/MadryLab/cifar10_challenge.
[2] Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, and Urs Muller. Explaining how a deep neural network trained with end-to-end learning steers a car. arXiv preprint arXiv:1704.07911, 2017.
[3] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models, 2018.
[4] Amar Budhiraja, Kartik Dutta, Raghu Reddy, and Manish Shrivastava. DWEN: Deep word embedding network for duplicate bug report detection in software repositories. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, pages 193–194, 2018.
[5] Paul H. Calamai and Jorge J. Moré. Projected gradient methods for linearly constrained problems. Mathematical Programming, 39(1):93–116, 1987.
[6] Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278–287, 2017.
[7] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.
[8] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, pages 39–57. IEEE, 2017.
[9] Jianbo Chen, Michael I. Jordan, and Martin J. Wainwright. HopSkipJumpAttack: A query-efficient decision-based attack. In IEEE Symposium on Security and Privacy, pages 1277–1294. IEEE, 2020.
[10] Jinyin Chen, Yangyang Wu, Xuanheng Xu, Yixian Chen, Haibin Zheng, and Qi Xuan. Fast gradient attack on network embedding. arXiv preprint arXiv:1809.02797, 2018.
[11] Steven Chen, Nicholas Carlini, and David Wagner. Stateful detection of black-box adversarial attacks. In Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, pages 30–39, 2020.
[12] Weilun Chen, Zhaoxiang Zhang, Xiaolin Hu, and Baoyuan Wu. Boosting decision-based black-box adversarial attacks with random sign flip. In European Conference on Computer Vision, pages 276–293. Springer, 2020.
[13] Minhao Cheng, Simranjit Singh, Patrick H. Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. Sign-OPT: A query-efficient hard-label adversarial attack. CoRR, abs/1909.10773, 2019.
[14] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pages 160–167, 2008.
[15] Kalyanmoy Deb, Ashish Anand, and Dhiraj Joshi. A computationally efficient evolutionary algorithm for real-parameter optimization. Evolutionary Computation, 10(4):371–395, 2002.
[16] Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
[17] Yoav Goldberg. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57:345–420, 2016.
[18] Yoav Goldberg. Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies, 10(1):1–309, 2017.
[19] Zhitao Gong, Wenlu Wang, and Wei-Shinn Ku. Adversarial and clean data are not twins, 2017.
[20] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[21] Google. Google Cloud. https://cloud.google.com/.
[22] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. Adversarial perturbations against deep neural networks for malware classification, 2016.
[23] Junfeng Guo and Cong Liu. Practical poisoning attacks on neural networks. In European Conference on Computer Vision, pages 142–158. Springer, 2020.
[24] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[25] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In International Conference on Machine Learning, pages 2137–2146. PMLR, 2018.
[26] Alaa Abdulhady Jaber and Robert Bicker. Fault diagnosis of industrial robot gears based on discrete wavelet transform and artificial neural network. Insight - Non-Destructive Testing and Condition Monitoring, 58(4):179–186, 2016.
[27] Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328, 2017.
[28] Keras. Keras model. https://keras.io/api/applications/.
[29] Zelun Kong, Junfeng Guo, Ang Li, and Cong Liu. PhysGAN: Generating physical-world-resilient adversarial examples for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14254–14263, 2020.
[30] Huiying Li, Shawn Shan, Emily Wenger, Jiayun Zhang, Haitao Zheng, and Ben Y. Zhao. Blacklight: Defending black-box adversarial attacks on deep neural networks. arXiv preprint arXiv:2006.14042, 2020.
[31] Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, and Boqing Gong. NATTACK: Learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In International Conference on Machine Learning, pages 3866–3876. PMLR, 2019.
[32] Yi Li, Shaohua Wang, Tien N. Nguyen, and Son Van Nguyen. Improving bug detection via context-based code representation learning and attention-based neural networks. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1–30, 2019.
[33] Bingchang Liu, Wei Huo, Chao Zhang, Wenchao Li, Feng Li, Aihua Piao, and Wei Zou. αDiff: Cross-version binary code similarity detection with DNN. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pages 667–678, 2018.
[34] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
[35] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. Trojaning attack on neural networks. 2017.
[36] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[37] Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, et al. DeepID-Net: Deformable deep convolutional neural networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2403–2412, 2015.
[38] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security, pages 506–519, 2017.
[39] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, pages 582–597. IEEE, 2016.
[40] Michael Pradel and Koushik Sen. DeepBugs: A learning approach to name-based bug detection. Proceedings of the ACM on Programming Languages, 2(OOPSLA):1–25, 2018.
[41] Ahmad EL Sallab, Mohammed Abdou, Etienne Perot, and Senthil Yogamani. Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19):70–76, 2017.
[42] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1528–1540, 2016.
[43] Eui Chul Richard Shin, Dawn Song, and Reza Moazzezi. Recognizing functions in binaries with neural networks. In USENIX Security Symposium (USENIX Security 15), pages 611–626, 2015.
[44] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models, 2017.
[45] Yi Sun, Xiaogang Wang, and Xiaoou Tang. Deep learning face representation from predicting 10,000 classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1891–1898, 2014.
[46] Tencent. Tencent Image Classification API. https://ai.qq.com/hr/youtu.shtml.
[47] Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering, pages 303–314, 2018.
[48] Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In
Proceedings of the 40thinternational conference on software engineering , pages303–314, 2018.[49] TRADES. TRADES model. https://github.com/yaodongyu/TRADES .[50] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, IanGoodfellow, Dan Boneh, and Patrick McDaniel. Ensem-ble adversarial training: Attacks and defenses. arXivpreprint arXiv:1705.07204 , 2017.[51] Arun T Vemuri and Marios M Polycarpou. Neural-network-based robust fault diagnosis in robotic systems.
IEEE Transactions on neural networks , 8(6):1410–1420,1997.[52] Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He,Mingyan Liu, and Dawn Song. Generating adversar-ial examples with adversarial networks. arXiv preprintarXiv:1801.02610 , 2018.[53] Yan Xiao, Jacky Keung, Qing Mi, and Kwabena E Ben-nin. Bug localization with semantic and structural fea-tures using convolutional neural network and cascadeforest. In
Proceedings of the 22nd International Confer-ence on Evaluation and Assessment in Software Engi-neering 2018 , pages 101–111, 2018.[54] Weilin Xu, David Evans, and Yanjun Qi. Feature squeez-ing: Detecting adversarial examples in deep neural net-works. arXiv preprint arXiv:1704.01155 , 2017.[55] Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song,and Dawn Song. Neural network-based graph embed-ding for cross-platform binary code similarity detection.In
Proceedings of the 2017 ACM SIGSAC Conferenceon Computer and Communications Security , pages 363–376, 2017.[56] Wenpeng Yin, Katharina Kann, Mo Yu, and HinrichSchütze. Comparative study of cnn and rnn for naturallanguage processing. arXiv preprint arXiv:1702.01923 ,2017.[57] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing,Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In
International Conference on Machine Learning , pages7472–7482. PMLR, 2019.[58] Husheng Zhou, Wei Li, Zelun Kong, Junfeng Guo,Yuqun Zhang, Bei Yu, Lingming Zhang, and Cong Liu.Deepbillboard: Systematic physical-world testing of au-tonomous driving systems. In ,pages 347–358. IEEE, 2020.[59] Fei Zuo, Xiaopeng Li, Patrick Young, Lannan Luo,Qiang Zeng, and Zhexin Zhang. Neural machine transla-tion inspired binary code similarity comparison beyondfunction pairs. arXiv preprint arXiv:1808.04706 , 2018.16
A Proof for gradient estimation in HSJA
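The quantity analyzed in this appendix is the Monte Carlo estimator $\widetilde{\nabla} S_{x^*}(x_t) = \frac{1}{B}\sum_{b=1}^{B} \phi_{x^*}(x_t + \delta u_b)\, u_b$. As a concreteness aid, a minimal NumPy sketch of this estimator follows; it is illustrative only, and `phi` stands for any $\pm 1$ hard-label oracle (an assumption, not the authors' code):

```python
import numpy as np

def estimate_gradient_direction(phi, x_t, delta=1e-3, num_samples=20000, rng=None):
    """HSJA-style Monte Carlo estimate of the gradient direction of S_{x*} at x_t.

    phi(x) must return +1 or -1 (the hard-label indicator); the average of
    phi(x_t + delta * u_b) * u_b over random unit directions u_b points along
    grad S_{x*}(x_t) up to the O(L * delta / 2) error bounded in the proof.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    d = x_t.shape[0]
    # Draw directions uniformly on the unit sphere in R^d.
    u = rng.standard_normal((num_samples, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    signs = np.array([phi(x_t + delta * u_b) for u_b in u])
    g = (signs[:, None] * u).mean(axis=0)  # (1/B) * sum_b phi * u_b
    return g / np.linalg.norm(g)
```

On a linear decision function, for instance, the estimate aligns closely with the true gradient direction, which is what the cosine-similarity guarantee of this appendix formalizes.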
Theorem 1. According to HSJA [9], given an orthonormal basis of $\mathbb{R}^d$: $v_1 = \frac{\nabla S_{x^*}(x_t)}{\|\nabla S_{x^*}(x_t)\|}, v_2, \ldots, v_d$, each $u_b$ can be represented as $u_b = \sum_{i=1}^{d} \beta_i v_i$ with $\beta_i \in (-1, 1)$. With sufficient samples for Monte Carlo estimation, the effectiveness of the gradient estimation for HSJA is guaranteed as $\lim_{\delta \to 0} \cos\angle(\widetilde{\nabla} S_{x^*}(x_t), \nabla S_{x^*}(x_t)) = 1$.

Proof. By Taylor's expansion, we have
$$S_{x^*}(x_t + \delta u) = S_{x^*}(x_t) + \delta \nabla S_{x^*}(x_t)^T u + \tfrac{\delta^2}{2}\, u^T \nabla^2 S_{x^*}(\tilde{x}_t)\, u,$$
where $u$ is a random unit vector and $\tilde{x}_t$ lies between $x_t$ and $x_t + \delta u$. Since $x_t$ is intentionally placed on the decision boundary, $S_{x^*}(x_t) = 0$, and therefore
$$S_{x^*}(x_t + \delta u) = \delta \nabla S_{x^*}(x_t)^T u + \tfrac{\delta^2}{2}\, u^T \nabla^2 S_{x^*}(\tilde{x}_t)\, u. \quad (13)$$
Since $S$ has a Lipschitz-continuous gradient with constant $L$, the second-order term can be bounded as
$$\left|\tfrac{\delta^2}{2}\, u^T \nabla^2 S_{x^*}(\tilde{x}_t)\, u\right| \le \tfrac{L\delta^2}{2}. \quad (14)$$
Therefore, we have
$$\phi_{x^*}(x_t + \delta u_b) = \begin{cases} 1, & \nabla S_{x^*}(x_t)^T u_b > \tfrac{L\delta}{2}, \\ -1, & \nabla S_{x^*}(x_t)^T u_b < -\tfrac{L\delta}{2}. \end{cases} \quad (15)$$
Given the orthonormal basis $v_1 = \frac{\nabla S_{x^*}(x_t)}{\|\nabla S_{x^*}(x_t)\|}, v_2, \ldots, v_d$, each $u_b$ can be represented as $u_b = \sum_{i=1}^{d} \beta_i v_i$ with $\beta_i \in (-1, 1)$. Let $q$ be the probability of the event $|\nabla S_{x^*}(x_t)^T u_b| \le \tfrac{L\delta}{2}$. We can then bound the difference between $\widetilde{\nabla} S_{x^*}(x_t) = \mathbb{E}[\phi_{x^*}(x_t + \delta u_b)\, u_b]$ and $\mathbb{E}[|\beta_1|]\, v_1$ as
$$\|\widetilde{\nabla} S_{x^*}(x_t) - \mathbb{E}[|\beta_1|]\, v_1\| \le 2q. \quad (16)$$
Moreover, $\mathbb{E}[|\beta_1|]\, v_1$ and $\nabla S_{x^*}(x_t)$ are easily bridged:
$$\mathbb{E}[|\beta_1|]\, v_1 = \frac{\nabla S_{x^*}(x_t)}{\|\nabla S_{x^*}(x_t)\|}\, \mathbb{E}[|\beta_1|]. \quad (17)$$
Combining Eq. 16 and Eq. 17, we have
$$\cos\angle(\widetilde{\nabla} S_{x^*}(x_t), \nabla S_{x^*}(x_t)) \ge 1 - 2\left(\frac{q}{\mathbb{E}[|\beta_1|]}\right)^2. \quad (18)$$
Observing that $\langle \frac{\nabla S_{x^*}(x_t)}{\|\nabla S_{x^*}(x_t)\|}, u \rangle^2$ follows a Beta distribution $B(\tfrac{1}{2}, \tfrac{d-1}{2})$, we can bound $q$ as
$$q \le \frac{L\delta}{B(\tfrac{1}{2}, \tfrac{d-1}{2})\, \|\nabla S_{x^*}(x_t)\|}. \quad (19)$$
Plugging Eq. 19 into Eq. 18, we get
$$\cos\angle(\widetilde{\nabla} S_{x^*}(x_t), \nabla S_{x^*}(x_t)) \ge 1 - \frac{2 L^2 \delta^2 (d-1)}{\|\nabla S_{x^*}(x_t)\|^2}. \quad (20)$$
As a consequence, we have established
$$\lim_{\delta \to 0} \cos\angle(\widetilde{\nabla} S_{x^*}(x_t), \nabla S_{x^*}(x_t)) = 1. \quad \square$$

B Target model descriptions
Tables 10–12 show the structure and parameters of the target models used in our evaluation for the various datasets discussed in Sec. 6. The structures of the models for the MNIST and GTSRB datasets are shown in Tables 10 and 11, and the training configuration for these models is shown in Table 12. The ResNet.V1.50 models for CIFAR-10 and ImageNet are implemented using the structures and training configurations of prior work [24].

[Tables 10–12: layer listings (Convolution + ReLU, etc.) and training parameters; the numeric entries were lost in extraction.]

C Uncertainty-Aware Attack Algorithm
For each QBHL attack, we incorporate Algorithm 2 into the gradient estimation procedure to conduct the Uncertainty-Aware Attack.
Algorithm 2: Uncertainty-Aware Attack
input: Target model C
input: The value of k
input: The basic input x*
Set x_t to approach the boundary of C (φ_{x*}(x_t) = 0); set i = 0 and A = 0.
For each query point x_t + δu:
  while i ≤ k do
    A = A + φ_{x*}(x_t + δu); i = i + 1
  if A/k ≤ 0 then set φ_{x*}(x_t + δu) = −1
  else set φ_{x*}(x_t + δu) = 1

[Fig. 12: panels (a)–(n) plot ASR against ℓ₂ and ℓ∞ norm-distance thresholds for Our Defense vs. the Basic Model under 30K and 50K query budgets: (a)–(b) MNIST (BA), (c)–(d) CIFAR (BA), (e)–(f) GTSRB (BA), (g)–(h) ImageNet (BA), (i) CIFAR-10 (Sign-OPT), (j) ImageNet (Sign-OPT), (k)–(l) GTSRB (HSJA), (m)–(n) GTSRB (SFA).]
Fig. 12: ASR under different thresholds for perturbation under SFA and HSJA, across various tasks and settings.

[Fig. 13 panels: (a) Basic Image, (b) BA, (c) Sign-OPT, (d) SFA, (e) HSJA; the numbers in parentheses are perturbation distances.]

Fig. 13: Visualized demonstration of the crafted adversarial samples with and without our approach. For the BA, HSJA, and SFA attacks, in the first row, from left to right, the first two images show the crafted sample with and without PredCoin under a 30K query budget, and the last two show the crafted sample with and without PredCoin under a 50K query budget, in the ℓ₂ setting, respectively; the bottom row shows the crafted images in the ℓ∞ setting. As for Sign-OPT, in the first two, from left to right, the images show the crafted input with and without the defense under a 30K query budget in the ℓ₂