A PAC-Bayes Analysis of Adversarial Robustness
Guillaume Vidot, Paul Viallard, Amaury Habrard, Emilie Morvant
1 Airbus Opération S.A.S.
2 University of Toulouse, Institut de Recherche en Informatique de Toulouse, France
[email protected]
3 Univ Lyon, UJM-Saint-Etienne, CNRS, Institut d'Optique Graduate School, Laboratoire Hubert Curien UMR 5516, F-42023 Saint-Étienne, France
[email protected]
* The authors contributed equally to this work.
Abstract
We propose the first general PAC-Bayesian generalization bounds for adversarial robustness, which estimate, at test time, how much a model will be invariant to imperceptible perturbations in the input. Instead of deriving a worst-case analysis of the risk of a hypothesis over all possible perturbations, we leverage the PAC-Bayesian framework to bound the risk averaged over the perturbations for majority votes (over the whole class of hypotheses). Our theoretically founded analysis has the advantage of providing general bounds that (i) are independent from the type of perturbations (i.e., the adversarial attacks), (ii) are tight thanks to the PAC-Bayesian framework, and (iii) can be directly minimized during the learning phase to obtain a model robust to different attacks at test time.
1 Introduction

While machine learning algorithms are able to solve a huge variety of tasks, Szegedy et al. (2014) pointed out a crucial weakness: the possibility to generate samples similar to the originals (i.e., with no or insignificant change recognizable by the human eye) but with a different outcome from the algorithm. This phenomenon contributes to the impossibility of ensuring the safety of machine learning algorithms for safety-critical applications such as aeronautics functions (e.g., vision-based navigation), autonomous driving, or medical diagnosis. Adversarial robustness is thus a critical issue in machine learning: it studies the ability of a model to be robust or invariant to perturbations of its input; one then talks about adversarial examples. In other words, an adversarial example can be defined as an example that has been modified by an imperceptible noise (or a noise that does not exceed a threshold) but which leads to a misclassification. One line of research is referred to as adversarial robustness verification (see, e.g., Gehr et al., 2018; Huang et al., 2017; Singh et al., 2019; Tsuzuku et al., 2018), where the objective is to formally check whether the neighborhood of each sample contains any adversarial example. This kind of method comes with some limitations such as scalability, overapproximation, etc. (Gehr et al., 2018; Katz et al., 2017; Singh et al., 2019). In this paper we stand in another setting called adversarial attack/defense (see, e.g., Papernot et al., 2016; Goodfellow et al., 2015; Madry et al., 2018; Carlini and Wagner, 2017; Zantedeschi et al., 2017; Kurakin et al., 2017).
An adversarial attack consists in finding perturbed examples that defeat machine learning algorithms, while adversarial defense techniques enhance their adversarial robustness to make the attacks useless (the reader can refer to Ren et al. (2020) for a survey on adversarial attacks and defenses). While a lot of methods exist, adversarial robustness suffers from a lack of general theoretical understanding (see Section 2.2). To tackle this issue, we propose in this paper to formulate adversarial robustness in the lens of a well-founded statistical machine learning theory called PAC-Bayes, introduced by Shawe-Taylor and Williamson (1997) and McAllester (1998). This theory has the advantage of providing tight generalization bounds in average over the set of hypotheses considered, leading to bounds for a weighted majority vote over this set (majority vote learning is rather general since many machine learning models can be expressed as majority votes), in contrast to other theories such as VC-dimension or Rademacher-based approaches that give a worst-case analysis, i.e., for all the hypotheses. We start by defining our setting, called adversarial robust PAC-Bayes. The idea consists in considering an averaged adversarial robustness risk which corresponds to the probability that the model misclassifies a perturbed example (i.e., this can be seen as an averaged risk over the perturbations). This measure can be too optimistic and not informative enough, since for each example we sample only one perturbation. Thus we also define an averaged-max adversarial risk as the probability that there exists at least one perturbation (taken in a set of sampled perturbations) that leads to a misclassification. These definitions have the advantages (i) of being suitable for majority vote classifiers, and (ii) of being related to the classical adversarial robustness risk. Then, we derive a PAC-Bayesian generalization bound for each of our adversarial risks; these bounds have the advantage of being independent from the kind of attacks considered. From an algorithmic point of view, these bounds can be directly minimized in order to learn a majority vote robust in average to attacks; in other words, the minimization of the bounds ensures that attacks will be ineffective on average. We empirically illustrate this behavior.

The paper is organized as follows. Section 2 recalls some basics on the classical adversarial robustness setting. We state our new adversarial robust PAC-Bayesian setting along with our theoretical results in Section 3, and we empirically show its soundness in Section 4.
2 Adversarial Robustness

We stand in binary classification where the input space is X ⊆ R^d and the output space is Y = {−1, +1}. We assume D a fixed but unknown distribution on X × Y. An example is denoted by (x, y) ∈ X × Y. Let S = {(x_i, y_i)}_{i=1}^m be the learning sample constituted by m examples drawn i.i.d. from D. We denote the distribution of such an m-sample by D^m. Let H be a set of real-valued functions from X to [−1, +1], called voters or hypotheses. Usually, given S, a learner aims at finding the best hypothesis h from H that commits as few errors as possible on unseen data from D. In other words, one wants to find h ∈ H that minimizes the true risk R_D(h) on D defined as

    R_D(h) = E_{(x,y)∼D} ℓ(h, (x, y)),    (1)

where ℓ : H × X × Y → R_+ is the loss function. In practice, since D is unknown we cannot compute R_D(h); we usually deal with the empirical risk R_S(h) estimated on S and defined as R_S(h) = (1/m) Σ_{i=1}^m ℓ(h, (x_i, y_i)). From a classic ideal machine learning standpoint, we are able to learn a well-performing classifier with strong guarantees on unseen data, and even to measure how much the model will be able to generalize on D (e.g., with generalization bounds).

However, in real-life applications at classification time, an imperceptible perturbation of the input (due to malicious attacks or noise, for instance) can have a bad influence on the classification performance on unseen data (Szegedy et al., 2014): the usual guarantees do not stand anymore. Such an imperceptible perturbation can be modeled by a (relatively small) noise in the input. The set of possible noises is defined by B = {ε ∈ X | ‖ε‖ ≤ b}, where ‖·‖ is an arbitrary norm and b > 0. The learner's objective is then to find an adversarially robust classifier that is on average robust to all noises in B over (x, y) ∼ D.
More formally, one wants to minimize the adversarial robust true risk R^ROB_D(h) defined as

    R^ROB_D(h) = E_{(x,y)∼D} max_{ε∈B} ℓ(h, (x+ε, y)).    (2)

Similarly as in the classic setting, since D is unknown, R^ROB_D(h) cannot be directly computed, and one usually deals with the empirical adversarial risk R^ROB_S(h) = (1/m) Σ_{i=1}^m max_{ε∈B} ℓ(h, (x_i+ε, y_i)). That being said, a learned classifier h should be robust to adversarial attacks that aim at finding an adversarial example x+ε*_{(x,y)} to fool h for a given example (x, y), where ε*_{(x,y)} is

    ε*_{(x,y)} ∈ argmax_{ε∈B} ℓ(h, (x+ε, y)).    (3)

In consequence, adversarial defense mechanisms often rely on adversarial attacks by replacing the original examples by the adversarial ones during the learning phase; this procedure is referred to as adversarial training. Even if other defenses exist, adversarial training appears to be one of the most efficient defense mechanisms (Ren et al., 2020).

Numerous methods exist to solve, or approximate, the optimization problem of Equation (3). Among them, we can cite the Fast Gradient Sign Method (FGSM, Goodfellow et al., 2015). This attack consists in generating a noise ε in the direction of the gradient of the loss function with respect to the input x. Kurakin et al. (2017) have introduced an iterative version of FGSM (IFGSM): at each iteration, one repeats FGSM and adds to the input x a noise that corresponds to the sign of the gradient of the loss with respect to x. Following the same principle as IFGSM, Madry et al. (2018) have proposed a method based on Projected Gradient Descent (PGD) that includes a random initialization of x before the optimization. Another technique, known as the Carlini and Wagner attack (Carlini and Wagner, 2017), considers finding an adversarial example x+ε*_{(x,y)} as close as possible to the original input x, i.e., an attack as imperceptible as possible. However, producing such imperceptible perturbations leads to a high running time in practice.

Contrary to the most popular techniques that look for a model with a low adversarial robust risk as in Equation (2), our work stands in another line of research where the idea is to relax this worst-case risk measure by considering an averaged adversarial robust risk over the noise instead of a max-based formulation (see, e.g., Zantedeschi et al., 2017; Hendrycks and Dietterich, 2019). Our averaged formulation is introduced in the next section.

Footnote: In the literature, the most used norms are the ℓ₁-norm, the ℓ₂-norm and the ℓ∞-norm.
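The gradient-based attacks above can be sketched on a simple differentiable model. Below is a minimal numpy illustration for a linear scorer h_w(x) = w·x with the logistic loss; the function names and the one-step/iterative split are our own illustration under these assumptions, not the papers' exact procedures (which operate on neural networks).

```python
import numpy as np

def grad_logistic_wrt_x(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w.x)) w.r.t. the input x."""
    return -y * w / (1.0 + np.exp(y * np.dot(w, x)))

def fgsm(w, x, y, b):
    """Fast Gradient Sign Method: one step of size b along the sign of the input gradient."""
    return x + b * np.sign(grad_logistic_wrt_x(w, x, y))

def ifgsm(w, x, y, b, steps=10):
    """Iterative FGSM: repeated small sign steps, clipped back into
    the l_infinity ball of radius b around x (the projection used by PGD)."""
    x_adv = x.astype(float).copy()
    for _ in range(steps):
        x_adv = x_adv + (b / steps) * np.sign(grad_logistic_wrt_x(w, x_adv, y))
        x_adv = np.clip(x_adv, x - b, x + b)
    return x_adv
```

Both attacks stay inside the ℓ∞-ball of radius b and can only decrease the margin y·(w·x), i.e., push the example toward misclassification.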
Generalization Bounds.
Recently, a few generalization bounds for adversarial robustness have been introduced. Among them, we can mention Rademacher-complexity-based bounds (Khim and Loh, 2018; Yin et al., 2019). Khim and Loh's result is based on a surrogate of the adversarial robust true risk, and Yin et al. have obtained bounds in the specific case of neural networks and linear classifiers. Note that, in the binary setting, the latter upper-bounds the Rademacher complexity with an unavoidable polynomial dependence on the dimension of the input. Furthermore, Farnia et al. (2019) present margin-based bounds on the adversarial robust true risk for specific neural networks and attacks (such as FGSM or PGD). While they made use of a classical PAC-Bayesian bound (McAllester, 2003), their result is not directly a PAC-Bayesian analysis on individual classifiers, while we provide PAC-Bayesian bounds for general models expressed as majority votes. It is thus important to notice that their bounds are not directly comparable to ours.

Although a few theoretical results exist, the majority of existing work comes either without theoretical guarantees or with very specific theoretical justifications. In the following, we aim at giving a different point of view on adversarial robustness based on the so-called PAC-Bayesian framework. By leveraging this framework, we derive general generalization bounds for adversarial robustness based on an averaged notion of risk that allow us to learn robust models at test time. We introduce below our new setting referred to as adversarial robust PAC-Bayes.
3 Adversarial Robust PAC-Bayes

The PAC-Bayesian framework provides practical and theoretical tools to analyze majority vote classifiers. Assuming the voters' set H and a learning sample S defined as in Section 2, our goal is no longer to learn one classifier in H but to learn a well-performing weighted combination of the voters involved in H, the weights being modeled by a distribution Q on H. In PAC-Bayes, Q is called the posterior distribution and is learned from S given H and a prior distribution P on H (defined before the observation of S). The learned weighted combination is called the Q-weighted majority vote and is defined by

    ∀x ∈ X,  H_Q(x) = sign[ E_{h∼Q} h(x) ].    (4)

In the rest of the paper, we consider the 0-1 loss function classically used for majority votes in PAC-Bayes and defined as ℓ(h, (x, y)) = I(h(x) ≠ y), with I(a) = 1 if a is true, and 0 otherwise. In this context, the adversarial perturbation related to Equation (3) becomes

    ε*_{(x,y)} ∈ argmax_{ε∈B} I( H_Q(x+ε) ≠ y ).    (5)

Optimizing this problem is intractable due to the non-convexity of H_Q induced by the sign function. Therefore, all the attacks based on that definition give only an approximation of the exact solution. Hence, instead of searching for the noise that maximizes the chance of fooling the algorithm, we propose to model the noise perturbation according to an example-dependent distribution. First, let us define ω_{(x,y)} as a distribution on B that depends on an example (x, y) ∈ X × Y.

Footnote: Farnia et al. (2019)'s bounds are uniform-convergence bounds; see Nagarajan and Kolter (2019, Appendix J) for more details.
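Equation (4) can be approximated by Monte Carlo sampling of voters from Q. A small sketch with a toy family of voters h_w(x) = tanh(w·x), which map into [−1, +1] as required, and a Gaussian Q; both choices are our own illustration, not the paper's experimental setup:

```python
import numpy as np

def majority_vote(x, q_mean, q_std=0.5, n_voters=2000, seed=0):
    """Monte Carlo estimate of H_Q(x) = sign(E_{h~Q} h(x)) with
    voters h_w(x) = tanh(w.x) and posterior Q = N(q_mean, q_std^2 I)."""
    rng = np.random.default_rng(seed)
    ws = rng.normal(q_mean, q_std, size=(n_voters, len(q_mean)))
    # average the votes of the sampled voters, then take the sign
    return np.sign(np.tanh(ws @ x).mean())
```

The non-convexity mentioned above is visible here: the output only changes when the averaged vote crosses zero, so gradient-based attacks on H_Q itself are ill-behaved.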
Then we denote by D̄ (to distinguish it from D) the distribution on (X × Y) × B defined as D̄((x,y), ε) = D(x,y) · ω_{(x,y)}(ε), which further permits to generate perturbed examples. For estimating the risks on a sample, for each example (x_i, y_i) sampled from D, we consider a set of n perturbations sampled from ω_{(x_i,y_i)}, denoted by E_i = {ε_{ij}}_{j=1}^n. Then we consider as a learning set the (m×n)-sample S = {((x_i, y_i), E_i)}_{i=1}^m ∈ ((X × Y) × B^n)^m. In other words, each ((x_i, y_i), E_i) ∈ S is sampled from a distribution that we denote by D̄^n such that

    D̄^n((x_i, y_i), E_i) = D(x_i, y_i) · Π_{j=1}^n ω_{(x_i,y_i)}(ε_{ij}).

Then, inspired by the works of Zantedeschi et al. (2017) and Hendrycks and Dietterich (2019), we define our averaged adversarial risk as follows.
Definition 1 (Averaged Adversarial Risk). For any distribution D̄ on (X × Y) × B, for any distribution Q on H, the averaged adversarial risk of H_Q is defined as

    R_{D̄}(H_Q) = Pr_{((x,y),ε)∼D̄}( H_Q(x+ε) ≠ y ) = E_{((x,y),ε)∼D̄} I( H_Q(x+ε) ≠ y ).    (6)

The empirical averaged adversarial risk computed on an (m×n)-sample S = {((x_i, y_i), E_i)}_{i=1}^m is

    R_S(H_Q) = (1/(mn)) Σ_{i=1}^m Σ_{j=1}^n I( H_Q(x_i+ε_{ij}) ≠ y_i ).

As we will show in Proposition 3, the risk R_{D̄}(H_Q) can be seen as an optimistic risk regarding ε*_{(x,y)} of Equation (5).
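Given a sample of perturbation sets E_i, the empirical averaged risk R_S, together with the empirical averaged-max risk A_S of Definition 2 below, is straightforward to compute. A numpy sketch where `predict` stands for any majority-vote predictor returning labels in {−1, +1} (the function and array layout are our own assumptions):

```python
import numpy as np

def empirical_risks(predict, X, y, perturbations):
    """Empirical averaged (R_S) and averaged-max (A_S) adversarial risks.

    predict: maps a batch of inputs (k, d) to labels in {-1, +1}
    X: (m, d) examples; y: (m,) labels in {-1, +1}
    perturbations: (m, n, d) array, row i holding the sampled set E_i
    """
    m, n, d = perturbations.shape
    errors = np.empty((m, n))
    for j in range(n):
        errors[:, j] = predict(X + perturbations[:, j, :]) != y
    r_s = errors.mean()              # average over all m*n perturbed examples
    a_s = errors.max(axis=1).mean()  # 1 if at least one perturbation in E_i fools H_Q
    return r_s, a_s
```

By construction r_s ≤ a_s, which matches the ordering established in Proposition 3 below.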
Indeed, instead of taking the ε that maximizes the loss, a unique ε is drawn from a distribution. Hence, it can lead to a non-informative risk regarding the occurrence of adversarial examples. To overcome this drawback, we propose an extension of this risk that we refer to as the averaged-max adversarial risk.

Definition 2 (Averaged-Max Adversarial Risk). For any distribution D̄ on (X × Y) × B, for any distribution Q on H, the averaged-max adversarial risk of H_Q is defined as

    A_{D̄^n}(H_Q) = Pr_{((x,y),E)∼D̄^n}( ∃ε ∈ E, H_Q(x+ε) ≠ y ).

The empirical averaged-max adversarial risk computed on an (m×n)-sample S = {((x_i, y_i), E_i)}_{i=1}^m is

    A_S(H_Q) = (1/m) Σ_{i=1}^m max_{ε∈E_i} I( H_Q(x_i+ε) ≠ y_i ).

Concretely, for a particular example (x, y) ∼ D, instead of checking whether one perturbed example x+ε is adversarial, we sample n perturbed examples x+ε_1, ..., x+ε_n and check whether at least one of them is adversarial. Actually, we show in the following that R_{D̄}(H_Q), A_{D̄^n}(H_Q) and the classical adversarial risk R^ROB_D(H_Q) are related.

Proposition 3 below shows the intrinsic relationships between the classical adversarial risk R^ROB_D(H_Q) and our two relaxations R_{D̄}(H_Q) and A_{D̄^n}(H_Q). In particular, this result shows that the larger the number n of perturbed examples, the higher the chance to get an adversarial example, and thus the closer we are to the adversarial risk R^ROB_D(H_Q).

Proposition 3.
For any distribution D̄ on (X × Y) × B, for any distribution Q on H, for any n, n′ ∈ N with n ≥ n′ ≥ 1, we have

    R_{D̄}(H_Q) ≤ A_{D̄^{n′}}(H_Q) ≤ A_{D̄^n}(H_Q) ≤ R^ROB_D(H_Q).

Proof.
First, we prove that A_{D̄^1}(H_Q) = R_{D̄}(H_Q). We have

    A_{D̄^1}(H_Q) = 1 − Pr_{((x,y),E)∼D̄^1}( ∀ε ∈ E, H_Q(x+ε) = y )
                = 1 − Pr_{((x,y),E)∼D̄^1}( ∀ε ∈ {ε_1}, H_Q(x+ε) = y )
                = 1 − Pr_{((x,y),ε_1)∼D̄}( H_Q(x+ε_1) = y ) = R_{D̄}(H_Q).

Then, we prove the inequality A_{D̄^{n′}}(H_Q) ≤ A_{D̄^n}(H_Q) from the fact that the indicator function I(·) is upper-bounded by 1. Indeed, from Definition 2 we have

    1 − A_{D̄^n}(H_Q) = E_{(x,y)∼D} E_{E∼ω^n_{(x,y)}} I( ∀ε ∈ E, H_Q(x+ε) = y )
     = E_{(x,y)∼D} Π_{i=1}^n E_{ε_i∼ω_{(x,y)}} I( H_Q(x+ε_i) = y )
     ≤ E_{(x,y)∼D} Π_{i=1}^{n′} E_{ε_i∼ω_{(x,y)}} I( H_Q(x+ε_i) = y )
     = E_{(x,y)∼D} E_{E′∼ω^{n′}_{(x,y)}} I( ∀ε ∈ E′, H_Q(x+ε) = y ) = 1 − A_{D̄^{n′}}(H_Q).

Lastly, to prove the rightmost inequality, we use the fact that an expectation over the set B is bounded by the maximum over B. We have

    A_{D̄^n}(H_Q) = E_{(x,y)∼D} E_{ε_1∼ω_{(x,y)}} ··· E_{ε_n∼ω_{(x,y)}} I( ∃ε ∈ {ε_1,...,ε_n}, H_Q(x+ε) ≠ y )
     ≤ E_{(x,y)∼D} max_{ε_1∈B} ··· max_{ε_n∈B} I( ∃ε ∈ {ε_1,...,ε_n}, H_Q(x+ε) ≠ y )
     = E_{(x,y)∼D} max_{ε_1∈B} ··· max_{ε_{n−1}∈B} I( ∃ε ∈ {ε_1,...,ε_{n−1}, ε*_{(x,y)}}, H_Q(x+ε) ≠ y )
     = E_{(x,y)∼D} I( H_Q(x+ε*_{(x,y)}) ≠ y )
     = E_{(x,y)∼D} max_{ε∈B} I( H_Q(x+ε) ≠ y ) = R^ROB_D(H_Q),

where the maxima collapse since ε*_{(x,y)} of Equation (5) already maximizes the indicator over B. Combining the three results proves the claim.

The left-hand side of Proposition 3's result confirms that the averaged adversarial risk R_{D̄}(H_Q) is optimistic regarding the classical adversarial risk R^ROB_D(H_Q). Proposition 4 estimates how close R_{D̄}(H_Q) can be to R^ROB_D(H_Q).

Proposition 4.
For any distribution D̄ on (X × Y) × B, for any distribution Q on H, we have

    R^ROB_D(H_Q) − TV(Π‖∆) ≤ R_{D̄}(H_Q),

where ∆ and Π are distributions on X × Y; ∆(x′, y′), respectively Π(x′, y′), corresponds to the probability of drawing a perturbed example (x+ε, y) with ((x,y), ε) ∼ D̄, respectively an adversarial example (x+ε*_{(x,y)}, y) with (x, y) ∼ D, i.e.,

    ∆(x′, y′) = Pr_{((x,y),ε)∼D̄}[ x+ε = x′, y = y′ ],    (7)

and

    Π(x′, y′) = Pr_{(x,y)∼D}[ x+ε*_{(x,y)} = x′, y = y′ ].    (8)

Moreover,
TV(Π‖∆) = E_{(x′,y′)∼∆} | Π(x′,y′)/∆(x′,y′) − 1 | is the Total Variation (TV) distance between Π and ∆.

Proof. Deferred to Appendix A.
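The ordering of Proposition 3 can be checked numerically. Here is a Monte Carlo sketch on a linear model with a uniform ω on the ℓ∞-ball; for a linear scorer the worst-case perturbation is available in closed form (−y·b·sign(w)), which gives R^ROB exactly. The whole setup is our own illustration, not an experiment from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, 1.0])                 # a fixed linear voter h(x) = sign(w.x)
b = 0.5                                  # l_infinity budget of the noise set B
X = rng.normal(size=(2000, 2))
y = np.where(X @ w >= 0, 1.0, -1.0)      # labels realized by the voter itself

# nested perturbation draws: the first n' of n draws form E' subset of E,
# which makes the ordering A_{n'} <= A_n hold pointwise, not just on average
eps = rng.uniform(-b, b, size=(len(X), 10, 2))

def averaged_max_risk(n):
    """A_{D^n}: error if at least one of the first n perturbations flips the vote."""
    preds = np.sign((X[:, None, :] + eps[:, :n, :]) @ w)
    return float((preds != y[:, None]).any(axis=1).mean())

r_avg = averaged_max_risk(1)             # averaged adversarial risk (n = 1)
a_3, a_10 = averaged_max_risk(3), averaged_max_risk(10)
x_worst = X - y[:, None] * b * np.sign(w)   # exact worst case for a linear model
r_rob = float((np.sign(x_worst @ w) != y).mean())
assert r_avg <= a_3 <= a_10 <= r_rob     # the chain of Proposition 3
```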
From Equations (7) and (8), it is important to notice that R^ROB_D(H_Q) and R_{D̄}(H_Q) can be rewritten (see Lemmas 8 and 9 in Appendix A) respectively with Π and ∆ as

    R_{D̄}(H_Q) = Pr_{(x′,y′)∼∆}[ H_Q(x′) ≠ y′ ],  and  R^ROB_D(H_Q) = Pr_{(x′,y′)∼Π}[ H_Q(x′) ≠ y′ ].

Finally, by merging Propositions 3 and 4 we obtain

    R^ROB_D(H_Q) − TV(Π‖∆) ≤ R_{D̄}(H_Q) ≤ R^ROB_D(H_Q).

Hence, the smaller the TV distance TV(Π‖∆), the closer the averaged adversarial risk R_{D̄}(H_Q) is to R^ROB_D(H_Q), and the more probable it is that an example ((x,y), ε) sampled from D̄ is adversarial. In the next section, we introduce our PAC-Bayesian generalization bounds on our two risks R_{D̄}(H_Q) and A_{D̄^n}(H_Q).

PAC-Bayesian Generalization Bounds.

First of all, since the risks R_{D̄}(H_Q) and A_{D̄^n}(H_Q) are not differentiable due to the indicator function, we propose to use a common surrogate in the PAC-Bayesian framework (known as the Gibbs risk): instead of considering the risk of the Q-weighted majority vote, we consider the expectation over Q of the individual risks of the voters involved in H. In our case, we define the surrogates with the linear loss as

    R̄_{D̄}(H_Q) = E_{((x,y),ε)∼D̄} (1/2)[ 1 − y E_{h∼Q} h(x+ε) ],

and

    Ā_{D̄^n}(H_Q) = E_{((x,y),E)∼D̄^n} (1/2)[ 1 − min_{ε∈E} ( y E_{h∼Q} h(x+ε) ) ].

In the next theorem, we state the relationship between these surrogates and our risks, implying that a generalization bound for R̄_{D̄}(H_Q), respectively for Ā_{D̄^n}(H_Q), leads to a generalization bound for R_{D̄}(H_Q), respectively A_{D̄^n}(H_Q).

Theorem 5.
For any distribution D̄ on (X × Y) × B, for any distribution Q on H, for any n > 0, we have

    R_{D̄}(H_Q) ≤ 2 R̄_{D̄}(H_Q),  and  A_{D̄^n}(H_Q) ≤ 2 Ā_{D̄^n}(H_Q).

Proof.
By the definition of the majority vote, we have

    R_{D̄}(H_Q) = Pr_{((x,y),ε)∼D̄}( y E_{h∼Q} h(x+ε) ≤ 0 )
     = Pr_{((x,y),ε)∼D̄}( (1/2)[ 1 − y E_{h∼Q} h(x+ε) ] ≥ 1/2 )
     ≤ 2 E_{((x,y),ε)∼D̄} (1/2)[ 1 − y E_{h∼Q} h(x+ε) ] = 2 R̄_{D̄}(H_Q)
       (Markov's inequality applied to (1/2)[1 − y E_{h∼Q} h(x+ε)]).

Similarly, we have

    A_{D̄^n}(H_Q) = Pr_{((x,y),E)∼D̄^n}( ∃ε ∈ E, y E_{h∼Q} h(x+ε) ≤ 0 )
     = Pr_{((x,y),E)∼D̄^n}( min_{ε∈E} ( y E_{h∼Q} h(x+ε) ) ≤ 0 )
     = Pr_{((x,y),E)∼D̄^n}( (1/2)[ 1 − min_{ε∈E} ( y E_{h∼Q} h(x+ε) ) ] ≥ 1/2 )
     ≤ 2 E_{((x,y),E)∼D̄^n} (1/2)[ 1 − min_{ε∈E} ( y E_{h∼Q} h(x+ε) ) ] = 2 Ā_{D̄^n}(H_Q)
       (Markov's inequality applied to (1/2)[1 − min_{ε∈E} y E_{h∼Q} h(x+ε)]).

Theorem 6 below presents our PAC-Bayesian generalization bounds for R̄_{D̄}(H_Q). Before that, it is important to mention that the empirical counterpart of R̄_{D̄}(H_Q) is computed on S, in which the samples are not independently identically distributed, meaning that a "classical" proof process is not applicable. The trick here is to make use of a result of Ralaivola et al. (2010) that provides a chromatic PAC-Bayes bound, i.e., a bound which supports non-independent data.

Theorem 6.
For any distribution D̄ on (X × Y) × B, for any set of voters H, for any prior distribution P on H, with probability at least 1−δ over the random choice of S, for all posterior distributions Q on H, we have

    kl( R̄_{D̄}(H_Q) ‖ R̄_S(H_Q) ) ≤ (1/m)[ KL(Q‖P) + ln((m+1)/δ) ],    (9)

and

    R̄_{D̄}(H_Q) ≤ R̄_S(H_Q) + √( (1/(2m))[ KL(Q‖P) + ln((m+1)/δ) ] ),    (10)

where KL(Q‖P) = E_{h∼Q} ln( Q(h)/P(h) ) is the KL-divergence between Q and P, kl(a‖b) = a ln(a/b) + (1−a) ln((1−a)/(1−b)), and

    R̄_S(H_Q) = (1/(mn)) Σ_{i=1}^m Σ_{j=1}^n (1/2)[ 1 − y_i E_{h∼Q} h(x_i+ε_{ij}) ].

Proof.
Let Γ = (V, E) be the graph representing the dependencies between the random variables, where (i) the set of vertices is V = S, and (ii) the set of edges E is defined such that (((x,y), ε), ((x′,y′), ε′)) ∉ E ⇔ x ≠ x′. Then, applying Theorem 8 of Ralaivola et al. (2010) with our notations gives

    kl( R̄_{D̄}(H_Q) ‖ R̄_S(H_Q) ) ≤ (χ(Γ)/(mn))[ KL(Q‖P) + ln( (mn + χ(Γ)) / (δ χ(Γ)) ) ],

where χ(Γ) is the fractional chromatic number of Γ. From a property of Scheinerman and Ullman (2011), we have

    c(Γ) ≤ χ(Γ) ≤ ∆(Γ) + 1,

where c(Γ) is the order of the largest clique in Γ and ∆(Γ) is the maximum degree of a vertex in Γ. By construction of Γ, c(Γ) = n and ∆(Γ) = n − 1. Thus χ(Γ) = n, and rearranging the terms proves Equation (9). Finally, by applying Pinsker's inequality (i.e., |a − b| ≤ √( kl(a‖b)/2 )), we obtain Equation (10).

Surprisingly, this theorem states results that are classic in the PAC-Bayes literature; in particular, the bound does not depend on the number n of perturbed examples, while involving the usual trade-off between the empirical counterpart R̄_S(H_Q) and KL(Q‖P). Note that Equation (9) has the form of a Seeger bound (Seeger, 2002) and is tighter but less interpretable than Equation (10), which has the form of a McAllester bound (McAllester, 1998).

We now state our generalization bound for Ā_{D̄^n}(H_Q). Since this value depends on a minimum term, we cannot use the same trick as for Theorem 6. The trick to bypass this issue is based on the use of the TV distance between two "artificial" distributions on E_i. Given ((x_i, y_i), E_i) ∈ S, let π_i be an arbitrary distribution on E_i, and given h ∈ H, let ρ_i^h be a Dirac distribution on E_i such that ρ_i^h(ε) = 1 if and only if ε = argmax_{ε′∈E_i} (1/2)[ 1 − y_i h(x_i+ε′) ], i.e., if ε is the perturbation that maximizes the linear loss, and ρ_i^h(ε) = 0 otherwise.

Theorem 7.
For any distribution D̄ on (X × Y) × B, for any set of voters H, for any prior distribution P on H, for any n, with probability at least 1−δ over the random choice of S, for all posterior distributions Q on H, for all i ∈ {1, ..., m}, for all distributions π_i on E_i independent from a voter h ∈ H, we have

    Ā_{D̄^n}(H_Q) ≤ Ā_S(H_Q) + (1/m) Σ_{i=1}^m E_{h∼Q} TV(ρ_i^h‖π_i) + √( (1/(2m))[ KL(Q‖P) + ln(2√m/δ) ] ),    (11)

where

    Ā_S(H_Q) = (1/m) Σ_{i=1}^m (1/2)[ 1 − min_{ε∈E_i} ( y_i E_{h∼Q} h(x_i+ε) ) ],

and TV(ρ‖π) = E_{ε∼π} | ρ(ε)/π(ε) − 1 |.

Proof.
Let L_{h,(x,y),ε} = (1/2)[ 1 − y h(x+ε) ] for the sake of readability. The losses max_{ε∈E_1} L_{h,(x_1,y_1),ε}, ..., max_{ε∈E_m} L_{h,(x_m,y_m),ε} are i.i.d. for any h ∈ H. Hence, we can apply Theorem 20 of Germain et al. (2015) and Pinsker's inequality (i.e., |q − p| ≤ √( kl(q‖p)/2 )) to obtain

    E_{h∼Q} E_{((x,y),E)∼D̄^n} max_{ε∈E} L_{h,(x,y),ε} ≤ E_{h∼Q} (1/m) Σ_{i=1}^m max_{ε∈E_i} L_{h,(x_i,y_i),ε} + √( (1/(2m))[ KL(Q‖P) + ln(2√m/δ) ] ).

Then, we lower-bound the left-hand side of the inequality with Ā_{D̄^n}(H_Q):

    Ā_{D̄^n}(H_Q) ≤ E_{h∼Q} E_{((x,y),E)∼D̄^n} max_{ε∈E} L_{h,(x,y),ε}.

Finally, from the definition of ρ_i^h, and from Lemma 4 of Ohnishi and Honorio (2020), we have

    E_{h∼Q} (1/m) Σ_{i=1}^m max_{ε∈E_i} L_{h,(x_i,y_i),ε} = E_{h∼Q} (1/m) Σ_{i=1}^m E_{ε∼ρ_i^h} L_{h,(x_i,y_i),ε}
     ≤ E_{h∼Q} (1/m) Σ_{i=1}^m TV(ρ_i^h‖π_i) + E_{h∼Q} (1/m) Σ_{i=1}^m E_{ε∼π_i} L_{h,(x_i,y_i),ε}
     = E_{h∼Q} (1/m) Σ_{i=1}^m TV(ρ_i^h‖π_i) + (1/m) Σ_{i=1}^m E_{ε∼π_i} E_{h∼Q} L_{h,(x_i,y_i),ε}
     ≤ E_{h∼Q} (1/m) Σ_{i=1}^m TV(ρ_i^h‖π_i) + Ā_S(H_Q).

Unusually, this bound involves an additional term (1/m) Σ_{i=1}^m E_{h∼Q} TV(ρ_i^h‖π_i). From an algorithmic point of view, an interesting behavior is that the bound stands for all distributions π_i on E_i. This suggests that, given (x_i, y_i), we want to find the π_i that minimizes E_{h∼Q} TV(ρ_i^h‖π_i). Ideally, this term tends to 0 when (i) the distribution π_i is close to ρ_i^h, and (ii) all voters have their loss maximized by the same perturbation ε ∈ E_i.

From a practical point of view, to learn a well-performing majority vote, one solution consists in minimizing the right-hand side of the bounds, meaning that we would like to find a good trade-off between (i) a small empirical risk R̄_S(H_Q) or Ā_S(H_Q) and (ii) a small divergence KL(Q‖P) between the prior weights and the learned posterior ones.

4 Experiments

In this section, we illustrate the soundness of our framework in the context of neural network learning. First of all, we describe the learning method designed according to our theoretical results and used in our experiments.
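Optimizing the bounds requires evaluating them. For the Seeger-style bound of Equation (9), the true-risk bound is obtained by inverting the binary kl: given the empirical risk and the right-hand-side budget, one takes the largest p with kl(r̂‖p) below the budget. A sketch with bisection (the helper names and tolerances are ours):

```python
import math

def binary_kl(q, p):
    """kl(q || p) for Bernoulli parameters, with the 0*log(0) = 0 convention."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out

def kl_inverse_upper(emp_risk, budget, tol=1e-9):
    """Largest p in [emp_risk, 1] such that kl(emp_risk || p) <= budget (bisection)."""
    lo, hi = emp_risk, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(emp_risk, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def seeger_bound(emp_risk, kl_qp, m, delta):
    """True-risk bound derived from Equation (9): budget = (KL + ln((m+1)/delta)) / m."""
    budget = (kl_qp + math.log((m + 1) / delta)) / m
    return kl_inverse_upper(emp_risk, budget)
```

The McAllester-style value of Equation (10), emp_risk + sqrt(budget/2), is never tighter than this kl-inverse, by Pinsker's inequality.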
Let h_{w′} be a neural network parametrized by a weight vector w′ ∈ R^d. We consider that the weights w′ are sampled from the posterior distribution Q = N(w, λ I_d), which is a Gaussian distribution centered at w with covariance matrix λ I_d, where I_d is the d×d identity matrix. The majority vote is then defined by

    H_Q(x) = sign[ E_{w′∼Q} h_{w′}(x) ].

Starting from a majority vote defined a priori with the prior distribution P = N(v, λ I_d), we learn the majority vote H_Q by optimizing the bounds. This means that we need to minimize the risk and the KL-divergence term KL(Q‖P). To do so, we consider in our experiments a data-dependent prior; this is a common approach in PAC-Bayes (Parrado-Hernández et al., 2012; Lever et al., 2013; Dziugaite and Roy, 2018; Dziugaite et al., 2020). For this purpose, we use a two-step learning process, summarized in Algorithm 1, that takes as input two disjoint learning sets S and S′.

Step (i). We learn the prior P. At each epoch t of the algorithm we learn from S′ an "intermediate" prior P_t = N(v_t, λ I_d) (Lines 2 to 12). At the end of the process (Line 13), we set P as the "intermediate" prior that leads to a good empirical performance on S, estimated batch by batch. More precisely, we keep the one that minimizes (with the linear loss) E_S R_S(h_{v_t^S}), where E_S is the expectation over the batches that are sampled uniformly from a partition of S, and v_t^S are the weights sampled from N(v_t, λ I_d) for the batch S. At a given epoch t, for each iteration of the optimizer, 1) we sample a weight vector w′ from P_{t−1} = N(v_{t−1}, λ I_d); 2) we attack our sampled model to obtain a perturbed example x+ε; and 3) we forward the perturbed example through the sampled network and update the weights according to the linear loss (Line 10).

Step (ii). Starting from the prior P and the learning set S, we perform the same process as in Step (i), except that the loss considered corresponds to the desired bound to optimize (Line 23, denoted ℓ_bnd()). For the sake of readability, we defer to Appendix the definition of ℓ_bnd for Equation (9) and Equation (11).

We perform our experiments on MNIST. We decompose the learning set into two disjoint subsets: S′ (to learn the prior) and S (to learn the posterior). We select the best prior on S. We keep as test set the original test set, denoted by T. Note that we consider the same architecture as Madry et al. (2018) but in a binary setting. To do so, we select pairs of classes that share similarities; we report here the results on one

Footnote: Note that, since ρ_i^h is a Dirac distribution, we have E_{h∼Q} TV(ρ_i^h‖π_i) = [ 1 − E_h π_i(ε*_h) + E_h Σ_{ε≠ε*_h} π_i(ε) ], with ε*_h = argmax_{ε∈E_i} (1/2)[ 1 − y_i h(x_i+ε) ].
Footnote: The "intermediate" priors do not depend on S, since they are learned with S′. The bounds are then still valid.
Footnote: Results for some other tasks are deferred to Appendix.
Algorithm 1 Average Adversarial Training with Guarantee

Require: S, S′: disjoint learning sets; T, T′: numbers of epochs; η, η′: learning rates; attack(): the attack function; ℓ_bnd(): the loss associated to the bound to minimize; v_0: initial weights

 1: Step (i)
 2: for t from 1 to T do
 3:   v_t ← v_{t−1}
 4:   for all batches B′ (from S′) do
 5:     for all (x, y) ∈ B′ do
 6:       (x+ε, y) ← attack(x, y)
 7:       Replace (x, y) by (x+ε, y) in B′
 8:     end for
 9:     Sample the weights v′_t ∼ N(v_t, λ I_d)
10:     v_t ← v_t − η E_{(x+ε,y)∼B′} ∇_{v_t} (1/2)[ 1 − y h_{v′_t}(x+ε) ]
11:   end for
12: end for
13: P ← N(v_best, λ I_d) with v_best ← argmin_{v ∈ {v_t}_{t=1}^T} E_S R_S(h_{v_t^S})
14: Step (ii)
15: w ← v_best
16: for t from 1 to T′ do
17:   for all batches B (from S) do
18:     for all (x, y) ∈ B do
19:       (x+ε, y) ← attack(x, y)
20:       Replace (x, y) by (x+ε, y) in B
21:     end for
22:     Sample the weights w′ ∼ N(w, λ I_d)
23:     w ← w − η′ E_{(x+ε,y)∼B} ∇_w ℓ_bnd(h_{w′}, (x+ε, y), v_best)
24:   end for
25: end for
26: return Q ← N(w, λ I_d)

task: 1 vs 7. We use a neural network consisting of a convolutional part (convolutional layers with Leaky ReLU activations and max-pooling) and a fully connected part. We train all the models using the Adam optimizer, with a fixed number of epochs, learning rate and batch size. To optimize the bound, we fix the confidence parameter δ.

Defenses/Attacks Setting.
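With the isotropic Gaussians used here, where prior and posterior share the same variance λ, the KL term of the bounds reduces to a squared distance between the weight vectors: KL(N(w, λ I_d) ‖ N(v, λ I_d)) = ‖w − v‖² / (2λ). A quick sketch of this closed form:

```python
import numpy as np

def kl_gaussians_same_var(w, v, lam):
    """KL( N(w, lam*I) || N(v, lam*I) ) = ||w - v||^2 / (2*lam).

    The trace and log-determinant terms of the general Gaussian KL
    cancel when both covariances equal lam*I, leaving only the mean shift.
    """
    diff = np.asarray(w, dtype=float) - np.asarray(v, dtype=float)
    return float(diff @ diff) / (2.0 * lam)
```

This is why a data-dependent prior close to the learned posterior weights directly tightens the bounds being minimized.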
We stand in a white-box setting, meaning that the attacker knows at least the architecture and the parameters of the model. We empirically study two attacks with the ℓ∞-norm: Projected Gradient Descent (PGD, Madry et al. (2018)) and the iterative version of FGSM (IFGSM, Kurakin et al. (2017)); we fix the number of iterations and the step size for both PGD and IFGSM.

Table 1: Test risks and bounds for MNIST 1vs7 with n=50 perturbations, for all the pairs (Defense, Attack). The most significant results are highlighted in bold. The results in italic correspond to the baseline on the deterministic network h_w; importantly, for the baseline we did not sample from the uniform distribution, but we report the results as a reference. For two maximal perturbations b, the columns report the baseline risk R_T(h_w) and, for Algorithm 1 with Eq. (9) and Eq. (11), the risk R_{T_U}(H_Q) with the bound of Theorem 6 and A_{T_U}(H_Q) with the bound of Theorem 7; the rows correspond to the defenses —, UNIF, PGD_U, IFGSM_U against the attacks PGD_U and IFGSM_U. [Numerical entries of the table were not recoverable.]

One specificity of our setting is that we deal with the perturbation distribution ω_{(x,y)}. In consequence, we cannot use a single attack PGD or IFGSM as usually done. We thus propose to sample uniformly n perturbations to generate n examples from the attacked example: the associated methods are referred to as PGD_U and IFGSM_U. Note that we set n=1 when these attacks are used as a defense mechanism in Algorithm 1: since the adversarial training is iterative, we do not need to sample numerous perturbations for each example; we sample a new perturbation each time the example is forwarded through the network. We also consider a naive defense, referred to as UNIF, that only adds a noise uniformly sampled between −b and +b (where b is the maximal allowed norm of the perturbation). For the sake of completeness, we also report the results of the classic baselines PGD and IFGSM of the literature, which consist of running PGD_U and IFGSM_U without adding a uniform noise. We run the baseline and our Algorithm 1 with Equations (9) and (11) in different scenarios of defense/attack. These scenarios correspond to all the pairs (Defense, Attack) belonging to the set {—, UNIF, PGD_U, IFGSM_U} × {PGD_U, IFGSM_U}, where "—" means that we do not defend, i.e., the attack returns the original example (in the case of the baseline, PGD_U and IFGSM_U are substituted by PGD and IFGSM). We report in Table 1 the risks and the bound values computed on the test set, perturbed depending on the situation: T (for the baseline) is perturbed with PGD or IFGSM, and T_U (for our algorithm) is perturbed with PGD_U or IFGSM_U, taking n=50 perturbations.
First of all, note from Table 1 that the bounds of Theorem 6 are tighter than those of Theorem 7: this is an expected result, since we showed that the Averaged-Max adversarial risk A_{D_n}(H_Q) is more pessimistic than its averaged counterpart R_D(H_Q). (The perturbed set T_U is generated by sampling a network from P and attacking this network with PGD_U or IFGSM_U; we provide more details in the Appendix.) Second, we observe that the naive defense UNIF does not improve the risks R_T(H_Q) and A_T(H_Q), while PGD_U and IFGSM_U are able to improve them. For the larger maximal perturbation b, the risks are high, meaning that the models err on perturbed examples a large fraction of the time. Indeed, with an ℓ∞-based attack, each grayscale pixel can be strongly modified, heavily degrading the images and making the task hard. Due to these strongly perturbed instances, the bounds are unsurprisingly not informative, illustrating the difficulty of the task. When considering a reduced level of noise with a smaller b, the task becomes more accessible. In this situation, if we focus on the results in bold in Table 1, we observe that all defense mechanisms provide better risks than when we use no defense. The bounds here are all informative (lower than 1) and give insightful guarantees for our models. An interesting fact is that the bounds we obtain when defending with PGD_U or IFGSM_U are tighter than the bounds we get when defending in a naive way with UNIF. This behavior confirms that we are able to learn models that are robust against the tested attacks, with theoretical guarantees.

Our work is the first to study adversarial robustness from a general standpoint through the lens of the PAC-Bayesian framework.
We began by formalizing a new adversarial robustness setting (for binary classification) specialized for models that can be expressed as a weighted majority vote; we refer to this setting as Adversarial Robust PAC-Bayes. This formulation allowed us to derive PAC-Bayesian generalization bounds on the adversarial risk of general majority votes. We illustrated the usefulness of this setting on the training of neural networks. This work gives rise to many interesting questions and lines of future research. Some perspectives will focus on extending our results to other classification settings such as multiclass or multilabel. Another line of research could focus on taking advantage of other tools from the PAC-Bayesian literature. Among them, we can make use of other bounds on the risk of the majority vote that take into consideration the diversity between the individual voters; for example, the C-bound (Lacasse et al., 2006) or, more recently, the tandem loss (Masegosa et al., 2020). Last but not least, in real-life applications one often wants to combine different input sources (from different sensors, cameras, etc.), and being able to combine these sources in an effective way is then a key issue. We believe that our new adversarial robustness setting can offer theoretical guarantees and well-founded algorithms when the learned model is expressed as a majority vote, whether for ensemble methods with weak voters (e.g., Roy et al. (2011); Lorenzen et al. (2019)), for fusion of classifiers (e.g., Morvant et al. (2014)), or for multimodal/multiview learning (e.g., Sun et al. (2017); Goyal et al. (2019)).
References
Nicholas Carlini and David A. Wagner. Towards Evaluating the Robustness of Neural Networks. In IEEE SP, 2017.

Gintare Karolina Dziugaite and Daniel M. Roy. Data-dependent PAC-Bayes priors via differential privacy. In NeurIPS, 2018.

Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, and Daniel M. Roy. On the role of data in PAC-Bayes bounds. CoRR, 2020.

Farzan Farnia, Jesse M. Zhang, and David Tse. Generalizable Adversarial Training via Spectral Normalization. In ICLR, 2019.

Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin T. Vechev. AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation. In IEEE SP, 2018.

Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand, and Jean-Francis Roy. Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm. JMLR, 16(26):787–860, 2015.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. In ICLR, 2015.

Anil Goyal, Emilie Morvant, Pascal Germain, and Massih-Reza Amini. Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters. Neurocomputing, 2019.

Dan Hendrycks and Thomas G. Dietterich. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. In ICLR, 2019.

Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety Verification of Deep Neural Networks. In CAV, 2017.

Guy Katz, Clark W. Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. In CAV, volume 10426 of Lecture Notes in Computer Science, pages 97–117. Springer, 2017.

Justin Khim and Po-Ling Loh. Adversarial Risk Bounds for Binary Classification via Function Transformation. CoRR, abs/1810.09519, 2018.

Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial Machine Learning at Scale. In ICLR, 2017.

Alexandre Lacasse, François Laviolette, Mario Marchand, Pascal Germain, and Nicolas Usunier. PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier. In NIPS, pages 769–776, 2006.

Guy Lever, François Laviolette, and John Shawe-Taylor. Tighter PAC-Bayes bounds through distribution-dependent priors. Theoretical Computer Science, 2013.

Stephan S. Lorenzen, Christian Igel, and Yevgeny Seldin. On PAC-Bayesian bounds for random forests. Machine Learning, 108(8):1503–1522, 2019.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In ICLR, 2018.

Andrés R. Masegosa, Stephan Sloth Lorenzen, Christian Igel, and Yevgeny Seldin. Second Order PAC-Bayesian Bounds for the Weighted Majority Vote. In NeurIPS, 2020.

David A. McAllester. Some PAC-Bayesian Theorems. In COLT, pages 230–234, 1998.

David A. McAllester. Simplified PAC-Bayesian Margin Bounds. In COLT, 2003.

Emilie Morvant, Amaury Habrard, and Stéphane Ayache. Majority Vote of Diverse Classifiers for Late Fusion. In S+SSPR, 2014.

Vaishnavh Nagarajan and J. Zico Kolter. Uniform convergence may be unable to explain generalization in deep learning. In NeurIPS, 2019.

Yuki Ohnishi and Jean Honorio. Novel Change of Measure Inequalities and PAC-Bayesian Bounds. CoRR, 2020.

Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The Limitations of Deep Learning in Adversarial Settings. In IEEE EuroS&P, 2016.

Emilio Parrado-Hernández, Amiran Ambroladze, John Shawe-Taylor, and Shiliang Sun. PAC-Bayes Bounds with Data Dependent Priors. J. Mach. Learn. Res., 13:3507–3531, 2012. URL http://dl.acm.org/citation.cfm?id=2503353.

Liva Ralaivola, Marie Szafranski, and Guillaume Stempfel. Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes. JMLR, 2010.

Kui Ren, Tianhang Zheng, Zhan Qin, and Xue Liu. Adversarial Attacks and Defenses in Deep Learning. Engineering, 6(3):346–360, 2020.

Jean-Francis Roy, François Laviolette, and Mario Marchand. From PAC-Bayes bounds to quadratic programs for majority votes. In Lise Getoor and Tobias Scheffer, editors, ICML, pages 649–656, 2011.

Edward R. Scheinerman and Daniel H. Ullman. Fractional Graph Theory: A Rational Approach to the Theory of Graphs. 2011.

Matthias Seeger. PAC-Bayesian generalisation error bounds for Gaussian process classification. Journal of Machine Learning Research, 3(Oct):233–269, 2002.

John Shawe-Taylor and Robert C. Williamson. A PAC Analysis of a Bayesian Estimator. In COLT, 1997.

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin T. Vechev. Boosting Robustness Certification of Neural Networks. In ICLR, 2019.

Shiliang Sun, John Shawe-Taylor, and Liang Mao. PAC-Bayes analysis of multi-view learning. Information Fusion, 35:117–131, 2017.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.

Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks. In NeurIPS, 2018.

Dong Yin, Kannan Ramchandran, and Peter L. Bartlett. Rademacher Complexity for Adversarially Robust Generalization. In ICML, 2019.

Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. Efficient Defenses Against Adversarial Attacks. In ACM Workshop on Artificial Intelligence and Security, AISec@CCS, 2017.
Supplementary Material
The supplementary material is structured as follows. In Section A we provide a proof of Proposition 4. In Section B we discuss the validity of the bound when we select a prior with S and have a distribution on perturbations depending on this selected prior. In Section C we detail how we optimize and compute our bounds. Finally, we present additional experiments in Section D.

Note that there is a typographical mistake in Equation (9): kl(R_D(H_Q) ‖ R_S(H_Q)) must be kl(R_S(H_Q) ‖ R_D(H_Q)). This will be modified in the paper in case of acceptance.

A Proof of Proposition 4
In this section, we provide the proof of Proposition 4, which relies on Lemmas 8 and 9; both are stated and proved below. Lemma 8 shows that R_D(H_Q) is equal to R_Δ(H_Q).

Lemma 8. For any distribution D on (X × Y) × B and its associated distribution Δ, for any posterior Q on H, we have

R_D(H_Q) = Pr_{(x+ε,y)∼Δ}[H_Q(x+ε) ≠ y] = R_Δ(H_Q).

Proof. Starting from the averaged adversarial risk R_D(H_Q) = E_{((x,y),ε)∼D} I[H_Q(x+ε) ≠ y], we have

R_D(H_Q) = E_{(x'+ε',y')∼Δ} (1/Δ(x'+ε',y')) Pr_{((x,y),ε)∼D}[H_Q(x+ε) ≠ y, x'+ε' = x+ε, y' = y]
         = E_{(x'+ε',y')∼Δ} (1/Δ(x'+ε',y')) E_{((x,y),ε)∼D} I[H_Q(x+ε) ≠ y] I[x'+ε' = x+ε, y' = y].

In other words, the double expectation only rearranges the terms of the original expectation: given an example (x'+ε', y'), we gather the probabilities such that H_Q(x+ε) ≠ y with (x+ε, y) = (x'+ε', y') in the inner expectation, while integrating over all couples (x'+ε', y') ∈ X × Y in the outer expectation. Then, since x'+ε' = x+ε and y' = y imply I[H_Q(x+ε) ≠ y] = I[H_Q(x'+ε') ≠ y'], we have

R_D(H_Q) = E_{(x'+ε',y')∼Δ} (1/Δ(x'+ε',y')) E_{((x,y),ε)∼D} I[H_Q(x'+ε') ≠ y'] I[x'+ε' = x+ε, y' = y]
         = E_{(x'+ε',y')∼Δ} (1/Δ(x'+ε',y')) I[H_Q(x'+ε') ≠ y'] E_{((x,y),ε)∼D} I[x'+ε' = x+ε, y' = y].

Finally, by definition of Δ(x'+ε', y'), we can deduce that

R_D(H_Q) = E_{(x'+ε',y')∼Δ} (1/Δ(x'+ε',y')) I[H_Q(x'+ε') ≠ y'] Δ(x'+ε', y')
         = E_{(x'+ε',y')∼Δ} I[H_Q(x'+ε') ≠ y'] = R_Δ(H_Q). ∎

Similarly, Lemma 9 shows that R^ROB_D(H_Q) is equal to R_Π(H_Q).
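The rearrangement argument of Lemma 8 can be checked numerically on a small discrete distribution: the risk computed under D and under its push-forward Δ coincide. The toy distribution and majority vote below are purely illustrative.

```python
from collections import defaultdict

# a toy discrete joint distribution D over ((x, y), eps)
support = [((0, 1), 1), ((0, 1), 2), ((1, -1), 1), ((2, 1), 0)]
probs = [0.2, 0.3, 0.4, 0.1]

def H(z):
    # an arbitrary deterministic vote on the perturbed input x + eps
    return 1 if z % 2 == 0 else -1

# averaged adversarial risk under D: E_D I[H(x + eps) != y]
risk_D = sum(p * (H(x + e) != y) for ((x, y), e), p in zip(support, probs))

# push-forward distribution Delta over (x + eps, y)
delta = defaultdict(float)
for ((x, y), e), p in zip(support, probs):
    delta[(x + e, y)] += p

# the same risk computed under Delta: E_Delta I[H(x') != y']
risk_Delta = sum(p * (H(z) != y) for (z, y), p in delta.items())
```

Both sums gather exactly the same probability mass, just grouped by the perturbed pair (x+ε, y) instead of the triple ((x, y), ε).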
Lemma 9. For any distribution D on X × Y and its associated distribution Π, for any posterior Q on H, we have

R^ROB_D(H_Q) = Pr_{(x+ε,y)∼Π}[H_Q(x+ε) ≠ y] = R_Π(H_Q).

Proof. The proof is similar to the one of Lemma 8. Indeed, starting from the definition R^ROB_D(H_Q) = E_{(x,y)∼D} I[H_Q(x+ε*(x,y)) ≠ y], we have

R^ROB_D(H_Q) = E_{(x'+ε',y')∼Π} (1/Π(x'+ε',y')) E_{(x,y)∼D} I[H_Q(x+ε*(x,y)) ≠ y] I[x'+ε' = x+ε*(x,y), y' = y]
            = E_{(x'+ε',y')∼Π} (1/Π(x'+ε',y')) E_{(x,y)∼D} I[H_Q(x'+ε') ≠ y'] I[x'+ε' = x+ε*(x,y), y' = y].

Finally, by definition of Π(x'+ε', y'), we can deduce that

R^ROB_D(H_Q) = E_{(x'+ε',y')∼Π} (1/Π(x'+ε',y')) I[H_Q(x'+ε') ≠ y'] Π(x'+ε', y')
            = E_{(x'+ε',y')∼Π} I[H_Q(x'+ε') ≠ y'] = R_Π(H_Q). ∎

We can now prove Proposition 4.
Proposition 10. For any distribution D on (X × Y) × B, for any distribution Q on H, we have

R^ROB_D(H_Q) − TV(Π ‖ Δ) ≤ R_D(H_Q).

Proof. From Lemmas 8 and 9, we have R_D(H_Q) = R_Δ(H_Q) and R^ROB_D(H_Q) = R_Π(H_Q). Then, applying Lemma 4 of Ohnishi and Honorio (2020), we have

R_Π(H_Q) ≤ TV(Π ‖ Δ) + R_Δ(H_Q)  ⟺  R^ROB_D(H_Q) ≤ TV(Π ‖ Δ) + R_D(H_Q). ∎
B Details on the Validity of the Bounds
In this section, we discuss the validity of the bound when (i) we generate perturbed sets such as T_U or S from a distribution D that depends on the prior P, and (ii) we select the prior P with S.

Computing the bounds implies perturbing examples, i.e., generating examples from D, which is defined as D((x, y), ε) = D(x, y) · ω_{(x,y)}(ε). However, in order to obtain valid bounds, ω_{(x,y)} must be defined a priori. Since the prior P is defined a priori as well, ω_{(x,y)} can depend on P. Hence, sampling from ω_{(x,y)} boils down to generating a perturbed example (x+ε, y) by attacking the prior network P with PGD_U or IFGSM_U. Nevertheless, our selection of the prior P with S may seem like "cheating", but this remains a valid strategy when we perform a union bound. We explain the union bound for Theorem 6; the same technique can be applied for Theorem 7.

Let D_1, …, D_T be T distributions on (X × Y) × B defined as D_1((x, y), ε) = D(x, y) · ω^1_{(x,y)}(ε), …, D_T((x, y), ε) = D(x, y) · ω^T_{(x,y)}(ε), where each distribution ω^t_{(x,y)} depends on the example (x, y) and possibly on the fixed prior P_t. Furthermore, we denote by (D^n_t)^m the distribution of a perturbed learning sample constituted of m examples with n perturbations each. Then, for each distribution D_t, we can derive a bound on the risk R_{D_t}(H_Q) that holds with probability at least 1 − δ/T:

Pr_{S_t ∼ (D^n_t)^m}[ ∀Q, kl(R_{S_t}(H_Q) ‖ R_{D_t}(H_Q)) ≤ (1/m)[KL(Q‖P_t) + ln(T(m+1)/δ)] ]
= Pr_{S_1 ∼ (D^n_1)^m, …, S_T ∼ (D^n_T)^m}[ ∀Q, kl(R_{S_t}(H_Q) ‖ R_{D_t}(H_Q)) ≤ (1/m)[KL(Q‖P_t) + ln(T(m+1)/δ)] ] ≥ 1 − δ/T.

Then, from a union bound argument, we have

Pr_{S_1 ∼ (D^n_1)^m, …, S_T ∼ (D^n_T)^m}[ ∀Q, kl(R_{S_1}(H_Q) ‖ R_{D_1}(H_Q)) ≤ (1/m)[KL(Q‖P_1) + ln(T(m+1)/δ)], and …, and kl(R_{S_T}(H_Q) ‖ R_{D_T}(H_Q)) ≤ (1/m)[KL(Q‖P_T) + ln(T(m+1)/δ)] ] ≥ 1 − δ.

Hence, we can select P ∈ {P_1, …, P_T} with S, and, letting D((x, y), ε) = D(x, y) · ω_{(x,y)}(ε) be the distribution on (X × Y) × B where ω_{(x,y)}(ε) depends on P and on the example (x, y), we can say that

Pr_{S ∼ (D^n)^m}[ ∀Q, kl(R_S(H_Q) ‖ R_D(H_Q)) ≤ (1/m)[KL(Q‖P) + ln(T(m+1)/δ)] ] ≥ 1 − δ.

Additionally, when applying the same process for Equation (11) in Theorem 7, we have

Pr_{S ∼ (D^n)^m}[ ∀Q, A_{D_n}(H_Q) ≤ A_S(H_Q) + (1/m) Σ_{i=1}^m E_{h∼Q} TV(ρ^h_i ‖ π_i) + √((1/(2m))[KL(Q‖P) + ln(T√m/δ)]) ] ≥ 1 − δ.
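Numerically, the union bound simply trades a confidence level of δ/T per intermediate prior for an extra ln T inside the complexity term; a small sketch (the function name is ours):

```python
import numpy as np

def seeger_rhs(kl_qp, m, T, delta):
    """Right-hand side of the kl bound for one of the T intermediate priors,
    after the union bound: (KL(Q || P_t) + ln(T (m + 1) / delta)) / m."""
    return (kl_qp + np.log(T * (m + 1) / delta)) / m

m, T, delta = 10000, 10, 0.05
# running each per-prior bound at level delta / T is exactly what produces
# the factor T inside the logarithm above
per_prior = (0.0 + np.log((m + 1) / (delta / T))) / m
```

Since ln(T(m+1)/δ) grows only logarithmically in T, paying for T candidate priors costs little in the final bound.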
C Optimizing and Computing the Bounds
In this section, we explain how we compute and optimize the bounds obtained from Algorithm 1. Note that our losses ℓ_bnd and our computed bounds are instantiated with the KL divergence between P = N(v_best, λI_d) and Q = N(w, λI_d), i.e., we have KL(Q‖P) = (1/(2λ))‖w − v_best‖². Furthermore, remark that the bounds involve the number of epochs T (see Section B for more details).

Optimizing the bound.
The optimization differs depending on whether we optimize the bound of Theorem 6 or of Theorem 7. Equation (9) is not directly optimizable, since we upper-bound a deviation (the kl) between the empirical and the true risk. Instead, we use the following loss ℓ_bnd:

ℓ_bnd(h_{w'}, (x+ε, y), v_best) = [1 − exp(−C · ½[1 − y h_{w'}(x+ε)] − (1/m)[(1/(2λ))‖w − v_best‖² + ln(T(m+1)/δ)])] / [1 − exp(−C)],

which involves another parameter C that is learned during the optimization (C is initialized to a fixed small value). The main advantage of this loss is that, when C is optimal, it gives the same upper bound as Equation (9). On the contrary, the loss ℓ_bnd for Theorem 7 is crafted to minimize Equation (11). Indeed, we minimize

ℓ_bnd(h_{w'}, (x+ε, y), v_best) = ½[1 − y h_{w'}(x+ε)] + √((1/(2m))[(1/(2λ))‖w − v_best‖² + ln(T√m/δ)]).

Note that we do not optimize the TV distance, since this term is zero when we approximate the expectation over the voters h ∼ Q with only one voter, i.e., we can set π_i = ρ^h_i for each example (x_i, y_i).
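A NumPy sketch of the two surrogate losses, with the constants reconstructed as in the formulas above; this is our reading of the equations, not the authors' code.

```python
import numpy as np

def kl_gaussians(w, v_best, lam):
    # KL between N(w, lam I) and N(v_best, lam I): ||w - v_best||^2 / (2 lam)
    return np.sum((w - v_best) ** 2) / (2 * lam)

def linear_loss(y, score):
    # linear loss in [0, 1] for a voter output score in [-1, 1]
    return 0.5 * (1.0 - y * score)

def loss_bnd_eq9(y, score, w, v_best, lam, m, T, delta, C):
    """Surrogate for the Eq. (9)-style bound (exponentiated form with a
    learned temperature C); constants are a reconstruction."""
    comp = (kl_gaussians(w, v_best, lam) + np.log(T * (m + 1) / delta)) / m
    return (1.0 - np.exp(-C * linear_loss(y, score) - comp)) / (1.0 - np.exp(-C))

def loss_bnd_eq11(y, score, w, v_best, lam, m, T, delta):
    """Surrogate for the Eq. (11)-style bound: empirical linear loss plus a
    square-root complexity penalty."""
    comp = (kl_gaussians(w, v_best, lam) + np.log(T * np.sqrt(m) / delta)) / (2 * m)
    return linear_loss(y, score) + np.sqrt(comp)

w, v = np.ones(3), np.zeros(3)
l9 = loss_bnd_eq9(1, 0.2, w, v, lam=1.0, m=1000, T=5, delta=0.05, C=0.1)
l11 = loss_bnd_eq11(1, 0.2, w, v, lam=1.0, m=1000, T=5, delta=0.05)
```

Both losses are differentiable in w, so they can be plugged directly into Line 23 of Algorithm 1.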
Computing the bounds. Concerning the computation of Equation (9), note that computing R_S(H_Q) exactly is not feasible in practice because of the expectation E_{h∼Q} h(x+ε). Hence, we approximate this quantity with a Monte Carlo estimation, i.e., we sample N networks (i.e., N weight vectors) h_1, h_2, …, h_N ∼ Q and compute the approximated quantity

R_S(H_Q) ≈ (1/(mn)) Σ_{i=1}^m Σ_{j=1}^n I[ −y_i (1/N) Σ_{k=1}^N h_k(x_i + ε_ij) ≥ 0 ].

From this estimation, we compute the worst possible true risk satisfying Equation (9), i.e., we compute

sup_{R_S(H_Q) ≤ r ≤ 1} { r | kl(R_S(H_Q) ‖ r) ≤ (1/m)[(1/(2λ))‖w − v_best‖² + ln(T(m+1)/δ)] }.

Similarly, for Equation (11), we approximate the risk A_S(H_Q) and the expected value over h of the TV divergence term:

A_S(H_Q) ≈ (1/m) Σ_{i=1}^m I[ −min_{ε∈E_i}( y_i (1/N) Σ_{k=1}^N h_k(x_i + ε) ) ≥ 0 ],
and E_{h∼Q} TV(ρ^h_i ‖ π_i) ≈ (1/N) Σ_{k=1}^N TV(ρ^{h_k}_i ‖ π_i),

where π_i(ε) = (1/N) Σ_{k=1}^N ρ^{h_k}_i(ε) for our approximation. Finally, the bound computed for Theorem 7 is

A_S(H_Q) + (1/(mN)) Σ_{i=1}^m Σ_{k=1}^N TV(ρ^{h_k}_i ‖ π_i) + √((1/(2m))[(1/(2λ))‖w − v_best‖² + ln(T√m/δ)]).

See Theorem 3 in "Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks" by Letarte et al.
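The supremum over r can be computed by bisection on the Bernoulli kl, since kl(q̂ ‖ r) is increasing in r for r ≥ q̂; a small sketch:

```python
import numpy as np

def kl_bernoulli(q, p):
    # kl(q || p) between Bernoulli parameters, clipped away from 0 and 1
    eps = 1e-12
    q, p = np.clip(q, eps, 1 - eps), np.clip(p, eps, 1 - eps)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

def kl_inverse(q_hat, bound, tol=1e-9):
    """sup { r in [q_hat, 1] : kl(q_hat || r) <= bound }, by bisection;
    this yields the worst-case true risk in the Eq. (9)-style bound."""
    lo, hi = q_hat, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(q_hat, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

r = kl_inverse(0.1, 0.05)   # worst-case true risk for empirical risk 0.1
```

With a zero right-hand side the inversion collapses to the empirical risk itself, and it grows monotonically with the complexity term.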
D Additional Experimental Results
In this section, we report the results for MNIST 4vs9 and MNIST 5vs6, respectively in Table 2 and Table 3. We train all the models with the Adam optimizer and fix the confidence parameter δ as for MNIST 1vs7. For these tasks, we observe a behavior similar to MNIST 1vs7: using the attacks PGD_U and IFGSM_U as defense mechanisms allows us to obtain better risks and also tighter bounds. Compared to the bounds obtained with a defense based on UNIF (which is a naive defense), the bounds obtained with a defense based on PGD_U or IFGSM_U are improved, both for MNIST 4vs9 (bold values in Table 2) and for MNIST 5vs6 (bold values in Table 3). These results make sense, since the defenses based on PGD_U and IFGSM_U are more elaborate than the one based on UNIF and therefore lead to better guarantees. For the larger maximal perturbation, in most cases the bounds are non-informative (greater than 1) and the risks are high (i.e., almost half of the time the model predictions are wrong on perturbed examples). Moreover, the risks of the baseline are also high, suggesting that the task becomes hard to learn at this perturbation level.

Table 2: Test risks and bounds for MNIST 4vs9 with n=50 perturbations for all the pairs (Defense, Attack). The results in italic correspond to the baseline on the deterministic network h_w; importantly, for the baseline we did not sample from the uniform distribution, but we report the results as a reference. For two maximal perturbations b, the columns report the baseline risk R_T(h_w) and, for Algorithm 1 with Eq. (9) and Eq. (11), R_{T_U}(H_Q) with the bound of Theorem 6 and A_{T_U}(H_Q) with the bound of Theorem 7. [Numerical entries of the table were not recoverable.]

Table 3: Test risks and bounds for MNIST 5vs6 with n=50 perturbations for all the pairs (Defense, Attack). The results in italic correspond to the baseline on the deterministic network h_w; importantly, for the baseline we did not sample from the uniform distribution, but we report the results as a reference. The columns are the same as in Table 2. [Numerical entries of the table were not recoverable.]