Accelerating Robustness Verification of Deep Neural Networks Guided by Target Labels
Wenjie Wan, Zhaodi Zhang, Yiwei Zhu, Min Zhang, and Fu Song

Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China
{ }@stu.ecnu.edu.cn, [email protected]

ShanghaiTech University, Shanghai, China
[email protected]
Abstract.
Deep Neural Networks (DNNs) have become key components of many safety-critical applications such as autonomous driving and medical diagnosis. However, DNNs have been shown to suffer from poor robustness because of their susceptibility to adversarial examples, where small perturbations to an input result in misprediction. To address this concern, various approaches have been proposed to formally verify the robustness of DNNs. Most of these approaches reduce the verification problem to optimization problems of searching for an adversarial example for a given input such that it is not correctly classified to the original label. However, they are limited in accuracy and scalability. In this paper, we propose a novel approach that can accelerate robustness verification techniques by guiding the verification with target labels. The key insight of our approach is that the robustness verification problem of DNNs can be solved by verifying sub-problems of DNNs, one per target label. Fixing the target label during verification drastically reduces the search space and thus improves efficiency. We also propose an approach that leverages symbolic interval propagation and linear relaxation techniques to sort the target labels in terms of the chances that adversarial examples exist. This often allows us to quickly falsify the robustness of DNNs, so that the verification for the remaining target labels can be avoided. Our approach is orthogonal to, and can be integrated with, many existing verification techniques. For evaluation purposes, we integrate it with three recent promising DNN verification tools, i.e., MipVerify, DeepZ, and Neurify. Experimental results show that our approach can significantly improve these tools, by up to 36X speedup, when the perturbation distance is set in a reasonable range.
1 Introduction

Deep Neural Networks (DNNs) have achieved remarkable performance and accomplished unprecedented breakthroughs in many complex tasks such as image classification [27,18] and speech recognition [20]. The progress makes it possible to apply DNNs to real-world safety-critical domains, e.g., autonomous driving [21,1,54] and medical diagnostics [10,41,37]. Systems in such domains must be highly dependable, and hence their safety should be comprehensively certified before deployment. One of the most challenging problems in this domain is that DNNs have been shown to suffer from poor robustness. That is, a small modification to a valid input may cause systems to make completely wrong decisions [47,17,35,30,7,11], which consequently results in serious consequences and even disasters. For instance, a Tesla car in autopilot mode caused a fatal crash as it failed to detect a white truck against a bright sky with white clouds [45]. Therefore, it is important and necessary to certify the robustness of DNN-based systems before deployment by proving that the neural network always makes the same prediction for a valid input even if the input is slightly perturbed within an allowed range, due to uncertainties from the environment or adversarial attacks.

Many efforts have been made to certify the robustness of DNNs using formal verification techniques [6,26,38,19,12,55,58,16,42,50,13,43,44,52]. The essence of certifying robustness is to prove mathematically the absence of adversarial examples for a DNN within a range of allowed perturbations, which is usually specified by a valid input and an L_p-norm distance threshold. There are three main criteria for evaluating verification approaches: soundness, completeness and scalability. The first states that if a DNN passes the verification, then there are no adversarial examples. The second states that every robust DNN should pass the verification. The last one indicates the scale of DNNs that a verification method can handle. It is known that the verification problem of DNNs with the Rectified Linear Unit (ReLU) activation function is NP-complete [26]. This means that sound and complete verification approaches usually have limited scalability. Existing formal verification approaches either have limited scalability and can only handle small networks [26,32,8,15], or rely on abstraction techniques that simplify the verification problem for better scalability but may produce false positives [16,42,43], having lost the completeness property due to the introduction of abstraction.

Our contribution. In this work, we propose a generic approach that can enhance neural network verification techniques by guiding the verification with target labels, thus making the problem more amenable to verification. Our approach is based on the following key insights. Many existing approaches reduce the verification problem to optimization problems of searching for an adversarial example for a given input so that it is not correctly classified to the original label. We found that by fixing a target label during verification, the search space can be drastically reduced, so that the verification problem with respect to the target label can be efficiently solved, while the overall verification problem can be solved by verifying the DNN for all the possible target labels. Specifically, guided by the target label, we can efficiently compute an adversarial example if there exists one for the given input and L_p-norm distance threshold ε.
In this case, the robustness of the DNN is falsified and the verification for other target labels can be avoided. Furthermore, rather than choosing target labels randomly, we propose an algorithmically efficient approach to sort the target labels by leveraging the symbolic interval propagation and linear relaxation techniques, so that the target labels to which some inputs are misclassified by the DNN with larger probabilities are processed first. This often allows us to quickly disprove the robustness of the DNN when the target DNN is not robust.

Our approach is orthogonal to, and can be integrated with, many existing verification techniques, which are leveraged to verify the robustness of DNNs for target labels. To evaluate the effectiveness and efficiency of our approach, we integrate it with three recent promising neural network verification tools, i.e., MipVerify [50], DeepZ [42], and Neurify [52]. We compare both the verification result and the time cost of the original tools and the tools integrated with our approach. Experimental results show that our approach can help the three tools achieve up to 36X acceleration in time efficiency under reasonable perturbation thresholds. Furthermore, the properties of the original tools, i.e., soundness and completeness (if satisfied), are preserved.

In summary, this paper makes the following three main contributions:

– A novel, generic approach for accelerating the robustness verification of neural networks guided by target labels.
– An approach for sorting target labels by leveraging the symbolic interval propagation and linear relaxation techniques.
– Extensions of three recent promising neural network verification tools with the proposed approach.
Outline.
Section 2 briefly introduces some preliminaries used in this work. Section 3 presents our verification approach. Section 4 reports experimental results. Section 5 discusses related work. Section 6 concludes the paper and discusses future work.
2 Preliminaries

In this section, we recap some preliminaries, namely feed-forward deep neural networks, interval analysis, symbolic interval propagation and linear relaxation, that are necessary to understand our approach.

2.1 Feed-Forward Deep Neural Networks
In this work, we consider feed-forward deep neural networks (FNNs). An l-layer FNN can be considered as a function f : I → O, mapping the set of vectors I to the set of vectors O. The function f is recursively defined as follows:

  x_0 = x,
  x_{k+1} = φ(W_k x_k + b_k)  for k = 0, ..., l − 1,
  f(x) = W_l x_l + b_l,                                  (1)

where x_0 = x ∈ I is the input, W_k and b_k are the weight matrix and bias vector of the k-th layer, respectively, and φ(·) (e.g., ReLU, sigmoid, tanh, etc.) is an activation function applied coordinate-wise to the input vector. ReLU, defined by
ReLU(x) ≡ max(0, x), is one of the most popular activation functions used in modern state-of-the-art DNN architectures [18,22,46]. In this paper we focus on FNNs that only use ReLU as the activation function. For a given input x, the label of x is determined by the function L, defined as

  L(f(x)) = arg max_j f(x)[j],

where f(x)[j] denotes the j-th element of the output vector f(x), which is the confidence that x is classified to the label j. In the case that the last step is not well defined, namely, there is more than one maximum element in f(x), we say that x admits an adversarial example. Hereafter, we assume that the last step of an FNN is well defined, as otherwise the FNN is not robust. By applying the softmax function to the output f(x), we obtain the probabilities of the labels to which the input x is classified. For this reason, in what follows we may say that f(x)[j] is the probability that the input x is classified to the label j. For simplicity, we also use the indices j to represent the classification labels. L(f(x)) returns the label whose corresponding probability is the largest among all the labels; we call it the original label of the input x.
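To make the definitions concrete, the following sketch (in Julia, the language of our implementation, cf. Section 4) evaluates Eq. (1) and the label function L on a small hypothetical network; the two-layer network and its weights are illustrative only, not taken from the paper's benchmarks.

    # A minimal sketch of Eq. (1): a ReLU-activated FNN given by weight
    # matrices W[k] and bias vectors b[k], plus the label function L.
    relu(x) = max.(0, x)                      # coordinate-wise ReLU

    function fnn(W, b, x)
        for k in 1:length(W)-1
            x = relu(W[k] * x .+ b[k])        # x_{k+1} = ReLU(W_k x_k + b_k)
        end
        return W[end] * x .+ b[end]           # f(x) = W_l x_l + b_l
    end

    label(out) = argmax(out)                  # L(f(x)) = arg max_j f(x)[j]

    # Hypothetical 2-input, 2-hidden-neuron, 3-label network.
    W = [[2.0 1.0; 1.0 2.0], [1.0 -1.0; 0.0 1.0; -1.0 1.0]]
    b = [zeros(2), zeros(3)]
    println(label(fnn(W, b, [1.0, 2.0])))     # prints the predicted label index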
Definition 1 (Robustness of FNNs). Given an FNN f : I → O, an input x ∈ I, and an L_p norm distance threshold ε, f is robust w.r.t. x and ε if L(f(x)) = L(f(x′)) for all x′ ∈ I such that L_p(x, x′) ≤ ε. If there exists some x′ ∈ I such that L(f(x)) ≠ L(f(x′)), then x′ is called an adversarial example of x.

Given a target label j such that j ≠ L(f(x)), the FNN f is called j-robust w.r.t. x and ε if f(x′)[j] < f(x′)[j′] for all x′ ∈ I such that L_p(x, x′) ≤ ε, where j′ denotes the original label L(f(x)) of x.

The next proposition states that the robustness problem of a DNN can be reduced to the j-robustness problems of the DNN (which, to our knowledge, has never been stated in the literature, though it is straightforward):

Proposition 1.
Given an FNN f : I → O, an input x ∈ I, and an L_p norm distance threshold ε, suppose J is the set of all the possible labels of f. Then: f is robust w.r.t. x and ε iff f is j-robust w.r.t. x and ε for all j ∈ J \ {L(f(x))}.

In this work, we only consider the L∞ norm, that is, for each pair of vectors x, x′ of the same size, L∞(x, x′) ≡ max{|x[j] − x′[j]| : j is an index of the vector x}.

2.2 Interval Analysis

Interval analysis is a technique which works on intervals rather than concrete values, where an interval represents a set of consecutive concrete values. We provide some basic terms, concepts, and operations on intervals below.

An interval X is a pair [X̲, X̄], where X̲ is the lower bound and X̄, with X̄ ≥ X̲, is the upper bound. The interval [X̲, X̄] denotes the set of concrete values {i ∈ ℝ | X̲ ≤ i ≤ X̄}.

The basic arithmetic operations between intervals are defined in [34]. In this paper, we only present the definitions of the addition, difference and scalar multiplication operations, which are sufficient for this work. The key point of these definitions is that computing with intervals is computing with sets. By definition, the addition (+) of two intervals X and Y is the set

  X + Y = {x + y : x ∈ X, y ∈ Y} = [X̲ + Y̲, X̄ + Ȳ].

For example, let X = [0,
2] and Y = [−1, 1]. Then X + Y = [0 + (−1), 2 + 1] = [−1, 3]. The difference (−) of two intervals X and Y is the set denoted by X − Y, which is defined as follows:

  X − Y = {x − y : x ∈ X, y ∈ Y} = [X̲ − Ȳ, X̄ − Y̲].

For instance, let X = [−1,
0] and Y = [1, 2]. Then −Y = [−2, −1] and X − Y = X + (−Y) = [−3, −1]. The scalar multiplication (·) of an interval X by a constant c is the set, denoted by c · X or cX, which, for c ≥ 0, is defined as follows:

  c · X = {c × x : x ∈ X} = [c × X̲, c × X̄].

For instance, let X = [−1, 3] and c = 2. Then we have c · X = [−2, 6].
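These interval operations translate directly into code. A minimal sketch (the Interval type and function names are ours, not from a library):

    # Closed intervals [lo, hi]; computing with intervals is computing with sets.
    struct Interval
        lo::Float64
        hi::Float64
    end

    add(X::Interval, Y::Interval) = Interval(X.lo + Y.lo, X.hi + Y.hi)
    sub(X::Interval, Y::Interval) = Interval(X.lo - Y.hi, X.hi - Y.lo)
    # Scalar multiplication; the bounds swap when c is negative.
    scale(c, X::Interval) = c >= 0 ? Interval(c * X.lo, c * X.hi) :
                                     Interval(c * X.hi, c * X.lo)

    add(Interval(0.0, 2.0), Interval(-1.0, 1.0))   # [-1, 3]
    sub(Interval(-1.0, 0.0), Interval(1.0, 2.0))   # [-3, -1]
    scale(2.0, Interval(-1.0, 3.0))                # [-2, 6]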
To sort the target labels for a DNN f : I → O, an input x ∈ I and a distance threshold ε, we propagate intervals from the input layer to the output layer via interval propagation. However, naively computing the output interval of the DNN in this way suffers from high errors, as it computes extremely loose bounds due to the dependency problem. In particular, it may produce a very conservative estimation of the output, which is not tight enough to be useful for sorting labels.

Consider the 3-layer DNN given in Figure 1(a), where the weights are associated to the edges and all elements of the bias vectors are 0. Suppose the inputs of the first layer are the intervals [1, 3] and [2, 4]. Naive interval propagation yields the hidden intervals [4, 10] and [5, 11] and the output interval [−5, 7]. However, the lower bound −5 is unreachable: it would require that the neuron n4 outputs 5 while the neuron n3 outputs 10. To output 10 for the neuron n3, the neurons n1 and n2 should output 3 and 4 simultaneously. But, to output 5 for the neuron n4, the neurons n1 and n2 should output 1 and 2 simultaneously. This effect is known as the dependency problem [34].

[Figure 1: a 3-layer DNN with input intervals [1,3] and [2,4]; (a) naive interval propagation yields hidden intervals [4,10] and [5,11] and output interval [−5,7]; (b) symbolic interval propagation yields the output expression −x + y with concrete interval [−1,3].]
Fig. 1. Naive interval propagation vs. symbolic interval propagation.
2.3 Symbolic Interval Propagation

Symbolic interval propagation [53] is a technique to minimize the overestimation of outputs by preserving as much dependency information as possible while propagating the intervals layer by layer. A symbolic interval is a pair of linear expressions [e, e′] such that e and e′ are defined over the input variables.

Let us consider the same example using symbolic interval propagation, as shown in Figure 1(b). Suppose x and y are the input variables of the neurons n1 and n2. By applying the linear transformation of the first layer, the values of the neurons n3 and n4 are 2x + y and x + 2y, respectively. Since x ∈ [1,
3] and y ∈ [2, 4], we have 2x + y > 0 and x + 2y > 0. Therefore, the output symbolic intervals of the neurons n3 and n4 are [2x + y, 2x + y] and [x + 2y, x + 2y], respectively. By applying the linear transformation of the second layer, the value of the neuron n5 is −x + y. Thus, the output of the DNN is [−x + y, −x + y]. From x ∈ [1, 3] and y ∈ [2, 4], we obtain the concrete output interval [−1, 3], which is much tighter than the interval [−5, 7] produced by directly performing interval propagation.

This example shows that symbolic interval propagation characterizes how each neuron computes its result in terms of symbolic intervals and the related activation functions. As the symbolic intervals keep the inter-dependence between variables, symbolic interval propagation significantly reduces the overestimation.
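The following sketch reproduces the computation of Figure 1(b), representing each symbolic bound as a coefficient vector over the input variables (ReLU is omitted here because both hidden neurons are provably positive on the given input intervals; the crossing case is handled in Section 2.4):

    # Symbolic bounds as linear expressions over the inputs x and y.
    # Propagation through a linear layer is a matrix product; concretization
    # picks the worst-case input bound per coefficient sign.
    concretize_lo(c, lo, hi) = sum(c[i] >= 0 ? c[i] * lo[i] : c[i] * hi[i] for i in eachindex(c))
    concretize_hi(c, lo, hi) = sum(c[i] >= 0 ? c[i] * hi[i] : c[i] * lo[i] for i in eachindex(c))

    lo, hi = [1.0, 2.0], [3.0, 4.0]      # x ∈ [1,3], y ∈ [2,4]
    W1 = [2.0 1.0; 1.0 2.0]              # n3 = 2x + y, n4 = x + 2y
    W2 = [-1.0 1.0]                      # n5 = -n3 + n4

    c = vec(W2 * W1)                     # composed expression: -x + y
    println(concretize_lo(c, lo, hi))    # -1.0
    println(concretize_hi(c, lo, hi))    # 3.0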
2.4 Linear Relaxation

To tackle the non-linear activation function ReLU, we use linear relaxation [58] to strictly over-approximate the symbolic intervals. Consider an intermediate node with n = ReLU(X). For each symbolic interval X = (l, u), based on the signs of l and u (determined by concretizing the symbolic intervals), we consider three cases, as shown in Table 1:

– If l > 0, then ReLU(x) = x for every x ≥ l. Thus, ReLU(X) is X.
– If u ≤ 0, then ReLU(x) = 0 for every x ≤ u. Thus, ReLU(X) is [0, 0].
– If l ≤ 0 ≤ u, then X contains both positive and negative values. The output cannot be exactly represented by one linear interval, and thus relaxation is required. We adopt the linear relaxation defined in [58]. As shown in Figure 2, we can set the upper bound to a(x − l) with respect to u/(u − l) ≤ a ≤ 1 and the lower bound to ax with respect to 0 ≤ a ≤ 1. We select the value of a to minimize the overestimation error introduced by the linear relaxation.

Table 1. Interval propagation for ReLU.

  Condition   | Upper bound                    | Lower bound
  0 < l < u   | x                              | x
  l < 0 < u   | a(x − l), u/(u − l) ≤ a ≤ 1    | ax, 0 ≤ a ≤ 1
  l < u < 0   | 0                              | 0
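A sketch of the three cases of Table 1, returning the coefficients (a, b) of the linear upper and lower bounds a·x + b; for the crossing case we take a = u/(u − l) for both bounds, one admissible choice within the ranges of Table 1 (other choices of a are possible):

    # Linear relaxation of n = ReLU(x) for x ∈ [l, u], cf. Table 1.
    # Returns ((a_ub, b_ub), (a_lb, b_lb)) such that, on [l, u],
    # a_lb*x + b_lb ≤ ReLU(x) ≤ a_ub*x + b_ub.
    function relax_relu(l, u)
        if l >= 0                              # always active: ReLU(x) = x
            return ((1.0, 0.0), (1.0, 0.0))
        elseif u <= 0                          # always inactive: ReLU(x) = 0
            return ((0.0, 0.0), (0.0, 0.0))
        else                                   # crossing zero: relax
            a = u / (u - l)                    # tightest upper-bound slope
            return ((a, -a * l), (a, 0.0))     # upper: a(x - l); lower: a*x
        end
    end

    relax_relu(-1.0, 13.0)                     # node x6 of Figure 3: a = 13/14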
Fig. 2. The linear upper and lower bounds for ReLU when l < 0 < u.

3 Our Approach

In this section, we first show how to sort target labels by leveraging the symbolic interval propagation and linear relaxation techniques, and then present our target label guided verification approach.

3.1 Sorting Target Labels
To the best of our knowledge, most existing approaches reduce the robustness verification problem for a given FNN and an input to the problem of checking whether there exists an adversarial example in the allowed range that is not correctly classified to the original label of the input. The search space is therefore relatively large. Instead of considering all the other labels in one verification run, we propose to sort the labels and verify the FNN for each label in order, so as to reduce the search space.

The premise of ordering labels is that the larger the probability of a label, the more likely it is that an adversarial example exists for this label, and thus the higher the priority with which the label should be verified.
[Figure 3: an FNN with three inputs x1 ∈ [0,4], x2 ∈ [1,5], x3 ∈ [4,10]; hidden neurons x4, x5, x6 with symbolic expressions and concrete intervals [2,32], [−23,−3] and [−1,13], the last one relaxed as x6,ub = k·x6 + b and x6,lb = k·x6; and four output labels ℓ1–ℓ4 with symbolic lower/upper bounds and concrete output intervals.]

Fig. 3. An example of sorting target labels based on output intervals.
A naïve approach to label sorting is to calculate the probabilities of all the labels of a target FNN for the given input x, taken as the representative of all the possible inputs. Though feasible, the sorting result also depends on the distance threshold ε, so this intuitive approach may lead to an imprecise sorting result, which consequently misleads the follow-up verification.

When the distance threshold ε is taken into account, it is infeasible, if not impossible, to compute the probabilities of all the labels of a target FNN by enumerating all the possible inputs x′ ∈ I that satisfy L∞(x, x′) ≤ ε. To address this technical challenge, we propose a novel approach that leverages the symbolic interval propagation and linear relaxation techniques.

Given an FNN f : I → O, for every x ∈ I and L∞ distance threshold ε, we approximate the output range (i.e., the output interval) of the FNN for the input x and the distance threshold ε by leveraging the symbolic interval propagation and linear relaxation techniques. Firstly, we encode the set of all the possible inputs x′ ∈ I such that L∞(x, x′) ≤ ε as an input interval. By applying symbolic interval propagation with linear relaxation to handle the ReLU function, we obtain the output interval. Finally, we approximate the probabilities of all the labels over all the possible inputs based on the approximated output range. The labels other than the original label of the input x are then sorted in descending order of these probabilities.

Figure 3 shows an example of computing the output interval for label sorting using symbolic interval propagation. For the node x6, whose interval is [−1, 13], we use linear relaxation to represent the upper bound x6,ub and the lower bound x6,lb by two linear constraints. Assume that ℓ4 is the original label of some input. Then, ℓ2 is the most likely target label to which the input with some perturbation would be classified, because the upper bound of its output interval is larger than that of the other labels except the original label ℓ4. The input with any perturbation in the range would never be classified to the label ℓ1, because its upper bound is less than the lower bound of the original label ℓ4. Thus, ℓ1 can be safely discarded from verification, and the sorting result is ℓ2; ℓ3.
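A sketch of the sorting step follows; bounds(j) is assumed to return the concrete output interval (lo, hi) of label j computed as above, and the interval values used in the example are hypothetical stand-ins for those of Figure 3:

    # Sort target labels by the upper bounds of their output intervals,
    # discarding labels that can never overtake the original label
    # (upper bound below the original label's lower bound, like ℓ1 in Fig. 3).
    function sort_targets(bounds, labels, original)
        lb_orig = bounds(original)[1]
        cand = [(j, bounds(j)[2]) for j in labels if j != original]
        cand = filter(t -> t[2] >= lb_orig, cand)   # drop impossible labels
        sort!(cand, by = t -> -t[2])                # largest upper bound first
        return [t[1] for t in cand]
    end

    # Hypothetical output intervals for four labels, ℓ4 being the original.
    ivals = Dict(1 => (-9.0, -2.0), 2 => (-3.0, 6.0), 3 => (-4.0, 5.0), 4 => (1.0, 9.0))
    println(sort_targets(j -> ivals[j], 1:4, 4))    # [2, 3]: ℓ1 is discarded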
The key difference between our approach and the naïve one is that our approach returns an interval of probabilities for each label, while the latter returns a concrete probability for the label. The interval of probabilities over-approximates all the possible probabilities for all possible inputs in the input interval. In contrast, the concrete probability only reflects the classification result of one concrete input. It is known that symbolic interval propagation does not exclude labels that may cause misprediction [52]; thus the sorted list of labels produced by our approach consists of all possible target labels and is more likely to reflect the real order than the one produced by the naïve approach.

3.2 Target Label Guided Verification

Algorithm 1: Robustness verification of an FNN guided by target labels

  input : An FNN f, an input vector x, a distance threshold ε
  output: Robust, Non-robust with an adversarial example, or Unknown
  1  J := Labels(f) \ {L(f(x))};         // J: the list of target labels
  2  J′ := sort(f, x, ε, J);             // sort J according to the probabilities of the labels to which x with L∞ distance ε may be classified
  3  flag := false;                      // to indicate the unknown case
  4  while J′ ≠ nil do
  5      j := head(J′);                  // the label in J′ with the largest probability
  6      Result := Verifier(f, x, ε, j); // invoke an existing NN verifier
  7      switch Result do
  8          case true do
  9              J′ := tail(J′);         // remove the head of J′
 10              continue;
 11          case false do
 12              x′ := getAdvExample(f, x, ε, j); // get an adversarial example
 13              return x′;
 14          case unknown do
 15              flag := true;           // fail for the current label, try the next one
 16              J′ := tail(J′);         // remove the head of J′
 17              continue;
 18  if flag then return unknown;        // the verifier failed for some label
 19  else return robust;                 // f is robust against all the labels in J

The overview of our verification approach guided by the sorted labels is shown in Algorithm 1. We first sort the labels as aforementioned (lines 1–2) and then verify the robustness of the given neural network against the labels in J′ one by one, w.r.t. the input x and a perturbation distance threshold ε (the while-loop).

Let J′ denote the sorted list of the labels (line 2), and let j denote the head of J′ (line 5), i.e., the label to which some adversarial examples are most likely misclassified. We verify whether the given FNN f is robust against the label j by invoking an oracle Verifier (line 6). The oracle
Verifier takes the FNN f, the input x, the distance threshold ε and the target label j as inputs, and outputs true, false or unknown. There exist several state-of-the-art FNN verification tools for verifying robustness; therefore, instead of developing our own tool for verifying j-robustness, in this work we leverage existing tools to achieve this goal. The difference is that we provide the tools with the most likely target label j as additional information besides f, x and ε. If the oracle Verifier returns true (i.e., robust), the algorithm proceeds to verify the remaining labels.

There are two possible outcomes if the FNN f is not robust against the label j, depending on the precision of the invoked verification tool Verifier. If the tool is both sound and complete, e.g., MipVerify, it returns a real adversarial example that falsifies the robustness of the FNN f, namely, an adversarial example misclassified to the label j. If the tool is incomplete, e.g., DeepZ and
Neurify, it may return unknown after several iterations of refinement. In this case, we set a flag to record this failure and skip this label. After all the labels have been verified without returning any adversarial example, the algorithm returns robust if flag is not true, and unknown otherwise.

We remark that the soundness and completeness of our algorithm rely on the oracle Verifier employed in the algorithm. We assume that the implementation of the oracle
Verifier is sound, which is reasonable according to the survey [31]. Then, our algorithm is also sound: by the definition of soundness, if our algorithm returns robust, then the FNN f must be robust w.r.t. x and ε. This is straightforward because the oracle Verifier returns true for all labels in J′. Likewise, we can show that our algorithm is complete if and only if the implementation of the oracle Verifier is complete.
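A compressed Julia sketch of Algorithm 1 is given below; verifier stands for any backend (e.g., one of the three tools of Section 4), and the symbols :robust, :falsified and :unknown as well as the signatures are our own conventions, not the tools' actual APIs:

    # Algorithm 1: verify robustness label by label, most promising first.
    # `verifier(f, x, eps, j)` must be a sound j-robustness oracle returning
    # (:robust, nothing), (:falsified, adversarial_example) or (:unknown, nothing).
    function verify(f, x, eps, verifier, sorted_targets)
        unknown = false
        for j in sorted_targets
            result, witness = verifier(f, x, eps, j)
            if result == :falsified
                return (:nonrobust, witness)   # genuine adversarial example found
            elseif result == :unknown
                unknown = true                 # record the failure, try next label
            end                                # :robust — proceed to next label
        end
        return unknown ? (:unknown, nothing) : (:robust, nothing)
    end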
4 Experiments

We implemented Algorithm 1 in the Julia programming language [3]. To evaluate its performance, we choose three recent promising DNN verification tools, MipVerify, DeepZ and Neurify, as backend verifiers to verify j-robustness for each label j.

– MipVerify [50] formulates the robustness verification of piecewise-linear neural networks as a mixed-integer program. It improves existing Mixed Integer Linear Programming (MILP) based approaches via a tighter formulation for non-linearities and a novel presolve algorithm that makes full use of all available information. MipVerify is both sound and complete. However, the underlying approach of MipVerify relies on applying linear programming per neuron to obtain tight bounds for the MILP solver, which does not scale to larger networks.
– DeepZ [42] makes use of the abstract interpretation technique and uses the zonotope abstract domain, which combines floating-point polyhedra with intervals, coupled with abstract transformers for common neural network functions such as affine transforms, the ReLU, sigmoid and tanh activation functions, and the maxpool operator. These abstract transformers enable DeepZ to efficiently handle both feed-forward and convolutional networks. In contrast to MipVerify, DeepZ is not complete due to the abstraction of the original models.
– Neurify [52] uses symbolic interval analysis and linear relaxation to compute tighter bounds on the network outputs. The linear relaxation used in Neurify is not exact, so it may produce spurious adversarial examples; Neurify therefore introduces a directed constraint refinement that deals with spurious adversarial examples by iteratively minimizing the errors introduced during the linear relaxation process. Neurify is complete and requires at most n steps of refinement, where n is the number of cross-0 hidden nodes in the network. However, in practice the refinement process might take too long, and thus Neurify sets a time threshold to decide when to terminate.

We basically treat the three tools as black boxes in our verification framework. However, we slightly modified them so that they accept a target label as an extra input, besides a neural network, an input of the neural network and an L∞ distance threshold. They call their built-in verification algorithms to verify the robustness against the given label instead of all the possible labels. We use MipVerify*, DeepZ*, and Neurify* to denote the tools extended with our approach, respectively.

To evaluate the effectiveness and efficiency of our approach, we compare both the verification precision and the execution time of the original tools and their corresponding extensions, respectively.
Benchmarks.
We use the widely-tested dataset MNIST [29], a dataset of handwritten digits in grayscale with 28 × 28 pixels. The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples, each associated with a label from 10 classes. We selected the first 100 images from the test set of MNIST for robustness verification.
Architectures.
We use three different architectures of fully connected feed-forward networks: 2 × 24 (FNN 1), 2 × 100 (FNN 2), and 5 × 100 (FNN 3), where l × n denotes that the network has l layers and each layer consists of n neurons. The network FNN 1 is taken from Neurify [52], and the networks FNN 2 and FNN 3 are taken from DeepZ [42]. All of them have been pre-trained without adversarial training.
Experimental setup.
All the experiments were conducted on a Linux server running Ubuntu 18.04.3 with a 32-core AMD Ryzen Threadripper 3970X CPU @ 3.7GHz and 128 GB of main memory. We set three hours as the timeout threshold per execution for all the experiments. For each FNN, we evaluate the performance of the tools under different distance thresholds on the same set of inputs.
The verification results using
MipVerify and
MipVerify ∗ (a) The result of FNN 1FNN 1: (cid:104) (cid:105) , valid input: 99/100 (cid:15) t sort (s) MipVerify (s)
MipVerify ∗ (s) ACC(%) RST RST ∗ (cid:104) (cid:105) valid input: 98/100 (cid:15) t sort (s) MipVerify (s)
MipVerify ∗ (s) ACC(%) RST RST ∗ (cid:104) (cid:105) valid input: 99/100 (cid:15) t sort (s) MipVerify (s)
MipVerify ∗ (s) ACC(%) RST RST ∗ execution for all the experiments. For each FNN, we evaluate the performanceof the tools under different distance thresholds on the same set of inputs. We evaluate our approach in terms of verification time and verification precision.Specifically, for each tool T , we denote by T the verification time of the originaltool and T ∗ the verification time of the corresponding tool extended with ourapproach. We also record the execution time for sorting all target labels anddenote it by t sort . The total time cost by the extended tools is the sum ofthe verification time and sorting time. We calculate the time reduction rate by( T − T ∗ − t sort ) /T . We use the form m/n to represent the verification precision,where m is the number of inputs that are proved to be robust, and n is thenumber of all the inputs. The time is measured by seconds. We will denote byRST and RST ∗ respectively the verification precision of the original tool andour tool, and by ACC the time reduction rate. Performance on
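As a worked instance of the time reduction rate, take the Neurify results for FNN 1 with ε = 7 in Table 4, where t_sort = 0.44, T = 73.71 and T* = 64.25:

    ACC = (T − T* − t_sort) / T = (73.71 − 64.25 − 0.44) / 73.71 ≈ 12.23%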
Performance on MipVerify. Table 2 shows the verification results using MipVerify and MipVerify* on the three neural networks. One can see that both tools return the same verification results under different perturbation distance thresholds and networks. MipVerify* is more efficient than MipVerify in all cases. There is a significant speedup when the perturbation threshold is small, e.g., ε ≤ 3. It can even achieve 97% time reduction for the small neural network when ε = 1. We also observe that the sorting time grows almost linearly with a very small coefficient, e.g., 0.04 for FNN 1 and 0.37 for FNN 3, respectively.

The reason for the acceleration is that MipVerify* does not spend extra time on verifying those labels that are proved impossible to be misclassified to, or once the robustness is already falsified. In our approach, such impossible labels are excluded during sorting, as described in Section 3.1. In contrast, MipVerify treats all labels except the original one with no difference and tries to find an adversarial example within the perturbation threshold for every label, which incurs more time.

The experimental results also show that the acceleration decreases as the perturbation distance threshold and the size of the network increase. This is because a larger perturbation distance threshold turns more labels into target labels for which adversarial examples exist. In the worst case, our approach does not accelerate the verification, due to the intrinsic NP-completeness of the problem. As shown in Tables 2(b) and 2(c), both tools run out of time when ε is too large for the networks FNN 2 and FNN 3.
Performance on DeepZ. Table 3 reports the verification results using DeepZ and DeepZ*. The verification results of both tools are the same. Because DeepZ is an abstract-interpretation-based tool, it is not surprising that DeepZ is more efficient than MipVerify. However, DeepZ does not preserve completeness after introducing abstraction, and therefore it may fail to certify an input even if the neural network is robust for it within the preset perturbation range. Our results confirm this.

The experimental results also show that when ε ≤ 5, our approach can accelerate the verification and improve the time by up to 93.10% in some cases. However, we also notice that when ε becomes larger, e.g., greater than 5, the acceleration becomes weak, and the extended tool can even be slower than the original one. The reason is similar to that for MipVerify, i.e., there are more target labels as ε increases. It is worth mentioning that the reduced time is not always strictly monotonic; Tables 3(b) and 3(c) contain such cases. It would be interesting to perform an in-depth analysis of these cases. One possible reason is that the verification per target label cannot reuse the intermediate results of previous verifications. One may use an incremental verification approach to solve this problem. We leave this as future work.
Performance on Neurify. Table 4 shows the verification results using Neurify and Neurify*. Different from the results of the previous two sections, Neurify* may produce verification results different from those of the original tool Neurify. For instance, Neurify* finds more inputs to which FNN 1 is robust, using less time than Neurify, for larger values of ε. This is because Neurify uses abstraction and refinement in its verification approach, and a maximum number of refinements
The verification results using
DeepZ and
DeepZ ∗ (a) The result of FNN 1FNN 1 (cid:104) (cid:105) valid input: 99/100 (cid:15) t sort (s) DeepZ (s)
DeepZ ∗ (s) ACC(%) RST RST ∗ (cid:104) (cid:105) valid input: 98/100 (cid:15) t sort (s) DeepZ (s)
DeepZ ∗ (s) ACC(%) RST RST ∗ (cid:104) (cid:105) valid input: 99/100 (cid:15) t sort (s) DeepZ (s)
DeepZ ∗ (s) ACC(%) RST RST ∗ is predefined to guarantee the termination. It returns unknown once it exceedsthe refinement threshold. The order of target labels determines how fast that thetool reaches the threshold. The experimental results show that by sorting thetarget labels we can reduce the possibility of reaching the refinement threshold.Although our tool may take more verification time, some inputs whose verifica-tion results are unknown produced by the original tool can be resolved as robustby our tool.It is worth mentioning that Neurify does not check whether an input can becorrectly classified before verification, and regards it as robust even if it is alwaysclassified to an incorrect label. Such cases are excluded in our algorithm, andtherefore the number of safe inputs verified by
Neurify ∗ may be less than the oneverified by Neurify , as shown by the red numbers in Table 4.
Table 4. The verification results using Neurify and Neurify*.

(a) The result of FNN 1 (valid input: 99/100)
  ε    t_sort (s)   Neurify (s)   Neurify* (s)   ACC (%)   RST     RST*
  …    …            …             …              …         …/100   …/100
  7    0.44         73.71         64.25          12.23     …/100   …/100
  9    0.49         92.90         88.38          4.33      …/100   …/100
  11   0.51         86.19         68.94          19.41     …/100   …/100

(b) The result of FNN 2 (valid input: 98/100)
  ε    t_sort (s)   Neurify (s)   Neurify* (s)   ACC (%)   RST     RST*
  …    …            …             …              …         …/100   …/100
  5    1.36         56.88         61.92          -11.24    …/100   …/100
  7    1.93         151.25        166.75         -11.53    …/100   …/100
  9    1.86         140.95        170.09         -21.99    6/100   6/100
  11   2.20         115.57        128.02         -12.69    1/100   1/100

(c) The result of FNN 3 (valid input: 99/100)
  ε    t_sort (s)   Neurify (s)   Neurify* (s)   ACC (%)   RST     RST*
  …    …            …             …              …         …/100   …/100
  5    4.81         116.79        114.04         -1.77     43/100  43/100
  7    5.23         122.05        146.27         -24.13    …/100   …/100
  9    5.41         131.22        146.91         -16.08    1/100   1/100
  11   6.13         120.69        139.89         -20.99    0/100   0/100

5 Related Work

We discuss existing formal verification techniques for neural networks (cf. [31,23] for surveys). Neural network testing (e.g., [48,39,33,36,25,9,51,4,28,5,30,7,11], to cite a few) is excluded: testing is computationally less expensive and able to work with large networks, but at the cost of provable guarantees.

Existing formal verification techniques can be broadly classified as either complete or incomplete. Complete techniques are based on constraint solvers such as SAT/SMT/MILP solving [13,26,49] or refinement [52]. They do not produce false positives but have limited scalability and can hardly handle neural networks containing more than a few hundred hidden units. In contrast, incomplete techniques usually rely on approximation for better scalability, but they may produce false positives. Such techniques mainly include duality [12,40,56], layer-by-layer approximation of the adversarial polytope [57], discretizing the search space [24], abstract interpretation [16,42,43], linear approximations [55,56], bounding the local Lipschitz constant [55], and bounding the activation of the ReLU with linear functions [55].

The complete robustness verification of ReLU-based neural networks is essentially a collection of linear programming problems. For a neuron with a ReLU activation function, the function can be active or inactive, depending on the input; thus, every neuron is transformed into linear constraints. Consequently, the size of the linear programming problems to solve increases exponentially with the number of neurons in the network, which is obviously not scalable. Katz et al. proved that the robustness verification problem of DNNs is NP-complete [26], which illustrates the necessity of devising algorithmically efficient verification methods. They extended the classical simplex algorithm to solve this problem [26]. However, the algorithm is still limited to small-scale neural networks. For example, verifying a feed-forward network with 5 inputs, 5 outputs and 300 total hidden neurons on a single data point can take a few hours [26]. Another solver-based verification system is Planet [13], which resorts to satisfiability (SAT) solvers. Although an adversarial example found by such approaches is a genuine one, their scalability is always an obstacle which prevents them from being applied to relatively large neural networks.

Incomplete verification techniques do not intend to solve the verification task directly. Instead, they turn the verification problem into a classical linear programming problem by over-approximating the adversarial polytope or the space of outputs of a network for a region of possible inputs. For instance, [56] and [40] transform the verification problem into a convex optimization problem, using relaxations to over-approximate the outputs of ReLU nodes. Another typical work [16] leverages zonotopes for approximating the ReLU outputs. Dvijotham et al. propose to transform the verification problem into an unconstrained dual formulation using Lagrange relaxation and use gradient descent to solve the optimization problem [12]. Such over-approximations drastically improve the efficiency of obtaining provable adversarial accuracy results. However, incomplete verification may produce false positives.
Recently, two novel abstraction-based frameworks for neural network verification have been proposed [14,2]. They merge several neurons into a single neuron to obtain a smaller, abstracted neural network, whereas prior work abstracts the transformation of each neuron.

Our approach is orthogonal to, and can be integrated with, existing neural network verification techniques and abstraction-based frameworks to accelerate robustness verification. Although both the symbolic interval analysis and linear relaxation techniques have been used in existing works, to our knowledge they are used for ranking labels for the first time. Furthermore, our robustness verification methodology, which reduces the overall problem to the verification for each target label, and our verification approach for one target label are new to some extent.

6 Conclusion and Future Work

In this work, we proposed a novel, generic approach to accelerate the robustness verification of DNNs. The novelty of our approach is threefold. First, we showed that the overall verification problem can be reduced to the verification problem for each target label. Second, we presented an efficient and effective approach for ranking labels. Finally, we integrated our approach into three recent promising DNN verification tools. Experimental results showed that our approach is effective when the perturbation distance is set in a reasonable range.

In the future, we plan to investigate incremental verification approaches so that the intermediate results of previous verifications can be reused for verifying later labels. We also plan to verify industry-level networks using more powerful hardware such as GPUs. We believe that the improvement in efficiency makes it possible to verify DNN-based systems, which is crucial for applying them to safety-critical domains.
References
1. Apollo: An open, reliable and secure software platform for autonomous driving systems. http://apollo.auto (2018)
2. Ashok, P., Hashemi, V., Křetínský, J., Mohr, S.: DeepAbstract: Neural network abstraction for accelerating verification. In: Proceedings of the 18th International Symposium on Automated Technology for Verification and Analysis (ATVA) (2020)
3. Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: A fresh approach to numerical computing. SIAM Review 59(1), 65–98 (2017)
4. Bhagoji, A.N., He, W., Li, B., Song, D.: Exploring the space of black-box attacks on deep neural networks. CoRR abs/1712.09491 (2017)
5. Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In: Proceedings of the International Conference on Learning Representations (ICLR) (2018)
6. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP). pp. 39–57. IEEE (2017)
7. Chen, G., Chen, S., Fan, L., Du, X., Zhao, Z., Song, F., Liu, Y.: Who is real Bob? Adversarial attacks on speaker recognition systems. CoRR abs/1911.01840 (2019)
8. Cheng, C.H., Nührenberg, G., Ruess, H.: Maximum resilience of artificial neural networks. In: International Symposium on Automated Technology for Verification and Analysis. pp. 251–268. Springer (2017)
9. Cheng, M., Le, T., Chen, P., Zhang, H., Yi, J., Hsieh, C.: Query-efficient hard-label black-box attack: An optimization-based approach. In: Proceedings of the 7th International Conference on Learning Representations (ICLR) (2019)
10. Ciresan, D.C., Giusti, A., Gambardella, L.M., Schmidhuber, J.: Deep neural networks segment neuronal membranes in electron microscopy images. In: Proceedings of the 26th Annual Conference on Neural Information Processing Systems. pp. 2852–2860 (2012)
11. Duan, Y., Zhao, Z., Bu, L., Song, F.: Things you may not know about adversarial example: A black-box adversarial image attack. CoRR abs/1905.07672 (2019)
12. Dvijotham, K., Stanforth, R., Gowal, S., Mann, T.A., Kohli, P.: A dual approach to scalable verification of deep networks. In: UAI. pp. 550–559 (2018)
13. Ehlers, R.: Formal verification of piece-wise linear feed-forward neural networks. In: International Symposium on Automated Technology for Verification and Analysis. pp. 269–286. Springer (2017)
14. Elboher, Y.Y., Gottschlich, J., Katz, G.: An abstraction-based framework for neural network verification. In: Proceedings of the 32nd International Conference on Computer-Aided Verification (CAV) (2020)
15. Fischetti, M., Jo, J.: Deep neural networks as 0-1 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174 (2017)
16. Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: AI2: Safety and robustness certification of neural networks with abstract interpretation. In: 2018 IEEE Symposium on Security and Privacy (SP). pp. 3–18. IEEE (2018)
17. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
19. Hein, M., Andriushchenko, M.: Formal guarantees on the robustness of a classifier against adversarial manipulation. In: Advances in Neural Information Processing Systems. pp. 2266–2276 (2017)
20. Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Kingsbury, B., et al.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine (2012)
21. Holley, P.: Texas becomes the latest state to get a self-driving car service. (May 2018)
22. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4700–4708 (2017)
23. Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y., Thamo, E., Wu, M., Yi, X.: Safety and trustworthiness of deep neural networks: A survey. CoRR abs/1812.08342v4 (2019), http://arxiv.org/abs/1812.08342v4
24. Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: International Conference on Computer Aided Verification (CAV). Springer (2017)
26. Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: An efficient SMT solver for verifying deep neural networks. In: International Conference on Computer Aided Verification (CAV). pp. 97–117. Springer (2017)
27. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. pp. 1097–1105 (2012)
29. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
30. Lei, Y., Chen, S., Fan, L., Song, F., Liu, Y.: Advanced evasion attacks and mitigations on practical ML-based phishing website classifiers. CoRR abs/2004.06954 (2020)
31. Liu, C., Arnon, T., Lazarus, C., Barrett, C.W., Kochenderfer, M.J.: Algorithms for verifying deep neural networks. CoRR abs/1903.06758 (2019)
32. Lomuscio, A., Maganti, L.: An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351 (2017)
33. Ma, L., Juefei-Xu, F., Zhang, F., Sun, J., Xue, M., Li, B., Chen, C., Su, T., Li, L., Liu, Y., Zhao, J., Wang, Y.: DeepGauge: Multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE). pp. 120–131 (2018)
34. Moore, R.E., Kearfott, R.B., Cloud, M.J.: Introduction to Interval Analysis, vol. 110. SIAM (2009)
35. Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: A simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2574–2582 (2016)
36. Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the ACM Asia Conference on Computer and Communications Security (AsiaCCS). pp. 506–519 (2017)
37. Parag, T., Ciresan, D.C., Giusti, A.: Efficient classifier training to minimize false merges in electron microscopy segmentation. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. pp. 657–665 (2015)
38. Peck, J., Roels, J., Goossens, B., Saeys, Y.: Lower bounds on the robustness to adversarial perturbations. In: Advances in Neural Information Processing Systems. pp. 804–813 (2017)
39. Pei, K., Cao, Y., Yang, J., Jana, S.: DeepXplore: Automated whitebox testing of deep learning systems. In: Proceedings of the 26th Symposium on Operating Systems Principles (SOSP). pp. 1–18 (2017)
40. Raghunathan, A., Steinhardt, J., Liang, P.: Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344 (2018)
41. Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Annual Review of Biomedical Engineering, 221–248 (2017)
42. Singh, G., Gehr, T., Mirman, M., Püschel, M., Vechev, M.: Fast and effective robustness certification. In: Advances in Neural Information Processing Systems. pp. 10802–10813 (2018)
43. Singh, G., Gehr, T., Püschel, M., Vechev, M.: An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages (POPL), Article 41 (2019)
44. Singh, G., Gehr, T., Püschel, M., Vechev, M.T.: Boosting robustness certification of neural networks. In: 7th International Conference on Learning Representations (ICLR) (2019)
45. Stewart, J.: Tesla's Autopilot was involved in another deadly car crash. (2018)
46. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the Inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2818–2826 (2016)
47. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: Proceedings of the International Conference on Learning Representations (ICLR) (2014)
48. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: Proceedings of the 2nd International Conference on Learning Representations (ICLR) (2014)
49. Tjeng, V., Tedrake, R.: Verifying neural networks with mixed integer programming. CoRR abs/1711.07356 (2017)
50. Tjeng, V., Xiao, K.Y., Tedrake, R.: Evaluating robustness of neural networks with mixed integer programming. In: 7th International Conference on Learning Representations (ICLR) (2019)
51. Tu, C., Ting, P., Chen, P., Liu, S., Zhang, H., Yi, J., Hsieh, C., Cheng, S.: AutoZOOM: Autoencoder-based zeroth order optimization method for attacking black-box neural networks. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI). pp. 742–749 (2019)
52. Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Efficient formal safety analysis of neural networks. In: Advances in Neural Information Processing Systems. pp. 6367–6377 (2018)
53. Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Formal security analysis of neural networks using symbolic intervals. In: 27th USENIX Security Symposium. pp. 1599–1614 (2018)
54. Waymo: A self-driving technology development company. https://waymo.com/ (2009)
55. Weng, T.W., Zhang, H., Chen, H., Song, Z., Hsieh, C.J., Boning, D., Dhillon, I.S., Daniel, L.: Towards fast computation of certified robustness for ReLU networks. arXiv preprint arXiv:1804.09699 (2018)
56. Wong, E., Kolter, J.Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851 (2017)
57. Xiang, W., Tran, H.D., Johnson, T.T.: Output reachable set estimation and verification for multilayer neural networks. IEEE Transactions on Neural Networks and Learning Systems 29 (2018)
58. Zhang, H., Weng, T.W., Chen, P.Y., Hsieh, C.J., Daniel, L.: Efficient neural network robustness certification with general activation functions. In: Advances in Neural Information Processing Systems (2018)