NeuroDiff: Scalable Differential Verification of Neural Networks using Fine-Grained Approximation
Brandon Paulsen
University of Southern California, Los Angeles, California, USA

Jingbo Wang
University of Southern California, Los Angeles, California, USA

Jiawei Wang
University of Southern California, Los Angeles, California, USA

Chao Wang
University of Southern California, Los Angeles, California, USA
ABSTRACT
As neural networks make their way into safety-critical systems, where misbehavior can lead to catastrophes, there is a growing interest in certifying the equivalence of two structurally similar neural networks – a problem known as differential verification. For example, compression techniques are often used in practice for deploying trained neural networks on computationally- and energy-constrained devices, which raises the question of how faithfully the compressed network mimics the original network. Unfortunately, existing methods either focus on verifying a single network or rely on loose approximations to prove the equivalence of two networks. Due to overly conservative approximation, differential verification lacks scalability in terms of both accuracy and computational cost. To overcome these problems, we propose NeuroDiff, a symbolic and fine-grained approximation technique that drastically increases the accuracy of differential verification on feed-forward ReLU networks while achieving many orders-of-magnitude speedup. NeuroDiff has two key contributions. The first one is new convex approximations that more accurately bound the difference of two networks under all possible inputs. The second one is judicious use of symbolic variables to represent neurons whose difference bounds have accumulated significant error. We find that these two techniques are complementary, i.e., when combined, the benefit is greater than the sum of their individual benefits. We have evaluated NeuroDiff on a variety of differential verification tasks. Our results show that NeuroDiff is up to 1000X faster and 5X more accurate than the state-of-the-art tool.
1 INTRODUCTION

There is a growing need for rigorous analysis techniques that can compare the behaviors of two or more neural networks trained for the same task. For example, such techniques have applications in better understanding the representations learned by different networks [46], and finding inputs where networks disagree [52]. The need is further motivated by the increasing use of neural network
ASE '20, September 21–25, 2020, Virtual Event, Australia
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-6768-4/20/09...$15.00
https://doi.org/10.1145/3324884.3416560

compression [14] – a technique that alters the network's parameters to reduce its energy and computational cost – where we expect the compressed network to be functionally equivalent to the original network. In safety-critical systems where a single instance of misbehavior can lead to catastrophe, having formal guarantees on the equivalence of the original and compressed networks is highly desirable.

Unfortunately, most work aimed at verifying or testing neural networks does not provide formal guarantees on their equivalence. For example, testing techniques geared toward refutation can provide inputs where a single network misbehaves [22, 31, 42, 44, 51] or multiple networks disagree [23, 34, 52], but they do not guarantee the absence of misbehaviors or disagreements. While techniques geared toward verification can prove safety or robustness properties of a single network [7–9, 15, 18, 25, 38, 41, 47], they lack crucial information needed to prove the equivalence of multiple networks. One exception is the ReluDiff tool of Paulsen et al. [33], which computes a sound approximation of the difference of two neural networks, a problem known as differential verification. While ReluDiff performs better than other techniques, the overly conservative approximation it computes often causes both accuracy and efficiency to suffer.

To overcome these problems, we propose NeuroDiff, a new symbolic and fine-grained approximation technique that significantly increases the accuracy of differential verification while achieving many orders-of-magnitude speedup. NeuroDiff has two key contributions.
The first contribution is the development of convex approximations, a fine-grained approximation technique for bounding the output difference of neurons for all possible inputs, which drastically improves over the coarse-grained concretizations used by ReluDiff. The second contribution is judiciously introducing symbolic variables to represent neurons in hidden layers whose difference bounds have accumulated significant approximation error. These two techniques are also complementary, i.e., when combined, the benefit is significantly greater than the sum of their individual benefits.

The overall flow of NeuroDiff is shown in Figure 1, where it takes as input two neural networks f and f′, a set of inputs to the neural networks X defined by box intervals, and a small constant ϵ that quantifies the tolerance for disagreement. We assume that f and f′ have the same network topology and only differ in the numerical values of their weights. In practice, f′ could be the compressed version of f, or they could be networks constructed using the same network topology but slightly different training
Figure 1: The overall flow of NeuroDiff.

data. We also note that this assumption can support compression techniques such as weight pruning [14] (by setting edges' weights to 0) and even neuron removal [10] (by setting all of a neuron's incoming edge weights to 0). NeuroDiff then aims to prove ∀x ∈ X . |f′(x) − f(x)| < ϵ. It can return (1) verified if a proof can be found, or (2) undetermined if a specified timeout is reached.

Internally, NeuroDiff first performs a forward analysis using symbolic interval arithmetic to bound both the absolute value ranges of all neurons, as in single network verification, and the difference between the neurons of the two networks. NeuroDiff then checks if the difference between the output neurons satisfies ϵ, and if so returns verified. Otherwise, NeuroDiff uses a gradient-based refinement to partition X into two disjoint subregions X_1 and X_2, and attempts the analysis again on the individual regions. Since X_1 and X_2 form independent sub-problems, we can do these analyses in parallel, hence gaining significant speedup.

The new convex approximations used in NeuroDiff are significantly more accurate than not only the coarse-grained concretizations in ReluDiff [33] but also the standard convex approximations in single-network verification tools [39, 40, 47, 54]. While these (standard) convex approximations aim to bound the absolute value range of y = ReLU(x), where x is the input of the rectified linear unit (ReLU) activation function, our new convex approximations aim to bound the difference z = ReLU(x + ∆) − ReLU(x), where x and x + ∆ are ReLU inputs of two corresponding neurons.
This is significantly more challenging because it involves the search of bounding planes in a three-dimensional space (defined by x, ∆ and z) as opposed to a two-dimensional space as in the prior work.

The symbolic variables we judiciously add to represent values of neurons in hidden layers should not be confused with the symbolic inputs used by existing tools either. While the use of symbolic inputs is well understood, e.g., both in single-network verification [39, 40, 47, 54] and differential verification [33], this is the first time that symbolic variables are used to substitute values of hidden neurons during differential verification. While the impact of symbolic inputs often diminishes after the first few layers of neurons, the impact of these new symbolic variables, when judiciously added, can be maintained in any hidden layer.

We have implemented the proposed NeuroDiff in a tool and evaluated it on a large set of differential verification tasks. Our benchmarks consist of 49 networks, from applications such as aircraft collision avoidance, image classification, and human activity recognition. We have experimentally compared with ReluDiff [33], the state-of-the-art tool which has also been shown to be superior to ReluVal [48] and DeepPoly [40] for differential verification. Our results show that NeuroDiff is up to 1,000X faster and 5X more accurate. In addition, NeuroDiff is able to prove many of the same properties as ReluDiff while considering much larger input regions.

Figure 2: Motivating example.

To summarize, this paper makes the following contributions:
• We propose new convex approximations to more accurately bound the difference between corresponding neurons of two structurally similar neural networks.
• We propose a method for judiciously introducing symbolic variables to neurons in hidden layers to mitigate the propagation of approximation error.
• We implement and evaluate the proposed technique on a large number of differential verification tasks and demonstrate its significant speed and accuracy gains.

The remainder of this paper is organized as follows. First, we provide a brief overview of our method in Section 2. Then, we provide the technical background in Section 3. Next, we present the detailed algorithms in Section 4 and the experimental results in Section 5. We review the related work in Section 6. Finally, we give our conclusions in Section 7.
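As a concrete (if highly simplified) sketch, the check-and-partition loop in Figure 1 might look as follows. The names are ours, `diff_bounds` is a stub standing in for the forward analysis, and plain widest-dimension bisection stands in for NeuroDiff's gradient-based refinement.

```python
# Top-level verification loop: bound f'(x) - f(x) over a box, check
# against eps, and bisect the input region on failure.

def diff_bounds(f, f2, box):
    """Placeholder for the forward analysis: should return a sound
    (lo, hi) on f2(x) - f(x) for all x in `box`."""
    raise NotImplementedError

def verify(f, f2, box, eps, depth=0, max_depth=20, bounds=diff_bounds):
    lo, hi = bounds(f, f2, box)
    if -eps < lo and hi < eps:
        return "verified"
    if depth >= max_depth:
        return "undetermined"      # timeout / depth budget exhausted
    # Split the widest input dimension (the paper picks the split
    # using gradient information instead).
    d = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
    l, u = box[d]
    mid = (l + u) / 2.0
    left = box[:d] + [(l, mid)] + box[d + 1:]
    right = box[:d] + [(mid, u)] + box[d + 1:]
    results = [verify(f, f2, b, eps, depth + 1, max_depth, bounds)
               for b in (left, right)]   # independent, parallelizable
    return "verified" if all(r == "verified" for r in results) else "undetermined"
```

The two sub-regions form independent sub-problems, which is what makes the partitioning step embarrassingly parallel.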
2 OVERVIEW

In this section, we highlight our main contributions and illustrate the shortcomings of previous work on a motivating example.
We use the neural network in Figure 2 as a running example. The network has two input nodes n_{0,1}, n_{0,2}, two hidden layers with two neurons each (n_{1,1}, n_{1,2} and n_{2,1}, n_{2,2}), and one output node n_{3,1}. Each neuron in the hidden layer performs a summation of its inputs, followed by a rectified linear unit (ReLU) activation function, defined as y = max(0, x), where x is the input to the ReLU activation function, and y is the output.

Let this entire network be f, and the value of the output node be n_{3,1} = f(x_1, x_2), where x_1 and x_2 are the values of input nodes n_{0,1} and n_{0,2}, respectively. The network can be evaluated on a specific input by performing a series of matrix multiplications (i.e., affine transformations) followed by element-wise ReLU transformations. For example, the output of the neurons of the first hidden layer is

[n_{1,1}; n_{1,2}] = ReLU(W_1 · [x_1; x_2]) = [ReLU(W_1[1,1]·x_1 + W_1[2,1]·x_2); ReLU(W_1[1,2]·x_1 + W_1[2,2]·x_2)]
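To make the evaluation concrete, here is a minimal sketch of such a forward pass. The weights below are illustrative stand-ins, not the values from Figure 2, and we assume (as in the running example) that ReLU is applied on the hidden layers only.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(weights, x):
    """Evaluate a feed-forward ReLU network: an affine transformation
    per layer, with an element-wise ReLU on every layer but the last."""
    for i, W in enumerate(weights):
        x = W.T @ x                    # affine transformation
        if i < len(weights) - 1:
            x = relu(x)                # ReLU on hidden layers
    return x

# Hypothetical 2-2-2-1 network, same shape as the one in Figure 2.
W1 = np.array([[1.0, -1.0],
               [2.0,  1.0]])
W2 = np.eye(2)
W3 = np.array([[1.0],
               [-1.0]])
print(forward([W1, W2, W3], np.array([1.0, 2.0])))   # [4.]
```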
Differential verification aims to compare f to another network f′ that is structurally similar. For our example, f′ is obtained by rounding the edge weights of f to the nearest whole numbers, a network compression technique known as weight quantization. Thus, f′, n′_{k,j} and n′_{3,1} = f′(x_1, x_2) are counterparts of f, n_{k,j} and n_{3,1} = f(x_1, x_2) for 0 ≤ k ≤ 3 and 1 ≤ j ≤ 2. Our goal is to prove that |f′(x_1, x_2) − f(x_1, x_2)| is less than some reasonably small ϵ for all inputs defined by box intervals over x_1 and x_2. For ease of understanding, we show the edge weights of f in black, and f′ in light blue in Figure 2.

Naively, one could adapt any state-of-the-art, single-network verification tool for our task, including DeepPoly [40] and Neurify [47]. Neurify, in particular, takes a neural network and an input region of the network, and uses interval arithmetic [27, 48] to produce sound symbolic lower and upper bounds for each output node. Typically, Neurify would then use the computed bounds to certify the absence of adversarial examples [43] for the network.

However, for our task, the bounds must be computed for both networks f and f′. Then, we subtract them, and concretize to compute lower and upper bounds on f′(x_1, x_2) − f(x_1, x_2). In our example, the individual bounds [LB(f), UB(f)] and [LB(f′), UB(f′)] for nodes n_{3,1} and n′_{3,1} are symbolic linear expressions in x_1 and x_2. After the subtraction, we would obtain the symbolic bounds [LB(f′) − UB(f), UB(f′) − LB(f)], and after concretization, a concrete interval on the output difference. Unfortunately, the bounds are far from being accurate.

The ReluDiff method of Paulsen et al. [33] showed that, by directly computing a difference interval layer-by-layer, the accuracy can be greatly improved. For the running example, ReluDiff would first compute bounds on the difference between the neurons n_{1,1} and n′_{1,1}, and then similarly compute bounds on the difference between the outputs of n_{1,2} and n′_{1,2}. Then, the results would be used to compute difference bounds of the subsequent layer.
The reason it is more accurate is because it begins computing part of the difference bound before errors have accumulated, whereas the naive approach first accumulates significant errors at each neuron, and then computes the difference bound. In our running example, ReluDiff [33] would compute considerably tighter final bounds than the naive approach.

While ReluDiff improves over the naive approach, in many cases, it uses concrete values for the upper and lower bounds. In practice, this approach can suffer from severe error-explosion. Specifically, whenever a neuron of either network is in an unstable state – i.e., when a ReLU's input interval contains the value 0 – it has to concretize the symbolic expressions.

The key contribution in NeuroDiff, our new method, is a symbolic and fine-grained approximation technique that both reduces the approximation error introduced when a neuron is in an unstable state, and mitigates the explosion of such approximation error after it is introduced.
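A tiny numeric illustration of this effect, with hypothetical numbers not taken from the running example: suppose a neuron n of f ranges over [−1, 2] and its counterpart in f′ is exactly n′ = n + 0.5, so the true difference after the ReLUs lies in [0, 0.5].

```python
# Bounding ReLU(n') - ReLU(n) by concretizing each network separately
# loses the dependence between n and n' = n + 0.5.

def relu_interval(l, u):
    return (max(0.0, l), max(0.0, u))

l1, u1 = relu_interval(-1.0, 2.0)    # ReLU(n)  in [0.0, 2.0]
l2, u2 = relu_interval(-0.5, 2.5)    # ReLU(n') in [0.0, 2.5]
naive = (l2 - u1, u2 - l1)           # interval subtraction
print(naive)                          # (-2.0, 2.5)

# True range of ReLU(n + 0.5) - ReLU(n) for n in [-1.0, 2.0]:
vals = [max(0.0, n + 0.5) - max(0.0, n)
        for n in (-1.0 + 3.0 * i / 1000 for i in range(1001))]
print(round(min(vals), 9), round(max(vals), 9))   # 0.0 0.5
```

Tracking the difference directly, as ReluDiff does, starts from ∆ = [0.5, 0.5] and avoids most of this blow-up; NeuroDiff additionally keeps the bound symbolic.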
Our first contribution is developing convex approximations to directly bound the difference between two neurons after these ReLU activations. Specifically, for a neuron n in f and corresponding neuron n′ in f′, we want to bound the value of ReLU(n′) − ReLU(n). We illustrate the various choices using Figures 3, 4, and 5.

The naive way to bound this difference is to first compute approximations of y = ReLU(n) and y′ = ReLU(n′) separately, and then subtract them. Since each of these functions has a single variable, convex approximation is simple and is already used by single-network verification tools [40, 47, 49]. Figure 6 shows the function y = ReLU(n) and its bounding planes (shown as dashed lines) in a two-dimensional space (details in Section 3). However, as we have already mentioned, approximation errors would be accumulated in the bounds of ReLU(n) and ReLU(n′) and then amplified by the interval subtraction. This is precisely why the naive approach performs poorly.

The ReluDiff method of Paulsen et al. [33] improves upon the naive approximation by computing an interval bound on n′ − n, denoted ∆, then rewriting z = ReLU(n′) − ReLU(n) as z = ReLU(n + ∆) − ReLU(n), and finally bounding this new function instead. Figure 3 shows the shape of z = ReLU(n + ∆) − ReLU(n) in a three-dimensional space. Note that it has four piece-wise linear subregions, defined by values of the input variables n and ∆. While the bounds computed by ReluDiff [33], shown as the (horizontal) yellow planes in Figure 4, are sound, in practice they tend to be loose because the upper and lower bounds are both concrete values. Such eager concretization eliminates symbolic information that ∆ contained before applying the ReLU activation.

In contrast, our method computes a convex approximation of z, shown by the (tilted) yellow planes in Figure 5.
Since these tilted bounding planes are in a three-dimensional space, they are significantly more challenging to compute than the standard two-dimensional convex approximations (shown in Figure 6) used by single-network verification tools. Our approximations have the advantage of introducing significantly less error than the horizontal planes used in ReluDiff [33], while maintaining some of the symbolic information for ∆ before applying the ReLU activation. We will show through experimental evaluation (Section 5) that our convex approximations can drastically improve the accuracy of the difference bounds, and are particularly effective when the input region being considered is large. Furthermore, the tilted planes shown in Figure 5 are for the general case. For certain special cases, we obtain even tighter bounding planes (details in Section 4). In the running example, using our new convex approximations would tighten the final bounds considerably.

Our second contribution is introducing symbolic variables to represent the output values of some unstable neurons, with the goal of limiting the propagation of approximation errors after they are introduced. In the running example, since both n_{1,2} and n′_{1,2} are in unstable states, i.e., the input intervals of the ReLUs contain the value 0, we may introduce a new symbol x_3 = ReLU(n′_{1,2}) − ReLU(n_{1,2}). In all subsequent layers, whenever the value of ReLU(n′_{1,2}) − ReLU(n_{1,2}) is needed, we use the bounds [x_3, x_3] instead of the actual bounds.
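A minimal sketch of the bookkeeping behind such intermediate variables. The representation here (linear expressions as coefficient maps, with a side table of concrete ranges) is our illustration, not NeuroDiff's actual data structure.

```python
# Symbolic bounds as linear expressions: dicts mapping variable names
# to coefficients (constant term under key ""), with a side table of
# concrete ranges for the fresh intermediate variables.

def fresh_delta_var(var_ranges, name, lo, hi):
    """Register a fresh variable for ReLU(n') - ReLU(n) with concrete
    range [lo, hi]; its symbolic bound is exactly [v, v]."""
    var_ranges[name] = (lo, hi)
    expr = {name: 1.0}
    return expr, expr

def concretize_ub(expr, var_ranges):
    """Concrete upper bound of a linear expression."""
    hi = expr.get("", 0.0)
    for v, c in expr.items():
        if v != "":
            lo_v, hi_v = var_ranges[v]
            hi += c * hi_v if c >= 0 else c * lo_v
    return hi

ranges = {}
lb, ub = fresh_delta_var(ranges, "v1", -0.4, 0.3)   # hypothetical range
# Downstream, a contribution of 2*v1 from one path and -1*v1 from
# another cancels to 1*v1 before concretization:
print(concretize_ub({"v1": 2.0 - 1.0}, ranges))      # 0.3
# Plain intervals would widen instead: 2*0.3 - 1*(-0.4) = 1.0
```

The cancellation in the last step is exactly the "partial cancellation in subsequent layers" that makes these variables worthwhile.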
Figure 3: The shape of z = ReLU(n + ∆) − ReLU(n).

Figure 4: Bounding planes computed by ReluDiff [33].

Figure 5: Bounding planes computed by our new method.

Figure 6: Bounding planes computed by Neurify [47].
The reason why using x_3 can lead to more accurate results is because, even though our convex approximations reduce the error introduced, there is inevitably some error that accumulates. Introducing x_3 allows this error to partially cancel in the subsequent layers. In our running example, introducing the new symbolic variable x_3 would improve the final bounds even further.

While creating x_3 improved the result in this case, carelessly introducing new variables for all the unstable neurons can actually reduce the overall benefit (see Section 4). In addition, the computational cost of introducing new variables is not negligible. Therefore, in practice, we must introduce these symbolic variables judiciously, to maximize the benefit. Part of our contribution in NeuroDiff is in developing heuristics to automatically determine when to create new symbolic variables (details in Section 4).

3 BACKGROUND

In this section, we review the technical background and then introduce notations that we use throughout the paper.
We focus on feed-forward neural networks, which we define as a function f that takes an n-dimensional vector of real values x ∈ X, where X ⊆ R^n, and maps it to an m-dimensional vector y ∈ Y, where Y ⊆ R^m. We denote this function as f : X → Y. Typically, each dimension of y represents a score, such as a probability, that the input x belongs to class i, where 1 ≤ i ≤ m.

A network with l layers has l weight matrices, each of which is denoted W_k, for 1 ≤ k ≤ l. For each weight matrix, we have W_k ∈ R^{l_{k−1} × l_k}, where l_{k−1} is the number of neurons in layer (k − 1) and likewise for l_k, and l_0 = n. Each element in W_k represents the weight of an edge from a neuron in layer (k − 1) to one in layer k. Let n_{k,j} denote the j-th neuron of layer k, and n_{k−1,i} denote the i-th neuron of layer (k − 1). We use W_k[i, j] to denote the edge weight from n_{k−1,i} to n_{k,j}.

The network is evaluated as f(x) = f_l(W_l · f_{l−1}(W_{l−1} · ... f_1(W_1 · x) ...)), where f_k is the activation function of the k-th layer and 1 ≤ k ≤ l. We focus on neural networks with ReLU activations because they are the most widely implemented in practice, but our method can be extended to other activation functions, such as sigmoid and tanh, and other layer types, such as convolutional and max-pooling. We leave this as future work.

To compute approximations of the output nodes that are sound for all input values, we leverage interval arithmetic [27], which can be viewed as an instance of the abstract interpretation framework [5]. It is well-suited to the verification task because interval arithmetic is soundly defined for basic operations of the network such as addition, subtraction, and scaling.

Let I = [LB(I), UB(I)] be an interval with lower bound LB(I) and upper bound UB(I).
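As a running sketch of interval arithmetic and the precision loss discussed in this section, consider the f(x) = 3x − x example used below; the tuple-based interval representation is ours.

```python
# Interval arithmetic on f(x) = 3x - x over x in [-1, 1]: naive
# evaluation, input partitioning, and symbolic bounds.

def interval_scale(iv, c):
    lo, hi = iv
    return (c * lo, c * hi) if c >= 0 else (c * hi, c * lo)

def interval_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def f_naive(iv):
    # 3*I - I with plain intervals: the dependence on x is lost
    return interval_sub(interval_scale(iv, 3), iv)

print(f_naive((-1, 1)))   # (-4, 4)

# Refinement: partition the input and union the per-partition results.
parts = [f_naive((-1, 0)), f_naive((0, 1))]
print((min(p[0] for p in parts), max(p[1] for p in parts)))   # (-3, 3)

# Symbolic bounds: with I = [x, x], 3x - x simplifies to 2x, and only
# the final expression is concretized.
print(interval_scale((-1, 1), 3 - 1))   # (-2, 2)
```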
Then, for intervals I_1, I_2, we have addition and subtraction defined as I_1 + I_2 = [LB(I_1) + LB(I_2), UB(I_1) + UB(I_2)] and I_1 − I_2 = [LB(I_1) − UB(I_2), UB(I_1) − LB(I_2)], respectively. For a constant c, scaling is defined as c × I = [c × LB(I), c × UB(I)] when c > 0, and c × I = [c × UB(I), c × LB(I)] otherwise.

While interval arithmetic is a sound over-approximation, it is not always accurate. To illustrate, let f(x) = 3x − x, and say we are interested in bounding f(x) when x ∈ [−1, 1]. One way to bound f is by evaluating f(I) where I = [−1, 1]. Doing so yields 3 × [−1, 1] − [−1, 1] = [−4, 4]. Unfortunately, the most accurate bounds are [−2, 2].

There are (at least) two ways we can improve the accuracy. First, we can soundly refine the result by dividing the input intervals into disjoint partitions, performing the analysis independently on each partition, and then unioning the resulting output intervals together. Previous work has shown the result will be at least as precise [48], and often better. For example, if we partition x ∈ [−1, 1] into x ∈ [−1, 0] and x ∈ [0, 1], and perform the analysis for each partition, the resulting bounds improve to [−3, 3].

Second, the dependence between the two intervals is not leveraged when we subtract them, i.e., that they were both x terms and hence could partially cancel out. To capture the dependence, we can use symbolic lower and upper bounds [48], which are expressions in terms of the input variable, i.e., I = [x, x]. Evaluating f(I) then yields the interval I_f = [2x, 2x], for x ∈ [−1, 1]. When using symbolic bounds, eventually, we must concretize the lower and upper bound equations. We denote the concretizations of LB(I_f) = 2x and UB(I_f) = 2x as LB(I_f) = −2 and UB(I_f) =
2, respectively. Compared to the naive solution, [−4, 4], this is a significant improvement.

When approximating the output of a given function f over an input region X of its domain, one may prove soundness by showing that the evaluations of the lower and upper bounds on any input x ∈ X are always less than and greater than, respectively, the true value of f(x). Formally, for an interval I, let LB(I)(x) be the evaluation of the lower bound equation on input x, and similarly for UB(I)(x). Then, the approximation is considered sound if ∀x ∈ X, we have LB(I)(x) ≤ f(x) ≤ UB(I)(x).

While symbolic intervals are exact for linear operations (i.e., they do not introduce error), this is not the case for non-linear operations, such as the ReLU activation. This is because, for efficiency reasons, the symbolic lower and upper bounds must be kept linear. Thus, developing linear approximations for non-linear activation functions has become a significant area of research for single neural network verification [40, 47, 49, 54]. We review the basics below, but caution that they are different from our new convex approximations in NeuroDiff.

We denote the input to the ReLU of a neuron n_{k,j} as S_in(n_{k,j}) and the output as S(n_{k,j}). The approach used by existing single-network verification tools is to apply an affine transformation to the upper bound of S_in(n_{k,j}) such that UB(S_in(n_{k,j}))(x) ≥
0, where x ∈ X, and X is the input region for the entire network. For the lower bound, there exist several possible transformations, including the one used by Neurify [47], shown in Figure 6, where n = S_in(n_{k,j}) and the dashed lines are the upper and lower bounds.

We illustrate the upper bound transformation for n_{1,1} of our motivating example. After computing the symbolic upper bound of the ReLU input, UB(S_in(n_{1,1})), which is a linear expression in x_1 and x_2, it computes the concrete lower and upper bounds of this expression. We refer to them as l and u, respectively, for shorthand. Then, it computes the line that passes through (l, 0) and (u, u). Letting y = UB(S_in(n_{1,1})) be the upper bound equation of the ReLU input, it computes the upper bound of the ReLU output as UB(S(n_{1,1})) = u/(u − l) · (y − l).

As we will show in Section 4, there are significantly more considerations in our problem domain, where the goal is to bound ∆ = n′ − n after the ReLUs.

4 ALGORITHMS

We first present our baseline procedure for differential verification of feed-forward neural networks (Section 4.1), and then present our algorithms for computing convex approximations (Section 4.3) and introducing symbolic variables (Section 4.4).
We build off the work of Paulsen et al. [33], so in this section we review the relevant pieces. We assume that the input to NeuroDiff consists of two networks f and f′, each with l layers of the same size. Let n′_{k,j} in f′ be the neuron paired with n_{k,j} in f. This implicitly creates a pairing of the edge weights between the two networks. We first introduce additional notation.

• We denote the difference between a pair of neurons as ∆_{k,j} = n′_{k,j} − n_{k,j}.
• We denote the difference in a pair of edge weights as W∆_k[i, j] = W′_k[i, j] − W_k[i, j].
• We extend the symbolic interval notation to these terms. That is, S_in(∆_{k,j}) denotes the interval that bounds n′_{k,j} − n_{k,j} before applying ReLU, and S(∆_{k,j}) denotes the interval after applying ReLU.

Given that we have computed S(n_{k−1,i}), S(n′_{k−1,i}), and S(∆_{k−1,i}) for every neuron in the layer k − 1, we now compute a single S(∆_{k,j}) in the subsequent layer k in two steps (and then repeat for each 1 ≤ j ≤ l_k).

First, we compute S_in(∆_{k,j}) by propagating the output intervals from the previous layer through the edges connecting to the target neuron. This is defined as

S_in(∆_{k,j}) = Σ_i ( S(∆_{k−1,i}) × W′_k[i, j] + S(n_{k−1,i}) × W∆_k[i, j] )

We illustrate this computation on node ∆_{1,1} in our example. First, we initialize S(∆_{0,1}) = [0, 0] and S(∆_{0,2}) = [0, 0], since the two networks share the same inputs, and S(n_{0,1}) = [x_1, x_1], S(n_{0,2}) = [x_2, x_2]. Propagating these intervals through the weights as defined above yields symbolic bounds for S_in(∆_{1,1}) that are linear in x_1 and x_2.

For the second step, we apply ReLU to S_in(∆_{k,j}) to obtain S(∆_{k,j}). This is where we apply the new convex approximations (Section 4.3) to obtain tighter bounds. Toward this end, we will focus on the following two equations:

z = ReLU(n_{k,j} + ∆_{k,j}) − ReLU(n_{k,j})    (1)
z = ReLU(n′_{k,j}) − ReLU(n′_{k,j} − ∆_{k,j})    (2)

While Paulsen et al. [33] also compute bounds of these two equations, they use concretizations instead of linear approximations, thus throwing away all the symbolic information; for the running example, their method would yield only concrete (and thus looser) bounds for S(∆_{1,1}). In contrast, our method will be able to maintain some or all of the symbolic information, thus improving the accuracy.

Before presenting our new linear approximations, we introduce two useful lemmas, which will simplify our presentation as well as our soundness proofs.

Figure 7: Illustration of Lemmas 4.1 and 4.2.

Lemma 4.1.
Let x be a variable such that l ≤ x ≤ u for constants l ≤ 0 and 0 ≤ u. For a constant l′ such that l ≤ l′ ≤ 0, we have x ≤ (x − l) · (u − l′)/(u − l) + l′ and (x − l) · (u − l′)/(u − l) + l′ ≥ l′.

Lemma 4.2.
Let x be a variable such that l ≤ x ≤ u for constants l ≤ 0 and 0 ≤ u. For a constant u′ such that 0 ≤ u′ ≤ u, we have u′ ≥ (x − u) · (u′ − l)/(u − l) + u′ and (x − u) · (u′ − l)/(u − l) + u′ ≤ x.

We illustrate these lemmas in Figure 7. The solid blue line shows the equation y = x for the input interval l ≤ x ≤ u. The upper dashed line illustrates the transformation of Lemma 4.1, and the lower dashed line illustrates Lemma 4.2. Specifically, Lemma 4.1 shows a transformation applied to x whose result is always greater than both l′ and x. Similarly, Lemma 4.2 shows a transformation applied to x whose result is always less than both u′ and x. These lemmas will be useful in bounding Equations 1 and 2.

4.3 Convex Approximations for S(∆_{k,j})

Now, we are ready to present our new approximations, which are linear symbolic expressions derived from Equations 1 and 2. We first assume that n_{k,j} and n′_{k,j} could both be unstable, i.e., they could take values both greater than and less than 0. This yields bounds for the general case in that they are sound in all states of n_{k,j} and n′_{k,j} (Sections 4.3.1 and 4.3.2). Then, we consider special cases of n_{k,j} and n′_{k,j}, in which even tighter upper and lower bounds are derived (Section 4.3.3). To simplify notation, we let n, n′, and ∆ stand in for n_{k,j}, n′_{k,j}, and ∆_{k,j} in the remainder of this section.

Let l denote the concrete lower bound of UB(S_in(∆)) and u its concrete upper bound. The upper bound approximation is:

UB(S(∆)) = UB(S_in(∆))                       if l ≥ 0
UB(S(∆)) = 0                                 if u ≤ 0
UB(S(∆)) = (UB(S_in(∆)) − l) · u/(u − l)     otherwise

That is, when the input's (delta) upper bound is greater than 0 for all x ∈ X, we can use the input's upper bound unchanged. When the upper bound is always less than 0, the new output's upper bound is then 0. Otherwise, we apply a linear transformation to the upper bound, which results in the upper plane illustrated in Figure 5. We prove all three cases sound.

Proof. We consider each case above separately.
In the following, we use Equation 1 to derive the bounds, but we note a symmetric proof using Equation 2 exists and produces the same bounds.

Case 1: l ≥ 0. We first show that, according to Equation 1, when 0 ≤ ∆ we have z ≤ ∆. This then implies that, if UB(S_in(∆))(x) ≥ 0 for all x ∈ X, then z ≤ UB(S_in(∆))(x) for all x ∈ X, and hence it is a valid upper bound for the output interval.

Assume 0 ≤ ∆. We consider two cases of n. First, consider 0 ≤ n. Observe 0 ≤ n ∧ 0 ≤ ∆ ⟹ 0 ≤ n + ∆. Thus, the ReLUs of Equation 1 simplify to z = n + ∆ − n = ∆ ⟹ z ≤ ∆. When n < 0, Equation 1 simplifies to z = ReLU(n + ∆). Since n < 0, we have n + ∆ ≤ ∆; together with 0 ≤ ∆, this implies ReLU(n + ∆) ≤ ∆. Thus, z = ReLU(n + ∆) ≤ ∆, so the approximation is sound.

Case 2: u ≤ 0. This case was previously proven [33], but we restate it here. UB(S_in(∆)) ≤ 0 for all x ∈ X ⟺ n′ ≤ n ⟹ ReLU(n′) ≤ ReLU(n) ⟺ ReLU(n′) − ReLU(n) ≤ 0.

Case 3.
By Case 1, any UB(S(∆)) that satisfies UB(S(∆))(x) ≥ 0 and UB(S(∆))(x) ≥ UB(S^in(∆))(x) for all x ∈ X is sound. Both inequalities hold by Lemma 4.1, with x = UB(S^in(∆)), l = LB(UB(S^in(∆))), u = UB(UB(S^in(∆))), and l′ = 0. □

We illustrate the upper bound computation on a node of our motivating example. Recall its symbolic upper bound UB(S^in(∆)): since its concrete lower bound l is negative and its concrete upper bound u is positive, we are in the third case of our linear approximation above. Applying the transformation yields the upper bounding plane illustrated in Figure 5. The volume under this plane is 50% less than the upper bounding plane of ReluDiff shown in Figure 4.

ASE '20, September 21–25, 2020, Virtual Event, Australia
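The three cases above lend themselves to a quick numeric sanity check. The sketch below is our own Python, not part of NeuroDiff; `ub_out`, `relu`, and the grid of sample values are our assumptions. Here `ub_in` plays the role of the concrete value of UB(S^in(∆)) at some input x, so it must satisfy ∆ ≤ ub_in and l ≤ ub_in ≤ u, and z = ReLU(n + ∆) − ReLU(n) should never exceed the transformed bound:

```python
def relu(v):
    return max(0.0, v)

def ub_out(ub_in, l, u):
    """Piecewise upper bound for the output delta, where ub_in is the concrete
    value of UB(S_in(delta)) at some input, and [l, u] is the concrete range
    of UB(S_in(delta)) over all inputs."""
    if l >= 0.0:
        return ub_in                      # case 1: upper bound always >= 0
    if u <= 0.0:
        return 0.0                        # case 2: upper bound always <= 0
    return (ub_in - l) * u / (u - l)      # case 3: Lemma 4.1 with l' = 0

grid = [i / 10.0 for i in range(-30, 31)]
for (l, u) in [(-1.0, 2.0), (0.5, 2.0), (-2.0, -0.5)]:  # hits all three cases
    for ub_in in grid:
        if not (l <= ub_in <= u):
            continue
        for delta in grid:
            if delta > ub_in:             # delta must respect its upper bound
                continue
            for n in grid:                # n = n_{k,j}; n' = n + delta
                z = relu(n + delta) - relu(n)
                assert z <= ub_out(ub_in, l, u) + 1e-9
```

The brute-force loop passes silently, mirroring the case analysis of the proof.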
Let l = LB(LB(S^in(∆))) and u = UB(LB(S^in(∆))). The lower bound approximation is:

LB(S(∆)) =
  LB(S^in(∆))                         if u ≤ 0
  0                                   if l ≥ 0
  (LB(S^in(∆)) − u) ∗ (−l)/(u − l)    otherwise

That is, when the input's lower bound is always less than 0, we can leave it unchanged. When it is always greater than 0, the new lower bound is 0. Otherwise, we apply a linear transformation to the lower bound, which results in the lower plane illustrated in Figure 5. We prove all three cases sound.

Proof. We consider each case above separately. In the following, we use Equation 1 to derive the bounds, but we note that a symmetric proof using Equation 2 exists and produces the same bounds.
Case 1: u ≤ 0. We first show that, according to Equation 1, when ∆ ≤ 0 we have ∆ ≤ z. This then implies that, if u ≤ 0, we have LB(S^in(∆))(x) ≤ z for all x ∈ X, and hence it is a valid lower bound for the output interval.

Assume ∆ ≤ 0. We consider two cases of n + ∆. First, let 0 ≤ n + ∆. Observe 0 ≤ n + ∆ ∧ ∆ ≤ 0 ⟹ 0 ≤ n, so we can simplify Equation 1 to z = n + ∆ − n = ∆ ⟹ ∆ ≤ z. Second, let n + ∆ < 0 ⟺ ∆ < −n. Then, Equation 1 simplifies to z = −ReLU(n) = −max(0, n) = min(0, −n). Now observe ∆ < −n ∧ ∆ ≤ 0 ⟹ ∆ ≤ min(0, −n) = z.

Case 2: l ≥ 0. This case was previously proven sound [33], but we restate it here. l ≥ 0 implies 0 ≤ ∆ for all x ∈ X, hence:

0 ≤ ∆ ⟺ n′ ≥ n ⟹ ReLU(n′) ≥ ReLU(n) ⟺ ReLU(n′) − ReLU(n) ≥ 0.

Case 3.
By Case 1, any LB(S(∆)) that satisfies LB(S(∆))(x) ≤ 0 and LB(S(∆))(x) ≤ LB(S^in(∆))(x) for all x ∈ X is valid. Both inequalities hold by Lemma 4.2, with x = LB(S^in(∆)), u′ = 0, l = LB(LB(S^in(∆))), and u = UB(LB(S^in(∆))). □

We illustrate the lower bound computation on the same node of our motivating example. Recall its symbolic lower bound LB(S^in(∆)): since its concrete lower bound l is negative and its concrete upper bound u is positive, we are again in the third case of our linear approximation. Applying the transformation yields the lower bounding plane illustrated in Figure 5. The volume above this plane is 50% less than the lower bounding plane of ReluDiff shown in Figure 4.
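The lower bound admits the same kind of numeric sanity check as the upper bound. The following sketch is our own Python (the names `lb_out`, `relu`, and the grid of values are our assumptions); here `lb_in` is the concrete value of LB(S^in(∆)) at some input, so ∆ ≥ lb_in and l ≤ lb_in ≤ u, and z must never fall below the transformed bound:

```python
def relu(v):
    return max(0.0, v)

def lb_out(lb_in, l, u):
    """Piecewise lower bound for the output delta, where lb_in is the concrete
    value of LB(S_in(delta)) at some input, and [l, u] is the concrete range
    of LB(S_in(delta)) over all inputs."""
    if u <= 0.0:
        return lb_in                        # case 1: lower bound always <= 0
    if l >= 0.0:
        return 0.0                          # case 2: lower bound always >= 0
    return (lb_in - u) * (-l) / (u - l)     # case 3: Lemma 4.2 with u' = 0

grid = [i / 10.0 for i in range(-30, 31)]
for (l, u) in [(-2.0, 1.0), (-2.0, -0.5), (0.5, 2.0)]:  # hits all three cases
    for lb_in in grid:
        if not (l <= lb_in <= u):
            continue
        for delta in grid:
            if delta < lb_in:               # delta must respect its lower bound
                continue
            for n in grid:
                z = relu(n + delta) - relu(n)
                assert z >= lb_out(lb_in, l, u) - 1e-9
```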
While the bounds presented so far apply in all states of n and n′, under certain conditions we are able to tighten these bounds even further. Toward this end, we restate the following two lemmas proved by Paulsen et al. [33], which will come in handy. They are related to properties of Equations 1 and 2, respectively.

Lemma 4.3. ReLU(n + ∆) − n ≡ max(−n, ∆)

Lemma 4.4. n′ − ReLU(n′ − ∆) ≡ min(n′, ∆)

These lemmas provide bounds when n or n′ is proved to be linear based on the absolute bounds that we compute.

Figure 8: Tighter upper bounding plane.
Figure 9: Tighter lower bounding plane.
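Both identities are easy to check exhaustively on a grid. The sketch below is ours (the variable names and sampled values are assumptions); note that the algebraic identities themselves hold for all real arguments, even though the lemmas are applied only when the corresponding ReLU is in its linear region:

```python
def relu(v):
    return max(0.0, v)

# Lemma 4.3: relu(n + d) - n == max(-n, d)
# Lemma 4.4: n2 - relu(n2 - d) == min(n2, d)
vals = [i / 7.0 for i in range(-21, 22)]
for a in vals:
    for d in vals:
        assert abs((relu(a + d) - a) - max(-a, d)) < 1e-12
        assert abs((a - relu(a - d)) - min(a, d)) < 1e-12
```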
Tighter Upper Bound When n′ Is Linear.
In this case, we have UB(S(∆)) = UB(S^in(∆)), which is an improvement for the second or third case of our general upper bound.

Proof. By our case assumption, Equation 2 simplifies to the one in Lemma 4.4. Thus, z = min(n′, ∆) ⟹ z ≤ ∆ ≤ UB(S^in(∆)). □

Tighter Upper Bound When n Is Linear and LB(UB(S^in(∆))) ≤ −LB(S^in(n)) ≤ UB(UB(S^in(∆))). We illustrate the z plane under these constraints in Figure 8. Let l = LB(UB(S^in(∆))), u = UB(UB(S^in(∆))), and l′ = −LB(S^in(n)). We use Lemma 4.1 to derive:

UB(S(∆)) = (UB(S^in(∆)) − l) ∗ (u − l′)/(u − l) + l′.

This results in the upper plane of Figure 8. It improves over the third case of our general upper bound because it allows the lower bound of UB(S(∆)) to be less than 0.

Proof. By our case assumption, Equation 1 simplifies to the one in Lemma 4.3. By Lemma 4.1, we have for all x ∈ X, UB(S(∆))(x) ≥ −LB(S^in(n)) ≥ −n and UB(S(∆))(x) ≥ UB(S^in(∆))(x) ≥ ∆. These two inequalities imply UB(S(∆)) ≥ max(−n, ∆) = z. □

Tighter Lower Bound When n Is Linear.
Here, we can use the approximation LB(S(∆)) = LB(S^in(∆)). This improves over the second and third cases of our general lower bound.

Proof. By our case assumption, Equation 1 simplifies to the one in Lemma 4.3. Thus, z = max(−n, ∆) ⟹ z ≥ ∆ ≥ LB(S^in(∆)). □

Tighter Lower Bound When n′ Is Linear and LB(LB(S^in(∆))) ≤ LB(S^in(n′)) ≤ UB(LB(S^in(∆))). We illustrate the z plane under these constraints in Figure 9. Here, letting l = LB(LB(S^in(∆))), u = UB(LB(S^in(∆))), and u′ = LB(S^in(n′)), we can use Lemma 4.2 to derive the approximation:

LB(S(∆)) = (LB(S^in(∆)) − u) ∗ (u′ − l)/(u − l) + u′.

This results in the lower plane of Figure 9. It improves over the third case, since it allows the upper bound of LB(S(∆)) to be greater than 0.
Proof. By our case assumption, Equation 2 simplifies to the one shown in Lemma 4.4. By Lemma 4.2, we have for all x ∈ X, LB(S(∆))(x) ≤ LB(S^in(∆))(x) ≤ ∆ and LB(S(∆))(x) ≤ LB(S^in(n′)) ≤ n′. These two inequalities imply LB(S(∆))(x) ≤ min(n′, ∆) = z. □

Symbolic Variables for S(∆)

While convex approximations reduce the error introduced by ReLU, even small errors tend to be amplified significantly after a few layers. To combat the error explosion, we introduce new symbolic terms to represent the output values of unstable neurons, which allow their accumulated errors to cancel out.

We illustrate the impact of symbolic variables on a node of our motivating example. After applying the convex approximation to its delta interval, we introduce a new variable x₃ whose lower and upper bounds are the two symbolic expressions just computed, set S(∆) = [x₃, x₃], and propagate this interval as before. After propagating through the two nodes of the next layer and combining them at the output node, the x₃ terms partially cancel out, resulting in a tighter final output interval.

In principle, symbolic variables may be introduced at any unstable neuron that introduces approximation error; however, there are efficiency vs. accuracy tradeoffs. One consideration is how to deal with intermediate variables referencing other intermediate variables. For example, if we decide to introduce a variable x₄ at a later node, then x₄ will have an x₃ term in its equation. Then, when we are evaluating a symbolic bound that contains an x₄ term, we will have to recursively substitute the bounds of the previous intermediate variables, such as x₃. This becomes expensive, especially when used together with our bisection-based refinement [33, 48].
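The cancellation effect described above can be sketched in a few lines. This is our own illustration, not NeuroDiff's implementation; the coefficients 0.7 and −0.5 and the interval (−0.2, 0.3) are made-up values, not taken from the motivating example:

```python
def interval_scale(c, iv):
    """Multiply an interval by a constant, flipping endpoints if c < 0."""
    lo, hi = c * iv[0], c * iv[1]
    return (min(lo, hi), max(lo, hi))

def interval_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

v = (-0.2, 0.3)   # hypothetical concrete bounds on an unstable neuron's delta

# Concrete-interval propagation: the two downstream uses of the neuron widen
# independently, even though they refer to the same quantity.
naive = interval_add(interval_scale(0.7, v), interval_scale(-0.5, v))

# Symbolic propagation: both uses share one fresh variable x3, so their
# coefficients combine first (0.7 - 0.5 = 0.2) and concretize only at the end.
symbolic = interval_scale(0.7 - 0.5, v)

assert symbolic[1] - symbolic[0] < naive[1] - naive[0]   # strictly tighter
```

The symbolic interval is several times narrower than the naive one, which is exactly the cancellation that the fresh variable buys.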
Thus, in practice, we first remove any back-references to intermediate variables by substituting their lower and upper bounds into the new intermediate variable's lower and upper bounds, respectively.

Given that we do not allow back-references, there are two additional considerations. First, introducing a new intermediate variable wipes out all earlier intermediate variables. For example, introducing a new variable at a downstream node wipes out references to x₃, thus preventing any x₃ terms from canceling further on. Second, the runtime cost of introducing symbolic variables is not negligible. The bulk of computation time in NeuroDiff is spent multiplying the network's weight matrices by the neurons' symbolic bound equations, which is implemented using matrix multiplication. Since adding variables increases the matrix size, it increases the matrix multiplication cost.

Based on these considerations, we have developed heuristics for adding new variables judiciously. First, since the errors introduced by unstable neurons in the earliest layers are the most prone to explode, and hence benefit the most when we create variables for them, we rank them higher when choosing where to add symbolic variables. Second, we bound the total number of symbolic variables that may be added, since our experience shows that introducing symbolic variables for the earliest N unstable neurons gives drastic improvements in both run time and accuracy. In practice, N is set to a number proportional to the weighted sum of unstable neurons in all layers. Formally, N = Σ_{k=1}^{L} γ_k × N_k, where N_k is the number of unstable neurons in layer k and γ_k is the discount factor for layer k.

We have implemented NeuroDiff and compared it with ReluDiff [33], the state-of-the-art tool for differential verification of neural networks. NeuroDiff builds upon the codebase of ReluDiff [32], which was also used by single-network verification tools such as ReluVal [48] and Neurify [47].
All of these tools use OpenBLAS [55] to optimize the symbolic interval arithmetic (namely, in applying the weight matrices to the symbolic intervals). We note that NeuroDiff uses the algorithm from Neurify to compute S(n_{k,j}) and S(n′_{k,j}), whereas ReluDiff uses the algorithm of ReluVal. Since Neurify is known to compute tighter bounds than ReluVal [47], we compare against both ReluDiff and an upgraded version of ReluDiff that uses the bounds from Neurify, to ensure that any performance gain is due to our optimizations and not to using Neurify's bounds. We use the name ReluDiff+ to refer to ReluDiff upgraded with Neurify's bounds.

Our benchmarks consist of the 49 feed-forward neural networks used by Paulsen et al. [33], taken from three applications: aircraft collision avoidance, image classification, and human activity recognition. We briefly describe them here. As in Paulsen et al. [33], the second network f′ is generated by truncating the edge weights of f from 32-bit to 16-bit floats.

ACAS Xu [16].
ACAS (aircraft collision avoidance system) Xu is a set of forty-five neural networks, each with five inputs, six hidden layers of 50 units each, and five outputs, designed to advise a pilot (the ownship) how to steer an aircraft in the presence of an intruder aircraft. The inputs describe the position and speed of the intruder relative to the ownship, and the outputs represent scores for different actions that the ownship should take. The scores range over [−0.5, 0.5]. We use the input regions defined by the properties of previous work [17, 48].

MNIST [21].
MNIST is a standard image classification task, where the goal is to correctly classify 28 × 28 pixel greyscale images of handwritten digits. Neural networks trained for this task take 784 inputs (one for each pixel), each in the range [0, 1], and compute ten outputs, one score for each of the ten possible digits. We use three networks of size 3x100 (three hidden layers of 100 neurons each), 2x512, and 4x1024, taken from Weng et al. [49] and Wang et al. [47]. All achieve at least 95% accuracy on holdout test data.

Human Activity Recognition (HAR) [1]. The goal for this task is to classify the current activity of a human (e.g., walking, sitting, laying down) based on statistics from a smartphone's gyroscopic sensors. Networks trained on this task take 561 statistics computed from the sensors and output six scores for six different activities. We use a 1x500 network.
Our experiments aim to answer the following research questions:

(1) Is NeuroDiff significantly faster than the state-of-the-art?
(2) Is NeuroDiff's forward pass significantly more accurate?
(3) Can NeuroDiff handle significantly larger input regions?
(4) How much does each technique contribute to the overall improvement?

Figure 10: Comparing the execution times of NeuroDiff and ReluDiff+ on all verification tasks.

To answer these questions, we run both NeuroDiff and ReluDiff/ReluDiff+ on all benchmarks and compare their results. Both NeuroDiff and ReluDiff/ReluDiff+ can be parallelized using multithreading, so we configure a maximum of 12 threads for all experiments. Our experiments are run on a computer with an AMD Ryzen Threadripper 2950X 16-core processor, with a 30-minute timeout per differential verification task. While we could try to adapt a single-network verification tool to our task, as done previously [33], ReluDiff has been shown to outperform this naive approach by several orders of magnitude.
In the remainder of this section, we present our experimental results in two steps. First, we present the overall verification results on all benchmarks. Then, we focus on the detailed results for the more difficult verification tasks.
Our experimental results show that, on all benchmarks, the improved ReluDiff+ slightly but consistently outperforms the original ReluDiff, due to its use of the more accurate component from Neurify instead of ReluVal for bounding the absolute values of individual neurons. Thus, to save space, we only show results comparing NeuroDiff (our method) and ReluDiff+.

We summarize the comparison between NeuroDiff and ReluDiff+ using a scatter plot in Figure 10, where each point represents a differential verification task: the x-axis is the execution time of NeuroDiff in seconds, and the y-axis is the execution time of ReluDiff+ in seconds. Thus, points on the diagonal line are ties, while points above the diagonal line are wins for NeuroDiff.

The results show that NeuroDiff outperformed ReluDiff+ on most verification tasks. Since the execution time is on a logarithmic scale, the speedups of NeuroDiff exceed 1000X for many of these tasks. While there are cases where NeuroDiff is slower than ReluDiff+, due to the overhead of adding symbolic variables, the differences are on the order of seconds. Since they
are all on the small MNIST networks and the HAR network, which are very easy for both tools, we omit an in-depth analysis of them. In the remainder of this section, we present an in-depth analysis of the more difficult verification tasks.

For the ACAS networks, we consider two different sets of properties, namely the original properties from Paulsen et al. [33] where ϵ = 0.05, and the same properties but with ϵ = 0.01. We emphasize that, while verifying ϵ = 0.05 is useful, it means that the output value can vary by up to 10%. Considering ϵ = 0.01 means that the output value can vary by up to 2%, which is much more useful.

Our results are shown in Tables 1 and 2, where the first column shows the property, which defines the input space considered. The next three columns show the results for NeuroDiff, specifically the number of verified networks (out of the 45 networks), the number of unverified networks, and the total run time across all networks. The next three columns show the same results for ReluDiff+. The final column shows the average speedup of NeuroDiff.

Table 1: Results for ACAS networks with ϵ = 0.05 (excerpt).

Property | NeuroDiff (new): proved / undet. / time (s) | ReluDiff+: proved / undet. / time (s) | Speedup
φ₁ | 45 / 0 / 522.6 | 44 / 1 / 4800.6 | 9.2
φ₂ | 42 / 0 / 2.3 | 42 / 0 / 4.1 | 1.8
φ₃ | 42 / 0 / 1.7 | 42 / 0 / 2.8 | 1.7

Table 2: Results for ACAS networks with ϵ = 0.01 (excerpt).

Property | NeuroDiff (new): proved / undet. / time (s) | ReluDiff+: proved / undet. / time (s) | Speedup
φ₁ | 41 / 4 / 11400.1 | 15 / 30 / 55778.6 | 4.9
φ₂ | 42 / 0 / 14.3 | 35 / 7 / 13642.2 | 957.2
φ₃ | 42 / 0 / 3.8 | 37 / 5 / 9115.0 | 2390.1

The results show that NeuroDiff makes significant gains in both speed and accuracy. Specifically, the speedups are up to two and three orders of magnitude for ϵ = 0.05 and 0.01, respectively. In addition, at the more accurate ϵ = 0.01 level, NeuroDiff is able to complete 53 more verification tasks, out of the total 142 verification tasks.
Figure 11: Percentage of verification tasks completed on the MNIST 4x1024 network for various perturbations.

Figure 12: Accuracy comparison for a single forward pass on the MNIST 4x1024 network with perturbation of 8.
For MNIST, we focus on the 4x1024 network, which is the largest network considered by Paulsen et al. [33]. The smaller networks, namely the 3x100 and 2x512 networks, were handled easily by both tools, so we omit their results. In the MNIST-related verification tasks, the goal is to verify an output difference bound ϵ when applying a perturbation of p greyscale units to all of an image's pixels. In the previous work, the largest perturbation was p = 3. Figure 11 compares NeuroDiff and ReluDiff+ on larger perturbations; for reference, the input region for p = 6 is (6/4)^784 ≈ 1.1 × 10^138 times larger than that for p = 4, or in other words, 138 orders of magnitude larger.

Next, we show a comparison of the ϵ verified by a single forward pass for a perturbation of 8 on the MNIST 4x1024 network in Figure 12. Points above the blue line indicate that NeuroDiff performed better. Overall, NeuroDiff is between two and three times more accurate than ReluDiff+.

Finally, we look at the targeted pixel perturbation properties. For these, the input space is created by taking an image, randomly choosing n pixels, and setting their bounds to [0, 1], i.e., allowing arbitrary changes to the chosen pixels. We again use the 4x1024 MNIST network. The results are summarized in Table 3, whose first column shows the number of randomly perturbed pixels. We can again see very large speedups, and a significant increase in the size of the input region that NeuroDiff can handle.

Table 3: Results of the MNIST 4x1024 pixel experiment.

Num. Pixels | NeuroDiff (new): proved / undet. / time (s) | ReluDiff+: proved / undet. / time (s) | Speedup
15 | 100 / 0 / 236.5 | 100 / 0 / 1610.2 | 6.8
18 | 100 / 0 / 540.8 | 88 / 12 / 34505.8 | 63.8
21 | 100 / 0 / 1004.0 | 30 / 70 / 145064.5 | 144.5
24 | 99 / 1 / 7860.1 | 1 / 99 / 179715.9 | 22.9
27 | 83 / 17 / 49824.0 | 0 / 100 / 180000.0 | 3.6

Here, we analyze the contribution of the individual techniques, namely convex approximations and symbolic variables, to the overall performance improvement. In Table 4, we present the average ϵ that can be verified after a single forward pass on the 4x1024 MNIST network for each of four configurations: ReluDiff+ (the baseline), NeuroDiff with only convex approximations, NeuroDiff with only intermediate variables, and the full NeuroDiff.

Table 4: Evaluating the individual contributions of convex approximation and symbolic variables using the MNIST 4x1024 global perturbation experiment.

Perturb | Average ϵ verified: ReluDiff+ | Conv. Approx. | Int. Vars. | NeuroDiff
3 | 0.59 | 0.42 (+1.39x) | 0.43 (+1.38x) | 0.20 (+2.93x)
4 | 1.02 | 0.70 (+1.46x) | 0.87 (+1.18x) | 0.36 (+2.85x)
5 | 1.60 | 1.06 (+1.52x) | 1.47 (+1.09x) | 0.56 (+2.87x)
6 | 2.29 | 1.47 (+1.55x) | 2.19 (+1.04x) | 0.79 (+2.90x)
7 | 3.02 | 1.92 (+1.58x) | 2.96 (+1.02x) | 1.04 (+2.91x)
8 | 3.80 | 2.39 (+1.59x) | 3.77 (+1.01x) | 1.30 (+2.93x)

Overall, the individual benefits of the two proposed techniques are clear. While convex approximation (alone) consistently provides more benefit as the perturbation increases, the benefit of symbolic variables (alone) tends to decrease. In addition, combining the two provides much greater benefit than the sum of their individual contributions. With a perturbation of 8, for example, convex approximations alone are 1.59 times more accurate than ReluDiff+, and intermediate variables alone are 1.01 times more accurate; together, however, they are 2.93 times more accurate.

The results suggest two things. First, intermediate symbolic variables perform well when a significant portion of the network is already in a stable state. We confirm, by manually inspecting the experimental results, that this is indeed the case for perturbations of 3 through 8 in the MNIST experiments. Second, the convex approximations provide the most benefit when the pre-ReLU delta intervals are (1) significantly wide and (2) still contain a significant amount of symbolic information. This is also confirmed by manually inspecting our MNIST results: increasing the perturbation increases the overall width of the delta intervals.
Aside from ReluDiff [33], the works most closely related to ours are those that focus on verifying properties of single networks, as opposed to two or more networks. These verification approaches can be broadly categorized into those that use exact, constraint-solving-based techniques and those that use approximations. On the constraint-solving side, several works have adapted off-the-shelf solvers [2–4, 7, 45], or even implemented solvers specifically for neural networks [17, 18]. On the approximation side, many use techniques that fit into the framework of abstract interpretation [5]. For example, many works have leveraged abstract domains such as intervals [16, 48, 49, 54], polyhedra [39, 40], and zonotopes [9, 41]. In addition, these verification techniques have been combined [15, 41, 47], and entirely different approaches [6, 12, 38], such as bounding a network's Lipschitz constant, have been studied. These verification techniques can also be integrated into the training process to produce more robust and easier-to-verify networks [8, 25, 26, 50]. These works are orthogonal to ours, though we believe their techniques can be adapted to our domain.

A related but tangential line of work focuses on discovering interesting behaviors of neural networks, though without any guarantees. Most closely related to our work are differential testing techniques [23, 34, 52], which focus on finding disagreements between a set of networks. However, these techniques do not attempt to prove the equivalence or similarity of multiple networks. Other works are geared more toward single-network testing, and use white-box testing techniques [22, 31, 42, 44, 51], such as neuron coverage statistics, to assess how well a network has been tested, and also to report interesting behaviors. Both of these lines can be thought of as adapting software engineering techniques to machine learning.

In addition, many works use machine learning techniques, such as gradient optimization, to find interesting behaviors, e.g., adversarial examples [19, 24, 28, 29, 53]. These behaviors can then be used to retrain the network to improve robustness [11, 36]. Again, these techniques do not provide guarantees, though we believe they could be integrated into NeuroDiff to quickly find counterexamples.

Finally, our work draws inspiration from classic software engineering techniques, such as regression testing [37], differential assertion checking [20], differential fuzzing [30], and incremental symbolic execution [13, 35], where one version of a program is used as an "oracle" to more efficiently test or verify a new version of the same program. In our case, f can be thought of as the oracle, while f′ is the new version.

We have presented NeuroDiff, a scalable differential verification technique for soundly bounding the difference between two feed-forward neural networks. NeuroDiff leverages novel convex approximations, which reduce the overall approximation error, and intermediate symbolic variables, which control the error explosion, to significantly improve the efficiency and accuracy of the analysis. Our experimental evaluation shows that NeuroDiff achieves up to 1000X speedup and is up to five times as accurate.
ACKNOWLEDGMENTS
This work was partially funded by the U.S. Office of Naval Research (ONR) under the grant N00014-17-1-2896.
REFERENCES [1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge L. Reyes-Ortiz. 2013. A Public Domain Dataset for Human Activity Recognition UsingSmartphones. (2013).[2] Teodora Baluta, Shiqi Shen, Shweta Shinde, Kuldeep S Meel, and Prateek Saxena.2019. Quantitative verification of neural networks and its security applications. In
Proceedings of the 2019 ACM SIGSAC Conference on Computer and CommunicationsSecurity . 1249–1264.[3] Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis,Aditya V. Nori, and Antonio Criminisi. 2016. Measuring Neural Net Robustnesswith Constraints. In
Annual Conference on Neural Information Processing Systems .2613–2621.[4] Nicholas Carlini and David A. Wagner. 2017. Towards Evaluating the Robustnessof Neural Networks. In
IEEE Symposium on Security and Privacy . 39–57.[5] Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A UnifiedLattice Model for Static Analysis of Programs by Construction or Approximationof Fixpoints. In
ACM SIGACT-SIGPLAN Symposium on Principles of ProgrammingLanguages . 238–252.[6] Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy A. Mann,and Pushmeet Kohli. 2018. A Dual Approach to Scalable Verification of DeepNetworks. In
International Conference on Uncertainty in Artificial Intelligence .550–559.[7] Rüdiger Ehlers. 2017. Formal Verification of Piece-Wise Linear Feed-ForwardNeural Networks. In
Automated Technology for Verification and Analysis - 15thInternational Symposium, ATVA 2017, Pune, India, October 3-6, 2017, Proceedings .269–286.[8] Marc Fischer, Mislav Balunovic, Dana Drachsler-Cohen, Timon Gehr, Ce Zhang,and Martin T. Vechev. 2019. DL2: Training and Querying Neural Networks withLogic. In
International Conference on Machine Learning . 1931–1941.[9] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, SwaratChaudhuri, and Martin T. Vechev. 2018. AI2: Safety and Robustness Certificationof Neural Networks with Abstract Interpretation. In
IEEE Symposium on Securityand Privacy . 3–18.[10] Sumathi Gokulanathan, Alexander Feldsher, Adi Malca, Clark Barrett, and GuyKatz. 2019. Simplifying Neural Networks with the Marabou Verification Engine. arXiv preprint arXiv:1910.12396 (2019).[11] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explainingand Harnessing Adversarial Examples. In
International Conference on LearningRepresentations .[12] Divya Gopinath, Guy Katz, Corina S. Pasareanu, and Clark W. Barrett. 2018.DeepSafe: A Data-Driven Approach for Assessing Robustness of Neural Net-works. In
Automated Technology for Verification and Analysis - 16th InternationalSymposium, ATVA 2018, Los Angeles, CA, USA, October 7-10, 2018, Proceedings .3–19.[13] Shengjian Guo, Markus Kusano, and Chao Wang. 2016. Conc-iSE: IncrementalSymbolic Execution of Concurrent Software. In
IEEE/ACM International Confer-ence On Automated Software Engineering .[14] Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compress-ing Deep Neural Network with Pruning, Trained Quantization and HuffmanCoding. In
International Conference on Learning Representations .[15] Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. 2017. SafetyVerification of Deep Neural Networks. In
International Conference on ComputerAided Verification . 3–29.[16] Kyle D. Julian, Mykel J. Kochenderfer, and Michael P. Owen. 2018. Deep Neu-ral Network Compression for Aircraft Collision Avoidance Systems.
CoRR abs/1810.04240 (2018). arXiv:1810.04240 http://arxiv.org/abs/1810.04240[17] Guy Katz, Clark W. Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer.2017. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. In
International Conference on Computer Aided Verification . 97–117.[18] Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus,Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljic, David L.Dill, Mykel J. Kochenderfer, and Clark W. Barrett. 2019. The Marabou Frame-work for Verification and Analysis of Deep Neural Networks. In
InternationalConference on Computer Aided Verification . 443–452.[19] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. 2017. Adversarial examplesin the physical world. In
International Conference on Learning Representations .[20] Shuvendu K Lahiri, Kenneth L McMillan, Rahul Sharma, and Chris Hawblitzel.2013. Differential assertion checking. In
Proceedings of the 2013 9th Joint Meetingon Foundations of Software Engineering . ACM, 345–355.[21] Yann Lecun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition.
Proc. IEEE
86, 11 (1998), 2278–2324.[22] Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, ChunyangChen, Ting Su, Li Li, Yang Liu, et al. 2018. Deepgauge: Multi-granularity testingcriteria for deep learning systems. In
IEEE/ACM International Conference OnAutomated Software Engineering . ACM, 120–131. [23] Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, and Ananth Grama.2018. MODE: automated neural network model debugging via state differentialanalysis and input selection. In
Proceedings of the 2018 ACM Joint Meeting onEuropean Software Engineering Conference and Symposium on the Foundationsof Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA,November 04-09, 2018 . 175–186.[24] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, andAdrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks.
International Conference on Learning Representations (2018).[25] Matthew Mirman, Timon Gehr, and Martin T. Vechev. 2018. DifferentiableAbstract Interpretation for Provably Robust Neural Networks. In
InternationalConference on Machine Learning . 3575–3583.[26] Martin Vechev Mislav Balunovic. 2020. Adversarial Training and Provable De-fenses: Bridging the Gap.
International Conference on Learning Representations .[27] Ramon E Moore, R Baker Kearfott, and Michael J Cloud. 2009.
Introduction tointerval analysis . Vol. 110. Siam.[28] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016.DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. In
IEEE Conference on Computer Vision and Pattern Recognition . 2574–2582.[29] Anh Mai Nguyen, Jason Yosinski, and Jeff Clune. 2015. Deep neural networks areeasily fooled: High confidence predictions for unrecognizable images. In
IEEEConference on Computer Vision and Pattern Recognition . 427–436.[30] Shirin Nilizadeh, Yannic Noller, and Corina S. Pasareanu. 2019. DifFuzz: differ-ential fuzzing for side-channel analysis. In
International Conference on SoftwareEngineering . 176–187.[31] Augustus Odena and Ian Goodfellow. 2018. Tensorfuzz: Debugging neural net-works with coverage-guided fuzzing. arXiv preprint arXiv:1807.10875 (2018).[32] Brandon Paulsen. 2020. ReluDiff-ICSE2020-Artifact. https://github.com/pauls658/ReluDiff-ICSE2020-Artifact.[33] Brandon Paulsen, Jingbo Wang, and Chao Wang. 2020. ReluDiff: DifferentialVerification of Deep Neural Networks.
International Conference on SoftwareEngineering (2020).[34] Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2017. DeepXplore: Au-tomated Whitebox Testing of Deep Learning Systems. In
ACM symposium onOperating Systems Principles . 1–18.[35] Suzette Person, Guowei Yang, Neha Rungta, and Sarfraz Khurshid. 2011. DirectedIncremental Symbolic Execution. In
ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation . ACM, New York, NY, USA, 504–515.[36] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. 2018. Certified Defensesagainst Adversarial Examples. In
International Conference on Learning Represen-tations .[37] Gregg Rothermel and Mary Jean Harrold. 1997. A safe, efficient regression testselection technique.
ACM Transactions on Software Engineering and Methodology(TOSEM)
6, 2 (1997), 173–210.[38] Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska. 2018. ReachabilityAnalysis of Deep Neural Networks with Provable Guarantees. In
International Joint Conference on Artificial Intelligence. 2651–2659. [39] Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. 2019. Beyond the Single Neuron Convex Barrier for Neural Network Certification. In
Advances in Neural Information Processing Systems (NeurIPS) .[40] Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin T. Vechev. 2019.An abstract domain for certifying neural networks.
ACM SIGACT-SIGPLANSymposium on Principles of Programming Languages (2019), 41:1–41:30.[41] Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin T. Vechev. 2019.Boosting Robustness Certification of Neural Networks. In
International Conferenceon Learning Representations .[42] Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, andDaniel Kroening. 2018. Concolic testing for deep neural networks. In
Proceedingsof the 33rd ACM/IEEE International Conference on Automated Software Engineering,ASE 2018, Montpellier, France, September 3-7, 2018 . 109–119.[43] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan,Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).[44] Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. 2018. DeepTest: Auto-mated testing of deep-neural-network-driven autonomous cars. In
InternationalConference on Software Engineering . 303–314.[45] Vincent Tjeng, Kai Xiao, and Russ Tedrake. 2019. Evaluating robustness of neuralnetworks with mixed integer programming.
International Conference on LearningRepresentations (2019).[46] Liwei Wang, Lunjia Hu, Jiayuan Gu, Zhiqiang Hu, Yue Wu, Kun He, and JohnHopcroft. 2018. Towards understanding learning representations: To what extentdo different neural networks learn the same representation. In
Advances in NeuralInformation Processing Systems . 9584–9593.[47] Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. 2018.Efficient Formal Safety Analysis of Neural Networks. In
Annual Conference onNeural Information Processing Systems . 6369–6379.12
ASE ’20, September 21–25, 2020, Virtual Event, Australia
[48] Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. 2018. Formal Security Analysis of Neural Networks using Symbolic Intervals. In USENIX Security Symposium. 1599–1614.
[49] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane S. Boning, and Inderjit S. Dhillon. 2018. Towards Fast Computation of Certified Robustness for ReLU Networks. In International Conference on Machine Learning. 5273–5282.
[50] Eric Wong and J. Zico Kolter. 2018. Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope. In International Conference on Machine Learning. 5283–5292.
[51] Xiaofei Xie, Lei Ma, Felix Juefei-Xu, Minhui Xue, Hongxu Chen, Yang Liu, Jianjun Zhao, Bo Li, Jianxiong Yin, and Simon See. 2019. DeepHunter: a coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. 146–157.
[52] Xiaofei Xie, Lei Ma, Haijun Wang, Yuekang Li, Yang Liu, and Xiaohong Li. 2019. DiffChaser: Detecting disagreements for deep neural networks. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 5772–5778.
[53] Weilin Xu, Yanjun Qi, and David Evans. 2016. Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers. In Network and Distributed System Security Symposium.
[54] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. 2018. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems. 4939–4948.
[55] Xianyi Zhang, Qian Wang, and Yunquan Zhang. 2012. Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor. In 18th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2012, Singapore, December 17-19, 2012