An Abstraction-Based Framework for Neural Network Verification

Abstract

Deep neural networks are increasingly being used as controllers for safety-critical systems. Because neural networks are opaque, certifying their correctness is a significant challenge. To address this issue, several neural network verification approaches have recently been proposed. However, these approaches afford limited scalability, and applying them to large networks can be challenging. In this paper, we propose a framework that can enhance neural network verification techniques by using over-approximation to reduce the size of the network - thus making it more amenable to verification. We perform the approximation such that if the property holds for the smaller (abstract) network, it holds for the original as well. The over-approximation may be too coarse, in which case the underlying verification tool might return a spurious counterexample. Under such conditions, we perform counterexample-guided refinement to adjust the approximation, and then repeat the process. Our approach is orthogonal to, and can be integrated with, many existing verification techniques. For evaluation purposes, we integrate it with the recently proposed Marabou framework, and observe a significant improvement in Marabou's performance. Our experiments demonstrate the great potential of our approach for verifying larger neural networks.



Yizhak Yisrael Elboher, Justin Gottschlich, and Guy Katz
The Hebrew University of Jerusalem, Israel, {yizhak.elboher, g.katz}@mail.huji.ac.il
Intel Labs, [email protected]


Machine programming (MP), the automatic generation of software, is showing early signs of fundamentally transforming the way software is developed [15]. A key ingredient employed by MP is the deep neural network (DNN), which has emerged as an effective means to semi-autonomously implement many complex software systems. DNNs are artifacts produced by machine learning: a user provides examples of how a system should behave, and a machine learning algorithm generalizes these examples into a DNN capable of correctly handling inputs that it had not seen before. Systems with DNN components have obtained unprecedented results in fields such as image recognition [24], game playing [33], natural language processing [16], computer networks [28], and many others, often surpassing the results obtained by similar systems that have been carefully handcrafted. It seems evident that this trend will increase and intensify, and that DNN components will be deployed in various safety-critical systems [3,19].

DNNs are appealing in that (in some cases) they are easier to create than handcrafted software, while still achieving excellent results. However, their usage also raises a challenge when it comes to certification. Undesired behavior has been observed in many state-of-the-art DNNs. For example, in many cases slight perturbations to correctly handled inputs can cause severe errors [35,26]. Because many practices for improving the reliability of hand-crafted code have yet to be successfully applied to DNNs (e.g., code reviews, coding guidelines, etc.), it remains unclear how to overcome the opacity of DNNs, which may limit our ability to certify them before they are deployed.

To mitigate this, the formal methods community has begun developing techniques for the formal verification of DNNs (e.g., [10,17,20,37]). These techniques can automatically prove that a DNN always satisfies a prescribed property.
Unfortunately, the DNN verification problem is computationally difficult (e.g., NP-complete, even for simple specifications and networks [20]), and becomes exponentially more difficult as network sizes increase. Thus, despite recent advances in DNN verification techniques, network sizes remain a severely limiting factor.

In this work, we propose a technique by which the scalability of many existing verification techniques can be significantly increased. The idea is to apply the well-established notion of abstraction and refinement [6]: replace a network $N$ that is to be verified with a much smaller, abstract network, $\bar{N}$, and then verify this $\bar{N}$. Because $\bar{N}$ is smaller it can be verified more efficiently; and it is constructed in such a way that if it satisfies the specification, the original network $N$ also satisfies it. In the case that $\bar{N}$ does not satisfy the specification, the verification procedure provides a counterexample $x$. This $x$ may be a true counterexample demonstrating that the original network $N$ violates the specification, or it may be spurious. If $x$ is spurious, the network $\bar{N}$ is refined to make it more accurate (and slightly larger), and then the process is repeated. A particularly useful variant of this approach is to use the spurious $x$ to guide the refinement process, so that the refinement step rules out $x$ as a counterexample. This variant, known as counterexample-guided abstraction refinement (CEGAR) [6], has been successfully applied in many verification contexts.

As part of our technique we propose a method for abstracting and refining neural networks. Our basic abstraction step merges two neurons into one, thus reducing the overall number of neurons by one. This basic step can be repeated numerous times, significantly reducing the network size. Conversely, refinement is performed by splitting a previously merged neuron in two, increasing the network size but making it more closely resemble the original.
A key point is that not all pairs of neurons can be merged, as this could result in a network that is smaller but is not an over-approximation of the original. We resolve this by first transforming the original network into an equivalent network where each node belongs to one of four classes, determined by its edge weights and its effect on the network's output; merging neurons from the same class can then be done safely. The actual choice of which neurons to merge or split is performed heuristically. We propose and discuss several possible heuristics.

For evaluation purposes, we implemented our approach as a Python framework that wraps the Marabou verification tool [22]. We then used our framework to verify properties of the Airborne Collision Avoidance System (ACAS Xu) set of benchmarks [20]. Our results strongly demonstrate the potential usefulness of abstraction in enhancing existing verification schemes: specifically, in most cases the abstraction-enhanced Marabou significantly outperformed the original. Further, in most cases the properties in question could indeed be shown to hold or not hold for the original DNN by verifying a small, abstract version thereof.

To summarize, our contributions are: (i) we propose a general framework for over-approximating and refining DNNs; (ii) we propose several heuristics for abstraction and refinement, to be used within our general framework; and (iii) we provide an implementation of our technique that integrates with the Marabou verification tool and use it for evaluation. Our code is available online [9].

The rest of this paper is organized as follows. In Section 2, we provide a brief background on neural networks and their verification. In Section 3, we describe our general framework for abstracting and refining DNNs. In Section 4, we discuss how to apply these abstraction and refinement steps as part of a CEGAR procedure, followed by an evaluation in Section 5. In Section 6, we discuss related work, and we conclude in Section 7.

A neural network consists of an input layer, an output layer, and one or more intermediate layers called hidden layers. Each layer is a collection of nodes, called neurons. Each neuron is connected to other neurons by one or more directed edges. In a feedforward neural network, the neurons in the first layer receive input data that sets their initial values. The remaining neurons calculate their values using the weighted values of the neurons that they are connected to through edges from the preceding layer (see Fig. 1). The output layer provides the resulting value of the DNN for a given input.

Fig. 1. A fully connected, feedforward DNN with 5 input nodes (in orange), 5 output nodes (in purple), and 4 hidden layers containing a total of 36 hidden nodes (in blue). Each edge is associated with a weight value (not depicted).

There are many types of DNNs, which may differ in the way their neuron values are computed. Typically, a neuron is evaluated by first computing a weighted sum of the preceding layer's neuron values according to the edge weights, and then applying an activation function to this weighted sum [13]. We focus here on the Rectified Linear Unit (ReLU) activation function [29], given as $\text{ReLU}(x) = \max(0, x)$. Thus, if the weighted sum computation yields a positive value, it is kept; and otherwise, it is replaced by zero.

More formally, given a DNN $N$, we use $n$ to denote the number of layers of $N$. We denote the number of nodes of layer $i$ by $s_i$. Layers $1$ and $n$ are the input and output layers, respectively. Layers $2, \ldots, n-1$ are the hidden layers. We denote the value of the $j$-th node of layer $i$ by $v_{i,j}$, and denote the column vector $[v_{i,1}, \ldots, v_{i,s_i}]^T$ as $V_i$.

Evaluating $N$ is performed by calculating $V_n$ for a given input assignment $V_1$. This is done by sequentially computing $V_i$ for $i = 2, 3, \ldots, n$, each time using the values of $V_{i-1}$ to compute weighted sums, and then applying the ReLU activation functions. Specifically, layer $i$ (for $i > 1$) is associated with a weight matrix $W_i$ of size $s_i \times s_{i-1}$ and a bias vector $B_i$ of size $s_i$. If $i$ is a hidden layer, its values are given by $V_i = \text{ReLU}(W_i V_{i-1} + B_i)$, where the ReLUs are applied element-wise; and the output layer is given by $V_n = W_n V_{n-1} + B_n$ (ReLUs are not applied). Without loss of generality, in the rest of the paper we assume that all bias values are 0, and can be ignored. This rule is applied once for each layer, in order, until $V_n$ is eventually computed.

We will sometimes use the notation $w(v_{i,j}, v_{i+1,k})$ to refer to the entry of $W_{i+1}$ that represents the weight of the edge between neuron $j$ of layer $i$ and neuron $k$ of layer $i+1$. We will also refer to such an edge as an outgoing edge for $v_{i,j}$, and as an incoming edge for $v_{i+1,k}$.

As part of our abstraction framework, we will sometimes need to consider a suffix of a DNN, in which the first layers of the DNN are omitted. For $1 < i < n$, we use $N^{[i]}$ to denote the DNN comprised of layers $i, i+1, \ldots, n$ of the original network. The sizes and weights of the remaining layers are unchanged, and layer $i$ of $N$ is treated as the input layer of $N^{[i]}$.

Fig. 2 depicts a small neural network. The network has $n = 3$ layers, of sizes $s_1 = 1$, $s_2 = 2$ and $s_3 = 1$. Its weights are $w(v_{1,1}, v_{2,1}) = 1$, $w(v_{1,1}, v_{2,2}) = -1$, $w(v_{2,1}, v_{3,1}) = 1$ and $w(v_{2,2}, v_{3,1}) = 2$. For input $v_{1,1} = 3$, node $v_{2,1}$ evaluates to 3 and node $v_{2,2}$ evaluates to 0, due to the ReLU activation function. The output node $v_{3,1}$ then evaluates to 3.

Fig. 2. A simple feedforward neural network.

DNN verification amounts to answering the following question: given a DNN $N$, which maps input vector $x$ to output vector $y$, and predicates $P$ and $Q$, does there exist an input $x_0$ such that $P(x_0)$ and $Q(N(x_0))$ both hold? In other words, the verification process determines whether there exists a particular input that meets the input criterion $P$, and that is mapped to an output that meets the output criterion $Q$. We refer to $\langle N, P, Q \rangle$ as the verification query. As is usual in verification, $Q$ represents the negation of the desired property. Thus, if the query is unsatisfiable (UNSAT), the property holds; and if it is satisfiable (SAT), then $x_0$ constitutes a counterexample to the property in question.

Different verification approaches may differ in (i) the kinds of neural networks they allow (specifically, the kinds of activation functions in use); (ii) the kinds of input properties; and (iii) the kinds of output properties. For simplicity, we focus on networks that employ the ReLU activation function. In addition, our input properties will be conjunctions of linear constraints on the input values. Finally, we will assume that our networks have a single output node $y$, and that the output property is $y > c$ for a given constant $c$. We stress that these restrictions are for the sake of simplicity. Many properties of interest, including those with arbitrary Boolean structure and involving multiple neurons, can be reduced into the above single-output setting by adding a few neurons that encode the Boolean structure [20,32]; see Fig. 3 for an example. The number of neurons to be added is typically negligible when compared to the size of the DNN. In particular, this is true for the ACAS Xu family of benchmarks [20], and also for adversarial robustness queries that use the $L_\infty$ or the $L_1$ norm as a distance metric [5,14,21]. Additionally, other piecewise-linear activation functions, such as max-pooling layers, can also be encoded using ReLUs [5].

Several techniques have been proposed for solving the aforementioned verification problem in recent years (Section 6 includes a brief overview). Our abstraction technique is designed to be compatible with most of these techniques, by simplifying the network being verified, as we describe next.
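The layer-by-layer evaluation rule and the notion of a satisfying input for a query can be illustrated with a minimal sketch in plain Python. This is not part of the framework described in the paper: the names `evaluate`, `is_witness` and `FIG2` are hypothetical, biases are taken to be 0 as in the text, and the weight of the edge into $v_{2,2}$ is assumed to be $-1$ (any negative value yields the same result for the input used here).

```python
from typing import Callable, List

Matrix = List[List[float]]  # row k = incoming weights of neuron k


def evaluate(weights: List[Matrix], inputs: List[float]) -> List[float]:
    """Compute V_n from V_1; weights[0] is W_2, weights[1] is W_3, ...

    All biases are taken to be 0, as assumed in the text.
    """
    values = list(inputs)
    last = len(weights) - 1
    for index, matrix in enumerate(weights):
        sums = [sum(w * v for w, v in zip(row, values)) for row in matrix]
        # ReLU on hidden layers only; the output layer stays linear.
        values = sums if index == last else [max(0.0, s) for s in sums]
    return values


def is_witness(weights: List[Matrix], x: List[float],
               input_ok: Callable[[List[float]], bool], c: float) -> bool:
    """True if x satisfies P (input_ok) and the single output exceeds c (Q)."""
    return input_ok(x) and evaluate(weights, x)[0] > c


# Fig. 2 network: s_1 = 1, s_2 = 2, s_3 = 1.
# Assumption: the weight into v_{2,2} is -1.
FIG2 = [
    [[1.0], [-1.0]],  # W_2 (2 x 1)
    [[1.0, 2.0]],     # W_3 (1 x 2)
]

print(evaluate(FIG2, [3.0]))  # [3.0], matching the worked example
```

For instance, `is_witness(FIG2, [3.0], lambda x: 0 <= x[0] <= 5, 2.0)` checks whether the input 3 is a counterexample for the property "the output never exceeds 2 on the interval [0, 5]". A verification tool has to decide whether any such input exists, which is what makes the problem hard.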
Because the complexity of verifying a neural network is strongly connected to its size [20], our goal is to transform a verification query $\varphi_1 = \langle N, P, Q \rangle$ into query $\varphi_2 = \langle \bar{N}, P, Q \rangle$, such that the abstract network $\bar{N}$ is significantly smaller than $N$ (notice that properties $P$ and $Q$ remain unchanged). We will construct $\bar{N}$ so that it is an over-approximation of $N$, meaning that if $\varphi_2$ is UNSAT then $\varphi_1$ is also UNSAT. More specifically, since our DNNs have a single output, we can regard $N(x)$ and $\bar{N}(x)$ as real values for every input $x$. To guarantee that $\varphi_2$ over-approximates $\varphi_1$, we will make sure that for every $x$, $N(x) \le \bar{N}(x)$; and thus, $\bar{N}(x) \le c \implies N(x) \le c$. Because our output properties always have the form $N(x) > c$, it is indeed the case that if $\varphi_2$ is UNSAT, i.e. $\bar{N}(x) \le c$ for all $x$, then $N(x) \le c$ for all $x$ and so $\varphi_1$ is also UNSAT. We now propose a framework for generating various $\bar{N}$s with this property.

Fig. 3. Reducing a complex property to the $y > 0$ form. The property in question is $y_2 > y_1 \vee y_2 > y_3$, which is a property that involves multiple outputs and includes a disjunction. We do this (right hand side network) by adding two neurons, $t_1$ and $t_2$, such that $t_1 = \text{ReLU}(y_2 - y_1)$ and $t_2 = \text{ReLU}(y_2 - y_3)$. Thus, $t_1 > 0$ if and only if the first disjunct, $y_2 > y_1$, holds; and $t_2 > 0$ if and only if the second disjunct, $y_2 > y_3$, holds. Finally, we add a neuron $z$ such that $z = t_1 + t_2$. It holds that $z > 0$ if and only if $t_1 > 0 \vee t_2 > 0$. Thus, we have reduced the complex property into an equivalent property in the desired form.

We seek to define an abstraction operator that removes a single neuron from the network, by merging it with another neuron. To do this, we will first transform $N$ into an equivalent network, whose neurons have properties that will facilitate their merging. Equivalent here means that for every input vector, both networks produce the exact same output. First, each hidden neuron $v_{i,j}$ of our transformed network will be classified as either a pos neuron or a neg neuron. A neuron is pos if all the weights on its outgoing edges are positive, and is neg if all those weights are negative. Second, orthogonally to the pos/neg classification, each hidden neuron will also be classified as either an inc neuron or a dec neuron. $v_{i,j}$ is an inc neuron of $N$ if, when we look at $N^{[i]}$ (where $v_{i,j}$ is an input neuron), increasing the value of $v_{i,j}$ increases the value of the network's output. Formally, $v_{i,j}$ is inc if for every two input vectors $x_1$ and $x_2$ where $x_1[k] = x_2[k]$ for $k \ne j$ and $x_1[j] > x_2[j]$, it holds that $N^{[i]}(x_1) > N^{[i]}(x_2)$. A dec neuron is defined symmetrically, so that decreasing the value of $x[j]$ increases the output. We first describe this transformation (an illustration of which appears in Fig. 4), and later we explain how it fits into our abstraction framework.

Our first step is to transform $N$ into a new network, $N'$, in which every hidden neuron is classified as pos or neg.
This transformation is done by replacing each hidden neuron $v_{i,j}$ with two neurons, $v^{+}_{i,j}$ and $v^{-}_{i,j}$, which are respectively pos and neg. Both $v^{+}_{i,j}$ and $v^{-}_{i,j}$ retain a copy of all incoming edges of the original $v_{i,j}$; however, $v^{+}_{i,j}$ retains just the outgoing edges with positive weights, and $v^{-}_{i,j}$ retains just those with negative weights. Outgoing edges with negative weights are removed from $v^{+}_{i,j}$ by setting their weights to 0, and the same is done for outgoing edges with positive weights for $v^{-}_{i,j}$. Formally, for every neuron $v_{i-1,p}$,

$$w'(v_{i-1,p}, v^{+}_{i,j}) = w(v_{i-1,p}, v_{i,j}), \qquad w'(v_{i-1,p}, v^{-}_{i,j}) = w(v_{i-1,p}, v_{i,j})$$

where $w'$ represents the weights in the new network $N'$. Also, for every neuron $v_{i+1,q}$,

$$w'(v^{+}_{i,j}, v_{i+1,q}) = \begin{cases} w(v_{i,j}, v_{i+1,q}) & w(v_{i,j}, v_{i+1,q}) \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

and

$$w'(v^{-}_{i,j}, v_{i+1,q}) = \begin{cases} w(v_{i,j}, v_{i+1,q}) & w(v_{i,j}, v_{i+1,q}) \le 0 \\ 0 & \text{otherwise} \end{cases}$$

(see Fig. 4). This operation is performed once for every hidden neuron of $N$, resulting in a network $N'$ that is roughly double the size of $N$. Observe that $N'$ is indeed equivalent to $N$, i.e. their outputs are always identical.

Fig. 4. Classifying neurons as pos/neg and inc/dec. In the initial network (left), the neurons of the second hidden layer are already classified: $+$ and $-$ superscripts indicate pos and neg neurons, respectively; the $I$ superscript and green background indicate inc, and the $D$ superscript and red background indicate dec. Classifying node $v_{1,1}$ is done by first splitting it into two nodes $v^{+}_{1,1}$ and $v^{-}_{1,1}$ (middle). Both nodes have identical incoming edges, but the outgoing edges of $v_{1,1}$ are partitioned between them, according to the sign of each edge's weight. In the last network (right), $v^{+}_{1,1}$ is split once more, into an inc node with outgoing edges only to other inc nodes, and a dec node with outgoing edges only to other dec nodes. Node $v_{1,1}$ is thus transformed into three nodes, each of which can finally be classified as inc or dec. Notice that in the worst case, each node is split into four nodes, although for $v_{1,1}$ three nodes were enough.

Our second step is to alter $N'$ further, into a new network $N''$, where every hidden neuron is either inc or dec (in addition to already being pos or neg). Generating $N''$ from $N'$ is performed by traversing the layers of $N'$ backwards, each time handling a single layer and possibly doubling its number of neurons:

– Initial step: the output layer has a single neuron, $y$. This neuron is an inc node, because increasing its value will increase the network's output value.
– Iterative step: observe layer $i$, and suppose the nodes of layer $i+1$ have already been partitioned into inc and dec nodes. Observe a neuron $v^{+}_{i,j}$ in layer $i$ which is marked pos (the case for neg is symmetrical). We replace $v^{+}_{i,j}$ with two neurons $v^{+,I}_{i,j}$ and $v^{+,D}_{i,j}$, which are inc and dec, respectively. Both new neurons retain a copy of all incoming edges of $v^{+}_{i,j}$; however, $v^{+,I}_{i,j}$ retains only outgoing edges that lead to inc nodes, and $v^{+,D}_{i,j}$ retains only outgoing edges that lead to dec nodes. Thus, for every $v_{i-1,p}$ and $v_{i+1,q}$,

  $$w''(v_{i-1,p}, v^{+,I}_{i,j}) = w'(v_{i-1,p}, v^{+}_{i,j}), \qquad w''(v_{i-1,p}, v^{+,D}_{i,j}) = w'(v_{i-1,p}, v^{+}_{i,j})$$

  $$w''(v^{+,I}_{i,j}, v_{i+1,q}) = \begin{cases} w'(v^{+}_{i,j}, v_{i+1,q}) & \text{if } v_{i+1,q} \text{ is inc} \\ 0 & \text{otherwise} \end{cases}$$

  $$w''(v^{+,D}_{i,j}, v_{i+1,q}) = \begin{cases} w'(v^{+}_{i,j}, v_{i+1,q}) & \text{if } v_{i+1,q} \text{ is dec} \\ 0 & \text{otherwise} \end{cases}$$

  where $w''$ represents the weights in the new network $N''$. We perform this step for each neuron in layer $i$, resulting in neurons that are each classified as either inc or dec.

To understand the intuition behind this classification, recall that by our assumption all hidden nodes use the ReLU activation function, which is monotonically increasing. Because $v^{+}_{i,j}$ is pos, all its outgoing edges have positive weights, and so if its assignment was to increase (decrease), the assignments of all nodes to which it is connected in the following layer would also increase (decrease). Thus, we split $v^{+}_{i,j}$ in two, and make sure one copy, $v^{+,I}_{i,j}$, is only connected to nodes that need to increase (inc nodes), and that the other copy, $v^{+,D}_{i,j}$, is only connected to nodes that need to decrease (dec nodes). This ensures that $v^{+,I}_{i,j}$ is itself inc, and that $v^{+,D}_{i,j}$ is dec. Also, both $v^{+,I}_{i,j}$ and $v^{+,D}_{i,j}$ remain pos nodes, because their outgoing edges all have positive weights.

When this procedure terminates, $N''$ is equivalent to $N'$, and so also to $N$; and $N''$ is roughly double the size of $N'$, and roughly four times the size of $N$. Both transformation steps are only performed for hidden neurons, whereas the input and output neurons remain unchanged. This is summarized by the following lemma:

Lemma 1.

Any DNN $N$ can be transformed into an equivalent network $N''$ where each hidden neuron is pos or neg, and also inc or dec, by increasing its number of neurons by a factor of at most 4.

Using Lemma 1, we can assume without loss of generality that the DNN nodes in our input query $\varphi_1$ are each marked as pos/neg and as inc/dec. We are now ready to construct the over-approximation network $\bar{N}$. We do this by specifying an abstract operator that merges a pair of neurons in the network (thus reducing network size by one), and can be applied multiple times. The only restrictions are that the two neurons being merged need to be from the same hidden layer, and must share the same pos/neg and inc/dec attributes. Consequently, applying abstract to saturation will result in a network with at most 4 neurons in each hidden layer, which over-approximates the original network. This, of course, would be an immense reduction in the number of neurons for most reasonable input networks.

The abstract operator's behavior depends on the attributes of the neurons being merged. For simplicity, we will focus on the $\langle \text{pos}, \text{inc} \rangle$ case. Let $v_{i,j}$, $v_{i,k}$ be two hidden neurons of layer $i$, both classified as $\langle \text{pos}, \text{inc} \rangle$. Because layer $i$ is hidden, we know that layers $i+1$ and $i-1$ are defined. Let $v_{i-1,p}$ and $v_{i+1,q}$ denote arbitrary neurons in the preceding and succeeding layer, respectively. We construct a network $\bar{N}$ that is identical to $N$, except that: (i) nodes $v_{i,j}$ and $v_{i,k}$ are removed and replaced with a new single node, $v_{i,t}$; and (ii) all edges that touched nodes $v_{i,j}$ or $v_{i,k}$ are removed, and other edges are untouched. Finally, we add new incoming and outgoing edges for the new node $v_{i,t}$ as follows:

– Incoming edges: $\bar{w}(v_{i-1,p}, v_{i,t}) = \max\{w(v_{i-1,p}, v_{i,j}),\ w(v_{i-1,p}, v_{i,k})\}$
– Outgoing edges: $\bar{w}(v_{i,t}, v_{i+1,q}) = w(v_{i,j}, v_{i+1,q}) + w(v_{i,k}, v_{i+1,q})$

where $\bar{w}$ represents the weights in the new network $\bar{N}$.
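The first preprocessing step (the pos/neg split) and a single merge of two inc neurons can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, a layer is represented by its incoming weight matrix `w_in` and the next layer's weight matrix `w_out`, biases are 0, and the values fed into the merged layer are assumed non-negative (as when they come from a preceding ReLU layer). The example weights follow the formulas of the Fig. 5 example as they can be read from this text.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # one row per neuron, one column per predecessor


def run_layer_pair(w_in: Matrix, w_out: Matrix, x: List[float]) -> List[float]:
    """Evaluate one ReLU hidden layer followed by a linear layer (biases are 0)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def split_pos_neg(w_in: Matrix, w_out: Matrix) -> Tuple[Matrix, Matrix]:
    """Replace every hidden neuron by a pos copy and a neg copy.

    Both copies keep all incoming edges; outgoing edges are divided by sign,
    and the edges a copy does not keep are set to 0.
    """
    new_w_in = [list(row) for row in w_in for _ in range(2)]
    new_w_out = []
    for row in w_out:
        new_row: List[float] = []
        for w in row:
            new_row += [max(w, 0.0), min(w, 0.0)]  # pos copy, then neg copy
        new_w_out.append(new_row)
    return new_w_in, new_w_out


def merge_inc(w_in: Matrix, w_out: Matrix, j: int, k: int) -> Tuple[Matrix, Matrix]:
    """Merge hidden neurons j and k, both assumed to be inc with equal pos/neg label.

    Incoming weights: element-wise max (use min instead for dec neurons).
    Outgoing weights: sum. The merged neuron is appended as the last neuron.
    """
    keep = [r for r in range(len(w_in)) if r not in (j, k)]
    merged_in = [max(a, b) for a, b in zip(w_in[j], w_in[k])]
    new_w_in = [w_in[r] for r in keep] + [merged_in]
    new_w_out = [[row[r] for r in keep] + [row[j] + row[k]] for row in w_out]
    return new_w_in, new_w_out


# Three <pos, inc> neurons v1, v2, v3 feeding a single output y.
FIG5_IN = [[1, -1], [4, -1], [2, -1]]
FIG5_OUT = [[5, 3, 4]]

step1 = merge_inc(FIG5_IN, FIG5_OUT, 0, 1)  # merge v1 and v2
step2 = merge_inc(*step1, 0, 1)             # merge v3 with the abstract node
print(step1)  # ([[2, -1], [4, -1]], [[4, 8]])
print(step2)  # ([[4, -1]], [[12]])
```

Evaluating the three networks with `run_layer_pair` on any non-negative input gives a non-decreasing sequence of outputs, which is the over-approximation property the operator is designed to provide. Appending the merged neuron last is only a bookkeeping choice; any consistent ordering works.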
An illustrative example appears in Fig. 5. Intuitively, this definition of abstract seeks to ensure that the new node $v_{i,t}$ always contributes at least as much to the network's output as the two original nodes $v_{i,j}$ and $v_{i,k}$, so that the new network produces an output at least as large as the original for every input. By the way we defined the incoming edges of the new neuron $v_{i,t}$, we are guaranteed that for every input $x$ passed into both $N$ and $\bar{N}$, the value assigned to $v_{i,t}$ in $\bar{N}$ is at least as large as the values assigned to both $v_{i,j}$ and $v_{i,k}$ in the original network. This works to our advantage, because $v_{i,j}$ and $v_{i,k}$ were both inc, so increasing their values increases the output value. By our definition of the outgoing edges, the values of any inc nodes in layer $i+1$ increase in $\bar{N}$ compared to $N$, and those of any dec nodes decrease. By definition, this means that the network's overall output increases.

The abstraction operation for the $\langle \text{neg}, \text{inc} \rangle$ case is identical to the one described above. For the remaining two cases, i.e. $\langle \text{pos}, \text{dec} \rangle$ and $\langle \text{neg}, \text{dec} \rangle$, the max operator in the definition is replaced with a min operator.

The next lemma (proof omitted due to lack of space) justifies the use of our abstraction step, and can be applied once for each application of abstract:

Lemma 2.

Let $\bar{N}$ be derived from $N$ by a single application of abstract. For every $x$, it holds that $\bar{N}(x) \ge N(x)$.

Fig. 5. Using abstract to merge $\langle \text{pos}, \text{inc} \rangle$ nodes. Initially (left), the three nodes $v_1$, $v_2$ and $v_3$ are separate, and $y = 5R(x_1 - x_2) + 3R(4x_1 - x_2) + 4R(2x_1 - x_2)$. Next (middle), abstract merges $v_1$ and $v_2$ into a single node, giving $y = 8R(4x_1 - x_2) + 4R(2x_1 - x_2)$. For the edge between $x_1$ and the new abstract node we pick the weight 4, which is the maximal weight among edges from $x_1$ to $v_1$ and $v_2$. Likewise, the edge between $x_2$ and the abstract node has weight $-1$. The outgoing edge from the abstract node to $y$ has weight 8, which is the sum of the weights of edges from $v_1$ and $v_2$ to $y$. Next, abstract is applied again to merge $v_3$ with the abstract node, and the weights are adjusted accordingly (right), giving $y = 12R(4x_1 - x_2)$. With every abstraction, the value of $y$ (given as a formula at the bottom of each DNN, where $R$ represents the ReLU operator) increases. For example, to see that $12R(4x_1 - x_2) \ge 8R(4x_1 - x_2) + 4R(2x_1 - x_2)$, it is enough to see that $4R(4x_1 - x_2) \ge 4R(2x_1 - x_2)$, which holds because ReLU is a monotonically increasing function and $x_1$ and $x_2$ are non-negative (being, themselves, the output of ReLU nodes).

The aforementioned abstract operator reduces network size by merging neurons, but at the cost of accuracy: whereas for some input $x_0$ the original network returns $N(x_0) = 3$, the over-approximation network $\bar{N}$ created by abstract might return $\bar{N}(x_0) = 5$. If our goal is to prove that it is never the case that $N(x) > 10$, this over-approximation may be adequate: we can prove that always $\bar{N}(x) \le 10$, and this will be enough. However, if our goal is to prove that it is never the case that $N(x) > 4$, the over-approximation is inadequate: it is possible that the property holds for $N$, but because $\bar{N}(x_0) = 5 > 4$, our verification procedure will return $x_0$ as a spurious counterexample (a counterexample for $\bar{N}$ that is not a counterexample for $N$). In order to handle this situation, we define a refinement operator, refine, that is the inverse of abstract: it transforms $\bar{N}$ into yet another over-approximation, $\bar{N}'$, with the property that for every $x$, $N(x) \le \bar{N}'(x) \le \bar{N}(x)$. If $\bar{N}'(x_0) = 3.5$, it might be a suitable over-approximation for showing that never $N(x) > 4$. In this section we define the refine operator, and in Section 4 we explain how to use abstract and refine as part of a CEGAR-based verification scheme.

Recall that abstract merges together a pair of neurons that share the same attributes. After a series of applications of abstract, each hidden layer $i$ of the resulting network can be regarded as a partitioning of hidden layer $i$ of the original network, where each partition contains original, concrete neurons that share the same attributes. In the abstract network, each partition is represented by a single, abstract neuron. The weights on the incoming and outgoing edges of this abstract neuron are determined according to the definition of the abstract operator. For example, in the case of an abstract neuron $\bar{v}$ that represents a set of concrete neurons $\{v_1, \ldots, v_n\}$ all with attributes $\langle \text{pos}, \text{inc} \rangle$, the weight of each incoming edge to $\bar{v}$ is given by

$$\bar{w}(u, \bar{v}) = \max(w(u, v_1), \ldots, w(u, v_n))$$

where $u$ represents a neuron that has not been abstracted yet, and $w$ is the weight function of the original network. The key point here is that the order of abstract operations that merged $v_1, \ldots, v_n$ does not matter; rather, only the fact that they are now grouped together determines the abstract network's weights. The following corollary, which is a direct result of Lemma 2, establishes this connection between sequences of abstract applications and partitions:

Corollary 1.

Let $N$ be a DNN where each hidden neuron is labeled as pos/neg and inc/dec, and let $\mathcal{P}$ be a partitioning of the hidden neurons of $N$, that only groups together hidden neurons from the same layer that share the same labels. Then $N$ and $\mathcal{P}$ give rise to an abstract neural network $\bar{N}$, which is obtained by performing a series of abstract operations that group together neurons according to the partitions of $\mathcal{P}$. This $\bar{N}$ is an over-approximation of $N$.

We now define a refine operation that is, in a sense, the inverse of abstract. refine takes as input a DNN $\bar{N}$ that was generated from $N$ via a sequence of abstract operations, and splits a neuron from $\bar{N}$ in two. Formally, the operator receives the original network $N$, the partitioning $\mathcal{P}$, and a finer partition $\mathcal{P}'$ that is obtained from $\mathcal{P}$ by splitting a single class in two. The operator then returns a new abstract network, $\bar{N}'$, that is the abstraction of $N$ according to $\mathcal{P}'$.

Due to Corollary 1, and because $\bar{N}'$ returned by refine corresponds to a partition $\mathcal{P}'$ of the hidden neurons of $N$, it is straightforward to show that $\bar{N}'$ is indeed an over-approximation of $N$. The other useful property that we require is the following:

Lemma 3.

Let $\bar{N}$ be an abstraction of $N$, and let $\bar{N}'$ be a network obtained from $\bar{N}$ by applying a single refine step. Then for every input $x$ it holds that $\bar{N}(x) \ge \bar{N}'(x) \ge N(x)$.

The second part of the inequality, $\bar{N}'(x) \ge N(x)$, holds because $\bar{N}'$ is an over-approximation of $N$ (Corollary 1). The first part of the inequality, $\bar{N}(x) \ge \bar{N}'(x)$, follows from the fact that $\bar{N}$ can be obtained from $\bar{N}'$ by a single application of abstract.

In practice, in order to support the refinement of an abstract DNN, we maintain the current partitioning, i.e. the mapping from concrete neurons to the abstract neurons that represent them. Then, when an abstract neuron is selected for refinement (according to some heuristic, such as the one we propose in Section 4), we adjust the mapping and use it to compute the weights of the edges that touch the affected neuron.

A CEGAR-Based Approach

In Section 3 we defined the abstract operator that reduces network size at the cost of reducing network accuracy, and its inverse refine operator that increases network size and restores accuracy. Together with a black-box verification procedure Verify that can dispatch queries of the form $\varphi = \langle N, P, Q \rangle$, these components now allow us to design an abstraction-refinement algorithm for DNN verification, given as Alg. 1 (we assume that all hidden neurons in the input network have already been marked pos/neg and inc/dec).

Algorithm 1 Abstraction-based DNN Verification($N, P, Q$)

 1: Use abstract to generate an initial over-approximation $\bar{N}$ of $N$
 2: if Verify($\bar{N}, P, Q$) is UNSAT then
 3:     return UNSAT
 4: else
 5:     Extract counterexample $c$
 6:     if $c$ is a counterexample for $N$ then
 7:         return SAT
 8:     else
 9:         Use refine to refine $\bar{N}$ into $\bar{N}'$
10:         $\bar{N} \leftarrow \bar{N}'$
11:         Goto step 2
12:     end if
13: end if
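The control flow of Alg. 1 can be sketched in a few lines of Python. This is only an illustrative skeleton, not the authors' implementation: the names `cegar_verify` and `make_toy`, the callback signatures, and the round limit are all assumptions, and the "networks" in the toy usage are plain numbers (the original has zero slack, an abstraction adds slack to the output), chosen only so that the loop can be exercised end to end.

```python
from typing import Callable, Optional, Tuple, TypeVar

Net = TypeVar("Net")      # hypothetical network representation
Point = TypeVar("Point")  # hypothetical input point type


def cegar_verify(
    original: Net,
    initial_abstraction: Net,
    verify: Callable[[Net], Optional[Point]],
    is_real_counterexample: Callable[[Net, Point], bool],
    refine: Callable[[Net, Net, Point], Net],
    max_rounds: int = 10_000,
) -> Tuple[str, Optional[Point]]:
    """Abstraction-refinement loop in the spirit of Alg. 1.

    `verify` returns None for UNSAT, or a counterexample for the abstract
    network. `refine` must return a strictly finer abstraction.
    """
    abstract_net = initial_abstraction
    for _ in range(max_rounds):
        cex = verify(abstract_net)
        if cex is None:
            return "UNSAT", None           # holds for the abstraction, hence for the original
        if is_real_counterexample(original, cex):
            return "SAT", cex              # counterexample carries over to the original
        abstract_net = refine(original, abstract_net, cex)  # spurious: refine and retry
    raise RuntimeError("round limit reached")


# Toy stand-ins (assumptions for illustration only): a "network" is a slack
# value s, its output on x is x + s, and the property is violated when the
# output exceeds a threshold.
GRID = [0.0, 1.0, 2.0, 3.0]


def make_toy(threshold: float):
    def verify(slack):
        return next((x for x in GRID if x + slack > threshold), None)

    def is_real(original_slack, x):
        return x + original_slack > threshold

    def refine(original_slack, slack, x):
        return max(original_slack, slack - 1.0)

    return verify, is_real, refine


unsat_result = cegar_verify(0.0, 2.0, *make_toy(3.5))
sat_result = cegar_verify(0.0, 2.0, *make_toy(2.5))
```

In the first call two spurious counterexamples are refined away before the query becomes UNSAT; in the second, refinement continues until the counterexample is confirmed on the original.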

Because $\bar{N}$ is obtained via applications of abstract and refine, the soundness of the underlying Verify procedure, together with Lemmas 2 and 3, guarantees the soundness of Alg. 1. Further, the algorithm always terminates: all the abstract steps are performed first, followed by a sequence of refine steps. Each refine step splits one class of the partition in two, so the number of classes strictly increases and is bounded by the number of hidden neurons of $N$. Because no additional abstract operations are performed beyond Step 1, after finitely many refine steps $\bar{N}$ will become identical to $N$, at which point no spurious counterexample will be found, and the algorithm will terminate with either SAT or UNSAT. Of course, termination is only guaranteed when the underlying Verify procedure is guaranteed to terminate.

There are two steps in the algorithm that we intentionally left ambiguous: Step 1, where the initial over-approximation is computed, and Step 9, where the current abstraction is refined due to the discovery of a spurious counterexample. The motivation was to make Alg. 1 general, and allow it to be customized by plugging in different heuristics for performing Steps 1 and 9, which may depend on the problem at hand. Below we propose a few such heuristics.

The most naïve way to generate the initial abstraction is to apply the abstract operator to saturation. As previously discussed, abstract can merge together any pair of hidden neurons from a given layer that share the same attributes. Since there are four possible attribute combinations, this will result in each hidden layer of the network having four neurons or fewer. This method, which we refer to as abstraction to saturation, produces the smallest abstract networks possible. The downside is that, in some cases, these networks might be too coarse, and might require multiple rounds of refinement before a SAT or UNSAT answer can be reached.

A different heuristic for producing abstractions that may require fewer refinement steps is as follows. First, we select a finite set of input points, $X = \{x_1, \ldots, x_n\}$, all of which satisfy the input property $P$. These points can be generated randomly, or according to some coverage criterion of the input space. The points of $X$ are then used as indicators in estimating when the abstraction has become too coarse: after every abstraction step, we check whether the property still holds for $x_1, \ldots, x_n$, and stop abstracting if this is not the case. The exact technique, which we refer to as indicator-guided abstraction, appears in Alg. 2, which is used to perform Step 1 of Alg. 1.

Algorithm 2 Indicator-Guided Abstraction($N, P, Q, X$)

 1: $\bar{N} \leftarrow N$
 2: while $\forall x \in X.\ \bar{N}(x)$ satisfies $Q$ and there are still neurons that can be merged do
 3:     $\Delta \leftarrow \infty$, bestPair $\leftarrow \bot$
 4:     for every pair of hidden neurons $v_{i,j}, v_{i,k}$ with identical attributes do
 5:         $m \leftarrow 0$
 6:         for every node $v_{i-1,p}$ do
 7:             $a \leftarrow \bar{w}(v_{i-1,p}, v_{i,j})$, $b \leftarrow \bar{w}(v_{i-1,p}, v_{i,k})$
 8:             if $|a - b| > m$ then
 9:                 $m \leftarrow |a - b|$
10:             end if
11:         end for
12:         if $m < \Delta$ then
13:             $\Delta \leftarrow m$, bestPair $\leftarrow \langle v_{i,j}, v_{i,k} \rangle$
14:         end if
15:     end for
16:     Use abstract to merge the nodes of bestPair, store the result in $\bar{N}$
17: end while
18: return $\bar{N}$

Another point that is addressed by Alg. 2, besides how many rounds of abstraction should be performed, is which pair of neurons should be merged in every application of abstract. This, too, is determined heuristically. Since any pair of neurons that we pick will result in the same reduction in network size, our strategy is to prefer neurons that will result in a more accurate approximation. Inaccuracies are caused by the max and min operators within the abstract operator: e.g., in the case of max, every pair of incoming edges with weights $a, b$ is replaced by a single edge with weight $\max(a, b)$. Our strategy here is to merge the pair of neurons for which the maximal value of $|a - b|$ (over all incoming edges with weights $a$ and $b$) is minimal. Intuitively, this leads to $\max(a, b)$ being close to both $a$ and $b$, which in turn leads to an over-approximation network that is smaller than the original but close to it weight-wise. We point out that although repeatedly exploring all pairs (line 4) may appear costly, in our experiments the time cost of this step was negligible compared to that of the verification queries that followed. Still, if this step happens to become a bottleneck, it is possible to adjust the algorithm to heuristically sample just some of the pairs, and pick the best pair among those considered, without harming the algorithm's soundness.

As a small example, consider the network depicted on the left hand side of Fig. 5. This network has three pairs of neurons that can be merged using abstract (any two of $\{v_1, v_2, v_3\}$). Consider the pair $v_1, v_2$: the maximal value of $|a - b|$ for these neurons is $\max(|1 - 4|, |(-2) - (-1)|) = 3$. For pair $v_1, v_3$, the maximal value is 1; and for pair $v_2, v_3$ the maximal value is 2. According to the strategy described in Alg. 2, we would first choose to apply abstract on the pair with the minimal maximal value, i.e. on the pair $v_1, v_3$.

A refinement step is performed when a spurious counterexample $x$ has been found, indicating that the abstract network is too coarse. In other words, our abstraction steps, and specifically the max and min operators that were used to select edge weights for the abstract neurons, have resulted in the abstract network's output being too great for input $x$, and we now need to reduce it. Thus, our refinement strategies are aimed at applying refine in a way that will result in a significant reduction to the abstract network's output. We note that there may be multiple options for applying refine, on different nodes, such that any of them would remove the spurious counterexample $x$ from the abstract network. In addition, it is not guaranteed that it is possible to remove $x$ with a single application of refine, and multiple consecutive applications may be required.

One heuristic approach for refinement follows the well-studied notion of counterexample-guided abstraction refinement [6]. Specifically, we leverage the spurious counterexample $x$ in order to identify a concrete neuron $v$, which is currently mapped into an abstract neuron $\bar{v}$, such that splitting $v$ away from $\bar{v}$ might rule out counterexample $x$. To do this, we evaluate the original network on $x$ and compute the value of $v$ (we denote this value by $v(x)$), and then do the same for $\bar{v}$ in the abstract network (value denoted $\bar{v}(x)$). Intuitively, a neuron pair $\langle v, \bar{v} \rangle$ for which the difference $|v(x) - \bar{v}(x)|$ is significant makes a good candidate for a refinement operation that will split $v$ away from $\bar{v}$.

In addition to considering $v(x)$ and $\bar{v}(x)$, we propose to also consider the weights of the incoming edges of $v$ and $\bar{v}$. When these weights differ significantly, this could indicate that $\bar{v}$ is too coarse an approximation for $v$, and should be refined.
We argue that by combining these two criteria, namely the edge weight difference between $v$ and $\bar{v}$, which is a property of the current abstraction, and the difference between $v(x)$ and $\bar{v}(x)$, which is a property of the specific input $x$, we can identify abstract neurons that have contributed significantly to $x$ being a spurious counterexample.

The refinement heuristic is formally defined in Alg. 3. The algorithm traverses the original neurons, looks for the edge weight times assignment value that has changed the most as a result of the current abstraction, and then performs refinement on the neuron at the end of that edge. As was the case with Alg. 2, if considering all possible nodes turns out to be too costly, it is possible to adjust the algorithm to explore only some of the nodes, and pick the best one among those considered, without jeopardizing the algorithm's soundness.

Algorithm 3 Counterexample-Guided Refinement($N, \bar{N}, x$)

 1: bestNeuron $\leftarrow \bot$, $m \leftarrow 0$
 2: for each concrete neuron $v_{i,j}$ of $N$ mapped into abstract neuron $\bar{v}_{i,j'}$ of $\bar{N}$ do
 3:     for each concrete neuron $v_{i-1,k}$ of $N$ mapped into abstract neuron $\bar{v}_{i-1,k'}$ of $\bar{N}$ do
 4:         if $|w(v_{i-1,k}, v_{i,j}) - \bar{w}(\bar{v}_{i-1,k'}, \bar{v}_{i,j'})| \cdot |v_{i,j}(x) - \bar{v}_{i,j'}(x)| > m$ then
 5:             $m \leftarrow |w(v_{i-1,k}, v_{i,j}) - \bar{w}(\bar{v}_{i-1,k'}, \bar{v}_{i,j'})| \cdot |v_{i,j}(x) - \bar{v}_{i,j'}(x)|$
 6:             bestNeuron $\leftarrow v_{i,j}$
 7:         end if
 8:     end for
 9: end for
10: Use refine to split bestNeuron from its abstract neuron
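The two selection rules, the min-max merge criterion of Alg. 2 and the weight-gap times value-gap score of Alg. 3, can be sketched together on one toy layer. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the layer sits directly after two inputs (so previous-layer neurons map to themselves), all three neurons are assumed to share a label for which abstract takes the max of incoming weights, and the weight values are illustrative numbers chosen for this example.

```python
from itertools import combinations

# Incoming weights (from inputs x1, x2) of three same-label neurons.
# Illustrative values, assumed for this sketch.
WEIGHTS = {"v1": (1.0, -2.0), "v2": (4.0, -1.0), "v3": (2.0, -1.0)}


def relu(z):
    return max(0.0, z)


def neuron_value(weights, x):
    """ReLU of the weighted sum of the inputs."""
    return relu(sum(w * xi for w, xi in zip(weights, x)))


def pick_merge_pair(weights):
    """Alg. 2 selection: the pair whose largest incoming-weight gap is smallest."""
    best_pair, best_gap = None, float("inf")
    for a, b in combinations(sorted(weights), 2):
        gap = max(abs(wa - wb) for wa, wb in zip(weights[a], weights[b]))
        if gap < best_gap:
            best_pair, best_gap = (a, b), gap
    return best_pair, best_gap


def merged_weights(weights, members):
    """Incoming weights of the abstract neuron (max case assumed)."""
    n_inputs = len(weights[members[0]])
    return tuple(max(weights[m][p] for m in members) for p in range(n_inputs))


def pick_refinement(weights, members, x):
    """Alg. 3 selection: maximize |w - w_bar| * |v(x) - v_bar(x)| over edges."""
    abstract_w = merged_weights(weights, members)
    abstract_value = neuron_value(abstract_w, x)
    best, best_score = None, 0.0
    for name in members:
        value_gap = abs(neuron_value(weights[name], x) - abstract_value)
        for w, w_bar in zip(weights[name], abstract_w):
            score = abs(w - w_bar) * value_gap
            if score > best_score:
                best, best_score = name, score
    return best, best_score


merge_choice = pick_merge_pair(WEIGHTS)
split_choice = pick_refinement(WEIGHTS, ["v1", "v2", "v3"], (1.0, 0.0))
```

With these assumed weights, the merge rule prefers the pair `v1`, `v3` (largest gap 1), and for the input (1, 0) the refinement rule selects `v1` with score 9, ahead of `v3` (4) and `v2` (0).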

As an example, let us use Alg. 3 to choose a refinement step for the right hand side network of Fig. 5, for a spurious counterexample $\langle x_1, x_2 \rangle = \langle 1, 0 \rangle$. For this input, the original neurons' evaluation is $v_1 = 1$, $v_2 = 4$ and $v_3 = 2$, whereas the abstract neuron that represents them evaluates to 4. Suppose $v_1$ is considered first. In the abstract network, $\bar{w}(x_1, \bar{v}) = 4$ and $\bar{w}(x_2, \bar{v}) = -1$; whereas in the original network, $w(x_1, v_1) = 1$ and $w(x_2, v_1) = -2$. Thus, the largest value $m$ computed for $v_1$ is $|w(x_1, v_1) - \bar{w}(x_1, \bar{v})| \cdot |1 - 4| = 3 \cdot 3 = 9$. This value of $m$ is larger than the one computed for $v_2$ (0) and for $v_3$ (4), and so $v_1$ is selected for the refinement step. After this step is performed, $v_2$ and $v_3$ are still mapped to a single abstract neuron, whereas $v_1$ is mapped to a separate neuron in the abstract network.

Our implementation of the abstraction-refinement framework includes modules that read a DNN in the NNet format [19] and a property to be verified, create an initial abstract DNN as described in Section 4, invoke a black-box verification engine, and perform refinement as described in Section 4. The process terminates when the underlying engine returns either UNSAT, or an assignment that is a true counterexample for the original network. For experimentation purposes, we integrated our framework with the Marabou DNN verification engine [22]. Our implementation and benchmarks are publicly available online [9].

Fig. 6. (From [20]) An illustration of the sensor readings passed as input to the ACAS Xu DNNs. Labels in the figure: ownship, $v_{own}$, intruder, $v_{int}$, $\rho$, $\psi$, $\theta$.

Our experiments included verifying several properties of the 45 ACAS Xu DNNs for airborne collision avoidance [19,20]. ACAS Xu is a system designed to produce horizontal turning advisories for an unmanned aircraft (the ownship), with the purpose of preventing a collision with another nearby aircraft (the intruder). The ACAS Xu system receives as input sensor readings indicating the location of the intruder relative to the ownship, the speeds of the two aircraft, and their directions (see Fig. 6). Based on these readings, it selects one of 45 DNNs, to which the readings are then passed as input. The selected DNN then assigns scores to five output neurons, each representing a possible turning advisory: strong left, weak left, strong right, weak right, or clear-of-conflict (the latter indicating that it is safe to continue along the current trajectory). The neuron with the lowest score represents the selected advisory. We verified several properties of these DNNs based on the list of properties that appeared in [20], specifically focusing on properties that ensure that the DNNs always advise clear-of-conflict for distant intruders, and that they are robust to (i.e., do not change their advisories in the presence of) small input perturbations.

Each of the ACAS Xu DNNs has 300 hidden nodes spread across 6 hidden layers, leading to 1200 neurons when the transformation from Section 3.1 is applied. In our experiments we set out to check whether the abstraction-based approach could indeed prove properties of the ACAS Xu networks on abstract networks that had significantly fewer neurons than the original ones. In addition, we wished to compare the proposed approaches for generating initial abstractions (the abstraction to saturation approach versus the indicator-guided abstraction described in Alg. 2), in order to identify an optimal configuration for our tool. Finally, once the optimal configuration had been identified, we used it to compare our tool's performance to that of vanilla Marabou. The results are described next.

Fig. 7 depicts a comparison of the two approaches for generating initial abstractions: the abstraction to saturation scheme (x axis), and the indicator-guided abstraction scheme described in Alg. 2 (y axis). Each experiment included running our tool twice on the same benchmark (network and property), with an identical configuration except for the initial abstraction being used. The plot depicts the total time (log-scale, in seconds, with a 20-hour timeout) spent by Marabou solving verification queries as part of the abstraction-refinement procedure. It shows that, in contrast to our intuition, abstraction to saturation almost always outperforms the indicator-guided approach. This is perhaps due to the fact that, although it might entail additional rounds of refinement, the abstraction to saturation approach tends to produce coarse verification queries that are easily solved by Marabou, resulting in an overall improved performance. We thus conclude that, at least in the ACAS Xu case, the abstraction to saturation approach is superior to that of indicator-guided abstraction.

This experiment also confirms that properties can indeed be proved on abstract networks that are significantly smaller than the original: despite the initial 4x increase in network size due to the preprocessing phase, the final abstract network on which our abstraction-enhanced approach could solve the query was usually substantially smaller than the original network. Specifically, among the abstraction to saturation experiments that terminated, the final network on which the property was shown to be SAT or UNSAT had an average size of 268.8 nodes, compared to the original 310, a 13% reduction. Because DNN verification becomes exponentially more difficult as the network size increases, this reduction is highly beneficial.
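As a quick check of the quoted figure, the relative reduction from the original 310 nodes to the average of 268.8 nodes works out as follows (only the two node counts are taken from the text; the rest is arithmetic):

```latex
\[
\frac{310 - 268.8}{310} \;=\; \frac{41.2}{310} \;\approx\; 0.133 ,
\qquad \text{i.e. roughly a } 13\% \text{ reduction.}
\]
```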

Fig. 7. Generating initial abstractions using abstraction to saturation and indicator-guided abstraction.

Next, we compared our abstraction-enhanced Marabou (in abstraction to saturation mode) to the vanilla version. The plot in Fig. 8 compares the total query solving time of vanilla Marabou (y axis) to that of our approach (x axis). We ran the tools on 90 ACAS Xu benchmarks (2 properties, checked on each of the 45 networks), with a 20-hour timeout. We observe that the abstraction-enhanced version significantly outperforms vanilla Marabou on average, often solving queries orders-of-magnitude more quickly, and timing out on fewer benchmarks. Specifically, the abstraction-enhanced version solved 58 instances, versus 5 solved by Marabou. Further, over the instances solved by both tools, the abstraction-enhanced version had a total query median runtime of 1045 seconds, versus 63671 seconds for Marabou. Interestingly, the average size of the abstract networks for which our tool was able to solve the query was 385 nodes, which is an increase compared to the original 310 nodes. However, the improved runtimes demonstrate that although these networks were slightly larger, they were still much easier to verify, presumably because many of the network's original neurons remained abstracted away.

Fig. 8. Comparing the run time (in seconds, log scale) of vanilla Marabou and the abstraction-enhanced version on the ACAS Xu benchmarks.

Finally, we used our abstraction-enhanced Marabou to verify adversarial robustness properties [35]. Intuitively, an adversarial robustness property states that slight input perturbations cannot cause sudden spikes in the network's output. This is desirable because such sudden spikes can lead to misclassification of inputs. Unlike the ACAS Xu domain-specific properties [20], whose formulation required input from human experts, adversarial robustness is a universal property, desirable for every DNN. Consequently it is easier to formulate, and has received much attention (e.g., [2,10,20,36]).

In order to formulate adversarial robustness properties for the ACAS Xu networks, we randomly sampled the ACAS Xu DNNs to identify input points where the selected output advisory, indicated by an output neuron $y_i$, received a much lower score than the second-best advisory, $y_j$ (recall that the advisory with the lowest score is selected). For such an input point $x_0$, we then posed the verification query: does there exist a point $x$ that is close to $x_0$, but for which $y_j$ receives a lower score than $y_i$? Or, more formally:

$(\lVert x - x_0 \rVert_{L_\infty} \le \delta) \wedge (y_j \le y_i)$.

If this query is SAT then there exists an input $x$ whose distance to $x_0$ is at most $\delta$, but for which the network assigns a better (lower) score to advisory $y_j$ than to $y_i$. However, if this query is UNSAT, no such point $x$ exists. Because we select point $x_0$ such that $y_i$ is initially much smaller than $y_j$, we expect the query to be UNSAT for small values of $\delta$.

For each of the 45 ACAS Xu networks, we created robustness queries for 20 distinct input points, producing a total of 900 verification queries (we arbitrarily set δ = 0 .

Fig. 9. Comparing the run time (in seconds, log scale) of vanilla Marabou and the abstraction-enhanced version on the ACAS Xu adversarial robustness properties.
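The robustness query described above has a simple operational reading: a point is a witness for SAT exactly when it lies within the $L_\infty$ ball around the sampled point and flips the order of the two scores. The following sketch checks a single candidate point against that condition. It is only an illustration: `violates_robustness`, `linf_distance`, and the two-output `toy_network` are hypothetical names and a made-up model, and checking individual points is of course no substitute for the verification query itself.

```python
def linf_distance(a, b):
    """L-infinity distance between two equally long points."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def violates_robustness(network, x0, x, delta, i, j):
    """True iff x witnesses (||x - x0||_inf <= delta) and (y_j <= y_i)."""
    if linf_distance(x, x0) > delta:
        return False          # x is outside the allowed perturbation ball
    y = network(x)
    return y[j] <= y[i]       # advisory j now scores at least as well as i


def toy_network(x):
    """Made-up two-output model: score 0 depends on the input, score 1 is fixed."""
    return [x[0] - x[1], 0.5]


x0 = (0.0, 0.0)               # here output 0 has the lowest score, so i = 0, j = 1
small_step = violates_robustness(toy_network, x0, (0.125, -0.125), 0.25, 0, 1)
large_step = violates_robustness(toy_network, x0, (0.5, -0.5), 0.5, 0, 1)
out_of_ball = violates_robustness(toy_network, x0, (0.5, -0.5), 0.25, 0, 1)
```

In this toy setting the small perturbation leaves the advisory unchanged, the larger one flips it, and the same larger point is rejected when the radius is too small to contain it.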

Related Work

In recent years, multiple schemes have been proposed for the verification of neural networks. These include SMT-based approaches, such as Marabou [22,23], Reluplex [20], DLV [17] and others; approaches based on formulating the problem as a mixed integer linear programming instance (e.g., [4,7,8,36]); approaches that use sophisticated symbolic interval propagation [37], or abstract interpretation [10]; and others (e.g., [1,18,25,27,30,38,39]). These approaches have been applied in a variety of tasks, such as measuring adversarial robustness [2,17], neural network simplification [11], neural network modification [12], and many others (e.g., [23,34]). Our approach can be integrated with any sound and complete solver as its engine, and then applied towards any of the aforementioned tasks. Incomplete solvers could also be used and might afford better performance, but this could result in our approach also becoming incomplete.

Some existing DNN verification techniques incorporate abstraction elements. In [31], the authors use abstraction to over-approximate the Sigmoid activation function with a collection of rectangles. If the abstract verification query they produce is UNSAT, then so is the original. When a spurious counterexample is found, an arbitrary refinement step is performed. The authors report limited scalability, tackling only networks with a few dozen neurons. Abstraction techniques also appear in the AI2 approach [10], but there it is the input property and reachable regions that are over-approximated, as opposed to the DNN itself. Combining this kind of input-focused abstraction with our network-focused abstraction is an interesting avenue for future work.

With deep neural networks becoming widespread and with their forthcoming integration into safety-critical systems, there is an urgent need for scalable techniques to verify and reason about them. However, the size of these networks poses a serious challenge. Abstraction-based techniques can mitigate this difficulty, by replacing networks with smaller versions thereof to be verified, without compromising the soundness of the verification procedure. The abstraction-based approach we have proposed here can provide a significant reduction in network size, thus boosting the performance of existing verification technology.

In the future, we plan to continue this work along several axes. First, we intend to investigate refinement heuristics that can split an abstract neuron into two arbitrary sized neurons. In addition, we will investigate abstraction schemes for networks that use additional activation functions, beyond ReLUs. Finally, we plan to make our abstraction scheme parallelizable, allowing users to use multiple worker nodes to explore different combinations of abstraction and refinement steps, hopefully leading to faster convergence.

Acknowledgements. We thank the anonymous reviewers for their insightful comments. This project was partially supported by grants from the Binational Science Foundation (2017662) and the Israel Science Foundation (683/18).

References

1. G. Anderson, S. Pailoor, I. Dillig, and S. Chaudhuri. Optimization and Abstraction: a Synergistic Approach for Analyzing Neural Network Robustness. In Proc. 40th ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), pages 731–744, 2019.
2. O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring Neural Net Robustness with Constraints. In Proc. 30th Conf. on Neural Information Processing Systems (NIPS), 2016.
3. M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba. End to End Learning for Self-Driving Cars, 2016. Technical Report. http://arxiv.org/abs/1604.07316.
4. R. Bunel, I. Turkaslan, P. Torr, P. Kohli, and M. Kumar. Piecewise Linear Neural Network Verification: A Comparative Study, 2017. Technical Report. https://arxiv.org/abs/1711.00455v1.
5. N. Carlini, G. Katz, C. Barrett, and D. Dill. Provably Minimally-Distorted Adversarial Examples, 2017. Technical Report. https://arxiv.org/abs/1709.10207.
6. E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-Guided Abstraction Refinement. In Proc. 12th Int. Conf. on Computer Aided Verification (CAV), pages 154–169, 2000.
7. S. Dutta, S. Jha, S. Sankaranarayanan, and A. Tiwari. Output Range Analysis for Deep Neural Networks. In Proc. 10th NASA Formal Methods Symposium (NFM), pages 121–138, 2018.
8. R. Ehlers. Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks. In Proc. 15th Int. Symp. on Automated Technology for Verification and Analysis (ATVA), pages 269–286, 2017.
9. Y. Y. Elboher, J. Gottschlich, and G. Katz. An Abstraction-Based Framework for Neural Network Verification: Proof-of-Concept Implementation. https://drive.google.com/file/d/1KCh0vOgcOR2pSbGRdbtAQTmoMHAFC2Vs/view, 2020.
10. T. Gehr, M. Mirman, D. Drachsler-Cohen, E. Tsankov, S. Chaudhuri, and M. Vechev. AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation. In Proc. 39th IEEE Symposium on Security and Privacy (S&P), 2018.
11. S. Gokulanathan, A. Feldsher, A. Malca, C. Barrett, and G. Katz. Simplifying Neural Networks using Formal Verification. In Proc. 12th NASA Formal Methods Symposium (NFM), 2020.
12. B. Goldberger, Y. Adi, J. Keshet, and G. Katz. Minimal Modifications of Deep Neural Networks using Verification. In Proc. 23rd Int. Conf. on Logic for Programming, Artificial Intelligence and Reasoning (LPAR), 2020.
13. I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
14. D. Gopinath, G. Katz, C. Păsăreanu, and C. Barrett. DeepSafe: A Data-driven Approach for Assessing Robustness of Neural Networks. In Proc. 16th Int. Symp. on Automated Technology for Verification and Analysis (ATVA), pages 3–19, 2018.
15. J. Gottschlich, A. Solar-Lezama, N. Tatbul, M. Carbin, M. Rinard, R. Barzilay, S. Amarasinghe, J. Tenenbaum, and T. Mattson. The Three Pillars of Machine Programming. In Proc. 2nd ACM SIGPLAN Int. Workshop on Machine Learning and Programming Languages (MALP), pages 69–80, 2018.
16. G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
17. X. Huang, M. Kwiatkowska, S. Wang, and M. Wu. Safety Verification of Deep Neural Networks. In Proc. 29th Int. Conf. on Computer Aided Verification (CAV), pages 3–29, 2017.
18. Y. Jacoby, C. Barrett, and G. Katz. Verifying Recurrent Neural Networks using Invariant Inference, 2020. Technical Report. http://arxiv.org/abs/2004.02462.
19. K. Julian, J. Lopez, J. Brush, M. Owen, and M. Kochenderfer. Policy Compression for Aircraft Collision Avoidance Systems. In Proc. 35th Digital Avionics Systems Conf. (DASC), pages 1–10, 2016.
20. G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. In Proc. 29th Int. Conf. on Computer Aided Verification (CAV), pages 97–117, 2017.
21. G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer. Towards Proving the Adversarial Robustness of Deep Neural Networks. In Proc. 1st Workshop on Formal Verification of Autonomous Vehicles (FVAV), pages 19–26, 2017.
22. G. Katz, D. Huang, D. Ibeling, K. Julian, C. Lazarus, R. Lim, P. Shah, S. Thakoor, H. Wu, A. Zeljić, D. Dill, M. Kochenderfer, and C. Barrett. The Marabou Framework for Verification and Analysis of Deep Neural Networks. In Proc. 31st Int. Conf. on Computer Aided Verification (CAV), 2019.
23. Y. Kazak, C. Barrett, G. Katz, and M. Schapira. Verifying Deep-RL-Driven Systems. In Proc. 1st ACM SIGCOMM Workshop on Network Meets AI & ML (NetAI), 2019.
24. A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
25. L. Kuper, G. Katz, J. Gottschlich, K. Julian, C. Barrett, and M. Kochenderfer. Toward Scalable Verification for Safety-Critical Deep Networks, 2018. Technical Report. https://arxiv.org/abs/1801.05950.
26. A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial Examples in the Physical World, 2016. Technical Report. http://arxiv.org/abs/1607.02533.
27. A. Lomuscio and L. Maganti. An Approach to Reachability Analysis for Feed-Forward ReLU Neural Networks, 2017. Technical Report. https://arxiv.org/abs/1706.07351.
28. H. Mao, R. Netravali, and M. Alizadeh. Neural Adaptive Video Streaming with Pensieve. In Proc. Conf. of the ACM Special Interest Group on Data Communication (SIGCOMM), pages 197–210, 2017.
29. V. Nair and G. Hinton. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proc. 27th Int. Conf. on Machine Learning (ICML), pages 807–814, 2010.
30. N. Narodytska, S. Kasiviswanathan, L. Ryzhyk, M. Sagiv, and T. Walsh. Verifying Properties of Binarized Deep Neural Networks, 2017. Technical Report. http://arxiv.org/abs/1709.06662.
31. L. Pulina and A. Tacchella. An Abstraction-Refinement Approach to Verification of Artificial Neural Networks. In Proc. 22nd Int. Conf. on Computer Aided Verification (CAV), pages 243–257, 2010.
32. W. Ruan, X. Huang, and M. Kwiatkowska. Reachability Analysis of Deep Neural Networks with Provable Guarantees. In Proc. 27th Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 2651–2659, 2018.
33. D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, and S. Dieleman. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529(7587):484–489, 2016.
34. X. Sun, K. H., and Y. Shoukry. Formal Verification of Neural Network Controlled Autonomous Systems. In Proc. 22nd ACM Int. Conf. on Hybrid Systems: Computation and Control (HSCC), 2019.
35. C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing Properties of Neural Networks, 2013. Technical Report. http://arxiv.org/abs/1312.6199.
36. V. Tjeng, K. Xiao, and R. Tedrake. Evaluating Robustness of Neural Networks with Mixed Integer Programming. In Proc. 7th Int. Conf. on Learning Representations (ICLR), 2019.
37. S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Formal Security Analysis of Neural Networks using Symbolic Intervals. In Proc. 27th USENIX Security Symposium, 2018.
38. H. Wu, A. Ozdemir, A. Zeljić, A. Irfan, K. Julian, D. Gopinath, S. Fouladi, G. Katz, C. Păsăreanu, and C. Barrett. Parallelization Techniques for Verifying Neural Networks, 2020. Technical Report. https://arxiv.org/abs/2004.08440.
39. W. Xiang, H.-D. Tran, and T. Johnson. Output Reachable Set Estimation and Verification for Multilayer Neural Networks.