An Incremental Abstraction Scheme for Solving Hard SMT-Instances over Bit-Vectors

Abstract

Decision procedures for SMT problems based on the theory of bit-vectors are a fundamental component in state-of-the-art software and hardware verifiers. While very efficient in general, certain SMT instances are still challenging for state-of-the-art solvers (especially when such instances include computationally costly functions). In this work, we present an approach for the quantifier-free bit-vector theory (QF_BV in SMT-LIB) based on incremental SMT solving and abstraction refinement. We define four concrete approximation steps for the multiplication, division and remainder operators and combine them into an incremental abstraction scheme. We implement this scheme in a prototype extending the SMT solver Boolector and measure both the overall performance and the performance of the single approximation steps. The evaluation shows that our abstraction scheme contributes to solving more unsatisfiable benchmark instances, including seven instances with unknown status in SMT-LIB.



Samuel Teuber, Marko Kleine Büning, Carsten Sinz

Karlsruhe Institute of Technology (KIT), Germany [email protected],{marko.kleinebuening,carsten.sinz}@kit.edu


Decision procedures for bit-vectors play an important role in many applications such as bounded model checking, property directed reachability, test generation for hardware circuits, or symbolic execution [10,13,23,28]. These applications can result in formulae of considerable size, and although many of them are still within the reach of current SMT solvers, even some small formulae remain extremely hard to solve (e.g., the modmul instances from the LLBMC family of benchmarks [16]). It is well known that certain operators in the logic of bit-vectors have only large SAT encodings and therefore often make the problems containing them hard to solve. These include multiplication, division and remainder.

A technique frequently employed to speed up the solving process for hard instances is abstraction [5,6,20,7,8,3,25]. Instead of the original problem, a related problem that is supposed to be easier to solve is analyzed. Abstractions include under-approximations (where the abstract system allows fewer solutions than the original one) and over-approximations (where it allows more solutions).

We first present a general scheme for replacing hard bit-vector operators by a series of over-approximations. Then, to demonstrate and analyze its applicability, we instantiate the scheme for the operators of multiplication, division and remainder. The abstraction sequence contains approximations of differing precision, which are tried in turn: less precise approximations are applied first and, if they are not sufficient, refined by adding further elements of the abstraction sequence. When and which refinements are tried is decided in a CEGAR-like fashion [9].

We have extended the SMT solver Boolector [26], resulting in an implementation we call Ablector. For the evaluation, we used the relevant subset of benchmarks from the SMT-LIB [1] benchmark release 2019-05-20. In comparison to Boolector, Ablector solves 11 more unsatisfiable instances and 7 more instances with unknown status, and yields a total of 46 uniquely solved instances with unsatisfiable or unknown status in SMT-LIB. Compared to previous work, we consider the use of multiple abstractions and the development of a general, multi-level abstraction scheme, coupled with a strategy for its evaluation and analysis, to be the main contribution of this paper.

Related Work.

In the past, a variety of abstraction techniques have been proposed to speed up SMT solvers. De Moura and Rueß [24] presented an approach called “lemmas on demand”, in which formulas are converted to Boolean constraints, which are then iteratively refined by adding lemmas generated by the theory solver. Brummayer and Biere [5,6] applied this technique to the theory of bit-vectors and arrays, and combined it with under-approximations via bitwidth reduction. Lahiri and Mehra [20] developed an algorithm combining under- and over-approximations for the theory of quantifier-free Presburger arithmetic (QFP) based on interpolation. Bryant et al. [7] present an approach where formulae in bit-vector logic are encoded with fewer Boolean variables than their width, resulting in an under-approximation. If the under-approximation is unsatisfiable, they compute an unsatisfiable core to derive an over-approximation, which then, in turn, is used to refine another under-approximation. In the MathSAT solver [8], an over-approximating preprocessing step is employed, which treats bit-vector operations as uninterpreted functions. In his PhD thesis proposal, Jonas [18] gives a summary of different techniques used in current bit-vector SMT solvers. Finally, Brain et al. [3] developed a general framework for abstraction that generalizes the CDCL algorithm for SAT solving to more expressive theories. A similar idea was presented around the same time by de Moura and Jovanovic [25].

We use common notation for propositional logic and many-sorted first-order logic, as can be reviewed in [2,22]. In particular, we define a signature as $\Sigma = (\Sigma_S, \Sigma_P, \Sigma_F, \int_P, \int_F)$, where $\Sigma_S$ are the available sorts, $\Sigma_P$ are the predicate symbols, $\Sigma_F$ are the function symbols, and $\int_P : \Sigma_P \to (\Sigma_S)^*$ ($\int_F : \Sigma_F \to (\Sigma_S)^+$) defines the rank of a given predicate (function). We call the number of input parameters of a predicate (function) its arity. Furthermore, we denote a Σ-interpretation as follows:

Definition 1 (Σ-interpretation). For a signature Σ and a set X of variables with sorts in $\Sigma_S$, a Σ-interpretation over X is a tuple $I = (U, I_S, I_X, I_F, I_P)$ where:
– $U \neq \emptyset$ is the universe of all possible values;
– $I_S : \Sigma_S \to \mathcal{P}(U)$ maps each sort $\sigma_i$ to a pairwise disjoint domain $D_i := I_S(\sigma_i)$ of possible values for Σ-terms of this sort;
– $I_X : X \to U$ maps each variable $x \in X$ to a value $v \in U$;
– $I_F$ maps any function symbol $f \in \Sigma_F$ of rank $\int_F(f) = \sigma_1 \cdots \sigma_n \sigma_{n+1}$ to a function $f^I : D_1 \times \cdots \times D_n \to D_{n+1}$; and
– $I_P$ maps any predicate $p \in \Sigma_P$ of rank $\int_P(p) = \sigma_1 \cdots \sigma_n$ to a truth function $p^I : D_1 \times \cdots \times D_n \to \{0, 1\}$.
$I_X$ must respect the sort $\sigma_i$ of x (i.e., x of sort $\sigma_i$ may only be mapped to $v \in D_i$). A Σ-interpretation I is a Σ-model for some formula φ iff I satisfies φ (i.e., $I \models \varphi$).

Based on this, a Σ-theory is a tuple $T = (\Sigma, A)$, where Σ is a signature and A is a set-theoretical class of Σ-interpretations. Furthermore, we denote by $For_T$ the set of all formulae in first-order logic in theory T and by $Term^\Sigma_\sigma$ the set of all terms in first-order logic with signature Σ of sort σ.
Given some term $t \in Term^\Sigma_\sigma$, we define $\varphi[op(x) \mapsto t]$ as the formula in which the function application $op(x)$ is replaced by t in φ (note that x represents the vector of all input values for op).

In the SMT-LIB standard [1] for QF_BV, the functions examined in this work support overloading in the sense that a single function symbol like bvmul supports multiple ranks. To simplify the explanations in the following sections, one can think of $bvmul_r$ as the bvmul operation of rank r, thereby avoiding the issue of overloading. This way, every function symbol has exactly one rank. Finally, in QF_BV, we denote by x[i] the i-th bit of some bit-vector x, counting from zero, so that x[0] is the least significant bit. Further, we denote by x[i:j] with i > j the slice from the j-th to the i-th bit of said bit-vector.

We present an abstraction procedure for the quantifier-free bit-vector theory (QF_BV). Our approach substitutes applications of specific operators (in this work bvmul, bvsdiv, bvudiv, bvsrem and bvurem) by abstractions defined in the QF_UFBV theory (which adds uninterpreted functions to the theory of bit-vectors). During the solving process, the abstractions made within an instance are iteratively refined until the SAT/SMT solver either returns unsat, or sat with correct assignments. We present a formal definition of our scheme, starting with the approximation of a given function symbol:

Definition 2 (Approximation).

Given some theory T = ( Σ, A ) and somefunction symbol op ∈ Σ F with ∫ F ( op ) = σ · · · σ n σ and n ≥ , a T -approximationfor op consists of: P ( S ) is the powerset over S a new uninterpreted function symbol ap op with ∫ F ( op ) = ∫ F (cid:0) ap op (cid:1) ; and – a mapping A op : Term Σσ ×· · · × Term Σσ n → P ( For T ) .A T -approximation can therefore be written as a tuple (cid:0) ap op , A op (cid:1) . An approximation essentially replaces an occurrence of an existing functionsymbol op by a new one ( ap op ), and furthermore adds formulae that ensurecertain properties for the application of ap op . It may be sound or complete : Deﬁnition 3 (Sound T -approximation). Given some theory T = ( Σ, A ) , a T -approximation (cid:0) ap op , A op (cid:1) is sound iﬀ for all x ∈ Dom (cid:0) A op (cid:1) the followingproperty holds: For all T + U F -interpretations I with I (cid:15) A op ( x ) , it holds that I (cid:15) ap op ( x ) . = op ( x ) . Deﬁnition 4 (Complete T -approximation). Given some theory T = ( Σ, A ) ,a T -approximation (cid:0) ap op , A op (cid:1) is complete iﬀ for all x ∈ Dom (cid:0) A op (cid:1) the follow-ing property holds: For all T + U F -interpretations I with I (cid:15) ap op ( x ) . = op ( x ) ,it holds that I (cid:15) A op ( x ) . A sound T -approximation is an under-approximation, while a complete T -ap-proximation is an over-approximation of some function op . A set of approxima-tions can then be used to construct an abstraction scheme : Deﬁnition 5 (Abstraction Scheme).

Given some theory $T = (\Sigma, A)$ and some function symbol $op \in \Sigma_F$ of strictly positive arity, a T-abstraction scheme (for op) is a finite, totally ordered set of T-approximations $AS_{op} = \{(\mathit{ap}_{op}, A^1_{op}), \ldots, (\mathit{ap}_{op}, A^k_{op})\}$ where:
– for every $i \in [\![1, k]\!]$, $(\mathit{ap}_{op}, A^i_{op})$ is a complete T-approximation of op; and
– $(\mathit{ap}_{op}, C_{op})$ with $C_{op}(x) := \bigcup_{(\cdot, A) \in AS_{op}} A(x)$ is a sound T-approximation of op.

While any single approximation within the abstraction scheme is only complete (and therefore an over-approximation), all approximations taken together must be sound and thus yield a correct definition of the original function. Just like all previous T-approximations, $C_{op}$ is defined as $C_{op} : Term^\Sigma_{\sigma_1} \times \cdots \times Term^\Sigma_{\sigma_n} \to \mathcal{P}(For_T)$ for a function symbol op of rank $\sigma_1 \cdots \sigma_n \sigma$. Here, Dom denotes the domain of a given function; in this case, $Dom(A_{op}) = Term^\Sigma_{\sigma_1} \times \cdots \times Term^\Sigma_{\sigma_n}$. Theorems and proofs for the correctness of the abstraction scheme can be found in Appendix A.

This abstraction scheme can then be used to build a decision procedure like the one described in Algorithm 1. In a first step, the algorithm replaces all operators that should be refined by their abstracted uninterpreted functions. Afterwards, the instance is re-evaluated in a loop as long as the underlying SMT solver does not return unsat and the returned model is incorrect. In each round, further approximations from the abstraction schemes are added to the instance. This process is certain to converge: at the latest once all approximations of the scheme have been added, the formulation is both sound and complete.

Algorithm 1: Decision procedure for QF_BV abstractions.

ADD_CLAUSES and SAT are calls to the underlying SMT solver.

Require: φ ∈ For_QF_BV
functions ← ⟨⟩
for op ∈ Σ_F do
    if op has abstraction then                        ▷ For bvmul, division and remainder in our case
        for op(x) in φ do                             ▷ For each occurrence of op
            φ ← φ[op(x) ↦ ap_op(x)]
            functions.push((AS_op, op(x)))
ADD_CLAUSES(φ)
while true do
    r ← SAT()                                         ▷ Checks for satisfiability of current instance
    if r = unsat then
        return unsat                                  ▷ Correct result was found
    else
        consistent ← true
        for (AS, op(x)) in functions do
            if op(x) assignment is inconsistent then  ▷ Abstraction needs refinement
                consistent ← false
                ADD_CLAUSES(AS.pop())                 ▷ Adds next abstraction step to instance
        if consistent then
            return sat                                ▷ Correct result was found
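The control flow of Algorithm 1 can be illustrated with the following minimal Python sketch. It is not Ablector's implementation: a brute-force enumeration over 4-bit values (the hypothetical `brute_force_sat`) stands in for the SAT() call, the formula contains a single abstracted multiplication whose fresh term is the variable `r`, and the toy scheme has only two steps, the simple rules for factors 0 and 1 followed by exact multiplication. Approximation steps are represented as Python predicates over a candidate model rather than as clauses, and `pending.pop(0)` plays the role of AS.pop() by handing out the coarsest remaining step first.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

W = 4                      # toy bit-width, small enough to enumerate exhaustively
MASK = (1 << W) - 1
VARS = ("x", "y", "r")     # r stands for the fresh uninterpreted term ap_bvmul(x, y)

Model = Dict[str, int]
Constraint = Callable[[Model], bool]


def bvmul(x: int, y: int) -> int:
    """Concrete W-bit multiplication (modulo 2^W)."""
    return (x * y) & MASK


def brute_force_sat(constraints: List[Constraint]) -> Optional[Model]:
    """Stand-in for SAT(): return the first assignment satisfying all constraints."""
    for values in product(range(1 << W), repeat=len(VARS)):
        model = dict(zip(VARS, values))
        if all(c(model) for c in constraints):
            return model
    return None


def simple_cases(m: Model) -> bool:
    """Coarse but complete step: rules for factors 0 and 1."""
    x, y, r = m["x"], m["y"], m["r"]
    return ((x != 0 or r == 0) and (y != 0 or r == 0)
            and (x != 1 or r == y) and (y != 1 or r == x))


def full_mul(m: Model) -> bool:
    """Final step: makes ap_bvmul exact, so the scheme as a whole is sound."""
    return m["r"] == bvmul(m["x"], m["y"])


def solve(phi: List[Constraint], scheme: List[Constraint]) -> Tuple[str, int]:
    """CEGAR loop in the spirit of Algorithm 1; returns (result, refinement rounds)."""
    constraints = list(phi)      # ADD_CLAUSES(phi), with bvmul already replaced by r
    pending = list(scheme)       # approximation steps, coarsest first
    rounds = 0
    while True:
        model = brute_force_sat(constraints)
        if model is None:
            return "unsat", rounds
        if model["r"] == bvmul(model["x"], model["y"]):
            return "sat", rounds  # assignment of the abstracted term is consistent
        # Inconsistent assignment: add the next approximation step. `pending` is
        # never empty here, because after full_mul every model is consistent.
        constraints.append(pending.pop(0))
        rounds += 1


if __name__ == "__main__":
    scheme = [simple_cases, full_mul]
    # x = 1 and r != y: refuted by the simple cases after one refinement.
    print(solve([lambda m: m["x"] == 1, lambda m: m["r"] != m["y"]], scheme))  # ('unsat', 1)
    # x = 3 and r = 6: the simple cases already admit the genuine model y = 2.
    print(solve([lambda m: m["x"] == 3, lambda m: m["r"] == 6], scheme))  # ('sat', 1)
    # x = y = 3 and r = 10: only the exact multiplication refutes this.
    print(solve([lambda m: m["x"] == 3, lambda m: m["y"] == 3,
                 lambda m: m["r"] == 10], scheme))  # ('unsat', 2)
```

The second query shows the case the scheme is designed for: the model found under the coarse step happens to be genuine, so the expensive exact encoding is never added.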

We developed abstraction schemes for the computationally costly functions bvmul, bvsdiv, bvudiv, bvsrem and bvurem. We present the abstraction of bvmul in more detail, while giving only a short overview of the bvsrem abstraction. We omit a description of the other functions due to their similarity and space limitations.

bvmul

The abstraction scheme for bvmul is divided into four stages: the first stage describes the behavior of bvmul for various simple cases (like factors 0 and 1); the second stage defines intervals for the result value given the intervals of the multiplication factors; the third stage introduces relations between bvmul and other functions (specifically division and remainder); and the fourth stage finally adds full multiplication for certain intervals of the factors.

Throughout the abstraction process, we consider bvmul a signed operation, that is, we interpret r = bvmul(x, y) as if x, y and r were signed integers. Even though this seems like a restriction, our approach does not lose correctness for unsigned values by doing so. For example, if we assert an over-approximation (for an 8 bit multiplication) like $x <_s 0 \land y <_s 0 \Rightarrow r >_s 0$, this over-approximation also holds for unsigned values, as it can be interpreted as $x \geq_u 128 \land y \geq_u 128 \Rightarrow r <_u 128$. (For simplicity, we omit some overflow behavior in this example; we do, however, consider overflow cases in the approximations defined later on.) While this might be a surprising abstraction, it is nonetheless a correct one. Effectively, whether x, y and r are signed or unsigned is a matter of the user's interpretation and not of the decision procedure itself.

Overflow detection.

For many of the abstractions proposed in this chapter, it is essential to detect overflows of bvmul. To this end, we defined a predicate $\mathrm{noov} : \{0,1\}^w \times \{0,1\}^w \to \{0,1\}$ for bitwidth w, based on [19], which is true only if no overflow can happen when multiplying the two input variables. The approach works by counting the number of leading bits (ones or zeros). Note that this predicate also detects signed overflows and might not be sound: while every overflow will be detected, it might report more overflows than actually exist.

Simple cases.

For a multiplication instance bvmul(x, y) of factors x and y with bitwidth w, we define the following constraints:

$(x \doteq 0) \Rightarrow (\mathit{ap}_{bvmul}(x, y) \doteq 0)$   (1)
$(y \doteq 0) \Rightarrow (\mathit{ap}_{bvmul}(x, y) \doteq 0)$   (2)
$(x \doteq 1) \Rightarrow (\mathit{ap}_{bvmul}(x, y) \doteq y)$   (3)
$(y \doteq 1) \Rightarrow (\mathit{ap}_{bvmul}(x, y) \doteq x)$   (4)
$(x \doteq -1) \Rightarrow (\mathit{ap}_{bvmul}(x, y) \doteq -y)$   (5)
$(y \doteq -1) \Rightarrow (\mathit{ap}_{bvmul}(x, y) \doteq -x)$   (6)
$\mathrm{noov}(x, y) \Rightarrow \big( (\neg x[w-1] \land \neg y[w-1] \Rightarrow \neg \mathit{ap}_{bvmul}(x, y)[w-1])$   (7)
$\quad \land\ (x >_s 0 \land y[w-1] \Rightarrow \mathit{ap}_{bvmul}(x, y)[w-1])$   (8)
$\quad \land\ (x[w-1] \land y >_s 0 \Rightarrow \mathit{ap}_{bvmul}(x, y)[w-1])$   (9)
$\quad \land\ (x[w-1] \land y[w-1] \Rightarrow \neg \mathit{ap}_{bvmul}(x, y)[w-1]) \big)$   (10)

For example, equations (1) and (2) define the multiplication cases where one factor is zero; the other equations cover similar easy cases. Rules (1)–(4) have also been proposed in [7]; the other rules are, to the best of our knowledge, novel.

Additionally, we can make statements about the result's sign whenever we can be certain that no overflow is going to happen. For the cases where no overflow happens, the sign behavior of bit-vector multiplication corresponds to the usual sign behavior of multiplication and can be encoded as an approximation, as seen in (7)–(10). Finally, all cases where one of the two factors is a power of 2 can be covered by constraints like (11) for all $i \in [\![0, w-1]\!]$, and symmetrically for x and y:

$\bigwedge_{j \neq i} \neg x[j] \land x[i] \Rightarrow \big(\mathrm{umul}(x^{+2}, y^{+2}) \doteq \mathrm{shl}(y^{+2}, i)\big)$   (11)

Here, umul is the unsigned multiplication function, and $x^{+2}$ as well as $y^{+2}$ are positive, double bitwidth versions of x and y, as detailed in the following section. Instruction shl is the left shift function. The mapping $A_{op}$ of this abstraction stage is then the conjunction of (1)–(11).
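To illustrate the completeness argument for this stage, the Python sketch below enumerates all pairs of 6-bit factors and checks that the exact product satisfies every rule (1)–(11). Two parts are assumptions rather than the paper's definitions: the rules are evaluated on concrete values instead of being emitted as formulae, and `noov` is a hypothetical conservative check in the spirit of [19] that counts leading sign bits and only reports "no overflow" when their sum is at least w + 2, which guarantees that the signed product fits into w bits. All helper names are illustrative.

```python
W = 6                                  # toy bit-width for exhaustive checking
MASK = (1 << W) - 1


def bit(v: int, i: int) -> int:
    return (v >> i) & 1


def to_signed(v: int) -> int:
    """Two's-complement value of a W-bit vector."""
    return v - (1 << W) if bit(v, W - 1) else v


def leading_sign_bits(v: int) -> int:
    """Count the leading bits equal to the sign bit (ones or zeros)."""
    sign = bit(v, W - 1)
    count = 0
    for i in range(W - 1, -1, -1):
        if bit(v, i) != sign:
            break
        count += 1
    return count


def noov(x: int, y: int) -> bool:
    """Conservative check: True guarantees that x * y fits into W signed bits."""
    return leading_sign_bits(x) + leading_sign_bits(y) >= W + 2


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def simple_cases(x: int, y: int, r: int) -> bool:
    """Rules (1)-(11) as a predicate on concrete values of x, y and r = ap_bvmul(x, y)."""
    ok = (implies(x == 0, r == 0) and implies(y == 0, r == 0)      # (1), (2)
          and implies(x == 1, r == y) and implies(y == 1, r == x)  # (3), (4)
          and implies(x == MASK, r == (-y) & MASK)                 # (5): x = -1
          and implies(y == MASK, r == (-x) & MASK))                # (6): y = -1
    if noov(x, y):                                                 # (7)-(10)
        xs, ys, rs = bit(x, W - 1), bit(y, W - 1), bit(r, W - 1)
        ok = ok and implies(not xs and not ys, not rs)
        ok = ok and implies(to_signed(x) > 0 and ys == 1, rs == 1)
        ok = ok and implies(xs == 1 and to_signed(y) > 0, rs == 1)
        ok = ok and implies(xs == 1 and ys == 1, not rs)
    x2, y2 = abs(to_signed(x)), abs(to_signed(y))                  # x^{+2}, y^{+2}
    for i in range(W):                                             # (11), both factors
        ok = ok and implies(x == 1 << i, x2 * y2 == y2 << i)
        ok = ok and implies(y == 1 << i, x2 * y2 == x2 << i)
    return ok


def is_complete() -> bool:
    """Definition 4 for this step: the exact product never violates a rule."""
    return all(simple_cases(x, y, (x * y) & MASK)
               for x in range(1 << W) for y in range(1 << W))


if __name__ == "__main__":
    print(is_complete())            # True
    print(simple_cases(1, 5, 6))    # False: rule (3) demands r = y
```

The threshold w + 2 matters: lowering it to w + 1 would be unsafe, since for w = 6 the factors −8 and −4 have 3 + 4 = 7 leading sign bits, yet their product 32 overflows the signed range.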
These formulae are not static rewrite rules but constraints provided to the underlying solver.

Completeness.

Completeness follows directly from Definition 4, as it can be checked that all formulae presented above (which deliberately omit any statements about overflow cases) are implications of $\mathit{ap}_{bvmul}(x, y) \doteq bvmul(x, y)$.

Highest bit set based intervals.

Using the factors' highest bit set, intervals of the factors can be defined, which in turn can be used to assert intervals for the multiplication's result. In a first step, the signed multiplication $r := \mathit{ap}_{bvmul}(x, y)$ is transformed into its unsigned version with doubled bitwidth:

$x^{+2} \doteq \mathrm{ite}(x[w-1], -\mathrm{sext}(x, w), \mathrm{sext}(x, w))$,
$y^{+2} \doteq \mathrm{ite}(y[w-1], -\mathrm{sext}(y, w), \mathrm{sext}(y, w))$,
$r' \doteq \mathrm{ite}(x[w-1] \oplus y[w-1], -\mathrm{umul}(x^{+2}, y^{+2}), \mathrm{umul}(x^{+2}, y^{+2}))$.

Instruction sext is the sign extension function. By asserting equality of the multiplication result r and $r'[w-1:0]$, it is then possible to reason about the result $r^{+2} := \mathrm{umul}(x^{+2}, y^{+2})$ through bit shifting: if i is the highest bit set of $x^{+2}$, then $2^i \leq x^{+2} < 2^{i+1}$ and therefore $2^i \cdot y^{+2} \leq r^{+2} < 2^{i+1} \cdot y^{+2}$ (whenever $y^{+2} \neq 0$). We thus define a predicate hbs(x, i) which is true iff the highest bit set of x is i.

The previously presented intuition gives rise to the following abstraction, which must distinguish overflow from no-overflow cases. For this, we initially use a double bitwidth (i.e., $2 \cdot w$ width) unsigned multiplication function. We then define double bitwidth lower (L) and upper (U) bounds for the result of the multiplication based on the highest bit set, as explained above:

$L(a, b, n) := \begin{cases} \mathrm{ite}(\mathrm{hbs}(a, 0), b, 0) & n = 0 \\ \mathrm{ite}(\mathrm{hbs}(a, n), \mathrm{shl}(b, n), L(a, b, n-1)) & \text{else} \end{cases}$

$U(a, b, n) := \begin{cases} \mathrm{shl}(b, 1) & n = 0 \\ \mathrm{ite}(\mathrm{hbs}(a, n), \mathrm{shl}(b, n+1), U(a, b, n-1)) & \text{else} \end{cases}$

The number of bits that must be compared depends on the result of noov: if an overflow is possible, we must compare the version with $2 \cdot w$ bits; otherwise, the w-bit version can be used for comparison.

Note that while L and U appear to be recursive functions, they can be unrolled into consecutive ite statements when adding the bounds to the instance at hand.
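As a minimal sketch (Python, 4-bit factors, concrete values instead of ite-terms), the bounds L and U and the signed-to-unsigned transformation can be checked exhaustively. The helper names are hypothetical, the base cases follow the definitions above, and the Python recursion corresponds to the nested ite chain obtained by unrolling. The check uses an inclusive upper bound, which also covers $y^{+2} = 0$. Only the 2w-bit comparison is modeled; the w-bit variant used when noov holds is omitted.

```python
W = 4
MASK = (1 << W) - 1
MASK2 = (1 << (2 * W)) - 1  # bound computations live in 2W bits


def to_signed(v: int) -> int:
    return v - (1 << W) if (v >> (W - 1)) & 1 else v


def plus2(v: int) -> int:
    """x^{+2} := ite(x[w-1], -sext(x, w), sext(x, w)), i.e. |x| in 2W bits."""
    return abs(to_signed(v)) & MASK2


def hbs(a: int, n: int) -> bool:
    """True iff the highest set bit of a is bit n."""
    return (a >> n) == 1


def shl(b: int, n: int) -> int:
    return (b << n) & MASK2


def lower(a: int, b: int, n: int) -> int:
    if n == 0:
        return b if hbs(a, 0) else 0
    return shl(b, n) if hbs(a, n) else lower(a, b, n - 1)


def upper(a: int, b: int, n: int) -> int:
    if n == 0:
        return shl(b, 1)
    return shl(b, n + 1) if hbs(a, n) else upper(a, b, n - 1)


def check(x: int, y: int) -> bool:
    x2, y2 = plus2(x), plus2(y)
    r2 = (x2 * y2) & MASK2                              # umul(x^{+2}, y^{+2})
    sign = ((x >> (W - 1)) ^ (y >> (W - 1))) & 1        # x[w-1] xor y[w-1]
    r_prime = (-r2) & MASK2 if sign else r2             # r'
    in_bounds = lower(x2, y2, W - 1) <= r2 <= upper(x2, y2, W - 1)
    matches = (r_prime & MASK) == (x * y) & MASK        # r = r'[w-1:0]
    return in_bounds and matches


if __name__ == "__main__":
    print(all(check(x, y) for x in range(1 << W) for y in range(1 << W)))  # True
    print(lower(5, 3, W - 1), upper(5, 3, W - 1))  # 12 24, since hbs(5) = 2
```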
The mapping $A_{op}$ of this approximation step is then the assertion that $r^{+2}$ must lie within the bounds given by $L(x^{+2}, y^{+2}, w-1)$ and $U(x^{+2}, y^{+2}, w-1)$ (for the necessary bitwidth, as explained above). Here, the highest bit set of an n-bit vector x is i iff x is of the form $0^{n-i-1} \cdot 1 \cdot \{0,1\}^i$.

Completeness.

Through the distinction between overflow and no-overflow cases, the various equations can be regarded as normal multiplication disregarding overflows. Therefore, it can be checked that these constraints are direct implications of $\mathit{ap}_{bvmul}(x, y) \doteq bvmul(x, y)$. This approximation is consequently complete according to Definition 4.

Relations to other functions.

Beyond the previous abstraction approaches, we can also look at relations between functions, possibly providing the solver with more high-level information. This can be useful in cases where relations between multiple function applications already lead to a contradiction. For the multiplication instruction $\mathit{ap}_{bvmul}(\bar{x}, \bar{y})$, with $\bar{x}$ and $\bar{y}$ the double bitwidth ($2 \cdot w$) versions of x and y, we propose the following abstractions:

$\mathit{ap}_{bvmul}(\bar{x}, \bar{y}) \doteq \mathit{ap}_{bvmul}(\bar{y}, \bar{x})$,
$\bar{x} \doteq 0 \lor \bar{y} \doteq \mathit{ap}_{bvsdiv}(\mathit{ap}_{bvmul}(\bar{x}, \bar{y}), \bar{x})$,
$\bar{y} \doteq 0 \lor \bar{x} \doteq \mathit{ap}_{bvsdiv}(\mathit{ap}_{bvmul}(\bar{x}, \bar{y}), \bar{y})$.

For every bit width $w' < 2 \cdot w$ which appears in a given problem instance and its abstractions, we can further assert that

$\mathit{ap}_{bvmul}(\bar{x}, \bar{y})[w'-1:0] \doteq \mathit{ap}_{bvmul}(\bar{x}[w'-1:0], \bar{y}[w'-1:0])$,
$\mathit{ap}_{bvmul}(\bar{x}, \bar{y})[w'-1:0] \doteq \mathit{ap}_{bvmul}(\bar{y}[w'-1:0], \bar{x}[w'-1:0])$,

and, for $x' := \mathrm{sext}(x[\lfloor w'/2 \rfloor : 0], w' - \lfloor w'/2 \rfloor)$ and $y' := \mathrm{sext}(y[\lfloor w'/2 \rfloor : 0], w' - \lfloor w'/2 \rfloor)$, we assert that

$\mathit{ap}_{bvmul}(x', y') \doteq \mathit{ap}_{bvmul}(y', x')$,
$y' \doteq 0 \lor x' \doteq \mathit{ap}_{bvsdiv}(\mathit{ap}_{bvmul}(x', y'), y')$,
$x' \doteq 0 \lor y' \doteq \mathit{ap}_{bvsdiv}(\mathit{ap}_{bvmul}(x', y'), x')$.

Essentially, all these relations between various multiplication and division applications are based on the semantics of multiplication, division and remainder as used in SMT-LIB and C++ [17]. The only challenge of this abstraction is to formulate the constraints so that they are complete for overflow cases. For this, we use doubled bitwidth $2 \cdot w$ when encoding the constraints and define x' and y' for every w' in a way that prevents overflows during multiplication.

Completeness.
The completeness of this abstraction is a direct consequence of all assertions being well-known properties of machine multiplication and division. As we avoid all overflow cases through the use of doubled bitwidth, the properties hold for any input combination.
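The following Python sketch exhaustively checks the relations above for 4-bit x and y, together with the remainder relation $\bar{x} \doteq \mathit{ap}_{bvmul}(\mathit{ap}_{bvsdiv}(\bar{x}, \bar{y}), \bar{y}) + \mathit{ap}_{bvsrem}(\bar{x}, \bar{y})$, which relies on the same semantics. It assumes that the double bitwidth versions are sign extensions and re-implements the SMT-LIB definitions of bvudiv, bvurem, bvsdiv and bvsrem (division by zero yields all ones, remainder by zero yields the dividend). The helper names are hypothetical, and the constraints on x' and y' are not covered.

```python
W = 4
W2 = 2 * W


def mask(w: int) -> int:
    return (1 << w) - 1


def msb(v: int, w: int) -> int:
    return (v >> (w - 1)) & 1


def neg(v: int, w: int) -> int:
    return (-v) & mask(w)


def mul(s: int, t: int, w: int) -> int:
    return (s * t) & mask(w)


def udiv(s: int, t: int, w: int) -> int:
    return mask(w) if t == 0 else s // t        # SMT-LIB: x / 0 is all ones


def urem(s: int, t: int, w: int) -> int:
    return s if t == 0 else s % t               # SMT-LIB: x % 0 is x


def sdiv(s: int, t: int, w: int) -> int:
    """bvsdiv via bvudiv on magnitudes (truncates towards zero)."""
    ms, mt = msb(s, w), msb(t, w)
    if not ms and not mt:
        return udiv(s, t, w)
    if ms and not mt:
        return neg(udiv(neg(s, w), t, w), w)
    if not ms and mt:
        return neg(udiv(s, neg(t, w), w), w)
    return udiv(neg(s, w), neg(t, w), w)


def srem(s: int, t: int, w: int) -> int:
    """bvsrem via bvurem on magnitudes (sign follows the dividend)."""
    ms, mt = msb(s, w), msb(t, w)
    if not ms and not mt:
        return urem(s, t, w)
    if ms and not mt:
        return neg(urem(neg(s, w), t, w), w)
    if not ms and mt:
        return urem(s, neg(t, w), w)
    return neg(urem(neg(s, w), neg(t, w), w), w)


def sext(v: int) -> int:
    """Sign-extend a W-bit vector to 2W bits."""
    return (v | (mask(W) << W)) if msb(v, W) else v


def relations_hold(x: int, y: int) -> bool:
    xb, yb = sext(x), sext(y)
    p = mul(xb, yb, W2)
    ok = p == mul(yb, xb, W2)                                # commutativity
    ok = ok and (xb == 0 or yb == sdiv(p, xb, W2))           # division undoes multiplication
    ok = ok and (yb == 0 or xb == sdiv(p, yb, W2))
    for wp in range(1, W2):                                  # slices of width w' < 2w
        ok = ok and (p & mask(wp)) == mul(xb & mask(wp), yb & mask(wp), wp)
    q, rem = sdiv(xb, yb, W2), srem(xb, yb, W2)
    ok = ok and xb == (mul(q, yb, W2) + rem) & mask(W2)     # remainder relation
    ok = ok and srem(x, y, W) == srem(xb, yb, W2) & mask(W)
    return ok


if __name__ == "__main__":
    print(all(relations_hold(x, y) for x in range(1 << W) for y in range(1 << W)))  # True
    # At the original 4 bits, 4 * 4 wraps to 0, so division cannot recover 4.
    print(sdiv(mul(4, 4, W), 4, W))  # 0
```

The last line shows why the doubled bitwidth matters: without it, the division-based relation would not hold for overflowing products and therefore would not be complete.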

Full multiplication.

In a last step, full multiplication on a per-interval basis is added as a constraint. We assume an SMT instance containing some multiplication bvmul(x, y). If the instance is still satisfiable after the previous steps, a counterexample is returned. We then look up the highest bit set i of x's assignment and assert that the multiplication is precise whenever i is the highest bit set of x:

$\mathrm{hbs}(x, i) \Rightarrow \mathit{ap}_{bvmul}(x, y) \doteq bvmul(x, y)$.

Completeness and Soundness.

For a multiplication of bitwidth w, the approximation is complete, and it even becomes sound once this assertion has been made for all $i \in [\![0, w-1]\!]$. Note that the maximum number of necessary refinement steps is therefore bounded by the function's bitwidth w.

bvsrem

Due to its rareness in benchmarks, only a single abstraction layer has been added for this operator; its rareness also makes it harder to evaluate the performance of abstractions, and abstractions are less likely to have a big impact on the overall performance. Once again using double bitwidth as explained in Section 4.1, we assert the relations between bvsrem and other functions, where $\bar{x}$ and $\bar{y}$ are the double bitwidth versions of x and y:

$\mathit{ap}_{bvsrem}(x, y) \doteq \mathit{ap}_{bvsrem}(\bar{x}, \bar{y})[w-1:0]$,
$\bar{x} \doteq \mathit{ap}_{bvmul}(\mathit{ap}_{bvsdiv}(\bar{x}, \bar{y}), \bar{y}) + \mathit{ap}_{bvsrem}(\bar{x}, \bar{y})$.

In the following refinement step, we add the full remainder constraint:

$\mathit{ap}_{bvsrem}(x, y) \doteq bvsrem(x, y)$.

In order to evaluate the performance of the abstraction scheme presented above, we implemented a prototype solver by extending Boolector [26] (version 3.2). To increase readability, we refer to the prototype as Ablector, which stands for Abstracted Boolector.

Experimental setup.

As the focus of this work lay on researching which abstractions are effective in solving more hard instances, and not so much on building an improved solver, the abstraction refinement procedure is built as a layer on top of Boolector. We compare the default configuration of Boolector with our prototype, which is based on said default configuration, in order to quantify the … of the solving process. We do this by comparing Boolector's SMT-LIB check-sat call against the summed-up CPU clock time of all invocations of rewritten procedures in Ablector, including its own check-sat call. In particular, our time measurement for Ablector also contains all abstraction refinement procedures. This produces a more realistic comparison of the abstraction's performance. Note that we even over-approximate the time Ablector takes in this comparison, as we add the operator construction time during parsing, which is not considered for Boolector. These procedure calls, however, are in most cases negligible in comparison to the time for the check-sat call. We chose the CPU clock time as it can be measured through the same unified interface in both Python and C with comparable results. As the tasks are heavily CPU-bound with very few I/O operations, this measurement can still be considered realistic. In accordance with the rules of the SMT competition 2018 [15], the timeout was set to … s in all experiments. Further details for reproducibility and on the experimental setup can be found in Appendix B.

Benchmark selection.

Our work was sparked by an investigation of hard benchmarks in the LLBMC family of benchmarks presented at the SAT Competition 2017 [16]. To ensure that the abstractions do not overfit, we decided to evaluate on the larger set of QF_BV benchmarks in the 2019-05-20 SMT-LIB benchmark release [1], which were used for the 2019 SMT-Competition [14]. This benchmark set contains 14382 satisfiable, 27144 unsatisfiable and 170 instances of unknown status. We removed all benchmarks not using the abstracted operators bvmul, bvsdiv, bvudiv, bvsrem or bvurem, as we were mainly interested in the effect of the implemented abstractions. This resulted in a subset containing 6024 satisfiable, 15849 unsatisfiable and 92 instances of unknown status. Of those instances, Boolector was unable to solve 758 satisfiable, 152 unsatisfiable and 52 unknown instances. Boolector solves 9769 of the unsatisfiable instances through its rewriting engine, thus leaving 6080 instances to be solved by transformation to SAT clauses. Preliminary experiments showed that our abstraction scheme is mainly helpful when solving unsatisfiable instances, while worsening the results for satisfiable instances.

Given the four incremental approximation steps presented, it is natural to ask how much each approximation step within the abstraction scheme helps in solving the instance at hand. After some preliminary experiments on the ordering of the approximation steps, we settled on the following sequence of steps:
1. Simple cases (see 4.1, "Simple cases")
2. Interval based abstraction (see 4.1, "Highest bit set based intervals")
3. Function relations (see 4.1, "Relations to other functions")
4. Iterative, interval based multiplication (see 4.1, "Full multiplication")

With the abstraction scheme set up in this sequence, we ran experiments on the unsatisfiable and unknown instances (all instances with unknown status in the benchmark set whose status we managed to determine turned out to be unsatisfiable, too). In five subsequent experiments, we ran Boolector, Ablector with only the first step, Ablector with the first and second steps, Ablector with the first through third steps, and full Ablector on all benchmarks. Interestingly, there is no steady progression that decreases the number of unsolved instances with every step. Instead, every step makes some instances solvable and others unsolvable. We therefore defined both the contribution and the cost of an approximation step:
– We call the contribution of an approximation at position N in an abstraction scheme the number of benchmark instances that are not solved by approximation steps 1 to N−1 but are solved using approximations 1 to N. Thus, the contribution identifies exactly those benchmark instances which are solved through the N-th approximation step within the scheme.
– In contrast, we define the cost of an approximation at position N in an abstraction scheme as the number of benchmark instances that are solved by approximation steps 1 to N−1 but are not solved using approximations 1 to N, thus identifying exactly those benchmark instances which are not solved because the N-th approximation step within the scheme was put in place.

[Table 1: Contribution and cost of each approximation step.]

[Table 3: Instances solved more/less by Ablector for specific benchmark families in comparison to Boolector (rw-Noetzli, bmc-bv-svcomp14, brummayerbiere2, calypto, Sage2, UltimateAutomizer, BuchwaldFried, float, log-slicing); in total, 39 more and 28 less.]
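The definitions of contribution and cost can be computed directly from the sets of instances solved by each configuration. The Python sketch below assumes that index 0 holds the configuration with no approximation step (plain Boolector, as in the five experiments described above) and uses hypothetical instance names; it does not reproduce the paper's data.

```python
from typing import List, Set, Tuple


def contribution_and_cost(solved: List[Set[str]]) -> List[Tuple[int, int]]:
    """solved[0] is the baseline, solved[N] the instances solved with steps 1..N.

    Returns one (contribution, cost) pair per approximation step N >= 1.
    """
    pairs = []
    for n in range(1, len(solved)):
        before, after = solved[n - 1], solved[n]
        contribution = len(after - before)   # newly solved through step N
        cost = len(before - after)           # lost because step N was added
        pairs.append((contribution, cost))
    return pairs


if __name__ == "__main__":
    runs = [
        {"a", "b"},            # baseline (hypothetical instance names)
        {"a", "b", "c"},       # steps 1
        {"a", "c"},            # steps 1-2
        {"a", "c", "d"},       # steps 1-3
    ]
    print(contribution_and_cost(runs))  # [(1, 0), (0, 1), (1, 0)]
```

By construction, the number of instances solved after step N equals the number solved before it plus the contribution minus the cost of step N, which is why a step with equal contribution and cost leaves the total unchanged.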

Analysing the approximation steps.

Using the data from the experiments, we calculated the costs and contributions of each approximation step in the scheme (Table 1). The quantitative analysis shows that steps 1 and 3 are the most effective approximation steps of the scheme, while step 2 seems to cost more than it contributes. For step 4, however, we noticed that a qualitative analysis can be just as interesting: with step 4 in place, the scheme can no longer solve the benchmark instance log-slicing/bvsdiv_18.smt2 (which checks that bvsdiv is correctly implemented for bitwidth 18), but it is able to solve calypto/problem_16.smt2 (a sequential equivalence checking problem). Depending on the use case, one might therefore argue that this approximation should stay in place even though it does not change the total number of solved cases. In a parallelized portfolio approach, it might even be interesting for an abstraction scheme to achieve high contributions at a high cost, as other solvers can take care of the problems left unsolved by the abstraction scheme in parallel.

Based on the results obtained above, we ran an experiment with Ablector omitting approximation step 2, thus only using …

(For the log-slicing bvsdiv instances, Ablector solved bitwidths 15 to 17, Boolector bitwidths 15 to 18.)

The approximation steps in our scheme were originally developed when attempting to improve the solving behavior for instances of the LLBMC family of benchmarks [16]. Considering the modmul benchmark instance in said family, we developed an abstraction scheme that differs for signed and unsigned operators. This is noteworthy, as signed operators are usually just rewritten as unsigned operators during preprocessing. We experimentally compared our signed abstraction scheme with a version which only considered unsigned operators and rewrote bvsdiv and bvsrem to their unsigned versions using if-then-else statements. Surprisingly, the results showed no significant difference in runtime or solving behavior on the SMT-LIB benchmark set under evaluation. This is interesting for two reasons. On the one hand, the modmul benchmark instance showed that special treatment of signed operators can help for certain benchmark instances, while our experiment on the SMT-LIB benchmarks shows that this special treatment does not worsen the overall performance. On the other hand, the result suggests that the LLBMC family of benchmarks would be an interesting addition to the SMT-LIB benchmark set, as it contains instances with novel properties, namely that they can be solved efficiently using special abstraction schemes for signed operators.

(Benchmark family names in Table 3 are shortened for conciseness.)

We will now look at the overall performance of our abstraction scheme. Figure 1 gives a first overview of the solving times of Ablector in comparison to Boolector. While Ablector slows down the solving process for a number of instances with short runtime, we see quite a few instances in green at the right side of the scatter plot which can only be solved by Ablector, sometimes even within seconds.

Table 4: Number of satisfiable instances solved by Boolector and Ablector.

                      Boolector unsolved   Boolector solved   Total
  Ablector unsolved          711                 474           1185
  Ablector solved             47                4792           4839
  Total                      758                5266           6024
Table 2 presents a more concise summary of the instances only solved by Ablector or Boolector (omitting instances solved through preprocessing). Ablector shows a slightly better performance for the total number of solved instances (11 instances more) and solves 39 instances on which Boolector fails. A large share of the instances solved by both procedures can be considered easy: only 182 of the 5904 instances solved by both took longer than 100 s for one of the solvers. In comparison, an addition of 39 uniquely solved instances is considerable progress, especially for portfolio approaches and for cases where many unsatisfiable instances that are believed to be hard need to be solved. Table 3 presents evidence that our abstraction scheme contributes to solving some instances of sequential equivalence checking (in calypto [29]) as well as a number of instances in the Sage2 benchmark family [12] concerned with constraint resolution for whitebox fuzz testing. Furthermore, our approach helps with the verification of rewrite rules in the context of [27] (rw-Noetzli). Apart from the above-mentioned instances with published unsatisfiability status, Ablector was even able to solve seven instances of the rewrite rule verification family with status unknown within the SMT-LIB benchmark set, which Boolector failed to solve.

Boolector was able to solve exactly 40 of the 92 unknown instances through its rewriting engine; Ablector solved another 7 through its abstraction scheme.

Satisfiable Instances

For satisfiable instances, on the other hand, Ablector's performance is visibly worse than Boolector's: as Table 4 shows, Boolector solves 474 instances that Ablector currently cannot solve, and the runtime Ablector needs for the instances it does solve cannot make up for this flaw; neither can the 47 instances only solved by Ablector. While about 500 timed-out instances get stuck in the first refinement round, the remaining timed-out instances are evenly distributed across all refinement rounds. Bounding the running time of each refinement round by an upper limit could potentially keep instances from getting stuck in a certain step. At the same time, such a timeout must be fine-tuned so that it does not cancel out the positive effects of our abstraction scheme for unsatisfiable instances. We expect that the abstraction scheme's performance could be improved in future work by integrating the abstractions directly into a solver like Boolector instead of building them as a layer on top. This would allow making better use of already implemented under-approximation techniques, which are completely ignored for most abstraction steps in the current scheme.
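One possible reading of the per-round time bound suggested above is sketched below in Python. It assumes, hypothetically, that each approximation step is a callable that takes a time limit and reports unsat, a confirmed sat model, a spurious model, or a timeout, and that the last step is exact. These outcome names and the step interface are illustrative assumptions, not Ablector's interface. As one option for choosing the limits, the sketch includes the Luby sequence [21], which the concluding remarks mention as a possible basis for interruption conditions.

```python
from typing import Callable, Optional, Sequence

# Hypothetical outcomes of a single refinement round (not Ablector's API).
UNSAT = "unsat"        # abstraction unsatisfiable, hence the original formula too
SAT = "sat"            # model found and confirmed on the original formula
SPURIOUS = "spurious"  # model violates the original formula: refine further

# A step receives a time limit in seconds (None = unbounded) and returns one
# of the outcomes above, or None if it hit the limit without a result.
Step = Callable[[Optional[float]], Optional[str]]


def luby(i: int) -> int:
    """i-th element (1-based) of the Luby sequence 1, 1, 2, 1, 1, 2, 4, ..."""
    if i < 1:
        raise ValueError("index must be at least 1")
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    # Now 2^(k-1) - 1 < i <= 2^k - 1.
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)


def solve_with_round_limits(steps: Sequence[Step], limits: Sequence[float]) -> str:
    """Run the approximation steps in order of increasing precision.

    Every step except the last one gets its own time limit; the last step is
    assumed to be exact (no abstraction left) and runs without a limit.
    """
    if len(limits) < len(steps) - 1:
        raise ValueError("need one time limit per non-final step")
    for i, step in enumerate(steps):
        is_final = i == len(steps) - 1
        outcome = step(None if is_final else limits[i])
        if outcome in (UNSAT, SAT):
            return outcome
        # Timeout (None) or spurious model: continue with the next step.
    raise RuntimeError("the final exact step must return SAT or UNSAT")
```

For example, limits = [10.0 * luby(i) for i in range(1, len(steps))] assigns the non-final rounds budgets of 10, 10, 20, 10, ... seconds, where the base of 10 seconds is arbitrary. Moving on after a timeout keeps the procedure correct, since only the final exact step is needed to decide the formula; how much the limits cost or gain on unsatisfiable instances is exactly the tuning question raised in the text.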

We introduced an approach for solving quantifier-free bit-vector problems in SMT-LIB's QF_BV theory. The approach is based on abstraction methodologies previously used for various other problems in logic and specifically in SMT. We presented numerous approximation steps for five comparatively costly functions of the bit-vector theory. Additionally, we gave a theoretical definition of such abstraction schemes and presented a methodology allowing the experimental analysis of single approximation steps within a given abstraction scheme.

We saw that the presented approach performs better than Boolector in deciding unsatisfiable bit-vector problems, solving 11 more unsatisfiable instances and 7 more instances with unknown status, and yielding a total of 46 uniquely solved instances with unsatisfiable or unknown status in comparison to Boolector. However, the implemented prototype is not yet competitive for satisfiable instances. This is in some way a natural result, as over-approximations usually improve the solver runtime on unsatisfiable (and not on satisfiable) instances [4]. Also, the difference in solved instances for both unsatisfiable and satisfiable problems makes Ablector a promising addition to a portfolio solver.

As already seen in UCLID [7], interleaving over- and under-approximations has the potential to yield a solver which solves satisfiable and unsatisfiable benchmark instances equally well; this could also be an option for the abstraction scheme presented here. However, well-tuned time limits or intelligent interruption conditions, possibly based on Luby sequences [21], will be necessary for all approximation steps in order to avoid lock-ins where the solver keeps working in a single phase without coming to any result, while still granting the steps enough time to come to conclusions where possible. Alternatively, a portfolio approach making use of various over- and under-approximation techniques could be explored.

References

https://satassociation.org/jsat/index.php/jsat/article/view/74

7. Bryant, R.E., Kroening, D., Ouaknine, J., Seshia, S.A., Strichman, O., Brady, B.A.: Deciding bit-vector arithmetic with abstraction. In: Tools and Algorithms for the Construction and Analysis of Systems, 13th International Conference, TACAS 2007, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2007, Braga, Portugal, March 24 - April 1, 2007, Proceedings. pp. 358–372 (2007)
8. Cimatti, A., Griggio, A., Schaafsma, B.J., Sebastiani, R.: The MathSAT5 SMT solver. In: Tools and Algorithms for the Construction and Analysis of Systems - 19th International Conference, TACAS 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings. pp. 93–107 (2013)
9. Clarke, E.M., Grumberg, O., Jha, S., Lu, Y., Veith, H.: Counterexample-guided abstraction refinement. In: Computer Aided Verification, 12th International Conference, CAV 2000, Chicago, IL, USA, July 15-19, 2000, Proceedings. pp. 154–169 (2000)
10. Cordeiro, L.C., Fischer, B., Marques-Silva, J.: SMT-based bounded model checking for embedded ANSI-C software. IEEE Trans. Software Eng. 38(4), 957–974 (2012)
11. Gario, M., Micheli, A.: PySMT: a solver-agnostic library for fast prototyping of SMT-based algorithms. In: SMT Workshop 2015 (2015)
12. Godefroid, P., Levin, M.Y., Molnar, D.A., et al.: Automated whitebox fuzz testing. In: NDSS. vol. 8, pp. 151–166 (2008)
13. Gurfinkel, A., Belov, A., Marques-Silva, J.: Synthesizing safe bit-precise invariants. In: Tools and Algorithms for the Construction and Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings. pp. 93–108 (2014)
14. Hadarean, L., Hyvarinen, A., Niemetz, A., Reger, G.: SMT COMP 2019. Website, http://smtcomp.sourceforge.net/2019/
15. Heizmann, M., Niemetz, A., Reger, G., Weber, T.: SMT COMP 2018. Website, http://smtcomp.sourceforge.net/2018/
16. Iser, M., Kutzner, F., Sinz, C.: The LLBMC family of benchmarks. In: Proceedings of SAT Competition 2017: Solver and Benchmark Descriptions. pp. 41–42 (2017)
17. ISO Central Secretary: ISO 14882:2011(E) C++. Standard, International Organization for Standardization, Geneva, CH (Sep 2011)
18. Jonas, M.: SMT Solving for the theory of bit-vectors. Ph.D. thesis, Faculty of Informatics, Masaryk University, Brno, Czech Republic (2016)
19. Warren Jr., H.S.: Hacker's Delight, Second Edition. Pearson Education (2013)
20. Lahiri, S., Mehra, K.K.: Interpolant based decision procedure for quantifier-free Presburger arithmetic. Tech. Rep. MSR-TR-2005-121, Microsoft Research (September 2007), proc. National Academy of Sciences
21. Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms. Information Processing Letters 47(4), 173–180 (1993)
22. Marques-Silva, J.P., Malik, S.: Propositional SAT Solving, pp. 247–275. Springer International Publishing, Cham (2018)
23. Merz, F., Falke, S., Sinz, C.: LLBMC: bounded model checking of C and C++ programs using a compiler IR. In: Verified Software: Theories, Tools, Experiments - 4th International Conference, VSTTE 2012, Philadelphia, PA, USA, January 28-29, 2012. Proceedings. pp. 146–161 (2012)
24. de Moura, L., Rueß, H.: Lemmas on demand for satisfiability solvers. In: Proceedings of the Fifth International Symposium on the Theory and Applications of Satisfiability Testing (SAT). pp. 244–251 (2002)
25. de Moura, L.M., Jovanovic, D.: A model-constructing satisfiability calculus. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds.) Verification, Model Checking, and Abstract Interpretation, 14th International Conference, VMCAI 2013, Rome, Italy, January 20-22, 2013. Proceedings. pp. 1–12 (2013)
26. Niemetz, A., Preiner, M., Biere, A.: Boolector 2.0. JSAT 9, 53–58 (2014), https://satassociation.org/jsat/index.php/jsat/article/view/120
27. Nötzli, A., Reynolds, A., Barbosa, H., Niemetz, A., Preiner, M., Barrett, C., Tinelli, C.: Syntax-guided rewrite rule enumeration for SMT solvers. In: International Conference on Theory and Applications of Satisfiability Testing. pp. 279–297. Springer (2019)
28. Peleska, J., Vorobev, E., Lapschies, F.: Automated test case generation with SMT-solving and abstract interpretation. In: NASA Formal Methods - Third International Symposium, NFM 2011, Pasadena, CA, USA, April 18-20, 2011. Proceedings. pp. 298–312 (2011)
29. Reisenberger, C.: PBoolector: a parallel SMT solver for QF_BV by combining bit-blasting with look-ahead. Master's thesis, Johannes Kepler Universität Linz, Linz, Austria (2014)

A Correctness of the Abstraction Approach

This section complements Section 3. First, we provide a proof for the completeness of $C_{op}(x)$. Afterwards, we explain how a model for some $\varphi$ can be constructed given a model for $\varphi$'s abstraction and vice versa. In the following, we assume that a $T+UF$-interpretation for some theory $T$ is also a $T$-interpretation where the evaluation of the uninterpreted functions is ignored. Furthermore, we assume that we can extend a $T$-interpretation into a $T+UF$-interpretation by adding evaluations for the necessary uninterpreted functions. This can usually be considered valid (e.g., a QF_UFBV model of some formula $\varphi$ can also be a QF_BV model of $\varphi$ if $\varphi$ does not contain any uninterpreted functions).

Lemma 1 (Completeness of Abstraction Schemes).

Given some $T$-abstraction scheme $AS_{op} = \{(ap_{op}, A^1_{op}), \ldots, (ap_{op}, A^k_{op})\}$ with the properties defined above, $C_{op}$ is a complete $T$-approximation of $op$.

Proof. Let $x$ be an arbitrary input vector for $op$. For any $T+UF$-interpretation $I$ with $I \models ap_{op}(x) \doteq op(x)$, we know by definition that $I \models A^i_{op}(x)$ for all $i \in \llbracket 1, k \rrbracket$, as all approximations $A^i_{op}$ are complete. Therefore, $I \models \bigwedge_{(\cdot, A) \in AS_{op}} A(x)$ (i.e., $I \models C_{op}(x)$), which implies that $C_{op}$ is a complete $T$-approximation, too.

Theorem 1 (Correctness of Abstraction Approach).

Let $T = (\Sigma, A)$ be some theory with $op \in \Sigma^F$, $\int^F(op) = \sigma_1 \cdots \sigma_n \sigma$ and $n \geq 1$. Let further $\Phi$ be an arbitrary $\Sigma$-formula containing some function application $op(x)$. For any $T$-abstraction scheme $AS_{op}$ with function symbol $ap_{op}$, the following property holds: there exists a $T$-interpretation $I_\Phi$ which is a $T$-model for $\Phi$ iff there exists a $T+UF$-interpretation $I_A$ which is a $T+UF$-model for

$$\Psi := \Phi\left[op(x) \mapsto ap_{op}(x)\right] \wedge \bigwedge_{(\cdot, A) \in AS_{op}} A(x).$$

Proof. The theorem will be proven in two directions. For each direction, we construct a suitable interpretation given the premise interpretation.

⇒ Let $I_\Phi$ be a $T$-model for $\Phi$. We build a model $I_A$ by extending $I_\Phi$ so that $I_A(ap_{op}(x))$ evaluates to $I_\Phi(op(x))$. This is possible as $ap_{op}$ is a new uninterpreted function symbol not used within $\Phi$. As $I_A \models ap_{op}(x) \doteq op(x)$, the completeness proof in Lemma 1 yields $I_A \models A_{op}(x)$. Therefore $I_A \models \Phi\left[op(x) \mapsto ap_{op}(x)\right] \wedge \bigwedge_{(\cdot, A) \in AS_{op}} A(x)$.

⇐ Let $I_A$ be a $T+UF$-model for $\Psi$. The abstraction scheme definition states that $I_A \models \bigwedge_{(\cdot, A) \in AS_{op}} A(x)$ implies $I_A \models ap_{op}(x) \doteq op(x)$ through the soundness property. This implies that $I_A \models \Phi$.

Lemma 2 (Soundness/Completeness through Implication).

Let $T = (\Sigma, A)$ be some theory and $(ap_{op}, A_{op})$ a $T$-approximation. If for all $T+UF$-interpretations $I$ and all $x \in \mathrm{Dom}(A_{op})$

$$A_{op}(x) \implies ap_{op}(x) \doteq op(x)$$

holds, then $(ap_{op}, A_{op})$ is sound. If for all $T+UF$-interpretations $I$ and all $x \in \mathrm{Dom}(A_{op})$

$$ap_{op}(x) \doteq op(x) \implies A_{op}(x)$$

holds, then $(ap_{op}, A_{op})$ is complete.

Proof. The proof is based on Definitions 3 and 4. Suppose that the soundness (completeness) formula above holds for all $I$ and $x$. For any interpretation $I$ where $I \not\models A_{op}(x)$ ($I \not\models ap_{op}(x) \doteq op(x)$), the definition of soundness (completeness) is already fulfilled. In case $I \models A_{op}(x)$ ($I \models ap_{op}(x) \doteq op(x)$) for some interpretation $I$, we know through the formula above that $I \models ap_{op}(x) \doteq op(x)$ ($I \models A_{op}(x)$), which implies that the approximation is, by definition, sound (complete).

B Reproducibility

B.1 Software

For all experiments, a modified version of Boolector 3.2.0 is used. More specifically, we modified commit ec1e1a9321aac25e22d404368fef052f704ce78b so that we could measure the time of the check-sat instruction. This version can be found at https://github.com/samysweb/boolector in branch sat-time-measure-32. As the underlying SAT solver, Lingeling in version bcj 78ebb8672540bde0a335aea946bbf32515157d5a is used. All software packages were compiled with gcc in version (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 using the provided cmake scripts, which have the highest optimization levels enabled. For the final experiments presented, Ablector is used in the version available in commit at https://github.com/samysweb/ablector. In our experiments we used a version of Ablector which used a new function symbol for each function application during the first 2 phases of abstraction. This was done because this version showed slightly more promising results than the version which reused the same function symbol.

B.2 Machine

All experiments were executed on a cluster of 20 identical compute nodes, each housing 2 Intel Xeon E5430 @ 2.66GHz CPUs and a total of 32GB of RAM. The SMT benchmark files were stored on a RAID system connected to the cluster.

B.3 Benchmark execution