An Incremental Abstraction Scheme for Solving Hard SMT-Instances over Bit-Vectors
Samuel Teuber, Marko Kleine Büning, Carsten Sinz
Karlsruhe Institute of Technology (KIT), Germany [email protected],{marko.kleinebuening,carsten.sinz}@kit.edu
Abstract.
Decision procedures for SMT problems based on the theory of bit-vectors are a fundamental component in state-of-the-art software and hardware verifiers. While very efficient in general, certain SMT instances are still challenging for state-of-the-art solvers (especially when such instances include computationally costly functions). In this work, we present an approach for the quantifier-free bit-vector theory (QF_BV in SMT-LIB) based on incremental SMT solving and abstraction refinement. We define four concrete approximation steps for the multiplication, division and remainder operators and combine them into an incremental abstraction scheme. We implement this scheme in a prototype extending the SMT solver Boolector and measure both the overall performance and the performance of the single approximation steps. The evaluation shows that our abstraction scheme contributes to solving more unsatisfiable benchmark instances, including seven instances with unknown status in SMT-LIB.
Decision procedures for bit-vectors play an important role in many applications such as bounded model checking, property directed reachability, test generation for hardware circuits, or symbolic execution [10,13,23,28]. These applications can result in formulae of considerable size, and although many of them are still within the reach of current implementations of SMT solvers, even some small formulas remain extremely hard to solve (e.g., the modmul instances from the LLBMC Family of Benchmarks [16]). It is well known that particular operators in the logic of bit-vectors possess only large SAT encodings, and thus often make problems containing them hard to solve. This includes the operators of multiplication, division and remainder.

A technique that is frequently employed to speed up the solving process for hard instances is abstraction [5,6,20,7,8,3,25]. Instead of the original problem, a related problem is analyzed that is supposed to be easier to solve. Abstractions include under-approximations (where the abstract system allows for fewer solutions than the original one) and over-approximations (more solutions).

We first present a general scheme for replacing hard bit-vector operators by a series of over-approximations. Then, to demonstrate and analyze their applicability, we instantiate the scheme for the operators of multiplication, division and remainder. The abstraction sequence contains approximations of differing precision, which are tried in turn. First, less precise approximations are applied and, if they are not sufficient, refined by including additional elements of the abstraction sequence. When and which refinements are tried is computed in a CEGAR-like fashion [9].

We have enhanced the SMT solver Boolector [26], resulting in an implementation we call Ablector. For an evaluation, we have used the relevant subset of benchmarks from the SMT-LIB [1] benchmark release 2019-05-20. In comparison to Boolector, Ablector is able to solve 11 more unsatisfiable instances and 7 more instances with unknown status, and yields a total of 46 uniquely solved instances with unsatisfiable or unknown status in SMT-LIB. Compared to previous work, we consider the use of multiple abstractions and the development of a general, multi-level abstraction scheme coupled with a strategy for its evaluation and analysis as the main contribution of this paper.
Related Work.
In the past, a variety of abstraction techniques have been proposed to speed up SMT solvers. De Moura and Rueß [24] presented an approach called “lemmas on demand”, in which formulas are converted to Boolean constraints, which are then iteratively refined by adding lemmas generated by the theory solver. Brummayer and Biere [5,6] applied this technique to the theory of bit-vectors and arrays, and combined it with under-approximations via bitwidth-reduction. Lahiri and Mehra [20] developed an algorithm combining under- and over-approximations for the theory of quantifier-free Presburger arithmetic (QFP) based on interpolation. Bryant et al. [7] present an approach where formulae in bit-vector logic are encoded with fewer Boolean variables than their width, resulting in an under-approximation. If the under-approximation is unsatisfiable, they compute an unsatisfiable core to derive an over-approximation, which then, in turn, is used to refine another under-approximation. In the MathSAT solver [8] an over-approximating preprocessing step is employed, which treats bit-vector operations as uninterpreted functions. In his PhD thesis proposal, Jonas [18] gives a summary of different techniques used in current bit-vector SMT solvers. Finally, Brain et al. [3] developed a general framework for abstraction that generalizes the CDCL algorithm for SAT solving to more expressive theories. A similar idea was presented around the same time by de Moura and Jovanovic [25].
We use common notation for propositional logic and many-sorted first-order logic as can be reviewed in [2,22]. In particular, we define a signature as $\Sigma = (\Sigma^S, \Sigma^P, \Sigma^F, \int^P, \int^F)$ where $\Sigma^S$ are the available sorts, $\Sigma^P$ are the predicate symbols, $\Sigma^F$ are the function symbols and $\int^P : \Sigma^P \to (\Sigma^S)^*$ ($\int^F : \Sigma^F \to (\Sigma^S)^+$) defines the rank of a given predicate (function). We call the number of input parameters of a predicate (function) its arity. Furthermore, we denote a $\Sigma$-interpretation as follows:

Definition 1 ($\Sigma$-interpretation). For a signature $\Sigma$ and a set $X$ of variables with sorts in $\Sigma^S$, a $\Sigma$-interpretation over $X$ is a tuple $I = (U, I_S, I_X, I_F, I_P)$ where:
– $U \neq \emptyset$ is the universe of all possible values;
– $I_S : \Sigma^S \to \mathcal{P}(U)$ maps each sort $\sigma_i$ to a pairwise disjoint domain $D_i := I_S(\sigma_i)$ of possible values for $\Sigma$-terms of this sort;
– $I_X : X \to U$ maps each variable $x \in X$ to a value $v \in U$;
– $I_F$ maps any function symbol $f \in \Sigma^F$ of rank $\int^F(f) = \sigma_1 \cdots \sigma_n \sigma_{n+1}$ to a function $f^I : D_1 \times \cdots \times D_n \to D_{n+1}$; and
– $I_P$ maps any predicate $p \in \Sigma^P$ of rank $\int^P(p) = \sigma_1 \cdots \sigma_n$ to a truth function $p^I : D_1 \times \cdots \times D_n \to \{0,1\}$.
$I_X$ must respect the sort $\sigma_i$ of $x$ (i.e., $x$ of sort $\sigma_i$ may only be mapped to $v \in D_i$).

A $\Sigma$-interpretation $I$ is a $\Sigma$-model for some formula $\varphi$ iff $I$ satisfies $\varphi$ (i.e., $I \models \varphi$). Based on this, a $\Sigma$-theory is a tuple $T = (\Sigma, A)$ where $\Sigma$ is a signature and $A$ is a set-theoretical class of $\Sigma$-interpretations. Furthermore, we denote by $For_T$ the set of all formulae in first-order logic in theory $T$ and by $Term^{\Sigma}_{\sigma}$ the set of all terms in first-order logic with signature $\Sigma$ of sort $\sigma$. Given some term $t \in Term^{\Sigma}_{\sigma}$, we define $\varphi[op(x) \mapsto t]$ as the formula where the function application $op(x)$ is replaced by $t$ in $\varphi$ (note that $x$ represents the vector of all input values for $op$).

In the SMT-LIB standard [1] for QF_BV, the functions examined in this work support overloading in the sense that a single function symbol like bvmul supports multiple ranks. To simplify the explanations in the following sections, one can think of $bvmul_r$ as the bvmul operation of rank $r$, thereby avoiding the issue of overloading. This way every function symbol has exactly one rank. Finally, in QF_BV, we denote by $x[i]$ the $i$-th bit of some bit-vector $x$ counting from zero, where $x[0]$ is the least significant bit. Further, we denote by $x[i:j]$ with $i > j$ the slice from the $j$-th to the $i$-th bit of said bit-vector.

We present an abstraction procedure for the quantifier-free bit-vector theory (QF_BV). Our approach substitutes applications of specific operators (for this work specifically bvmul, bvsdiv, bvudiv, bvsrem and bvurem) by abstractions defined on the QF_UFBV theory (adding uninterpreted functions to the theory of bit-vectors). During the solving process, the abstractions made within some instance are iteratively refined until the SAT/SMT solver either returns unsat, or sat with correct assignments. We will present a formal definition of our scheme starting with the approximation of some given function symbol:
Definition 2 (Approximation).
Given some theory $T = (\Sigma, A)$ and some function symbol $op \in \Sigma^F$ with $\int^F(op) = \sigma_1 \cdots \sigma_n \sigma$ and $n \geq 1$, a $T$-approximation for $op$ consists of:
– a new uninterpreted function symbol $ap_{op}$ with $\int^F(op) = \int^F(ap_{op})$; and
– a mapping $A_{op} : Term^{\Sigma}_{\sigma_1} \times \cdots \times Term^{\Sigma}_{\sigma_n} \to \mathcal{P}(For_T)$, where $\mathcal{P}(S)$ is the powerset over $S$.
A $T$-approximation can therefore be written as a tuple $(ap_{op}, A_{op})$.

An approximation essentially replaces an occurrence of an existing function symbol $op$ by a new one ($ap_{op}$), and furthermore adds formulae that ensure certain properties for the application of $ap_{op}$. It may be sound or complete:

Definition 3 (Sound $T$-approximation). Given some theory $T = (\Sigma, A)$, a $T$-approximation $(ap_{op}, A_{op})$ is sound iff for all $x \in Dom(A_{op})$ the following property holds: For all $T + UF$-interpretations $I$ with $I \models A_{op}(x)$, it holds that $I \models ap_{op}(x) \doteq op(x)$.

Definition 4 (Complete $T$-approximation). Given some theory $T = (\Sigma, A)$, a $T$-approximation $(ap_{op}, A_{op})$ is complete iff for all $x \in Dom(A_{op})$ the following property holds: For all $T + UF$-interpretations $I$ with $I \models ap_{op}(x) \doteq op(x)$, it holds that $I \models A_{op}(x)$.

A sound $T$-approximation is an under-approximation, while a complete $T$-approximation is an over-approximation of some function $op$. A set of approximations can then be used to construct an abstraction scheme:

Definition 5 (Abstraction Scheme).
Given some theory $T = (\Sigma, A)$ and some function symbol $op \in \Sigma^F$ of strictly positive arity, a $T$-abstraction scheme (for $op$) is a finite, totally ordered set of $T$-approximations $\mathcal{AS}_{op} = \{(ap_{op}, A^1_{op}), \ldots, (ap_{op}, A^k_{op})\}$ where:
– for every $i \in [\![1, k]\!]$: $(ap_{op}, A^i_{op})$ is a complete $T$-approximation of $op$; and
– $(ap_{op}, C_{op})$ with $C_{op}(x) := \bigcup_{(\cdot, A) \in \mathcal{AS}_{op}} A(x)$ is a sound $T$-approximation of $op$.

While any single approximation within the abstraction scheme is only complete (and therefore an over-approximation), all approximations taken together must be sound and should thus yield a correct definition of the original function. This abstraction scheme can then be used to build a decision procedure like the one described in Algorithm 1. In a first step, the algorithm replaces all operators which should be refined by their abstracted uninterpreted functions. Afterwards, the instance is re-evaluated in a loop as long as the underlying SMT solver does not return unsat and the model returned is incorrect. In each round, further approximations from the abstraction schemes are added to the instance. This process is certain to converge once all approximations of the scheme have been added and the formulation is thus both sound and complete.

Footnotes: $Dom$ is the domain of a given function; in this case $Dom(A_{op}) = Term^{\Sigma}_{\sigma_1} \times \cdots \times Term^{\Sigma}_{\sigma_n}$. Just like all previous $T$-approximations, $C_{op}$ is defined as $C_{op} : Term^{\Sigma}_{\sigma_1} \times \cdots \times Term^{\Sigma}_{\sigma_n} \to \mathcal{P}(For_T)$ for a function symbol $op$ of rank $\sigma_1 \cdots \sigma_n \sigma$. Theorems and proofs for the correctness of the abstraction scheme can be found in Appendix A.

Algorithm 1 Decision procedure for QF_BV abstractions. ADD_CLAUSES and SAT are calls to the underlying SMT solver.
Require: φ ∈ For_QF_BV
    functions ← ⟨⟩
    for op ∈ Σ^F do
        if op has abstraction then                          ▷ For bvmul, division and remainder in our case
            for op(x) in φ do                               ▷ For each occurrence of op
                φ ← φ[op(x) ↦ ap_op(x)]
                functions.push((AS_op, op(x)))
    ADD_CLAUSES(φ)
    while true do
        r ← SAT()                                           ▷ Checks for satisfiability of current instance
        if r = unsat then
            return unsat                                    ▷ Correct result was found
        else
            consistent ← true
            for (AS, op(x)) in functions do
                if op(x) assignment is inconsistent then    ▷ Abstraction needs refinement
                    consistent ← false
                    ADD_CLAUSES(AS.pop())                   ▷ Adds next abstraction step to instance
            if consistent then
                return sat                                  ▷ Correct result was found
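The refinement loop of Algorithm 1 can be illustrated with a small, self-contained sketch. It is not the Ablector implementation: the SMT solver is replaced by brute-force enumeration over a toy bitwidth, only a single occurrence r = ap_bvmul(x, y) is handled, and the names `STAGES`, `sat` and `decide` as well as the three toy stages are assumptions made for illustration.

```python
from itertools import product

W = 4                      # toy bitwidth (assumption; keeps brute force cheap)
MASK = (1 << W) - 1


def bvmul(x, y):
    """Exact W-bit multiplication with wrap-around semantics."""
    return (x * y) & MASK


# Hypothetical abstraction scheme for one occurrence r = ap_bvmul(x, y).
# Every stage is implied by r == bvmul(x, y); the last stage is the exact
# definition, so all stages together are sound.
STAGES = [
    lambda x, y, r: (x != 0 or r == 0) and (y != 0 or r == 0),  # factor 0
    lambda x, y, r: (x != 1 or r == y) and (y != 1 or r == x),  # factor 1
    lambda x, y, r: r == bvmul(x, y),                           # full multiplication
]


def sat(formula, constraints):
    """Brute-force stand-in for SAT(): returns a model (x, y, r) or None."""
    for x, y, r in product(range(1 << W), repeat=3):
        if formula(x, y, r) and all(c(x, y, r) for c in constraints):
            return x, y, r
    return None


def decide(formula):
    """Refinement loop in the spirit of Algorithm 1.

    Returns the result together with the number of refinement steps added.
    """
    pending, active = list(STAGES), []
    while True:
        model = sat(formula, active)
        if model is None:
            return "unsat", len(active)
        x, y, r = model
        if r == bvmul(x, y):              # assignment is consistent
            return "sat", len(active)
        active.append(pending.pop(0))     # add the next abstraction step
```

In this sketch, `decide(lambda x, y, r: x == 0 and r != 0)` is refuted after adding only the first stage, whereas a formula such as `x == 3 and y == 5 and r == 14` needs all three stages before it is found unsatisfiable.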
We developed abstraction schemes for the computationally costly functions bvmul, bvsdiv, bvudiv, bvsrem and bvurem. We present the abstraction of bvmul in more detail, while giving just a short overview of the bvsrem abstraction. We omit a description of the other functions due to similarity and space limitations.

4.1 bvmul

The abstraction scheme for bvmul is divided into four stages: The first stage describes the behavior of bvmul for various simple cases (like factors 0 and 1); the second stage defines intervals for the result value given the intervals of the multiplication factors; the third stage introduces relations between bvmul and other functions (specifically division and remainder) and the fourth stage finally adds full multiplication for certain intervals of the factors.

Throughout the abstraction process, we will consider bvmul a signed operation, that is, we will interpret $r = bvmul(x, y)$ as if $x$, $y$ and $r$ were signed integers. Even though this seems like a restriction, we do not lose correctness of our approach for unsigned values by doing this. For example, if we assert an over-approximation (for an 8 bit multiplication) like $x <_s 0_s \land y <_s 0_s \implies r >_s 0_s$, this over-approximation also holds for unsigned values as it can be interpreted as $x \geq_u 128_u \land y \geq_u 128_u \implies r <_u 128_u$ for unsigned values. While this might be a surprising abstraction, it is nonetheless a correct one. Effectively, the question of whether $x$, $y$ and $r$ are signed or unsigned is an issue for the user’s interpretation and not for the decision procedure itself.

Overflow detection.
For many of the abstractions proposed in this chapter, it is essential to detect overflows of bvmul. To this end, we defined a predicate $noov : \{0,1\}^w \times \{0,1\}^w \to \{0,1\}$ for bitwidth $w$ based on [19] which is true only if no overflow can happen when multiplying the two input variables. The approach works by counting the number of leading bits (ones or zeros). Note that this predicate also detects signed overflows and might not be sound.

Simple cases.
For a multiplication instance $bvmul(x, y)$ of factors $x$ and $y$ with bitwidth $w$, we define the following constraints:

\begin{align}
(x \doteq 0) &\Rightarrow (ap_{bvmul}(x, y) \doteq 0) \tag{1}\\
(y \doteq 0) &\Rightarrow (ap_{bvmul}(x, y) \doteq 0) \tag{2}\\
(x \doteq 1) &\Rightarrow (ap_{bvmul}(x, y) \doteq y) \tag{3}\\
(y \doteq 1) &\Rightarrow (ap_{bvmul}(x, y) \doteq x) \tag{4}\\
(x \doteq -1) &\Rightarrow (ap_{bvmul}(x, y) \doteq -y) \tag{5}\\
(y \doteq -1) &\Rightarrow (ap_{bvmul}(x, y) \doteq -x) \tag{6}\\
noov(x, y) \Rightarrow \bigl[\, &(\neg x[w-1] \land \neg y[w-1] \Rightarrow \neg ap_{bvmul}(x, y)[w-1]) \tag{7}\\
\land\; &(x >_s 0 \land y[w-1] \Rightarrow ap_{bvmul}(x, y)[w-1]) \tag{8}\\
\land\; &(x[w-1] \land y >_s 0 \Rightarrow ap_{bvmul}(x, y)[w-1]) \tag{9}\\
\land\; &(x[w-1] \land y[w-1] \Rightarrow \neg ap_{bvmul}(x, y)[w-1]) \,\bigr] \tag{10}
\end{align}

For example, equations (1) and (2) define the multiplication cases where one factor is zero; other equations cover similar easy cases. Rules (1)–(4) have also been proposed in [7]; the other rules are, to the best of our knowledge, novel.

Additionally, we can make statements about the result’s sign whenever we can be certain that no overflow is going to happen. For the cases where no overflow happens, the sign behavior of bit-vector multiplication corresponds to the common sign behavior of multiplication and can be encoded as an approximation as seen in (7)–(10). Finally, all cases where one of the two factors is a power of 2 can be covered by constraints like (11) for all $i \in [\![0, w-1]\!]$ and for $x$ and $y$ symmetrically:

\begin{align}
\bigwedge_{j \neq i} \neg x[j] \land x[i] \implies \bigl( umul(x^{+2}, y^{+2}) \doteq shl(y^{+2}, i) \bigr) \tag{11}
\end{align}

Here, umul is the unsigned multiplication function and $x^{+2}$ as well as $y^{+2}$ are positive, double bitwidth versions of $x$ and $y$ as detailed in the following section. Instruction shl is the left shift function.

The mapping $A_{op}$ of this abstraction stage is then the conjunction of (1)–(11). These formulae are not static rewrite rules, but constraints provided to the underlying solver.

Footnotes: In the signed/unsigned example, we are omitting some overflow behavior for simplicity; however, we do consider overflow cases in the approximations defined later on. Regarding the overflow predicate: while every overflow will be detected, it might detect more overflows than actually exist.

Completeness.
The completeness is a direct consequence of Definition 4, as it can be checked that all formulae presented above (which specifically omit any statements about overflow cases) are implied by $ap_{bvmul}(x, y) \doteq bvmul(x, y)$.

Highest bit set based intervals. Using the factors’ highest bit set, intervals of the factors can be defined, which in turn can be used to assert intervals of the multiplication’s result. In a first step, the signed multiplication $r := ap_{bvmul}(x, y)$ is transformed into its unsigned version with doubled bitwidth:

\begin{align*}
x^{+2} &\doteq ite(x[w-1],\; -sext(x, w),\; sext(x, w)),\\
y^{+2} &\doteq ite(y[w-1],\; -sext(y, w),\; sext(y, w)),\\
r' &\doteq ite(x[w-1] \oplus y[w-1],\; -umul(x^{+2}, y^{+2}),\; umul(x^{+2}, y^{+2})).
\end{align*}

Instruction sext is the sign extension function. By asserting equality of the multiplication result $r$ and $r'[w-1:0]$, it is then possible to reason about the results of $r^{+2} := umul(x^{+2}, y^{+2})$ through bit shifting: If $i$ is the highest bit set of $x^{+2}$ then $2^i \leq x^{+2} < 2^{i+1}$ and therefore $2^i * y^{+2} \leq r^{+2} < 2^{i+1} * y^{+2}$. We thus define a predicate $hbs(x, i)$ which is true iff the highest bit set of $x$ is $i$.

The previously presented intuition gives rise to the following abstraction, which must distinguish overflow from no-overflow cases. For this, we will initially use a double bitwidth (i.e., $2 * w$ width) unsigned multiplication function. We then define double bitwidth lower ($L$) and upper ($U$) bounds for the result of the multiplication based on the highest bit set as explained above:

\begin{align*}
L(a, b, n) &:= \begin{cases} ite(hbs(a, 0),\; b,\; 0), & n = 0\\ ite(hbs(a, n),\; shl(b, n),\; L(a, b, n-1)), & \text{else} \end{cases}\\
U(a, b, n) &:= \begin{cases} shl(b, 1), & n = 0\\ ite(hbs(a, n),\; shl(b, n+1),\; U(a, b, n-1)), & \text{else} \end{cases}
\end{align*}

We can choose the necessary number of bits for the comparison depending on the result of noov: If an overflow is possible, we must compare the version with $2 * w$ bits, otherwise the $w$ bit version can be used for comparison.

Note that while $L$ and $U$ seem to be recursive functions, they can be unrolled into consecutive ite statements when adding the bounds to the instance at hand.
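The bounds $L$ and $U$ can be checked exhaustively for a small bitwidth. The following sketch is an illustration under stated assumptions, not the encoding used in the prototype: it works directly on the unsigned magnitudes (the role played by $x^{+2}$ and $y^{+2}$), evaluates the recursion in Python instead of unrolling it into ite terms, and checks the upper bound non-strictly so that zero factors are covered as well. The helper name `bounds_hold` is hypothetical.

```python
def hbs(x, i):
    """True iff bit i is the highest bit set in x (never true for x == 0)."""
    return x.bit_length() - 1 == i


def shl(b, n, width):
    """Left shift truncated to `width` bits."""
    return (b << n) & ((1 << width) - 1)


def lower(a, b, n, width):
    """Lower bound L(a, b, n) following the recursive definition."""
    if n == 0:
        return b if hbs(a, 0) else 0
    return shl(b, n, width) if hbs(a, n) else lower(a, b, n - 1, width)


def upper(a, b, n, width):
    """Upper bound U(a, b, n) following the recursive definition."""
    if n == 0:
        return shl(b, 1, width)
    return shl(b, n + 1, width) if hbs(a, n) else upper(a, b, n - 1, width)


def bounds_hold(w):
    """Check L <= a*b <= U for all magnitudes a, b below 2**w in 2*w-bit arithmetic."""
    width = 2 * w
    for a in range(1 << w):
        for b in range(1 << w):
            r = (a * b) & ((1 << width) - 1)
            if not lower(a, b, w - 1, width) <= r <= upper(a, b, w - 1, width):
                return False
    return True
```

For instance, with a = 6 (highest bit set 2) and b = 5, the bounds are 5 shifted by 2 and by 3 positions, which enclose the product 30.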
$A_{op}$ of this approximation step is then the assertion that $r^{+2}$ must lie within the bounds given by $L(x^{+2}, y^{+2}, w-1)$ and $U(x^{+2}, y^{+2}, w-1)$ (for the necessary bitwidth as explained above).

Footnote: The highest bit set of $x$ is $i$ iff $x$ is of the form $0^{n-i-1} \cdot 1 \cdot \{0,1\}^i$.

Completeness.
Through the distinction between overflow and no-overflow cases, the various equations can be regarded as normal multiplication disregarding overflows. Therefore, it can be checked that these constraints are direct implications of $ap_{bvmul}(x, y) \doteq bvmul(x, y)$. This approximation is consequently complete according to Definition 4.

Relations to other functions.
Aside from previous abstraction approaches, we can also look at relations between functions, possibly providing the solver with more high-level information. This can be useful in cases where relations between multiple function applications already lead to a contradiction.

For the multiplication instruction $ap_{bvmul}(x_2, y_2)$, with $x_2$ and $y_2$ the double bitwidth ($2 * w$) versions of $x$ and $y$, we propose the following abstractions:

\begin{align*}
ap_{bvmul}(x_2, y_2) &\doteq ap_{bvmul}(y_2, x_2),\\
x_2 \doteq 0 \;\lor\; y_2 &\doteq ap_{bvsdiv}(ap_{bvmul}(x_2, y_2), x_2),\\
y_2 \doteq 0 \;\lor\; x_2 &\doteq ap_{bvsdiv}(ap_{bvmul}(x_2, y_2), y_2).
\end{align*}

For every bitwidth $w' < 2 * w$ which appears in a given problem instance and its abstractions, we can further assert that

\begin{align*}
ap_{bvmul}(x_2, y_2)[w'-1:0] &\doteq ap_{bvmul}(x_2[w'-1:0], y_2[w'-1:0]),\\
ap_{bvmul}(x_2, y_2)[w'-1:0] &\doteq ap_{bvmul}(y_2[w'-1:0], x_2[w'-1:0])
\end{align*}

and for

\begin{align*}
x' := sext\left(x_2\left[\left\lfloor \tfrac{w'}{2} \right\rfloor : 0\right],\; w' - \left\lfloor \tfrac{w'}{2} \right\rfloor\right); \qquad
y' := sext\left(y_2\left[\left\lfloor \tfrac{w'}{2} \right\rfloor : 0\right],\; w' - \left\lfloor \tfrac{w'}{2} \right\rfloor\right)
\end{align*}

we assert that

\begin{align*}
ap_{bvmul}(x', y') &\doteq ap_{bvmul}(y', x'),\\
y' \doteq 0 \;\lor\; x' &\doteq ap_{bvsdiv}(ap_{bvmul}(x', y'), y'),\\
x' \doteq 0 \;\lor\; y' &\doteq ap_{bvsdiv}(ap_{bvmul}(x', y'), x').
\end{align*}

Essentially, all these relations between various multiplication and division applications are based on the semantics of multiplication, division and remainder as used in SMT-LIB and C++ [17].

The only challenge of this abstraction is to formulate the constraints so that they are complete for overflow cases. For this, we use an approach with doubled bitwidth $2 * w$ while encoding the constraints and define $x'$ and $y'$ for every $w'$ in a way that prevents overflows during multiplication.

Completeness.
The completeness of this abstraction is a direct consequence of all assertions being well-known properties for machine multiplication and division. As we avoid all overflow cases through the use of doubled bitwidth, the properties hold for any input combination.
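The role of the doubled bitwidth in these relations can be checked exhaustively for small widths. The sketch below is an illustration only: it evaluates the exact operators rather than the uninterpreted symbols, it models signed division as truncation toward zero and leaves division by zero undefined (the relations guard against it), and the helper names `division_relation` and `relations_hold` are hypothetical.

```python
def to_signed(v, width):
    """Interpret the low `width` bits of v as a two's complement integer."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def sext(v, width, extra):
    """Sign-extend a `width`-bit vector by `extra` bits."""
    return to_signed(v, width) & ((1 << (width + extra)) - 1)


def bvmul(a, b, width):
    """Multiplication with wrap-around at the given width."""
    return (a * b) & ((1 << width) - 1)


def bvsdiv(a, b, width):
    """Signed division truncating toward zero; only used here with b != 0."""
    sa, sb = to_signed(a, width), to_signed(b, width)
    q = abs(sa) // abs(sb)
    return (-q if (sa < 0) != (sb < 0) else q) & ((1 << width) - 1)


def division_relation(x, y, width):
    """x == 0 or y == bvsdiv(bvmul(x, y), x), evaluated at the given width."""
    return x == 0 or y == bvsdiv(bvmul(x, y, width), x, width)


def relations_hold(w):
    """Check commutativity, both division relations and truncation at width 2*w."""
    for x in range(1 << w):
        for y in range(1 << w):
            x2, y2 = sext(x, w, w), sext(y, w, w)
            p = bvmul(x2, y2, 2 * w)
            ok = (p == bvmul(y2, x2, 2 * w)
                  and division_relation(x2, y2, 2 * w)
                  and division_relation(y2, x2, 2 * w)
                  and p & ((1 << w) - 1) == bvmul(x, y, w))
            if not ok:
                return False
    return True
```

Without the extension the division relation can fail: at width 4, multiplying 4 by 4 wraps around to 0, and dividing 0 by 4 does not recover the second factor, which is why `division_relation(4, 4, 4)` evaluates to false while the doubled-width check succeeds.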
Full multiplication.
In a last step, full multiplication on a per-interval basis is added as a constraint. We assume an SMT instance containing some multiplication $bvmul(x, y)$. If the instance is still satisfiable after the previous steps, a counterexample is returned. We then look up the highest bit set $i$ of $x$’s assignment and assert that the multiplication is precise whenever $i$ is the highest bit set of $x$: $hbs(x, i) \implies ap_{bvmul}(x, y) \doteq bvmul(x, y)$.

Completeness and Soundness.
For a multiplication of bitwidth $w$ the approximation is complete, and it even becomes sound once this assertion has been made for all $i \in [\![0, w-1]\!]$. Note that the maximum number of necessary refinement steps is bounded by the function’s bitwidth $w$.

4.2 bvsrem

Due to its rareness in benchmarks, only a single abstraction layer has been added for this operator. Once again with double bitwidth as explained in Section 4.1, we assert the relations between bvsrem and other functions:

\begin{align*}
ap_{bvsrem}(x, y) &\doteq ap_{bvsrem}(x_2, y_2)[w-1:0],\\
x_2 &\doteq ap_{bvmul}(ap_{bvsdiv}(x_2, y_2), y_2) + ap_{bvsrem}(x_2, y_2).
\end{align*}

In the following refinement step, we add the full remainder constraint: $ap_{bvsrem}(x, y) \doteq bvsrem(x, y)$.

In order to evaluate the performance of the abstraction scheme presented above, we implemented a prototype solver by enhancing Boolector [26] (version 3.2). To increase readability, we will refer to the prototype as Ablector, which stands for Abstracted Boolector.

Experimental setup.
As the focus of this work lay on researching which abstractions are effective in solving more hard instances and not so much on building an improved solver, the abstraction refinement procedure is built as a layer on top of Boolector. We compare the default configuration of Boolector with our prototype, which is based on said default configuration, in order to quantify the abstractions' effect on the solving process. We do this by comparing Boolector’s SMT-LIB check-sat call against the summed up CPU clock time of all invocations to rewritten procedures in Ablector including its own check-sat call. In particular, our time measurement for Ablector also contains all abstraction refinement procedures. This produces a more realistic comparison of the abstraction’s performance. Note that we are even over-approximating the time Ablector takes in this comparison as we are adding the operator construction time during parsing, which is not considered for Boolector. These procedure calls, however, are in most cases negligible in comparison to the time for the check-sat call. The timeout used in all experiments follows the rules of the SMT competition 2018 [15].

Footnote (on the bvsrem abstraction): Its rareness makes it harder to evaluate the performance of abstractions, and abstractions are less likely to have a big impact on the overall performance.

[Table 1: contribution and cost of the approximation steps]

Benchmark Selection
Our work was sparked by an investigation on hard benchmarks in the LLBMC family of benchmarks presented at the SAT Competition 2017 [16]. To ensure that the abstractions do not overfit, we decided to evaluate on the larger set of QF_BV benchmarks in the 2019-05-20 SMT-LIB benchmark release [1], which were used for the 2019 SMT-Competition [14]. This benchmark set contains 14382 satisfiable instances, 27144 unsatisfiable instances and 170 instances of unknown status. We decided to remove all benchmarks not using the abstracted operators bvmul, bvsdiv, bvudiv, bvsrem or bvurem as we were mainly interested in the effect of the abstractions implemented. This resulted in a subset containing 6024 satisfiable instances, 15849 unsatisfiable instances and 92 instances of unknown status. Of those instances, Boolector was unable to solve 758 satisfiable, 152 unsatisfiable and 52 unknown instances. Boolector solves 9769 of the unsatisfiable instances through its rewriting engine, thus leaving 6080 instances to be solved by transformation to SAT clauses.

Preliminary experiments showed that our abstraction scheme is mainly helpful when solving unsatisfiable instances while worsening the results for satisfiable instances.

Footnotes: We chose the CPU clock time as it can be measured through the same unified interface in both Python and C with comparable results. As the tasks are heavily CPU-bound with very few IO operations, this measurement can still be considered realistic. Further details for reproducibility and on the experimental setup can be found in Appendix B.

Given the four incremental approximation steps presented, it is a natural question to ask how much each approximation step within the abstraction scheme helps in solving the instance at hand. After some preliminary experiments on the ordering of the approximation steps, we came up with the following sequence of steps:
1. Simple cases (see 4.1, "Simple cases")
2. Interval based abstraction (see 4.1, "Highest bit set based intervals")
3. Function relations (see 4.1, "Relations to other functions")
4. Iterative, interval based multiplication (see 4.1, "Full multiplication")

With the abstraction scheme set up in this sequence, we began running experiments on the unsatisfiable and unknown instances. In five subsequent experiments we ran Boolector, Ablector with only the first step, Ablector with first and second step, Ablector with first through third step and full Ablector on all benchmarks. Interestingly, there is no solid progression clearly decreasing the number of unsolved instances in every step. Instead, every step makes a number of instances solvable and another number of instances unsolvable. We therefore defined both the contribution and the cost of an approximation step:
– We call the contribution of an approximation at position N in an abstraction scheme the number of benchmark instances that are not solved by approximation steps 1 to N − 1 but are solved using the approximations 1 to N. Thus, the contribution identifies exactly those benchmark instances which are solved through the N-th approximation step within the scheme.
– In contrast, we define the cost of an approximation at position N in an abstraction scheme as the number of benchmark instances that are solved by approximation steps 1 to N − 1 but are not solved using the approximations 1 to N. Thus, the cost identifies exactly those benchmark instances which are not solved because the N-th approximation step within the scheme was put in place.

Footnote: All instances with unknown status in the benchmark set whose status we managed to find out are unsatisfiable, too.

[Table 3: Instances solved more/less by Ablector for specific benchmark families in comparison to Boolector. Families listed: rw-Noetzli, bmc-bv-svcomp14, brummayerbiere2, calypto, Sage2, UltimateAutomizer, BuchwaldFried, float, log-slicing. Totals: 39 more, 28 less.]
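The contribution and cost of an approximation step, as defined above, can be computed directly from the sets of instances solved in consecutive experiments. The following sketch assumes that `solved[0]` holds the instances solved by the baseline (no approximation step) and `solved[n]` those solved with steps 1 to n; the function name and the instance names in the usage example are hypothetical and do not correspond to measured data.

```python
def contribution_and_cost(solved):
    """Return a list of (contribution, cost) pairs, one per approximation step.

    solved[0] is the set of instances solved without any approximation step;
    solved[n] is the set of instances solved using approximation steps 1..n.
    """
    result = []
    for n in range(1, len(solved)):
        previous, current = solved[n - 1], solved[n]
        contribution = len(current - previous)   # newly solved with step n
        cost = len(previous - current)           # no longer solved with step n
        result.append((contribution, cost))
    return result


# Usage with made-up instance names (illustration only):
example = [
    {"a", "b"},            # baseline
    {"a", "b", "c"},       # step 1 gains c
    {"a", "c"},            # step 2 loses b
    {"a", "c", "d", "e"},  # step 3 gains d and e
]
pairs = contribution_and_cost(example)
```

A step can have a positive contribution and a positive cost at the same time, which is why both numbers are reported separately instead of only the net change.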
Analysing the approximation steps.
Using the data from the experiments, we calculated the costs and contributions of each approximation step in the scheme (Table 1). The quantitative analysis shows that steps 1 and 3 are the most effective approximation steps of the scheme while step 2 seems to cost more than it contributes. For step 4, however, we noticed that a qualitative analysis can be just as interesting: While the scheme with step 4 can no longer solve the benchmark instance log-slicing/bvsdiv_18.smt2, which ensures that bvsdiv is correctly implemented for bitwidth 18, it is able to solve calypto/problem_16.smt2 (a sequential equivalence checking problem). Depending on the use case one might therefore argue that this approximation should stay in place even though it does not change the total number of solved cases. In a parallelized portfolio approach it might even be interesting for an abstraction scheme to achieve high contributions at a high cost, as other solvers can take care of the problems unsolved by the abstraction scheme in parallel.

Modifying the abstraction scheme.
Based on the results obtained above, we ran an experiment with Ablector omitting approximation step 2, thus only using the remaining approximation steps.

Footnote (on log-slicing/bvsdiv_18.smt2): Ablector solved this for bitwidth 15 to 17, Boolector for bitwidth 15 to 18.
The approximation steps in our scheme were originally developed when attempting to improve the solving behaviour for instances of the LLBMC family of benchmarks [16]. Considering the modmul benchmark instance in said family, we developed an abstraction scheme differing for signed and unsigned operators. This is noteworthy, as signed operators are usually just rewritten as unsigned operators during preprocessing. We experimentally compared our signed abstraction scheme with a version which only considered unsigned operators and rewrote bvsdiv and bvsrem to their unsigned versions using if-then-else statements. Surprisingly, the results showed no significant difference in runtime or solving behavior on the SMT-LIB benchmark set under evaluation. This is interesting for two reasons: On the one hand, the modmul benchmark instance showed that special treatment for signed operators can help for certain benchmark instances while our experiment on the SMT-LIB benchmarks shows that this special treatment does not worsen the overall performance. On the other hand, the result suggests that the LLBMC family of benchmarks would be an interesting addition to the SMT-LIB benchmark set as it contains instances with novel properties, namely being solvable quite efficiently using special abstraction schemes for signed operators.

Footnote (to Table 3): Names shortened for conciseness.

                      Boolector unsolved   Boolector solved   Total
Ablector unsolved            711                 474           1185
Ablector solved               47                4792           4839
Total                        758                5266           6024
Table 4: Number of satisfiable instances solved by Boolector and Ablector.

We will now look at the overall performance of our abstraction scheme. Figure 1 gives a first overview on the solving times of Ablector in comparison to Boolector. While Ablector slows down the solving process for a number of instances with short runtime, we see quite a few instances in green at the right side of the scatter plot which can only be solved by Ablector, sometimes even within seconds.
Table 2 presents a more concise summary of the instances only solved by Ablector or Boolector (omitting instances solved through preprocessing). Ablector shows a slightly better performance for the total number of solved instances (11 instances more) and solves 39 instances Boolector fails upon. A large share of the instances solved by both procedures can be considered easy: only 182 of the 5904 instances solved by both took longer than 100s for one of the solvers. In comparison to this, an addition of 39 uniquely solved instances is considerable progress, especially for portfolio approaches and cases where many unsatisfiable instances which are believed to be hard need to be solved. Table 3 presents evidence that our abstraction scheme contributes to solving some instances of sequential equivalence checking (in calypto [29]) as well as a number of instances in the Sage2 benchmark family [12] concerned with constraint resolution for whitebox fuzz testing. Furthermore, our approach helps with the verification of rewrite rules in the context of [27] (rw-Noetzli). Apart from the above-mentioned instances with published unsatisfiability status, Ablector was even able to solve seven instances of the rewrite rule verification family with status unknown within the SMT-LIB benchmark set, which Boolector failed to solve. (Boolector was able to solve exactly 40 of the 92 unknown instances through its rewriting engine; Ablector solved another 7 through its abstraction scheme.)

.4 Satisfiable Instances

For satisfiable instances, on the other hand, Ablector's performance is visibly worse than Boolector's: As we can see in Table 4, Boolector is able to solve a lot of instances Ablector cannot currently solve, and the runtime Ablector takes for the solved instances cannot make up for this flaw; neither can the 47 instances only solved by Ablector. While about 500 timed-out instances get stuck in the first refinement round, the rest of the timed-out instances are evenly distributed across all refinement rounds. Bounding the running time of each refinement round by an upper limit could potentially avoid the problem of instances getting stuck in a certain step. At the same time, such a timeout must be fine-tuned in a manner which does not break the positive effects of our abstraction scheme for unsatisfiable instances. We expect that the abstraction scheme's performance could be improved in future work by integrating the abstractions directly into a solver like Boolector instead of building them as a layer on top. This would allow better use of already implemented under-approximation techniques that are completely ignored for most abstraction steps in the current scheme.
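The idea of bounding each refinement round can be illustrated with a small scheduling sketch. The Python snippet below is only an assumption about how such limits might be chosen and is not part of Ablector: it generates the Luby sequence [21] and scales it by a hypothetical base time limit (`base_seconds`), yielding one budget per attempt. The names `luby` and `round_budgets` are introduced here purely for illustration; how the budgets would be mapped onto refinement rounds is left open.

```python
from itertools import islice
from typing import Iterator, List


def luby() -> Iterator[int]:
    """Yield the Luby sequence 1, 1, 2, 1, 1, 2, 4, 1, ... indefinitely."""
    # "Reluctant doubling" formulation: keep a pair (u, v) and emit v.
    # When the lowest set bit of u equals v, advance u and reset v to 1;
    # otherwise double v.
    u, v = 1, 1
    while True:
        yield v
        if (u & -u) == v:
            u, v = u + 1, 1
        else:
            v *= 2


def round_budgets(base_seconds: float, attempts: int) -> List[float]:
    """Return time budgets (in seconds) for the first `attempts` attempts.

    Hypothetical helper: each budget is the base limit scaled by the
    corresponding Luby factor.
    """
    return [base_seconds * factor for factor in islice(luby(), attempts)]


if __name__ == "__main__":
    print(round_budgets(10.0, 7))
```

With a base limit of 10 seconds, the first seven budgets are 10, 10, 20, 10, 10, 20 and 40 seconds. Most attempts stay short, while occasional long budgets still give a hard round the chance to finish.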
We introduced an approach for solving quantifier-free bit-vector problems in SMT-LIB's QF_BV theory. The approach is based on abstraction methodologies previously used for various other problems in logic and specifically in SMT. We presented numerous approximation steps for 5 comparatively costly functions of the bit-vector theory. Additionally, we gave a theoretical definition of such abstraction schemes and presented a methodology allowing the experimental analysis of single approximation steps within a given abstraction scheme.

We saw that the presented approach performs better than Boolector in deciding unsatisfiable bit-vector problems, solving 11 more unsatisfiable instances and 7 more instances with unknown status, and yielding a total of 46 uniquely solved instances with unsatisfiable or unknown status in comparison to Boolector. However, the implemented prototype is not yet competitive for satisfiable instances. This is in some way a natural result, as over-approximations usually improve the solver runtime on unsatisfiable (and not on satisfiable) instances [4]. Also, the difference in solved instances for both unsatisfiable and satisfiable problems makes Ablector a promising addition for a portfolio solver.

As already seen in UCLID [7], interleaving over- and under-approximations has the potential to yield a solver which solves satisfiable and unsatisfiable benchmark instances equally well; this could also be an option for the abstraction scheme presented here. However, well-tuned time limits or intelligent interruption conditions, possibly based on Luby sequences [21], will be necessary for all approximation steps in order to avoid lock-ins where the solver keeps working in a single phase without coming to any result, while still granting the steps enough time to come to conclusions where possible. Alternatively, a portfolio approach making use of various over- and under-approximation techniques could be explored.

References

https://satassociation.org/jsat/index.php/jsat/article/view/74
7. Bryant, R.E., Kroening, D., Ouaknine, J., Seshia, S.A., Strichman, O., Brady, B.A.: Deciding bit-vector arithmetic with abstraction. In: Tools and Algorithms for the Construction and Analysis of Systems, 13th International Conference, TACAS 2007, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2007 Braga, Portugal, March 24 - April 1, 2007, Proceedings. pp. 358–372 (2007)
8. Cimatti, A., Griggio, A., Schaafsma, B.J., Sebastiani, R.: The MathSAT5 SMT solver. In: Tools and Algorithms for the Construction and Analysis of Systems - 19th International Conference, TACAS 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings. pp. 93–107 (2013)
9. Clarke, E.M., Grumberg, O., Jha, S., Lu, Y., Veith, H.: Counterexample-guided abstraction refinement. In: Computer Aided Verification, 12th International Conference, CAV 2000, Chicago, IL, USA, July 15-19, 2000, Proceedings. pp. 154–169 (2000)
10. Cordeiro, L.C., Fischer, B., Marques-Silva, J.: SMT-based bounded model checking for embedded ANSI-C software. IEEE Trans. Software Eng. 38(4), 957–974 (2012)
11. Gario, M., Micheli, A.: PySMT: a solver-agnostic library for fast prototyping of SMT-based algorithms. In: SMT Workshop 2015 (2015)
12. Godefroid, P., Levin, M.Y., Molnar, D.A., et al.: Automated whitebox fuzz testing. In: NDSS. vol. 8, pp. 151–166 (2008)
13. Gurfinkel, A., Belov, A., Marques-Silva, J.: Synthesizing safe bit-precise invariants. In: Tools and Algorithms for the Construction and Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings. pp. 93–108 (2014)
14. Hadarean, L., Hyvarinen, A., Niemetz, A., Reger, G.: SMT COMP 2019. Website, http://smtcomp.sourceforge.net/2019/
15. Heizmann, M., Niemetz, A., Reger, G., Weber, T.: SMT COMP 2018. Website, http://smtcomp.sourceforge.net/2018/
16. Iser, M., Kutzner, F., Sinz, C.: The LLBMC family of benchmarks. In: Proceedings of SAT Competition 2017: Solver and Benchmark Descriptions. pp. 41–42 (2017)
17. ISO Central Secretary: ISO14882:2011(E) C++. Standard, International Organization for Standardization, Geneva, CH (Sep 2011)
18. Jonas, M.: SMT Solving for the theory of bit-vectors. Ph.D. thesis, Faculty of Informatics, Masaryk University, Brno, Czech Republic (2016)
19. Warren Jr., H.S.: Hacker's Delight, Second Edition. Pearson Education (2013)
20. Lahiri, S., Mehra, K.K.: Interpolant based decision procedure for quantifier-free Presburger arithmetic. Tech. Rep. MSR-TR-2005-121, Microsoft Research (September 2007), proc. National Academy of Sciences
21. Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms. Information Processing Letters 47(4), 173–180 (1993)
22. Marques-Silva, J.P., Malik, S.: Propositional SAT Solving, pp. 247–275. Springer International Publishing, Cham (2018)
23. Merz, F., Falke, S., Sinz, C.: LLBMC: bounded model checking of C and C++ programs using a compiler IR. In: Verified Software: Theories, Tools, Experiments - 4th International Conference, VSTTE 2012, Philadelphia, PA, USA, January 28-29, 2012. Proceedings. pp. 146–161 (2012)
24. de Moura, L., Rueß, H.: Lemmas on demand for satisfiability solvers. In: Proceedings of the Fifth International Symposium on the Theory and Applications of Satisfiability Testing (SAT). pp. 244–251 (2002)
25. de Moura, L.M., Jovanovic, D.: A model-constructing satisfiability calculus. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds.) Verification, Model Checking, and Abstract Interpretation, 14th International Conference, VMCAI 2013, Rome, Italy, January 20-22, 2013. Proceedings. pp. 1–12 (2013)
26. Niemetz, A., Preiner, M., Biere, A.: Boolector 2.0. JSAT 9, 53–58 (2014), https://satassociation.org/jsat/index.php/jsat/article/view/120
27. Nötzli, A., Reynolds, A., Barbosa, H., Niemetz, A., Preiner, M., Barrett, C., Tinelli, C.: Syntax-guided rewrite rule enumeration for SMT Solvers. In: International Conference on Theory and Applications of Satisfiability Testing. pp. 279–297. Springer (2019)
28. Peleska, J., Vorobev, E., Lapschies, F.: Automated test case generation with SMT-solving and abstract interpretation. In: NASA Formal Methods - Third International Symposium, NFM 2011, Pasadena, CA, USA, April 18-20, 2011. Proceedings. pp. 298–312 (2011)
29. Reisenberger, C.: PBoolector: a parallel SMT solver for QF_BV by combining bit-blasting with look-ahead. Master's thesis, Johannes Kepler Universität Linz, Linz, Austria (2014)

A Correctness of the Abstraction Approach
This section complements Section 3. First, we provide a proof for the completeness of $C_{op}(x)$. Afterwards, we explain how a model for some $\varphi$ can be constructed given a model for $\varphi$'s abstraction and vice versa.

In the following, we assume that a $T+UF$-interpretation for some theory $T$ is also a $T$-interpretation where the evaluation for the uninterpreted functions is ignored. Furthermore, we assume that we can extend a $T$-interpretation into a $T+UF$-interpretation by adding evaluations for the necessary uninterpreted functions. This can usually be considered valid (e.g., a QF_UFBV model of some formula $\varphi$ can also be a QF_BV model of $\varphi$ if $\varphi$ does not contain any uninterpreted functions).

Lemma 1 (Completeness of Abstraction Schemes).
Given some $T$-abstraction scheme $\mathcal{AS}_{op} = \{ (ap_{op}, A^1_{op}), \dots, (ap_{op}, A^k_{op}) \}$ with the properties defined above, $C_{op}$ is a complete $T$-approximation of $op$.

Proof. Let $x$ be an arbitrary input vector for $op$. For any $T+UF$-interpretation $I$ with $I \models ap_{op}(x) \doteq op(x)$, we know that by definition $I \models A^i_{op}(x)$ for all $i \in \llbracket 1, k \rrbracket$, as all approximations $A^i_{op}$ are complete. Therefore,

$$I \models \bigcup_{(\cdot, A) \in \mathcal{AS}_{op}} A(x) \quad \text{(i.e., } I \models C_{op}(x)\text{)},$$

which implies that $C_{op}$ is a complete $T$-approximation, too.

Theorem 1 (Correctness of Abstraction Approach).
Let $T = (\Sigma, A)$ be some theory with $op \in \Sigma_F$, $\int_F(op) = \sigma_1 \cdots \sigma_n \sigma$ and $n \geq 1$. Let further $\Phi$ be an arbitrary $\Sigma$-formula containing some function application $op(x)$. For any $T$-abstraction scheme $\mathcal{AS}_{op}$ with function symbol $ap_{op}$, the following property holds: There exists a $T$-interpretation $I_\Phi$ which is a $T$-model for $\Phi$ iff there exists a $T+UF$-interpretation $I_A$ which is a $T+UF$-model for

$$\Psi := \Phi\big[ op(x) \mapsto ap_{op}(x) \big] \wedge \bigwedge_{(\cdot, A) \in \mathcal{AS}_{op}} A(x).$$

Proof. The theorem will be proven in two directions. For each direction, we will construct a suitable interpretation given the premise interpretation.

$\Rightarrow$ Let $I_\Phi$ be a $T$-model for $\Phi$. We build a model $I_A$ by extending $I_\Phi$ so that $I_A(ap_{op}(x))$ evaluates to $I_\Phi(op(x))$. This is possible as $ap_{op}$ is a new uninterpreted function symbol not used within $\Phi$. As $I_A \models ap_{op}(x) \doteq op(x)$, the completeness proof in Lemma 1 yields $I_A \models A_{op}(x)$. Therefore $I_A \models \Phi\big[ op(x) \mapsto ap_{op}(x) \big] \wedge \bigwedge_{(\cdot, A) \in \mathcal{AS}_{op}} A(x)$.

$\Leftarrow$ Let $I_A$ be a $T+UF$-model for $\Psi$. The abstraction scheme definition states that $I_A \models \bigwedge_{(\cdot, A) \in \mathcal{AS}_{op}} A(x)$ implies $I_A \models ap_{op}(x) \doteq op(x)$ through the soundness property. This implies that $I_A \models \Phi$.

Lemma 2 (Soundness/Completeness through Implication).
Given some theory $T = (\Sigma, A)$ and a $T$-approximation $(ap_{op}, A_{op})$. If for all $T+UF$-interpretations $I$ and all $x \in Dom(A_{op})$

$$A_{op}(x) \implies ap_{op}(x) \doteq op(x)$$

holds, then $(ap_{op}, A_{op})$ is sound. If for all $T+UF$-interpretations $I$ and all $x \in Dom(A_{op})$

$$ap_{op}(x) \doteq op(x) \implies A_{op}(x)$$

holds, then $(ap_{op}, A_{op})$ is complete.

Proof. The proof is based on Definitions 3 and 4. Assume that the soundness (completeness) formula above holds for all $I$ and $x$. For any interpretation $I$ where $I \not\models A_{op}(x)$ ($I \not\models ap_{op}(x) \doteq op(x)$), the definition for soundness (completeness) is already fulfilled. In case $I \models A_{op}(x)$ ($I \models ap_{op}(x) \doteq op(x)$) for some interpretation $I$, we know through the formula above that $I \models ap_{op}(x) \doteq op(x)$ ($I \models A_{op}(x)$), which implies that the approximation is, by definition, sound (complete).

B Reproducibility
B.1 Software
For all experiments, a modified version of Boolector 3.2.0 is used. More specifically, we modified commit ec1e1a9321aac25e22d404368fef052f704ce78b so that we could measure the time of the check-sat instruction. The modified version can be found at https://github.com/samysweb/boolector in branch sat-time-measure-32. As the underlying SAT solver, Lingeling in version bcj 78ebb8672540bde0a335aea946bbf32515157d5a is used. All software packages were compiled using the provided cmake scripts, which have the highest optimization levels enabled, using gcc in version (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0. For the final experiments presented, Ablector is used in the version available in commit at https://github.com/samysweb/ablector. In our experiments we used a version of Ablector which used a new function symbol for each function application during the first 2 phases of abstraction. This was done as this version showed slightly more promising results than the version which reused the same function symbol.

B.2 Machine

All experiments were executed on a cluster of 20 identical compute nodes, each housing 2 Intel Xeon E5430 @ 2.66GHz CPUs and a total of 32GB of RAM. The SMT benchmark files were stored on a RAID system connected to the cluster.
B.3 Benchmark execution