Deductive Verification of Floating-Point Java Programs in KeY
Rosa Abbasi Boroujeni, Jonas Schiffl, Eva Darulova, Mattias Ulbrich, Wolfgang Ahrendt
TACAS Artifact (AEC): Consistent * Complete * Well Documented * Easy to Reuse * Evaluated
Rosa Abbasi, Jonas Schiffl, Eva Darulova, Mattias Ulbrich, and Wolfgang Ahrendt
MPI-SWS, Kaiserslautern and Saarbrücken, Germany, {rosaabbasi,eva}@mpi-sws.org
Karlsruhe Institute of Technology, Karlsruhe, Germany, {jonas.schiffl,ulbrich}@kit.edu
Chalmers University of Technology, Göteborg, Sweden, [email protected]
Abstract.
Deductive verification has been successful in verifying interesting properties of real-world programs. One notable gap is the limited support for floating-point reasoning. This is unfortunate, as floating-point arithmetic is particularly unintuitive to reason about due to rounding as well as the presence of the special values infinity and 'Not a Number' (NaN). In this paper, we present the first floating-point support in a deductive verification tool for the Java programming language. Our support in the KeY verifier handles arithmetic via floating-point decision procedures inside SMT solvers and transcendental functions via axiomatization. We evaluate this integration on new benchmarks, and show that this approach is powerful enough to prove the absence of floating-point special values (often a prerequisite for further reasoning about numerical computations) as well as certain functional properties for realistic benchmarks.
Keywords:
Deductive Verification · Floating-point Arithmetic · Transcendental Functions.
Deductive verification has been successful in providing functional verification for programs written in popular programming languages such as Java [3, 22, 40, 48], Python [28], Rust [5], C [24, 53], and Ada [18, 49]. Deductive verifiers allow a user to annotate methods in a program with pre- and postconditions, from which they automatically generate verification conditions (VCs). These are then either proven directly by the verifier itself, or discharged with external tools such as automated (SMT) solvers or interactive proof assistants.

While deductive verifiers fully implement many sophisticated data representations (including heap data structures, objects, and ownership), support for floating-point numbers remains rather limited: only Frama-C and SPARK offer automated support for floating-point arithmetic in C and Ada [31]. This state of affairs is at least partially a result of previous limitations in floating-point support in SMT solvers. Consequently, deductive verification has been used for floating-point programs only by experts with considerable manual effort [14, 31]. This is unfortunate as it makes deductive verification unavailable for a large number of programs across many domains including embedded systems, machine learning, and scientific computing. With the increasing need for parallelization in code, scientific computing specifically has recently experienced algorithmic challenges for which formal methods may contribute to a solution [9, 55].

One of the main challenges of floating-point arithmetic is its unintuitive behavior and the special values that the IEEE 754 standard [38] introduces. For instance, an overflow or a division by zero results in the special value (positive or negative) infinity, and not a runtime exception. Similarly, invalid operations like sqrt(-1.0) result in a
Not a Number (NaN) value. These special values are problematic as seemingly straightforward identities do not hold (x == x or x * 0.0 == 0.0). In addition, every operation on floating-point numbers potentially involves rounding, which compromises familiar rules like associativity and distributivity. Hence, reasoning support for writing correct floating-point programs is indispensable.

Abstract interpretation-based tools can prove the absence of runtime errors and special values [19, 42], and bound roundoff errors due to the finite precision of floating-point arithmetic [10, 20, 25, 35, 56]. SMT decision procedures [17] or SAT-based model-checking [23, 55], on the other hand, can prove intricate properties requiring bit-precise reasoning. However, these techniques and tools largely support only purely floating-point programs or program snippets, or analyze programs only up to a predefined depth of the call stack. General reasoning about real-world object-oriented programs, however, also requires support for features such as the (unbounded) heap, necessitating different analyses which need to be combined with floating-point reasoning.

Handling floating-point numbers in a deductive verifier has unique advantages. First, the deductive verification approach already comes with the infrastructure for reasoning about complex control and data structures (like exception handling and heap). Second, it allows one to flexibly combine the verifier's symbolic execution reasoning with external decision procedures. Third, depending on the theory support, the verifier or external solver may also generate counterexamples of a property and thus help program debugging, something an abstract interpretation-based approach fundamentally cannot provide.

We report on adding floating-point support to the KeY deductive verifier, providing the first automated deductive floating-point support for the Java programming language. We focus mainly on proving the absence of the special values infinity and NaN.
While these are helpful in certain circumstances, for most applications they signal an error. Hence, showing their absence is a prerequisite for further (functional) reasoning. That said, our extension also allows one to express and discharge arbitrary functional properties expressible in floating-point arithmetic, including bounds on roundoff errors for certain programs, and bounds on differences between two similar floating-point programs.

We exploit both KeY's symbolic execution and external SMT support. On the one hand, we handle arithmetic operations by relying on a combination of KeY's symbolic execution to handle the heap and SMT-based decision procedures to handle the floating-point part of the VCs. On the other hand, we support transcendental functions via axiomatization in the KeY prover itself.

Transcendental functions such as sine are a common feature in numerical programs, but are not supported by floating-point decision procedures. We explore two ways of supporting them soundly but approximately, by encoding them as axiomatized uninterpreted function symbols once directly in the SMT queries, and once in additional calculus rules in KeY. Our evaluation shows that even though such reasoning is approximate, it is nonetheless sufficient to prove the absence of special values in many interesting programs.

We evaluate KeY's floating-point support on a number of real-world floating-point Java programs. Our benchmark set allows us to evaluate recent progress in SMT floating-point support in Z3 [27], CVC4 [7] and MathSAT [21] on yet unseen benchmarks. For instance, we observe that quantifiers are challenging even if they do not affect satisfiability of SMT queries. Our benchmarks are openly available, and we expect our insights to be useful for further solver development.
Contributions
In summary, we make the following contributions:
– we implement and evaluate the first automated deductive verification of floating-point Java programs by combining the strength of rule-based and SMT-based deduction;
– we collect a new set of challenging real-world floating-point benchmarks in Java (available at https://gitlab.mpi-sws.org/AVA/key-float-benchmarks/);
– we compare different SMT solvers for discharging floating-point VCs on this new set of benchmarks;
– and we develop novel automated support for reasoning about transcendental functions in a deductive verifier.

KeY [3] is a platform for deductive verification of Java programs, working at a source code level. The input is a Java program annotated in the Java Modeling Language (JML) [44], encouraging a
Design by Contract [45, 50] approach to software development. The user specifies the expected behavior of Java classes with class invariants that the program has to maintain at critical points. Methods are specified with method contracts, consisting mainly of pre- and postconditions, with the understanding that if the precondition holds when the method is called, the postcondition has to hold after the method returns.

After loading an annotated program, KeY translates it to a formula in Java Dynamic Logic [3] (JavaDL), an instance of Dynamic Logic [36] which enables logical reasoning about Java programs. Logical rules are provided for the translation of programs into first-order logic, and for closing the resulting goals, or proof obligations. KeY is semi-interactive in that it allows manual rule
application, while also offering powerful built-in automation and macros. In addition, it is also possible to translate an open goal into SMT-LIB format [8] and call an external SMT solver. For specific theories, SMT solvers can be much more efficient than KeY's own automation. This makes it possible to prove some goals, which depend on SMT-supported theories, by using an SMT solver, while others are proved internally, using KeY's own automation.
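To make the notion of a method contract concrete, here is a minimal sketch of a JML-annotated Java method. The class `Clamp` and its method are hypothetical and not taken from the benchmarks; only standard JML clauses (`requires`, `ensures`, `\result`) are used, and whether a given KeY version proves this contract automatically is an assumption.

```java
// Hypothetical example: a small method with a JML contract in the
// Design by Contract style described above.
final class Clamp {

    /*@ public normal_behavior
      @   requires lo <= hi;
      @   ensures lo <= \result && \result <= hi;
      @   ensures (lo <= x && x <= hi) ==> \result == x;
      @*/
    static int clamp(int x, int lo, int hi) {
        if (x < lo) {
            return lo;   // below the range: snap to the lower bound
        }
        if (x > hi) {
            return hi;   // above the range: snap to the upper bound
        }
        return x;        // already inside the range
    }

    public static void main(String[] args) {
        System.out.println(clamp(42, 0, 10));
        System.out.println(clamp(-5, 0, 10));
        System.out.println(clamp(5, 0, 10));
    }
}
```

The precondition restricts the callers, and the postconditions are what a verifier would have to establish for every input satisfying it.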
In the following, we summarize some central characteristics of Java floating-point numbers, loosely following [52]. Each normal floating-point number $x$ can be represented as a triplet $(s, m, e)$, such that $x = (-1)^s \cdot m \cdot 2^e$, where $s \in \{0, 1\}$ is the sign, $m$ (called significand) is a binary fixed-point number with one digit before the radix point and $p - 1$ digits after the radix point (note that $1 \le m < 2$), and $e$ (exponent) is an integer such that $e_{min} \le e \le e_{max}$. Java supports two floating-point formats (both in base 2): float ('single') precision with $p = 24$, and minimal and maximal exponent $e_{min} = -126$, $e_{max} = 127$, and double precision with $p = 53$, $e_{min} = -1022$, $e_{max} = 1023$.

Whenever the result of a computation cannot be exactly represented with the given precision, it is rounded. IEEE 754 defines various rounding modes, of which Java only supports round to nearest, ties to even. Rounding is exact, as if one would first compute the ideal real number, and round afterwards.

The triple representation gives us two zeros, $+0$ and $-0$, represented by $(0, 0, 0)$ and $(1, 0, 0)$, respectively. If the absolute value of the ideal result of a computation is too small to be representable as a floating-point number of the given format, the resulting floating-point number is $+0$ or $-0$. In addition, there are three special values, $+\infty$, $-\infty$, and NaN (Not a Number). If the absolute value of the ideal result of a computation is too big to be representable as a floating-point number of the given format, the result is $+\infty$ or $-\infty$. Also, division by zero will give an infinite result (e.g., $1.0 / {+0} = +\infty$). Computing further with infinity may give an infinite result (e.g., $+\infty + +\infty = +\infty$), but may also result in the additional 'error value' NaN (e.g., $+\infty - +\infty = \mathrm{NaN}$). Due to the presence of infinities and NaN, floating-point operations do not throw Java exceptions.

By default, the Java virtual machine is allowed to make use of higher-precision formats provided by the hardware.
This can make computation more accurate, but it also leads to platform-dependent behaviour. This can be avoided by using the strictfp modifier, ensuring that only the single and double precision types are used. This modifier ensures portability.
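The behavior of the special values summarized above can be observed directly in Java. The following is a small sketch; the class `SpecialValues` and its helper names are hypothetical. It only uses standard `java.lang` methods and shows that division by zero and overflow yield infinity, that infinity minus infinity yields NaN, and that familiar identities and associativity fail.

```java
// Hypothetical demo class illustrating IEEE 754 special values in Java.
final class SpecialValues {

    // Division by zero does not throw; it yields an infinity.
    static double divide(double a, double b) {
        return a / b;
    }

    // True iff x == x holds; this fails exactly for NaN.
    static boolean equalsItself(double x) {
        return x == x;
    }

    // True iff addition happens to be associative for these three values.
    static boolean associativeFor(double a, double b, double c) {
        return (a + b) + c == a + (b + c);
    }

    public static void main(String[] args) {
        double inf = divide(1.0, 0.0);            // +Infinity
        double overflow = Double.MAX_VALUE * 2.0; // +Infinity (too big)
        double underflow = Double.MIN_VALUE / 4.0; // +0.0 (too small)
        double nan = inf - inf;                   // NaN

        System.out.println(inf + " " + overflow + " " + underflow + " " + nan);
        System.out.println("NaN == NaN: " + equalsItself(nan));
        System.out.println("inf * 0.0 == 0.0: " + (inf * 0.0 == 0.0));
        System.out.println("sqrt(-1.0): " + Math.sqrt(-1.0));
        System.out.println("associative: " + associativeFor(0.1, 0.2, 0.3));
    }
}
```

None of these operations throws an exception, which is why the absence of special values has to be established by reasoning rather than by exception handling.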
In order to be able to specify and verify programs containing floating-point numbers, we made several extensions to the KeY tool. First, we added the float
and double types to the KeY type system, together with an enum type for the different rounding modes of the IEEE 754 Standard.

We further introduced functions and predicate symbols to formalize operations (+, *, ...) and comparisons (<, ==, ...) on floating-point expressions. The translation supports both code with and without the strictfp modifier. However, since the actual precision of non-strictfp operations is not known, the function symbols remain uninterpreted. We extended KeY's parser to correctly handle programs and annotations containing floating-point numbers, and added logic rules for translating floating-point expressions from Java or JML to JavaDL.

As an example, Listing 1.1 shows JML specifications of our Rectangle benchmark that contains floating-point literals and makes use of the fp_nan and fp_nice predicates. fp_nan states that a floating-point expression is NaN, and fp_nice is shorthand for "not infinity and not NaN". The scale method contains two contracts that are checked separately, ensuring that the class fields of a scaled rectangle object are not NaN, considering different preconditions. For the first contract, the SMT solver produces a counterexample. In the second, we bound inputs by concrete ranges that we picked arbitrarily, and the contract is proven valid. In practice, such ranges would come from the context, e.g. from the kind of rectangles that appear in an application, or from known ranges of sensor values.

Listing 1.1: The Rectangle.scale benchmark

    /*@ public normal_behavior
      @ requires \fp_nice(arg0.x) && \fp_nice(arg0.y)
      @       && \fp_nice(arg1) && \fp_nice(arg2);
      @ ensures !\fp_nan(\result.x) && !\fp_nan(\result.y) &&
      @         !\fp_nan(\result.width) && !\fp_nan(\result.height);
      @ also
      @ public normal_behavior
      @ requires -5.53 <= arg0.x && arg0.x <= -3.38 &&
      @          -5.53 <= arg0.y && arg0.y <= -3.38 &&
      @          3.1 < arg0.width && arg0.width <= 3.7332 &&
      @          3.0000001 < arg0.height && arg0.height <= 4.0004 &&
      @          3.0003001 < arg1 && arg1 <= 4.0024 &&
      @          -6.4000003 < arg2 && arg2 <= 3.0001;
      @ ensures !\fp_nan(\result.x) && !\fp_nan(\result.y) &&
      @         !\fp_nan(\result.width) && !\fp_nan(\result.height);
      @*/
    public Rectangle scale(Rectangle arg0, double arg1, double arg2) {
        Area v1 = new Area(arg0);
        AffineTransform v2 = AffineTransform.getScaleInstance(arg1, arg2);
        Area v3 = v1.createTransformedArea(v2);
        Rectangle v4 = v3.getRectangle2D();
        return v4;
    }
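The meaning of the fp_nan and fp_nice predicates can be mirrored by plain Java helpers, as in the sketch below. The class `FpPredicates` and the method `scaledWidth` are hypothetical; in particular, `scaledWidth` is a simplified stand-in for scaling arithmetic and is not the computation performed by the benchmark. It only illustrates one plausible way in which "nice" inputs can still lead to NaN, which is the kind of behavior a counterexample to the first contract has to exhibit.

```java
// Hypothetical helpers mirroring the \fp_nan and \fp_nice predicates.
final class FpPredicates {

    static boolean fpNan(double x) {
        return Double.isNaN(x);
    }

    // "nice" = neither NaN nor infinite
    static boolean fpNice(double x) {
        return !Double.isNaN(x) && !Double.isInfinite(x);
    }

    // Simplified, assumed scaling arithmetic (not the benchmark's code):
    // width of an interval [x, x + w] after scaling both ends by s.
    static double scaledWidth(double x, double w, double s) {
        return (x + w) * s - x * s;
    }

    public static void main(String[] args) {
        // Nice inputs, but both products overflow: Infinity - Infinity = NaN.
        System.out.println(scaledWidth(1e200, 1.0, 1e200));
        // Inputs from a small bounded range stay nice.
        System.out.println(scaledWidth(-5.0, 3.5, 4.0));
    }
}
```

Bounding the inputs, as the second contract does, rules out the overflow that produces the NaN in the first call.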
Concerning discharging the resulting proof obligations, there were two main ways to consider. One is to create a floating-point theory within KeY by adding axioms and deduction rules, so that the desired properties can be proven in KeY's sequent calculus. The other way is to translate the proof obligations from JavaDL to SMT-LIB and call an external SMT solver. While the KeY approach traditionally favors conducting proofs within KeY, for this work, we partially deviated from this way in order to harness the greater experience and efficiency of SMT solvers when it comes to floating-point arithmetic. Our approach attempts to get the best of both worlds by distinguishing between basic floating-point arithmetic, i.e., elementary operations and comparisons, and more complex functions which do not have an SMT-LIB equivalent (e.g., the transcendental functions), or where the SMT-LIB function is not usefully implemented by current SMT solvers (see Section 3.2.B).

Elementary operations and comparisons get translated to the corresponding SMT-LIB functions. In SMT-LIB, all floating-point computations conform to the IEEE 754 Standard. Therefore, only Java programs with the strictfp modifier can be directly translated to SMT-LIB without loss of correctness.

We developed a translation from KeY's floating-point theory to SMT-LIB. In order to integrate it into KeY, we also overhauled the existing translation from JavaDL to SMT-LIB to create a new, more modular framework, which now supports all the features of the original translation, e.g., heaps and integer arithmetic, but also floating-point expressions at the same time.

Floating-point intricacies sometimes require extra caution. For example, there are two different notions of equality for floats: bitwise equality and IEEE 754 equality.
Our implementation ensures these are distinguished correctly, and that the specification language remains intuitive for a developer to use.

Using the translation to SMT-LIB, we can specify and prove two classes of properties in KeY. The absence of special values is specified using the fp_nan and fp_infinite predicates (or the fp_nice equivalent). Furthermore, one can specify functional properties that are expressible in floating-point arithmetic, e.g. one can compare the result of a computation against the result of a different program which is known to produce a good result or a reference value.
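The difference between the two notions of equality mentioned above can be seen in plain Java. The sketch below uses a hypothetical class `FloatEquality`; it compares IEEE 754 equality (the `==` operator on doubles) with a comparison of the raw bit patterns. How exactly KeY maps specification operators to the two notions is not shown here and is not assumed.

```java
// Hypothetical demo of IEEE 754 equality versus bitwise equality.
final class FloatEquality {

    // IEEE 754 equality: +0.0 equals -0.0, and NaN equals nothing.
    static boolean ieeeEquals(double a, double b) {
        return a == b;
    }

    // Bitwise equality: compares the 64-bit representations.
    static boolean bitwiseEquals(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        System.out.println(ieeeEquals(0.0, -0.0));                // true
        System.out.println(bitwiseEquals(0.0, -0.0));             // false
        System.out.println(ieeeEquals(Double.NaN, Double.NaN));    // false
        System.out.println(bitwiseEquals(Double.NaN, Double.NaN)); // true
    }
}
```

The two zeros and NaN are exactly the cases where the notions disagree, so a specification language has to be explicit about which one a given `==` denotes.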
Floating-point decision procedures in SMT solvers successfully handle programs consisting of arithmetic and square root operations. Many numerical real-world programs, however, include transcendental functions such as sin and cos. In Java programs, these functions are implemented as static library functions in the class java.lang.Math.

Unlike arithmetic operations, transcendental functions are much more loosely specified by the IEEE 754 Standard: only an upper bound on the roundoff error is given. Libraries are thus free to provide different implementations, and even tighter error bounds. Exact reasoning in the same spirit as floating-point arithmetic would thus have to encode a specific implementation. Given that these implementations are highly optimized, this approach would be arguably complex.
We observe, however, that such exact reasoning about transcendental functions is often not necessary and a sound approximate approach is sufficient and efficient.

In this section, we introduce an axiomatic approach for reasoning about programs containing transcendental functions. We observe that with the flexibility of deductive verification and KeY itself, we can instantiate it in two different ways. We encode transcendental functions as uninterpreted functions and axiomatize them in the SMT queries. Alternatively, we encode these axioms in KeY as logical inference rules.

(A) Axiomatization in SMT
We encode library functions as uninterpreted functions and include a set of axioms in the SMT-LIB translation for each method that is called in a benchmark. That is, we extended KeY such that when a transcendental function exists in the proof obligation, its definition alongside all the axioms for that function are added to the translation.

For the axiomatization of transcendentals, we did not add rules that expand to a definition or allow a repeated approximation of the function value (like expansion into a Taylor series). Instead, we added a number of lemmata encoding interesting properties related to special values. For instance, the following axiom states that if the input to the sin function is not a NaN or infinity, then the returned value of sin is between -1.0 and 1.0:

    (assert (forall ((a Float64)) (=>
      (and (not (fp.isNaN a)) (not (fp.isInfinite a)))
      (and (fp.leq (sinDouble a) (fp ...

Note that this implies that the result is not a NaN or infinity. The other axioms are similar in spirit, so we do not list them.

These axioms are expressed as quantified floating-point formulas and capture high-level properties of library functions complying with the specifications in the IEEE 754 Standard. Clearly, since we do not have the actual implementations of these functions, we are not able to prove arbitrary properties. However, such an axiomatization is often sufficient to check for the (absence of) special values, i.e. NaN and infinity, as our experiments in Section 4.4 show.

(B) Taclets in KeY
Reasoning about quantified formulas in SMT is a long-standing challenge [33]. We have also observed in our experiments with only arithmetic operations (Section 4.3) that SMT solvers struggle with quantifiers in combination with floating-point arithmetic. We have therefore implemented an alternative approach encoding the axioms not in the SMT queries, but instead as deductive inference rules (so-called taclets) in KeY.

The rules encode the same logical information as the universally quantified assertions that we add in SMT-LIB (and where we leave the choice of instantiations entirely to the SMT/SAT solver). With our taclet approach, we instantiate a quantifier (only) to one's needs. We note that for proving a property correct, this results in a correct (under)approximation. However, the price for achieving more closed proofs and shorter running times is that for disproving a property, not considering all possible quantifier instantiations may lead to spurious counterexamples, i.e., false positives.

A heuristic strategy applies the rules automatically using the occurrences of transcendentals as instantiation triggers. However, instantiating the axioms too eagerly considerably increases the number of open goals, which is why we assume that the user selects the axioms to apply manually (and did so in the experiments). After the application, the proof obligation can either be closed, i.e. proven, by KeY automatically, or be given to the SMT solver as before for final solving.

Currently, the set of axioms (in the SMT-LIB translation and as taclets in KeY) only contains axioms for the transcendental functions occurring in our benchmarks. So far we have axioms; however, adding more axioms (also for further transcendentals like exponentiation or logarithm) is straightforward. The full set of axioms is included in the Appendix of the technical report.

Table 1: Benchmark details and KeY automode statistics; time is measured in seconds. (Column groups: Benchmark Details, Automode Statistics.)

We collected a set of existing floating-point Java programs representing real-world applications in order to evaluate the feasibility and performance of KeY's floating-point support.

The left half of Table 1 provides an overview of our benchmarks. Each benchmark consists of one method, which is composed of arithmetic operations
and method calls to potentially other classes. The invocations of methods from java.lang.Math (e.g. Math.abs) are marked by "+1" in Table 1; these are resolved by inlining the method implementation. For benchmarks that contain calls to transcendental functions and square root, the called functions are listed; these are handled by our axiomatization. We include sqrt in this list, as we have observed that exact support can be expensive, so it may be advantageous to handle sqrt axiomatically. Benchmarks Rectangle, Circuit, Matrix3 and Rotation are partially shown in Listings 1.1, 1.2, 1.3 and 1.4 respectively.

Listing 1.2: The Circuit.instantCurrent benchmark

    public class Circuit {
        double maxVoltage, frequency, resistance, inductance;
        // ...

        /*@ public normal_behavior
          @ requires ...;
          @ ensures !\fp_nan(\result) && !\fp_infinite(\result);
          @*/
        public double instantCurrent(double time) {
            Complex current = computeCurrent();
            double maxCurrent = Math.sqrt(current.getRealPart() * current.getRealPart() +
                current.getImaginaryPart() * current.getImaginaryPart());
            double theta = Math.atan(current.getImaginaryPart() / current.getRealPart());
            return maxCurrent * Math.cos((2.0 * Math.PI * frequency * time) + theta);
        }
    }

Each benchmark also includes a JML contract that is to be checked. For some methods, we specify two contracts (marked by "(2)" in the first column of Table 1), each serving as an independent benchmark. The contracts for most of these benchmarks check that the methods do not return a special value, i.e. infinity and/or NaN, the preconditions being that the variables are not themselves special values and possibly are bounded in a given range. For the Matrix, FPLoop and Rotate benchmarks, we check a functional property (see Section 4.3). FPLoop, which has three contracts, additionally shows how to specify floating-point loop behavior using loop invariants.
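Since the FPLoop benchmark itself is not reproduced here, the following sketch shows what a floating-point loop specification of this kind might look like. The class `LoopSketch`, its method, the bound on `n`, and the invariant are all hypothetical; the JML clauses (`loop_invariant`, `decreases`, `assignable`) are standard JML, and whether KeY closes this particular proof automatically is an assumption.

```java
// Hypothetical example of a floating-point loop with a JML loop invariant.
final class LoopSketch {

    /*@ public normal_behavior
      @   requires 0 <= n && n <= 1000;
      @   ensures \result >= 1.0;
      @*/
    static double accumulate(int n) {
        double sum = 1.0;
        /*@ loop_invariant 0 <= i && i <= n && sum >= 1.0;
          @ decreases n - i;
          @ assignable \nothing;
          @*/
        for (int i = 0; i < n; i++) {
            // Adding a positive value keeps sum >= 1.0, also after rounding,
            // because rounding to nearest is monotonic.
            sum = sum + 0.5;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(accumulate(4));
    }
}
```

The invariant `sum >= 1.0` also excludes NaN, since every comparison with NaN is false; this is how a postcondition of the form "the return value is bigger than a given constant" can be carried through the loop.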
To reason about the contract of a selected benchmark, we apply KeY, which generates proof obligations or 'goals'. Some of these goals (heap-related) are closed by KeY automatically. The remaining open goals are closed by either SMT solvers with floating-point support directly (Section 3.1 and Section 3.2.A), or with a combination of transcendental KeY taclets and floating-point SMT solving (Section 3.2.B).

Columns 6 and 7 in Table 1 show the number of proof obligations closed by KeY directly and to be discharged by external solvers, respectively. The next two columns show the number of taclet rules that KeY applied in order to close its goals, and the time this takes. For benchmarks with two contracts we show the respective values separated by '/'.

We run our experiments on a server with 1.5 TB memory and 4x12 CPU cores at 3 GHz. However, KeY runs single-threaded and does not use more than 8 GB of memory.

For our set of benchmarks, the symbolic execution process is fully automated. Note that the machinery can deal with loop invariants, if they are provided. Loop invariant generation is, however, particularly challenging for floating-point programs due to roundoff errors [26, 39], and a research topic in itself.
Previous work [31] reported that SMT support for floating-point arithmetic is rather limited. However, with recent advances [17], we evaluate the situation again. Most benchmarks used to evaluate SMT solvers' decision procedures [1] aim to check (individual) specialized (corner case) properties of floating-point arithmetic. The proof obligations generated from our set of benchmarks are complementary in that they are more arithmetic heavy, while nonetheless relying on accurate reasoning about special values and functional properties.

For each open goal not automatically closed, KeY generates one SMT-LIB file that is fed to the solvers for validation. We compare the performance of the three major SMT solvers with floating-point support: CVC4 [7] (version 1.8, with the SymFPU library [17] enabled), Z3 (4.8.9) [27] and MathSAT (5.6.3) [21]. For this we set a timeout of 300s for each proof obligation. While KeY is able to discharge proof obligations in parallel, for our experiments, we do so sequentially to maintain comparability.

KeY's default translation to SMT includes quantifiers. These quantifications are not related to floating-point arithmetic, but are used to logically encode important properties of the Java memory model, like the type hierarchy and the absence of dangling references on any valid Java heap. If we reason about floating-point problems in isolation, they are not needed, but if we want to consider Java verification more holistically with questions combining aspects of heap and floating-point reasoning, they become essential. We manually inspected that the proof obligations without our axiomatized treatment of transcendental functions do not depend on these properties and investigate the quantifier support by including or removing them from the SMT translations. We do not report results with quantifiers for MathSAT, since it does not support them.

Table 2 summarizes the results of our experiments. Column 4 shows the number of expected valid or invalid goals for all benchmarks.
For each solver we show the number of goals that each solver can validate or invalidate, together with the average time (in seconds) needed. The goals resulting in timeout were excluded from the computation of the average time. Column 3 shows whether the SMT queries include quantifiers or not.

Table 2: Summary of valid / invalid goals correctly decided and average running times of each solver for the SMT translations with and without quantified axioms

    index  experiment         quantified  #goals  CVC4        Z3          MathSAT
                              axioms              #    time   #    time   #    time
    1      valid contracts    ✓           80      79   4.1    25   18.4   -    -
    2      valid contracts    ✗           80      79   4.0    52   35.0   80   8.8
    3      invalid contracts  ✓
    4      invalid contracts  ✗
    5                         ✓           10      9    33.2   4    63.4   -    -
    6      axioms as taclets  ✗           10      10   33.4   5    74.2   8    0.9
    7      fp.sqrt            ✗

Rows 1 and 2 of Table 2 show the results for benchmarks with valid contracts. This experiment thus represents the common behavior of KeY, whose main goal is to prove contracts correct. Rows 3 and 4 of Table 2 demonstrate the results for benchmarks with invalid contracts, i.e. for those we expect a counterexample for at least one of the goals. The Appendix (Section A) contains the detailed results for each experiment separated by benchmark. Figure 1 and Figure 2 show a more detailed view of the solvers' running time for the valid benchmarks. The x-axis shows the number of open goals that are discharged by the SMT solvers, sorted by running time for each solver individually. The k-th point of one graph shows the minimum running time needed by the solver to close each of the k fastest goals. Note that each solver may have different goals which are its k fastest. The y-axis shows the time on a logarithmic scale.

Fig. 1: Runtimes for valid goals with SMT translations with quantifiers (x-axis: goal; y-axis: time in seconds, log scale; solvers: CVC4, Z3)

Fig. 2: Runtimes for valid goals with SMT translations without quantifiers (x-axis: goal; y-axis: time in seconds, log scale; solvers: CVC4, MathSAT, Z3)

We conclude that in the presence of quantified axioms and floating-point arithmetic, the solvers' performance deteriorates for both valid and invalid goals. In particular, none of the solvers is able to find counterexamples for any of the invalid goals. However, when the quantified axioms are removed from the SMT translations, their performance improves. For valid contracts, CVC4 and MathSAT perform better than Z3, in terms of both number of goals validated and the running time per goal. In particular, MathSAT is able to prove all goals. However, the running time performance of CVC4 is better than MathSAT's. For invalid contracts, solvers are able to produce the expected counterexamples at least partially.
Particularly, MathSAT has a better performance than CVC4 and Z3 in terms of both running time and the number of proof obligations for which it can produce counterexamples.

We conducted another experiment on our Rectangle.scale benchmark to assess the solvers' sensitivity to various changes, applied to the benchmark's contract or its implementation. We considered modifications such as reducing the number of classes while keeping the same functionality, having tighter and larger bounds for variables, reducing the number of arithmetic operations etc. The details of this experiment can be found in the Appendix of the technical report. In summary, solvers' performance seems to be sensitive to slight, innocuous-looking changes such as the number of classes involved and variable bounds. For example, constraining arg2 in the original benchmark more tightly allows CVC4 to validate all goals (1 more). This behavior could be potentially exploited by e.g. relaxing a variable's bounds.
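The two summary statistics used in this evaluation, the average time over goals that did not time out and the sorted per-solver runtime curve of Figures 1 and 2, can be sketched as follows. The class `SolverStats`, its method names, and the sample data are hypothetical; the convention that a timed-out goal is recorded with a time at or above the 300s limit is an assumption made for this sketch.

```java
import java.util.Arrays;

// Hypothetical helper for summarizing per-goal solver runtimes.
final class SolverStats {

    static final double TIMEOUT_SECONDS = 300.0;

    // Runtimes of solved goals in ascending order. Entry k-1 is the k-th
    // point of the runtime curve: every one of the solver's k fastest
    // goals is closed within that time.
    static double[] solvedTimesSorted(double[] times) {
        return Arrays.stream(times)
                .filter(t -> t < TIMEOUT_SECONDS) // drop timed-out goals
                .sorted()
                .toArray();
    }

    // Average runtime over solved goals only; NaN if nothing was solved.
    static double averageSolvedTime(double[] times) {
        double[] solved = solvedTimesSorted(times);
        if (solved.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double t : solved) {
            sum += t;
        }
        return sum / solved.length;
    }

    public static void main(String[] args) {
        double[] times = {4.0, 300.0, 0.5, 12.0}; // made-up sample data
        System.out.println(Arrays.toString(solvedTimesSorted(times)));
        System.out.println(averageSolvedTime(times));
    }
}
```

Because each solver's runtimes are sorted independently, the k-th points of two curves generally refer to different goals, which matches the remark made about the figures.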
Proving Functional Properties
Listings 1.3 and 1.4 show examples of functional properties that are expressible in floating-point arithmetic and that KeY can handle. The verification results are included in rows 1 and 2 of Table 2; for more details see the Appendix of the technical report.

For Matrix, we check that the determinants of a matrix and its transpose are equal. Note that this property holds trivially under real arithmetic, but not necessarily under floating-point arithmetic. After feeding transposedEq (which uses the determinant method) and its contract to KeY, increasing the default timeout sufficiently and discharging the created goal, CVC4 generates a counterexample in 170.2 seconds and MathSAT in 16.2 seconds. Z3 times out after 30 minutes. By feeding transposedEqV2 (which uses the determinantNew method) to KeY, CVC4 validates the contract in 1.1s, MathSAT in 3.9s and Z3 times out again. One thing worth noting is that the way programs are written can greatly influence the computational complexity needed to reject or verify the contract. This is evident from the fact that slightly modifying the order of operations (using determinantNew instead) substantially reduces verification time and changes the verification result for MathSAT and CVC4.

For Rotate, we check that the difference between an original vector and the one that is rotated four times by 90 degrees must not be larger than 1.0E-15. We also verified the same bound for the relative difference (by exploiting another method and contract) for this benchmark. The constant cos90 in Listing 1.4 is not precisely 0.0 to account for rounding effects in the computation of the cosine. FPLoop includes three loops, for which the contracts check that the return value is bigger than a given constant.

Though not always very fast, these examples show that verification of functional floating-point properties is viable.
Listing 1.3: The Matrix3 benchmark

public class Matrix3 {
  double a, b, c, d, e, f, g, h, i; // The matrix: [[a b c],[d e f],[g h i]]
  double det;

  // method transpose not shown

  double determinant() {
    return (a * e * i + b * f * g + c * d * h)
         - (c * e * g + b * d * i + a * f * h);
  }

  double determinantNew() {
    return (a * (e * i) + (g * (b * f) + c * (d * h)))
         - (e * (c * g) + (i * (b * d) + a * (f * h)));
  }

  /*@ ensures \fp_normal(\result) ==> (\result == det); @*/
  double transposedEq() {
    det = determinant();
    return transpose().determinant();
  }

  /*@ ensures \fp_normal(\result) ==> (\result == det); @*/
  double transposedEqV2() {
    det = determinantNew();
    return transpose().determinantNew();
  }
}
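The reason a mere reordering of operations can change a floating-point result is that floating-point addition and multiplication are not associative. The following minimal sketch illustrates this with plain Java doubles, independently of KeY; the class name AssociativityDemo and the chosen constants are illustrative assumptions and are not part of the benchmarks.

```java
public class AssociativityDemo {
    // Sums three doubles left to right: (x + y) + z.
    static double leftToRight(double x, double y, double z) {
        return (x + y) + z;
    }

    // Sums the same three doubles right to left: x + (y + z).
    static double rightToLeft(double x, double y, double z) {
        return x + (y + z);
    }

    public static void main(String[] args) {
        double l = leftToRight(0.1, 0.2, 0.3);
        double r = rightToLeft(0.1, 0.2, 0.3);
        // Under real arithmetic both sums are identical; with doubles the
        // intermediate roundings differ, so the comparison yields false.
        System.out.println(l + " vs " + r + ", equal: " + (l == r));
    }
}
```

The same effect applies to the differently parenthesized products and sums in determinant and determinantNew, which is why the two contracts are distinct verification problems for the solvers.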
Listing 1.4: The Rotation benchmark

public class Rotation {
  final static double cos90 = 6.123233995736766E-17;
  final static double sin90 = 1.0;

  // rotates a 2D vector by 90 degrees
  public static double[] rotate(double[] vec) {
    double x = vec[0] * cos90 - vec[1] * sin90;
    double y = vec[0] * sin90 + vec[1] * cos90;
    return new double[]{x, y};
  }

  /*@ requires (\forall int i; 0 <= i && i < vec.length;
    @   \fp_nice(vec[i]) && vec[i] > 1.0 && vec[i] < 2.0) && vec.length == 2;
    @ ensures \result[0] < 1.0E-15 && \result[1] < 1.0E-15;
    */
  public static double[] computeError(double[] vec) {
    double[] temp = rotate(rotate(rotate(rotate(vec))));
    return new double[]{Math.abs(temp[0] - vec[0]), Math.abs(temp[1] - vec[1])};
  }
}
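A contract proved in KeY holds for every input that satisfies the precondition, whereas a concrete run can only spot-check a single input. The sketch below re-implements the two methods of Listing 1.4 in a hypothetical class RotationCheck (the class name, the helper satisfiesContract, and the sample input (1.5, 1.5) are assumptions made for illustration) and evaluates the postcondition at runtime for one admissible vector.

```java
public class RotationCheck {
    static final double COS90 = 6.123233995736766E-17;
    static final double SIN90 = 1.0;

    // Rotates a 2D vector by 90 degrees, as in Listing 1.4.
    static double[] rotate(double[] vec) {
        double x = vec[0] * COS90 - vec[1] * SIN90;
        double y = vec[0] * SIN90 + vec[1] * COS90;
        return new double[]{x, y};
    }

    // Absolute difference between vec and vec rotated four times.
    static double[] computeError(double[] vec) {
        double[] temp = rotate(rotate(rotate(rotate(vec))));
        return new double[]{Math.abs(temp[0] - vec[0]), Math.abs(temp[1] - vec[1])};
    }

    // Runtime version of the postcondition from the JML contract.
    static boolean satisfiesContract(double[] vec) {
        double[] err = computeError(vec);
        return err[0] < 1.0E-15 && err[1] < 1.0E-15;
    }

    public static void main(String[] args) {
        // Both components lie strictly between 1.0 and 2.0, as required.
        double[] vec = {1.5, 1.5};
        System.out.println("postcondition holds: " + satisfiesContract(vec));
    }
}
```

Such a run is no substitute for the proof: it says nothing about the remaining inputs in the interval, which is exactly what the deductive verification covers.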
We evaluated the two approaches from Section 3.2.A on our set of benchmarks; rows 5 and 6 in Table 2 summarize the results. (The detailed results of these experiments are included in the Appendix of the technical report.) Note that both approaches are fully automated.

We conclude that the SMT solvers perform better when the axiomatization is applied at the KeY level. When axioms for transcendental functions are added to the SMT-LIB translation directly, Z3 validates 4 out of 10 goals. With the axiomatization at the KeY level, solvers are able to validate more goals (with quantified formulas removed from the SMT translations), e.g. Z3 is able to validate 5 goals and CVC4 can validate all. Therefore, it is preferable to apply the axioms on the KeY side via taclet rules.

All the solvers we have used in this work comply with the IEEE 754 standard and therefore have bit-precise support for the square root function. They provide bit-precise reasoning by effectively encoding the behavior of floating-point circuits over bitvectors (which is naturally expensive), together with different heuristics and abstractions to speed up solving time. However, depending on the property, we do not always need bit-precise reasoning, so we propose handling the square root function with the same taclet-based axiomatization as introduced in Section 3.2.B. To this end, we conducted an experiment on the benchmarks containing sqrt, comparing the approach from Section 3.2.B (adding the necessary axioms, resp. taclet rules) to using the square root implemented in SMT solvers (fp.sqrt). We chose to include only axioms specified in or inferred from the IEEE 754 standard (e.g. if the argument of the square root function is NaN or less than zero, then the square root results in NaN). The full set of axioms that we used is included in the Appendix of the technical report.

Rows 7 and 8 in Table 2 summarize the results for this experiment; the detailed results are included in the Appendix of the technical report.
We observed that for two out of the three benchmarks, the average running time of all solvers decreases using the axiomatized square root. Furthermore, Z3 is able to reason about more proof obligations with the axiomatized version. However, the success of this approach depends on the axioms added to KeY and may not always work if we do not have suitable axioms. For example, for the Circuit.instantCurrent benchmark (Listing 1.2), using the axiomatized square root, CVC4 is not able to validate the contract, but with fp.sqrt the contract is validated.

In summary, treating sqrt axiomatically can result in shorter solving times than performing bit-precise reasoning, but the approach may not always succeed when the axioms are not sufficient to prove a particular property.
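To make the flavor of these square root axioms concrete, the following sketch restates the five axioms listed in the Appendix as runtime checks against Java's Math.sqrt. It is only an illustration under stated assumptions: the class SqrtAxiomCheck and its method names are hypothetical, the sample values are arbitrary, and checking sample points is of course much weaker than the taclet rules, which hold for all arguments.

```java
public class SqrtAxiomCheck {
    // If arg is NaN or less than zero, then sqrt(arg) is NaN.
    static boolean nanOrNegativeGivesNaN(double arg) {
        boolean premise = Double.isNaN(arg) || arg < 0.0;
        return !premise || Double.isNaN(Math.sqrt(arg));
    }

    // If arg is positive infinity, then sqrt(arg) is positive infinity.
    static boolean positiveInfinityPreserved(double arg) {
        boolean premise = arg == Double.POSITIVE_INFINITY;
        return !premise || Math.sqrt(arg) == Double.POSITIVE_INFINITY;
    }

    // If arg is +0.0 or -0.0, then sqrt(arg) is the same zero (sign included).
    static boolean zeroPreserved(double arg) {
        boolean premise = arg == 0.0; // true for both signed zeros
        return !premise
            || Double.doubleToLongBits(Math.sqrt(arg)) == Double.doubleToLongBits(arg);
    }

    // If arg is not NaN and greater or equal to zero, then sqrt(arg) is not NaN.
    static boolean nonNegativeGivesNonNaN(double arg) {
        boolean premise = !Double.isNaN(arg) && arg >= 0.0;
        return !premise || !Double.isNaN(Math.sqrt(arg));
    }

    // If arg is not infinity and is greater than one, then sqrt(arg) < arg.
    static boolean shrinksAboveOne(double arg) {
        boolean premise = !Double.isInfinite(arg) && arg > 1.0;
        return !premise || Math.sqrt(arg) < arg;
    }

    // True if all five axioms hold for the given sample value.
    static boolean allHold(double arg) {
        return nanOrNegativeGivesNaN(arg) && positiveInfinityPreserved(arg)
            && zeroPreserved(arg) && nonNegativeGivesNonNaN(arg) && shrinksAboveOne(arg);
    }

    public static void main(String[] args) {
        double[] samples = {Double.NaN, -1.0, -0.0, 0.0, 0.25, 1.0, 2.0, 1.0E300,
                            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (double s : samples) {
            System.out.println(s + ": " + allHold(s));
        }
    }
}
```

Note that none of these axioms pins down the numeric value of sqrt for ordinary arguments, which is consistent with the observation above that the axiomatized version can fail when a property needs more than the axioms provide.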
The experiments show that highly automated floating-point program verification is viable for relevant properties (handling of special values and some functional properties), up to a certain level of complexity (given by the SMT solvers). The choices of which parts of a proof obligation are delegated to SMT, and how they are translated to SMT, are crucial for achieving effective and efficient program verification. Arithmetic operations proved to be more efficiently dealt with by delegation to SMT, whereas for transcendental functions, axiomatization and rule-based treatment in the theorem prover, outside the SMT solver, performs clearly better.
Our implementation uses the floating-point SMT-LIB theory [16], which, however, does not handle transcendental functions, as their semantics is (library) implementation dependent. Some real-valued automated solvers do handle transcendental functions [4, 32], but to the best of our knowledge, the combination of floating-points and reals in SMT solvers is still severely limited.

None of the existing deductive verifiers support floating-point transcendental functions automatically. The Why3 deductive verification framework [29] has support for floating-point arithmetic, with front-ends for the C and Ada programming languages through Frama-C [24] and SPARK [18, 31], respectively. Why3 has back-end support for different SMT solvers, as well as interactive proof assistants like Coq. Until recently, Why3 would still discharge many interesting floating-point problems with the help of Coq, relying on significant user interaction. In later work [31] (in the context of floating-point verification for Ada programs), Why3 can achieve a higher degree of automation. Note, however, that the user is still required to add code assertions as well as ‘ghost code’ to a significant extent.

The Boogie intermediate verification language [46] also supports floating-point expressions, and targets Z3 for discharging proof obligations. In the Boogie community, it was observed that writing a specification in Boogie leads to decreases in SMT solver performance when compared to writing the goal in SMT-LIB directly, probably due to an inherent mixing of theories when using Boogie [2]. This matches our own experiences, and separation of theories should be considered an important task for the further development of floating-point verification.

Other deductive verifiers for Java have only rudimentary support for floating-points.
VeriFast [40] treats floating-point operations as if they were real values, and OpenJML [22] parses programs with floating-point operations, but essentially treats float and double as uninterpreted sorts.

The Java category of the verification competition SV-COMP [11] contains a number of benchmarks that make use of floating-point variables. However, the focus of these benchmarks is usually not on arithmetical properties of expressions, but on the completeness of the Java language support. Amongst the participants of SV-COMP 2020, the Symbolic (Java) Pathfinder (SPF) [54] (and various extensions) and the Java Bounded Model Checker (JBMC) [23] support floating-point arithmetic. Besides being limited to exploring the state space up to a bounded depth, their constraint languages do not support quantifiers and abstracting of method calls, which are features that we have used in this work.
Floating-point arithmetic has also been formalized in several interactive theorem provers [15, 30, 41]. While one can prove intricate properties about floating-point programs [13, 14, 37], proofs using interactive provers are to a large part manual and require significant expertise.

Abstract interpretation based techniques can show the absence of special values in floating-point code fully automatically, and several abstract domains which are sound with respect to floating-point arithmetic exist [19, 42]. While the analysis itself is fully automated, applying it successfully to real-world programs in general requires adaptation to each program analyzed by end-users, e.g. the selection of suitable abstract domains or widening thresholds [12].

Besides showing the absence of special values, recent research has developed static analyses to bound floating-point roundoff errors [25, 34, 47, 51, 56]. These analyses currently work only for small arithmetic kernels and the tools in particular do not accept programs with objects.

Dynamic analyses generally scale well on real-world programs, but can only identify bugs (when given failure-triggering input), rather than proving correctness for all possible inputs. Executing a floating-point program together with a higher-precision one allows one to find inputs which cause large roundoff errors [10, 20, 43]. Ariadne [6] uses a combination of symbolic execution, real-valued SMT solving and testing to find inputs that trigger floating-point exceptions, including overflow and invalid operations. Our work subsumes this approach as the SMT solvers that we use can directly generate counterexamples, but more importantly, KeY is able to prove the absence of such exceptions.
By joining the forces of rule-based deduction and SAT-based SMT solving, we presented the first working floating-point support in a deductive verification tool for Java, thereby closing a remaining gap in KeY, which now supports full sequential Java. Our evaluation shows that for specifications dealing with value ranges and absence of NaN and infinity, our approach can verify realistic programs within a reasonable time frame. We observe that the floating-point support of the MathSAT and CVC4 solvers scales sufficiently for our benchmarks, as long as the queries do not include any quantifiers, and that our axiomatized approach for handling transcendental functions is best realized using calculus rules in KeY’s internal reasoning engine. While our work is implemented within the KeY verifier, we expect our approach to be portable to other verifiers.
Acknowledgements
This research was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) project 387674182. The authors would like to thank Daniel Eddeland, who together with co-author W. Ahrendt performed prestudies which impacted the current work.
References
1. QF_FP SMT benchmarks. https://clc-gitlab.cs.uiowa.edu:2443/SMT-LIB-benchmarks/QF_FP (2019)
2. Slow verification of programs combining multiple floating point values (GitHub issue) (2019, accessed May 11, 2020), https://github.com/boogie-org/boogie/issues/109
3. Ahrendt, W., Beckert, B., Bubel, R., Hähnle, R., Schmitt, P.H., Ulbrich, M. (eds.): Deductive Software Verification - The KeY Book - From Theory to Practice, LNCS, vol. 10001. Springer (2016)
4. Akbarpour, B., Paulson, L.C.: MetiTarski: An Automatic Theorem Prover for Real-Valued Special Functions. Journal of Automated Reasoning (3) (2010)
5. Astrauskas, V., Müller, P., Poli, F., Summers, A.J.: Leveraging Rust Types for Modular Specification and Verification. In: Object-Oriented Programming Systems, Languages, and Applications (OOPSLA) (2019)
6. Barr, E.T., Vo, T., Le, V., Su, Z.: Automatic Detection of Floating-point Exceptions. In: Principles of Programming Languages (POPL) (2013)
7. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: Computer Aided Verification (CAV) (2011), Snowbird, Utah
8. Barrett, C., Stump, A., Tinelli, C., et al.: The SMT-LIB Standard: Version 2.0. In: Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (2010)
9. Beckert, B., Nestler, B., Kiefer, M., Selzer, M., Ulbrich, M.: Experience Report: Formal Methods in Material Science. CoRR abs/1802.02374 (2018)
10. Benz, F., Hildebrandt, A., Hack, S.: A Dynamic Program Analysis to Find Floating-Point Accuracy Problems. In: Programming Language Design and Implementation (PLDI) (2012)
11. Beyer, D.: Advances in automatic software verification: SV-COMP 2020. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2020)
12. Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: A Static Analyzer for Large Safety-Critical Software. In: Programming Language Design and Implementation (PLDI) (2003)
13. Boldo, S., Clément, F., Filliâtre, J.C., Mayero, M., Melquiond, G., Weis, P.: Wave Equation Numerical Resolution: A Comprehensive Mechanized Proof of a C Program. Journal of Automated Reasoning (4) (2013)
14. Boldo, S., Filliâtre, J.C., Melquiond, G.: Combining Coq and Gappa for Certifying Floating-Point Programs. In: Intelligent Computer Mathematics (2009)
15. Boldo, S., Melquiond, G.: Flocq: A Unified Library for Proving Floating-Point Algorithms in Coq. In: IEEE Symposium on Computer Arithmetic (ARITH) (2011)
16. Brain, M., Tinelli, C., Rümmer, P., Wahl, T.: An Automatable Formal Semantics for IEEE-754 Floating-Point Arithmetic. In: IEEE Symposium on Computer Arithmetic (ARITH) (2015)
17. Brain, M., Schanda, F., Sun, Y.: Building Better Bit-Blasting for Floating-Point Problems. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2019)
18. Chapman, R., Schanda, F.: Are We There Yet? 20 Years of Industrial Theorem Proving with SPARK. In: Interactive Theorem Proving (ITP) (2014)
19. Chen, L., Miné, A., Cousot, P.: A Sound Floating-Point Polyhedra Abstract Domain. In: Asian Symposium on Programming Languages and Systems (APLAS) (2008)
20. Chiang, W.F., Gopalakrishnan, G., Rakamaric, Z., Solovyev, A.: Efficient Search for Inputs Causing High Floating-point Errors. In: Principles and Practice of Parallel Programming (PPoPP) (2014)
21. Cimatti, A., Griggio, A., Schaafsma, B., Sebastiani, R.: The MathSAT5 SMT Solver. In: Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2013)
22. Cok, D.R.: OpenJML: JML for Java 7 by extending OpenJDK. In: NASA Formal Methods (2011)
23. Cordeiro, L.C., Kesseli, P., Kroening, D., Schrammel, P., Trtík, M.: JBMC: A Bounded Model Checking Tool for Verifying Java Bytecode. In: Computer Aided Verification (CAV) (2018)
24. Cuoq, P., Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-C. In: Software Engineering and Formal Methods (SEFM) (2012)
25. Darulova, E., Izycheva, A., Nasir, F., Ritter, F., Becker, H., Bastian, R.: Daisy - Framework for Analysis and Optimization of Numerical Programs. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2018)
26. Darulova, E., Kuncak, V.: Towards a Compiler for Reals. TOPLAS (2) (2017)
27. De Moura, L., Bjørner, N.: Z3: An Efficient SMT Solver. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2008)
28. Eilers, M., Müller, P.: Nagini: A Static Verifier for Python. In: Computer Aided Verification (CAV) (2018)
29. Filliâtre, J.C., Paskevich, A.: Why3 - Where Programs Meet Provers. In: European Symposium on Programming (ESOP) (2013)
30. Fox, A., Harrison, J., Akbarpour, B.: A Formal Model of IEEE Floating Point Arithmetic. HOL4 Theorem Prover Library (2017), https://github.com/HOL-Theorem-Prover/HOL/tree/master/src/floating-point
31. Fumex, C., Marché, C., Moy, Y.: Automating the Verification of Floating-Point Programs. In: Verified Software: Theories, Tools, and Experiments (VSTTE) (2017)
32. Gao, S., Kong, S., Clarke, E.M.: dReal: An SMT Solver for Nonlinear Theories over the Reals. In: Automated Deduction – CADE-24 (2013)
33. Ge, Y., de Moura, L.: Complete Instantiation for Quantified Formulas in Satisfiabiliby Modulo Theories. In: Computer Aided Verification (CAV) (2009)
34. Goubault, E., Putot, S.: Static Analysis of Finite Precision Computations. In: Verification, Model Checking, and Abstract Interpretation (VMCAI) (2011)
35. Goubault, E., Putot, S.: Robustness Analysis of Finite Precision Implementations. In: Asian Symposium on Programming Languages and Systems (APLAS) (2013)
36. Harel, D., Kozen, D., Tiuryn, J.: Dynamic Logic. In: Handbook of Philosophical Logic, pp. 99–217. Springer (2001)
37. Harrison, J.: Floating Point Verification in HOL Light: The Exponential Function. Formal Methods in System Design (3) (2000)
38. IEEE, C.S.: IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2008 (2008)
39. Izycheva, A., Darulova, E., Seidl, H.: Counterexample and Simulation-Guided Floating-Point Loop Invariant Synthesis. In: Static Analysis Symposium (SAS) (2020)
40. Jacobs, B., Smans, J., Philippaerts, P., Vogels, F., Penninckx, W., Piessens, F.: VeriFast: A Powerful, Sound, Predictable, Fast Verifier for C and Java. In: NASA Formal Methods (NFM) (2011)
41. Jacobsen, C., Solovyev, A., Gopalakrishnan, G.: A Parameterized Floating-Point Formalizaton in HOL Light. Electronic Notes in Theoretical Computer Science (2015)
42. Jeannet, B., Miné, A.: Apron: A Library of Numerical Abstract Domains for Static Analysis. In: Computer Aided Verification (CAV) (2009)
43. Lam, M.O., Hollingsworth, J.K., Stewart, G.W.: Dynamic Floating-point Cancellation Detection. Parallel Comput. (3) (2013)
44. Leavens, G.T., Baker, A.L., Ruby, C.: Preliminary design of JML: A behavioral interface specification language for Java. ACM SIGSOFT Software Engineering Notes (3) (2006)
45. Leavens, G.T., Cheon, Y.: Design by Contract with JML (2006),
46. Leino, K.R.M.: This is Boogie 2 (June 2008),
47. Magron, V., Constantinides, G., Donaldson, A.: Certified Roundoff Error Bounds Using Semidefinite Programming. ACM Trans. Math. Softw. (4) (2017)
48. Marché, C., Paulin-Mohring, C., Urbain, X.: The KRAKATOA tool for certification of Java/JavaCard programs annotated in JML. The Journal of Logic and Algebraic Programming (1) (2004)
49. McCormick, J.W., Chapin, P.C.: Building High Integrity Applications with SPARK. Cambridge University Press (2015)
50. Meyer, B.: Applying “Design by Contract”. Computer (10) (1992)
51. Moscato, M., Titolo, L., Dutle, A., Muñoz, C.: Automatic Estimation of Verified Floating-Point Round-Off Errors via Static Analysis. In: SAFECOMP (2017)
52. Muller, J., Brisebarre, N., de Dinechin, F., Jeannerod, C., Lefèvre, V., Melquiond, G., Revol, N., Stehlé, D., Torres, S.: Handbook of Floating-Point Arithmetic. Birkhäuser (2010)
53. Müller, P., Schwerhoff, M., Summers, A.J.: Viper: A Verification Infrastructure for Permission-Based Reasoning. In: Verification, Model Checking, and Abstract Interpretation (VMCAI) (2016)
54. Pasareanu, C.S., Mehlitz, P.C., Bushnell, D.H., Gundy-Burlet, K., Lowry, M.R., Person, S., Pape, M.: Combining unit-level symbolic execution and system-level concrete execution for testing NASA software. In: International Symposium on Software Testing and Analysis (ISSTA) (2008)
55. Siegel, S.F., Mironova, A., Avrunin, G.S., Clarke, L.A.: Using Model Checking with Symbolic Execution to Verify Parallel Numerical Programs. In: International Symposium on Software Testing and Analysis (ISSTA) (2006)
56. Solovyev, A., Jacobsen, C., Rakamaric, Z., Gopalakrishnan, G.: Rigorous Estimation of Floating-Point Round-off Errors with Symbolic Taylor Expansions. In: Formal Methods (FM) (2015)

A Appendix
A.1 Axioms for Transcendental Functions in KeY
Here we present the axioms that we implemented to prove properties for benchmarks with transcendental functions:
 – If arg is NaN or an infinity, then sin(arg) is NaN.
 – If arg is zero, then the result of sin(arg) is a zero with the same sign as arg.
 – If arg is not NaN or infinity, then the returned value of sin is between −1.0 and 1.0.
 – If arg is not NaN or infinity, then the returned value of sin is not NaN.
 – If arg is NaN or an infinity, then cos(arg) is NaN.
 – If arg is not NaN or infinity, then the returned value of cos is between −1.0 and 1.0.
 – If arg is not NaN or infinity, then the returned value of cos is not NaN.
 – If arg is NaN or an infinity, then atan(arg) is NaN.
 – If arg is zero, then the result of atan(arg) is a zero with the same sign as arg.
 – If arg is not NaN, then the returned value of atan is between −π/2 and π/2.

In our Evaluation we showed that handling square root axiomatically can improve performance. Here is the list of axioms we used for this function:
 – If arg is NaN or less than zero, then sqrt(arg) is NaN.
 – If arg is positive infinity, then sqrt(arg) is positive infinity.
 – If arg is positive zero or negative zero, then sqrt(arg) is the same as arg.
 – If arg is not NaN and greater or equal to zero, then sqrt(arg) is not NaN.
 – If arg is not infinity and is greater than one, then sqrt(arg) < arg.

A.2 Detailed Evaluation Results
Here we present the tables that did not fit in the main body of the paper and contain detailed results of our experiments. In each table we show the number of goals per benchmark that each solver can validate or invalidate, together with the average and maximum time (in seconds) needed. ‘TO’ in the maximum column denotes that at least one goal timed out. The goals resulting in timeout were excluded from the computation of the average time.

Table 3 shows the results for benchmarks with valid contracts with the quantified formulas included in the SMT translations. We have summarized this table in row 1 of Table 2.

Table 4 demonstrates the results for the same benchmarks when the quantified axioms are removed from the SMT translations, which is summarized in row 2 of Table 2.

Table 5 shows the detailed results of the experiments with benchmarks with invalid contracts, when the quantified formulas are included in and removed from the SMT translations. These results are summarized in rows 3 and 4 of Table 2.
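The convention for the average and maximum columns described above can be sketched as follows. This is only an illustration of the stated rule, not the script used for the evaluation; the class GoalTimeSummary, its method names, and the sample times are hypothetical. A timed-out goal is modeled as an empty OptionalDouble.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.OptionalDouble;

public class GoalTimeSummary {
    // Average over the goals that finished; empty if every goal timed out
    // (shown as '-' in the tables).
    static OptionalDouble averageTime(List<OptionalDouble> times) {
        return times.stream()
                .filter(OptionalDouble::isPresent)
                .mapToDouble(OptionalDouble::getAsDouble)
                .average();
    }

    // "TO" if at least one goal timed out, otherwise the largest time.
    static String maxColumn(List<OptionalDouble> times) {
        if (times.stream().anyMatch(t -> !t.isPresent())) {
            return "TO";
        }
        double max = times.stream()
                .mapToDouble(OptionalDouble::getAsDouble)
                .max()
                .orElse(Double.NaN); // only reached for an empty goal list
        return String.format(Locale.ROOT, "%.1f", max);
    }

    public static void main(String[] args) {
        // Hypothetical goal times in seconds; the third goal timed out.
        List<OptionalDouble> times = Arrays.asList(
                OptionalDouble.of(1.0), OptionalDouble.of(3.0), OptionalDouble.empty());
        OptionalDouble avg = averageTime(times);
        String avgColumn = avg.isPresent()
                ? String.format(Locale.ROOT, "%.1f", avg.getAsDouble()) : "-";
        System.out.println(avgColumn + " " + maxColumn(times));
    }
}
```

Under this convention, an average shown next to ‘TO’ describes only the goals that terminated, so it can understate the cost of a benchmark on which a solver frequently times out.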
Table 3: Summary of valid goals proved and running times of each solver for the SMT translations with quantified axioms

Table 4: Summary of valid goals proved and running times of each solver for the SMT translations without quantified axioms

benchmark
with quantified axioms
Matrix3.transposedEq 0 1 0 0 - TO 0 0 - TO - - - -
Rectangle.scale(2) 12 4 12 0 12.2 TO 8 0 4.6 TO - - - -
Complex.add(2) 2 2 2 0 0.6 0.7 2 0 1.4 TO - - - -
FPLoop.fploop2 3 1 3 0 0.9 1.7 3 0 0.5 TO - - - -
FPLoop.fploop3 3 1 3 0 0.4 1.7 3 0 0.3 TO - - - -
without quantified axioms
Matrix3.transposedEq 0 1 0 1 170.2 170.2 0 0 - TO 0 1 16.2 16.2
Rectangle.scale(2) 12 4 12 3 12.2 TO 12 3 108.2 TO 12 4 2.4 9.5
Complex.add(2) 2 2 2 2 0.5 0.5 2 2 0.7 1.0 2 2 0.2 0.2
FPLoop.fploop2 3 1 3 1 0.4 0.6 3 1 0.9 1.7 3 1 0.3 0.5
FPLoop.fploop3 3 1 3 1 0.3 0.6 3 1 0.6 1.7 3 1 0.2 0.4
Table 5: Summary of invalid goals proved and running times of each solver for the SMT translations with and without quantified axioms

benchmark
fp.sqrt
Cartesian.toPolar 4 4 6.9 7.5 1 23.5 TO 4 1.2 1.7
Cartesian.distanceTo 1 1 8.2 8.2 0 - TO 1 1.0 1.0
Circuit.instantCurrent 2 2 123.5 127.5 0 - TO 0 - TO
axiomatized sqrt
Cartesian.toPolar 4 4 2.0 2.9 4 49.81 163.0 4 1.0 1.6
Cartesian.distanceTo 1 1 2.7 2.7 1 233.0 233.0 1 1.0 1.0
Circuit.instantCurrent 2 0 - TO 0 - TO 0 (2 CE) 11.1 13.8
Table 6: Summary statistics for benchmarks containing the square root function, with quantified formulas removed from the SMT-LIB translation

The first two sections of Table 7 show the results from applying the two approaches for handling transcendental functions in Sections 3.2.A and 3.2.B, using the default SMT translation in KeY. The last section of the table depicts the results of applying the approach in Section 3.2.B, while the quantified formulas are removed from the SMT translations. This table is summarized in rows 5 and 6 of Table 2.

Table 6 shows the detailed results of conducting the experiment on the benchmarks containing sqrt, comparing the approach from Section 3.2.B (adding the necessary axioms, resp. taclet rules) to using the square root implemented in SMT solvers (fp.sqrt), when the quantified formulas are removed from the SMT translations. We have summarized the results of these experiments in rows 7 and 8 of Table 2.

benchmark
axioms in SMT-LIB translation
Cartesian.toPolar 4 4 7.1 9.2 1 16.8 TO - - -
Polar.toCartesian 2 2 0.9 0.9 2 69.7 95.7 - - -
Circuit.instantCurrent 2 1 123.6 TO 0 - TO - - -
Circuit.instantVoltage 2 2 1.1 1.1 1 103.8 TO - - -
axioms as taclet rules in KeY with quantified formulas
Cartesian.toPolar 4 4 6.8 7.7 1 40.0 TO - - -
Polar.toCartesian 2 2 1.4 1.9 1 288.0 TO - - -
Circuit.instantCurrent 2 2 123.8 128.3 0 - TO - - -
Circuit.instantVoltage 2 2 1.3 1.3 0 - TO - - -
axioms as taclet rules in KeY without quantified formulas
Cartesian.toPolar 4 4 6.9 7.5 1 23.5 TO 4 1.2 1.7
Polar.toCartesian 2 2 1.5 2.3 2 52.6 81.2 2 0.6 0.8
Circuit.instantCurrent 2 2 123.5 127.5 0 - TO 0 - TO
Circuit.instantVoltage 2 2 1.5 1.7 2 146.4 160.7 2 0.8 0.8

Table 7: Summary statistics with axioms in SMT-LIB translations and as taclet rules in KeY
Sensitivity to Contract Variations
We conducted an experiment on our Rectangle.scale benchmark to assess the solvers’ sensitivity to various changes, applied to the benchmark’s contract or its implementation. We considered the following modifications:
 – v : is the original version of the benchmark (Listing 1.1 using the second contract) and our baseline;
 – v : reduces the number of classes involved to two, while keeping the same functionality;
 – v : reduces the number of classes involved to one, while keeping the same functionality;
 – v : modifies v such that variable bounds in the precondition become more “complicated” in terms of longer fractional parts (e.g. the bounds for arg2 become [3.0000001, -6.4000000003] instead of [3.0001, -6.4000003]);
 – v : simplifies the mathematical expression of v (fewer arithmetic operations);
 – v : modifies v such that arg2 has a tighter bound, i.e. the interval width is smaller;
 – v : modifies v such that arg2 has a larger bound, i.e. the interval width is larger;
 – v : modifies v such that only arg2 has a “complicated” bound;
 – v : modifies v such that arg2 has a tighter bound.

Table 8: SMT solvers summary statistics for various versions of the Rectangle benchmark with quantified axioms in the SMT translations

Table 8 summarizes the results for this experiment. With the quantified formulas included in the SMT translation, both CVC4 and Z3 are able to prove more goals when the number of classes is reduced, and also when the number of arithmetic operations is reduced. Z3 further seems to be sensitive to whether variable bounds are “complicated” or not, whereas CVC4 is not. We obtain a somewhat surprising result when arg2 has a tighter bound. While Z3’s performance improves, CVC4 validates two fewer goals. On the other hand, increasing the bounds on arg2 does not seem to make a difference.

It seems that arg2 is the bottleneck for this benchmark; when only arg2 has a “complicated” input interval, CVC4 proves fewer goals. Finally, constraining arg2arg2