Deductive Verification of Floating-Point Java Programs in KeY

Abstract

Deductive verification has been successful in verifying interesting properties of real-world programs. One notable gap is the limited support for floating-point reasoning. This is unfortunate, as floating-point arithmetic is particularly unintuitive to reason about due to rounding as well as the presence of the special values infinity and `Not a Number' (NaN). In this paper, we present the first floating-point support in a deductive verification tool for the Java programming language. Our support in the KeY verifier handles arithmetic via floating-point decision procedures inside SMT solvers and transcendental functions via axiomatization. We evaluate this integration on new benchmarks, and show that this approach is powerful enough to prove the absence of floating-point special values -- often a prerequisite for further reasoning about numerical computations -- as well as certain functional properties for realistic benchmarks.


Deductive Verification of Floating-Point Java Programs in KeY
TACAS Artifact Evaluation badge: Consistent, Complete, Well Documented, Easy to Reuse, Evaluated

Rosa Abbasi, Jonas Schiffl, Eva Darulova, Mattias Ulbrich, and Wolfgang Ahrendt
MPI-SWS, Kaiserslautern and Saarbrücken, Germany, {rosaabbasi,eva}@mpi-sws.org
Karlsruhe Institute of Technology, Karlsruhe, Germany, {jonas.schiffl,ulbrich}@kit.edu
Chalmers University of Technology, Göteborg, Sweden, [email protected]


Keywords: Deductive Verification · Floating-point Arithmetic · Transcendental Functions.

Deductive verification has been successful in providing functional verification for programs written in popular programming languages such as Java [3, 22, 40, 48], Python [28], Rust [5], C [24, 53], and Ada [18, 49]. Deductive verifiers allow a user to annotate methods in a program with pre- and postconditions, from which they automatically generate verification conditions (VCs). These are then either proven directly by the verifier itself, or discharged with external tools such as automated (SMT) solvers or interactive proof assistants.

While deductive verifiers fully implement many sophisticated data representations (including heap data structures, objects, and ownership), support for floating-point numbers remains rather limited: solely Frama-C and SPARK offer automated support for floating-point arithmetic in C and Ada [31]. This state of affairs is at least partially a result of previous limitations in floating-point support in SMT solvers. Consequently, deductive verification has been used for floating-point programs only by experts with considerable manual effort [14, 31]. This is unfortunate as it makes deductive verification unavailable for a large number of programs across many domains including embedded systems, machine learning, and scientific computing. With the increasing need for parallelization in code, scientific computing specifically has recently experienced algorithmic challenges for which formal methods may contribute to a solution [9, 55].

One of the main challenges of floating-point arithmetic is its unintuitive behavior and the special values that the IEEE 754 standard [38] introduces. For instance, an overflow or a division by zero results in the special value (positive or negative) infinity, and not a runtime exception. Similarly, invalid operations like sqrt(-1.0) result in a Not a Number (NaN) value. These special values are problematic as seemingly straightforward identities do not hold (x == x or x * 0.0 == 0.0). In addition, every operation on floating-point numbers potentially involves rounding, which compromises familiar rules like associativity and distributivity. Hence, reasoning support for writing correct floating-point programs is indispensable.

Abstract interpretation-based tools can prove the absence of runtime errors and special values [19, 42], and bound roundoff errors due to floating-point's finite precision [10, 20, 25, 35, 56]. SMT decision procedures [17] or SAT-based model-checking [23, 55], on the other hand, can prove intricate properties requiring bit-precise reasoning. However, these techniques and tools largely support only purely floating-point programs or program snippets, or analyze programs only up to a predefined depth of the call stack. General reasoning about real-world object-oriented programs, however, also requires support for features such as the (unbounded) heap, necessitating different analyses which need to be combined with floating-point reasoning.

Handling floating-points in a deductive verifier has unique advantages. First, the deductive verification approach already comes with the infrastructure for reasoning about complex control and data structures (like exception handling and heap). Second, it allows one to flexibly combine the verifier's symbolic execution reasoning with external decision procedures. Third, depending on the theory support, the verifier or external solver may also generate counterexamples of a property and thus help program debugging, something an abstract interpretation-based approach fundamentally cannot provide.

We report on adding floating-point support to the KeY deductive verifier, providing the first automated deductive floating-point support for the Java programming language. We focus mainly on proving the absence of the special values infinity and NaN. While these are helpful in certain circumstances, for most applications they signal an error. Hence, showing their absence is a prerequisite for further (functional) reasoning. That said, our extension also allows one to express and discharge arbitrary functional properties expressible in floating-point arithmetic, including bounds on roundoff errors for certain programs, and bounds on differences between two similar floating-point programs.

We exploit both KeY's symbolic execution and external SMT support. On the one hand, we handle arithmetic operations by relying on a combination of KeY's symbolic execution to handle the heap and SMT based decision procedures to handle the floating-point part of the VCs. On the other hand, we support transcendental functions via axiomatization in the KeY prover itself.

Transcendental functions such as sine are a common feature in numerical programs, but are not supported by floating-point decision procedures. We explore two ways of supporting them soundly but approximately, by encoding them as axiomatized uninterpreted function symbols once directly in the SMT queries, and once in additional calculus rules in KeY. Our evaluation shows that even though such reasoning is approximate, it is nonetheless sufficient to prove the absence of special values in many interesting programs.

We evaluate KeY's floating-point support on a number of real-world floating-point Java programs. Our benchmark set allows us to evaluate recent progress in SMT floating-point support in Z3 [27], CVC4 [7] and MathSAT [21] on yet unseen benchmarks. For instance, we observe that quantifiers are challenging even if they do not affect satisfiability of SMT queries. Our benchmarks are openly available, and we expect our insights to be useful for further solver development.
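The failing identities mentioned above are easy to reproduce in plain Java. The following is a minimal sketch; the class and method names are hypothetical and chosen only for illustration, they are not part of KeY or of the benchmarks.

```java
/** Illustrative only: class and method names are hypothetical. */
final class SpecialValueIdentities {

    /** x == x is false exactly when x is NaN. */
    static boolean equalsItself(double x) {
        return x == x;
    }

    /** x * 0.0 == 0.0 fails for NaN and for the infinities (infinity * 0.0 is NaN). */
    static boolean timesZeroIsZero(double x) {
        return x * 0.0 == 0.0;
    }

    /** Checks whether addition happens to associate for the given operands. */
    static boolean addsAssociatively(double a, double b, double c) {
        return (a + b) + c == a + (b + c);
    }

    public static void main(String[] args) {
        // Invalid operations and overflow produce special values, not exceptions.
        System.out.println(Math.sqrt(-1.0));         // NaN
        System.out.println(Double.MAX_VALUE * 2.0);  // Infinity

        // Familiar identities break on special values or because of rounding.
        System.out.println(equalsItself(Double.NaN));                    // false
        System.out.println(timesZeroIsZero(Double.POSITIVE_INFINITY));   // false
        System.out.println(addsAssociatively(0.1, 0.2, 0.3));            // false
    }
}
```

None of these statements throws, which is why the absence of special values has to be established by reasoning rather than by exception handling.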

Contributions

In summary, we make the following contributions:
– we implement and evaluate the first automated deductive verification of floating-point Java programs by combining the strength of rule based and SMT based deduction;
– we collect a new set of challenging real-world floating-point benchmarks in Java (available at https://gitlab.mpi-sws.org/AVA/key-float-benchmarks/);
– we compare different SMT solvers for discharging floating-point VCs on this new set of benchmarks;
– and we develop novel automated support for reasoning about transcendental functions in a deductive verifier.

KeY [3] is a platform for deductive verification of Java programs, working at a source code level. The input is a Java program annotated in the Java Modeling Language (JML) [44], encouraging a Design by Contract ([45, 50]) approach to software development. The user specifies the expected behavior of Java classes with class invariants that the program has to maintain at critical points. Methods are specified with method contracts, consisting mainly of pre- and postconditions, with the understanding that if the precondition holds when the method is called, the postcondition has to hold after the method returns.

After loading an annotated program, KeY translates it to a formula in Java Dynamic Logic [3] (JavaDL), an instance of Dynamic Logic [36] which enables logical reasoning about Java programs. Logical rules are provided for the translation of programs into first-order logic, and for closing the resulting goals, or proof obligations. KeY is semi-interactive in that it allows manual rule application, while also offering powerful built-in automation and macros. In addition, it is also possible to translate an open goal into SMT-LIB format [8] and call an external SMT solver. For specific theories, SMT solvers can be much more efficient than KeY's own automation. This makes it possible to prove some goals, which depend on SMT supported theories, by using an SMT solver, while others are proved internally, using KeY's own automation.

In the following, we summarize some central characteristics of Java floating-point numbers, loosely following [52]. Each normal floating-point number $x$ can be represented as a triplet $(s, m, e)$, such that $x = (-1)^s \cdot m \cdot 2^e$, where $s \in \{0, 1\}$ is the sign, $m$ (called significand) is a binary fixed-point number with one digit before the radix point and $p - 1$ digits after the radix point (note that $1 \le m < 2$), and $e$ (exponent) is an integer such that $e_{min} \le e \le e_{max}$. Java supports two floating-point formats (both in base 2): float ('single') precision with $p = 24$, and minimal and maximal exponent $e_{min} = -126$, $e_{max} = 127$, and double precision with $p = 53$, $e_{min} = -1022$, $e_{max} = 1023$.

Whenever the result of a computation cannot be exactly represented with the given precision, it is rounded. IEEE 754 defines various rounding modes, of which Java only supports round to nearest, ties to even. Rounding is exact, as if one would first compute the ideal real number, and round afterwards.

The triple representation gives us two zeros, $+0$ and $-0$, represented by $(0, 0, 0)$ and $(1, 0, 0)$, respectively. If the absolute value of the ideal result of a computation is too small to be representable as a floating-point number of the given format, the resulting floating-point number is $+0$ or $-0$. In addition, there are three special values, $+\infty$, $-\infty$, and NaN (Not a Number). If the absolute value of the ideal result of a computation is too big to be representable as a floating-point number of the given format, the result is $+\infty$ or $-\infty$. Also, division by zero will give an infinite result (e.g., $1.0 / {+0} = +\infty$). Computing further with infinity may give an infinite result (e.g., $+\infty + +\infty = +\infty$), but may also result in the additional 'error value' NaN (e.g., $+\infty - +\infty = \text{NaN}$). Due to the presence of infinities and NaN, floating-point operations do not throw Java exceptions.

By default, the Java virtual machine is allowed to make use of higher-precision formats provided by the hardware. This can make computation more accurate, but it also leads to platform dependent behaviour. This can be avoided by using the strictfp modifier, ensuring that only the single and double precision types are used. This modifier ensures portability.
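The representation and the special-value arithmetic described above can be inspected with standard library calls. The sketch below is illustrative only: the class FloatFormatDemo and its decompose helper are hypothetical names, while Math.getExponent, Math.scalb, Double.doubleToLongBits and the MIN_EXPONENT / MAX_EXPONENT constants are standard java.lang API.

```java
import java.util.Arrays;

/** Illustrative only: class and method names are hypothetical. */
final class FloatFormatDemo {

    /** Splits a normal double x into {s, m, e} with x = (-1)^s * m * 2^e and 1 <= m < 2. */
    static double[] decompose(double x) {
        long bits = Double.doubleToLongBits(x);
        double s = (double) (bits >>> 63);        // sign bit: 0 or 1
        int e = Math.getExponent(x);              // unbiased exponent
        double m = Math.scalb(Math.abs(x), -e);   // exact scaling by a power of two
        return new double[] { s, m, e };
    }

    public static void main(String[] args) {
        // Exponent ranges of the two Java formats.
        System.out.println(Float.MIN_EXPONENT + " " + Float.MAX_EXPONENT);    // -126 127
        System.out.println(Double.MIN_EXPONENT + " " + Double.MAX_EXPONENT);  // -1022 1023

        // -6.0 = (-1)^1 * 1.5 * 2^2
        System.out.println(Arrays.toString(decompose(-6.0)));                 // [1.0, 1.5, 2.0]

        double inf = 1.0 / 0.0;                      // division by zero yields infinity
        System.out.println(inf);                     // Infinity
        System.out.println(inf + inf);               // Infinity
        System.out.println(inf - inf);               // NaN
        System.out.println(Double.MIN_VALUE / 2.0);  // 0.0 (underflow to zero)
        System.out.println(1.0 / -0.0);              // -Infinity (the zeros are signed)
    }
}
```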

In order to be able to specify and verify programs containing floating-point numbers, we made several extensions to the KeY tool. First, we added the float and double types to the KeY type system, together with an enum type for the different rounding modes of the IEEE 754 Standard.

We further introduced functions and predicate symbols to formalize operations (+, *, ...) and comparisons (<, ==, ...) on floating-point expressions. The translation supports both code with and without the strictfp modifier. However, since the actual precision of non-strictfp operations is not known, the corresponding function symbols remain uninterpreted. We extended KeY's parser to correctly handle programs and annotations containing floating-point numbers, and added logic rules for translating floating-point expressions from Java or JML to JavaDL.

As an example, Listing 1.1 shows JML specifications of our Rectangle benchmark that contains floating-point literals and makes use of the fp_nan and fp_nice predicates. fp_nan states that a floating-point expression is NaN, and fp_nice states that it is neither NaN nor infinity. The scale method contains two contracts that are checked separately, ensuring that the class fields of a scaled rectangle object are not NaN, considering different preconditions. For the first contract, the SMT solver produces a counterexample. In the second, we bound inputs by concrete ranges that we picked arbitrarily and get the valid result. In practice, such ranges would come from the context, e.g. from the kind of rectangles that appear in an application, or from known ranges of sensor values.

Listing 1.1: The Rectangle.scale benchmark

    /*@ public normal_behavior
      @  requires \fp_nice(arg0.x) && \fp_nice(arg0.y)
      @        && \fp_nice(arg1) && \fp_nice(arg2);
      @  ensures !\fp_nan(\result.x) && !\fp_nan(\result.y) &&
      @          !\fp_nan(\result.width) && !\fp_nan(\result.height);
      @ also
      @ public normal_behavior
      @  requires -5.53 <= arg0.x && arg0.x <= -3.38 &&
      @           -5.53 <= arg0.y && arg0.y <= -3.38 &&
      @           3.1 < arg0.width && arg0.width <= 3.7332 &&
      @           3.0000001 < arg0.height && arg0.height <= 4.0004 &&
      @           3.0003001 < arg1 && arg1 <= 4.0024 &&
      @           -6.4000003 < arg2 && arg2 <= 3.0001;
      @  ensures !\fp_nan(\result.x) && !\fp_nan(\result.y) &&
      @          !\fp_nan(\result.width) && !\fp_nan(\result.height);
      @*/
    public Rectangle scale(Rectangle arg0, double arg1, double arg2) {
        Area v1 = new Area(arg0);
        AffineTransform v2 = AffineTransform.getScaleInstance(arg1, arg2);
        Area v3 = v1.createTransformedArea(v2);
        Rectangle v4 = v3.getRectangle2D();
        return v4;
    }
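To make the predicates concrete, the sketch below mirrors fp_nan, fp_infinite and fp_nice as ordinary Java methods and shows one way in which "nice" inputs can still produce NaN. The class FpPredicates and the scaledExtent helper are hypothetical and are not taken from KeY or from the Rectangle benchmark; the inputs are our own and are not the counterexample reported by the SMT solver.

```java
/** Illustrative only: Java mirrors of the JML predicates, with hypothetical names. */
final class FpPredicates {

    static boolean fpNan(double x) {
        return Double.isNaN(x);
    }

    static boolean fpInfinite(double x) {
        return Double.isInfinite(x);
    }

    /** "Nice" means neither NaN nor infinity. */
    static boolean fpNice(double x) {
        return !fpNan(x) && !fpInfinite(x);
    }

    /** Hypothetical scaling step: extent of the interval [min, max] after scaling by factor. */
    static double scaledExtent(double min, double max, double factor) {
        return max * factor - min * factor;
    }

    public static void main(String[] args) {
        // All inputs are nice, but both products overflow and infinity - infinity is NaN.
        double unbounded = scaledExtent(1.0e308, 1.5e308, 10.0);
        System.out.println(fpNan(unbounded));   // true

        // With inputs bounded to small ranges the result stays nice.
        double bounded = scaledExtent(-5.0, -3.5, 4.0);
        System.out.println(fpNice(bounded));    // true
    }
}
```

This is the reason the second contract in Listing 1.1 restricts the inputs to concrete ranges: excluding special values in the inputs alone does not exclude them in the outputs.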


Concerning discharging the resulting proof obligations, there were two main ways to consider. One is to create a floating-point theory within KeY by adding axioms and deduction rules, so that the desired properties can be proven in KeY's sequent calculus. The other way is to translate the proof obligations from JavaDL to SMT-LIB and call an external SMT solver. While the KeY approach traditionally favors conducting proofs within KeY, for this work, we partially deviated from this way in order to harness the greater experience and efficiency of SMT solvers when it comes to floating-point arithmetic. Our approach attempts to get the best of both worlds by distinguishing between basic floating-point arithmetic, i.e., elementary operations and comparisons, and more complex functions which do not have an SMT-LIB equivalent (e.g., the transcendental functions), or where the SMT-LIB function is not usefully implemented by current SMT solvers (see Section 3.2.B).

Elementary operations and comparisons get translated to the corresponding SMT-LIB functions. In SMT-LIB, all floating-point computations conform to the IEEE 754 Standard. Therefore, only Java programs with the strictfp modifier can be directly translated to SMT-LIB without loss of correctness.

We developed a translation from KeY's floating-point theory to SMT-LIB. In order to integrate it into KeY, we also overhauled the existing translation from JavaDL to SMT-LIB to create a new, more modular framework, which now supports all the features of the original translation, e.g., heaps and integer arithmetic, but also floating-point expressions at the same time.

Floating-point intricacies sometimes require extra caution. For example, there are two different notions of equality for floats: bitwise equality and IEEE 754 equality. Our implementation ensures these are distinguished correctly, and that the specification language remains intuitive for a developer to use.

Using the translation to SMT-LIB, we can specify and prove two classes of properties in KeY: The absence of special values is specified using the fp_nan and fp_infinite predicates (or the fp_nice equivalent). Furthermore, one can specify functional properties that are expressible in floating-point arithmetic, e.g. one can compare the result of a computation against the result of a different program which is known to produce a good result or a reference value.
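The two notions of equality can be observed directly in Java. The sketch below uses hypothetical helper names; how KeY maps each notion to its logic is not shown here. The == operator follows IEEE 754 equality, whereas a comparison of Double.doubleToLongBits values (which is also what Double.equals uses) is bitwise.

```java
/** Illustrative only: class and method names are hypothetical. */
final class FloatEquality {

    /** IEEE 754 equality: +0 equals -0, and NaN is not equal to anything. */
    static boolean ieeeEquals(double a, double b) {
        return a == b;
    }

    /** Bitwise equality on the canonical bit pattern: +0 differs from -0, NaN equals NaN. */
    static boolean bitwiseEquals(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        System.out.println(ieeeEquals(0.0, -0.0));                 // true
        System.out.println(bitwiseEquals(0.0, -0.0));              // false
        System.out.println(ieeeEquals(Double.NaN, Double.NaN));    // false
        System.out.println(bitwiseEquals(Double.NaN, Double.NaN)); // true
    }
}
```

The two notions disagree exactly on the signed zeros and on NaN, which is where a specification language has to be explicit about which one it means.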

Floating-point decision procedures in SMT solvers successfully handle programs consisting of arithmetic and square root operations. Many numerical real-world programs, however, include transcendental functions such as sin and cos. In Java programs, these functions are implemented as static library functions in the class java.lang.Math.

Unlike arithmetic operations, transcendental functions are much more loosely specified by the IEEE 754 Standard: only an upper bound on the roundoff error is given. Libraries are thus free to provide different implementations, and even tighter error bounds. Exact reasoning in the same spirit as floating-point arithmetic would thus have to encode a specific implementation. Given that these implementations are highly optimized, this approach would be arguably complex. We observe, however, that such exact reasoning about transcendental functions is often not necessary and a sound approximate approach is sufficient and efficient.

In this section, we introduce an axiomatic approach for reasoning about programs containing transcendental functions. We observe that with the flexibility of deductive verification and KeY itself, we can instantiate it in two different ways. We encode transcendental functions as uninterpreted functions and axiomatize them in the SMT queries. Alternatively, we encode these axioms in KeY as logical inference rules.

(A) Axiomatization in SMT

We encode library functions as uninterpreted functions and include a set of axioms in the SMT-LIB translation for each method that is called in a benchmark. That is, we extended KeY such that when a transcendental function exists in the proof obligation, its definition alongside all the axioms for that function are added to the translation.

For the axiomatization of transcendentals, we did not add rules that expand to a definition or allow a repeated approximation of the function value (like expansion into a Taylor series). Instead, we added a number of lemmata encoding interesting properties related to special values. For instance, the following axiom states that if the input to the sin function is not a NaN or infinity, then the returned value of sin is between −1.0 and 1.0:

    (assert (forall ((a Float64)) (=>
      (and (not (fp.isNaN a)) (not (fp.isInfinite a)))
      (and (fp.leq (sinDouble a) (fp

Note that this implies that the result is not a NaN or infinity. The other axioms are similar in spirit, so we do not list them.

These axioms are expressed as quantified floating-point formulas and capture high-level properties of library functions complying with the specifications in the IEEE 754 Standard. Clearly, since we do not have the actual implementations of these functions, we are not able to prove arbitrary properties. However, such an axiomatization is often sufficient to check for the (absence of) special values, i.e. NaN and infinity, as our experiments in Section 4.4 show.

(B) Taclets in KeY

Reasoning about quantified formulas in SMT is a long-lasting challenge [33]. We have also observed in our experiments with only arithmetic operations (Section 4.3) that SMT solvers struggle with quantifiers in combination with floating-points. We have therefore implemented an alternative approach encoding the axioms not in the SMT queries, but instead as deductive inference rules (so-called taclets) in KeY.

The rules encode the same logical information as the universally quantified assertions that we add in SMT-LIB (and where we leave the choice of instantiations entirely to the SMT/SAT solver). With our taclet approach, we instantiate a quantifier (only) to one's needs. We note that for proving a property correct, this results in a correct (under)approximation. However, the price for achieving more closed proofs and shorter running times is that for disproving a property, not considering all possible quantifier instantiations may lead to spurious counterexamples, i.e., false positives.

A heuristic strategy applies the rules automatically using the occurrences of transcendentals as instantiation triggers. However, instantiating the axioms too eagerly considerably increases the number of open goals, which is why we assume that the user selects the axioms to apply manually (and did so in the experiments). After the application the proof obligation can either be closed, i.e., proven, by KeY automatically, or be given to the SMT solver as before for final solving.

Currently, the set of axioms (in the SMT-LIB translation and as taclets in KeY) only contains axioms for the transcendental functions occurring in our benchmarks. So far we have axioms; however, adding more axioms (also for further transcendentals like exponentiation or logarithm) is straightforward. The full set of axioms is included in the Appendix of the technical report.

Table 1: Benchmark details and KeY automode statistics, time is measured in seconds

We collected a set of existing floating-point Java programs representing real-world applications in order to evaluate the feasibility and performance of KeY's floating-point support.

The left half of Table 1 provides an overview of our benchmarks. Each benchmark consists of one method, which is composed of arithmetic operations
and method calls to potentially other classes. The invocations of methods from java.lang.Math (e.g. Math.abs) are marked by "+1" in Table 1; these are resolved by inlining the method implementation. For benchmarks that contain calls to transcendental functions and square root, the called functions are listed; these are handled by our axiomatization. We include sqrt in this list, as we have observed that exact support can be expensive, so it may be advantageous to handle sqrt axiomatically. Benchmarks Rectangle, Circuit, Matrix3 and Rotation are partially shown in Listings 1.1, 1.2, 1.3 and 1.4 respectively.

Each benchmark also includes a JML contract that is to be checked. For some methods, we specify two contracts (marked by "(2)" in the first column of Table 1), each serving as an independent benchmark. The contracts for most of these benchmarks check that the methods do not return a special value, i.e., infinity and/or NaN, the preconditions being that the variables are not themselves special values and possibly are bounded in a given range. For the Matrix, FPLoop and Rotate benchmarks, we check a functional property (see Section 4.3). FPLoop, which has three contracts, additionally shows how to specify floating-point loop behavior using loop invariants.

Listing 1.2: The Circuit.instantCurrent benchmark

    public class Circuit {
        double maxVoltage, frequency, resistance, inductance;
        // ...

        /*@ public normal_behavior
          @  requires
          @  ensures !\fp_nan(\result) && !\fp_infinite(\result);
          @*/
        public double instantCurrent(double time) {
            Complex current = computeCurrent();
            double maxCurrent = Math.sqrt(current.getRealPart() * current.getRealPart() +
                                          current.getImaginaryPart() * current.getImaginaryPart());
            double theta = Math.atan(current.getImaginaryPart() / current.getRealPart());
            return maxCurrent * Math.cos((2.0 * Math.PI * frequency * time) + theta);
        }
    }
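To see where special values can enter a computation like the one in Listing 1.2, and why a range property such as "the result of cos on a nice input lies between −1.0 and 1.0" is already useful, consider the following simplified sketch. It is not the benchmark itself: computeCurrent is not shown in the listing, so the real and imaginary parts are passed in as parameters, and all names are hypothetical.

```java
/** Illustrative only: a simplified stand-in for Circuit.instantCurrent with hypothetical names. */
final class CircuitSketch {

    /** Same arithmetic as the listing, with the complex current given as (re, im). */
    static double instantCurrent(double re, double im, double frequency, double time) {
        double maxCurrent = Math.sqrt(re * re + im * im);
        double theta = Math.atan(im / re);
        return maxCurrent * Math.cos((2.0 * Math.PI * frequency * time) + theta);
    }

    /** Spot check of the range property that the axioms state for nice inputs. */
    static boolean cosWithinUnitRange(double a) {
        double c = Math.cos(a);
        return -1.0 <= c && c <= 1.0;
    }

    public static void main(String[] args) {
        // 0.0 / 0.0 is NaN, and NaN propagates through atan, cos and the product.
        System.out.println(Double.isNaN(instantCurrent(0.0, 0.0, 50.0, 0.01)));            // true

        // re * re overflows, so maxCurrent and the result are infinite.
        System.out.println(Double.isInfinite(instantCurrent(1.0e200, 1.0e200, 50.0, 0.0))); // true

        // Moderate inputs give a finite result: a finite maxCurrent times a value in [-1, 1].
        System.out.println(Double.isFinite(instantCurrent(3.0, 4.0, 50.0, 0.01)));          // true
    }
}
```

Once the precondition rules out the division 0.0 / 0.0 and the overflow in the square root argument, the range property for cos is enough to conclude that the final product is neither NaN nor infinite, without knowing how cos is implemented.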

To reason about the contract of a selected benchmark, we apply KeY, which generates proof obligations or 'goals'. Some of these goals (heap-related) are closed by KeY automatically. The remaining open goals are closed by either SMT solvers with floating-point support directly (Section 3.1 and Section 3.2.A), or with a combination of transcendental KeY taclets and floating-point SMT solving (Section 3.2.B).

Columns 6 and 7 in Table 1 show the number of proof obligations closed by KeY directly and to be discharged by external solvers, respectively. The next two columns show the number of taclet rules that KeY applied in order to close its goals, and the time this takes. For benchmarks with two contracts we show the respective values separated by '/'.

We run our experiments on a server with 1.5 TB memory and 4x12 CPU cores at 3 GHz. However, KeY runs single-threaded and does not use more than 8 GB of memory.

For our set of benchmarks, the symbolic execution process is fully automated. Note that the machinery can deal with loop invariants, if they are provided. Loop invariant generation is, however, particularly challenging for floating-point programs due to roundoff errors [26, 39], and a research topic in itself.

Previous work [31] reported that SMT support for floating-point arithmetic is rather limited. However, with recent advances [17], we evaluate the situation again. Most benchmarks used to evaluate SMT solvers' decision procedures [1] aim to check (individual) specialized (corner case) properties of floating-point arithmetic. The proof obligations generated from our set of benchmarks are complementary in that they are more arithmetic heavy, while nonetheless relying on accurate reasoning about special values and functional properties.

For each open goal not automatically closed, KeY generates one SMT-LIB file that is fed to the solvers for validation. We compare the performance of the three major SMT solvers with floating-point support: CVC4 [7] (version 1.8, with the SymFPU library [17] enabled), Z3 (4.8.9) [27] and MathSAT (5.6.3) [21]. For this we set a timeout of 300 s for each proof obligation. While KeY is able to discharge proof obligations in parallel, for our experiments, we do so sequentially to maintain comparability.

KeY's default translation to SMT includes quantifiers. These quantifications are not related to floating-point arithmetic, but are used to logically encode important properties of the Java memory model, like the type hierarchy and the absence of dangling references on any valid Java heap. If we reason about floating-point problems in isolation, they are not needed, but if we want to consider Java verification more holistically with questions combining aspects of heap and floating-point reasoning, they become essential. We manually inspected that the proof obligations without our axiomatized treatment of transcendental functions do not depend on these properties and investigate the quantifier support by including or removing them from the SMT translations. We do not report results with quantifiers for MathSAT, since it does not support them.

Table 2 summarizes the results of our experiments. Column 4 shows the number of expected valid or invalid goals for all benchmarks. For each solver we show the number of goals that the solver can validate or invalidate, together with the average time (in seconds) needed. The goals resulting in timeout were excluded from the computation of the average time. Column 3 shows whether the SMT queries include quantifiers or not.

Table 2: Summary of valid / invalid goals correctly decided and average running times of each solver for the SMT translations with and without quantified axioms

    index  experiment         quantified  #goals  CVC4          Z3            MathSAT
                              axioms              goals  time   goals  time   goals  time
    1      valid contracts    ✓           80      79     4.1    25     18.4   -      -
    2      valid contracts    ✗           80      79     4.0    52     35.0   80     8.8
    3      invalid contracts  ✓
    4      invalid contracts  ✗
    5                         ✓           10      9      33.2   4      63.4   -      -
    6      axioms as taclets  ✗           10      10     33.4   5      74.2   8      0.9
    7      fp.sqrt            ✗
    8                         ✗

Fig. 1: Runtimes for valid goals with SMT translations with quantifiers (x-axis: goal; y-axis: time in seconds, log scale; solvers: CVC4, Z3)

Fig. 2: Runtimes for valid goals with SMT translations without quantifiers (x-axis: goal; y-axis: time in seconds, log scale; solvers: CVC4, MathSAT, Z3)

Rows 1 and 2 of Table 2 show the results for benchmarks with valid contracts. This experiment thus represents the common behavior of KeY, whose main goal is to prove contracts correct. Rows 3 and 4 of Table 2 demonstrate the results for benchmarks with invalid contracts, i.e. for those we expect a counterexample for at least one of the goals. The Appendix (Section A) contains the detailed results for each experiment separated by benchmark. Figure 1 and Figure 2 show a more detailed view of the solvers' running time for the valid benchmarks. The x-axis shows the number of open goals that are discharged by the SMT solvers, sorted by running time for each solver individually. The k-th point of one graph shows the minimum running time needed by the solver to close each of the k fastest goals. Note that each solver may have different goals which are its k fastest. The y-axis shows the time on a logarithmic scale.

We conclude that in the presence of quantified axioms and floating-point arithmetic, solvers' performance deteriorates for both valid and invalid goals. In particular, none of the solvers is able to find counterexamples for any of the invalid goals. However, when the quantified axioms are removed from the SMT translations, their performance improves. For valid contracts, CVC4 and MathSAT perform better than Z3, in terms of both number of goals validated and the running time per goal. In particular, MathSAT is able to prove all goals. However, the running time performance of CVC4 is better than MathSAT's. For invalid contracts, solvers are able to produce the expected counterexamples at least partially. Particularly, MathSAT has a better performance than CVC4 and Z3 in terms of both running time and the number of proof obligations for which it can produce counterexamples.

We conducted another experiment on our Rectangle.scale benchmark to assess the solvers' sensitivity to various changes, applied to the benchmark's contract or its implementation. We considered modifications such as reducing the number of classes while keeping the same functionality, having tighter and larger bounds for variables, reducing the number of arithmetic operations etc. The details of this experiment can be found in the Appendix of the technical report. In summary, solvers' performance seems to be sensitive to slight, innocuous-looking changes such as the number of classes involved and variable bounds. For example, constraining arg2 in the original benchmark more tightly allows CVC4 to validate all goals (1 more). This behavior could be potentially exploited by e.g. relaxing a variable's bounds.

Proving Functional Properties

Listings 1.3 and 1.4 show examples of functional properties that are expressible in floating-point arithmetic and that KeY can handle. The verification results are included in rows 1 and 2 of Table 2; for more details see the Appendix of the technical report.

For Matrix, we check that the determinants of a matrix and its transpose are equal. Note that this property holds trivially under real arithmetic, but not necessarily under floating-point arithmetic. After feeding transposedEq (which uses the determinant method) and its contract to KeY, increasing the default timeout sufficiently and discharging the created goal, CVC4 generates a counterexample in 170.2 seconds and MathSAT in 16.2 seconds. Z3 times out after 30 minutes. By feeding transposedEqV2 (which uses the determinantNew method) to KeY, CVC4 validates the contract in 1.1 seconds, MathSAT in 3.9 seconds and Z3 times out again. One thing worth noting is that the way programs are written can greatly influence the computational complexity needed to reject or verify the contract. This is evident from the fact that slightly modifying the order of operations (using determinantNew instead) substantially reduces verification time and changes the verification result for MathSAT and CVC4.

For Rotate, we check that the difference between an original vector and the one that is rotated four times by 90 degrees is not larger than 1.0E-15. We also verified the same bound for the relative difference (by exploiting another method and contract) for this benchmark. The constant cos90 in Listing 1.4 is not precisely 0.0 to account for rounding effects in the computation of the cosine.

FPLoop includes three loops, for which the contracts check that the return value is bigger than a given constant.

Though not always very fast, these examples show that verification of functional floating-point properties is viable.

Listing 1.3: The Matrix3 benchmark

    public class Matrix3 {
        double a, b, c, d, e, f, g, h, i; // The matrix: [[a b c],[d e f],[g h i]]
        double det;

        // method transpose not shown

        double determinant() {
            return (a * e * i + b * f * g + c * d * h)
                 - (c * e * g + b * d * i + a * f * h);
        }

        double determinantNew() {
            return (a * (e * i) + (g * (b * f) + c * (d * h)))
                 - (e * (c * g) + (i * (b * d) + a * (f * h)));
        }

        /*@ ensures \fp_normal(\result) ==> (\result == det); @*/
        double transposedEq() {
            det = determinant();
            return transpose().determinant();
        }

        /*@ ensures \fp_normal(\result) ==> (\result == det); @*/
        double transposedEqV2() {
            det = determinantNew();
            return transpose().determinantNew();
        }
    }
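The effect of the reordering can be made plausible without a solver. IEEE 754 addition and multiplication are commutative but not associative. In determinantNew every product and sum is explicitly parenthesised, and transposition only swaps the operands of those binary operations, so both calls compute the same value. In determinant, the left-to-right products are regrouped by transposition, for example (c * d) * h becomes (d * h) * c. The following sketch illustrates this; the class name Matrix3Sketch, its constructor, the body of transpose() (which the paper does not show) and the sample values are assumptions made for illustration only.

```java
// Illustrative sketch. Matrix3Sketch, its constructor and transpose() are
// assumptions; only the two determinant bodies follow Listing 1.3.
final class Matrix3Sketch {
    final double a, b, c, d, e, f, g, h, i; // [[a b c],[d e f],[g h i]]

    Matrix3Sketch(double a, double b, double c,
                  double d, double e, double f,
                  double g, double h, double i) {
        this.a = a; this.b = b; this.c = c;
        this.d = d; this.e = e; this.f = f;
        this.g = g; this.h = h; this.i = i;
    }

    // Hypothetical transpose: swaps b<->d, c<->g and f<->h.
    Matrix3Sketch transpose() {
        return new Matrix3Sketch(a, d, g, b, e, h, c, f, i);
    }

    // Products are evaluated left to right, so transposition changes how
    // the three-factor products and the sums are associated.
    double determinant() {
        return (a * e * i + b * f * g + c * d * h)
             - (c * e * g + b * d * i + a * f * h);
    }

    // Fully parenthesised: transposition only swaps operands of binary
    // + and *, which is value-preserving because both are commutative.
    double determinantNew() {
        return (a * (e * i) + (g * (b * f) + c * (d * h)))
             - (e * (c * g) + (i * (b * d) + a * (f * h)));
    }

    public static void main(String[] args) {
        Matrix3Sketch m = new Matrix3Sketch(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0);
        Matrix3Sketch t = m.transpose();
        System.out.println("determinant:    " + m.determinant() + " vs " + t.determinant());
        System.out.println("determinantNew: " + m.determinantNew() + " vs " + t.determinantNew());
    }
}
```

Whether the two determinant values differ for a given matrix depends on the inputs; the two determinantNew values agree for any input whose result is not NaN.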

Listing 1.4: The Rotation benchmark

    public class Rotation {
        final static double cos90 = 6.123233995736766E-17;
        final static double sin90 = 1.0;

        // rotates a 2D vector by 90 degrees
        public static double[] rotate(double[] vec) {
            double x = vec[0] * cos90 - vec[1] * sin90;
            double y = vec[0] * sin90 + vec[1] * cos90;
            return new double[]{x, y};
        }

        /*@ requires (\forall int i; 0 <= i && i < vec.length;
          @           \fp_nice(vec[i]) && vec[i] > 1.0 && vec[i] < 2.0) && vec.length == 2;
          @ ensures \result[0] < 1.0E-15 && \result[1] < 1.0E-15;
          @*/
        public static double[] computeError(double[] vec) {
            double[] temp = rotate(rotate(rotate(rotate(vec))));
            return new double[]{Math.abs(temp[0] - vec[0]), Math.abs(temp[1] - vec[1])};
        }
    }
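To make the contract of Listing 1.4 concrete, the sketch below evaluates it at run time for a single input. It is only a test of one point, not a proof for all admissible vectors, which is what KeY establishes. The names RotationSketch, isNice and contractHolds are hypothetical, and reading \fp_nice as "neither NaN nor infinite" is an assumption of this sketch.

```java
// Illustrative sketch. RotationSketch, isNice and contractHolds are assumed
// names; rotate and computeError follow Listing 1.4.
final class RotationSketch {
    static final double COS90 = 6.123233995736766E-17; // constant from Listing 1.4
    static final double SIN90 = 1.0;

    // Assumed runtime analogue of \fp_nice: neither NaN nor infinite.
    static boolean isNice(double x) {
        return Double.isFinite(x);
    }

    // Rotates a 2D vector by 90 degrees.
    static double[] rotate(double[] vec) {
        double x = vec[0] * COS90 - vec[1] * SIN90;
        double y = vec[0] * SIN90 + vec[1] * COS90;
        return new double[]{x, y};
    }

    static double[] computeError(double[] vec) {
        double[] temp = rotate(rotate(rotate(rotate(vec))));
        return new double[]{Math.abs(temp[0] - vec[0]), Math.abs(temp[1] - vec[1])};
    }

    // Checks the precondition, then evaluates the postcondition for one input.
    static boolean contractHolds(double[] vec) {
        if (vec.length != 2) {
            throw new IllegalArgumentException("expected a 2D vector");
        }
        for (double v : vec) {
            if (!(isNice(v) && v > 1.0 && v < 2.0)) {
                throw new IllegalArgumentException("precondition violated: " + v);
            }
        }
        double[] err = computeError(vec);
        return err[0] < 1.0E-15 && err[1] < 1.0E-15;
    }

    public static void main(String[] args) {
        System.out.println(contractHolds(new double[]{1.5, 1.25}));
    }
}
```

For the input (1.5, 1.25) each product with COS90 is smaller than half a unit in the last place of the other operand, so every rotation step rounds back to the exact rotated vector and the error is zero.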

We evaluated the two approaches from Section 3.2.A on our set of benchmarks; rows 5 and 6 in Table 2 summarize the results. (The detailed results of these experiments are included in the Appendix of the technical report.) Note that both approaches are fully automated.

We conclude that the SMT solvers perform better when the axiomatization is applied at the KeY level. When axioms for transcendental functions are added to the SMT-LIB translation directly, Z3 validates 4 out of 10 goals. With the axiomatization at the KeY level, solvers are able to validate more goals (with quantified formulas removed from the SMT translations), e.g. Z3 is able to validate 5 goals and CVC4 can validate all. Therefore, it is preferable to apply the axioms on the KeY side via taclet rules.

All the solvers we have used in this work comply with the IEEE 754 standard and therefore have bit-precise support for the square root function. They provide bit-precise reasoning by effectively encoding the behavior of floating-point circuits over bitvectors (which is naturally expensive), together with different heuristics and abstractions to speed up solving time. However, depending on the property, we do not always need bit-precise reasoning, so we propose handling the square root function with the same taclet-based axiomatization as introduced in Section 3.2.B. To this end, we conducted an experiment on the benchmarks containing sqrt, comparing the approach from Section 3.2.B (adding the necessary axioms, resp. taclet rules) to using the square root implemented in SMT solvers (fp.sqrt). We chose to include only axioms specified in or inferred from the IEEE 754 standard (e.g. if the argument of the square root function is NaN or less than zero, then the square root results in NaN). The full set of axioms that we used is included in the Appendix of the technical report.

Rows 7 and 8 in Table 2 summarize the results for this experiment; the detailed results are included in the Appendix of the technical report.
We observed that for two out of the three benchmarks, the average running time of all solvers decreases using the axiomatized square root. Furthermore, Z3 is able to reason about more proof obligations with the axiomatized version. However, the success of this approach depends on the axioms added to KeY and may not always work if we do not have suitable axioms. For example, for the Circuit.instantCurrent benchmark (Listing 1.2), using the axiomatized square root, CVC4 is not able to validate the contract, but with fp.sqrt the contract is validated.

In summary, treating sqrt axiomatically can result in shorter solving times than performing bit-precise reasoning, but the approach may not always succeed when the axioms are not sufficient to prove a particular property.
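The square root axioms listed in Appendix A.1 are weaker than the bit-precise semantics: they constrain the result without fixing it. As a rough illustration, the sketch below checks those properties against Java's Math.sqrt for individual arguments. The class SqrtAxiomCheck and the method holdsFor are hypothetical names introduced here; the sketch is a sanity check on sample values and says nothing about how the taclet rules are implemented in KeY.

```java
// Illustrative sketch. SqrtAxiomCheck and holdsFor are assumed names.
final class SqrtAxiomCheck {
    // Returns true if Math.sqrt(arg) is consistent with the square root
    // axioms of Appendix A.1 for this particular argument.
    static boolean holdsFor(double arg) {
        double r = Math.sqrt(arg);
        if (Double.isNaN(arg) || arg < 0.0) {
            return Double.isNaN(r);                 // NaN or negative argument gives NaN
        }
        if (arg == Double.POSITIVE_INFINITY) {
            return r == Double.POSITIVE_INFINITY;   // +infinity maps to +infinity
        }
        if (arg == 0.0) {
            // +0.0 and -0.0 map to themselves; compare bit patterns to see the sign
            return Double.doubleToRawLongBits(r) == Double.doubleToRawLongBits(arg);
        }
        // remaining arguments are positive and finite
        if (Double.isNaN(r)) {
            return false;                           // result must not be NaN
        }
        return arg <= 1.0 || r < arg;               // sqrt(arg) < arg for arg > 1
    }

    public static void main(String[] args) {
        double[] samples = {Double.NaN, -1.0, Double.POSITIVE_INFINITY, 0.0, -0.0, 0.25, 2.0, 4.0};
        for (double s : samples) {
            System.out.println(s + " -> " + holdsFor(s));
        }
    }
}
```

A property that needs the exact rounded value of the square root cannot be derived from these constraints alone, which is consistent with the observation that the axiomatized approach does not always succeed.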

The experiments show that highly automated floating point program verification is viable for relevant properties (handling of special values and some functional properties), up to a certain level of complexity (given by the SMT solvers). The choices of which parts of a proof obligation are delegated to SMT, and how they are translated to SMT, are crucial for achieving effective and efficient program verification. Arithmetic operations proved to be more efficiently dealt with by delegation to SMT, whereas for transcendental functions, axiomatization and rule based treatment in the theorem prover, outside the SMT solver, performs clearly better.

Our implementation uses the floating-point SMT-LIB theory [16], which however does not handle transcendental functions, as their semantics is (library) implementation dependent. Some real-valued automated solvers do handle transcendental functions [4, 32], but to the best of our knowledge, the combination of floating-points and reals in SMT solvers is still severely limited.

None of the existing deductive verifiers support floating-point transcendental functions automatically. The Why3 deductive verification framework [29] has support for floating-point arithmetic, with front-ends for the C and Ada programming languages through Frama-C [24] and SPARK [18, 31], respectively. Why3 has back-end support for different SMT solvers, as well as interactive proof assistants like Coq. Until recently, Why3 would still discharge many interesting floating-point problems with the help of Coq, relying on significant user interaction. In later work [31] (in the context of floating-point verification for Ada programs), Why3 can achieve a higher degree of automation. Note, however, that the user is still required to add code assertions as well as 'ghost code' to a significant extent.

The Boogie intermediate verification language [46] also supports floating-point expressions, and targets Z3 for discharging proof obligations. In the Boogie community, it was observed that writing a specification in Boogie leads to decreases in SMT solver performance when compared to writing the goal in SMT-LIB directly, probably due to an inherent mixing of theories when using Boogie [2]. This matches our own experiences, and separation of theories should be considered an important task for the further development of floating-point verification.

Other deductive verifiers for Java have only rudimentary support for floating-points. VeriFast [40] treats floating-point operations as if they were real values, and OpenJML [22] parses programs with floating-point operations, but essentially treats float and double as uninterpreted sorts.

The Java category of the verification competition SV-COMP [11] contains a number of benchmarks that make use of floating-point variables. However, the focus of these benchmarks is usually not on arithmetical properties of expressions, but on the completeness of the Java language support. Amongst the participants of SV-COMP 2020, the Symbolic (Java) Pathfinder (SPF) [54] (and various extensions) and the Java Bounded Model Checker (JBMC) [23] support floating-point arithmetic. Besides being limited to exploring the state space up to a bounded depth, their constraint languages do not support quantifiers and abstracting of method calls, which are features that we have used in this work.

Floating-point arithmetic has also been formalized in several interactive theorem provers [15, 30, 41]. While one can prove intricate properties about floating-point programs [13, 14, 37], proofs using interactive provers are to a large part manual and require significant expertise.

Abstract interpretation based techniques can show the absence of special values in floating-point code fully automatically, and several abstract domains which are sound with respect to floating-point arithmetic exist [19, 42]. While the analysis itself is fully automated, applying it successfully to real-world programs in general requires adaptation to each program analyzed by end-users, e.g. the selection of suitable abstract domains or widening thresholds [12].

Besides showing the absence of special values, recent research has developed static analyses to bound floating-point roundoff errors [25, 34, 47, 51, 56]. These analyses currently work only for small arithmetic kernels and the tools in particular do not accept programs with objects.

Dynamic analyses generally scale well on real-world programs, but can only identify bugs (when given failure-triggering input), rather than proving correctness for all possible inputs. Executing a floating-point program together with a higher-precision one allows one to find inputs which cause large roundoff errors [10, 20, 43]. Ariadne [6] uses a combination of symbolic execution, real-valued SMT solving and testing to find inputs that trigger floating-point exceptions, including overflow and invalid operations. Our work subsumes this approach as the SMT solvers that we use can directly generate counterexamples, but more importantly, KeY is able to prove the absence of such exceptions.

By joining the forces of rule-based deduction and SAT-based SMT solving, we presented the first working floating-point support in a deductive verification tool for Java and thereby closed a remaining gap in KeY, which now supports full sequential Java. Our evaluation shows that for specifications dealing with value ranges and absence of NaN and infinity, our approach can verify realistic programs within a reasonable time frame. We observe that the floating-point support of the MathSAT and CVC4 solvers scales sufficiently for our benchmarks, as long as the queries do not include any quantifiers, and that our axiomatized approach for handling transcendental functions is best realized using calculus rules in KeY's internal reasoning engine. While our work is implemented within the KeY verifier, we expect our approach to be portable to other verifiers.

Acknowledgements

This research was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) project 387674182. The authors would like to thank Daniel Eddeland, who together with co-author W. Ahrendt performed prestudies which impacted the current work.

References

1. QF_FP SMT benchmarks. https://clc-gitlab.cs.uiowa.edu:2443/SMT-LIB-benchmarks/QF_FP (2019)
2. Slow verification of programs combining multiple floating point values (GitHub issue) (2019 (accessed May 11, 2020)), https://github.com/boogie-org/boogie/issues/109

3. Ahrendt, W., Beckert, B., Bubel, R., Hähnle, R., Schmitt, P.H., Ulbrich, M. (eds.): Deductive Software Verification - The KeY Book - From Theory to Practice, LNCS, vol. 10001. Springer (2016)
4. Akbarpour, B., Paulson, L.C.: MetiTarski: An Automatic Theorem Prover for Real-Valued Special Functions. Journal of Automated Reasoning (3) (2010)
5. Astrauskas, V., Müller, P., Poli, F., Summers, A.J.: Leveraging Rust Types for Modular Specification and Verification. In: Object-Oriented Programming Systems, Languages, and Applications (OOPSLA) (2019)
6. Barr, E.T., Vo, T., Le, V., Su, Z.: Automatic Detection of Floating-point Exceptions. In: Principles of Programming Languages (POPL) (2013)
7. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: Computer Aided Verification (CAV) (2011), Snowbird, Utah
8. Barrett, C., Stump, A., Tinelli, C., et al.: The SMT-LIB Standard: Version 2.0. In: Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (2010)
9. Beckert, B., Nestler, B., Kiefer, M., Selzer, M., Ulbrich, M.: Experience Report: Formal Methods in Material Science. CoRR abs/1802.02374 (2018)
10. Benz, F., Hildebrandt, A., Hack, S.: A Dynamic Program Analysis to Find Floating-Point Accuracy Problems. In: Programming Language Design and Implementation (PLDI) (2012)
11. Beyer, D.: Advances in automatic software verification: SV-COMP 2020. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2020)
12. Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: A Static Analyzer for Large Safety-Critical Software. In: Programming Language Design and Implementation (PLDI) (2003)
13. Boldo, S., Clément, F., Filliâtre, J.C., Mayero, M., Melquiond, G., Weis, P.: Wave Equation Numerical Resolution: A Comprehensive Mechanized Proof of a C Program. Journal of Automated Reasoning (4) (2013)
14. Boldo, S., Filliâtre, J.C., Melquiond, G.: Combining Coq and Gappa for Certifying Floating-Point Programs. In: Intelligent Computer Mathematics (2009)
15. Boldo, S., Melquiond, G.: Flocq: A Unified Library for Proving Floating-Point Algorithms in Coq. In: IEEE Symposium on Computer Arithmetic (ARITH) (2011)
16. Brain, M., Tinelli, C., Rümmer, P., Wahl, T.: An Automatable Formal Semantics for IEEE-754 Floating-Point Arithmetic. In: IEEE Symposium on Computer Arithmetic (ARITH) (2015)
17. Brain, M., Schanda, F., Sun, Y.: Building Better Bit-Blasting for Floating-Point Problems. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2019)
18. Chapman, R., Schanda, F.: Are We There Yet? 20 Years of Industrial Theorem Proving with SPARK. In: Interactive Theorem Proving (ITP) (2014)
19. Chen, L., Miné, A., Cousot, P.: A Sound Floating-Point Polyhedra Abstract Domain. In: Asian Symposium on Programming Languages and Systems (APLAS) (2008)
20. Chiang, W.F., Gopalakrishnan, G., Rakamaric, Z., Solovyev, A.: Efficient Search for Inputs Causing High Floating-point Errors. In: Principles and Practice of Parallel Programming (PPoPP) (2014)
21. Cimatti, A., Griggio, A., Schaafsma, B., Sebastiani, R.: The MathSAT5 SMT Solver. In: Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2013)
22. Cok, D.R.: OpenJML: JML for Java 7 by extending OpenJDK. In: NASA Formal Methods (2011)
23. Cordeiro, L.C., Kesseli, P., Kroening, D., Schrammel, P., Trtík, M.: JBMC: A Bounded Model Checking Tool for Verifying Java Bytecode. In: Computer Aided Verification (CAV) (2018)
24. Cuoq, P., Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-C. In: Software Engineering and Formal Methods (SEFM) (2012)
25. Darulova, E., Izycheva, A., Nasir, F., Ritter, F., Becker, H., Bastian, R.: Daisy - Framework for Analysis and Optimization of Numerical Programs. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2018)
26. Darulova, E., Kuncak, V.: Towards a Compiler for Reals. TOPLAS (2) (2017)
27. De Moura, L., Bjørner, N.: Z3: An Efficient SMT Solver. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (2008)
28. Eilers, M., Müller, P.: Nagini: A Static Verifier for Python. In: Computer Aided Verification (CAV) (2018)
29. Filliâtre, J.C., Paskevich, A.: Why3 - Where Programs Meet Provers. In: European Symposium on Programming (ESOP) (2013)
30. Fox, A., Harrison, J., Akbarpour, B.: A Formal Model of IEEE Floating Point Arithmetic. HOL4 Theorem Prover Library (2017), https://github.com/HOL-Theorem-Prover/HOL/tree/master/src/floating-point

31. Fumex, C., Marché, C., Moy, Y.: Automating the Verification of Floating-Point Programs. In: Verified Software: Theories, Tools, and Experiments (VSTTE) (2017)
32. Gao, S., Kong, S., Clarke, E.M.: dReal: An SMT Solver for Nonlinear Theories over the Reals. In: Automated Deduction – CADE-24 (2013)
33. Ge, Y., de Moura, L.: Complete Instantiation for Quantified Formulas in Satisfiabiliby Modulo Theories. In: Computer Aided Verification (CAV) (2009)
34. Goubault, E., Putot, S.: Static Analysis of Finite Precision Computations. In: Verification, Model Checking, and Abstract Interpretation (VMCAI) (2011)
35. Goubault, E., Putot, S.: Robustness Analysis of Finite Precision Implementations. In: Asian Symposium on Programming Languages and Systems (APLAS) (2013)
36. Harel, D., Kozen, D., Tiuryn, J.: Dynamic Logic. In: Handbook of Philosophical Logic, pp. 99–217. Springer (2001)
37. Harrison, J.: Floating Point Verification in HOL Light: The Exponential Function. Formal Methods in System Design (3) (2000)
38. IEEE, C.S.: IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2008 (2008)
39. Izycheva, A., Darulova, E., Seidl, H.: Counterexample and Simulation-Guided Floating-Point Loop Invariant Synthesis. In: Static Analysis Symposium (SAS) (2020)
40. Jacobs, B., Smans, J., Philippaerts, P., Vogels, F., Penninckx, W., Piessens, F.: VeriFast: A Powerful, Sound, Predictable, Fast Verifier for C and Java. In: NASA Formal Methods (NFM) (2011)
41. Jacobsen, C., Solovyev, A., Gopalakrishnan, G.: A Parameterized Floating-Point Formalizaton in HOL Light. Electronic Notes in Theoretical Computer Science (2015)
42. Jeannet, B., Miné, A.: Apron: A Library of Numerical Abstract Domains for Static Analysis. In: Computer Aided Verification (CAV) (2009)
43. Lam, M.O., Hollingsworth, J.K., Stewart, G.W.: Dynamic Floating-point Cancellation Detection. Parallel Comput. (3) (2013)
44. Leavens, G.T., Baker, A.L., Ruby, C.: Preliminary design of JML: A behavioral interface specification language for Java. ACM SIGSOFT Software Engineering Notes (3) (2006)
45. Leavens, G.T., Cheon, Y.: Design by Contract with JML (2006),

46. Leino, K.R.M.: This is Boogie 2 (June 2008),

47. Magron, V., Constantinides, G., Donaldson, A.: Certified Roundoff Error Bounds Using Semidefinite Programming. ACM Trans. Math. Softw. (4) (2017)
48. Marché, C., Paulin-Mohring, C., Urbain, X.: The KRAKATOA tool for certification of Java/JavaCard programs annotated in JML. The Journal of Logic and Algebraic Programming (1) (2004)
49. McCormick, J.W., Chapin, P.C.: Building High Integrity Applications with SPARK. Cambridge University Press (2015)
50. Meyer, B.: Applying "Design by Contract". Computer (10) (1992)
51. Moscato, M., Titolo, L., Dutle, A., Muñoz, C.: Automatic Estimation of Verified Floating-Point Round-Off Errors via Static Analysis. In: SAFECOMP (2017)
52. Muller, J., Brisebarre, N., de Dinechin, F., Jeannerod, C., Lefèvre, V., Melquiond, G., Revol, N., Stehlé, D., Torres, S.: Handbook of Floating-Point Arithmetic. Birkhäuser (2010)
53. Müller, P., Schwerhoff, M., Summers, A.J.: Viper: A Verification Infrastructure for Permission-Based Reasoning. In: Verification, Model Checking, and Abstract Interpretation (VMCAI) (2016)
54. Pasareanu, C.S., Mehlitz, P.C., Bushnell, D.H., Gundy-Burlet, K., Lowry, M.R., Person, S., Pape, M.: Combining unit-level symbolic execution and system-level concrete execution for testing NASA software. In: International Symposium on Software Testing and Analysis (ISSTA) (2008)
55. Siegel, S.F., Mironova, A., Avrunin, G.S., Clarke, L.A.: Using Model Checking with Symbolic Execution to Verify Parallel Numerical Programs. In: International Symposium on Software Testing and Analysis (ISSTA) (2006)
56. Solovyev, A., Jacobsen, C., Rakamaric, Z., Gopalakrishnan, G.: Rigorous Estimation of Floating-Point Round-off Errors with Symbolic Taylor Expansions. In: Formal Methods (FM) (2015)

A Appendix

A.1 Axioms for Transcendental Functions in KeY

Here we present the axioms that we implemented to prove properties for benchmarks with transcendental functions:
– If arg is NaN or an infinity, then sin(arg) is NaN.
– If arg is zero, then the result of sin(arg) is a zero with the same sign as arg.
– If arg is not NaN or infinity, then the returned value of sin is between -1.0 and 1.0.
– If arg is not NaN or infinity, then the returned value of sin is not NaN.
– If arg is NaN or an infinity, then cos(arg) is NaN.
– If arg is not NaN or infinity, then the returned value of cos is between -1.0 and 1.0.
– If arg is not NaN or infinity, then the returned value of cos is not NaN.
– If arg is NaN or an infinity, then atan(arg) is NaN.
– If arg is zero, then the result of atan(arg) is a zero with the same sign as arg.
– If arg is not NaN, then the returned value of atan is between -π/2 and π/2.

In our Evaluation we showed that handling square root axiomatically can improve performance. Here is the list of axioms we used for this function:
– If arg is NaN or less than zero, then sqrt(arg) is NaN.
– If arg is positive infinity, then sqrt(arg) is positive infinity.
– If arg is positive zero or negative zero, then sqrt(arg) is the same as arg.
– If arg is not NaN and greater or equal to zero, then sqrt(arg) is not NaN.
– If arg is not infinity and is greater than one, then sqrt(arg) < arg.

A.2 Detailed Evaluation Results

Here we present the tables that did not fit in the main body of the paper and contain detailed results of our experiments. In each table we show the number of goals per benchmark that each solver can validate or invalidate, together with the average and maximum time (in seconds) needed. 'TO' in the maximum column denotes that at least one goal timed out. The goals resulting in timeout were excluded from the computation of the average time.

Table 3 shows the results for benchmarks with valid contracts with the quantified formulas included in the SMT translations. We have summarized this table in row 1 of Table 2.

Table 4 demonstrates the results for the same benchmarks when the quantified axioms are removed from the SMT translations, which is summarized in row 2 of Table 2.

Table 5 shows the detailed results of the experiments with benchmarks with invalid contracts, when the quantified formulas are included in and removed from the SMT translations. These results are summarized in rows 3 and 4 of Table 2.

Table 3: Summary of valid goals proved and running times of each solver for the SMT translations with quantified axioms

Table 4: Summary of valid goals proved and running times of each solver for the SMT translations without quantified axioms

benchmark                 goals         CVC4                        Z3                          MathSAT
                          valid  inval. valid  inval. avg    max    valid  inval. avg    max    valid  inval. avg    max
with quantified axioms
Matrix3.transposedEq      0      1      0      0      -      TO     0      0      -      TO     -      -      -      -
Rectangle.scale(2)        12     4      12     0      12.2   TO     8      0      4.6    TO     -      -      -      -
Complex.add(2)            2      2      2      0      0.6    0.7    2      0      1.4    TO     -      -      -      -
FPLoop.fploop2            3      1      3      0      0.9    1.7    3      0      0.5    TO     -      -      -      -
FPLoop.fploop3            3      1      3      0      0.4    1.7    3      0      0.3    TO     -      -      -      -
without quantified axioms
Matrix3.transposedEq      0      1      0      1      170.2  170.2  0      0      -      TO     0      1      16.2   16.2
Rectangle.scale(2)        12     4      12     3      12.2   TO     12     3      108.2  TO     12     4      2.4    9.5
Complex.add(2)            2      2      2      2      0.5    0.5    2      2      0.7    1.0    2      2      0.2    0.2
FPLoop.fploop2            3      1      3      1      0.4    0.6    3      1      0.9    1.7    3      1      0.3    0.5
FPLoop.fploop3            3      1      3      1      0.3    0.6    3      1      0.6    1.7    3      1      0.2    0.4

Table 5: Summary of invalid goals proved and running times of each solver for the SMT translations with and without quantified axioms

benchmark                 goals     CVC4                          Z3                            MathSAT
                                    valid     avg       max       valid     avg       max       valid     avg       max
fp.sqrt
Cartesian.toPolar         4         4         6.9       7.5       1         23.5      TO        4         1.2       1.7
Cartesian.distanceTo      1         1         8.2       8.2       0         -         TO        1         1.0       1.0
Circuit.instantCurrent    2         2         123.5     127.5     0         -         TO        0         -         TO
axiomatized sqrt
Cartesian.toPolar         4         4         2.0       2.9       4         49.81     163.0     4         1.0       1.6
Cartesian.distanceTo      1         1         2.7       2.7       1         233.0     233.0     1         1.0       1.0
Circuit.instantCurrent    2         0         -         TO        0         -         TO        0 (2 CE)  11.1      13.8

Table 6: Summary statistics for benchmarks containing the square root function, with quantified formulas removed from the SMT-LIB translation

The first two sections of Table 7 show the results from applying the two approaches for handling transcendental functions in Sections 3.2.A and 3.2.B, using the default SMT translation in KeY. The last section of the table depicts the results of applying the approach in Section 3.2.B, while the quantified formulas are removed from the SMT translations. This table is summarized in rows 5 and 6 of Table 2.

Table 6 shows the detailed results of conducting the experiment on the benchmarks containing sqrt, comparing the approach from Section 3.2.B (adding the necessary axioms, resp. taclet rules) to using the square root implemented in SMT solvers (fp.sqrt), when the quantified formulas are removed from the SMT translations. We have summarized the results of these experiments in rows 7 and 8 of Table 2.

benchmark                 goals  CVC4                 Z3                   MathSAT
                                 valid  avg    max    valid  avg    max    valid  avg    max
axioms in SMT-LIB translation
Cartesian.toPolar         4      4      7.1    9.2    1      16.8   TO     -      -      -
Polar.toCartesian         2      2      0.9    0.9    2      69.7   95.7   -      -      -
Circuit.instantCurrent    2      1      123.6  TO     0      -      TO     -      -      -
Circuit.instantVoltage    2      2      1.1    1.1    1      103.8  TO     -      -      -
axioms as taclet rules in KeY with quantified formulas
Cartesian.toPolar         4      4      6.8    7.7    1      40.0   TO     -      -      -
Polar.toCartesian         2      2      1.4    1.9    1      288.0  TO     -      -      -
Circuit.instantCurrent    2      2      123.8  128.3  0      -      TO     -      -      -
Circuit.instantVoltage    2      2      1.3    1.3    0      -      TO     -      -      -
axioms as taclet rules in KeY without quantified formulas
Cartesian.toPolar         4      4      6.9    7.5    1      23.5   TO     4      1.2    1.7
Polar.toCartesian         2      2      1.5    2.3    2      52.6   81.2   2      0.6    0.8
Circuit.instantCurrent    2      2      123.5  127.5  0      -      TO     0      -      TO
Circuit.instantVoltage    2      2      1.5    1.7    2      146.4  160.7  2      0.8    0.8

Table 7: Summary statistics with axioms in SMT-LIB translations and as taclet rules in KeY

Sensitivity to Contract Variations

We conducted an experiment on our Rectangle.scale benchmark to assess the solvers' sensitivity to various changes, applied to the benchmark's contract or its implementation. We considered the following modifications:
– v : is the original version of the benchmark (Listing 1.1 using the second contract) and our baseline;
– v : reduces the number of classes involved to two, while keeping the same functionality;
– v : reduces the number of classes involved to one, while keeping the same functionality;
– v : modifies v such that variable bounds in the precondition become more "complicated" in terms of longer fractional parts (e.g. the bounds for arg2 become [3.0000001, -6.4000000003] instead of [3.0001, -6.4000003]);
– v : simplifies the mathematical expression of v (fewer arithmetic operations);
– v : modifies v such that arg2 has a tighter bound, i.e. the interval width is smaller;
– v : modifies v such that arg2 has a larger bound, i.e. the interval width is larger;
– v : modifies v such that only arg2 has a "complicated" bound;
– v : modifies v such that arg2 has a tighter bound.

Table 8: SMT solvers summary statistics for various versions of the Rectangle benchmark with quantified axioms in the SMT translations

Table 8 summarizes the results for this experiment. With the quantified formulas included in the SMT translation, both CVC4 and Z3 are able to prove more goals when the number of classes is reduced, and also when the number of arithmetic operations is reduced. Z3 further seems to be sensitive to whether variable bounds are "complicated" or not, whereas CVC4 is not. We obtain a somewhat surprising result when arg2 has a tighter bound. While Z3's performance improves, CVC4 validates two fewer goals. On the other hand, increasing the bounds on arg2 does not seem to make a difference.

It seems that arg2 is the bottleneck for this benchmark; when only arg2 has a "complicated" input interval, CVC4 proves fewer goals. Finally, constraining arg2arg2