An Abstraction-guided Approach to Scalable and Rigorous Floating-Point Error Analysis
Arnab Das, Ian Briggs, Ganesh Gopalakrishnan, Pavel Panchekha, Sriram Krishnamoorthy
University of Utah, USA; Pacific Northwest National Laboratory, USA
Abstract.
Automated techniques for rigorous floating-point round-off error analysis are important in areas including formal verification of correctness and precision tuning. Existing tools and techniques, while providing tight bounds, fail to analyze expressions with more than a few hundred operators, and thus cannot cover important practical problems. In this work, we present Satire, a new tool that sheds light on how scalability and bound-tightness can be attained through a combination of incremental analysis, abstraction, and judicious use of concrete and symbolic evaluation. Satire has handled problems exceeding 200K operators. We present Satire's underlying error analysis approach, its information-theoretic abstraction heuristics, and a wide range of case studies, with evaluation covering FFT, the Lorenz system of equations, and various PDE stencil types. Our results demonstrate the tightness of Satire's bounds, its acceptable runtime, and the valuable insights it provides.
Keywords: Floating-point arithmetic; Rigorous round-off analysis; Symbolic differentiation; Abstraction; Scalable analysis
1 Introduction

Floating-point arithmetic underlies many critical scientific and engineering applications. Round-off errors introduced during floating-point computations have resulted in egregious errors in applications ranging from statistical computing and econometric software [33,2] to critical defense systems [39]. To guard against such errors, formal verification tools are essential, and a basic capability in such tools is the ability to rigorously estimate the round-off error generated by floating-point expressions for given input intervals. This capability is also important in designing rigorous mixed-precision tuning systems (e.g., [9]).

A number of tools for rigorous round-off analysis have been developed: Fluctuat [14], Gappa [13], PRECiSA [49], Real2Float [31], Rosa [12], FPTaylor [47], Numerors [26], and tools specialized for cyber-physical systems [36,43], to name a few. Unfortunately, these tools cannot automatically handle floating-point expressions of thousands of operators that arise in even the simplest of practical situations, including recurrence equation solving schemes and algorithms such as the Fast Fourier Transform (FFT) and parallel prefix sum. Today's tools in this space can at best handle expressions with a few dozen operators. To tackle large sizes automatically, one must today employ abstract interpretation methods and/or theorem-proving methods [14,36,37,5]. The use of non-rigorous approaches that employ concrete evaluation on a reasonable sampling of the input space [42,21,38,28,35] or Monte-Carlo-style approaches [15,44] is also popular. While such approaches are ultimately inevitable, formal verification is essential for critical tasks such as aircraft collision avoidance verification [19]. A higher degree of automation can greatly help with these verification tasks.

An analysis method's ability to derive tight bounds is as important as its ability to handle large expression graphs.
Interval analysis provides an efficient and fast way to obtain error bounds; however, it inherently loses correlations between terms, resulting in quite pessimistic bounds. While affine analysis maintains correlations between error variables by keeping them formal (symbolic), it can also result in pessimistic bounds, especially for non-linear expressions such as x/(x + 1), where the non-linearity is introduced by division.

In this work, we introduce a scalable and rigorous approach for error analysis embedded in a new tool, namely, Satire (Scalable Abstraction-guided Technique for Incremental Rigorous analysis of round-off Errors). Satire supports scalable mixed-precision analysis and multi-output estimation of straight-line floating-point code. We demonstrate that Satire can perform error estimation for complex expression graphs involving thousands of operators, including FFTs, recurrence equations, and partial differential equation (PDE) solving.

The recently proposed method of rigorously computing symbolic Taylor forms, embodied in the FPTaylor tool [47], avoids problems associated with interval and affine methods. Moreover, as the tables of results in [47] and [26] show, FPTaylor also produces some of the tightest bounds when compared to the tools in its class. However, FPTaylor cannot handle large expression graphs. For example, for the discretized one-dimensional heat flow ("1D-heat") equation calculation, using a stencil whose temporal application is unrolled over 10 time steps, about 442 expression nodes are generated. FPTaylor takes about 8 hours simply to generate the Taylor form expression, following which its global optimizer fails to handle this expression. In contrast, Satire handles the aforesaid 1D-heat problem in a few seconds, generating tight bounds. Satire has also handled expressions with over 200K operators drawn from realistic examples. In addition, Satire generates tighter error bounds than FPTaylor on most of FPTaylor's examples. To give an early indication of
Satire's performance, Table 1 provides a sampling of examples (comprehensive evaluations appear in §6). The table reports execution times and absolute error bounds for Satire and FPTaylor, together with the maximum absolute error observed over 10 simulations using randomly selected data points from the input intervals.

[Table 1: execution times and absolute error bounds for Satire and FPTaylor on a sample of benchmarks, including jetEngine (35 operators), FFT-1024pt (43K operators), and Lorenz-20steps (307 operators).]

Lacking other means of easily obtaining tight rigorous bounds, these results provide a good indication of how close Satire's bounds are.
Satire only analyzes first-order error. To check whether we lose precision due to this, we ran FPTaylor on its own benchmarks with its second-order error analysis turned off, only to find that the resulting error values were still bit-identical. This shows that it can be very difficult to create examples where second-order error matters. In [29], the authors further emphasize this point by mentioning that they had to refine their second-order error analysis beyond the approach taken in FPTaylor.
Roadmap: In §2, we explain Satire's contributions at a high level, along with the essential background on floating-point arithmetic and the global optimization-based error analysis done in Satire; this helps better understand the remainder of the paper. In §3, we explain the incremental analysis adopted in Satire. §4 presents Satire's error analysis, with heuristics for abstraction and for improving global optimization/canonicalization detailed in §5. §6 evaluates Satire on many practical examples.

2 Background

A binary floating-point number system, F, is a subset of the real numbers representable in finite precision, expressed as a tuple (s, m, e, p), with p = 53 and 24 for double- and single-precision numbers, respectively; s ∈ {−1, 1} is the sign, m the mantissa, and e the exponent. We consider only normal numbers, where m ∈ [1, 2). A number x ∈ F has the value s · m · 2^e. If x ∈ R, then x̃ denotes an element of F closest to x, obtained by applying the rounding operator ◦ to x. The IEEE 754 standard defines four rounding modes for elementary floating-point operations. Every real number x lying in the range of F can be approximated by a member x̃ ∈ F with a relative error no larger than the unit round-off u = (1/2) · 2^(1−p). Here, 2^(1−p) represents the unit of least precision (ulp) for the exponent value of 1. Thus, to relate x̃ and x through the rounding operator: if x ∈ R lies in the range of F, then x̃ = ◦(x) = x(1 + δ), |δ| ≤ u.
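As an illustration (ours, not part of Satire), the rounding model x̃ = x(1 + δ), |δ| ≤ u can be checked exactly in Python using rationals, since conversion between `Fraction` and `float` is exact in one direction and correctly rounded in the other:

```python
from fractions import Fraction

# Unit round-off for IEEE double (p = 53): u = (1/2) * 2**(1-p) = 2**-53.
U = Fraction(1, 2**53)

def rounding_delta(x: Fraction) -> Fraction:
    """Exact |delta| such that round-to-double(x) = x * (1 + delta)."""
    x_tilde = Fraction(float(x))  # float(x) is correctly rounded; Fraction(float) is exact
    return abs(x_tilde - x) / abs(x)

# Every x in the normal range rounds with relative error at most u.
for x in (Fraction(1, 3), Fraction(22, 7), Fraction(1, 10)):
    assert rounding_delta(x) <= U
```

The assertions confirm that each sampled real rounds with relative error no larger than u, exactly as the model states.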
Given two exactly represented floating-point numbers x̃ and ỹ, the arithmetic operators ⊙ ∈ {+, −, ·, /} come with the following guarantee:

    ◦(x̃ ⊙ ỹ) = (x̃ ⊙ ỹ)(1 + δ),  |δ| ≤ u
             = (x̃ ⊙ ỹ) + (x̃ ⊙ ỹ)·δ
             = {real value} + {round-off error}    (1)

Thus, the rounded result lies in an interval centered around the exact real value, whose width is determined by the amount of round-off error accumulated. Besides the elementary operators, Satire supports other complex mathematical operations such as transcendentals. In such cases, the bound on the δ terms becomes a multiple of u. We keep a default configurable value for each operator, allowing the user to obtain customized error bounds for the specific math library being used.

Affine arithmetic [48] is often employed to keep errors correlated. A real number x̂ is represented in affine form as x̂ = x0 + Σ_{i=1}^n x_i·ε_i, where x0 denotes the central value and the x_i are finite floating-point numbers representing noise coefficients associated with the corresponding noise variables ε_i. The ε_i are formal variables whose values lie in [−1, 1] but are unknown until assigned. This representation allows converging paths to cancel out error terms and improve tightness. For example, (x − x) in affine analysis yields 0 because the noise variables cancel each other out. Unlike affine analysis, interval analysis keeps widening the intervals. Satire's approach can be viewed as "symbolic affine" (much like FPTaylor) in that the noise coefficients x_i are symbolic, as further clarified in §4.

Suppose an expression E defined over inputs X is being analyzed for round-off error, given that each x_i ∈ X is instantiated with input interval I_i ∈ I. Here, E is assumed to be represented as a DAG, and the process of analyzing E consists of obtaining the error at each interior DAG node i, represented by E_i, and then calculating how much this error contributes to the final error. Let Err_i be the error generated by subexpression E_i (details in §4). We use the ground forms of E_i and E (respectively) to denote E_i and E fully evaluated after variable substitutions, instantiating each x_i ∈ X with interval I_i ∈ I. We now explain how Satire ensures scalability and tight bounds.
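A toy contrast (our sketch, not Satire's representation) between interval and affine treatment of the classic x − x example; the affine form keeps the noise variable shared between both occurrences, so the subtraction cancels exactly:

```python
# Minimal interval arithmetic: subtraction forgets that both operands are the same x.
class Interval:
    def __init__(self, lo, hi): self.lo, self.hi = lo, hi
    def __sub__(self, o): return Interval(self.lo - o.hi, self.hi - o.lo)

# Minimal affine form: x0 + sum(coeffs[name] * eps_name), each eps in [-1, 1].
class Affine:
    def __init__(self, x0, coeffs): self.x0, self.coeffs = x0, dict(coeffs)
    def __sub__(self, o):
        names = set(self.coeffs) | set(o.coeffs)
        return Affine(self.x0 - o.x0,
                      {n: self.coeffs.get(n, 0.0) - o.coeffs.get(n, 0.0) for n in names})
    def range(self):
        r = sum(abs(c) for c in self.coeffs.values())
        return (self.x0 - r, self.x0 + r)

x_iv = Interval(1.0, 2.0)
d_iv = x_iv - x_iv                   # correlation lost: width doubles
assert (d_iv.lo, d_iv.hi) == (-1.0, 1.0)

x_af = Affine(1.5, {"e1": 0.5})      # the same set [1, 2], in affine form
assert (x_af - x_af).range() == (0.0, 0.0)   # shared noise terms cancel
```

Interval subtraction widens [1, 2] − [1, 2] to [−1, 1], while the affine form collapses x − x to exactly zero.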
Contributions Toward Scalability: Incremental Approach. Satire adopts an incremental approach while generating the error contribution of a node E_i to the final output. In other words, we perform a breadth-first walk of the expression DAG of E, enumerating subexpression nodes E_i. At each E_i, we perform first-order error analysis (see §4) of the total error generated at E_i to compute the error expression Err_i. We then multiply this error with the path strength of all the paths going from E_i to E. Let there be M such paths, with path strengths p_1, p_2, …, p_M. (The notion of "path strength" is explained in §4; basically, it is the value of the derivative of E with respect to E_i, calculated using symbolic reverse-mode automatic differentiation.) In many examples such as 1D-heat, M grows exponentially with the depth of E_i due to path reconvergence in expression DAGs.

In FPTaylor, the symbolic Taylor form ends up having the shape Σ_{j=1}^M (p_j · Err_i), which is huge for large M, as the error expression Err_i is replicated M times. This replication happens multiple times over, explaining why FPTaylor took 8 hours to generate its symbolic Taylor form for 1D-heat.

With Satire, we first compute Err_i as explained in §4. We then compute Err_i · Σ_{j=1}^M p_j, the product of the error with the effective path strength Σ_{j=1}^M p_j, taking all the paths from E_i to E into account. This form is significantly smaller in that it does not replicate Err_i a very large number of times, and it explains why Satire finished on 1D-heat in a few seconds even without using abstractions. Generating such factored forms also aids SimEngine [46] (detailed below under "canonicalizations").
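The size difference between the two forms can be made concrete with a small sketch (our illustration; SymPy stands in for Satire's SimEngine-based symbolic engine, and the error expression and path strengths are made-up stand-ins):

```python
import sympy as sp

x, y, d = sp.symbols("x y delta")
err_i = (x * y + x + y) * d                 # stand-in for a local error expression Err_i
paths = [x**j * y for j in range(1, 51)]    # M = 50 symbolic path strengths p_j

replicated = sum(p * err_i for p in paths)  # FPTaylor-style: Err_i appears in every term
factored = err_i * sum(paths)               # Satire-style: Err_i times the summed path strength

# The factored form carries Err_i once, so its operator count is far smaller.
assert sp.count_ops(replicated) > sp.count_ops(factored)
```

Both forms denote the same real value; only the factored one avoids replicating Err_i across all M paths, which is what keeps Satire's intermediate expressions manageable.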
Automated Introduction of Abstractions: During a bottom-up (forward) pass to build E, Satire performs symbolic execution to build the symbolic expression for each E_i. While doing so, Satire uses a heuristic to decide whether or not to automatically create an abstraction for E_i.

Fig. 1. Incremental error analysis using gradual abstraction. (Internal nodes carry local error terms; input and abstracted nodes carry total error terms; the first and second abstractions are introduced gradually.)

Figure 1 illustrates this incremental abstraction scheme. We employ a cost metric based on information theory [45] to obtain a set of cut-points in the abstract syntax tree (AST). Each of these cut-points serves as an E_i for which an abstraction is created. The selected cut-points then become free variables, carrying concrete error interval bounds (explained below) for the rest of the error analysis. In Figure 1, we introduce our first abstraction at height 2 for a stencil-type computation.

Clearly, there is a risk of losing variable correlations in the process. In §5, we point out that with well-chosen heuristics this risk can be minimized, still providing tight error estimates. These abstractions are key to Satire's performance on larger examples, and can be viewed as an appropriate tradeoff between scalability and bound tightness. By performing abstractions, the error analysis at each abstracted node E_i becomes a much more manageable process and finishes quickly. Second, we incrementally compute the error bounds of E_i with respect to the variables X_i mentioned in it, using a global optimization call that refers to the appropriate intervals I_i. This is a much smaller optimization query that finishes fast. The resulting error expression is a concrete value interval that is propagated upstream for further error analysis. In other words, during Satire's analysis, a mixture of symbolic and concrete error intervals is propagated across the expression DAG. This dramatically reduces overall error analysis complexity.
Contributions Toward Tight Bounds: Expression Canonicalization: A central step at every stage of Satire's operation is this: given an expression E and the intervals in which the variables of E lie, compute the tightest interval bound for E. Consider computing this for a simple expression such as x · x · x using an interval library such as Gaol. Fed in this raw form, the three occurrences of x are treated as independent quantities, producing a needlessly wide interval. Gaol supports pow, using which the above call can be written as pow(x, 3); this form conveys the information that the same x instance is being acted upon, allowing Gaol to obtain the tight bound. Not every expression reduces to a pow call, and therefore, in general, an external canonicalizer must be used to reduce the number of distinct occurrences of variables in an expression. SimEngine's "expand" strategy allows us to achieve this effect, which then allows the back-end optimizer Gelpia (which uses an interval branch-and-bound algorithm) to converge to tight bounds much more quickly. For the DQMOM example in Table 1, this was the main reason why Satire improved upon FPTaylor's bound (§6).
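SimEngine itself is Satire-specific, but the effect of an "expand"-style canonicalization can be sketched with SymPy (our illustration): rewriting an expression can reduce how many times each variable occurs, which is exactly what helps naive interval evaluation.

```python
import sympy as sp

x, y = sp.symbols("x y")
raw = (x + y) * (x - y)    # x and y each occur twice
canon = sp.expand(raw)     # x**2 - y**2: x and y each occur once

def occurrences(expr, var):
    """Count how many leaf occurrences of `var` appear in `expr`."""
    return sum(1 for node in sp.preorder_traversal(expr) if node == var)

assert canon == x**2 - y**2
assert occurrences(raw, x) == 2 and occurrences(canon, x) == 1
```

With x, y ∈ [−1, 1], naive interval evaluation of the raw product gives [−2, 2] · [−2, 2] = [−4, 4], while the canonical form x² − y² evaluates to the tighter [−1, 1]; fewer variable occurrences mean fewer spurious decorrelations.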
Avoid Impeding Canonicalization: Not only must explicit canonicalization be performed wherever possible, we also avoid taking steps that can block canonicalization. To be more specific, denote the sum of path strengths at expression site E_i, i.e., Σ_{j=1}^M p_j, by S_i. When we perform a depth-oriented traversal of the expression DAG, imagine generating expressions E_1, …, E_K (where the E_i are the subexpressions of E) with corresponding errors Err_1, …, Err_K. The final error at the output of E is Err = Σ_{i=1}^K S_i · Err_i. Let the final concrete error at the output of E be Err. There are two ways to arrive at Err: (1) as a concretization of Σ_{i=1}^K abs(S_i · Err_i); or (2) as a concretization of Σ_{i=1}^K abs(S_i) · abs(Err_i). While both are real-number equivalent, the latter expression blocks canonicalizations from occurring within SimEngine. All our results jumped from being worse than FPTaylor's to being better than FPTaylor's when we adopted the former form (the jump from Satire (UnOpt) to Satire in Table 1).

3 Incremental Error Analysis

In Satire, a floating-point straight-line program is parsed into an abstract syntax tree (AST), where each node of the AST is an operator (e.g., {+, −, ·, /}, or others such as exp and the transcendentals). (As can be seen, Satire is neither "interval based" nor "affine based"; it is best viewed as a judicious combination of interval and "symbolic-affine" analysis.) Figure 2 shows the AST for the
Satire is neither “interval based” nor “affine based”—it is bestviewed as a judicious combination of interval and “symbolic-affine” analysis. atire : Abstraction-guided Rigorous Floating-Point Error Analysis 7 + * + * * + S = v · . , E LR ( S )
AST for S = ( x · ( x + y ) + z ) · .
5, and the rebuilt AST with v abstracted. Theequations analyzed are v = x + y ; v = v · x ; v = v + z ; S = v · . expression S = ( x · ( x + y ) + z ) · .
5. Here, x , y , and z are the input variableswith intervals I ( x ), I ( y ), and I ( z ), respectively. In [12,24], the authors introducethe decomposition of errors into generated round-off errors and propagation ofincoming errors. We use E lr to denote the local error term associated with a node.The total error term for a node, denoted as E tr , is then the additive compositionof E lr and the propagation of the incoming error, E prop . We annotate eachinterior node in Figure 2 with a pair consisting of its symbolic function expressionand a symbolic local error term. An abstracted node is treating as an additionalinput, wherein we take the convention of using the variable name as the symbolicexpression for an input node, and instead of the local error, we employ the totalerror term (as detailed below these input nodes are not modeled).Consider node v with associated symbolic expression v · x . Its incomingerror is a symbolic expression denoted by E lr ( v ). From Equation 1: E tr ( v ) = E lr ( v ) + E prop ( v ) = E lr ( v ) + (cid:18) E tr ( x ) ∂v ∂x + E tr ( v ) ∂v ∂v (cid:19) = E lr ( v ) + (cid:0) E tr ( x ) · v + E tr ( v ) · x (cid:1) (2) The derivatives signify the sensitivity of v w.r.t. their operands. Similarly, E tr ( v ) = E lr ( v ) + E prop ( v ) = E lr ( v ) + (cid:18) E tr ( x ) ∂v ∂x + E tr ( y ) ∂v ∂y (cid:19) = E lr ( v ) + (cid:0) E tr ( x ) · E tr ( y ) · (cid:1) (3) Plugging Equation (3) into Equation (2), we obtain the total error for v interms of the local errors and incoming errors on inputs: E tr ( v ) = E lr ( v ) + (2 x + y ) E lr ( x ) + x · E lr ( v ) + x · E tr ( y ) (4) The extent to which the error accumulated at v impacts the output at S depends on the sensitivity of S to changes in v , that is, ∂S∂v = ∂S∂v ∂v ∂v = 3 . As before, we use E for symbolic expressions and E for their ground counterparts. 
Let E_tr(S|v2) denote the total error component at the output node S due to the total error accumulated at v2, scaled by the path strength from v2 to S; then:

    E_tr(S|v2) = E_tr(v2) · ∂S/∂v2
               = (E_lr(v2) + (2x + y) · E_tr(x) + x · E_lr(v1) + x · E_tr(y)) · 0.5
               = E_lr(v2) · ∂S/∂v2 + E_tr(x) · ∂S/∂x + E_lr(v1) · ∂S/∂v1 + E_tr(y) · ∂S/∂y    (5)

Equation (5) is the symbolic form of the error contribution from v2 (and all its dependencies) to S. There are two approaches to obtain the bound on E_tr(S|v2):

1. Without abstraction, where we feed the optimizer the fully symbolic expression of E_tr(S|v2) = E_tr(v2) · ∂S/∂v2. This formulation preserves all variable correlations (especially important for non-linear expressions) and is expected to obtain the tightest possible bounds. However, the resulting large expressions can often end up choking the global optimizer or excessively delaying its convergence.

2. With abstraction of node v2. First we obtain a function interval and the total error by solving for v2 and E_tr(v2) = max(|E_tr(v2)|) using a global optimizer. Now, v2 can be abstracted as a free variable, FV, with an associated error term (E_tr(FV) = E_tr(v2)) and an interval range, as shown on the right-hand side of Figure 2. We then reconstruct E_tr(S|v2) = E_tr(v2) · ∂S/∂v2 and feed it to the global optimizer to obtain the concrete bound on E_tr(S|v2). This reduces the sizes of the queries submitted to the optimizer.

Note in our example that abstracting node v2 completely removed the dependency on v1 and x, and was therefore a good choice for abstraction. However, consider the case of v1 being abstracted instead. This would result in a loss of correlation between the reconvergent paths merging at v2 for input x. This can result in looser error bounds, especially when error cancellations occur in the merging paths.
Error cancellations are important to capture in floating-point error analysis in order to obtain tight error bounds.

Also, to find the total error at v2, we unrolled its operand's error term, E_tr(v1), as well. Instead, if we only tracked the local error term at each node and scaled it by its respective propagation factor ("path strength") to the output, we would end up with the same expression as in the last line of Equation (5). In contrast to existing tools, Satire identifies this balance by finding designated cut-points where internal nodes are solved for their total error terms and abstracted, removing their child dependencies. This process is repeated until the AST reaches a size that is easily managed using a direct solve (unrolling all the way to local error terms; see Figure 1).
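The sensitivities used above can be cross-checked mechanically. The following sketch (our illustration; SymPy stands in for Satire's SimEngine-based differentiation) reproduces the path strength ∂S/∂v2 = 0.5 and the input sensitivities appearing in Equation (5):

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
v1 = x + y
v2 = v1 * x
v3 = v2 + z
S = v3 * 0.5                      # S = (x*(x + y) + z) * 0.5

# Path strength of v2 w.r.t. S: differentiate S with v2 treated as a free variable.
v2_sym = sp.Symbol("v2")
S_in_v2 = (v2_sym + z) * 0.5
assert sp.diff(S_in_v2, v2_sym) == 0.5

# Sensitivities appearing in the last line of Equation (5).
assert sp.expand(sp.diff(S, x)) == sp.expand(0.5 * (2 * x + y))  # dS/dx = 0.5*(2x + y)
assert sp.diff(S, y) == 0.5 * x                                  # dS/dy = 0.5*x
```

These are exactly the coefficients that multiply E_tr(x) and E_tr(y) once the 0.5 path strength is distributed over Equation (4).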
4 First-Order Error Analysis

We now define the first-order error analysis underlying Satire, ending with a soundness claim. Let the expression to be analyzed be presented as a straight-line program with one output s_n (other outputs are similarly treated):

    s_n = f_n(x, s_1, s_2, …, s_{n−1})

Here, x is the set of m incoming inputs {x_1, x_2, …, x_m} with m incoming error quantities {e_{x_1}, e_{x_2}, …, e_{x_m}}. We think of s_i as having been computed at the i-th step of the overall computation, where the f_i are the supported operators:

    s_i = f_i(x, s_1, s_2, …, s_{i−1})    (6)

Equation (6) represents the causality of the execution order; that is, the operator f_i computed in the i-th step can involve as its inputs the primary inputs x and any of the intermediate results leading up to the i-th stage. When such a computation is performed in finite precision, the operators f_i are replaced by their floating-point counterparts, denoted f̃_i. In addition to generating a round-off error component at the operator site, f̃_i also propagates the incoming errors through each operand. Thus, the i-th step of the algorithm with floating-point operators is specified as

    s̃_i = f̃_i(x̃, s̃_1, s̃_2, …, s̃_{i−1})
        = f̃_i(x_1 + e_{x_1}, …, x_m + e_{x_m}, s_1 + e_{s_1}, s_2 + e_{s_2}, …, s_{i−1} + e_{s_{i−1}})    (7)

Here, every e_{s_j} denotes the total error flowing into the operator site due to the accumulation at s_j, that is, e_{s_j} = E_tr(s_j). Each f̃_i generates a round-off error according to Equation (1), thus relating the real and floating-point evaluations as

    s̃_i = f_i(x_1 + e_{x_1}, …, x_m + e_{x_m}, s_1 + e_{s_1}, …, s_{i−1} + e_{s_{i−1}})(1 + δ_i);  |δ_i| ≤ u    (8)

If s_n is the final step of the algorithm, then one could write its corresponding floating-point result s̃_n as

    s̃_n = f_n(x_1 + e_{x_1}, …, x_m + e_{x_m}, s_1 + e_{s_1}, …, s_{n−1} + e_{s_{n−1}})(1 + δ_n);  |δ_n| ≤ u

A Taylor series expansion of the f_n term on the right-hand side leads to:

    s̃_n = (f_n(x, s_1, …, s_{n−1}) + Σ_{j=1}^{n−1} (∂s_n/∂s_j) e_{s_j} + Σ_{j=1}^{m} (∂s_n/∂x_j) e_{x_j})(1 + δ_n) + O(u²)
        = f_n(x, s_1, …, s_{n−1}) + f_n(x, s_1, …, s_{n−1}) δ_n + Σ_{j=1}^{n−1} (∂s_n/∂s_j) e_{s_j} + Σ_{j=1}^{m} (∂s_n/∂x_j) e_{x_j} + O(u²)
        = s_n + s_n δ_n + Σ_{j=1}^{n−1} (∂s_n/∂s_j) e_{s_j} + Σ_{j=1}^{m} (∂s_n/∂x_j) e_{x_j} + O(u²)    (9)

Satire's goal is to bound the absolute error on the output s_n, that is, e_{s_n} = E_tr(s_n) = |s_n − s̃_n|. The term s_n δ_n is the round-off error generated at the f_n operator site, representing the local error term E_lr(s_n). Thus, the bound on the absolute error can be evaluated as

    E_tr(s_n) ≤ max(|E_tr(s_n)|)
             ≤ max(|s_n δ_n| + |Σ_{j=1}^{n−1} (∂s_n/∂s_j) e_{s_j}| + |Σ_{j=1}^{m} (∂s_n/∂x_j) e_{x_j}|) + O(u²)
             ≤ max(|E_lr(s_n)| + |Σ_{j=1}^{n−1} (∂s_n/∂s_j) E_tr(s_j)| + |Σ_{j=1}^{m} (∂s_n/∂x_j) E_tr(x_j)|) + O(u²)    (10)

Equation (10) makes three necessary distinctions, as follows.

• The second-order error terms contain u², an extremely small quantity, and become relevant to precision only when n·u ≈ 1. Given that u = 2^−53, n must approach 2^53. We perform abstractions well before reaching this point.

• Error expressions such as E_lr and E_tr can coexist in two forms in Satire: (1) as a symbolic expression, and (2) as a numeric interval. Ultimately, when we perform the abstractions or solve for the final output, the expressions are maximized to obtain the maximum error bound.

• While the right-hand side of Equation (10) involves additional total error terms, these can be further unrolled to obtain direct dependencies on the local error terms down the hierarchy and/or the inputs and abstracted nodes.

As mentioned in §2, Satire follows an incremental and decoupled strategy toward error analysis:

– A forward pass that assigns a local symbolic error term, E_lr, at each operator site.
– A backward pass that uses symbolic algorithmic differentiation to derive the propagation factors (path strengths, i.e., derivatives) from each operator site to the output nodes.
– Incremental abstractions of nodes, as detailed in §2.

In general, the terms comprising E_tr(s_n) include both symbolic and ground representations. More specifically, suppose α denotes the set of nodes that have been abstracted to numeric (ground) error bounds, and β comprises the remaining nodes, which are preserved symbolically. Equation (10) can then be viewed as a combination of both kinds, as in:

    E_tr(s_n) ≤ max(|E_lr(s_n)| + |Σ_{s_j ∈ β} (∂s_n/∂s_j) E_tr(s_j)| + |Σ_{s_j ∈ α} (∂s_n/∂s_j) E_tr(s_j)| + Σ_{j=1}^{m} |(∂s_n/∂x_j) E_tr(x_j)|)    (11)

Global optimizers are capable of entertaining such mixed "concolic" expressions.

Soundness:
Satire's error bounds are conservative. This claim is easily proved based on the following observations. Satire's error analysis is decoupled into (1) computing the local error symbolically, and (2) multiplying it with the sum of all path strengths, again symbolically. Computation of the local error is based on symbolic execution, to which the accepted error model [22] of Equation (1) is applied. The method of adding the generated errors and propagating the incoming errors is justified as sound in [12,24] (see also the beginning of §3). The use of reverse-mode symbolic differentiation is again known to be sound [4]; Satire implements its own symbolic automatic differentiation. Last but not least, the abstractions introduced in Satire are also conservative, in that new free variables are introduced in lieu of existing expression subgraphs. These free variables do not have any correlations with the other inputs. Their ground value intervals are also (recursively) computed using the same error analysis, and by induction (since these are subexpressions) their soundness follows.
5 Heuristics for Abstraction and Optimization

Heuristics for Abstraction: The heuristics that trigger abstraction are based on: (1) relative depth (RD), measuring how close the node is to the output (the closer it is, the smaller the path strength to the output); (2) a cost function φ(opCount), measuring the size of the symbolic expression and the number of free variables in it (both of which increase optimizer time); and (3) the fanout from each operator site: large fanouts are indicative of a larger dependency footprint. While abstracting at such sites can hurt error cancellation due to path correlations, it helps handle heavily interconnected networks such as stencil-type applications or neural networks. The depth heuristics are inspired by Shannon's information theory [45]. If D is the total AST depth and d_i is the depth of the i-th node, its relative-depth weight and the overall cost function are given by:

    depth_info(i) = −(d_i/D) · log(d_i/D)
    cost_info(i) = depth_info(i) × φ(opCount(i)) × fanout(i)    (12)

Additional knobs include: (1) a fixed depth at which an abstraction is forced (ignoring the cost function); and (2) a [minDepth, maxDepth] window within which Satire iteratively performs the abstraction and rebuilds ASTs. In case the depth of the rebuilt AST falls below the specified minDepth, Satire computes the remaining expression directly ("direct-solve" in Figure 1).
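A minimal sketch of this cost metric (our illustration; `phi` is a hypothetical stand-in for Satire's configurable cost function φ):

```python
import math

def depth_info(d_i: int, D: int) -> float:
    """Shannon-style depth weight -(d/D)*log(d/D), per Equation (12)."""
    r = d_i / D
    return -r * math.log(r) if r > 0 else 0.0

def phi(op_count: int) -> float:
    # Hypothetical size penalty: larger symbolic expressions cost more to optimize.
    return float(op_count)

def cost_info(d_i: int, D: int, op_count: int, fanout: int) -> float:
    return depth_info(d_i, D) * phi(op_count) * fanout

# A node carrying a larger expression with higher fanout scores higher,
# making it a more attractive cut-point for abstraction.
assert cost_info(2, 10, 40, 4) > cost_info(1, 10, 5, 1)
```

Note that depth_info is not monotone in d_i/D; like the entropy term it mimics, it peaks at an intermediate relative depth, which biases cut-point selection away from both the leaves and the output.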
Heuristics to Improve Global Optimization and Canonicalization: Satire uses Gelpia [18] as the backend global optimizer. It implements an interval branch-and-bound algorithm (IBBA [1]) to obtain a tight interval containing the optima when solving the error expressions. Essentially, when solving an n-variable expression, it searches an n-dimensional box of intervals defined on the inputs. IBBA produces queries by repeatedly dividing this n-dimensional box into smaller box (interval) queries, where each query comprises the symbolic expression and the subdivided interval box. These queries are fed to an interval library (Gaol) that produces an output bound for that interval. The final output is the best-fit n-dimensional box that produces the tightest error bound within a given tolerance and a limit on iterations that constrains the search algorithm.

Note that the optimizer works on real-valued expressions and must obtain a bound that contains the optimum. As pointed out in §2, Gaol can automatically canonicalize simple expressions such as x · x · x. However, as expressions become more complex, with non-linear terms and multiple variables, the interval subdivision process gets further bottlenecked, producing loose intervals. Take, for example, the "direct quadrature moments method" (DQMOM) benchmark: a sum of three structurally identical non-linear terms, each mentioning its w_i and a_i several times (through repeated a_i/w_i factors), with each m_i, w_i, and a_i confined to a small input interval.

When fed to Gelpia, the unsimplified expression for r in DQMOM generates the interval [-9.0E+10, 9.0E+10]. However, the canonicalized expression (produced automatically for us by SimEngine) takes the form of a sum of terms proportional to a_i² · m_i / w_i, which reduces the number of occurrences of the w_i and a_i instances. This yields an interval bound of [-9.0E+05, 9.0E+05], which is 5 orders of magnitude tighter. As a result, Satire's error bound on DQMOM is tighter than FPTaylor's (Table 1). These canonicalizations, plus the steps taken to prevent canonicalization blocking (§2), are largely responsible for Satire's superior overall bounds.
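The dependency problem that canonicalization mitigates can be seen with a toy interval arithmetic (our illustration, not Gaol): evaluating x · x · x with each occurrence treated independently is strictly looser than evaluating the equivalent pow(x, 3).

```python
# Minimal interval type; multiplication assumes the operands are independent.
class IV:
    def __init__(self, lo, hi): self.lo, self.hi = lo, hi
    def __mul__(self, o):
        c = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return IV(min(c), max(c))
    def pow3(self):
        # x**3 is monotone, so the endpoint cubes give the exact range.
        return IV(self.lo ** 3, self.hi ** 3)

x = IV(-1.0, 2.0)
naive = x * x * x          # three "independent" occurrences of x
tight = x.pow3()           # single-occurrence (pow) form

assert (naive.lo, naive.hi) == (-4.0, 8.0)
assert (tight.lo, tight.hi) == (-1.0, 8.0)
```

The naive product over [−1, 2] yields [−4, 8], while the canonicalized pow form yields the true range [−1, 8]; the same widening, compounded across many variables, is what produced the five-orders-of-magnitude gap on DQMOM.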
6 Evaluation

We first evaluate Satire on FPTaylor's benchmarks (also provided at [16]), which include a wide range of examples such as polynomial approximations and non-linear expressions, all consisting of a few dozen operators. We then study Satire on a set of fresh benchmarks comprised of much larger expressions, going up to nearly 200K operators (obtained by unrolling loops from specific implementations of these computations). One important example studied is FFT, handled by separating the complex evaluation into real and imaginary datapaths (e.g., as done by researchers who implement FFT circuits [27]). We also study the Lorenz equation system, followed by a study of the relative-error profiles exhibited by FDTD. (Our use of SimEngine is to obtain canonicalized forms of expressions with respect to variable occurrences, and also expression simplifications; any other engine with these capabilities may be used in Satire.)

Experimental Setup:
Satire is written for Python 3, and its symbolic engine is based on SimEngine [46] (related to SymPy). All benchmarks were executed with Python 3.8.0 on a system with dual 14-core Intel Xeon E5-2680v4 2.60GHz CPUs (28 processor cores total) and 128GB of RAM. To arrive at an objective comparison, the core analysis algorithms were measured without any multicore parallelism (for both FPTaylor and Satire). The Gelpia solver does employ internal multithreading; we did not alter it in any way when using either FPTaylor or Satire. All FPTaylor benchmarks used their specified data types; the larger benchmarks we introduce use the double floating-point type. High-precision shadow-value calculations were performed using GCC's quadmath. Table 1 presents the comparative results for
Satire and FPTaylor. The comparison covers both total execution times and absolute error bounds over a suite of 47 benchmarks exported from FPTaylor's suite. We observe that while Satire excels in its ability to handle large and complex expressions, even on the smaller benchmarks it obtains comparable and, in many cases, tighter bounds than today's state-of-the-art error analysis tools. In most cases, Satire and FPTaylor generated error bounds within the same order of magnitude. Satire obtains slightly tighter bounds on over 50% of these benchmarks while providing an average 4.5x speed-up in execution time. Using 10 random simulations over the input interval ranges, we empirically verified Satire's bounds against a high-precision shadow-value evaluation of these benchmarks. Furthermore, applying simplification procedures to FPTaylor's generated symbolic Taylor forms improves its generated bounds. By default, Satire optimizes the product of the propagation factor and the error term; that is, the "absolute" quantifiers are placed on the product terms. This helps the expression simplification process. For many of the benchmarks in Table 1, Satire's results without this optimization are weaker, but still within the same order of magnitude as FPTaylor's.
Table 2 summarizes the aforementioned large benchmarks; column 'Full Solve' indicates the bound obtained without any abstractions. Without abstractions, Satire fails to obtain answers for Lorenz, 2D-heat, and Gram-Schmidt (residual). For linear examples, the derivatives (propagation factors for the error terms) are always constant and hence do not affect the error expressions' complexity. However, as expressions become non-linear (e.g., Lorenz), the propagation factors are no longer constants, leading to more complex error expressions. This is the primary reason why we can directly compute ("Full Solve") a bound for FDTD with 192,000 operators without any abstractions, but for Lorenz, with merely 307 operators, we had to use abstractions. FPTaylor could not generate symbolic Taylor forms for any of these expressions.
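This contrast between linear and non-linear operators is visible directly in the symbolic derivatives. A minimal SymPy sketch (standing in for Satire's symbolic engine):

```python
import sympy as sp

x, y = sp.symbols('x y')

linear = 2*x + 3*y - x        # linear expression
nonlinear = x*y + x**2        # non-linear expression

# Propagation factors are partial derivatives with respect to the inputs.
# For linear expressions they are constants ...
assert sp.diff(linear, x).is_constant()
assert sp.diff(linear, y).is_constant()

# ... but for non-linear expressions they are themselves expressions over
# the inputs, so error expressions grow in complexity as they propagate.
assert not sp.diff(nonlinear, x).is_constant()
```

This is why unrolling a linear stencil multiplies the number of error terms but not their individual complexity, whereas unrolling Lorenz compounds both.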
Table 1. Comparison of results (bold face highlights better results, and * highlights a difference of more than an order of magnitude).
While abstractions help manage large expression sizes, abstracting frequently can affect the bounds. On the positive side, frequent abstraction implies smaller queries to the optimizer, which then converges faster over a smaller search space. Conversely, frequent abstraction also leads to loss of correlation, since symbolic expressions are replaced with concrete intervals. As a general trend, while error bounds weaken as the frequency of abstraction increases, they remain within the same order of magnitude. In examples such as FDTD, concretization of internal nodes during abstraction introduces large correlation losses. In the presence of cancellation terms (as in FDTD), preserving correlation is necessary because it enables symbolic error variables to cancel out. Abstractions inhibit this cancellation and hence should be used judiciously in such examples.
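The correlation loss can be illustrated in a few lines (an illustrative sketch, not Satire's actual abstraction code):

```python
def iv_sub(x, y):
    # Interval subtraction treats its operands as independent.
    return (x[0] - y[1], x[1] - y[0])

t = (0.9, 1.1)   # concrete interval assigned to an abstracted internal node

# Kept symbolic, an error term e and its negation cancel exactly:
# the pair (e, -e) contributes 0 to the bound.
symbolic = (0.0, 0.0)

# After abstraction, both occurrences refer to the same interval, but the
# correlation between them is gone, so the cancellation is lost:
abstracted = iv_sub(t, t)

assert abstracted[0] < 0.0 < abstracted[1]   # width ~0.4 instead of 0
```

Each concretized node in a cancellation-heavy kernel like FDTD pays this penalty, which is why abstraction frequency matters more there than in additive kernels.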
Table 2. Larger benchmarks with and without abstractions (columns: benchmark, number of operators, absolute error bound for a full solve and under increasing frequency of abstraction, and best execution time; e.g., FFT-1024pt, with 43k operators, has a full-solve bound of 7.89E-13 obtained in 2.6 hrs).
To further motivate the importance of rigorous bounds for large benchmark problems, we study FFT and Lorenz, both critical components of high-precision applications.
FFT:
The fast Fourier transform (FFT) is an optimized algorithm for the discrete Fourier transform (DFT), which converts a finite sequence of sampled points of a function into a same-length sequence of complex-valued frequency components. It has a vast number of applications in signal processing and in fast multi-precision arithmetic for large polynomial and integer multiplications. An N-point FFT involves log2 N stages, each stage having a familiar butterfly structure (see, for example, [8]). We are not aware of any tool supporting floating-point error analysis over complex domains. However, the FFT's application in fast multi-precision math libraries necessitates precise floating-point error analysis, and it has been the subject of multiple analytical studies [41,7,40,27]. These analysis methods focus on obtaining L2-norm root mean square (RMS) errors, or statistically profiled error bounds. Brisebarre et al. [8] followed up on Percival's work [40,7] to report the best L2-norm bound to date. That is, if z represents the discretized input samples, Z the exact result, and Z^ the computed result, then

||Z - Z^||_2 / ||Z||_2 <= B,

where B is a modest multiple of the unit roundoff u. To extend this result to a bound on the absolute error corresponding to the L-infinity norm, we use two well-known relations: (1) between the L2 norms of the input and output of an N-point FFT, ||Z||_2 <= sqrt(N) * ||z||_2, and (2) the generic relations between the L-infinity and L2 norms of an N-vector, ||v||_inf <= ||v||_2 <= sqrt(N) * ||v||_inf. Chaining these,

||Z - Z^||_inf <= ||Z - Z^||_2 <= B * ||Z||_2 <= B * sqrt(N) * ||z||_2 <= B * N * ||z||_inf,

giving the absolute-error bound corresponding to the L-infinity norm:

||Z - Z^||_inf <= B * N * ||z||_inf. (14)

Equation (14) obtains the absolute-error bound analytically. A 1024-point FFT with input samples in the interval [0, 1], using the L2-norm bound B of [8], obtains an absolute error bound of 78200u; for the double-precision type, with u = 2^-53, this implies an error bound of roughly 8.7E-12. Satire partitions the real and imaginary parts of the complex operations in FFT, obtaining real-valued expression types for the output variables, guarded with rounding information at every compute stage. Two separate datapaths are generated for the real and imaginary terms, each of which is solved individually. Let E_R and E_I denote the absolute error bounds obtained by solving the real and imaginary parts, respectively. These solutions, on their own, provide the individual accuracy information of the real and imaginary expressions. Additionally, E_T = sqrt(E_R^2 + E_I^2) gives a bound on the maximum total absolute error; it is an upper bound on the L-infinity norm. We show that Satire obtains a tighter bound than the analytical bound of [8] in equation (14). Selecting the same input interval [0, 1], Satire obtains E_R and E_I each on the order of 1E-13; thus the total error bound E_T is also on the order of 1E-13, which is tighter than the best analytical bound obtained in [8]. We tried different input intervals for FFT, each time obtaining a tight bound in comparison to the analytical bounds of [8]. However, the optimizer faced convergence difficulties for intervals with zero crossings.

Lorenz equations:
The Lorenz equations model thermally induced fluid convection using three state variables (x, y, z): x represents the fluid-velocity amplitude, y models the temperature difference between the top and bottom membranes, and z represents a distortion from linearity of the temperature [30]. The equations require three additional parameters: a = 10, called the Prandtl number; b = 8/3; and r, the Rayleigh number, proportional to the temperature difference. The recurrence relations obtained by discretizing the continuous versions of the Lorenz equations are given in equation (15), where k denotes the previous iteration and dt is the time discretization:

x_{k+1} = x_k + a(y_k - x_k)dt;  y_{k+1} = y_k + (-x_k z_k + r x_k - y_k)dt;  z_{k+1} = z_k + (x_k y_k - b z_k)dt (15)

In [30], the authors study the trajectories for different r values for chosen initial conditions (x_0, y_0, z_0, dt). The trajectory transitions to chaotic behavior beyond a threshold value of r and again starts approaching equilibrium once r reaches close to 200. For such chaotic systems, if two initial conditions differ by a quantity delta, the resulting difference after time t shows exponential separation of delta * e^(lambda*t). This becomes a critical consideration when evaluating such equations in finite precision, since round-off error accumulation gradually builds up such a delta. We focus on two aspects of the analysis: (1) obtaining bounds over a range of input intervals for (x, y, z), and (2) analyzing the sensitivity to initial conditions of individual inputs by using degenerate intervals on the other inputs. Beyond a few iterations, the non-linearity of these equations impedes the application of current tools such as FPTaylor. FPTaylor took 350 seconds to generate Taylor forms for a small 5-iteration case, while timing out for unrolls beyond 10 iterations. Additionally, the symbolic Taylor form for 5 iterations was so large that the backend optimizer could not handle it. Using Satire, we obtain tight bounds for as many as 70 iterations of the problem using the abstraction-guided method. Note that the non-linearity of these equations makes it difficult to simplify expressions beyond a certain limit.
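These recurrences are easy to iterate directly. The sketch below uses the standard form of the discretized Lorenz system with hypothetical values r = 28 and dt = 0.01, and illustrative initial conditions (the exact values used in [30] are not reproduced here); it shows the exponential separation of nearby trajectories that makes round-off accumulation so consequential:

```python
def lorenz_step(x, y, z, a=10.0, b=8.0/3.0, r=28.0, dt=0.01):
    """One step of the discretized Lorenz recurrences, equation (15)."""
    return (x + a * (y - x) * dt,
            y + (-x * z + r * x - y) * dt,
            z + (x * y - b * z) * dt)

def run(x, y, z, steps=3000):
    for _ in range(steps):
        x, y, z = lorenz_step(x, y, z)
    return x, y, z

# Two trajectories whose initial x differs by delta = 1e-12 ...
p = run(1.2, 1.3, 1.6)
q = run(1.2 + 1e-12, 1.3, 1.6)

# ... separate by many orders of magnitude more than delta after 30 time units.
separation = max(abs(u - v) for u, v in zip(p, q))
```

A perturbation of the size of a double-precision rounding error thus grows to macroscopic scale within a few thousand iterations, which is exactly the delta build-up the worst-case analysis has to track.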
Satire delays further canonicalization once the operator count exceeds 8,000, controlled by a parametric knob, maxOpCount, whose default value of 8,000 was selected over multiple experiments on a mix of non-linear and linear systems. The error expressions, composed of products of forward errors and reverse derivatives, may exceed an operator count of 100K over just 20 iterations, choking SimEngine's simplification process. Using maxOpCount, we allow simplification only within the depth necessary for abstraction; further simplification occurs during the next abstraction. For example, for Lorenz-20 (unrolled over 20 iterations), the overall AST depth is 89. Table 3 shows the bounds obtained for varying windows of the abstraction depth and the corresponding execution times in seconds. We perform the analysis for both 20 and 40 iterations. The second state variable, y, shows the most worst-case variation. Therefore, we studied the sensitivity of y for a larger size of 70 iterations using degenerate intervals in x and y, obtaining a bound of 1.11E-11 as opposed to 1.9E-10 in the non-degenerate case.

Table 3. Error bounds for the three state variables in the Lorenz equations.
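The op-count gating can be sketched as follows, using SymPy's count_ops as a simplified stand-in for SimEngine's operator counting:

```python
import sympy as sp

def maybe_canonicalize(expr, max_op_count=8000):
    """Canonicalize only while the expression is below the op-count knob;
    larger expressions are left alone until the next abstraction shrinks
    them, at which point simplification resumes."""
    if sp.count_ops(expr) <= max_op_count:
        return sp.simplify(expr)
    return expr

x = sp.Symbol('x')
e = sp.sin(x)**2 + sp.cos(x)**2
small = maybe_canonicalize(e)                 # cheap enough: simplified to 1
big = maybe_canonicalize(e, max_op_count=2)   # over the knob: left untouched
```

The knob trades a weaker intermediate form for bounded simplification cost, with the next abstraction step providing another chance to canonicalize.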
Error bounds obtained using static analysis methods generally provide worst-case absolute-error bounds for a given input range. However, absolute error is not indicative of true precision loss: a worst-case absolute error of, say, 0.01 is large when the actual value is close to 1 but small when it is close to 100. Relative error, obtained by normalizing the absolute error by the true value of the variable, provides more insight into precision. Given a real-valued expression f and its floating-point counterpart f~, re(f) = |f - f~| / |f| denotes its relative error. Using Satire, we can obtain a maximum bound for the numerator, that is, max(|f - f~|). Tools that attempt to bound relative error statically do so in two primary ways: (1) divide the worst-case error by the lower bound of the real-valued function interval, or (2) obtain a symbolic expression for re(f) and solve the optimization problem. While the former leads to quite conservative bounds (loss of correlation between the numerator and denominator), the latter suffers when encountering intervals close to or containing 0. In practice, however, the output is only rarely close to 0. Our approach capitalizes on this to obtain dynamic information on the relative error, making use of the observed runtime outputs. More precisely, we want to quantify the number of bits lost using the Satire-computed worst-case absolute error. For a floating-point system using p precision bits and a machine epsilon of u, the number of bits lost is bits_lost = log2(re(f)/u), which is at most p. The information we do not have here is the true value of f; to this end, we use f~ as a proxy for f in the denominator. Since f~ is the observed value at runtime, there is no extra overhead to evaluate it (unlike shadow-value calculations). We use FDTD as a case study here. We select FDTD because it was the worst-behaving benchmark under frequent abstractions (attributable to its large number of cancellation terms), implying a larger impact on precision bits than additive kernels like heat. We analyze it in the input interval [0, 1] to obtain values closer to zero, because floating-point values are highly condensed in this region and many binades are included in this interval.
Even for shadow-value evaluations, one uses a high-precision value as a proxy for the "true value." Let f_dp and f_qp denote the observed values when using double and quad (high) precision, respectively, and define

Q = qsat - qshadow;  qsat = max(|f - f~|) / |f_dp|;  qshadow = |f_qp - f_dp| / |f_qp| (16)

Fig. 3. Difference in precision estimates between Satire and the shadow-value calculation.

In Figure 3, the y-axis plots the prediction difference Q, evaluated as shown in equation (16), and the x-axis plots the simulations performed for the empirical evaluation. Here the numerator of qsat comes from Satire's worst-case bounds, and its denominator is the observed value in the working precision (here, double). For qshadow, the numerator is the observed difference in dynamic data between the double- and quad-precision values, while its denominator is the quad-precision value. In practice, shadow-value calculation is a large burden for the user: an entirely duplicated thread has to be run in high precision, with the necessary code changes. Our objective is to let users avoid all this and still gain value from Satire's analysis of precision loss. We aim to show that, using only the observed values, we can precisely report the loss in precision bits. In other words, Q should always be greater than zero for soundness but close to zero for tightness. Figure 3 shows denser regions closer to zero, implying the tightness of our analysis compared to the shadow-value calculation, while never crossing zero, indicating a sound bound. Thus, we believe that users can use Satire's analysis to obtain a quick indication of precision loss, at least in a relative sense; they may be able to use our methods to zoom into code regions with higher loss and selectively raise precision there.
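The soundness condition Q >= 0 from equation (16) can be exercised on a small example. The sketch below substitutes float32 for the working ("double") precision and float64 for the shadow ("quad") precision, and a textbook worst-case summation bound in place of Satire's bound:

```python
import numpy as np

def naive_sum(xs):
    # Left-to-right summation in the array's own precision.
    s = xs.dtype.type(0.0)
    for v in xs:
        s = s + v
    return float(s)

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 1000)

f_wp = naive_sum(x.astype(np.float32))   # working ("double") precision
f_sh = naive_sum(x.astype(np.float64))   # shadow ("quad") precision

u = 2.0 ** -24                           # unit roundoff of the working precision
wc = (len(x) - 1) * u * float(np.sum(np.abs(x)))   # worst-case |f - f~| bound

qsat = wc / abs(f_wp)                    # static-bound estimate, as in eq. (16)
qshadow = abs(f_sh - f_wp) / abs(f_sh)   # shadow-value estimate, as in eq. (16)
Q = qsat - qshadow                       # soundness requires Q >= 0
```

As in Figure 3, Q stays positive (the worst-case bound dominates the observed error) while remaining small relative to qsat, which is what makes the zero-overhead estimate useful.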
Precision analysis plays an important role in safety-critical systems. For trustworthy error analysis, formally certified bounds are needed; some recent examples are [3,47]. Rosa and FPTaylor are the closest to our work in their approach of extending Taylor forms to rigorous floating-point error estimation. Rosa propagates errors in numeric affine form and uses SMT solvers to obtain tight bounds. FPTaylor obtains full symbolic Taylor forms and feeds them to a global optimizer, which often results in large expressions, impeding scalability. In contrast, Satire's decoupled analysis achieves the same end result without this complexity by using forward and backward passes that greatly improve overall effectiveness. While scalability has been achieved previously, it required manual assistance using abstract interpretation [10] coupled with theorem proving (e.g., Fluctuat [14,6]). Gappa [13] is basically an interval-based reasoning system, but comes with many simplification rules of its own and has been embedded into verifiers such as [17] and various proof assistants [34,23,49]. Combined uses of static analyses [20] are popular, as demonstrated at scale in Astrée [11]. A general abstract domain for floating-point computations is described in [32]. The importance of sound techniques for relative-error estimation has been recognized by many. A common approach for relative-error estimation is to first obtain the absolute error and then divide it by the minimum of the function's interval value. A combined rigorous approach to relative-error estimation is presented in [25]. Our approach to relative-error estimation is meant for use in a dynamic-analysis setting. It is motivated by the important pragmatic consideration of avoiding shadow-value computations on an existing piece of code. We use the observed value at runtime in combination with Satire's worst-case bound to obtain a relative-error profile that provides insights into precision loss. Through empirical evaluation, we establish the reliability of this approach.
We presented Satire, a tool for rigorous floating-point error analysis that produces tight error bounds in practice. Satire is similar to many of its predecessors, but specifically emphasizes handling the large expressions that arise in practice by including an information-theoretic abstraction mechanism for scalability. The effectiveness of Satire has been demonstrated on practical examples including FFT, parallel prefix sum, and stencils for various partial differential equation (PDE) types. Even divergent families of equations, such as the Lorenz system, are included in our study. Satire can also provide insights on the loss of precision at runtime, as demonstrated on a large ill-conditioned problem. We believe this variety and scale in an automated rigorous tool is unique. Our work quickly showed us that, beyond the mere size of expressions, one must also take into account the kinds of computations being analyzed. For instance, the very large expressions generated by FDTD and heat-flow can be analyzed without abstractions even at large sizes, whereas highly non-linear systems such as Lorenz require abstractions even at small sizes. An important conclusion that emerges pertains to the impact of losing variable correlations due to abstraction: it is an important consideration in choosing when and where to abstract. The importance of incremental and decoupled error-expression computation is also brought out. Given that global optimizers are the workhorses of error estimation, our studies shed light on the role of expression canonicalization. Canonicalization fits well with incremental computation, allowing global-optimizer calls to be both smaller and parallelizable. Important future directions include handling loops (enabling further scaling), improving abstractions without weakening error bounds, and the use of parallelism to further speed up the analysis.
References
1. Alliot, J.M., Durand, N., Gianazza, D., Gotteland, J.B.: Finding and proving the optimum: Cooperative stochastic and deterministic search. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI). pp. 55–60. ACM (2012). https://doi.org/10.3233/978-1-61499-098-7-55
2. Altman, M., Gill, J., McDonald, M.P.: Numerical Issues in Statistical Computing for the Social Scientist. John Wiley & Sons, Inc. (Dec 2003). https://doi.org/10.1002/0471475769
3. Becker, H., Zyuzin, N., Monat, R., Darulova, E., Myreen, M., Fox, A.: A verified certificate checker for finite-precision error bounds in Coq and HOL4. In: FMCAD. pp. 1–10 (Oct 2018). https://doi.org/10.23919/FMCAD.2018.8603019
4. Bischof, C., Bücker, H.M., Hovland, P., Naumann, U., Utke, J. (eds.): Advances in Automatic Differentiation. Springer (2008). ISBN 978-3-540-68935-5
5. Boldo, S., Clément, F., Filliâtre, J.C., Mayero, M., Melquiond, G., Weis, P.: Wave equation numerical resolution: A comprehensive mechanized proof of a C program. Journal of Automated Reasoning (4), 423–456 (Aug 2012). https://doi.org/10.1007/s10817-012-9255-4
6. Boldo, S., Clément, F., Filliâtre, J.C., Mayero, M., Melquiond, G., Weis, P.: Wave equation numerical resolution: A comprehensive mechanized proof of a C program. Journal of Automated Reasoning (JAR) (4), 423–456 (2013). https://doi.org/10.1007/s10817-012-9255-4
7. Brent, R.P., Percival, C., Zimmermann, P.: Error bounds on complex floating-point multiplication. Math. Comput., 1469–1481 (2007)
8. Brisebarre, N., Joldes, M., Muller, J.M., Naneş, A.M., Picot, J.: Error analysis of some operations involved in the Cooley–Tukey Fast Fourier Transform. ACM Transactions on Mathematical Software, pp. 1–34 (2019)
9. Chiang, W.F., Baranowski, M., Briggs, I., Solovyev, A., Gopalakrishnan, G., Rakamarić, Z.: Rigorous floating-point mixed-precision tuning. SIGPLAN Not. (1), 300–315 (Jan 2017). https://doi.org/10.1145/3093333.3009846
10. Cousot, P., Cousot, R.: Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL). pp. 238–252. ACM (1977)
11. Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: The ASTRÉE analyser. In: Proceedings of the 14th European Symposium on Programming Languages and Systems (ESOP). Lecture Notes in Computer Science, vol. 3444, pp. 21–30. Springer (2005)
12. Darulova, E., Kuncak, V.: Sound compilation of reals. In: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). pp. 235–248. ACM (2014)
13. Daumas, M., Melquiond, G.: Certification of bounds on expressions involving rounded operators. ACM Transactions on Mathematical Software (1), 1–20 (2010)
14. Delmas, D., Goubault, E., Putot, S., Souyris, J., Tekkal, K., Védrine, F.: Towards an industrial use of FLUCTUAT on safety-critical avionics software. In: Formal Methods for Industrial Critical Systems, FMICS 2009. Lecture Notes in Computer Science, vol. 5825, pp. 53–69. Springer Berlin Heidelberg (2009). https://doi.org/10.1007/978-3-642-04570-7_6
15. Denis, C., de Oliveira Castro, P., Petit, E.: Verificarlo: Checking floating point accuracy through Monte Carlo arithmetic. In: 23rd IEEE Symposium on Computer Arithmetic (ARITH 2016). pp. 55–62. IEEE Computer Society (2016). https://doi.org/10.1109/ARITH.2016.31
16. FPBENCH, benchmarks, compilers and standards for the floating point researchcommunity. http://fpbench.org/benchmarks.html
17. Frama-C Software Analyzers. http://frama-c.com/index.html (2017)18. Gelpia: A global optimizer for real functions (2017), https://github.com/soarlab/gelpia
19. Goodloe, A.E., Muñoz, C., Kirchner, F., Correnson, L.: Verification of numerical programs: From real numbers to floating point numbers. In: Brat, G., Rungta, N., Venet, A. (eds.) NASA Formal Methods. pp. 441–446. Springer Berlin Heidelberg (2013)
20. Goubault, E., Putot, S.: Static analysis of finite precision computations. In: International Workshop on Verification, Model Checking, and Abstract Interpretation, VMCAI 2011. Lecture Notes in Computer Science, vol. 6538, pp. 232–247. Springer Berlin Heidelberg (2011)
21. Guo, H., Rubio-González, C.: Exploiting community structure for floating-point precision tuning. In: ISSTA. pp. 333–343 (Jul 2018). https://doi.org/10.1145/3213846.3213862
22. Harrison, J.: A machine-checked theory of floating point arithmetic. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L. (eds.) Theorem Proving in Higher Order Logics: 12th International Conference, TPHOLs'99. Lecture Notes in Computer Science, vol. 1690, pp. 113–130. Springer-Verlag, Nice, France (1999)
23. Harrison, J.: Floating-point verification using theorem proving. In: SFM 2006. Lecture Notes in Computer Science, vol. 3965, pp. 211–242. Springer Berlin Heidelberg (2006)
24. Higham, N.J.: Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, second edn. (2002). https://doi.org/10.1137/1.9780898718027
25. Izycheva, A., Darulova, E.: On sound relative error bounds for floating-point arithmetic. In: Proceedings of the 17th Conference on Formal Methods in Computer-Aided Design (FMCAD '17). pp. 15–22. FMCAD Inc, Austin, Texas (2017)
26. Jacquemin, M., Putot, S., Védrine, F.: A reduced product of absolute and relative error bounds for floating-point analysis. In: Podelski, A. (ed.) Static Analysis - 25th International Symposium, SAS 2018. Lecture Notes in Computer Science, vol. 11002, pp. 223–242. Springer (2018). https://doi.org/10.1007/978-3-319-99725-4_15
27. Swartzlander, E.E., Jr., Saleh, H.H.M.: FFT implementation with fused floating-point operations. IEEE Transactions on Computers, 284–288 (Mar 2012). https://doi.org/10.1109/TC.2010.271
28. Lam, M.O., Hollingsworth, J.K.: Fine-grained floating-point precision analysis. The International Journal of High Performance Computing Applications (2), 231–245 (Jun 2016). https://doi.org/10.1177/1094342016652462
29. (POPL), 47:1–47:32 (2018). https://doi.org/10.1145/3158135
30. Liang, J., Song, W.: Difference equation of Lorenz system. International Journal of Pure and Applied Mathematics (Feb 2013). https://doi.org/10.12732/ijpam.v83i1.9
31. Magron, V., Constantinides, G., Donaldson, A.: Certified roundoff error bounds using semidefinite programming. ACM Transactions on Mathematical Software (4), 34:1–34:31 (Jan 2017). https://doi.org/10.1145/3015465
32. Martel, M.: Semantics of roundoff error propagation in finite precision calculations. Higher Order and Symbolic Computation (1), 7–30 (2006). https://doi.org/10.1007/s10990-006-8608-2
33. McCullough, B.D., Vinod, H.: The numerical reliability of econometric software. Journal of Economic Literature, 633–665 (Feb 1999). https://doi.org/10.1257/jel.37.2.633
34. Melquiond, G.: Floating-point arithmetic in the Coq system. Information and Computation, 14–23 (2012). https://doi.org/10.1016/j.ic.2011.09.005
35. Menon, H., Lam, M.O., Osei-Kuffuor, D., Schordan, M., Lloyd, S., Mohror, K., Hittinger, J.: ADAPT: Algorithmic differentiation applied to floating-point precision tuning. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18). IEEE Press (2018)
36. Moscato, M.M., Titolo, L., Feliú, M.A., Muñoz, C.A.: Provably correct floating-point implementation of a point-in-polygon algorithm. In: ter Beek, M.H., McIver, A., Oliveira, J.N. (eds.) Formal Methods – The Next 30 Years. pp. 21–37. Springer International Publishing, Cham (2019)
37. Narkawicz, A., Muñoz, C., Dutle, A.: Formally-verified decision procedures for univariate polynomial computation based on Sturm's and Tarski's theorems. Journal of Automated Reasoning (4), 285–326 (Feb 2015). https://doi.org/10.1007/s10817-015-9320-x
38. Panchekha, P., Sanchez-Stern, A., Wilcox, J.R., Tatlock, Z.: Automatically improving accuracy for floating point expressions. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2015). pp. 1–11. ACM (2015). https://doi.org/10.1145/2737924.2737959
39. Failure of the Patriot Missile due to floating-point error accumulation.
40. Percival, C.: Rapid multiplication modulo the sum and difference of highly composite numbers. Math. Comput. (241), 387–395 (Jan 2003). https://doi.org/10.1090/S0025-5718-02-01419-9
41. Ramos, G.: Roundoff error analysis of the fast Fourier transform. Mathematics of Computation, 757–757 (Oct 1971). https://doi.org/10.1090/S0025-5718-1971-0300488-0
42. Rubio-González, C., Nguyen, C., Nguyen, H.D., Demmel, J., Kahan, W., Sen, K., Bailey, D.H., Iancu, C., Hough, D.: Precimonious: Tuning assistant for floating-point precision. In: Supercomputing (SC). pp. 27:1–27:12 (2013). https://github.com/corvette-berkeley/precimonious
43. Salvia, R., Titolo, L., Feliú, M.A., Moscato, M.M., Muñoz, C.A., Rakamarić, Z.: A mixed real and floating-point solver. In: Badger, J.M., Rozier, K.Y. (eds.) NASA Formal Methods - 11th International Symposium, NFM 2019. Lecture Notes in Computer Science, vol. 11460, pp. 363–370. Springer (2019). https://doi.org/10.1007/978-3-030-20652-9_25
44. Schkufza, E., Sharma, R., Aiken, A.: Stochastic optimization of floating-point programs with tunable precision. In: PLDI '14. pp. 53–64. ACM (2014)
45. Shannon, C.E.: A mathematical theory of communication. The Bell System Technical Journal (3), 379–423 (Jul 1948). https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
46. SimEngine.
47. Solovyev, A., Baranowski, M.S., Briggs, I., Jacobsen, C., Rakamarić, Z., Gopalakrishnan, G.: Rigorous estimation of floating-point round-off errors with symbolic Taylor expansions. ACM Trans. Program. Lang. Syst. (1), 2:1–2:39 (2019). https://doi.org/10.1145/3230733
48. Stolfi, J., de Figueiredo, L.H.: An introduction to affine arithmetic. TEMA Trends in Applied and Computational Mathematics (3), 297–312 (2003)
49. Titolo, L., Feliú, M.A., Moscato, M., Muñoz, C.A.: An abstract interpretation framework for the round-off error analysis of floating-point programs. In: Lecture Notes in Computer Science, pp. 516–537. Springer International Publishing (Dec 2017). https://doi.org/10.1007/978-3-319-73721-8_24