Memory-Efficient Fixpoint Computation
Sung Kook Kim¹, Arnaud J. Venet², and Aditya V. Thakur¹

¹ University of California, Davis CA 95616, USA
{sklkim,avthakur}@ucdavis.edu
² Facebook, Inc., Menlo Park CA 94025, USA
[email protected]
Abstract.
Practical adoption of static analysis often requires trading precision for performance. This paper focuses on improving the memory efficiency of abstract interpretation without sacrificing precision or time efficiency. Computationally, abstract interpretation reduces the problem of inferring program invariants to computing a fixpoint of a set of equations. This paper presents a method to minimize the memory footprint in Bourdoncle's iteration strategy, a widely-used technique for fixpoint computation. Our technique is agnostic to the abstract domain used. We prove that our technique is optimal (i.e., it results in minimum memory footprint) for Bourdoncle's iteration strategy while computing the same result. We evaluate the efficacy of our technique by implementing it in a tool called
Mikos, which extends the state-of-the-art abstract interpreter IKOS. When verifying user-provided assertions,
Mikos shows a decrease in peak-memory usage to . % ( . ×) on average compared to IKOS. When performing interprocedural buffer-overflow analysis, Mikos shows a decrease in peak-memory usage to . % ( . ×) on average compared to IKOS.

1 Introduction

Abstract interpretation [15] is a general framework for expressing static analysis of programs. Program invariants inferred by an abstract interpreter are used in client applications such as program verifiers, program optimizers, and bug finders. To extract the invariants, an abstract interpreter computes a fixpoint of an equation system approximating the program semantics. The efficiency and precision of the abstract interpreter depend on the iteration strategy, which specifies the order in which the equations are applied during fixpoint computation. The recursive iteration strategy developed by Bourdoncle [10] is widely used for fixpoint computation in academic and industrial abstract interpreters such as NASA IKOS [11], Crab [32], Facebook SPARTA [17], Kestrel Technology CodeHawk [48], and Facebook Infer [12]. Extensions to Bourdoncle's approach that improve precision [1] and time efficiency [27] have also been proposed.

This paper focuses on improving the memory efficiency of abstract interpretation. This is an important problem in practice because large memory requirements can prevent clients such as compilers and developer tools from using
Fig. 1: Control-flow graph G

sophisticated analyses. This has motivated approaches for efficient implementations of abstract domains [26,4,44], including techniques that trade precision for efficiency [18,5,25].

This paper presents a technique for memory-efficient fixpoint computation. Our technique minimizes the memory footprint in Bourdoncle's recursive iteration strategy. Our approach is agnostic to the abstract domain and does not sacrifice time efficiency. We prove that our technique exhibits optimal peak-memory usage for the recursive iteration strategy while computing the same fixpoint (§ 3). Specifically, our approach does not change the iteration order but provides a mechanism for early deallocation of abstract values. Thus, there is no loss of precision when improving memory performance. Furthermore, such "backward compatibility" ensures that existing implementations of Bourdoncle's approach can be replaced without impacting clients of the abstract interpreter, an important requirement in practice.

Suppose we are tasked with proving assertions at program points 4 and 9 of the control-flow graph G(V, →) in Figure 1. Current approaches (§ 2.1) allocate abstract values for each program point during fixpoint computation, check the assertions at 4 and 9 after fixpoint computation, and then deallocate all abstract values. In contrast, our approach deallocates abstract values and checks the assertions during fixpoint computation while guaranteeing that the results of the checks remain the same and that the peak-memory usage is optimal.

We prove that our approach deallocates abstract values as soon as they are no longer needed during fixpoint computation. Providing this theoretical guarantee is challenging for arbitrary irreducible graphs such as G. For example, assuming that node 8 is analyzed after 2, one might think that the fixpoint iterator can deallocate the abstract value at 2 once it analyzes 8.
However, 8 is part of the strongly-connected component {7, 8}, and the fixpoint iterator might need to iterate over node 8 multiple times. Thus, deallocating the abstract value at 2 when node 8 is first analyzed will lead to incorrect results. In this case, the earliest that the abstract value at 2 can be deallocated is after the stabilization of component {7, 8}.

Furthermore, we prove that our approach performs the assertion checks as early as possible during fixpoint computation. Once the assertions are checked, the associated abstract values are deallocated. For example, consider the assertion check at node 4. Notice that 4 is part of the strongly-connected components {4, 5} and {3, 4, 5, 6}. Checking the assertion the first time node 4 is analyzed could lead to an incorrect result because the abstract value at 4 has not converged. The earliest that the check at node 4 can be executed is after the convergence of the component {3, 4, 5, 6}. Apart from being able to deallocate abstract values earlier, early assertion checks provide partial results on timeout.

The key theoretical result (Theorem 1) is that our iteration strategy is memory-optimal (i.e., it results in minimum memory footprint) while computing the same result as Bourdoncle's approach. Furthermore, we present an almost-linear time algorithm to compute this optimal iteration strategy (§ 4).

We have implemented this memory-optimal fixpoint computation in a tool called
Mikos (§ 5), which extends the state-of-the-art abstract interpreter for C/C++, IKOS [11]. We compared the memory efficiency of
Mikos and IKOS on the following tasks:

T1 Verifying user-provided assertions. Task T1 represents the program-verification client of a fixpoint computation. We performed interprocedural analysis of 784 SV-COMP 2019 benchmarks [6] using the reduced product of the Difference Bound Matrix domain with variable packing [18] and the congruence domain [21].

T2 Proving absence of buffer overflows. Task T2 represents the bug-finding and compiler-optimization client of fixpoint computation. In the context of bug finding, a potential buffer overflow can be reported to the user as a potential bug. In the context of compiler optimization, code to check buffer-access safety can be elided if the buffer access is verified to be safe. We performed interprocedural buffer-overflow analysis of 426 open-source programs using the interval abstract domain.

On Task T1,
Mikos shows a decrease in peak-memory usage to . % ( . ×) on average compared to IKOS. For instance, the peak memory required to analyze the SV-COMP 2019 benchmark ldv-3.16-rc1/205_9a-net-rtl8187 decreased from 46 GB to 56 MB. Also, while ldv-3.14/usb-mxl111sf ran out of memory in IKOS under a 64 GB memory limit, peak-memory usage was 21 GB for Mikos. On Task T2,
Mikos shows a decrease in peak-memory usage to . % ( . ×) on average compared to IKOS. For instance, the peak memory required to analyze the benchmark ssh-keygen decreased from 30 GB to 1 GB.

The contributions of the paper are as follows:
– A memory-optimal technique for Bourdoncle's recursive iteration strategy that does not sacrifice precision or time efficiency (§ 3).
– An almost-linear time algorithm to construct our memory-efficient iteration strategy (§ 4).
– Mikos, an interprocedural implementation of our approach (§ 5).
– An empirical evaluation of the efficacy of
Mikos using a large set of C benchmarks (§ 6).

§ 2 presents necessary background on fixpoint computation, including Bourdoncle's approach; § 7 presents related work; § 8 concludes.

2 Fixpoint Computation
This section presents background on fixpoint computation that will allow us to clearly state the problem addressed in this paper (§ 2.3). This section is not meant
to capture all possible approaches to implementing abstract interpretation. However, it does capture the relevant high-level structure of abstract-interpretation implementations such as IKOS [11].

Consider an equation system Φ whose dependency graph is G(V, →). The graph G typically reflects the control-flow graph of the program, though this is not always true. The aim is to find the fixpoint of the equation system Φ:

  Pre[v] = ⊔ { Post[p] | p → v }    v ∈ V        (1)
  Post[v] = τ_v(Pre[v])             v ∈ V

The maps
Pre : V → A and Post : V → A maintain the abstract values at the beginning and end of each program point, where A is an abstract domain. The abstract transformer τ_v : A → A overapproximates the semantics of program point v ∈ V. After fixpoint computation, Pre[v] is an invariant for v ∈ V.

Client applications of the abstract interpreter typically query these fixpoint values to perform assertion checks, program optimizations, or report bugs. Let V_C ⊆ V be the set of program points where such checks are performed, and let ϕ_v : A → bool represent the corresponding functions that perform the check for each v ∈ V_C. To simplify presentation, we assume that the check function merely returns true or false. Thus, after fixpoint computation, the client application computes ϕ_v(Pre[v]) for each v ∈ V_C.

The exact least solution of the system Eq. 1 can be computed using Kleene iteration provided A is Noetherian. However, most interesting abstract domains require the use of widening (∇) to ensure termination, followed by narrowing to improve the post solution. In this paper, we use "fixpoint" to refer to such an approximation of the least fixpoint. Furthermore, for simplicity of presentation, we restrict our description to a simple widening strategy. However, our implementation (§ 5) uses more sophisticated widening and narrowing strategies implemented in state-of-the-art abstract interpreters [11,1].

An iteration strategy specifies the order in which the individual equations are applied, where widening is used, and how convergence of the equation system is checked. For clarity of exposition, we introduce a Fixpoint Machine (FM) consisting of an imperative set of instructions. An FM program represents a particular iteration strategy used for fixpoint computation. The syntax of Fixpoint Machine programs is defined by the following grammar:

  Prog ::= exec v | repeat v [ Prog ] | Prog #
Prog,    v ∈ V        (2)

Informally, the instruction exec v applies τ_v for v ∈ V; the instruction repeat v [P] repeatedly executes the FM program P until convergence and performs widening at v; and the instruction P1 # P2 executes FM programs P1 and P2 in sequence. The syntax (Eq. 2) and semantics (Figure 2) of the Fixpoint Machine are sufficient to express Bourdoncle's recursive iteration strategy (§ 2.1), a widely-used approach for fixpoint computation [10]. We also extend the notion of iteration strategy to perform memory management of the abstract values as well as perform checks during fixpoint computation (§ 2.2).

2.1 Bourdoncle's Recursive Iteration Strategy

In this section, we review Bourdoncle's recursive iteration strategy [10] and show how to generate the corresponding FM program. Bourdoncle's iteration strategy relies on the notion of a weak topological ordering (WTO) of a directed graph G(V, →). A WTO is defined using the notion of a hierarchical total ordering (HTO) of a set.

Definition 1. A hierarchical total ordering H of a set S is a well-parenthesized permutation of S without two consecutive "(". □

An HTO H is a string over the alphabet S augmented with left and right parentheses. Alternatively, we can denote an HTO H by the tuple (S, ⪯, ω), where ⪯ is the total order induced by H over the elements of S and ω : V → 2^V. The elements between two matching parentheses are called a component, and the first element of a component is called the head. Given l ∈ S, ω(l) is the set of heads of the components containing l. We use C : V → 2^V to denote the mapping from a head to its component.

Example 1.
Let V = {1, 2, 3, 4, 5, 6, 7, 8, 9}. An example HTO H(V, ⪯, ω) is 1 2 (3 (4 5) 6) (7 8) 9. ω(3) = {3}, ω(5) = {3, 4}, and ω(1) = ∅. It has components C(4) = {4, 5}, C(7) = {7, 8}, and C(3) = {3, 6} ∪ C(4). □

A weak topological ordering (WTO) W of a directed graph G(V, →) is an HTO H(V, ⪯, ω) satisfying certain constraints listed below:

Definition 2. A weak topological ordering W(V, ⪯, ω) of a directed graph G(V, →) is an HTO H(V, ⪯, ω) such that for every edge u → v, either (i) u ≺ v, or (ii) v ⪯ u and v ∈ ω(u). □

Example 2.
HTO H in Example 1 is a WTO W of the graph G (Figure 1). □

Given a directed graph G(V, →) that represents the dependency graph of the equation system, Bourdoncle's approach uses a WTO W(V, ⪯, ω) of G to derive the following recursive iteration strategy:
– The total order ⪯ determines the order in which the equations are applied. The equation after a component is applied only after the component stabilizes.
– The stabilization of a component C(h) is determined by checking the stabilization of the head h.
– Widening is performed at each of the heads.

We now show how the WTO can be represented using the syntax of our Fixpoint Machine (FM) defined in Eq. 2. The following function genProg : WTO → Prog maps a given WTO W to an FM program:

  genProg(W) := repeat v [ genProg(W′) ]      if W = (v W′)
                genProg(W1) # genProg(W2)     if W = W1 W2        (3)
                exec v                        if W = v
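The genProg rules translate almost verbatim into code. Below is a minimal Python sketch, assuming a nested-list encoding of the WTO (a sublist is a component with its head as the first element) and '#' as plain-text sequencing; both encodings are illustrative conventions, not the paper's data structures.

```python
def gen_prog(wto):
    """Transcription of genProg (Eq. 3): plain elements of the list are
    nodes (case W = v), sublists are components whose head is the first
    element (case W = (v W')), and adjacent items are sequenced with '#'
    (case W = W1 W2). Returns the FM program as a string."""
    parts = []
    for item in wto:
        if isinstance(item, list):
            head, rest = item[0], item[1:]
            parts.append(f"repeat {head} [ {gen_prog(rest)} ]")  # W = (v W')
        else:
            parts.append(f"exec {item}")                         # W = v
    return " # ".join(parts)                                     # W = W1 W2

# The WTO of Example 1: 1 2 (3 (4 5) 6) (7 8) 9.
wto = [1, 2, [3, [4, 5], 6], [7, 8], 9]
print(gen_prog(wto))
```

On the WTO of Example 1 this prints the FM program of Example 3: exec 1 # exec 2 # repeat 3 [ repeat 4 [ exec 5 ] # exec 6 ] # repeat 7 [ exec 8 ] # exec 9.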
Each node v ∈ V is mapped to a single FM instruction by genProg; we use Inst[v] to refer to the FM instruction corresponding to v. Note that if v ∈ V is a head, then Inst[v] is an instruction of the form repeat v [...]; otherwise, Inst[v] is exec v.

Example 3.
The WTO W of graph G (Figure 1) is 1 2 (3 (4 5) 6) (7 8) 9. The corresponding FM program is P = genProg(W) = exec 1 # exec 2 # repeat 3 [ repeat 4 [ exec 5 ] # exec 6 ] # repeat 7 [ exec 8 ] # exec 9. The colors used for brackets and parentheses are to more clearly indicate the correspondence between the WTO and the FM program. Note that Inst[1] = exec 1, and Inst[4] = repeat 4 [ exec 5 ]. □

Ignoring the text in gray, the semantics of the FM instructions shown in Figure 2 capture Bourdoncle's recursive iteration strategy. The semantics are parameterized by the graph G(V, →) and a WTO W(V, ⪯, ω).

2.2 Memory Management and Checks During Fixpoint Computation

In this paper, we extend the notion of iteration strategy to indicate when abstract values are deallocated and when checks are executed. The gray text in Figure 2 shows the semantics of the FM instructions that handle these issues. The right-hand side of ⇒ is executed if the left-hand side evaluates to true. Recall that the set V_C ⊆ V is the set of program points that have assertion checks. The map Ck : V_C → bool records the result of executing the check ϕ_u(Pre[u]) for each u ∈ V_C. Thus, the output of the FM program is the map Ck. In practice, the functions ϕ_u are expensive to compute. Furthermore, they often write the result to a database or report the output to a user. Consequently, we assume that only the first execution of ϕ_u is recorded in Ck.

The memory configuration M is a tuple (Dpost, Achk, Dpostℓ, Dpreℓ) where
– The map
Dpost : V → V controls the deallocation of values in Post that have no further use. If v = Dpost[u], Post[u] is deallocated after the execution of Inst[v].
– The map
Achk : V_C → V controls when the check function ϕ_u corresponding to u ∈ V_C is executed, after which the corresponding Pre value is deallocated. If
Achk[u] = v, assertions in u are checked and Pre[u] is subsequently deallocated after the execution of Inst[v].
– The map
Dpostℓ : V → 2^V controls deallocation of Post values that are recomputed and overwritten in the loop of a repeat instruction before their next use. If v ∈ Dpostℓ[u], Post[u] is deallocated in the loop of Inst[v].
– The map
Dpreℓ : V_C → 2^V controls deallocation of Pre values that are recomputed and overwritten in the loop of a repeat instruction before their next use. If v ∈ Dpreℓ[u], Pre[u] is deallocated in the loop of Inst[v].

To simplify presentation, the semantics in Figure 2 does not make explicit the allocations of abstract values: if a Post or Pre value that has been deallocated is accessed, then it is allocated and initialized to ⊥.

Given G(V, →), WTO W(V, ⪯, ω), V_C ⊆ V, and memory configuration M(Dpost, Achk, Dpostℓ, Dpreℓ):

⟦exec v⟧M ≝
  Pre[v] ← ⊔ { Post[p] | p → v }
  foreach u ∈ V : v = Dpost[u] ⇒ free Post[u]
  Post[v] ← τ_v(Pre[v])
  v ∉ V_C ⇒ free Pre[v]
  foreach u ∈ V_C : v = Achk[u] ⇒ Ck[u] ← ϕ_u(Pre[u]); free Pre[u]

⟦repeat v [P]⟧M ≝
  tpre ← ⊔ { Post[p] | p → v ∧ v ∉ ω(p) }              (Preamble)
  do {                                                  (Loop)
    foreach u ∈ V : v ∈ Dpostℓ[u] ⇒ free Post[u]
    foreach u ∈ V_C : v ∈ Dpreℓ[u] ⇒ free Pre[u]
    Pre[v], Post[v] ← tpre, τ_v(tpre)
    ⟦P⟧M
    tpre ← Pre[v] ∇ ⊔ { Post[p] | p → v }
  } while (tpre ⋢ Pre[v])
  foreach u ∈ V : v = Dpost[u] ⇒ free Post[u]           (Postamble)
  v ∉ V_C ⇒ free Pre[v]
  foreach u ∈ V_C : v = Achk[u] ⇒ Ck[u] ← ϕ_u(Pre[u]); free Pre[u]

⟦P1 # P2⟧M ≝ ⟦P1⟧M ; ⟦P2⟧M

Fig. 2: The semantics of the Fixpoint Machine (FM) instructions of Eq. 2.

Two memory configurations are equivalent if they result in the same values for each check in the program:
Definition 3.
Given an FM program P, memory configuration M1 is equivalent to M2, denoted by ⟦P⟧M1 = ⟦P⟧M2, iff for all u ∈ V_C, we have Ck1[u] = Ck2[u], where Ck1 and Ck2 are the check maps corresponding to executions of P using M1 and M2, respectively. □

The default memory configuration M_dflt performs checks and deallocations at the end of the FM program after the fixpoint has been computed.

Definition 4.
Given an FM program P, the default memory configuration M_dflt(Dpost_dflt, Achk_dflt, Dpostℓ_dflt, Dpreℓ_dflt) is Dpost_dflt[v] = z for all v ∈ V,
Achk_dflt[c] = z for all c ∈ V_C, and Dpostℓ_dflt = Dpreℓ_dflt = ∅, where z is the last instruction in P. □

Example 4.
Consider the FM program P from Example 3. Let V_C = {4, 9}. Dpost_dflt[v] = 9 for all v ∈ V. That is, all Post values are deallocated at the end of the fixpoint computation. Also,
Achk_dflt[4] = Achk_dflt[9] = 9, meaning that the assertion checks also happen at the end.
Dpostℓ_dflt = Dpreℓ_dflt = ∅, so the FM program does not clear abstract values that will be recomputed and overwritten in a loop of a repeat instruction. □

2.3 Problem Statement

Given an FM program P, a memory configuration M is valid for P iff it is equivalent to the default configuration; i.e., ⟦P⟧M = ⟦P⟧M_dflt. Furthermore, a valid memory configuration M is optimal for a given FM program P iff the memory footprint of ⟦P⟧M is smaller than or equal to that of ⟦P⟧M′ for all valid memory configurations M′. The problem addressed in this paper can be stated as:

Given an FM program P, find an optimal memory configuration M.

An optimal configuration should deallocate abstract values during fixpoint computation as soon as they are no longer needed. The challenge is ensuring that the memory configuration remains valid even without knowing the number of loop iterations for repeat instructions. § 3 gives the optimal memory configuration for the FM program P from Example 3.

3 Optimal Memory Configuration M_opt

This section provides a declarative specification of an optimal memory configuration M_opt(Dpost_opt, Achk_opt, Dpostℓ_opt, Dpreℓ_opt). The proofs of the theorems in this section can be found in Appendix A. § 4 presents an efficient algorithm for computing M_opt.

Definition 5.
Given a WTO W(V, ⪯, ω) of a graph G(V, →), the nesting relation N is a tuple (V, ⪯N) where x ⪯N y iff x = y or y ∈ ω(x), for x, y ∈ V. □

Let ⌈v⌉ ≝ {w ∈ V | v ⪯N w}; that is, ⌈v⌉ equals the set containing v and the heads of components in the WTO that contain v. The nesting relation N(V, ⪯N) is a forest; i.e., a partial order such that for all v ∈ V, (⌈v⌉, ⪯N) is a chain (Theorem 4, Appendix A.1).

Example 5.
For the WTO W of G in Example 2, N(V, ⪯N) is the forest with roots 1, 2, 3, 7, and 9, in which 4 and 6 are children of 3, 5 is a child of 4, and 8 is a child of 7. Note that ⌈5⌉ = {5, 4, 3}, forming a chain 5 ⪯N 4 ⪯N 3. □

Dpost_opt
Dpost_opt[u] = v implies that v is the earliest instruction at which Post[u] can be deallocated while ensuring that there are no subsequent reads of Post[u] during fixpoint computation. We cannot conclude Dpost_opt[u] = v from a dependency u → v, as illustrated in the following example.

Example 6.
Consider the FM program P from Example 3, whose graph G(V, →) is in Figure 1. Although 2 → 8, a memory configuration with Dpost[2] = 8 is not valid:
Post[2] is read by Inst[8], which is executed repeatedly as part of Inst[7]; if Dpost[2] = 8, Post[2] is deallocated the first time Inst[8] is executed, and subsequent executions of Inst[8] will read ⊥ as the value of Post[2]. □

In general, for a dependency u → v, we must find the head of the maximal component that contains v but not u as the candidate for Dpost_opt[u]. By choosing the head of the maximal component, we remove the possibility of having a larger component whose head's repeat instruction can execute Inst[v] after deallocating Post[u]. If there is no component that contains v but not u, we simply use v as the candidate. The following Lift operator gives us the candidate of
Dpost_opt[u] for u → v:

  Lift(u, v) ≝ max⪯N ((⌈v⌉ \ ⌈u⌉) ∪ {v})        (4)

⌈v⌉ gives us v and the heads of components that contain v. Subtracting ⌈u⌉ removes the heads of components that also contain u. We put back v to account for the case when there is no component containing v but not u and ⌈v⌉ \ ⌈u⌉ is empty. Because N(V, ⪯N) is a forest, ⌈v⌉ and ⌈u⌉ are chains, and hence, ⌈v⌉ \ ⌈u⌉ is also a chain. Therefore, the maximum is well-defined.

Example 7.
Consider the nesting relation N(V, ⪯N) from Example 5. Lift(2, 8) = max⪯N(({8, 7} \ {2}) ∪ {8}) = 7. We see that 7 is the head of the maximal component containing 8 but not 2. Also, Lift(5, 4) = max⪯N(({4, 3} \ {5, 4, 3}) ∪ {4}) = 4. There is no component that contains 4 but not 5. □

For each instruction u, we now need to find the last instruction from among the candidates computed using Lift. Notice that deallocations of
Post values are at the postamble of repeat instructions in Figure 2. Therefore, we cannot use the total order ⪯ of a WTO to find the last instruction: ⪯ is the order in which the instructions begin executing, or the order in which preambles are executed.
Example 8.
Let Dpost_to[u] ≝ max⪯ {Lift(u, v) | u → v}, u ∈ V, be an incorrect variant of Dpost_opt that uses the total order ⪯. Consider the FM program P from Example 3, whose graph G(V, →) is in Figure 1 and nesting relation N(V, ⪯N) is in Example 5. Post[5] has dependencies 5 → 3 and 5 → 4. Lift(5, 4) = 4, Lift(5, 3) = 3. Now, Dpost_to[5] = 4 because 3 ⪯ 4. However, a memory configuration with Dpost[5] = 4 is not valid: Inst[4] is nested in Inst[3]. Due to the deletion of Post[5] in Inst[4], Inst[3] will read ⊥ as the value of Post[5]. □

To find the order in which the instructions finish executing, or the order in which postambles are executed, we define the relation (V, ≤) using the total order (V, ⪯) and the nesting relation (V, ⪯N):

  x ≤ y ≝ x ⪯N y ∨ (y ⋠N x ∧ x ⪯ y)        (5)

In the definition of ≤, the nesting relation ⪯N takes precedence over ⪯. (V, ≤) is a total order (Theorem 5, Appendix A.1). Intuitively, the total order ≤ moves the heads in the WTO to their corresponding closing parentheses ')'.

Example 9.
For G (Figure 1) and its WTO W, 1 2 (3 (4 5) 6) (7 8) 9, we have 1 ≤ 2 ≤ 5 ≤ 4 ≤ 6 ≤ 3 ≤ 8 ≤ 7 ≤ 9. Note that 3 ⪯ 6 while 6 ≤ 3. The postamble of repeat 3 [...] is executed after Inst[6], while the preamble of repeat 3 [...] is executed before Inst[6]. □
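Eq. 5 can be evaluated directly from two ingredients that are easy to read off a WTO: the textual position of each element (the total order ⪯) and the chain of enclosing heads ⌈v⌉ (for ⪯N). A sketch, assuming an illustrative nested-list encoding of the WTO (a sublist is a component with its head first):

```python
from functools import cmp_to_key

# WTO of Example 1, encoded as nested lists (sublist = component, head first).
WTO = [1, 2, [3, [4, 5], 6], [7, 8], 9]

def up_chains(wto, outer=()):
    """Map each v to the tuple (v, then enclosing heads innermost-first);
    as a set this is the chain written ⌈v⌉ in the text."""
    ch = {}
    for item in wto:
        if isinstance(item, list):
            ch.update(up_chains(item[1:], outer + (item[0],)))
            ch[item[0]] = (item[0],) + tuple(reversed(outer))
        else:
            ch[item] = (item,) + tuple(reversed(outer))
    return ch

def flatten(wto):
    return [x for item in wto
            for x in (flatten(item) if isinstance(item, list) else [item])]

UP = up_chains(WTO)
POS = {v: i for i, v in enumerate(flatten(WTO))}  # the total order ⪯

def le(x, y):
    """Eq. 5: x ≤ y iff x ⪯N y, or (not y ⪯N x and x ⪯ y)."""
    if y in UP[x]:      # x ⪯N y
        return True
    if x in UP[y]:      # y ⪯N x strictly
        return False
    return POS[x] <= POS[y]

order = sorted(POS, key=cmp_to_key(lambda a, b: -1 if le(a, b) else 1))
print(order)  # the postamble order of Example 9: 1 2 5 4 6 3 8 7 9
```

Sorting V by this comparator reproduces the sequence 1 ≤ 2 ≤ 5 ≤ 4 ≤ 6 ≤ 3 ≤ 8 ≤ 7 ≤ 9 of Example 9, with each head moved to its closing parenthesis.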
Dpost_opt. Given a nesting relation N(V, ⪯N) for the graph G(V, →), Dpost_opt is defined as:

  Dpost_opt[u] ≝ max≤ {Lift(u, v) | u → v},    u ∈ V        (6)

Example 10.
Consider the FM program P from Example 3, whose graph G(V, →) is in Figure 1 and nesting relation N(V, ⪯N) is in Example 5. An optimal memory configuration M_opt defined by Eq. 6 is: Dpost_opt[1] = 2, Dpost_opt[2] = Dpost_opt[3] = Dpost_opt[8] = 7, Dpost_opt[4] = 6, Dpost_opt[5] = Dpost_opt[6] = 3, Dpost_opt[7] = Dpost_opt[9] = 9.

Successors of u are first lifted to compute Dpost_opt[u]. For example, to compute Dpost_opt[2], 2's successors, 3 and 8, are lifted to Lift(2, 3) = 3 and Lift(2, 8) = 7. To compute Dpost_opt[5], 5's successors, 3 and 4, are lifted to Lift(5, 3) = 3 and Lift(5, 4) = 4. Then, the maximum (as per the total order ≤) of the lifted successors is chosen as Dpost_opt[u]. Because 3 ≤ 7, Dpost_opt[2] = 7. Thus, Post[2] is deleted in Inst[7]. Also, because 4 ≤ 3, Dpost_opt[5] = 3, and Post[5] is deleted in Inst[3]. □
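Dpost_opt composes Eq. 4 and Eq. 6 mechanically: lift each successor of u along the nesting chains, then take the maximum under ≤. A self-contained sketch using an illustrative nested-list WTO encoding; only the two successor sets spelled out in Example 10 are exercised:

```python
# WTO of Example 1; sublist = component, head first (illustrative encoding).
WTO = [1, 2, [3, [4, 5], 6], [7, 8], 9]

def up_chains(wto, outer=()):
    """⌈v⌉ as a tuple: v, then enclosing heads, in ⪯N-ascending order."""
    ch = {}
    for item in wto:
        if isinstance(item, list):
            ch.update(up_chains(item[1:], outer + (item[0],)))
            ch[item[0]] = (item[0],) + tuple(reversed(outer))
        else:
            ch[item] = (item,) + tuple(reversed(outer))
    return ch

def flatten(wto):
    return [x for it in wto
            for x in (flatten(it) if isinstance(it, list) else [it])]

UP = up_chains(WTO)
POS = {v: i for i, v in enumerate(flatten(WTO))}  # the total order ⪯

def lift(u, v):
    """Eq. 4: head of the maximal component containing v but not u, else v.
    UP[v] is ⪯N-ascending, so the last surviving element is the maximum."""
    cand = [w for w in UP[v] if w not in UP[u]]
    return cand[-1] if cand else v

def le(x, y):
    """Eq. 5: the postamble order."""
    return y in UP[x] or (x not in UP[y] and POS[x] <= POS[y])

def dpost_opt(u, succs):
    """Eq. 6: maximum, under ≤, of the lifted successors of u."""
    best = None
    for v in succs:
        c = lift(u, v)
        if best is None or le(best, c):
            best = c
    return best

# Successor sets given in Example 10: node 2 has successors {3, 8},
# node 5 has successors {3, 4}.
print(dpost_opt(2, [3, 8]), dpost_opt(5, [3, 4]))  # 7 3
```

This reproduces Dpost_opt[2] = 7 and Dpost_opt[5] = 3 from Example 10.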
Achk_opt

Achk_opt[u] = v implies that v is the earliest instruction at which the assertion check at u ∈ V_C can be executed so that the invariant passed to the assertion-check function ϕ_u is the same as when using M_dflt, thus guaranteeing the same check result Ck. Because an instruction can be executed multiple times in a loop, we cannot simply execute the assertion checks right after the instruction, as illustrated by the following example.

Example 11.
Consider the FM program P from Example 3. Let V_C = {4, 9}. A memory configuration with Achk[4] = 4 is not valid: Inst[4] is executed repeatedly as part of Inst[3], and the first value of Pre[4] may not be the final invariant. Consequently, executing ϕ_4(Pre[4]) in Inst[4] may not give the same result as executing it in Inst[9] (Achk_dflt[4] = 9). □

In general, because we cannot know the number of iterations of the loop in a repeat instruction, we must wait for the convergence of the maximal component that contains the assertion check. After the maximal component converges, the FM program never visits the component again, making the Pre values of the elements inside the component final. Only if the element is not in any component can its assertion check be executed right after its instruction.

Given a nesting relation N(V, ⪯N) for the graph G(V, →), Achk_opt is defined as:
  Achk_opt[u] ≝ max⪯N ⌈u⌉,    u ∈ V_C        (7)

Because N(V, ⪯N) is a forest, (⌈u⌉, ⪯N) is a chain. Hence, max⪯N is well-defined.

Example 12.
Consider the FM program P from Example 3, whose graph G(V, →) is in Figure 1 and nesting relation N(V, ⪯N) is in Example 5. Suppose that V_C = {4, 9}. Achk_opt[4] = max⪯N {4, 3} = 3 and Achk_opt[9] = max⪯N {9} = 9. □

Dpostℓ_opt

v ∈ Dpostℓ[u] implies that Post[u] can be deallocated at v because it is recomputed and overwritten in the loop of a repeat instruction before a subsequent use of Post[u]. Dpostℓ_opt[u] must be a subset of ⌈u⌉: only the instructions of the heads of components that contain u recompute Post[u]. We can further rule out the instructions of the heads of components that contain Dpost_opt[u], because Inst[Dpost_opt[u]] deletes Post[u]. We add back Dpost_opt[u] to Dpostℓ_opt[u] when u is contained in Dpost_opt[u], because deallocation by Dpost_opt happens after the deallocation by Dpostℓ_opt.

Given a nesting relation N(V, ⪯N) for the graph G(V, →), Dpostℓ_opt is defined as:
  Dpostℓ_opt[u] ≝ (⌈u⌉ \ ⌈d⌉) ∪ (u ⪯N d ? {d} : ∅),    u ∈ V        (8)

where d = Dpost_opt[u] as defined in Eq. 6, and (b ? x : y) is the ternary conditional choice operator.

Example 13.
Consider the FM program P from Example 3, whose graph G(V, →) is in Figure 1, nesting relation N(V, ⪯N) is in Example 5, and Dpost_opt is in Example 10. Dpostℓ_opt[1] = {1}, Dpostℓ_opt[2] = {2}, Dpostℓ_opt[3] = {3}, Dpostℓ_opt[4] = {4}, Dpostℓ_opt[5] = {5, 4, 3}, Dpostℓ_opt[6] = {6, 3}, Dpostℓ_opt[7] = {7}, Dpostℓ_opt[8] = {8, 7}, Dpostℓ_opt[9] = {9}.

For 7, Dpost_opt[7] = 9. Because 7 ⋠N 9, Dpostℓ_opt[7] = ⌈7⌉ \ ⌈9⌉ = {7}. Therefore, Post[7] is deleted in each iteration of the loop of Inst[7]. While Inst[9] reads Post[7] in the future, the particular values of Post[7] that are deleted by Dpostℓ_opt[7] are not used in Inst[9]. For 5, Dpost_opt[5] = 3. Because 5 ⪯N 3, Dpostℓ_opt[5] = (⌈5⌉ \ ⌈3⌉) ∪ {3} = {5, 4, 3}. □

Dpreℓ_opt

v ∈ Dpreℓ[u] implies that Pre[u] can be deallocated at v because it is recomputed and overwritten in the loop of a repeat instruction before a subsequent use of Pre[u]. Dpreℓ_opt[u] must be a subset of ⌈u⌉: only the instructions of the heads of components that contain u recompute Pre[u]. If Inst[u] is a repeat instruction, Pre[u] is required to perform widening. Therefore, u must not be contained in Dpreℓ_opt[u].

Example 14.
Consider the FM program P from Example 3. Let V_C = {4, 9}. A memory configuration with Dpreℓ[4] = {4, 3} is not valid, because Inst[4] would read ⊥ as the value of Pre[4] when performing widening. □
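Eq. 8 above is a direct set computation over the nesting chains. The sketch below (same illustrative nested-list WTO encoding) takes the Dpost_opt values listed in Example 10 as given and reproduces the Dpostℓ_opt table of Example 13:

```python
# WTO of Example 1; sublist = component, head first (illustrative encoding).
WTO = [1, 2, [3, [4, 5], 6], [7, 8], 9]

def up_chains(wto, outer=()):
    """⌈v⌉: v together with the heads of the components containing it."""
    ch = {}
    for item in wto:
        if isinstance(item, list):
            ch.update(up_chains(item[1:], outer + (item[0],)))
            ch[item[0]] = {item[0], *outer}
        else:
            ch[item] = {item, *outer}
    return ch

UP = up_chains(WTO)

# Dpost_opt as listed in Example 10.
DPOST = {1: 2, 2: 7, 3: 7, 4: 6, 5: 3, 6: 3, 7: 9, 8: 7, 9: 9}

def dpost_loop_opt(u):
    """Eq. 8: (⌈u⌉ \ ⌈d⌉) ∪ ({d} if u ⪯N d else ∅), where d = Dpost_opt[u]."""
    d = DPOST[u]
    s = UP[u] - UP[d]
    if d in UP[u]:        # u ⪯N d
        s = s | {d}
    return s

print({u: sorted(dpost_loop_opt(u)) for u in DPOST})
```

For instance, dpost_loop_opt(5) returns {5, 4, 3} and dpost_loop_opt(7) returns {7}, matching the two cases worked out in Example 13.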
Given a nesting relation N(V, ⪯N) for the graph G(V, →), Dpreℓ_opt is defined as:

  Dpreℓ_opt[u] ≝ ⌈u⌉ \ {u},    u ∈ V_C        (9)

Example 15.
Consider the FM program P from Example 3, whose graph G(V, →) is in Figure 1 and nesting relation N(V, ⪯N) is in Example 5. Let V_C = {4, 9}. Dpreℓ_opt[4] = {4, 3} \ {4} = {3} and Dpreℓ_opt[9] = {9} \ {9} = ∅. Therefore, Pre[4] is deleted in each loop iteration of Inst[3]. □

The following theorem is proved in Appendix A.2:
Theorem 1.
The memory configuration M_opt(Dpost_opt, Achk_opt, Dpostℓ_opt, Dpreℓ_opt) is optimal.

4 Construction of M_opt

Algorithm
GenerateFMProgram (Algorithm 1) is an almost-linear time algorithm for computing an FM program P and optimal memory configuration M_opt for a given directed graph G(V, →). Algorithm 1 adapts the bottom-up WTO construction algorithm presented in Kim et al. [27]. In particular, Algorithm 1 applies the genProg rules (Eq. 3) to generate the FM program from a WTO. Line 32 generates exec instructions for non-heads. Line 39 generates repeat instructions for heads, with their bodies ([...]) generated on Line 35. Finally, instructions are merged on Line 48 to construct the final output P.

Algorithm GenerateFMProgram utilizes a disjoint-set data structure. Operation rep(v) returns the representative of the set that contains v. In Line 5, the sets are initialized to rep(v) = v for all v ∈ V. Operation merge(v, h) on Line 43 merges the sets containing v and h, and assigns h to be the representative for the combined set. lca_D(u, v) is the lowest common ancestor of u, v in the depth-first forest D [47]. Cross and forward edges are initially removed from →′ on Line 7, making the graph (V, →′ ∪ B) reducible. Restoring them on Line 9 when h = lca_D(u, v) restores some reachability while keeping (V, →′ ∪ B) reducible.

Algorithm 1:
GenerateFMProgram(G)
Input: Directed graph G(V, →)
Output: FM program pgm, Mopt(Dpostopt, Achkopt, Dpostℓopt, Dpreℓopt)

 1    D := DepthFirstForest(G)
 2    B := back edges in D
 3    CF := cross & forward edges in D
 4    →′ := → \ B
 5    for v ∈ V do rep(v) := v; R[v] := ∅
 6    P := ∅
 7    removeAllCrossFwdEdges()
 8    for h ∈ V in descending DFN_D do
 9        restoreCrossFwdEdges(h)
10        generateFMInstruction(h)
11    pgm := connectFMInstructions()
12    return pgm, Mopt

13    def removeAllCrossFwdEdges():
14        for (u, v) ∈ CF do
15            →′ := →′ \ {(u, v)}
16            R[lcaD(u, v)] := R[lcaD(u, v)] ∪ {(u, v)}    ▷ Lowest common ancestor.

17    def restoreCrossFwdEdges(h):
18        →′ := →′ ∪ {(u, rep(v)) | (u, v) ∈ R[h]}

19    def findNestedSCCs(h):
20        B_h := {rep(p) | (p, h) ∈ B}
21        N_h := ∅                                         ▷ Nested SCCs except h.
22        W := B_h \ {h}                                   ▷ Worklist.
23        while there exists v ∈ W do
24            W, N_h := W \ {v}, N_h ∪ [v]
25            for u s.t. u →′ v do
26                if rep(u) ∉ N_h ∪ {h} ∪ W then
27                    W := W ∪ {rep(u)}
28        return N_h, B_h

29    def generateFMInstruction(h):
30        N_h, B_h := findNestedSCCs(h)
31        if B_h = ∅ then
32            Inst[h] := exec h
33            return
34        for v ∈ N_h in desc. postDFN_D do
35            Inst[h] := Inst[h] ⨟ Inst[v]
36  ⋆         for u s.t. u →′ v do
37  ⋆             Dpostopt[u] := v
38  ⋆             T[u] := rep(u)
39        Inst[h] := repeat h [Inst[h]]
40  ⋆     for u s.t. u →B h do
41  ⋆         Dpostopt[u] := T[u] := h
42        for v ∈ N_h do
43            merge(v, h); P := P ∪ {(v, h)}

44    def connectFMInstructions():
45        pgm := ε                                         ▷ Empty program.
46        for v ∈ V in desc. postDFN_D do
47            if rep(v) = v then
48                pgm := pgm ⨟ Inst[v]
49  ⋆             for u s.t. u →′ v do
50  ⋆                 Dpostopt[u] := v
51  ⋆                 T[u] := rep(u)
52  ⋆         if v ∈ VC then
53  ⋆             Achkopt[v] := rep(v)
54  ⋆             Dpreℓopt[v] := ⌊⌊v, rep(v)⌉⌉P* \ {v}
55  ⋆     for v ∈ V do
56  ⋆         Dpostℓopt[v] := ⌊⌊v, T[v]⌉⌉P*
57        return pgm

Lines indicated by ⋆ in Algorithm 1 compute Mopt. Lines 37, 41, and 50 compute Dpostopt. Due to the specific order in which the algorithm traverses G, Dpostopt[u] is overwritten with greater values (as per the total order ≤) on these lines, making the final value the maximum among the successors. Lift is implicitly applied when restoring the edges in restoreCrossFwdEdges: edge u → v whose
Lift(u, v) = h is replaced by u →′ h on Line 9.

Dpostℓopt is computed using an auxiliary map T : V → V and a relation P : V × V. At the end of the algorithm, T[u] is the maximum element (as per ⪯N) in Dpostℓopt[u]. That is, T[u] = max⪯N((⌊⌊u⌉⌉⪯N \ ⌊⌊d⌉⌉⪯N) ∪ (u ⪯N d ? {d} : ∅)), where d = Dpostopt[u]. Once T[u] is computed by Lines 38, 41, and 51, the transitive reduction P of ⪯N is used to find all elements of Dpostℓopt[u] on Line 56. P is computed on Line 43. Note that P* = ⪯N and ⌊⌊x, y⌉⌉P* def= {v | x P* v ∧ v P* y}. Achkopt and Dpreℓopt are computed on Lines 53 and 54, respectively.
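The two auxiliary structures can be sketched in Python. This is an illustrative sketch under our own assumptions (the class and helper names are ours, and P is represented as a child-to-parent map over the forest formed by its transitive reduction); it is not code from IKOS or Mikos:

```python
class DisjointSet:
    """Union-find in which merge(v, h) makes the head h the representative,
    mirroring rep/merge in Algorithm 1 (the SCC head represents its SCC)."""

    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}

    def rep(self, v):
        # Find the root, then compress the path for almost-linear behavior.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def merge(self, v, h):
        # h becomes the representative of the combined set.
        self.parent[self.rep(v)] = self.rep(h)


def interval(x, y, parent):
    """⌊⌊x, y⌉⌉_{P*}: all v with x P* v and v P* y, where the transitive
    reduction of P is given as a child-to-parent map (a forest)."""
    chain, v = [], x
    while True:
        chain.append(v)
        if v == y:
            return set(chain)
        if v not in parent:  # reached a root without meeting y
            return set()
        v = parent[v]
```

For instance, with P built as {(5, 4), (4, 3), (6, 3)} on Line 43, interval(5, 3, {5: 4, 4: 3, 6: 3}) yields {3, 4, 5}, and interval(4, 3, ...) minus {4} yields {3}. The choice of h as representative in merge is deliberate: rep(v) must always name the head of the outermost SCC merged so far.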
Example 16.
Consider the graph G (Figure 1). Labels of vertices indicate a depth-first numbering (DFN) of G. The graph edges are classified into tree, back, cross, and forward edges using the corresponding depth-first forest [14]. The cross and forward edges of G, CF = {(2, 8)}, are removed on Line 7. Because lcaD(2, 8) = 2, the removed edge (2, 8) is restored on Line 9 when h = 2. It is restored as (2, 7), because the disjoint set {8} would have already been merged with {7} on Line 43 when h = 7, making rep(8) equal to 7 when h = 2.

The for-loop on Line 8 visits the nodes in V in descending DFN order. Calling generateFMInstruction(h) on Line 10 generates Inst[h], an FM instruction for h. When h = 9, because the SCC whose entry is 9 is trivial, exec 9 is generated on Line 32. When h = 3, the SCC whose entry is 3 is non-trivial, with the entries of its nested SCCs N_h = {4, 6}. These entries are visited in a topological order (descending postDFN), 4 then 6, and their instructions are connected on Line 35 to generate repeat 3 [Inst[4] ⨟ Inst[6]] on Line 39. Visiting the nodes in descending DFN order guarantees that the instructions of nested SCCs are already present, and removing the cross and forward edges ensures that each SCC has a single entry. Table 1 shows some relevant steps and values within generateFMInstruction.

Finally, calling connectFMInstructions on Line 11 connects the instructions of the entries of the outermost SCCs, detected by the boolean expression rep(v) = v, in a topological order (descending postDFN) to generate the final FM program. For the given example, it visits the entries of the outermost SCCs in descending postDFN order, correctly generating the FM program on Line 48.

Due to two outgoing →′ edges of vertex 2, Dpostopt[2] is overwritten on Line 50, ending at the greater successor. Due to back edges 5 →B 4 and 5 →B 3, Dpostopt[5] is set to 4 and then to 3 on Line 41. Achkopt[4] is set to 3, as rep(4) = 3, on Line 53. T[7] is set to 7 on Line 51, and Dpostℓopt[7] is set to {7} on Line 56. T[5] is set to 4 and then to 3 on Line 41, making Dpostℓopt[5] equal to {3, 4, 5}. Because rep(4) = 3, Dpreℓopt[4] is set to {3} on Line 54. ∎

The proofs of the following theorems are in Appendix A.3:
Theorem 2.
GenerateFMProgram correctly computes Mopt, defined in § 3.

Table 1: Relevant steps and values within
GenerateFMProgram when applied to graph G of Example 16

            Major iteration h = 4      Major iteration h = 3
Line 35     Inst[5]                    Inst[4] ⨟ Inst[6]
Line 39     repeat 4 [exec 5]          repeat 3 [repeat 4 [exec 5] ⨟ exec 6]
Line 37     Dpostopt[4] = 5            Dpostopt[4] = 6, Dpostopt[3] = 4
Line 38     T[4] = 4                   T[4] = 4, T[3] = 3
Line 41     Dpostopt[5] = T[5] = 4     Dpostopt[6] = T[6] = 3, Dpostopt[5] = T[5] = 3
Line 43     Sets {4}, {5} merged.      Sets {3}, {4, 5}, {6} merged.

Theorem 3.
Running time of
GenerateFMProgram is almost-linear.
We have implemented our approach in a tool called
Mikos, which extends NASA's IKOS [11], a WTO-based abstract interpreter for C/C++. Mikos inherits all abstract domains and widening-narrowing strategies from IKOS. It includes the localized narrowing strategy [1] that intertwines the increasing and decreasing sequences.
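As background, the intertwined increasing (widening) and decreasing (narrowing) sequences can be sketched on the interval domain. The following is a textbook-style Python sketch under our own simplifications (a single loop head, tuples for intervals); it is not the localized narrowing of [1] nor code from IKOS:

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

def join(a, b):
    """Least upper bound of two intervals."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(a, b):
    """Classic interval widening: any unstable bound jumps to infinity."""
    lo = a[0] if a[0] <= b[0] else NEG_INF
    hi = a[1] if a[1] >= b[1] else POS_INF
    return (lo, hi)

def narrow(a, b):
    """Interval narrowing: refine only the bounds that widening lost."""
    lo = b[0] if a[0] == NEG_INF else a[0]
    hi = b[1] if a[1] == POS_INF else a[1]
    return (lo, hi)

def analyze_loop(init, body, max_iter=100):
    """Fixpoint at a single loop head: an increasing sequence with widening,
    followed by a decreasing sequence with narrowing."""
    x = init
    for _ in range(max_iter):          # increasing (widening) sequence
        nxt = widen(x, join(init, body(x)))
        if nxt == x:
            break
        x = nxt
    for _ in range(max_iter):          # decreasing (narrowing) sequence
        nxt = narrow(x, join(init, body(x)))
        if nxt == x:
            break
        x = nxt
    return x

def body(iv):
    """Transfer function of 'x := x + 1' under the guard x < 100
    (assumes the guard stays satisfiable, which holds in this demo)."""
    return (iv[0] + 1, min(iv[1], 99) + 1)
```

On the loop x := 0; while (x < 100) x := x + 1, widening first drives the loop-head interval to (0, +∞); the narrowing pass then recovers the invariant x ∈ [0, 100].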
Abstract domains in IKOS.
IKOS uses state-of-the-art implementations of abstract domains, comparable to those used in industrial abstract interpreters such as Astrée. In particular, IKOS implements the interval abstract domain [15] using functional data structures based on Patricia trees [35]. Astrée implements intervals using OCaml's map data structure, which uses balanced trees [8, Section 6.2]. As shown in [35, Section 5], the Patricia trees used by IKOS are more efficient when data structures must be merged, which is required often during abstract interpretation. Also, IKOS uses the memory-efficient variable-packing Difference Bound Matrix (DBM) relational abstract domain [18], similar to the variable-packing relational domains employed by Astrée [5, Section 3.3.2].
Interprocedural analysis in IKOS.
IKOS implements context-sensitive interprocedural analysis by means of dynamic inlining, much like the semantic expansion of function bodies in Astrée [16, Section 5]: at a function call, formal and actual parameters are matched, the callee is analyzed, and the return value at the call site is updated after the callee returns; a function pointer is resolved to a set of callees, and the results for each call are joined; IKOS returns top for a callee when a cycle is found in this dynamic call chain. To avoid running the entire interprocedural analysis again during the assertion-checking phase, invariants at the exits of the callees are additionally cached during the fixpoint computation.
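The call-analysis scheme just described can be sketched as a toy Python model. It uses plain values instead of abstract states and one transfer function per callee; the names (analyze_call, FUNCS) are ours, and this is not IKOS code:

```python
TOP = "⊤"  # unknown value, returned when recursion is detected

def analyze_call(fn, arg, funcs, chain=(), cache=None):
    """Toy model of interprocedural analysis by dynamic inlining: analyze the
    callee at each call site, and return TOP when the dynamic call chain
    contains a cycle. `funcs` maps a function name to (callee names, transfer
    function); the transfer function receives the argument and callee results."""
    if cache is None:
        cache = {}
    if fn in chain:                    # cycle in the dynamic call chain
        return TOP
    callees, transfer = funcs[fn]
    results = [analyze_call(c, arg, funcs, chain + (fn,), cache) for c in callees]
    out = transfer(arg, results)
    cache[fn] = out                    # exit invariant cached for the checking phase
    return out

# Example program: main calls f, f calls g, g returns arg + 1; r calls itself.
FUNCS = {
    "main": (["f"], lambda arg, rs: rs[0]),
    "f":    (["g"], lambda arg, rs: rs[0]),
    "g":    ([],    lambda arg, rs: arg + 1),
    "r":    (["r"], lambda arg, rs: rs[0]),
}
```

Here analyze_call("main", 5, FUNCS) analyzes the chain main → f → g and returns 6, while the self-recursive r immediately degrades to TOP, mirroring the cycle rule described above.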
Interprocedural extension of
Mikos. Although the description of our iteration strategy focused on intraprocedural analysis, it can be extended to interprocedural analysis as follows. Suppose there is a call to function f1 from a basic block contained in component C. Any checks in this call to f1 must be deferred until we know that the component C has stabilized. Furthermore, if function f1 calls a function f2, then the checks in f2 must also be deferred until C converges. In general, checks corresponding to a function call must be deferred until the maximal component containing the call has stabilized. When the analysis of a callee returns in Mikos, only the Pre values for the deferred checks remain. They are deallocated when the checks are performed or when the component containing the call is reiterated.
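The deferral target, the head of the maximal component containing a call, can be computed by walking up the component nesting. A minimal sketch, assuming the nesting is given as two hypothetical maps (the innermost enclosing head per vertex, and the parent head per head):

```python
def outermost_component(innermost, parent, v):
    """Return the head of the maximal component containing vertex v, or None
    if v lies outside every component. `innermost` maps a vertex to the head
    of its innermost enclosing component; `parent` maps a head to the head of
    the component directly enclosing it (absent/None at the top level)."""
    h = innermost.get(v)
    while h is not None and parent.get(h) is not None:
        h = parent[h]
    return h
```

With a nesting like the one in Example 16, where component 4 is nested inside component 3 (innermost = {3: 3, 4: 4, 5: 4, 6: 3}, parent = {4: 3}), a call made from vertex 5 has its checks deferred until the component headed by 3 stabilizes; a vertex outside every component yields None, meaning its checks need not be deferred.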
The experiments in this section were designed to answer the following questions:
RQ0 [Accuracy]
Does
Mikos (§5) have the same analysis results as IKOS?
RQ1 [Memory footprint]
How does the memory footprint of
Mikos compare to that of IKOS?
RQ2 [Runtime]
How does the runtime of
Mikos compare to that of IKOS?
Experimental setup
All experiments were run on Amazon EC2 r5.2xlarge instances (64 GiB memory, 8 vCPUs, 4 physical cores), which use Intel Xeon Platinum 8175M processors. The processors have L1, L2, and L3 caches of sizes 1.5 MiB (data: 0.75 MiB, instruction: 0.75 MiB), 24 MiB, and 33 MiB, respectively. Linux kernel version 4.15.0-1051-aws was used, and gcc 7.4.0 was used to compile both Mikos and IKOS. Dedicated EC2 instances and BenchExec [7] were used to improve the reliability of the results. The time and space limits were set to an hour and 64 GB, respectively. The experiments can be reproduced using https://github.com/95616ARG/mikos_sas2020.
Benchmarks
We evaluated
Mikos on two tasks that represent different client applications of abstract interpretation, each using different benchmarks described in Sections 6.1 and 6.2. In both tasks, we excluded benchmarks that did not complete in both
IKOS and
Mikos given the time and space budget. There were no benchmarks for which IKOS succeeded but Mikos failed to complete. Benchmarks for which IKOS took less than 5 seconds were also excluded. Measurements for benchmarks that took less than 5 seconds are summarized in Appendix B.
Metrics
To answer RQ1, we define and use the memory reduction ratio (MRR):

MRR def= (Memory footprint of Mikos) / (Memory footprint of IKOS)    (10)

The smaller the MRR, the greater the reduction in peak-memory usage in Mikos. If the MRR is less than 1, Mikos has a smaller memory footprint than IKOS. For RQ2, we report the speedup, which is defined as:

Speedup def= (Runtime of IKOS) / (Runtime of Mikos)    (11)

The larger the speedup, the greater the reduction in runtime in Mikos. If the speedup is greater than 1, Mikos is faster than IKOS.

Fig. 3: Task T1. Log-log scatter plots of (a) memory footprint and (b) runtime of IKOS and Mikos, with an hour timeout and 64 GB spaceout. Benchmarks that did not complete in IKOS are marked ×. All ×s completed in Mikos. Benchmarks below y = x required less memory or runtime in Mikos. (a) Min MRR: 0.895; max MRR: 0.001; geometric means: (i) 0.044 (when ×s are ignored), (ii) 0.041 (when measurements until timeout/spaceout are used for ×s); 29 non-completions in IKOS. (b) Min speedup: 0.87×; max speedup: 1.80×; geometric mean: 1.29×. ×s are ignored, as they space out quickly in IKOS but complete in Mikos.

RQ0: Accuracy of
Mikos
As a sanity check for our theoretical results, we experimentally validated Theorem 1 by comparing the analysis results reported by IKOS and Mikos. Mikos used a valid memory configuration, reporting the same analysis results as IKOS. Recall that Theorem 1 also proves that the fixpoint computation in Mikos is memory-optimal (i.e., it results in minimum memory footprint).
Benchmarks
For Task T1, we selected all 2928 benchmarks from the DeviceDriversLinux64, ControlFlow, and Loops categories of SV-COMP 2019 [6]. These categories are well suited for numerical analysis, and have been used in recent works [45,46,27]. From these benchmarks, we removed 435 benchmarks that timed out in both Mikos and IKOS, and 1709 benchmarks that took less than 5 seconds in IKOS. That left us with 784 SV-COMP 2019 benchmarks.
Abstract domain
Task T1 used the reduced product of Difference Bound Matrix (DBM) with variable packing [18] and congruence [21]. This domain is much richer and more expressive than the interval domain used in task T2.
Task
Task T1 consists of using the results of interprocedural fixpoint computation to prove user-provided assertions in the SV-COMP benchmarks. Each benchmark typically has one assertion to prove.
RQ1: Memory footprint of
Mikos compared to IKOS
Figure 3(a) shows the measured memory footprints in a log-log scatter plot.

Fig. 4: Histograms of MRR (Eq. 10) in task T1 for different ranges of memory footprint in IKOS: (a) 0%–25%, (b) 25%–50%, (c) 50%–75%, (d) 75%–100%. Figure 4(a) shows the distribution for the benchmarks in the bottom 25% in terms of memory footprint in IKOS. The distribution significantly tended toward a smaller MRR in the upper ranges.

For Task T1, the MRR (Eq. 10) ranged from 0.895 to 0.001. That is, the memory footprint decreased to 0.1% in the best case. For all benchmarks, Mikos had a smaller memory footprint than IKOS: the MRR was less than 1 for all benchmarks, with all points below the y = x line in Figure 3(a). On average, Mikos required only 4.1% of the memory required by IKOS, with an MRR of 0.041 as the geometric mean.

As Figure 3(a) shows, the reduction in memory tended to be greater as the memory footprint in the baseline IKOS grew. For the top 25% of benchmarks with the largest memory footprint in IKOS, the geometric mean of MRRs was 0.009. This trend is further confirmed by the histograms in Figure 4. While a similar trend was observed in task T2, it was significantly stronger in task T1. Table 3 in Appendix B lists
RQ1 results for specific benchmarks.
RQ2: Runtime of
Mikos compared to IKOS
Figure 3(b) shows the measured runtimes in a log-log scatter plot. We measured both the speedup (Eq. 11) and the difference in the runtimes. For fair comparison, we excluded 29 benchmarks that did not complete in IKOS. This left us with 755 SV-COMP 2019 benchmarks. Out of these 755 benchmarks, 740 had speedup > 1. The speedup ranged from 0.87× to 1.80×, with a geometric mean of 1.29×. We also measured the difference in runtimes (runtime of IKOS − runtime of Mikos). Table 4 in Appendix B lists RQ2 results for specific benchmarks.

Fig. 5: Task T2. Log-log scatter plots of (a) memory footprint and (b) runtime of IKOS and Mikos, with an hour timeout and 64 GB spaceout. Benchmarks that did not complete in IKOS are marked ×. All ×s completed in Mikos. Benchmarks below y = x required less memory or runtime in Mikos. (a) Min MRR: 0.998; max MRR: 0.022; geometric means: (i) 0.436 (when ×s are ignored), (ii) 0.437 (when measurements until timeout/spaceout are used for ×s); 1 non-completion in IKOS. (b) Min speedup: 0.88×; max speedup: 2.83×; geometric mean: 1.08×. ×s are ignored, as they space out quickly in IKOS but complete in Mikos.
Benchmarks
For Task T2, we selected all 1503 programs from the official Arch Linux core packages that are primarily written in C and whose LLVM bitcodes are obtainable by gllvm [20]. These include, but are not limited to, coreutils, dhcp, gnupg, inetutils, iproute, nmap, openssh, vim, etc. From these benchmarks, we removed 76 benchmarks that timed out and 8 benchmarks that spaced out in both Mikos and IKOS. Also, 994 benchmarks that took less than 5 seconds in IKOS were removed. That left us with 425 open-source benchmarks.
Abstract domain
Task T2 used the interval abstract domain [15]. Using a richer domain like DBM caused IKOS and Mikos to time out on most benchmarks.
Task
Task T2 consists of using the results of interprocedural fixpoint computation to prove the safety of buffer accesses. In this task, most program points had checks.

Fig. 6: Histograms of MRR (Eq. 10) in task T2 for different ranges of memory footprint in IKOS: (a) 0%–25%, (b) 25%–50%, (c) 50%–75%, (d) 75%–100%. Figure 6(a) shows the distribution for the benchmarks in the bottom 25% in terms of memory footprint in IKOS. The distribution slightly tended toward a smaller MRR in the upper ranges.
RQ1: Memory footprint of
Mikos compared to IKOS
Figure 5(a) shows the measured memory footprints in a log-log scatter plot. For Task T2, the MRR (Eq. 10) ranged from 0.998 to 0.022. That is, the memory footprint decreased to 2.2% in the best case. For all benchmarks, Mikos had a smaller memory footprint than IKOS: the MRR was less than 1 for all benchmarks, with all points below the y = x line in Figure 5(a). On average, Mikos's memory footprint was less than half that of IKOS, with an MRR of 0.437 as the geometric mean. Table 5 in Appendix B lists
RQ1 results for specific benchmarks.
RQ2: Runtime of
Mikos compared to IKOS
Figure 5(b) shows the mea-sured runtime in a log-log scatter plot. We measured both the speedup (Eq. 11)and the difference in the runtimes. For fair comparison, we excluded 1 benchmarkthat did not complete in IKOS. This left us with 425 open-source benchmarks.Out of these 425 benchmarks, 331 benchmarks had speedup > . The speedupranged from 0.88 × to 2.83 × , with geometric mean of 1.08 × . The difference in emory-Efficient Fixpoint Computation 21 runtimes (runtime of IKOS − runtime of Mikos ) ranged from − . s to . s, with arithmetic mean of . s. Table 6 in Appendix B lists RQ2 results for specific benchmarks.
Abstract interpretation has a long history of designing time- and memory-efficient algorithms for specific abstract domains, which exploit variable packing and clustering and sparse constraints [46,45,44,43,25,19,13,23]. Often these techniques represent a trade-off between precision and performance of the analysis. Nonetheless, such techniques are orthogonal to the abstract-domain-agnostic approach discussed in this paper. Approaches for improving precision via sophisticated widening and narrowing strategies [22,2,3] are also orthogonal to our memory-efficient iteration strategy.
Mikos inherits the interleaved widening-narrowing strategy implemented in the baseline IKOS abstract interpreter. As noted in § 1, Bourdoncle's approach [10] is used in many industrial and academic abstract interpreters [11,32,17,48,12]. Thus, improving the memory efficiency of WTO-based exploration is of great applicability to real-world static analysis. Astrée is one of the few, if not the only, industrial abstract interpreters that does not use WTO exploration, because it assumes that programs do not have gotos and recursion [8, Section 2.1], and is targeted towards a specific class of embedded C code [5, Section 3.2]. Such restrictions make it easier to compute when an abstract value will no longer be used, by naturally following the abstract syntax tree [29, Section 3.4.3]. In contrast,
Mikos works for general programs with gotos and recursion, which require the use of WTO-based exploration.

Generic fixpoint-computation approaches for improving the running time of abstract interpretation have also been explored [52,30,27]. Most recently, Kim et al. [27] present the notion of weak partial order (WPO), which generalizes the notion of WTO that is used in this paper. Kim et al. describe a parallel fixpoint algorithm that exploits maximal parallelism while computing the same fixpoint as the WTO-based algorithm. Reasoning about the correctness of concurrent algorithms is complex; hence, we decided to investigate an optimal memory management scheme in the sequential setting first. However, we believe it would be possible to extend our WTO-based result to one that uses WPO.

The nesting relation described in § 3 is closely related to the notion of Loop Nesting Forest [36,37], as observed in Kim et al. [27]. The almost-linear time algorithm
GenerateFMProgram is an adaptation of the LNF construction algorithm by Ramalingam [36]. The
Lift operation in § 3 is similar to the outermost-loop-excluding (OLE) operator introduced by Rastello [38, Section 2.4.4].

Seidl et al. [42] present time and space improvements to a generic fixpoint solver, which is closest in spirit to the problem discussed in this paper. To improve space efficiency, their approach recomputes values during fixpoint computation, and it does not prove optimality, unlike our approach. However, the setting discussed in their work is also more generic than ours; we assume a static dependency graph for the equation system.
Abstract interpreters such as Astrée [8] and CodeHawk [48] are implemented in OCaml, which provides a garbage collector. However, merely using a reference-counting garbage collector will not reduce the peak memory usage of fixpoint computation. For instance, the reference count of Pre[u] can be decreased to zero only after the final check/assert that uses Pre[u]. If the checks are all conducted at the end of the analysis (as is currently done in prior tools), then using a reference-counting garbage collector will not reduce peak memory usage. In contrast, our approach lifts the checks as early as possible, enabling the analysis to free the abstract values as early as possible.

Symbolic approaches for applying abstract transformers during fixpoint computation [24,40,28,41,50,49,51] allow the entire loop body to be encoded as a single formula. This might appear to obviate the need for Pre and
Post values for individual basic blocks within the loop; by storing the
Pre value only at the header, such a symbolic approach might appear to reduce the memory footprint. First, this scenario does not account for the fact that
Pre values need to be computed and stored if basic blocks in the loop have checks. Note that if there are no checks within the loop body, then our approach would also only store the
Pre value at the loop header. Second, such symbolic approaches only perform intraprocedural analysis [24]; additional abstract values would need to be stored depending on how function calls are handled in interprocedural analysis. Third, due to the use of SMT solvers in such symbolic approaches, the memory footprint might not necessarily decrease, and might even increase if one takes into account the memory used by the SMT solver.

Sparse analysis [34,33] and database-backed analysis [54] improve the memory cost of static analysis. For specific classes of static analysis, such as the IFDS framework [39], there have been approaches for improving time and memory efficiency [9,31,53,55].
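The point about check placement can be illustrated with a toy accounting model: if every Pre[u] dies only at the end of the analysis (checks performed at the end, reference counting or not), the peak equals the total number of Pre values, whereas if each check is lifted to the last point where its Pre value is needed, the peak drops. The model below is our own simplification, not the memory model of IKOS or Mikos:

```python
def peak_live(pre_uses):
    """pre_uses[u] is the index of the program point whose check performs the
    last use of Pre[u]. Returns the peak number of simultaneously live Pre
    values when each Pre[u] is freed right after its last use."""
    n = len(pre_uses)
    live, peak = 0, 0
    freed_at = {}
    for u, last_use in enumerate(pre_uses):
        freed_at.setdefault(last_use, []).append(u)
    for point in range(n):
        live += 1                               # Pre[point] materializes here
        peak = max(peak, live)
        live -= len(freed_at.get(point, []))    # checks run, values freed
    return peak

# All checks deferred to the very end: every Pre value stays live throughout,
# so peak_live([n-1]*n) == n. Checks lifted as early as possible: each Pre[u]
# dies immediately, so peak_live(list(range(n))) == 1.
```

In this model, a reference-counting collector corresponds to freeing exactly at the last use; what lowers the peak is moving the last uses (the checks) earlier, which is the effect of our check lifting.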
This paper presented an approach for memory-efficient abstract interpretation that is agnostic to the abstract domain used. Our approach is memory-optimal and produces the same result as Bourdoncle's approach without sacrificing time efficiency. We extended the notion of iteration strategy to intelligently deallocate abstract values and perform assertion checks during fixpoint computation. We provided an almost-linear time algorithm that constructs this iteration strategy. We implemented our approach in a tool called Mikos, which extends the abstract interpreter IKOS. Despite the use of state-of-the-art implementations of abstract domains, IKOS had a large memory footprint on two analysis tasks. Mikos was shown to effectively reduce it. When verifying user-provided assertions in SV-COMP 2019 benchmarks, Mikos showed a decrease in peak-memory usage to 4.1% (24.4×) on average compared to IKOS. When performing interprocedural buffer-overflow analysis of open-source programs, Mikos showed a decrease in peak-memory usage to 43.7% (2.3×) on average compared to IKOS.

References
1. Amato, G., Scozzari, F.: Localizing widening and narrowing. In: Static Analysis - 20th International Symposium, SAS 2013, Seattle, WA, USA, June 20-22, 2013. Proceedings. pp. 25–42 (2013). https://doi.org/10.1007/978-3-642-38856-9_4
2. Amato, G., Scozzari, F., Seidl, H., Apinis, K., Vojdani, V.: Efficiently intertwining widening and narrowing. Sci. Comput. Program., 1–24 (2016). https://doi.org/10.1016/j.scico.2015.12.005
3. Apinis, K., Seidl, H., Vojdani, V.: Enhancing top-down solving with widening and narrowing. In: Probst, C.W., Hankin, C., Hansen, R.R. (eds.) Semantics, Logics, and Calculi - Essays Dedicated to Hanne Riis Nielson and Flemming Nielson on the Occasion of Their 60th Birthdays. Lecture Notes in Computer Science, vol. 9560, pp. 272–288. Springer (2016). https://doi.org/10.1007/978-3-319-27810-0_14
4. Bagnara, R., Hill, P.M., Zaffanella, E.: The Parma Polyhedra Library: Toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems. Sci. Comput. Program. (1-2), 3–21 (2008). https://doi.org/10.1016/j.scico.2007.08.001
5. Bertrane, J., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Rival, X.: Static analysis by abstract interpretation of embedded critical software. ACM SIGSOFT Software Engineering Notes (1), 1–8 (2011). https://doi.org/10.1145/1921532.1921553
6. Beyer, D.: Automatic verification of C and Java programs: SV-COMP 2019. In: Tools and Algorithms for the Construction and Analysis of Systems - 25 Years of TACAS: TOOLympics, Held as Part of ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, Part III. pp. 133–155 (2019). https://doi.org/10.1007/978-3-030-17502-3_9
7. Beyer, D., Löwe, S., Wendler, P.: Reliable benchmarking: requirements and solutions. STTT (1), 1–29 (2019). https://doi.org/10.1007/s10009-017-0469-y
8. Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: Design and implementation of a special-purpose static program analyzer for safety-critical real-time embedded software. In: Mogensen, T.Æ., Schmidt, D.A., Sudborough, I.H. (eds.) The Essence of Computation, Complexity, Analysis, Transformation. Essays Dedicated to Neil D. Jones [on occasion of his 60th birthday]. Lecture Notes in Computer Science, vol. 2566, pp. 85–108. Springer (2002). https://doi.org/10.1007/3-540-36377-7_5
9. Bodden, E.: Inter-procedural data-flow analysis with IFDS/IDE and Soot. In: Bodden, E., Hendren, L.J., Lam, P., Sherman, E. (eds.) Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program Analysis, SOAP 2012, Beijing, China, June 14, 2012. pp. 3–8. ACM (2012). https://doi.org/10.1145/2259051.2259052
10. Bourdoncle, F.: Efficient chaotic iteration strategies with widenings. In: Formal Methods in Programming and Their Applications, International Conference, Akademgorodok, Novosibirsk, Russia, June 28 - July 2, 1993, Proceedings. pp. 128–141 (1993). https://doi.org/10.1007/BFb0039704
11. Brat, G., Navas, J.A., Shi, N., Venet, A.: IKOS: A framework for static analysis based on abstract interpretation. In: Software Engineering and Formal Methods - 12th International Conference, SEFM 2014, Grenoble, France, September 1-5, 2014. Proceedings. pp. 271–277 (2014). https://doi.org/10.1007/978-3-319-10431-7_20
12. Calcagno, C., Distefano, D.: Infer: An automatic program verifier for memory safety of C programs. In: Bobaru, M.G., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NASA Formal Methods - Third International Symposium, NFM 2011, Pasadena, CA, USA, April 18-20, 2011. Proceedings. Lecture Notes in Computer Science, vol. 6617, pp. 459–465. Springer (2011). https://doi.org/10.1007/978-3-642-20398-5_33
13. Chawdhary, A., King, A.: Compact difference bound matrices. In: Chang, B.E. (ed.) Programming Languages and Systems - 15th Asian Symposium, APLAS 2017, Suzhou, China, November 27-29, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10695, pp. 471–490. Springer (2017). https://doi.org/10.1007/978-3-319-71237-6_23
14. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd Edition. MIT Press (2009)
15. Cousot, P., Cousot, R.: Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, Los Angeles, California, USA, January 1977. pp. 238–252 (1977). https://doi.org/10.1145/512950.512973
16. Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: The Astrée analyzer. In: Sagiv, S. (ed.) Programming Languages and Systems, 14th European Symposium on Programming, ESOP 2005, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2005, Edinburgh, UK, April 4-8, 2005, Proceedings. Lecture Notes in Computer Science, vol. 3444, pp. 21–30. Springer (2005). https://doi.org/10.1007/978-3-540-31987-0_3
17. Facebook: SPARTA. https://github.com/facebookincubator/SPARTA (2020)
18. Gange, G., Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.: An abstract domain of uninterpreted functions. In: Verification, Model Checking, and Abstract Interpretation - 17th International Conference, VMCAI 2016, St. Petersburg, FL, USA, January 17-19, 2016. Proceedings. pp. 85–103 (2016). https://doi.org/10.1007/978-3-662-49122-5_4
19. Gange, G., Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.: Exploiting sparsity in difference-bound matrices. In: Rival, X. (ed.) Static Analysis - 23rd International Symposium, SAS 2016, Edinburgh, UK, September 8-10, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9837, pp. 189–211. Springer (2016). https://doi.org/10.1007/978-3-662-53413-7_10
20. gllvm. https://github.com/SRI-CSL/gllvm (2020)
21. Granger, P.: Static analysis of arithmetical congruences. International Journal of Computer Mathematics (3-4), 165–190 (1989). https://doi.org/10.1080/00207168908803778
22. Halbwachs, N., Henry, J.: When the decreasing sequence fails. In: Static Analysis - 19th International Symposium, SAS 2012, Deauville, France, September 11-13, 2012. Proceedings. pp. 198–213 (2012). https://doi.org/10.1007/978-3-642-33125-1_15
23. Halbwachs, N., Merchat, D., Gonnord, L.: Some ways to reduce the space dimension in polyhedra computations. Formal Methods Syst. Des. (1), 79–95 (2006). https://doi.org/10.1007/s10703-006-0013-2
24. Henry, J., Monniaux, D., Moy, M.: PAGAI: A path sensitive static analyser. Electron. Notes Theor. Comput. Sci., 15–25 (2012). https://doi.org/10.1016/j.entcs.2012.11.003
25. Heo, K., Oh, H., Yang, H.: Learning a variable-clustering strategy for octagon from labeled data generated by a static analysis. In: Rival, X. (ed.) Static Analysis - 23rd International Symposium, SAS 2016, Edinburgh, UK, September 8-10, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9837, pp. 237–256. Springer (2016). https://doi.org/10.1007/978-3-662-53413-7_12
26. Jeannet, B., Miné, A.: Apron: A library of numerical abstract domains for static analysis. In: Bouajjani, A., Maler, O. (eds.) Computer Aided Verification, 21st International Conference, CAV 2009, Grenoble, France, June 26 - July 2, 2009. Proceedings. Lecture Notes in Computer Science, vol. 5643, pp. 661–667. Springer (2009). https://doi.org/10.1007/978-3-642-02658-4_52
27. Kim, S.K., Venet, A.J., Thakur, A.V.: Deterministic parallel fixpoint computation. PACMPL (POPL), 14:1–14:33 (2020). https://doi.org/10.1145/3371082
28. Li, Y., Albarghouthi, A., Kincaid, Z., Gurfinkel, A., Chechik, M.: Symbolic optimization with SMT solvers. In: Jagannathan, S., Sewell, P. (eds.) The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '14, San Diego, CA, USA, January 20-21, 2014. pp. 607–618. ACM (2014). https://doi.org/10.1145/2535838.2535857
29. Miné, A.: Tutorial on static inference of numeric invariants by abstract interpretation. Foundations and Trends in Programming Languages (3-4), 120–372 (2017). https://doi.org/10.1561/2500000034
30. Monniaux, D.: The parallel implementation of the Astrée static analyzer. In: Programming Languages and Systems, Third Asian Symposium, APLAS 2005, Tsukuba, Japan, November 2-5, 2005, Proceedings. pp. 86–96 (2005). https://doi.org/10.1007/11575467_7
31. Naeem, N.A., Lhoták, O., Rodriguez, J.: Practical extensions to the IFDS algorithm. In: Gupta, R. (ed.) Compiler Construction, 19th International Conference, CC 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, March 20-28, 2010. Proceedings. Lecture Notes in Computer Science, vol. 6011, pp. 124–144. Springer (2010). https://doi.org/10.1007/978-3-642-11970-5_8
32. Navas, J.A.: Crab: Cornucopia of abstractions: a language-agnostic library for abstract interpretation. https://github.com/seahorn/crab (2019)
33. Oh, H., Heo, K., Lee, W., Lee, W., Park, D., Kang, J., Yi, K.: Global sparse analysis framework. ACM Trans. Program. Lang. Syst. (3), 8:1–8:44 (2014). https://doi.org/10.1145/2590811
34. Oh, H., Heo, K., Lee, W., Lee, W., Yi, K.: Design and implementation of sparse global analyses for C-like languages. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, Beijing, China - June 11-16, 2012. pp. 229–238 (2012). https://doi.org/10.1145/2254064.2254092
35. Okasaki, C., Gill, A.: Fast mergeable integer maps. In: Workshop on ML. pp. 77–86 (1998)
36. Ramalingam, G.: Identifying loops in almost linear time. ACM Trans. Program. Lang. Syst. (2), 175–188 (1999). https://doi.org/10.1145/316686.316687
37. Ramalingam, G.: On loops, dominators, and dominance frontiers. ACM Trans. Program. Lang. Syst. (5), 455–490 (2002). https://doi.org/10.1145/570886.570887
38. Rastello, F.: On Sparse Intermediate Representations: Some Structural Properties and Applications to Just-In-Time Compilation. University works, Inria Grenoble Rhône-Alpes (Dec 2012), https://hal.inria.fr/hal-00761555, habilitation à diriger des recherches, École normale supérieure de Lyon
39. Reps, T.W., Horwitz, S., Sagiv, M.: Precise interprocedural dataflow analysis via graph reachability. In: Conference Record of POPL'95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, USA, January 23-25, 1995. pp. 49–61 (1995). https://doi.org/10.1145/199448.199462
40. Reps, T.W., Sagiv, S., Yorsh, G.: Symbolic implementation of the best transformer. In: Steffen, B., Levi, G. (eds.) Verification, Model Checking, and Abstract Interpretation, 5th International Conference, VMCAI 2004, Venice, Italy, January 11-13, 2004, Proceedings. Lecture Notes in Computer Science, vol. 2937, pp. 252–266. Springer (2004). https://doi.org/10.1007/978-3-540-24622-0_21
41. Reps, T.W., Thakur, A.V.: Automating abstract interpretation. In: Jobstmann, B., Leino, K.R.M. (eds.) Verification, Model Checking, and Abstract Interpretation - 17th International Conference, VMCAI 2016, St. Petersburg, FL, USA, January 17-19, 2016. Proceedings. Lecture Notes in Computer Science, vol. 9583, pp. 3–40. Springer (2016). https://doi.org/10.1007/978-3-662-49122-5_1
42. Seidl, H., Vogler, R.: Three improvements to the top-down solver. In: Sabel, D., Thiemann, P. (eds.) Proceedings of the 20th International Symposium on Principles and Practice of Declarative Programming, PPDP 2018, Frankfurt am Main, Germany, September 03-05, 2018. pp. 21:1–21:14. ACM (2018). https://doi.org/10.1145/3236950.3236967
43. Singh, G., Püschel, M., Vechev, M.T.: Making numerical program analysis fast. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015. pp. 303–313 (2015). https://doi.org/10.1145/2737924.2738000
44. Singh, G., Püschel, M., Vechev, M.T.: Fast polyhedra abstract domain. In: Castagna, G., Gordon, A.D. (eds.) Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017. pp. 46–59. ACM (2017). https://doi.org/10.1145/3009837.3009885
45. Singh, G., Püschel, M., Vechev, M.T.: Fast numerical program analysis with reinforcement learning. In: Computer Aided Verification - 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part I. pp. 211–229 (2018). https://doi.org/10.1007/978-3-319-96145-3_12
46. Singh, G., Püschel, M., Vechev, M.T.: A practical construction for decomposing numerical abstract domains. Proc. ACM Program. Lang. (POPL), 55:1–55:28 (2018). https://doi.org/10.1145/3158143
47. Tarjan, R.E.: Applications of path compression on balanced trees. J. ACM (4), 690–715 (1979). https://doi.org/10.1145/322154.322161
48. Kestrel Technology: CodeHawk. https://github.com/kestreltechnology/codehawk (2020)
49. Thakur, A.V., Elder, M., Reps, T.W.: Bilateral algorithms for symbolic abstraction. In: Miné, A., Schmidt, D. (eds.) Static Analysis - 19th International Symposium, SAS 2012, Deauville, France, September 11-13, 2012. Proceedings. Lecture Notes in Computer Science, vol. 7460, pp. 111–128. Springer (2012). https://doi.org/10.1007/978-3-642-33125-1_10
50. Thakur, A.V., Lal, A., Lim, J., Reps, T.W.: PostHat and all that: Automating abstract interpretation. Electron. Notes Theor. Comput. Sci., 15–32 (2015). https://doi.org/10.1016/j.entcs.2015.02.003
51. Thakur, A.V., Reps, T.W.: A method for symbolic computation of abstract operations. In: Madhusudan, P., Seshia, S.A. (eds.)
Computer Aided Verification - 24thInternational Conference, CAV 2012, Berkeley, CA, USA, July 7-13, 2012 Proceed-ings. Lecture Notes in Computer Science, vol. 7358, pp. 174–192. Springer (2012).https://doi.org/10.1007/978-3-642-31424-7_1752. Venet, A., Brat, G.P.: Precise and efficient static array bound checking for largeembedded C programs. In: Proceedings of the ACM SIGPLAN 2004 Conference onProgramming Language Design and Implementation 2004, Washington, DC, USA,June 9-11, 2004. pp. 231–242 (2004). https://doi.org/10.1145/996841.99686953. Wang, K., Hussain, A., Zuo, Z., Xu, G.H., Sani, A.A.: Graspan: A single-machine disk-based graph system for interprocedural static analyses of large-scale systems code. In: Proceedings of the Twenty-Second International Con-ference on Architectural Support for Programming Languages and OperatingSystems, ASPLOS 2017, Xi’an, China, April 8-12, 2017. pp. 389–404 (2017).https://doi.org/10.1145/3037697.303774454. Weiss, C., Rubio-González, C., Liblit, B.: Database-backed program analysis forscalable error propagation. In: 37th IEEE/ACM International Conference on Soft-ware Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1. pp.586–597 (2015). https://doi.org/10.1109/ICSE.2015.7555. Zuo, Z., Gu, R., Jiang, X., Wang, Z., Huang, Y., Wang, L., Li, X.: Bigspa:An efficient interprocedural static analysis engine in the cloud. In: 2019IEEE International Parallel and Distributed Processing Symposium, IPDPS2019, Rio de Janeiro, Brazil, May 20-24, 2019. pp. 771–780. IEEE (2019).https://doi.org/10.1109/IPDPS.2019.000868 S. Kim et al. A Proofs
This section provides proofs of theorems presented in the paper.
A.1 Nesting forest (V, ⪯_N) and total order (V, ≤) in § 3

This section presents the theorems and proofs about ⪯_N and ≤ defined in § 3. A partial order (S, R) is a forest if for all x ∈ S, (⌊⌊x⌉⌉_R, R) is a chain, where ⌊⌊x⌉⌉_R def= {y ∈ S | x R y}.

Theorem 4. (V, ⪯_N) is a forest.

Proof. First, we show that (V, ⪯_N) is a partial order. Let x, y, z be vertices in V.
– Reflexivity: x ⪯_N x. This holds by the definition of ⪯_N.
– Transitivity: x ⪯_N y and y ⪯_N z imply x ⪯_N z. (i) If x = y, then x ⪯_N z. (ii) Otherwise, by the definition of ⪯_N, y ∈ ω(x). Furthermore, (ii-1) if y = z, then z ∈ ω(x), and hence x ⪯_N z. (ii-2) Otherwise, z ∈ ω(y), and by the definition of HTO, z ∈ ω(x).
– Anti-symmetry: x ⪯_N y and y ⪯_N x imply x = y. Suppose x ≠ y. By the definition of ⪯_N and the premises, y ∈ ω(x) and x ∈ ω(y). Then, by the definition of HTO, x ≺ y and y ≺ x. This contradicts the fact that ⪯ is a total order.
Next, we show that this partial order is a forest. Suppose there exists v ∈ V such that (⌊⌊v⌉⌉_⪯N, ⪯_N) is not a chain. That is, there exist x, y ∈ ⌊⌊v⌉⌉_⪯N such that x ⋠_N y and y ⋠_N x. Then, by the definition of HTO, C(x) ∩ C(y) = ∅. However, this contradicts that v ∈ C(x) and v ∈ C(y). ⊓⊔

Theorem 5. (V, ≤) is a total order.

Proof. We prove the properties of a total order. Let x, y, z be vertices in V.
– Connexity: x ≤ y or y ≤ x. This follows from the connexity of the total order ⪯.
– Transitivity: x ≤ y and y ≤ z imply x ≤ z. (i) Suppose x ⪯_N y. (i-1) If y ⪯_N z, then by transitivity of ⪯_N, x ⪯_N z. (i-2) Otherwise, z ⋠_N y and y ⪯ z. It cannot be that z ⪯_N x, because transitivity of ⪯_N would imply z ⪯_N y, a contradiction. Furthermore, it cannot be that z ≺ x, because y ⪯ z ≺ x and x ⪯_N y imply y ∈ ω(z) by the definition of HTO, that is, z ⪯_N y, again a contradiction. By connexity of ⪯, x ⪯ z; hence x ≤ z. (ii) Otherwise, y ⋠_N x and x ⪯ y. (ii-1) If y ⪯_N z, then z ⋠_N x, because otherwise transitivity of ⪯_N would imply y ⪯_N x. By connexity of ⪯, either x ⪯ z or z ≺ x. If x ⪯ z, then x ≤ z. If z ≺ x, then z ≺ x ⪯ y and y ⪯_N z imply, by the definition of HTO, z ∈ ω(x), that is, x ⪯_N z; hence x ≤ z. (ii-2) Otherwise, z ⋠_N y and y ⪯ z, so x ⪯ z by transitivity of ⪯. Moreover, z ⪯_N x cannot hold: it would imply either x = z, whence x = y by anti-symmetry of ⪯, or x ∈ ω(z), whence x ⪯ y ⪯ z places y in C(x) by the definition of HTO; either way, y ⪯_N x, a contradiction. Hence x ≤ z.
– Anti-symmetry: x ≤ y and y ≤ x imply x = y. (i) If x ⪯_N y, then y ⪯_N x must hold for y ≤ x to be true. By anti-symmetry of ⪯_N, x = y. (ii) Otherwise, y ⋠_N x and x ⪯ y. For y ≤ x to be true, x ⋠_N y and y ⪯ x. By anti-symmetry of ⪯, x = y. ⊓⊔

Theorem 6. For u, v ∈ V, if Inst[v] reads Post[u], then u ≤ v.

Proof. By the definition of the mapping Inst, there must exist v′ ∈ V such that u → v′ and v′ ⪯_N v for Inst[v] to read Post[u]. By the definition of WTO, either u ≺ v′ and v′ ∉ ω(u), or v′ ⪯ u and v′ ∈ ω(u). In both cases, u ≤ v′. Because v′ ⪯_N v, and hence v′ ≤ v, we have u ≤ v. ⊓⊔

A.2 Optimality of M_opt in § 3

This section presents the theorems and proofs about the optimality of M_opt described in § 3. The theorem is divided into optimality theorems for the maps that constitute M_opt.

Given M = (Dpost, Achk, Dpostℓ, Dpreℓ) and a map Dpost′, we write M ◁ Dpost′ to denote the memory configuration (Dpost′, Achk, Dpostℓ, Dpreℓ). Similarly, M ◁ Achk′ denotes (Dpost, Achk′, Dpostℓ, Dpreℓ), and so on. For a given FM program P, a map X that constitutes a memory configuration is valid for P iff M ◁ X is valid for every valid memory configuration M. Also, X is optimal for P iff M ◁ X is optimal for an optimal memory configuration M.

Theorem 7.
Dpost_opt is valid. That is, given an FM program P and a valid memory configuration M, ⟦P⟧_{M ◁ Dpost_opt} = ⟦P⟧_M.

Proof. Our approach does not change the iteration order; it only changes where the deallocations are performed. Therefore, it suffices to show that for every edge u → v, Post[u] is available whenever Inst[v] is executed.

Suppose not: there exists an edge u → v that violates this. Let d be Dpost_opt[u] as computed by our approach. Then the execution trace of P executes Inst[v] after the deallocation of Post[u] in Inst[d], with no execution of Inst[u] in between.

Because ≤ is a total order, either d < v or v ≤ d. It must be v ≤ d, because d < v implies d < v ≤ Lift(u, v), which contradicts the definition of Dpost_opt[u]. Then, by the definition of ≤, either v ⪯_N d or (d ⋠_N v) ∧ (v ⪯ d). In both cases, the only way Inst[v] can be executed after Inst[d] is for there to be another head h whose repeat instruction includes both Inst[d] and Inst[v], that is, d ≺_N h and v ≺_N h. By the definition of WTO and u → v, either u ≺ v or u ⪯_N v. It must be u ≺ v, because if u ⪯_N v, then Inst[u] is part of Inst[v], so Inst[u] is executed before Post[u] is read in Inst[v]. Furthermore, it must be u ≺ h, because if h ⪯ u, then Inst[u] is executed before Inst[v] in each iteration over C(h). However, that implies h ∈ (⌊⌊v⌉⌉_⪯N \ ⌊⌊u⌉⌉_⪯N), which, combined with d ≺_N h, contradicts the definition of Dpost_opt[u]. Therefore, no such edge u → v can exist, and the theorem holds. ⊓⊔

Theorem 8.
Dpost_opt is optimal. That is, given an FM program P, the memory footprint of ⟦P⟧_{M ◁ Dpost_opt} is smaller than or equal to that of ⟦P⟧_M for every valid memory configuration M.

Proof. For Dpost_opt to be optimal, the deallocations of Post values must be placed at the earliest positions possible for a valid memory configuration M ◁ Dpost_opt. That is, there must not exist u, b ∈ V such that, with d = Dpost_opt[u], we have b ≠ d, M ◁ (Dpost_opt[u ← b]) is valid, and Inst[b] deletes Post[u] earlier than Inst[d] does.

Suppose not: such u, b exist. Let d be Dpost_opt[u] as computed by our approach. Then it must be b < d for Inst[b] to delete Post[u] earlier than Inst[d]. Also, for every edge u → v, it must be v ≤ b for Inst[v] to be executed before Post[u] is deleted in Inst[b].

By the definition of Dpost_opt, v ≤ d for all u → v. Also, by Theorem 6, u ≤ v. Hence u ≤ d, so either u ⪯_N d or (d ⋠_N u) ∧ (u ⪯ d). If u ⪯_N d, then by the definition of Lift, it must be that u → d. Therefore d ≤ b, which contradicts b < d. Alternatively, if (d ⋠_N u) ∧ (u ⪯ d), there must exist v ∈ V such that u → v and Lift(u, v) = d. To satisfy v ≤ b, v ⪯_N d, and b < d, it must be that b ⪯_N d. However, this makes the analysis incorrect: when the stabilization check fails for C(d), Inst[v] is executed again, attempting to read Post[u], which Inst[b] has already deleted. Therefore, no such u, b can exist, and the theorem holds. ⊓⊔

Theorem 9.
Achk_opt is valid. That is, given an FM program P and a valid memory configuration M, ⟦P⟧_{M ◁ Achk_opt} = ⟦P⟧_M.

Proof. Let v = Achk_opt[u]. If v is a head, then by the definition of Achk_opt, C(v) is the largest component that contains u. Therefore, once C(v) has stabilized, Inst[u] can no longer be executed, and Pre[u] remains the same. If v is not a head, then v = u; that is, no component contains u. Therefore, Pre[u] remains the same after the execution of Inst[u]. In both cases, the values passed to Ck_u are the same as when using Achk_dflt. ⊓⊔

Theorem 10.
Achk_opt is optimal. That is, given an FM program P, the memory footprint of ⟦P⟧_{M ◁ Achk_opt} is smaller than or equal to that of ⟦P⟧_M for every valid memory configuration M.

Proof. Because a Pre value is deleted right after its corresponding assertions are checked, it suffices to show that Achk_opt places the assertion checks at the earliest possible positions.

Let v = Achk_opt[u]. By the definition of Achk_opt, u ⪯_N v. For some b to perform the assertion checks of u earlier than v, it must satisfy b ≺_N v. However, because one cannot know in advance when a component of v will stabilize and when Pre[u] will converge, the assertion checks of u cannot be performed in Inst[b]. Therefore, our approach puts the assertion checks at the earliest positions, which leads to the minimum memory footprint. ⊓⊔

Theorem 11.
Dpostℓ_opt is valid. That is, given an FM program P and a valid memory configuration M, ⟦P⟧_{M ◁ Dpostℓ_opt} = ⟦P⟧_M.

Proof. Again, our approach does not change the iteration order; it only changes where the deallocations are performed. Therefore, it suffices to show that for every edge u → v, Post[u] is available whenever Inst[v] is executed.

Suppose not: there exists an edge u → v that violates this. Let d′ be the element of Dpostℓ_opt[u] that causes the violation. Then the execution trace of P executes Inst[v] after the deallocation of Post[u] in Inst[d′], with no execution of Inst[u] in between. Because Post[u] is deleted inside the loop of Inst[d′], Inst[v] must be nested in Inst[d′] or be executed after Inst[d′] to be affected. That is, either v ⪯_N d′ or d′ ≺ v. Also, by the way Dpostℓ_opt[u] is computed, u ⪯_N d′.

First consider the case v ⪯_N d′. By the definition of WTO and u → v, either u ≺ v or u ⪯_N v. In either case, Inst[u] is executed before Inst[v] reads Post[u]. Therefore, the deallocation of Post[u] in Inst[d′] cannot cause the violation.

Alternatively, consider d′ ≺ v and v ⋠_N d′. Because u ⪯_N d′, Post[u] is generated in each iteration over C(d′), and the last iteration does not delete Post[u]. Therefore, Post[u] is available when Inst[v] is executed. Hence, no such u, d′ exist, and the theorem holds. ⊓⊔

Theorem 12.
Dpostℓ_opt is optimal. That is, given an FM program P, the memory footprint of ⟦P⟧_{M ◁ Dpostℓ_opt} is smaller than or equal to that of ⟦P⟧_M for every valid memory configuration M.

Proof. Because one cannot know in advance when a component will stabilize, the decision to delete an intermediate Post[u] cannot be made earlier than the stabilization check of a component that contains u. Our approach makes such decisions in all relevant components that contain u.

If u ⪯_N d, then Dpostℓ_opt[u] = ⌊⌊u⌉⌉_⪯N ∩ ⌊⌊d⌉⌉_⪯N. Because Post[u] is deleted in Inst[d], we do not have to consider the components in ⌊⌊d⌉⌉_⪯N \ {d}. Alternatively, if u ⋠_N d, then Dpostℓ_opt[u] = ⌊⌊u⌉⌉_⪯N \ ⌊⌊d⌉⌉_⪯N. Because Post[u] is deleted in Inst[d], we do not have to consider the components outside this set. Therefore, Dpostℓ_opt is optimal. ⊓⊔
Theorem 13. Dpreℓ_opt is valid. That is, given an FM program P and a valid memory configuration M, ⟦P⟧_{M ◁ Dpreℓ_opt} = ⟦P⟧_M.

Proof. Pre[u] is used only in the assertion checks and to perform widening in Inst[u]. Because u is removed from Dpreℓ[u], the deletion does not affect widening.

For all v ∈ Dpreℓ[u], v ⪯_N Achk_opt[u]. Because Pre[u] is not deleted when C(v) stabilizes, Pre[u] is available when the assertion checks are performed in Inst[Achk_opt[u]]. Therefore, Dpreℓ_opt is valid. ⊓⊔
Theorem 14. Dpreℓ_opt is optimal. That is, given an FM program P, the memory footprint of ⟦P⟧_{M ◁ Dpreℓ_opt} is smaller than or equal to that of ⟦P⟧_M for every valid memory configuration M.

Proof. Because one cannot know in advance when a component will stabilize, the decision to delete an intermediate Pre[u] cannot be made earlier than the stabilization check of a component that contains u. Our approach makes such decisions in all components that contain u. Therefore, Dpreℓ_opt is optimal. ⊓⊔
Theorem 1. The memory configuration M_opt = (Dpost_opt, Achk_opt, Dpostℓ_opt, Dpreℓ_opt) is optimal.

Proof. This follows from Theorems 7 to 14. ⊓⊔
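The nesting forest ⪯_N and the total order ≤ that these proofs rely on can be illustrated concretely. The sketch below is ours, not the paper's: the vertex set, the nested components, and all helper names are invented for the example. It derives ω(·), ⪯_N, and ≤ from a hypothetical WTO 1 2 (3 4 (5 6) 7) 8 and brute-force checks the forest and total-order properties claimed in Theorems 4 and 5.

```python
from itertools import product

# Vertices in a hypothetical WTO 1 2 (3 4 (5 6) 7) 8; ⪯ is the numeric order.
V = [1, 2, 3, 4, 5, 6, 7, 8]
# Components, keyed by their heads (each head belongs to its own component).
C = {3: {3, 4, 5, 6, 7}, 5: {5, 6}}

def omega(x):
    """Heads of the components that contain x, excluding x itself."""
    return {h for h, members in C.items() if x in members and h != x}

def le_N(x, y):
    """Nesting order: x ⪯_N y iff x = y or y ∈ ω(x)."""
    return x == y or y in omega(x)

def le(x, y):
    """Derived order: x ≤ y iff x ⪯_N y, or (y ⋠_N x and x ⪯ y)."""
    return le_N(x, y) or (not le_N(y, x) and x <= y)

# Theorem 4: (V, ⪯_N) is a forest -- each vertex's ancestors form a chain.
for v in V:
    anc = [y for y in V if le_N(v, y)]
    assert all(le_N(a, b) or le_N(b, a) for a, b in product(anc, anc))

# Theorem 5: (V, ≤) is a total order.
for x, y in product(V, V):
    assert le(x, y) or le(y, x)                    # connexity
    assert not (le(x, y) and le(y, x)) or x == y   # anti-symmetry
for x, y, z in product(V, V, V):
    assert not (le(x, y) and le(y, z)) or le(x, z) # transitivity
```

Note how ≤ orders a head after the body of its component (for example, 4 ≤ 3 here, since 3 heads the component containing 4), which is what makes Theorem 6 hold for back edges.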
A.3 Correctness and efficiency of GenerateFMProgram in § 4

This section presents the theorems and proofs about the correctness and efficiency of GenerateFMProgram (Algorithm 1, § 4).

Theorem 2. GenerateFMProgram correctly computes M_opt, defined in § 3.

Proof. We show that each map is constructed correctly.
– Dpost_opt: Let v′ be the value of Dpost_opt[u] before it is overwritten in Line 50, 37, or 41. Descending post-DFN ordering corresponds to a topological sorting of the nested SCCs. Therefore, in Lines 50 and 37, v′ ≺ v. Also, because v ⪯_N h for all v ∈ N_h in Line 41, v′ ⪯_N v. In either case, v′ ≤ v. Because rep(v) essentially performs Lift(u, v) when restoring the edges, the final Dpost_opt[u] is the maximum of the lifted successors, and the map is correctly computed.
– Dpostℓ_opt: The correctness follows from the correctness of T. Because the components are constructed bottom-up, rep(u) in Lines 51 and 38 returns max_⪯N(⌊⌊u⌉⌉_⪯N \ ⌊⌊Dpost_opt[u]⌉⌉_⪯N). Also, N* = ⪯_N. Thus, Dpostℓ_opt is correctly computed.
– Achk_opt: At the end of the algorithm, rep(v) is the head of the maximal component that contains v, or v itself when v is outside of any component. Therefore, Achk_opt is correctly computed.
– Dpreℓ_opt: By the same reasoning as for Achk_opt, and because N* = ⪯_N, Dpreℓ_opt is correctly computed. ⊓⊔
Theorem 3. The running time of GenerateFMProgram is almost-linear.

Proof. The base WTO-construction algorithm runs in almost-linear time [27]. The starred lines in Algorithm 1 visit each edge and vertex once. Therefore, the time complexity remains almost-linear. ⊓⊔
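The almost-linear bound rests on the classic behavior of union-find with path compression (Tarjan [47]): rep(·) queries are disjoint-set finds, and forming a component collapses its members into the head's set. A minimal sketch of that mechanism follows; the class and method names are ours for illustration, and this is not the IKOS/Mikos implementation of Algorithm 1.

```python
class DisjointSet:
    """Union-find with path compression, the structure behind almost-linear
    rep(.) queries in WTO construction."""

    def __init__(self, vertices):
        # Initially every vertex is its own representative.
        self.parent = {v: v for v in vertices}

    def rep(self, v):
        """Find the representative of v, compressing the path along the way."""
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited vertex directly at the root.
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def merge_into(self, head, v):
        """Collapse v's set into head's, as when a component C(head) is formed."""
        self.parent[self.rep(v)] = self.rep(head)

# Hypothetical components mirroring the WTO 1 2 (3 4 (5 6) 7) 8:
ds = DisjointSet(range(1, 9))
ds.merge_into(5, 6)            # inner component C(5) = {5, 6}
for v in (4, 5, 7):            # outer component C(3) = {3, 4, 5, 6, 7}
    ds.merge_into(3, v)
assert ds.rep(6) == 3 and ds.rep(2) == 2
```

With path compression (plus union by rank, omitted here for brevity), a sequence of m operations on n vertices costs O(m α(n)), which is the "almost-linear" bound cited above.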
B Further experimental evaluation
Table 2: Measurements for the benchmarks that took less than 5 seconds. Time diff is the runtime of IKOS minus that of Mikos (positive means a speedup in Mikos). Memory diff is the memory footprint of IKOS minus that of Mikos (positive means a memory reduction in Mikos).

          Time (s)           Memory (MB)       Time diff (s)         Memory diff (MB)
Task  min.  max.  avg.    min.  max.  avg.    min.   max.   avg.    min.   max.  avg.
T1    0.11  4.98  0.58      25   564    42   -0.61  +1.44  +0.08   -0.37   +490   +12
T2    0.06  4.98  1.07       9   218    46   -0.05  +1.33  +0.14   -0.43   +172   +18

Table 3:
Task T1. A sample of the results for task T1 in Figure 3(a), excluding the benchmarks that did not complete in IKOS. The first 5 rows list the benchmarks with the smallest memory reduction ratios (MRRs). The latter 5 rows list the benchmarks with the largest memory footprints. The smaller the MRR, the greater the reduction in memory footprint. T: time; MF: memory footprint.

                                       IKOS               Mikos
Benchmark                          T (s)  MF (MB)    T (s)  MF (MB)    MRR
3.16-rc1/205_9a-net-rtl8187         1500    45905     1314       56  0.001
4.2-rc1/43_2a-mmc-rtsx             786.5    26909    594.8       42  0.002
4.2-rc1/43_2a-video-radeonfb        2494    56752     1930      107  0.002
4.2-rc1/43_2a-net-skge              3523    47392     3131       98  0.002
4.2-rc1/43_2a-usb-hcd              220.4    17835    150.8       39  0.002
4.2-rc1/32_7a-target_core_mod       1316    60417     1110     2967  0.049
challenges/3.14-alloc-libertas      2094    60398     1620      626  0.010
4.2-rc1/43_2a-net-libertas          1634    59902     1307      307  0.005
challenges/3.14-kernel-libertas     2059    59826     1688     2713  0.045
3.16-rc1/43_2a-sound-cs46xx         3101    58087     2498      193  0.003
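As the table values indicate, the memory reduction ratio (MRR) is Mikos's memory footprint divided by IKOS's, and speedup is IKOS's runtime divided by Mikos's. A quick arithmetic check on values copied from the tables (the helper names are ours):

```python
def mrr(ikos_mf_mb, mikos_mf_mb):
    """Memory reduction ratio: Mikos footprint / IKOS footprint (smaller is better)."""
    return mikos_mf_mb / ikos_mf_mb

def speedup(ikos_t_s, mikos_t_s):
    """Speedup: IKOS runtime / Mikos runtime (larger is better)."""
    return ikos_t_s / mikos_t_s

# 3.16-rc1/205_9a-net-rtl8187 (Table 3): 45905 MB down to 56 MB.
assert round(mrr(45905, 56), 3) == 0.001
# openssh-8.0p1/sftp (Tables 5 and 6): 45903 MB down to 9137 MB, 3036 s vs. 3446 s.
assert round(mrr(45903, 9137), 3) == 0.199
assert round(speedup(3036, 3446), 2) == 0.88
```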
Table 4: Task T1. A sample of the results for task T1 in Figure 3(b). The first 3 rows list the benchmarks with the lowest speedups. The latter 3 rows list the benchmarks with the highest speedups. T: time; MF: memory footprint.

                               IKOS               Mikos
Benchmark                  T (s)  MF (MB)    T (s)  MF (MB)    MRR  Speedup
challenges/3.8-usb-main11  42.63      541    48.92      122  0.225    0.87×
challenges/3.8-usb-main0   54.31     3025    61.78      190  0.063    0.88×
challenges/3.8-usb-main1   42.84      457    47.73      119  0.261    0.90×
…

Table 5:
Task T2. A sample of the results for task T2 in Figure 5(a), excluding the benchmarks that did not complete in IKOS. The first 5 rows list the benchmarks with the smallest memory reduction ratios (MRRs). The latter 5 rows list the benchmarks with the largest memory footprints. The smaller the MRR, the greater the reduction in memory footprint. T: time; MF: memory footprint.

                                    IKOS               Mikos
Benchmark                       T (s)  MF (MB)    T (s)  MF (MB)    MRR
lxsession-0.5.4/lxsession       146.1     5831    81.57      130  0.022
rox-2.11/ROX-Filer              362.3     9569    400.6      329  0.034
tor-0.3.5.8/tor-resolve         58.36     1930    53.10       70  0.036
openssh-8.0p1/ssh-keygen         1212    29670     1170     1128  0.038
xsane-0.999/xsane               499.8    10118    467.5      430  0.042
openssh-8.0p1/sftp               3036    45903     3446     9137  0.199
metacity-3.30.1/metacity         2111    36324     2363     6329  0.174
links-2.19/links                 2512    29761     2740     3930  0.132
openssh-8.0p1/ssh-keygen         1212    29670     1170     1128  0.038
links-2.19/xlinks                2523    29587     2760     3921  0.133
Table 6: Task T2. A sample of the results for task T2 in Figure 5(b). The first 3 rows list the benchmarks with the lowest speedups. The latter 3 rows list the benchmarks with the highest speedups. T: time; MF: memory footprint.

                                      IKOS               Mikos
Benchmark                         T (s)  MF (MB)    T (s)  MF (MB)    MRR  Speedup
moserial-3.0.12/moserial          422.3      109    585.5      107  0.980    0.72×
openssh-8.0p1/ssh-pkcs11-helper   82.70      674    94.61      613  0.910    0.87×
openssh-8.0p1/sftp                 3036    45903     3446     9137  0.199    0.88×
packeth-1.9/packETH               188.7      153    83.82      120  0.782    2.25×
lxsession-0.5.4/lxsession         146.1     5831    81.57      130  0.022    1.79×
xscreensaver-5.42/braid            6.48      203     4.87       36  0.179    1.33×