Tighter Loop Bound Analysis (Technical report)
arXiv:… [cs.PL] … May …

Pavel Čadek
Faculty of Informatics, Vienna University of Technology, Austria
Jan Strejček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Marek Trtík
LaBRI, University of Bordeaux, France
Abstract
We present a new algorithm for computing upper bounds on the number of executions of each program instruction during any single program run. The upper bounds are expressed as functions of program input values. The algorithm is primarily designed to produce bounds that are relatively tight, i.e. not unnecessarily blown up. The upper bounds for instructions allow us to infer loop bounds, i.e. upper bounds on the number of loop iterations. Experimental results show that the algorithm implemented in a prototype tool Looperman often produces tighter bounds than current tools for loop bound analysis.
Keywords:
Loop bound, WCET, symbolic execution, path counter, loop summarisation, static analysis.
1 Introduction

The goal of loop bound analysis is to derive, for each loop in a given program, an upper bound on the number of its iterations during any execution of the program. These bounds can be parametrized by the program input. Loop bound analysis is an active research area with two prominent applications: program complexity analysis and worst case execution time (WCET) analysis.

The aim of program complexity analysis is to derive an asymptotic complexity of a given program. The complexity is commonly considered by programmers in their everyday work and it is also used in specifications of programming languages, e.g. every implementation of the standard template library of C++ has to have the prescribed complexities. Loop bound analysis clearly plays a crucial role in program complexity analysis. In this context, emphasis is put on large coverage of the loop bound analysis (i.e. it should find some bounds for as many program loops as possible), while there are only limited requirements on tightness of the bounds, as asymptotic complexity is studied.

A typical application scenario for WCET analysis is to check whether a given part of some critical system finishes its execution within an allocated time budget. One step of the decision process is to compute loop bounds. Tightness of the bounds is very important here, as an untight bound can lead to a spuriously negative answer of the analysis (i.e. 'the allocated time budget can be exceeded'), which may imply unnecessary additional costs, e.g. for system redesign or for hardware components with higher performance. WCET analysis can also be used by schedulers to estimate the run-time of individual tasks.

The problem of inferring loop bounds has recently been refined into the reachability-bound problem [16], where the goal is to find an upper bound on the number of executions of a given program instruction during any single run of a given program.
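The difference between the two kinds of bounds can be seen on a small sketch (a hypothetical example of ours, not taken from the cited work [16]): the loop below has loop bound n, but the instruction incrementing allocs admits the tighter reachability bound ⌈n/2⌉.

```python
def f(n):
    allocs = 0
    for i in range(n):       # loop bound: n iterations
        if i % 2 == 0:
            allocs += 1      # reachability bound: ceil(n/2) executions
    return allocs
```

A bound analysis that ignores the branch inside the loop can only report n for the increment; taking the branch into account halves the bound.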
One typically asks for a reachability bound on some resource demanding instruction like memory allocation. Reachability bound analysis is more challenging than loop bound analysis as, in order to get a reasonably precise bound, branching inside loops must be taken into account.

This paper presents a new algorithm that infers reachability bounds. More precisely, for each instruction of a given program, the algorithm tries to find an upper bound on the number of executions of the instruction in any single run of the program. The bounds are parametrized by the program input. The reachability bounds can be directly used to infer loop bounds and asymptotic program complexity. Our algorithm builds on symbolic execution [19] and loop summarisation adopted from [25]. In comparison with other techniques for reachability bound or loop bound analysis, our algorithm brings the following features:

• It utilizes a loop summarisation technique that computes precise values of program variables as functions of loop iteration counts.
• It distinguishes different branches inside loops and computes bounds for each of them separately.
• If more different bounds arise, it handles all of them, while other techniques usually choose one of them nondeterministically.
• It can detect logarithmic bounds.
• Upper bounds for nested loops are computed more precisely: while other techniques typically multiply a bound for the outer loop by a maximal bound on iterations of the inner loop during one iteration of the outer loop, we sum the bounds for the inner loop over all iterations of the outer loop.

All these features have a positive effect on the tightness of the produced bounds.

We can explain the basic idea of our algorithm on the flowgraph on the right. The node a is the entry location, d is the exit location, and locations b, c form a loop. The initial value of x represents program input. We symbolically execute each path in the loop and assign an iteration counter to it.
Then we try to express the effect of arbitrarily many iterations of the loop using the iteration counters as parameters. The loop in our example has just one path bcb and it increments i by 2. Hence, the value of i after κ iterations is i′ + 2κ, where i′ denotes the value of i before the loop execution starts. We combine this …

[Figure: the example program void f(int x) { int i = 5; while (i < …) { i = i + 2; } } and its flowgraph with nodes a, b, c, d]

2 Preliminaries

In this section we introduce or recall some terminology intensively used in the following sections. For simplicity, this paper focuses on programs manipulating only integer scalar variables a, b, . . . and read-only multidimensional integer array variables A, B, . . . , and with no function calls.

An analysed program is represented as a flowgraph P = (V, E, l_beg, l_end, ι), where (V, E) is a finite oriented graph, l_beg, l_end ∈ V are different begin and end nodes, respectively. The in-degree of l_beg and the out-degree of l_end are 0, and ι : E → I labels each edge e by an instruction ι(e). The out-degree of all nodes except the end node is 1 or 2. Nodes with out-degree 2 are called branching nodes. We use two kinds of instructions: an assignment instruction a := expr for some scalar variable a and some program expression expr over program variables, and an assumption assume(γ) for some quantifier-free formula γ over program variables. Out-edges of any branching node are labelled with assumptions assume(γ) and assume(¬γ) for some γ. We often omit the keyword assume in flowgraphs.

A path in a flowgraph is a (finite or infinite) sequence π = v₁v₂··· of nodes such that (vᵢ, vᵢ₊₁) ∈ E for all vᵢ, vᵢ₊₁ in the sequence. Paths are denoted by Greek letters.

A backbone in a flowgraph is an acyclic path leading from the begin node to the end node.

Let π be a backbone with a prefix αv. There is a loop C with a loop entry v along π if there exists a path vβv such that no node of β appears in α.
The loop C is then the smallest set containing all nodes of all such paths vβv.

Each run of the program corresponds to a path in the flowgraph starting at l_beg and such that it is either infinite, or it is finite and ends in l_end. Every run follows some backbone: it can escape from the backbone in order to perform one or more iterations in a loop along the backbone, but once the last iteration in the loop is finished, the execution continues along the backbone again. We thus talk about a run along a backbone. We can compute the backbone for a finite run corresponding to a path ρ by the following procedure: If ρ is acyclic, then the backbone is directly ρ. Otherwise, we find the leftmost repeating node in ρ, remove the part of ρ between the first and the last occurrence of this node (including the last occurrence), and repeat the procedure. In other words, the backbone arises from ρ by removing all cycles in it. For an infinite run, we extend the procedure to remove the suffix starting just after the leftmost node that repeats infinitely often on the run. The procedure returns an acyclic path that is a prefix of some backbones. We can associate the run to any backbone with this prefix.

For a loop C with a loop entry v along a backbone π, a flowgraph induced by the loop, denoted as P(C, v), is derived from the subgraph of the original flowgraph induced by C, where v is marked as the begin node, a fresh end node v′ is added, and every transition (u, v) ∈ E leading to v is redirected to v′ (we identify the edge (u, v′) with (u, v) in the context of the original program). Each single iteration (i.e. a path from v back to v without visiting v inside) of the loop corresponds to a run of the induced flowgraph.

Note that program representation by flowgraphs and the above definition of loops easily handle many features of programming languages like break, continue, or even program loops constructed using goto instructions.
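The cycle-removal procedure for computing a run's backbone can be sketched as follows (our own transcription of the steps above, not code from the report; nodes are arbitrary hashable labels):

```python
def backbone(path):
    """Reduce a finite run's path to its backbone by removing all cycles:
    repeatedly find the leftmost repeating node and cut everything between
    its first and last occurrence (the last occurrence included)."""
    path = list(path)
    while len(set(path)) < len(path):           # some node still repeats
        first = {}
        for i, v in enumerate(path):
            first.setdefault(v, i)              # index of first occurrence
        # the leftmost repeating node: smallest first-occurrence index
        rep = min((v for v in first if path.count(v) > 1),
                  key=lambda v: first[v])
        last = len(path) - 1 - path[::-1].index(rep)
        path = path[:first[rep] + 1] + path[last + 1:]
    return path
```

For example, the run a b c b d (one loop iteration through c) reduces to the backbone a b d, and so does a b c b c b d (two iterations).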
The basic idea of symbolic execution [19] is to replace input data of a program by symbols representing arbitrary data. Executed instructions then manipulate symbolic expressions over the symbols instead of exact values. A symbolic expression is any term of the theory of integers extended with functions max and min, rounding functions ⌈·⌉ and ⌊·⌋ for constant expressions over reals, and

• for each scalar variable a, an uninterpreted constant a, which is a symbol representing any initial (input) value of the variable a,
• for each array variable A, an uninterpreted function A of the same arity as A, which is a symbol representing any initial (input) content of the array A,
• a countable set {κ₁, κ₂, . . .} of artificial variables (not appearing in analysed programs), called path counters and ranging over non-negative integers,
• a special symbol ⋆ called unknown, and
• for each formula ψ built over symbolic expressions and two symbolic expressions φ₁, φ₂, a construct ite(ψ, φ₁, φ₂) of meaning "if-then-else", which evaluates to φ₁ if ψ holds, and to φ₂ otherwise.

We assume that crashes or other undefined behaviour of program expressions are prevented by safety guards. For example, an expression a/b is guarded by assume(b ≠ 0).

Let ψ, φ be symbolic expressions and x be a symbol or a path counter. By ψ[x/φ] we denote the expression ψ where all occurrences of x are simultaneously replaced by φ. We further extend this notation such that ψ[xᵢ/φᵢ | i ∈ I] denotes the expression ψ where all occurrences of xᵢ are simultaneously replaced by φᵢ, for each i ∈ I.

We sometimes add the upper index ~κ to expression names to denote that the expressions can contain path counters ~κ = (κ₁, . . . , κₙ).
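The simultaneous-substitution notation ψ[xᵢ/φᵢ | i ∈ I] can be illustrated by a small sketch (our own illustration, not code from the report; symbols are modelled as plain strings and parenthesisation is omitted for readability):

```python
import re

def subst(psi, mapping):
    """psi[x_i/phi_i | i in I]: replace every symbol x_i by phi_i
    *simultaneously*. A single regex pass guarantees simultaneity:
    replaced text is never rescanned, so even swapping substitutions
    such as [a/b, b/a] behave correctly."""
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, mapping)) + r')\b')
    return pattern.sub(lambda m: mapping[m.group(1)], psi)
```

A naive sequential replacement (first a by b, then b by a) would turn 'a + b' into 'a + a'; the one-pass version yields 'b + a', matching the simultaneous semantics used throughout the paper.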
We say that an expression is ~κ-free if it contains no path counters.

Symbolic execution stores variable values in a symbolic memory and all executable program paths are uniquely identified by corresponding path conditions. Here we provide brief descriptions of these terms. For more information see [19].

A symbolic memory is a function θ assigning to each scalar variable a a symbolic expression and to each array variable A the symbol A (the value of every array variable is always identical to its initial value as we consider programs with read-only arrays). A symbolic memory θ is called initial if θ(a) = a for each scalar or array variable a.

We overload the notation θ(·) to program expressions as follows. Let expr be a program expression over program variables a₁, . . . , aₙ. Then θ(expr) represents the symbolic expression obtained from expr by simultaneously replacing all occurrences of the variables a₁, . . . , aₙ by the symbolic expressions θ(a₁), . . . , θ(aₙ), respectively.

Symbolic execution of a path in a flowgraph starts with the initial symbolic memory and the memory is updated on assignments. For example, if the first executed assignment is a:=2*a+b, the initial symbolic memory θ is updated to the symbolic memory θ′ where θ′(a) = θ(2*a+b) = 2a + b. If we later update θ′ on a:=1-a, we get the memory θ′′ such that θ′′(a) = θ′(1-a) = 1 − (2a + b) = 1 − 2a − b.

If ψ is a symbolic expression over symbols {aᵢ | i ∈ I} corresponding to program variables {aᵢ | i ∈ I} respectively, then θ⟨ψ⟩ denotes the symbolic expression ψ[aᵢ/θ(aᵢ) | i ∈ I]. For example, if θ(a) = κ₁ and θ(b) = a − κ₁, then θ⟨2a + b⟩ = 2θ(a) + θ(b) = 2κ₁ + a − κ₁. We apply the notation θ⟨ϕ⟩ and ϕ[x/ψ] with the analogous meanings also to formulae ϕ built over symbolic expressions.

Note that θ₁⟨θ₂(a)⟩ returns the value of a after a code with effect θ₁ followed by a code with effect θ₂.
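The memory-update step can be sketched concretely (a toy model of the symbolic memory, not the actual implementation; writing a0, b0 for the input symbols of a, b is our own convention):

```python
import re

def theta_of(expr, theta):
    """theta(expr): simultaneously replace each single-letter program
    variable in expr by its current symbolic value in theta."""
    return re.sub(r'\b([a-z])\b', lambda m: '(' + theta[m.group(1)] + ')', expr)

def assign(theta, var, expr):
    """Symbolically execute the assignment var := expr."""
    new = dict(theta)
    new[var] = theta_of(expr, theta)
    return new

# the example from the text: a := 2*a + b followed by a := 1 - a
theta = {'a': 'a0', 'b': 'b0'}          # initial memory: theta(a) = a, theta(b) = b
theta = assign(theta, 'a', '2*a+b')     # theta'(a)  = 2a + b
theta = assign(theta, 'a', '1-a')       # theta''(a) = 1 - (2a + b)
```

Evaluating the resulting expression for a at inputs a0 = 3, b0 = 4 gives 1 − (2·3 + 4) = −9, the same value a concrete execution of the two assignments would produce.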
For example, if θ₁(a) = 2a + 1 represents the effect of the assignment a:=2*a+1 and θ₂(a) = a − 2 the effect of the assignment a:=a-2, then θ₁⟨θ₂(a)⟩ = θ₁⟨a − 2⟩ = (2a + 1) − 2 represents the effect of the two assignments applied in sequence.

Given a path in a flowgraph leading from its begin node l_beg, the path condition is a formula over symbols which determines the set of all possible program inputs for which the program execution will follow the path. A path condition is constructed during symbolic execution of the path. Initially, the path condition is set to true and it can only be updated when an assume(γ) is executed. For example, if a symbolic execution reaches assume(a > 5) with a path condition ϕ and a symbolic memory θ where θ(a) = 2a − b, then it updates the path condition to ϕ ∧ 2a − b > 5.

An upper bound for an edge e in a flowgraph P is a ~κ-free symbolic expression ρ such that whenever P is executed on any input, the instruction on edge e is executed at most ρ′ times, where ρ′ is the expression that we get by replacing each variable symbol by the input value of the corresponding variable.

3 The Algorithm

Recall that every program run follows some backbone and the run can diverge from the backbone only to loops along the backbone. The algorithm first detects all backbones. For each backbone πᵢ and each edge e, it computes a set of upper bounds βᵢ(e) on the number of visits of the edge by any run following the considered backbone. As all these bounds are valid, the set βᵢ(e) can be interpreted as a single bound min βᵢ(e) on visits of edge e by any run along πᵢ. At the end, the overall upper bound for an edge e can be computed as the maximum of these bounds over all backbones, i.e. as max{min βᵢ(e) | πᵢ is a backbone}.

The algorithm consists of the following four procedures:

executeProgram is the starting procedure of the whole algorithm. It gets a flowgraph and computes all its backbones.
Then it symbolically executes each backbone and computes, for each edge, a set of upper bounds on the number of visits of the edge by a run following the backbone. Whenever the symbolic execution visits a loop entry node, the procedure processLoop is called to get upper bounds on visits during the loop execution.

processLoop gets a loop represented by the program induced by the loop. Note that each run of the induced program corresponds to one iteration of the loop and it follows some backbone of the induced program (the backbones are called loop paths in this context). The procedure then symbolically executes each loop path by a recursive call of executeProgram (the nesting of recursive calls thus directly corresponds to the nesting of loops in the program). The recursive call of executeProgram produces, for each loop path, a symbolic memory and a path condition capturing the effect of a single iteration along the loop path. The procedure processLoop then calls computeSummary, which takes the symbolic memories after single loop iterations, assigns to each loop path a unique path counter κᵢ, and computes a parametrized symbolic memory θ~κ describing the effect of an arbitrary number of loop iterations. This symbolic memory is parametrized by path counters ~κ = (κ₁, . . . , κₖ) representing the numbers of iterations along the corresponding loop paths. From the parametrized symbolic memory and from the path conditions corresponding to single loop iterations (received from the recursive call of executeProgram), we derive a parametrized necessary condition for each loop path, which is a formula over symbols and path counters ~κ that has to be satisfied when another loop iteration along the corresponding loop path can be performed after ~κ loop iterations. Finally, processLoop infers upper bounds from these parametrized necessary conditions with the help of the procedure computeBounds.
computeSummary is a subroutine of processLoop that gets symbolic memories corresponding to one loop iteration along each loop path and produces the parametrized symbolic memory θ~κ after an arbitrary number of loop iterations (as mentioned above).

computeBounds is another subroutine of processLoop. It gets a set I of loop paths and the corresponding parametrized necessary conditions, and tries to derive some upper bounds on the number of loop iterations along the loop paths from I.

We describe the four procedures in the following four subsections. The procedure processLoop is described as the last one as it calls the other three procedures.

3.1 executeProgram

The procedure executeProgram of Algorithm 1 takes a flowgraph as input, determines its backbones, and symbolically executes each backbone separately. For a backbone πᵢ, symbolic execution computes a symbolic memory θᵢ, a path condition ϕᵢ, and a bound function βᵢ assigning to each edge e a set of symbolic expressions that are valid upper bounds on the number of visits of edge e during any single run along the backbone. Each such set βᵢ(e) of bounds could be replaced by the single bound min βᵢ(e), but we prefer to keep it as a set of simpler expressions to increase the success rate of expression matching in the procedure processLoop (we point out the reason in Section 3.4).

The symbolic execution proceeds in the standard way until we enter a loop entry (line 9). Then we call the procedure processLoop on the loop, the current symbolic memory and the path condition. The procedure returns a function β_loop of upper bounds on visits of loop edges during execution of the loop, and a symbolic memory after execution of the loop. We add these bounds and the former bounds in the foreach loop at line 12 and continue the execution along the backbone.
If the processLoop procedure cannot determine the value of some variable after the loop, it simply uses the symbol ⋆ (unknown).

Another difference from the standard symbolic execution is at line 14, where we suppress insertion of predicates containing ⋆ into the path condition. As a consequence, a path condition of our approach is no longer a necessary and sufficient condition on input values to lead the program execution along the corresponding path (which is the case in standard symbolic execution), but only a necessary condition on input values of a run to follow the backbone.

After processing an edge of the backbone, we increase the corresponding bounds by one (line 18).

At the end of the procedure, the resulting bounds for each edge are computed as the maximum of the previously computed bounds for the edge over all backbones (see the foreach loop at line 20). Besides these bounds, the procedure also returns each backbone with the symbolic memory and path condition after its execution.

Algorithm 1: executeProgram
  Input:  P  // a flowgraph
  Output: {(π₁, θ₁, ϕ₁), . . . , (πₖ, θₖ, ϕₖ)}  // backbones πᵢ (with symbolic memory θᵢ
                                                //  and path condition ϕᵢ after execution along πᵢ)
          β  // for each edge e of P, β(e) is a set of upper bounds for e
   1  states ← ∅
   2  Compute the set of backbones {π₁, . . . , πₖ} of P.
   3  foreach i = 1, . . . , k do
   4      Initialize θᵢ to return a for each (scalar or array) variable a.
   5      ϕᵢ ← true
   6      Initialize βᵢ to return {0} for each edge.
   7      Let πᵢ = v₁ . . . vₙ.
   8      foreach j = 1, . . . , n − 1 do
   9          if vⱼ is a loop entry then
  10              Let C be the loop with the loop entry vⱼ along πᵢ.
  11              (β_loop, θᵢ) ← processLoop(P(C, vⱼ), θᵢ, ϕᵢ)
  12              foreach edge e of P(C, vⱼ) do
  13                  βᵢ(e) ← {ρ₁ + ρ₂ | ρ₁ ∈ βᵢ(e), ρ₂ ∈ β_loop(e)}
  14          if ι((vⱼ, vⱼ₊₁)) has the form assume(ψ) and θᵢ(ψ) contains no ⋆ then
  15              ϕᵢ ← ϕᵢ ∧ θᵢ(ψ)
  16          if ι((vⱼ, vⱼ₊₁)) has the form a := expr then
  17              θᵢ(a) ← θᵢ(expr)
  18          βᵢ((vⱼ, vⱼ₊₁)) ← {ρ + 1 | ρ ∈ βᵢ((vⱼ, vⱼ₊₁))}
  19      Insert (πᵢ, θᵢ, ϕᵢ) into states.
  20  foreach edge e of P do
  21      β(e) ← {max{ρ₁, . . . , ρₖ} | ρ₁ ∈ β₁(e), . . . , ρₖ ∈ βₖ(e)}
  22  return (states, β)

3.2 computeSummary

The procedure computeSummary of Algorithm 2 is a subpart of the procedure of the same name introduced in [25].

The procedure gets loop paths π₁, . . . , πₗ together with symbolic memories θ₁, . . . , θₗ, where each θᵢ represents the effect of a single iteration along πᵢ. It then assigns fresh path counters ~κ = (κ₁, . . . , κₗ) to the loop paths and computes the parametrized symbolic memory θ~κ after ~κ iterations of the loop, i.e. after Σ_{1≤i≤l} κᵢ iterations where exactly κᵢ iterations follow πᵢ for each i and there is no assumption on the order of iterations along different loop paths.

Note that the value of some variables can depend on the order of iterations along different loop paths. If we do not find the precise value of some variable after ~κ iterations, then θ~κ assigns ⋆ (unknown) to this variable.

Algorithm 2: computeSummary
  Input:  {(π₁, θ₁), . . . , (πₗ, θₗ)}  // each θᵢ is a memory after a single execution of πᵢ
  Output: θ~κ  // the symbolic memory after ~κ iterations of backbones π₁, . . . , πₗ
   1  Introduce fresh path counters ~κ = (κ₁, . . . , κₗ) for backbones π₁, . . . , πₗ resp.
   2  Initialize θ~κ to return ⋆ for each scalar variable, and A for each array variable A.
   3  repeat
   4      change ← false
   5      foreach variable a such that θ~κ(a) = ⋆ do
   6          Compute an improved value b for the variable a from symbolic memories θ₁, . . . , θₗ and θ~κ.
   7          if b ≠ ⋆ then
   8              θ~κ(a) ← b
   9              change ← true
  10  until change = false
  11  return θ~κ

To be on the safe side, we start with θ~κ assigning ⋆ to all scalar variables. Then we gradually improve the precision of θ~κ as long as there is some progress. The crucial step is the computation of an improved value b for a scalar variable a at line 6. The value b is defined as ⋆ except in the following four cases.

1. For each loop path πᵢ, we have θᵢ(a) = a. In other words, the value of a is not changed in any iteration of the loop. This case is trivial. We set b = a.

2. For each loop path πᵢ, either θᵢ(a) = a or θᵢ(a) = a + dᵢ (resp. θᵢ(a) = a · dᵢ) for some symbolic expression dᵢ such that θ~κ⟨dᵢ⟩ contains neither ⋆ nor any path counters. Let us assume that the latter possibility holds for loop paths π₁, . . . , πₘ. The condition on θ~κ⟨dᵢ⟩ guarantees that the value of dᵢ is constant during all iterations of the loop. In this case, we set b = a + Σᵢ₌₁ᵐ θ~κ⟨dᵢ⟩ · κᵢ (resp. b = a · Πᵢ₌₁ᵐ θ~κ⟨dᵢ⟩^κᵢ).

3. There exists a symbolic expression d such that θ~κ⟨d⟩ contains neither ⋆ nor any path counters. For each loop path πᵢ, either θᵢ(a) = a or θᵢ(a) = d. Let us assume that the latter possibility holds for loop paths π₁, . . . , πₘ. In other words, the value of a is set to d in each iteration along loop path πᵢ for 1 ≤ i ≤ m, while it is unchanged in any other iteration. Hence, we set b = ite(Σ_{1≤i≤m} κᵢ > 0, θ~κ⟨d⟩, a).

4. For one loop path, say πᵢ, θᵢ(a) = d for some symbolic expression d such that θ~κ⟨d⟩ contains neither ⋆ nor any path counters except κᵢ. Further, for each loop path πⱼ such that i ≠ j, θⱼ(a) = a. That is, only iterations along πᵢ modify a and they set it to a value independent of path counters other than κᵢ.
Note that if we assign d to a in the κᵢ-th iteration along πᵢ, then the actual assigned value of d is the value after κᵢ − 1 iterations along the paths. Therefore we set b = ite(κᵢ > 0, (θ~κ⟨d⟩)[κᵢ/κᵢ − 1], a).

Note that one can add further cases covering other situations where the value of a can be expressed precisely, e.g. a case capturing geometric progressions. Also note that the value of any variable in the resulting symbolic memory θ~κ is expressed either precisely, or it is unknown (i.e. ⋆).

3.3 computeBounds

Algorithm 3: computeBounds
  Input:  I  // indices of backbones
          ϕ  // a necessary condition to perform an iteration along a backbone
             //  with an index in I after ~κ iterations
  Output: B  // upper bounds on the number of iterations along backbones with indices in I
   1  if ϕ[κᵢ/0 | i ∈ I] is not satisfiable then return {0}
   2  B ← ∅
   3  foreach inequality Σ_{j∈J⊇I} aⱼκⱼ < b implied by ϕ, where each aⱼ is a positive integer
      and b is ~κ-free do
   4      B ← B ∪ {max{0, ⌈b / min{aᵢ | i ∈ I}⌉}}
   5  return B

The procedure computeBounds of Algorithm 3 gets a set I of selected loop path indices, and a necessary condition ϕ to perform an iteration along some loop path with an index in I (we talk about an iteration along I for short) after ~κ previous loop iterations. From this information, the procedure infers upper bounds on the number of loop iterations along I.

We would like to find a tight upper bound, i.e. a ~κ-free symbolic expression B such that there exist some values of symbols (given by a valuation function v) for which the necessary condition ϕ[a/v(a) | a is a symbol] to make another iteration along I is satisfiable whenever the number of finished iterations along I is less than B[a/v(a) | a is a symbol], and the same does not hold for the expression B + 1.
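The bound derived on line 4 of Algorithm 3 can be sketched numerically (our own illustration, with concrete integer coefficients standing in for symbolic expressions; the dict-of-coefficients representation is an assumption of the sketch):

```python
import math

def bound_from_inequality(coeffs, b, I):
    """From a detected inequality sum_j coeffs[j]*kappa_j < b (every
    coeffs[j] a positive integer, b free of path counters), derive the
    upper bound max{0, ceil(b / min{coeffs[i] | i in I})} on the total
    number of iterations along the loop paths indexed by I."""
    return max(0, math.ceil(b / min(coeffs[i] for i in I)))
```

For the loop from the introduction, another iteration after κ₁ iterations needs 5 + 2κ₁ < x, i.e. 2κ₁ < x − 5; with input x = 12 this is 2κ₁ < 7, giving the bound ⌈7/2⌉ = 4, which is exactly the number of iterations the concrete loop performs.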
An effective algorithm computing these tight bounds is an interesting research topic in itself.

The presented procedure infers some bounds only in two special cases. Line 1 covers the case when even the first iteration along any loop path in I is not possible: the procedure then returns the bound 0.

The other special case is the situation when the necessary condition implies an inequality of the form Σ_{j∈J⊇I} aⱼκⱼ < b, where each aⱼ is a positive integer and b is ~κ-free. To detect these cases, we transform the necessary condition into conjunctive normal form, look for clauses that contain just one predicate, and try to transfer the predicate into this form by basic arithmetical operations, knowing that ϕ holds. Each such inequality implies the following:

    Σ_{j∈J⊇I} aⱼκⱼ < b  ⟹  Σ_{i∈I} aᵢκᵢ < b  ⟹  min{aᵢ | i ∈ I} · Σ_{i∈I} κᵢ < b.

Hence, Σ_{i∈I} κᵢ < ⌈b / min{aᵢ | i ∈ I}⌉ has to be satisfied to perform another iteration along I after ~κ previous iterations including Σ_{i∈I} κᵢ iterations along I. As all path counters are non-negative integers, we derive the bound max{0, ⌈b / min{aᵢ | i ∈ I}⌉} on iterations along I.

3.4 processLoop

The procedure processLoop of Algorithm 4 gets a flowgraph Q representing the body of a loop, i.e. each run of Q corresponds to one iteration of the original loop.

Algorithm 4: processLoop
  Input:  Q      // a flowgraph induced by a loop
          θ_in   // a symbolic memory when entering the loop
          ϕ_in   // a path condition when entering the loop
  Output: β_loop // upper bounds for all edges in the loop
          θ_out  // symbolic memory after the loop
   1  Initialize β_loop to return ∅ for each edge e of Q.
   2  ({(π₁, θ₁, ϕ₁), . . . , (πₖ, θₖ, ϕₖ)}, β_inner) ← executeProgram(Q)
   3  θ~κ ← computeSummary({(π₁, θ₁), . . . , (πₖ, θₖ)})
   4  ϕᵢ~κ ← ϕ_in ∧ θ_in⟨θ~κ⟨ϕᵢ⟩⟩ for each i ∈ {1, . . . , k}
   5  β~κ(e) ← {θ_in⟨θ~κ⟨ρ⟩⟩ | ρ ∈ β_inner(e)} for each edge e of Q
   6  foreach edge e of Q do
   7      I ← {i | e is on πᵢ or on a loop along πᵢ}
   8      B_outer ← computeBounds(I, ⋁_{i∈I} ϕᵢ~κ)
   9      if 0 ∈ B_outer then
  10          β_loop(e) ← {0}
  11      else
  12          foreach ρ_inner ∈ β~κ(e) do
  13              if ρ_inner ≡ c where c is ~κ-free then
  14                  β_loop(e) ← β_loop(e) ∪ {c · ρ_outer | ρ_outer ∈ B_outer}
  15              else if ρ_inner ≡ max{c, b + Σᵢ₌₁ᵏ aᵢκᵢ} where c, b and all aᵢ are ~κ-free then
  16                  J ← {j | j ∉ I ∧ aⱼ ≠ 0}
  17                  B_J ← computeBounds(J, ⋁_{j∈J} ϕⱼ~κ)
  18                  if B_J ≠ ∅ then
  19                      b′ ← b + max{0, aⱼ | j ∈ J} · min B_J
  20                      a ← max{aᵢ | i ∈ I}
  21                      foreach ρ_outer ∈ B_outer do
  22                          β_loop(e) ← β_loop(e) ∪ {Σ_{K=0}^{ρ_outer−1} max{c, b′ + a·K}}
  23  foreach edge e of Q do
  24      if an edge e′ of Q such that β~κ(e′) = ∅ is reachable from e in Q then
  25          β_loop(e) ← {ρ₁ + ρ₂ | ρ₁ ∈ β_loop(e), ρ₂ ∈ β~κ(e), and ρ₂ is ~κ-free}
  26  θ_out(a) ← θ_in⟨θ~κ(a)⟩ for each variable a
  27  Eliminate ~κ from θ_out.
  28  return (β_loop, θ_out)

We symbolically execute Q using the recursive call of executeProgram at line 2. We obtain all loop paths π₁, . . . , πₖ of Q and bounds β_inner on visits of each edge in the loop during any single iteration of the loop. For each πᵢ, we also get the symbolic memory θᵢ after one iteration along πᵢ and a necessary condition ϕᵢ to perform this iteration. The procedure computeSummary produces the parametrized symbolic memory θ~κ after ~κ iterations. Symbols a appearing in θ~κ refer to variable values before the loop is entered. If we combine θ~κ with the symbolic memory θ_in before entering the loop, we get the symbolic memory after execution of the code preceding the loop and ~κ iterations of the loop.
We use this combination to derive the necessary conditions ϕᵢ~κ to perform another iteration along πᵢ and the upper bounds β~κ on visits of loop edges in the next iteration of the loop.

The foreach loop at line 6 computes upper bounds for all edges of the processed loop on visits during all its complete iterations (incomplete iterations, when a run cycles in some nested loop forever, are handled later). We already have the bounds β~κ on visits in a single iteration after ~κ preceding iterations. For each edge e, we compute the set I of all loop path indices such that iterations along these loop paths can visit e. The computeBounds procedure at line 8 takes I and a necessary condition to perform an iteration along I after ~κ iterations, and computes bounds B_outer on the number of iterations along I. If there is 0 among these bounds, e cannot be visited by any complete iteration and the computation for e is over. Otherwise we try to compute some overall bounds for each bound ρ_inner on the visits of e during one iteration (after ~κ iterations) separately. If ρ_inner is a ~κ-free expression (line 13), then it is constant in each iteration and we simply multiply it by every bound on the number of iterations along I. The situation is more difficult if ρ_inner contains some path counters. We can handle the frequent case when it has the form max{c, b + Σᵢ₌₁ᵏ aᵢκᵢ}, where a₁, . . . , aₖ, b, c are ~κ-free (see line 15, and note that this is the reason for keeping the bounds simple). First we get rid of path counters κⱼ that have some influence on this bound (i.e. aⱼ ≠ 0) but such that e cannot be visited by any iteration along loop path πⱼ. Let J be the set of indices of such path counters (line 16). We try to compute bounds B_J on the number of iterations along J (line 17), which are also bounds on Σ_{j∈J} κⱼ. Note that if J = ∅, we call computeBounds(∅, false), which immediately returns {0}.
If we get some bounds in B_J, we can overapproximate Σᵢ₌₁ᵏ aᵢκᵢ as follows:

    Σᵢ₌₁ᵏ aᵢκᵢ = Σ_{j∈J} aⱼκⱼ + Σ_{i∈I} aᵢκᵢ ≤ max{0, aⱼ | j ∈ J} · min B_J + max{aᵢ | i ∈ I} · Σ_{i∈I} κᵢ.

Using the definitions of b′ and a at lines 19–20, we overapproximate the bound ρ_inner on visits of e during one iteration along I after ~κ loop iterations by

    ρ_inner = max{c, b + Σᵢ₌₁ᵏ aᵢκᵢ} ≤ max{c, b′ + a · Σ_{i∈I} κᵢ}.

As the K-th iteration along I is preceded by K − 1 iterations along I, the edge e can be visited at most max{c, b′ + a · (K − 1)} times during the K-th iteration. For each bound ρ_outer on the iterations along I, we can now compute the total bound on visits of e as Σ_{K=0}^{ρ_outer−1} max{c, b′ + a · K}.

Until now we have considered visits of loop edges during complete iterations. However, it may also happen that an iteration is started but never finished, because the execution keeps looping forever in some nested loop. For example, in the program while(x>0) { x:=x-1; while(true) {} }, we easily compute the bound 0 on the number of complete iterations of the outer loop and thus we assign the bound 0 to all loop edges at line 10. However, some edges of the loop are visited. These incomplete iterations are treated by the foreach loop at line 23. Whenever an edge e can be visited by an incomplete iteration (which is detected by the existence of some subsequent edge e′ without any bound and thus potentially lying on …

[Figure: example program int nonzeros(int n, int* A) { int k = 0; for (int i = 0; i < …) … }]

4 Experimental Evaluation

We have implemented our algorithm in a prototype tool Looperman. It is built on top of the symbolic execution package Bugst [27] and it intensively uses the SMT solver Z3 [32].

We compare Looperman with four state-of-the-art loop bound analysis tools: Loopus [23], KoAT [7], PUBS [1], and Rank [3]. These freely available tools were mutually compared before, and the results of their elaborate evaluation are also freely available [29, 30]. We ran Looperman on the same benchmarks, originally collected from the literature.
More precisely, we use the benchmarks translated to C programs by the authors of Loopus. We ignore programs with recursive function calls, as our tool does not support recursion. In order to make manual inspection of the benchmarks and of the results of their analysis manageable, we also ignore all benchmarks associated with the termination proving tool T2 [6]. In the end, we used 199 benchmarks. They are small C programs (ranging from 7 to 451 lines of code, 26 lines on average) containing various kinds of loops.

We focus on two kinds of bounds: asymptotic complexity bounds for whole programs and exact (meaning non-asymptotic) upper bounds for individual program loops. The comparison of exact bounds is restricted to Looperman and Loopus, as the other tools use input in a different format and (as far as we know) do not provide a mapping of their bounds to the original C code.

We took the results of KoAT, PUBS, and Rank from the mentioned experimental evaluation [29]. In order to obtain the exact bounds for individual loops (which are not publicly available), we ran Loopus using a VirtualBox image available on the tool homepage [31]. All experiments use a 60-second timeout. While KoAT, PUBS, and Rank were executed on a computer with 6 GB of RAM and an Intel i7 CPU clocked at 3.07 GHz, Loopus and Looperman ran on a machine with 8 GB of RAM and an Intel i5 CPU clocked at 2.5 GHz. We believe that this difference is not substantial, as all the tools usually either answer very quickly or fail to determine any bound. The Looperman tool (both sources and Windows binaries), the 199 benchmarks, and all measured data are available at [28].

First we focus on asymptotic complexity bounds for whole programs, which can be produced by all considered tools. Looperman derives them from the upper bounds on edges: each upper bound is transformed into an asymptotic one (e.g.
max(0, a + b − 1) is transformed into O(n)), and the maximal asymptotic bound over the edges is then taken.

The results are presented in Table 1(a). For each tool, the count of all 199 benchmarks is split among the first three columns. The columns ‘correct’ and ‘incorrect’ give the numbers of inferred correct and incorrect asymptotic bounds, respectively. The column ‘fail’ shows the number of benchmarks for which the computation of an asymptotic bound failed. The column ‘TO’ says how often the failure is due to a timeout.

We see that Looperman placed third, after Loopus and KoAT, in the number of successfully computed correct asymptotic bounds. If we consider incorrect bounds, Looperman placed first, together with Loopus. Note that the incorrect bounds are due to errors in implementation; the algorithms followed by the tools are in principle sound.

Table 1(b) compares the results of Looperman (marked as L.) relative to each other tool. We can see the number of cases where the other tool succeeds in inferring a correct bound while Looperman fails, or vice versa. Further, we can see the number of cases where both tools infer a correct bound and the bound of the other tool is tighter, less tight, or the same as Looperman’s bound. An important observation from this table is that for each tool there are benchmarks where the tool fails to compute a correct bound while Looperman succeeds.

Now we compare the accuracy of exact upper bounds for individual loops in the benchmarks. Note that Looperman computes a loop bound as the sum of the bounds on the edges leading from a loop entry node into the loop.

The benchmarks contain 313 loops. Table 2(a) provides, for both tools, the numbers of inferred correct and incorrect loop bounds, and the number of loops for which no bound is inferred. Table 2(b) compares the quality of the inferred loop bounds.
It presents the number of loops where one tool produces a correct loop bound while the other does not, the number of loops where one tool provides an asymptotically tighter loop bound than the other, and the number of loops where one tool produces a tighter bound than the other tool but the difference is not asymptotic (e.g. 2n versus n). To complete the presented data, let us note that both tools inferred exactly the same bound for 143 loops.

Table 1: Numbers of correct/incorrect/failed asymptotic bounds computed by Looperman (L.) and by the other tools (a), and pairwise comparison with the other tools (b).

(a)
            correct  incorrect  fail  TO
Looperman      129       0       70    5
Loopus         162       0       37    0
KoAT           140       2       57   22
PUBS            85      29       85    1
Rank            26       2      171    0

(b)
                     Loopus  KoAT  PUBS  Rank
succeeds, L. fails      38     36    13     7
fails, L. succeeds       5     25    57   110
tighter than L.         10      4     2     0
less tight than L.       …      …     …     …

The data in Table 2(a) show the primary weakness of Looperman: it gives up too often. On the other hand, the data in Table 2(b) show that Looperman is able to compute tighter bounds than Loopus for a considerable number of benchmarks.

The results show that Loopus can infer bounds for slightly more loops than Looperman. However, there are also loops bounded by Looperman and not by Loopus. The biggest advantage of Looperman is definitely the tightness of its bounds: Looperman found a tighter bound for 28% of the 216 loops bounded by both tools, while Loopus found a tighter bound for only 6% of these loops.

There are several popular approaches to the computation of upper bounds on the number of loop iterations. Notable are especially those based on the construction and solving of recurrence equations [1, 2, 20, 5], loop iteration counters [5, 15], and ranking functions [3, 7, 24, 23, 26]. Some tools utilize more than one analysis approach.

Recurrence equations play an important role in PUBS [1]. It uses input generated from Java bytecode and computes upper bounds as a non-recursive representation of a given set of recurrence equations obtained from other analysers, like [2]. Another representative work is r-TuBound [20]. It rewrites
It rewrites Looperman calls the SMT solver Z3 to decide satisfiability of formulae from an undecid-able logic. However, none of the fails are due to Z3 as it is able to solve all queries comingfrom the benchmarks. Looperman 227 0 86 Loopus 267 3 43(a) Looperman Loopus succeeds, the other fails 11 51asymptotically tighter 16 11tighter (not asymptotically) 44 2(b)Table 2: Numbers of correct/incorrect/unknown loop bounds for Looperman and Loopus (a) and comparison loop bound quality (b).multi-path loops into single-path ones using SMT reasoning. Obtained loopsare then translated into a set of recurrence equations over program variables.Upper bounds are solutions of the equations. This is done by a pattern-basedrecurrence solving algorithm. The analysis implemented in ABC [5] combinesthe use of recurrence equations and iteration counters. Authors focus on nestedloops there. An inner loop is instrumented by an artificial variable increasedby one in each iteration (the counter). Recurrence equations over this variableand regular variables of the loop are constructed and solved (to get their non-recurrent versions). Bounds on the artificial variable are obtained by replacingregular variables by their greatest values in the computed equations. An ad-vanced use of counters can be found in SPEED [15]. Counters are artificialvariables instrumented into the analysed program initialised to 0 and incre-mented by one on back-edges of program loops. A linear invariant generationtool is then used to compute bounds on the counter variables at appropriatelocations.In our approach we also use recurrent equations. They are constructed dur-ing symbolic execution and solved in the procedure computeSummary as func-tions of path counters. Nevertheless, our computation is simpler than in theother approaches. And the purpose of equations and their solutions is also dif-ferent. 
We use the solutions in the process of constructing necessary conditions, whose combinations are later used for the inference of upper bounds.

Similarly to [5] and [15], we also use counters. But in contrast to [5], we introduce a path counter for each acyclic path in the loop instead of one counter for the whole loop. Our approach is thus more related to [15], where counters can also be associated with individual paths (namely, with back-edges of those paths). Nevertheless, we do not perform instrumentation of the counters. Another important difference is that we do not compute bounds on path counters explicitly, while [15] computes them using an invariant generation tool.

Ranking functions were originally used for termination proving [9, 6, 22, 13, 12]. A ranking function is usually defined by assigning an expression over program variables to each program location such that the expression values are always non-negative and decreasing between every two subsequent visits of the particular location. It is apparent that a ranking function can also be used for the computation of upper bounds on the number of iterations of program loops. Rank [3] reuses results from the termination analysis of a given program, namely an inferred (global) ranking function, to get an asymptotic upper bound on the length of all possible executions of that program. KoAT [7] employs a similar approach, but it uses ranking functions of already processed loops to compute bounds on the values of program variables. Absolute values of these bounds may in turn improve ranking functions of subsequent loops. Loopus [26] transforms an analysed program at particular locations such that the program variables appearing there represent ranking functions. Several heuristics are used for this purpose. Program loops are then summarised, which is done per individual loop path. Moreover, a contextualization technique is used to infer which loop paths can be executed in a sequence. The approach was further improved in several directions.
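The contrast between a single loop counter and per-path counters can be illustrated on a hypothetical two-path loop (a sketch only; recall that our algorithm tracks path counters symbolically and does not instrument them):

```python
def run_with_path_counters(x, y):
    # Hypothetical loop whose body has two acyclic paths.
    # kappa1 and kappa2 count iterations along each path separately,
    # so each can be bounded on its own (here by the initial y and x),
    # which is tighter than one counter for the whole loop.
    kappa1 = kappa2 = 0
    while x > 0:
        if y > 0:
            y -= 1
            kappa1 += 1  # iteration along path pi_1
        else:
            x -= 1
            kappa2 += 1  # iteration along path pi_2
    return kappa1, kappa2
```

Here κ1 ≤ y0 and κ2 ≤ x0 hold separately, so the two paths obtain the individual bounds y0 and x0 instead of a single bound for the whole loop.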
The method in [23] merges nested loops in order to compute bounds for programs where an inner loop affects the termination of the outer loop. A technique for computing sizes of variables after loops is introduced in [24].

Our algorithm does not use ranking functions. However, the passing of information from a preceding to a subsequent loop that we see in [7] or [24] happens in our approach as well. While [7] and [24] implement this via ranking functions, in our algorithm it is available due to the use of symbolic execution. The loop summarisation per individual loop path presented in [26] is similar to the summarisation we perform in the procedure computeSummary. However, while [26] computes a summary as a transitive hull expressed in the domain of a size-change abstraction, we compute precise symbolic values of variables after the loop. On the other hand, we do not consider which loop paths can be executed in a sequence. And finally, in comparison to [23], we do not merge nested loops.

There are other important techniques for computing upper bounds which are, however, less related to our work. For instance, SWEET [17] uses abstract interpretation [10] to derive bounds on the values of program variables. The approach further uses pattern matching for easy and fast processing of loops with a recognized structure. In [16], the authors propose an analysis which is able to compute an upper bound on the number of visits of a given program location. The program is transformed such that its execution starts at the location and correctly terminates once the execution reaches the location again. Loops in the transformed program are summarised, using an abstract-interpretation-based iterative algorithm, into disjunctive invariants. Upper bounds are computed from the invariants using a non-iterative, proof-rule-based technique.

Another interesting approach to the computation of worst-case complexity, called WISE [8], attempts to compute an input for which the execution of a given program is the longest.
From symbolic executions of all paths up to a given maximal input size, a recipe (branch policy generators) is inferred which then restricts the symbolic execution of all remaining program paths (for unrestricted input size) to the longest ones. The worst-case execution input is then generated from the longest executed path (from its path condition, using an SMT solver).

There are further approaches focusing on declarative languages. For example, resource aware ML [18] computes an amortized complexity for recursive functional programs with inductive data types, without considering upper bounds that depend on integer values. We can further find numerous techniques for complexity analysis of term rewriting and logic programming [4, 14, 11, 21].

Future Work

Based on the results of the evaluation, we will primarily focus on improving the scalability of the proposed approach. We have already analysed the problematic benchmarks and we have uncovered a couple of promising directions for improvements. Here we present three of them.

1. The loop condition directly implies only inequalities which are non-linear with respect to the path counters. For example, the analysis of the loop while(x*x …

Conclusion

We presented an algorithm computing upper bounds for execution counts of individual instructions of an analysed program during any program run. The algorithm is based on symbolic execution and the concept of path counters. The upper bounds are parametrized by the input values of the analysed program. Evaluation of our experimental tool Looperman shows that our approach is slightly less robust than the leading loop bound analysis tools Loopus and KoAT (i.e. it infers a bound in fewer cases). On the positive side, the loop bounds detected by Looperman are often tighter than those found by other tools, which may be a crucial advantage in some applications, including worst case execution time (WCET) analysis.

References

[1] E. Albert, P. Arenas, S. Genaim, and G. Puebla.
Automatic inference of upper bounds for recurrence relations in cost analysis. In Proceedings of the 15th International Symposium on Static Analysis, Lecture Notes in Computer Science, pages 221–237. Springer-Verlag, 2008.

[2] E. Albert, P. Arenas, S. Genaim, G. Puebla, and D. Zanardini. Cost analysis of Java bytecode. In Programming Languages and Systems, volume 4421 of Lecture Notes in Computer Science, pages 157–172. Springer Berlin Heidelberg, 2007.

[3] C. Alias, A. Darte, P. Feautrier, and L. Gonnord. Multi-dimensional rankings, program termination, and complexity bounds of flowchart programs. In Proceedings of the 17th International Conference on Static Analysis, Lecture Notes in Computer Science, pages 117–133. Springer-Verlag, 2010.

[4] M. Avanzini and G. Moser. A combination framework for complexity. In Proceedings of the 24th International Conference on Rewriting Techniques and Applications, volume 21 of Leibniz International Proceedings in Informatics, pages 55–70, 2013.

[5] R. Blanc, T. A. Henzinger, T. Hottelier, and L. Kovács. ABC: Algebraic bound computation for loops. In E. M. Clarke and A. Voronkov, editors, Proceedings of the 16th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning, volume 6355 of Lecture Notes in Computer Science, pages 103–118. Springer-Verlag, 2010.

[6] M. Brockschmidt, B. Cook, and C. Fuhs. Better termination proving through cooperation. In Computer Aided Verification, volume 8044 of Lecture Notes in Computer Science, pages 413–429. Springer Berlin Heidelberg, 2013.

[7] M. Brockschmidt, F. Emmes, S. Falke, C. Fuhs, and J. Giesl. Alternating runtime and size complexity analysis of integer programs. In Tools and Algorithms for the Construction and Analysis of Systems, volume 8413 of Lecture Notes in Computer Science, pages 140–155. Springer Berlin Heidelberg, 2014.

[8] J. Burnim, S. Juvekar, and K. Sen. WISE: Automated test generation for worst-case complexity.
In Proceedings of the 31st International Conference on Software Engineering, pages 463–473. IEEE Computer Society, 2009.

[9] B. Cook, A. Podelski, and A. Rybalchenko. TERMINATOR: Beyond safety. In Proceedings of the 18th International Conference on Computer Aided Verification, volume 4144 of Lecture Notes in Computer Science, pages 415–418. Springer-Verlag, 2006.

[10] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 238–252. ACM Press, New York, NY, 1977.

[11] S. K. Debray and N.-W. Lin. Cost analysis of logic programs. ACM Transactions on Programming Languages and Systems, TOPLAS'93, 15(5):826–875, Nov. 1993.

[12] S. Falke, D. Kapur, and C. Sinz. Termination analysis of imperative programs using bitvector arithmetic. In Proceedings of the 4th International Conference on Verified Software: Theories, Tools, Experiments, Lecture Notes in Computer Science, pages 261–277. Springer-Verlag, 2012.

[13] C. Fuhs, J. Giesl, M. Plücker, P. Schneider-Kamp, and S. Falke. Proving termination of integer term rewriting. In Rewriting Techniques and Applications, volume 5595 of Lecture Notes in Computer Science, pages 32–47. Springer Berlin Heidelberg, 2009.

[14] J. Giesl, T. Ströder, P. Schneider-Kamp, F. Emmes, and C. Fuhs. Symbolic evaluation graphs and term rewriting: A general methodology for analyzing logic programs. In Proceedings of the 14th Symposium on Principles and Practice of Declarative Programming, pages 1–12. ACM, 2012.

[15] S. Gulwani, K. K. Mehra, and T. Chilimbi. SPEED: Precise and efficient static estimation of program computational complexity. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 127–139. ACM, 2009.

[16] S. Gulwani and F. Zuleger.
The reachability-bound problem. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 292–304. ACM, 2010.

[17] J. Gustafsson, A. Ermedahl, C. Sandberg, and B. Lisper. Automatic derivation of loop bounds and infeasible paths for WCET analysis using abstract execution. In Proceedings of the 27th IEEE International Real-Time Systems Symposium (RTSS '06), pages 57–66, 2006.

[18] J. Hoffmann, K. Aehlig, and M. Hofmann. Multivariate amortized resource analysis. ACM Transactions on Programming Languages and Systems, 34(3):14:1–14:62, 2012.

[19] J. C. King. Symbolic execution and program testing. Communications of the ACM, 19(7):385–394, 1976.

[20] J. Knoop, L. Kovács, and J. Zwirchmayr. Symbolic loop bound computation for WCET analysis. In Proceedings of the 8th International Conference on Perspectives of System Informatics, Lecture Notes in Computer Science, pages 227–242. Springer-Verlag, 2012.

[21] J. Navas, E. Mera, P. López-García, and M. Hermenegildo. User-definable resource bounds analysis for logic programs. In Logic Programming, volume 4670 of Lecture Notes in Computer Science, pages 348–363. Springer Berlin Heidelberg, 2007.

[22] A. Podelski and A. Rybalchenko. ARMC: The logical choice for software model checking with abstraction refinement. In Proceedings of the 9th International Conference on Practical Aspects of Declarative Languages, Lecture Notes in Computer Science, pages 245–259. Springer-Verlag, 2007.

[23] M. Sinn, F. Zuleger, and H. Veith. A simple and scalable static analysis for bound analysis and amortized complexity analysis. In Proceedings of the 16th International Conference on Computer Aided Verification, volume 8559 of Lecture Notes in Computer Science, pages 745–761. Springer-Verlag, 2014.

[24] M. Sinn, F. Zuleger, and H. Veith. Difference constraints: An adequate abstraction for complexity analysis of imperative programs.
In Proceedings of Formal Methods in Computer-Aided Design, pages 144–151, 2015.

[25] J. Strejček and M. Trtík. Abstracting path conditions. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 155–165. ACM, 2012.

[26] F. Zuleger, S. Gulwani, M. Sinn, and H. Veith. Bound analysis of imperative programs with the size-change abstraction. In Proceedings of the 18th International Conference on Static Analysis, pages 280–297. Springer-Verlag, 2011.

[27] Bugst. http://sourceforge.net/projects/bugst/.

[28] Looperman, benchmarks, and evaluation. https://sourceforge.net/projects/bugst/files/Looperman/1.0.0/.

[29] Evaluation results of KoAT, PUBS, and RANK. http://aprove.informatik.rwth-aachen.de/eval/IntegerComplexity.

[30] Loopus: Comparison to the tools KoAT, PUBS and Rank. http://forsyte.at/static/people/sinn/loopus/CAV14/index.html.

[31] Loopus. http://forsyte.at/software/loopus/, version from February 2, 2015.

[32] Z3. https://github.com/Z3Prover/z3.