From Big-Step to Small-Step Semantics and Back with Interpreter Specialisation
John P. Gallagher, Manuel Hermenegildo, Bishoksan Kafle, Maximiliano Klemen, Pedro López García, José Morales
L. Fribourg and M. Heizmann (Eds.): VPT/HCVS 2020, EPTCS 320, 2020, pp. 50–64, doi:10.4204/EPTCS.320.4. © Gallagher, Hermenegildo, Kafle, Klemen, López García and Morales. This work is licensed under the Creative Commons Attribution License.
John P. Gallagher *
Roskilde University, Denmark
IMDEA Software Institute, Spain
* Email: [email protected]
Manuel Hermenegildo
IMDEA Software Institute, Spain
Bishoksan Kafle
IMDEA Software Institute, Spain
Maximiliano Klemen
IMDEA Software Institute, Spain
Pedro López García
IMDEA Software Institute, Spain
José Morales
IMDEA Software Institute, Spain
Abstract.
We investigate representations of imperative programs as constrained Horn clauses. Starting from operational semantics transition rules, we proceed by writing interpreters as constrained Horn clause programs directly encoding the rules. We then specialise an interpreter with respect to a given source program to achieve a compilation of the source language to Horn clauses (an instance of the first Futamura projection). The process is described in detail for an interpreter for a subset of C, directly encoding the rules of big-step operational semantics for C. A similar translation based on small-step semantics could be carried out, but we show an approach to obtaining a small-step representation using a linear interpreter for big-step Horn clauses. This interpreter is again specialised to achieve the translation from big-step to small-step style. The linear small-step program can be transformed back to a big-step non-linear program using a third interpreter. A regular path expression is computed for the linear program using Tarjan's algorithm, and this regular expression then guides an interpreter to compute a program path. The transformation is realised by specialisation of the path interpreter. In all of the transformation phases, we use an established partial evaluator and exploit standard logic program transformation to remove redundant data structures and arguments in predicates, and rename predicates to make clear their link to statements in the original source program.
The operational semantics of a program defines the execution of the program as a run in a transition system. The rules of the transition system usually follow one of two styles, which are called natural semantics and structural operational semantics [32]. Both styles have roots in the early history of programming languages but were formally presented later; natural semantics (NS) was explicitly proposed by Kahn in the 1980s [22] and used to specify programming languages and type systems [6, 7]; the name was chosen to indicate an analogy with natural deduction. Structural operational semantics (SOS) was formulated by Plotkin in 1981 [35, 37], who later wrote an account of the origins of SOS [36].

In both NS and SOS approaches an imperative program P defines a relation σ ⟨P⟩ σ′, where σ, σ′ stand for the program state before and after execution of P respectively. This relation, closely related to a Hoare triple [17], can be formally specified by transition systems in different ways. In NS, transitions are of the form ⟨s, σ⟩ ⇒ σ′, where s is a program statement, σ, σ′ are states, and the transition means that s is completely executed in state σ, terminating in final state σ′. Thus NS is often called big-step semantics, since the transition for a statement goes from the initial to the final state. By contrast, in SOS, often called small-step semantics, a transition has the form ⟨s, σ⟩ ⇒ ⟨s′, σ′⟩, which defines a single step that moves from s in state σ to the next statement s′ and next state σ′. We also have a transition ⟨s, σ⟩ ⇒ σ′ for the case that s terminates in one step. A computation is defined as a chain of small steps. We will use the nicknames big-step and small-step for the rest of the paper, since they capture the essential difference between NS and SOS.

    ⟨s1, σ⟩ ⇒ σ′   ⟨s2, σ′⟩ ⇒ σ″        ⟨s, σ⟩ ⇒ σ′   ⟨while(b) s, σ′⟩ ⇒ σ″
    -----------------------------       ------------------------------------  if b is true in σ
         ⟨s1; s2, σ⟩ ⇒ σ″                        ⟨while(b) s, σ⟩ ⇒ σ″

                                        ⟨while(b) s, σ⟩ ⇒ σ   if b is false in σ

Figure 1: Examples of big-step rules for s1; s2 (left) and while(b) s (right).

      ⟨s1, σ⟩ ⇒ ⟨s1′, σ′⟩              ⟨while(b) s, σ⟩ ⇒ ⟨s; while(b) s, σ⟩  if b is true in σ
    ---------------------------
    ⟨s1; s2, σ⟩ ⇒ ⟨s1′; s2, σ′⟩

         ⟨s1, σ⟩ ⇒ σ′                  ⟨while(b) s, σ⟩ ⇒ σ   if b is false in σ
    ------------------------
    ⟨s1; s2, σ⟩ ⇒ ⟨s2, σ′⟩

Figure 2: Examples of small-step rules for s1; s2 (left) and while(b) s (right).

The rules for big-step transitions follow the syntactic structure of statements, and the transition for a statement is defined in terms of transitions of its immediate components. Consider for example the rules for transitions for the statements s1; s2 and while(b) s shown in Figure 1. (Somewhat more elaborate and realistic rules are used in Section 3.) A rule is read as stating that the transition below the line holds if the conditions above the line, along with any side-conditions, hold.

Transitions in small-step semantics define individual computation steps from one statement-state pair to the next. Consider again the statements s1; s2 and while(b) s, the small-step rules for which are shown in Figure 2.

A complete computation for the execution of a program P with initial state σ0 is constructed using big-step semantics by finding a derivation, using big-step rules, of the transition ⟨P, σ0⟩ ⇒ σn, where σn is the final state. On the other hand, in the small-step semantics, execution of P is modelled by constructing a run of the form (where s0 = P):

    ⟨s0, σ0⟩ ⇒ ⟨s1, σ1⟩ ⇒ ⟨s2, σ2⟩ ⇒ ··· ⇒ σn

where each step ⟨si, σi⟩ ⇒ ⟨si+1, σi+1⟩ and the final step ⟨sn−1, σn−1⟩ ⇒ σn is derivable from the small-step transition rules. This run is also denoted ⟨s0, σ0⟩ ⇒* σn.

The transition rules of operational semantics, both big- and small-step, such as those exemplified above, have the following form,

    α1  ...  αk
    -----------  if θ
        α0

where α0, α1, ..., αk are atomic statements about transitions, and θ is a side-condition, which is assumed to consist of simple guards or subsidiary operations evaluated as part of the rule. Such a rule is read as an implication ∀(θ ∧ α1 ∧ ... ∧ αk → α0), which is a Horn clause. The close correspondence between big-step rules and Horn clauses was noted by Kahn [22]. For instance, the first big-step rule for while(b) s shown in Figure 1 is written as the following Horn clause.

    ∀ b, s, σ, σ′, σ″. (b is true in σ ∧ ⟨s, σ⟩ ⇒ σ′ ∧ ⟨while(b) s, σ′⟩ ⇒ σ″ → ⟨while(b) s, σ⟩ ⇒ σ″)

Naturally, the correspondence between transition rules and Horn clauses assumes that the statements and states are represented as first-order terms.
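To make the big-step reading concrete, the Figure 1 rules can be encoded by structural recursion. The following is a minimal sketch in Python (the paper's own encoding is in Horn clauses; the tuple-based AST tags and dict-based states here are our illustrative assumptions, not the paper's representation):

```python
# Hypothetical big-step evaluator for the while-language of Figure 1.
# Statements and expressions are tagged tuples; a state sigma is a dict.

def eval_expr(e, sigma):
    tag = e[0]
    if tag == "cns": return e[1]
    if tag == "var": return sigma[e[1]]
    if tag == "add": return eval_expr(e[1], sigma) + eval_expr(e[2], sigma)
    if tag == "sub": return eval_expr(e[1], sigma) - eval_expr(e[2], sigma)
    if tag == "gt":  return eval_expr(e[1], sigma) > eval_expr(e[2], sigma)
    raise ValueError(tag)

def exec_stm(s, sigma):
    """Derive <s, sigma> => sigma' by recursion on the structure of s."""
    tag = s[0]
    if tag == "skip":
        return sigma
    if tag == "asg":                              # x := e
        out = dict(sigma); out[s[1]] = eval_expr(s[2], sigma); return out
    if tag == "seq":                              # premises of the s1;s2 rule
        return exec_stm(s[2], exec_stm(s[1], sigma))
    if tag == "while":
        if eval_expr(s[1], sigma):                # b true: body, then loop again
            return exec_stm(s, exec_stm(s[2], sigma))
        return sigma                              # b false: final state is sigma
    raise ValueError(tag)

# A nested-loop program (the body of the C program used later in the paper):
# a ends up as 2^n.
prog = ("seq", ("asg", "x", ("var", "n")),
        ("seq", ("asg", "a", ("cns", 1)),
         ("while", ("gt", ("var", "x"), ("cns", 0)),
          ("seq", ("asg", "x", ("sub", ("var", "x"), ("cns", 1))),
           ("seq", ("asg", "y", ("var", "a")),
            ("while", ("gt", ("var", "y"), ("cns", 0)),
             ("seq", ("asg", "y", ("sub", ("var", "y"), ("cns", 1))),
              ("asg", "a", ("add", ("var", "a"), ("cns", 1))))))))))

final = exec_stm(prog, {"n": 3})
# final["a"] == 8, final["x"] == 0
```

Each `if` branch of `exec_stm` corresponds to one transition rule, and the recursive calls correspond to the premises above the line.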
A front end parser generates a suitable term representation for the source program (an abstract syntax tree) while program states are represented (for example) by list or tree data structures in which the values of program variables are recorded. We will see a detailed example of program and state representation in Section 3. We will also write atomic statements using typical predicate-argument form, when writing rules as Horn clauses. For instance the rule above might be written as follows (omitting the universal quantifier, writing the implication θ ∧ α1 ∧ ... ∧ αk → α0 as α0 :- θ ∧ α1 ∧ ... ∧ αk, and using a comma in place of the conjunction symbol).

    transition(while(B,S), St, St2) :-
        eval(B, St, true),
        transition(S, St, St1),
        transition(while(B,S), St1, St2).

The correspondence between natural semantics and Horn clauses was identified and used in the Typol system [6] to implement executable versions of semantic specifications as logic programs. The set of Horn clauses derived from the transition rules, together with definitions for the subsidiary predicates, constitutes an executable interpreter in a logic programming system. In our setting, given a program p and an initial state s0 (in some suitable representation as variable-free terms), the query transition(p,s0,X) can be run to compute the state X resulting from executing p in state s0. Executability is not our main goal, however; the interpreter is a step in the procedure to transform C programs to Horn clauses, but it is useful to be able to test the interpreter by executing programs.

Using such an interpreter, for either big-step or small-step transitions, a translation from the imperative source code to Horn clauses is achieved using partial evaluation.
This is an instance of the first Futamura projection [8]; an interpreter specialised (by partial evaluation) with respect to a source program can be seen as a compilation of the source program into the language of the interpreter, which in our case is Horn clauses. In Section 3 we describe an interpreter for a subset of C, based directly on big-step semantics. An interpreter based on small-step semantics could be written in a similar way.

Consider a semantics for a subset of C, with the abstract syntax shown in Figure 3. We use a C parser to read a source program and produce an abstract syntax tree (AST) (see the example in Figure 4). A program consists of a list of function declarations and global variable declarations. There is exactly one function called main, a call to which is the entry to the program.

    program   → decl*
    decl      → vardecl | fdecl
    fdecl     → function(id, vardecl*, stm)
    var       → var(id)
    value     → null | n
    expr      → var(expr) | cns(num) | add(expr,expr) | call(id,expr*) | div(expr,expr)
              | logicaland(expr,expr) | mul(expr,expr) | sub(expr,expr) | not(expr)
              | expr < expr | expr ≤ expr | expr == expr | expr > expr | expr ≥ expr
    stm       → skip | ret(expr) | ret | asg(var(id),expr) | block(localdecl*,stm)
              | call(id,expr*) | seq(stm,stm) | while(expr,stm)
              | ifthenelse(expr,stm,stm) | let(var,expr,stm) | for(stm,expr,stm,stm)
    vardecl   → vardecl(var,type,expr)
    localdecl → decl(var,expr)
    type      → int
    num       → nat(n)

Figure 3: Abstract syntax of source programs.

The complete interpreter can be found in [9] (bigstep.pl). It can be run in Ciao Prolog [15] and in other Prolog systems with minor modifications. The most important predicates are the following.

    eval(Expr, St0, St1, V, Env)
    solve(S, St0, St1, Ret, Env)

In both predicates, the argument Env is the function environment, which is the list of global variable and function declarations. This remains constant during program execution. The arguments St0 and St1 represent memory states. They are lists of elements of the form (V,Val) where V is a program source variable name, and Val is its value in the state (an integer).

The predicate eval(Expr,St0,St1,V,Env) means that expression Expr is evaluated in state St0 and yields value V and updated state St1. (Evaluation of an expression can cause a state change, if a function is called within the expression.) The predicate solve(S,St0,St1,Ret,Env) means that the statement S is executed in initial state St0, and terminates with final state St1 and statement outcome Ret (this construct is inspired by the big-step semantics for the Clight subset of the C language by Blazy and Leroy [2], and can be used in future extensions of the interpreter for handling break, continue and return statements).

Utility predicates in the interpreter include those for retrieving and updating values in the state, for extending the state when local variables are introduced in blocks, and for matching function parameters.
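The state utilities can be pictured with a small Python sketch (the names `lookup`, `update` and `extend` are our own; the interpreter's actual predicates live in bigstep.pl):

```python
# Hypothetical counterparts of the interpreter's state utilities, with a
# state represented as a list of (variable, value) pairs, as in the paper.

def lookup(st, v):
    """Retrieve the value of v, innermost (most recently added) binding first."""
    for name, val in st:
        if name == v:
            return val
    raise KeyError(v)

def update(st, v, val):
    """Rebind the innermost occurrence of v, leaving shadowed ones intact."""
    out, done = [], False
    for name, old in st:
        if name == v and not done:
            out.append((name, val)); done = True
        else:
            out.append((name, old))
    return out

def extend(st, v, val):
    """Entering a block: a local declaration shadows any outer binding."""
    return [(v, val)] + st

st = extend([("x", 3), ("a", 1)], "y", None)   # block-local y, undefined (null)
st = update(st, "y", lookup(st, "a"))          # y = a
# lookup(st, "y") == 1
```

Keeping the state as an ordered pair list, rather than a dict, is what makes block-local shadowing and its removal on block exit straightforward.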
Given the representation of a program, namely a set of declarations of functions and global variables env, a partial evaluator is used to specialise the interpreter. The call to the interpreter is

    solve(call(main, args), st, St1, Ret, env)

where args are the arguments of the main function in env, and st is the initial state, constructed from the global variable declarations in env. We use the partial evaluator Logen [25, 26]. This is an offline partial evaluator, which means that the program to be specialised (the big-step interpreter) is first annotated to indicate which parts will be evaluated and which parts will be retained in the specialised program. For the interpreter, we annotate all calls to solve as being retained in the specialised program, and all other calls, except operations on undefined values from the state, are evaluated.

    int n;
    void f(int n) {
      int x, a;
      x = n; a = 1;
      while (x > 0) {
        x--;
        {
          int y;
          y = a;
          while (y > 0) { y--; a++; }
        }
      }
    }
    void main() { f(n); }

    [[vardecl(var(n),int,null)],
     function(f, [[vardecl(var(n),int,null)]],
       let(var(x), null,
       let(var(a), null,
       seq(asg(var(x),var(n)),
       seq(asg(var(a),cns(nat(1))),
       while(var(x)>cns(nat(0)),
         seq(asg(var(x),sub(var(x),cns(nat(1)))),
         let(var(y), null,
         seq(asg(var(y),var(a)),
         while(var(y)>cns(nat(0)),
           seq(asg(var(y),sub(var(y),cns(nat(1)))),
               asg(var(a),add(var(a),cns(nat(1))))))))))))))),
     function(main, [], call(f,[var(n)]))]

Figure 4: Source program (top) and its abstract syntax tree (bottom).

Logen also requires a filter to be defined for each retained predicate (in our case solve). The filter declares which parts of a call to the predicate are known and unknown during partial evaluation. This information is used both to rename specialised predicates, and to abstract away parts of calls declared to be unknown. The filter declaration for solve is as follows; it uses a binding type called store and the standard binding types static (known) and dynamic (unknown).
    :- type store ---> (type list(struct(',',[static,dynamic]))).
    :- filter solve(static, (type store), (type store), dynamic, static).
The binding type store declares lists of pairs of binding type (static,dynamic). This describes program states containing variable-value pairs in which the variable name is known but the value is unknown. The filter declaration for solve thus states that the statement and environment arguments are known, while the state arguments consist of lists describing the unknown values of a set of known variables. The statement outcome argument is unknown. More information on binding types and filters can be found in the references to Logen.
Logen requires the program to be annotated such that calls that arise during partial evaluation generate only a finite number of different values of the known parts of the filter declaration. This guarantees that only a finite number of different calls arise and partial evaluation terminates. For each call, a renamed version is generated, whose arguments consist only of the dynamic parts of the call. In the case of solve, the arguments of specialised versions are thus the values of variables in the state, and the statement outcome. For example, the initial call to solve for the program in Figure 4 is renamed as follows.

    solve(call(main,[]), [(n,B)], [(n,C)], A,
          [[vardecl(var(n),int,null)],
           function(f, [[vardecl(var(n),int,null)]],
             let(var(x),null,
             let(var(a),null,
             seq(asg(var(x),var(n)),
             seq(asg(var(a),cns(nat(1))),
             while(var(x)>cns(nat(0)),
               seq(asg(var(x),sub(var(x),cns(nat(1)))),
               let(var(y),null,
               seq(asg(var(y),var(a)),
               while(var(y)>cns(nat(0)),
                 seq(asg(var(y),sub(var(y),cns(nat(1)))),
                     asg(var(a),add(var(a),cns(nat(1))))))))))))))),
           function(main,[],call(f,[var(n)]))])
      =⇒ solve__1(C,B,A).

The renamed predicate solve__1(C,B,A) has three arguments corresponding to the dynamic values in the call.
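The effect of the filter on renaming can be sketched as follows. This is a simplification we assume for illustration; Logen's real renaming machinery is more involved, and the argument order of the residual predicate is arbitrary here:

```python
# Hypothetical projection of a call onto its dynamic parts, as done when
# generating a renamed residual predicate such as solve__1.

def residual_args(call_args, filter_spec):
    out = []
    for arg, f in zip(call_args, filter_spec):
        if f == "dynamic":
            out.append(arg)                    # unknown: kept as an argument
        elif f == "store":                     # (static name, dynamic value) pairs
            out.extend(val for _, val in arg)  # names are consumed, values kept
        # f == "static": fully known, consumed at specialisation time
    return out

call = ["call(main,[])", [("n", "B")], [("n", "C")], "A", "env"]
spec = ["static", "store", "store", "dynamic", "static"]
# residual_args(call, spec) == ["B", "C", "A"]
```

The static parts (statement, environment, variable names) disappear into the residual predicate's name; only the unknown values survive as arguments.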
The specialised interpreter produced by Logen contains predicates of the form solve__n(...). In order to generate more informative predicate names, a post-processing step is performed on the output of Logen. Using the renaming table (which can optionally be output by Logen), it is possible to retrieve the first argument (that is, the statement) of the call for which solve__n(...) is a renamed version. We then use the statement type to construct a new name, reusing the index n so as to ensure a unique predicate name. In this way, the predicate solve__1 shown above is renamed to main__1, and a call of the form solve__n(...) where solve__n is a renaming of solve(while(...), ...) is renamed as while__n, and so on.

A further post-processing step is performed, unfolding calls to basic statements such as assignments, statement sequences (semicolon) and local variable declarations.

The predicates generated by Logen with the filter described above represent big steps and have the form solve__n(x1, ..., xn, x1', ..., xn', r), where x1, ..., xn are the values of the variables in the input state, x1', ..., xn' are the values of the variables in the output state, and r is the statement outcome value. However, a statement typically only affects some variables of the state. The values of the unaffected variables are just "passed through" from initial to final state. That is, xi = xi' holds for the variables that are unaffected. For instance, for the statement if (x < y) x=x+1; else y=y-1; where x, y, z and w are the variables in scope, the big-step predicate would be solve__n(X,Y,Z,W,X',Y',Z',W',R) with definition:

    solve__n(X,Y,Z,W,X',Y',Z',W',R) :- X < Y, X' = X+1, Y'=Y, Z'=Z, W'=W.
    solve__n(X,Y,Z,W,X',Y',Z',W',R) :- X >= Y, X'=X, Y' = Y-1, Z'=Z, W'=W.
Thus only the values of x and y are affected by the statement, while the values of z and w pass through the transition unchanged. Furthermore, the value of R is not assigned.

We apply known logic program analysis and transformation techniques to eliminate some variables from predicates. Firstly we use an abstract interpretation to produce a strengthened or more specific program [30, 19]. In the example above, we can prove that the constraints Z'=Z, W'=W can safely be conjoined to all calls to solve__n and the arguments Z', W' removed from the call, resulting in the equivalent call solve__n(X,Y,Z,W,X',Y',R), Z'=Z, W'=W.

Having strengthened the clauses, we apply the redundant argument filtering algorithms [27] to remove arguments from predicates. This technique was previously used to improve the result of interpreter specialisation [14], where the relationship between argument filtering and liveness analysis was shown.
The removal of the redundant variables from the transitions simplifies the clauses, makes them morereadable and can reduce the complexity of analyses, since the complexity of some constraint operationsis affected by the number of variables in the constraints.
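The pass-through detection underlying this strengthening step can be sketched as follows. This is a toy symbolic check we assume for illustration, not the abstract interpretation of [30, 19] or the redundant argument filtering of [27]:

```python
# For each (input, output) argument pair, the output is redundant if every
# clause of the predicate merely copies the input into it.

def pass_through(clauses, pairs):
    return [o for i, o in pairs if all(c[o] == i for c in clauses)]

# The two clauses of solve__n, with each output written as an expression:
clauses = [
    {"X'": "X+1", "Y'": "Y",   "Z'": "Z", "W'": "W"},   # branch X < Y
    {"X'": "X",   "Y'": "Y-1", "Z'": "Z", "W'": "W"},   # branch X >= Y
]
pairs = [("X", "X'"), ("Y", "Y'"), ("Z", "Z'"), ("W", "W'")]
# pass_through(clauses, pairs) == ["Z'", "W'"], so Z' and W' can be dropped
```

A position survives only if some clause actually changes it, which is exactly the property the strengthening exploits.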
Example 1
Let the source program be the program in Figure 4. Using a big-step semantics interpreter, we obtain the following clauses by partial evaluation, followed by predicate renaming and redundant argument removal as described above.

    main__1(A) :-
        f__2(A).
    f__2(A) :-
        C is A, D is C, E is 1,
        while__9(D,E,F,G).
    while__9(A,B,D,E) :-
        A>0, G is A-1, H is B,
        while__18(B,H,I,J),
        while__9(G,I,D,E).
    while__9(A,B,A,B) :-
        A=<0.
    while__18(A,B,E,F) :-
        B>0, H is B-1, I is A+1,
        while__18(I,H,E,F).
    while__18(A,B,A,B) :-
        B=<0.
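Since these clauses are deterministic, they can be read functionally. The following Python transcription (our own, purely to check the translation) mirrors them and confirms that f computes a = 2^n:

```python
# Functional reading of the Example 1 clauses (hypothetical transcription).

def while__18(a, y):          # inner loop: transfers y into a
    while y > 0:
        y, a = y - 1, a + 1
    return a, y

def while__9(x, a):           # outer loop: doubles a, x times
    while x > 0:
        x1 = x - 1
        a, _ = while__18(a, a)   # the clause calls while__18(B,H,...) with H = B
        x = x1
    return x, a

def f__2(n):
    return while__9(n, 1)

# f__2(3) == (0, 8): after three outer iterations, a = 2**3
```

The nesting of the Python loops matches the nesting of the predicates, which is the point made below about the preserved statement structure.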
A feature of clauses generated from the big-step interpreter is that the source program's statement nesting is preserved. Thus the inner while loop corresponds to while__18, which is called from within the outer while loop represented by while__9. Also, note that although variables x, y, a and n are all in scope when the inner loop of the program is reached, the predicate while__18 has only 4 arguments, a reduction from the 9 arguments that are generated from the interpreter (4 variables each for input and output states, plus the statement outcome).

In summary, specialisation of the big-step interpreter yields a translation from the source program to Horn clauses where there is a clear relationship between the predicates and the source code, and where the predicate arguments contain only the values affected by the statement represented by the predicate, rather than the whole state as is the case with some other translations. The whole process, including the parsing of the input C program, has been automated.

Correctness of the translation follows from the correctness of both the interpreter and the specialiser. The interpreter can be obtained directly from a formal definition of the imperative language semantics as described in Section 2, thus giving confidence in its correctness. In our experiments we use an established tool for specialisation of Horn clauses, namely the Logen partial evaluator, whose correctness follows from established theory of fold-unfold transformations applied to Horn clauses [34]. In addition we have applied semantics-preserving logic program transformations whose correctness has been proved.
A similar procedure could be followed to translate imperative programs into constrained Horn clauses by specialising an interpreter for small-step semantics. Such an interpreter and translation was described in [33] and [14].

    solve([A|As]) :-
        clpClause(A,B),
        solveConstraints(B,B1),
        append(B1,As,As1),
        solve(As1).
    solve([A|As]) :-
        constraint(A),
        call(A),
        solve(As).
    solve([]).

Figure 5: Clauses from a linear interpreter.

Small-step semantics is associated with linearity: when we specialise an interpreter for the rules of small-step semantics, we expect to get linear clauses, which are Horn clauses containing at most one non-constraint body atom. The clauses would have the form p(x̄) :- θ(x̄,ȳ), q(ȳ) representing one computation step, where x̄ and ȳ are the values of state variables before and after the step, and θ(x̄,ȳ) is a constraint relating the values. The final transition is a clause p(x̄) :- θ(x̄).

The clauses obtained from specialising the big-step interpreter are not in general linear, a fact illustrated by the clauses in Example 1. In this section, we show how to linearise big-step clauses, obtaining clauses that correspond directly to a small-step semantics for the program. This is an alternative to writing small-step semantics for the source language. In other words, we write just one semantics, namely big-step semantics, and then automatically obtain translations into Horn clauses corresponding to big- and small-step semantics.

Linear resolution with a fixed selection rule is known to be a complete proof method for Horn clauses [28]. This can be thought of as providing a small-step semantics for Horn clauses. We proceed as follows: we write a linear resolution interpreter for Horn clauses and then specialise it using Logen, with similar post-processing operations as described in Section 3. Figure 5 shows the main clauses from a linear interpreter.
We assume that the clauses being interpreted are included as facts in the interpreter of the form clpClause(A,B), where A is the clause head and B is the body. The predicate solve([A|As]) has as argument a list of atoms representing a conjunction to be proved. One step in the interpreter consists of picking the leftmost atom A, and either solving it if A is a constraint, or else resolving A with the head of a clause, appending the body of that clause to the remaining atoms As. The computation is completed when the argument of solve is empty.

The full linear interpreter can be found in [9] (linearSolve.pl). Compared with the clauses in Figure 5, it contains an additional mechanism to encode the argument of solve in a way that records repeated variables. Much of the interpreter is concerned with this encoding and decoding. This is only for the purpose of improving the output of Logen, and is in fact rendered unnecessary by subsequent processing to eliminate redundant arguments, as described in Section 3.

We proceed to translate an arbitrary set of clauses P into linear form by specialising the linear interpreter with respect to P. The translation to linear form has previously been exploited in logic programming [5]. In specialising the interpreter, a key decision is how to define the filter for solve. If we can determine in advance that the length of conjunctions is bounded (and thus the length of the argument to solve is bounded), then the filter is defined so that only the arguments of atoms in the conjunction are dynamic. If the size of conjunctions is bounded, this is sufficient to ensure that only a finite number of distinct calls to solve arise, and partial evaluation terminates. On the other hand, if the size of the conjunction is unbounded, the filter has to be defined so that the whole conjunction is dynamic, and much specialisation will thereby be lost.
We return to this point below. The question of whether the size of the conjunction is bounded is related to the tree dimension of the clauses being interpreted by the linear interpreter [20]. In the case of the linear interpreter, where the leftmost atom is selected at each step, the problem amounts to determining whether there are non-tail-recursive predicates in the clauses. Assuming that the conjunction is bounded, specialisation terminates and returns a set of linear clauses, whose arguments correspond to arguments of the original source predicates.
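A crude sufficient test for this boundedness (our own simplification for illustration; the tree-dimension analysis of [20] is the precise tool) just checks that in every clause only the last body atom is a non-constraint call:

```python
# Under leftmost selection, the goal list cannot grow without bound if each
# clause body has at most one call, in the final position (tail recursion).

def conjunction_bounded(clause_bodies):
    return all(all(kind == "constraint" for kind, _ in body[:-1])
               for body in clause_bodies)

fib = [  # binary recursion: a call occurs before the last body atom
    [("constraint", "N>1"), ("call", "fib"), ("call", "fib"),
     ("constraint", "M is M1+M2")],
]
ex2 = [  # the recursive linear clauses of Example 2 are tail recursive
    [("constraint", "A>0"), ("constraint", "E is A-1"), ("call", "while__18__5")],
    [("constraint", "B=<0"), ("call", "while__9__4")],
]
# conjunction_bounded(fib) is False; conjunction_bounded(ex2) is True
```

When the check succeeds, the conjunction never exceeds length one (plus pending constraints), so the filter can safely expose the atom's arguments.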
Example 2
Consider the clauses produced by specialising the big-step interpreter in Example 1. After specialisation of the linear interpreter with respect to those clauses, followed by elimination of redundant arguments as previously described, we obtain the following linear clauses representing the function f.

    f__2__3(A) :-
        B is A, C is B, D is 1,
        while__9__4(C,D).
    while__9__4(A,B) :-
        A>0, E is A-1, F is B,
        while__18__5(B,F,E).
    while__9__4(A,B) :-
        A=<0.
    while__18__5(A,B,C) :-
        B>0, H is B-1, I is A+1,
        while__18__5(I,H,C).
    while__18__5(A,B,C) :-
        B=<0,
        while__9__4(C,A).

Note that the predicates representing the nested loops, namely while__9__4 and while__18__5, are mutually recursive instead of being nested as in the original clauses. Furthermore, the predicates do not directly produce output; the final state of the computation is represented in the variables at the point where while__9__4 is called and terminates.
There are two approaches to handling the application of the linear interpreter to a set of clauses in which the size of the conjunction has no upper bound. One is to represent the conjunction explicitly. This would result in a specialised version of the linear interpreter in which the list argument of solve remains. For example, consider the clauses for the Fibonacci function, which is binary recursive. Specialising the linear interpreter results in linear clauses as follows.

    solve([fib(0,0)|As]) :- solve(As).
    solve([fib(1,1)|As]) :- solve(As).
    solve([fib(N,M)|As]) :-
        N>1, N1 is N-1, N2 is N-2,
        solve([fib(N1,M1),fib(N2,M2),M is M1+M2|As]).
    solve([M is M1+M2|As]) :-
        M is M1+M2,
        solve(As).
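The explicit-conjunction computation can be mimicked in Python. This is a sketch under our own representation (logic variables are one-element lists, constraints are thunks), not the Logen output itself:

```python
# Linear resolution with an explicit goal list, for the Fibonacci clauses.

def solve(goals):
    while goals:
        g = goals.pop(0)                      # leftmost selection rule
        if callable(g):                       # a constraint such as M is M1+M2
            g()
            continue
        _, n, m = g                           # an atom fib(N, M)
        if n <= 1:
            m[0] = n                          # clauses fib(0,0) and fib(1,1)
        else:                                 # push the recursive clause body
            m1, m2 = [None], [None]
            goals[:0] = [("fib", n - 1, m1), ("fib", n - 2, m2),
                         lambda m=m, a=m1, b=m2: m.__setitem__(0, a[0] + b[0])]

out = [None]
solve([("fib", 10, out)])
# out[0] == 55
```

The goal list here plays exactly the role of the list argument of solve in the clauses above, and it grows without bound as n grows, which is the difficulty noted next.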
Although formally a linear set of clauses, the list structure presents difficulties for the purposes of verification and analysis.

The other approach is to mix big-step and small-step semantics. A recursive function call is handled by a big step, often called a procedure summary in the literature, defined by a predicate that represents the complete execution of the call. De Angelis et al. [4] also derived a form that they call multi-step semantics by interpreter specialisation, allowing for recursive function calls, that is essentially the same.

Our proposal for realising this approach is simply to extend the linear interpreter with an extra non-linear clause for solve, handling recursive functions. It is assumed that the first clause in Figure 5 has an added condition so that it is applied only to predicates that do not represent recursive function calls.

    solve([A|As]) :-
        recursiveCall(A),
        solve([A]),
        solve(As).
This is no longer a linear interpreter, since there are two calls to solve in the body of the clause. However, specialising an interpreter including this clause ensures that the conjunction in calls to solve is bounded, and the code apart from the recursive calls is linearised. If the original C program input to the big-step interpreter was a single procedure, then specialising the resulting big-step clauses results in linear clauses, as the clause above is never used. Tail-recursive functions will also result in linear clauses, since the above clause will only be called when the tail of the conjunction As is empty and solve(As) is evaluated to true.

In this section we take a set of linear clauses and transform them to non-linear clauses that reflect the nested call structure of the clauses. Consider the result of linearising our running example, shown in Example 2. We noted that the predicates while__9__4 and while__18__5 are mutually recursive. However, when analysing this program, for example to perform resource or termination analysis, it might be an advantage to identify that while__18__5 is the inner loop, which can be analysed on its own, and then its solution applied to the analysis of the outer loop while__9__4. Though in our running example we started with a big-step representation, we might have obtained the linear clauses from some less structured source code such as machine language or control flow graphs, and in such cases it is often useful to be able to reconstruct big-step clauses.
A set of linear clauses induces a directed graph in which the nodes are predicate names and there is an edge from p to q if there is a clause with head predicate p and a call to q in the body. The edge is labelled by an identifier for the clause. The call graph for the clauses in Example 2 is shown in Figure 6.

For any graph with a designated entry node and exit node, there is a well known algorithm by Tarjan [38] that computes a regular path expression describing exactly the paths from the entry to the exit. The alphabet of the regular expression is the set of clause identifiers. There are usually many equivalent path expressions that describe this set of paths; non-deterministic choices within the algorithm determine which expression is produced. For the graph in Figure 6, with entry node f__2__3 and exit true, one regular path expression is c1 (c2 c4* c5)* c3, where c1, ..., c5 identify the clauses in the order they appear in Example 2. The expression shows that the inner loop (while__18__5) is nested inside the outer loop, since the structure of the regular expression is nested.
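For a graph this small, a path expression can also be computed by textbook state elimination. The sketch below is ours; the paper uses Tarjan's more efficient algorithm [38], which produces an equivalent expression:

```python
# State elimination on the call graph of Example 2: nodes f, w9, w18, true;
# edges labelled by clause identifiers c1..c5.

def join(*parts):                    # concatenation, dropping empty pieces
    return " ".join(p for p in parts if p)

def eliminate(edges, order):
    """edges: {(u, v): regex}; removes the nodes in `order`, rewiring paths."""
    edges = dict(edges)
    for q in order:
        loop = edges.pop((q, q), "")
        starred = f"({loop})*" if loop else ""
        ins  = {p: e for (p, t), e in list(edges.items()) if t == q}
        outs = {r: e for (s, r), e in list(edges.items()) if s == q}
        for p in ins:  del edges[(p, q)]
        for r in outs: del edges[(q, r)]
        for p, e1 in ins.items():
            for r, e2 in outs.items():
                path = join(e1, starred, e2)
                edges[(p, r)] = edges[(p, r)] + " + " + path if (p, r) in edges else path
    return edges

g = {("f", "w9"): "c1", ("w9", "w18"): "c2", ("w9", "true"): "c3",
     ("w18", "w18"): "c4", ("w18", "w9"): "c5"}
# eliminate(g, ["w18", "w9"]) == {("f", "true"): "c1 (c2 (c4)* c5)* c3"}
```

Eliminating the inner-loop node first is what produces the nested star structure; a different elimination order would yield a different but equivalent expression.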
Figure 6: Predicate call-graph for the clauses in Example 2.
Figure 7 shows the main clauses of an interpreter for a set of linear clauses that follows a computation path given by a regular expression. The predicate pathsolve(A,Z,Expr,F) finds a path from atom A to atom Z that is recognised by the path expression Expr. The final argument F is a set of symbols that can follow a path described by Expr, and is present only to aid the partial evaluator. It can immediately be seen that the clauses for the cases of concatenation E1:E2 and repetition star(E) are non-linear clauses. The complete interpreter, which includes the option of calling Tarjan's algorithm, is available from [9] (solveReg.pl).

Specialisation of the path interpreter with respect to a set of linear clauses and a path expression for paths from entry to true, computed by Tarjan's algorithm, results in a "path program". It has the characteristics of a big-step program in that each predicate has an input and output state (the start and end points of the subpath that is represented by that predicate). It is in general non-linear, since any star expressions or path concatenations will result in non-linear clauses.
Example 3
Consider the linear clauses resulting from the specialisation of the linear interpreter in Example 2. Computing a path expression c1(c3c5∗c4)∗c2 using Tarjan's algorithm (where c1, . . . , c5 identify the clauses in the order they appear), and then specialising the path interpreter with respect to the linear clauses and the expression yields the following clauses.

go__0(A) :-
    D=1,
    while__9__4__4(A,D,B,C),
    B=<0.
while__9__4__4(A,B,A,B) :-
    true.
while__9__4__4(A,B,C,D) :-
    A>0, A1 is A-1,
    while__18__5__9(B,B,A1,E,F,G),
    F=<0,
    while__9__4__4(G,E,C,D).
while__18__5__9(A,B,C,A,B,C) :-
    true.
while__18__5__9(A,B,C,D,E,F) :-
    B>0, A1 is A+1, B1 is B-1,
    while__18__5__9(A1,B1,C,D,E,F).

The structure of the clauses resembles that of the big-step clauses in Example 1. The inner loop, now represented by while__18__5__9, is nested in the outer loop while__9__4__4. However, the path program has some minor differences from the original big-step clauses; the base case of a loop arising from a "star" expression is the empty path, while any base-case conditions in the original clauses now appear conjoined immediately after the star expression predicate. For example, the condition F=<0 appears after the loop predicate while__18__5__9(B,B,A1,E,F,G) instead of in the base case of that predicate.
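To see what the path program computes, its clauses can be transliterated into Python, reading each predicate as a function from its input state to its output state (an illustrative rendering, not part of the transformation pipeline; the assertion marks where the condition F=<0 sits after the inner-loop call rather than in its base case).

```python
def while_18_5_9(a, b, c):
    # inner loop: transfers b into a, incrementing a while decrementing b
    while b > 0:
        a, b = a + 1, b - 1
    return a, b, c

def while_9_4_4(a, b):
    # outer loop: each iteration decrements a and runs the inner loop on (b, b)
    while a > 0:
        e, f, g = while_18_5_9(b, b, a - 1)
        assert f <= 0     # base-case condition, conjoined after the loop call
        a, b = g, e       # continue with the inner loop's outputs
    return a, b

def go_0(a):
    b, _c = while_9_4_4(a, 1)
    return b <= 0         # the test B=<0 from the go__0 clause

print(while_9_4_4(3, 1))  # → (0, 8): b doubles on each outer iteration
```

The nested function-call structure matches the nesting of while__18__5__9 inside while__9__4__4, and running the sketch shows the second state component doubling on each outer iteration.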
Semantics-based translation of imperative programs into Horn clauses is a topic of growing importance, as Horn clause solvers become more powerful and effective [3, 13, 23, 21, 18], providing a general framework for imperative program verification [33, 31, 12, 1] and enabling the application of the large body of techniques and tools for semantics-based analysis and verification developed in the (constraint) logic programming field (e.g. [16]). These include abstract interpreters for a variety of domains including non-functional program properties such as complexity and resource usage, e.g. [29].

Big-step based interpretation using Horn clause interpreters was proposed by Kahn and others [22, 6, 7] but not as a basis for translation. Peralta et al. [33] proposed an approach to translation based on the first Futamura projection [8], specialising a Horn clause interpreter for imperative programs using small-step semantics. This handled only a small imperative language without procedure calls. De Angelis et al. [4] further developed this approach, handling also procedure calls using a "multi-step" semantics, combining aspects of small-step and big-step semantics as discussed in Section 4.

% Regular expressions
% E ::= symb(Id) | E1:E2 | E1+E2 | star(E) | null | eps
pathsolve(A,Z,symb(C),_) :-
    clpClause(C,A,Cs,[Z]),
    solveConstraints(Cs).
pathsolve(A,Z,E1:E2,F) :-
    first(E2,F1),
    member(P/N,F1),
    functor(X,P,N),
    pathsolve(A,X,E1,F1),
    pathsolve(X,Z,E2,F).
pathsolve(A,Z,E1+_,F) :-
    pathsolve(A,Z,E1,F).
pathsolve(A,Z,_+E2,F) :-
    pathsolve(A,Z,E2,F).
pathsolve(A,A,star(_),_).
pathsolve(A,Z,star(E),F) :-
    first(E,FE),
    setunion(FE,F,F1),
    member(P/N,F1),
    functor(X,P,N),
    pathsolve(A,X,E,F1),
    pathsolve(X,Z,star(E),F).
Figure 7: Main clauses for a path expression interpreter.
Big-step and small-step semantics each have their strong and weak points from the point of view of program analysis. Small-step programs are essentially transition systems and are amenable to well-established model-checking techniques. Big-step programs are more structured, allowing compositional analysis, but at the cost of having predicates with a greater number of arguments, which can be expensive for analysis algorithms.

Analysis of linear programs, with various kinds of graph path analysis, has been the subject of some previous work. Kincaid et al. [24] have explored the use of Tarjan's regular path expressions to perform compositional program analysis starting from a linear program representation such as a control flow graph. However, they do not perform a program transformation based on the path. Wei et al. [39] present an algorithm for discovering the nesting structure of loops in control flow graphs, for the purpose of decompilation.

The interpreter-based approach described in this paper has both practical and conceptual advantages. Being able to transform between big-step and small-step styles, or mix them, allows use of a single verification framework but with the flexibility of different program representations. The conceptual advantages relate to the understanding gained of the connections between different styles of semantics, and how they can be derived from each other.
Acknowledgements
Discussions on semantics and Horn clauses with Alberto Pettorossi, Maurizio Proietti, Fabio Fioravanti and Emanuele De Angelis are gratefully acknowledged.
References

[1] N. Bjørner, A. Gurfinkel, K. L. McMillan & A. Rybalchenko (2015): Horn Clause Solvers for Program Verification. In L. D. Beklemishev, A. Blass, N. Dershowitz, B. Finkbeiner & W. Schulte, editors: Fields of Logic and Computation II, LNCS.
[2] Mechanized Semantics for the Clight Subset of the C Language. J. Autom. Reasoning.
[3] Program verification via iterated specialization. Sci. Comput. Program. 95, pp. 149–175, doi:10.1016/j.scico.2014.05.017.
[4] E. De Angelis, F. Fioravanti, A. Pettorossi & M. Proietti (2015): Semantics-based generation of verification conditions by program specialization. In M. Falaschi & E. Albert, editors: Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming, Siena, Italy, July 14-16, 2015, ACM, pp. 91–102, doi:10.1145/2790449.2790529.
[5] B. Demoen (1992): On the Transformation of a Prolog Program to a More Efficient Binary Program. In K. Lau & T. Clement, editors: Logic Program Synthesis and Transformation, Proceedings of LOPSTR 92, International Workshop on Logic Program Synthesis and Transformation, University of Manchester, UK, 2-3 July 1992, Workshops in Computing, Springer, pp. 242–252, doi:10.1007/978-1-4471-3560-9_17.
[6] T. Despeyroux (1984): Executable Specification of Static Semantics. In G. Kahn, D. B. MacQueen & G. Plotkin, editors: Semantics of Data Types, LNCS.
[7] Programming Environments Based on Structured Editors: The MENTOR experience. In D. Barstow, E. Sandewall & H. Shrobe, editors: Interactive Programming Environments, ISBN 978-0070038851, McGraw-Hill, pp. 128–140.
[8] Y. Futamura (1971): Partial Evaluation of Computation Process - An Approach to a Compiler-Compiler. Systems, Computers, Controls.
[9] J. P. Gallagher, M. Hermenegildo, B. Kafle, M. Klemen, P. L. García & J. Morales (2020): Interpreters Repository. http://webhotel4.ruc.dk/~jpg/Software/Semantics.
[10] G. Gange, J. A. Navas, P. Schachte, H. Søndergaard & P. J. Stuckey (2015): Horn clauses as an intermediate representation for program analysis and transformation. Theory Pract. Log. Program.
[11] Decompilation of Java bytecode to Prolog by partial evaluation. Inf. Softw. Technol.
[12] Synthesizing software verifiers from proof rules. In J. Vitek, H. Lin & F. Tip, editors: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, ACM, pp. 405–416, doi:10.1145/2254064.2254112.
[13] A. Gurfinkel, T. Kahsai, A. Komuravelli & J. A. Navas (2015): The SeaHorn Verification Framework. In D. Kroening & C. S. Pasareanu, editors: Computer Aided Verification - 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I, Lecture Notes in Computer Science.
[14] Abstract Interpretation of PIC Programs through Logic Programming. In: Sixth IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2006), 27-29 September 2006, Philadelphia, Pennsylvania, USA, IEEE Computer Society, pp. 184–196, doi:10.1109/SCAM.2006.1.
[15] M. V. Hermenegildo, F. Bueno, M. Carro, P. López-García, R. Haemmerlé, E. Mera, J. F. Morales & G. Puebla (2011): An Overview of the Ciao System. In N. Bassiliades, G. Governatori & A. Paschke, editors: Rule-Based Reasoning, Programming, and Applications - 5th International Symposium, RuleML 2011 - Europe, Barcelona, Spain, July 19-21, 2011. Proceedings, Lecture Notes in Computer Science.
[16] Integrated Program Debugging, Verification, and Optimization Using Abstract Interpretation (and The Ciao System Preprocessor). Science of Computer Programming.
[17] An Axiomatic Basis for Computer Programming. Commun. ACM.
[18] The ELDARICA Horn Solver. In N. Bjørner & A. Gurfinkel, editors: FMCAD 2018, IEEE, pp. 1–7, doi:10.23919/FMCAD.2018.8603013.
[19] B. Kafle & J. P. Gallagher (2017): Constraint specialisation in Horn clause verification. Sci. Comput. Program.
[20] Tree dimension in verification of constrained Horn clauses. TPLP.
[21] RAHFT: A Tool for Verifying Horn Clauses Using Abstract Interpretation and Finite Tree Automata. In: Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I, pp. 261–268, doi:10.1007/978-3-319-41528-4_14.
[22] G. Kahn (1987): Natural Semantics. In F. Brandenburg, G. Vidal-Naquet & M. Wirsing, editors: STACS 87, 4th Annual Symposium on Theoretical Aspects of Computer Science, Passau, Germany, February 19-21, 1987, Proceedings, Lecture Notes in Computer Science.
[23] JayHorn: A Framework for Verifying Java programs. In S. Chaudhuri & A. Farzan, editors: Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I, Lecture Notes in Computer Science.
[24] Compositional recurrence analysis revisited. In A. Cohen & M. T. Vechev, editors: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, ACM, pp. 248–262, doi:10.1145/3062341.3062373.
[25] M. Leuschel & J. Jørgensen (1999): Efficient Specialisation in Prolog Using the Hand-Written Compiler Generator LOGEN. Elec. Notes Theor. Comp. Sci.
[26] The Ecce and Logen partial evaluators and their web interfaces. In J. Hatcliff & F. Tip, editors: PEPM, ACM, pp. 88–94, doi:10.1145/1111542.1111557.
[27] M. Leuschel & M. H. Sørensen (1996): Redundant Argument Filtering of Logic Programs. In J. P. Gallagher, editor: Logic Programming Synthesis and Transformation, 6th International Workshop, LOPSTR'96, Stockholm, Sweden, August 28-30, 1996, Proceedings, pp. 83–103, doi:10.1007/3-540-62718-9_6.
[28] J. Lloyd (1987): Foundations of Logic Programming: 2nd Edition. Springer-Verlag, doi:10.1007/978-3-642-83189-8.
[29] P. López-García, M. Klemen, U. Liqat & M. V. Hermenegildo (2016): A general framework for static profiling of parametric resource usage. Theory Pract. Log. Program.
[30] Most Specific Logic Programs. Ann. Math. Artif. Intell. 1, pp. 303–338, doi:10.1007/BF01531082.
[31] M. Méndez-Lojo, J. A. Navas & M. V. Hermenegildo (2007): A Flexible, (C)LP-Based Approach to the Analysis of Object-Oriented Programs. In A. King, editor: Logic-Based Program Synthesis and Transformation, 17th International Symposium, LOPSTR 2007, Kongens Lyngby, Denmark, August 23-24, 2007, Revised Selected Papers, Lecture Notes in Computer Science.
[32] Semantics with applications - a formal introduction. Wiley professional computing, Wiley.
[33] J. Peralta, J. P. Gallagher & H. Sağlam (1998): Analysis of Imperative Programs through Analysis of Constraint Logic Programs. In G. Levi, editor: Static Analysis. 5th International Symposium, SAS'98, Pisa, Springer-Verlag Lecture Notes in Computer Science.
[34] Synthesis and Transformation of Logic Programs Using Unfold/Fold Proofs. J. Log. Program.
[35] A Structural Approach to Operational Semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University.
[36] G. D. Plotkin (2004): The origins of structural operational semantics. J. Log. Algebr. Program.
[37] A structural approach to operational semantics. J. Log. Algebr. Program.
[38] Fast Algorithms for Solving Path Problems. J. ACM.
[39] A New Algorithm for Identifying Loops in Decompilation. In H. R. Nielson & G. Filé, editors: