A Comparison of Big-step Semantics Definition Styles
Péter Bereczky, Dániel Horpácsi, Simon Thompson
ELTE Eötvös Loránd University, Department of Programming Languages and Compilers
University of Kent, School of Computing

Abstract
Formal semantics provides rigorous, mathematically precise definitions of programming languages, with which we can argue about program behaviour and program equivalence by formal means; in particular, we can describe and verify our arguments with a proof assistant. There are various approaches to giving formal semantics to programming languages, at different abstraction levels and applying different mathematical machinery: the reason for using the semantics determines which approach to choose.

In this paper we investigate some of the approaches that share their roots with traditional relational big-step semantics, such as (a) functional big-step semantics (or, equivalently, a definitional interpreter), (b) pretty-big-step semantics and (c) traditional natural semantics. We compare these approaches with respect to the following criteria: executability of the semantics definition, proof complexity for typical properties (e.g. determinism) and the conciseness of expression equivalence proofs in that approach. We also briefly discuss the complexity of these definitions and the coinductive big-step semantics, which enables reasoning about divergence.

To enable the comparison in practice, we present an example language for comparing the semantics: a sequential subset of Core Erlang, a functional programming language which is used in the intermediate steps of the Erlang/OTP compiler. We have already defined a relational big-step semantics for this language that includes treatment of exceptions and side effects. The aim of this current work is to compare our big-step definition for this language with a variety of other equivalent semantics in different styles from the point of view of testing and verifying code refactorings.
This work is part of a wider project that aims to reason about the correctness of code refactoring. To this end, a rigorous, formal definition is needed for the programming language under refactoring: in our case, Erlang. In earlier work, we developed a relational big-step semantics for sequential Core Erlang, including exceptions and side effects. This semantics is used in general proofs of characteristic properties (e.g. determinism) as well as in proofs of equivalence between pairs of pattern expressions. The latter are important from the refactoring point of view: pattern equivalences can be interpreted as simple, correct refactorings for Core Erlang [3, 4]. Building on these simple equivalences, we plan to prove compound code transformations correct.

Formalising Core Erlang in the big-step operational approach was a somewhat ad hoc decision, supported by the following facts: big-step definitions are less detailed than small-step definitions, which yields shorter proofs, and, unlike denotational definitions, they need no special treatment in the proof assistant embedding for the semantics of nondeterministic and divergent programs and for proofs about them. Nonetheless, relational big-step semantics has its drawbacks: in general, it is not directly executable, the proof of determinism is complex, and this style of semantics cannot be used to argue about concurrency. After working with the relational big-step formalisation for a while, the shortcomings of this approach became apparent, and we decided to investigate whether other semantics definition styles would be more suitable.

It seems to be a simple choice: the purpose of defining the semantics should determine the applied definition approach. However, conflicting requirements can make the decision unclear. For instance, in related work, different approaches have been applied to reason about program transformations: Grigore et al. [10] and Garrido et al. [13] use (reduction-style) small-step semantics, whilst Owens et al. [21] use (functional) big-step semantics. Both of these are executable and can be used to argue about program equivalence, but they show different characteristics in general. There are a number of ways to create a testable and usable formal semantics, as discussed, for example, by Blazy and Leroy [5], but it is not obvious which is the best option for our purposes. Moreover, this choice is not only about the different mathematical approaches, but also about how easy it is to implement them in the Coq proof assistant.

In this paper we analyse and compare different methods of defining big-step style semantics for a small, Erlang-like programming language. We do this to answer the question of which method should be used to create a semantic description of sequential Core Erlang that supports equivalence proofs and is efficiently executable. In doing this we survey the following methods: (a) traditional relational big-step semantics [14], (b) pretty-big-step semantics [8], and (c) functional big-step semantics [21], which can be seen as equivalent to supplying a definitional interpreter [23]. We also briefly discuss a coinductive approach to defining big-step semantics [16]. When comparing the semantic approaches, we aim to answer the following questions:

1. Does the semantics definition scale in terms of the complexity of expression equivalence proofs? Since our primary purpose is to prove expression pattern equivalences, the semantics has to be especially supportive of constructing such proofs.
2. Is the semantics effectively executable, allowing for automatic evaluation of expressions? Is this automatic execution efficient, with a performance comparable to a reference implementation? Execution of the semantics definition is crucial when it comes to validation: testing the semantics against a reference implementation requires executing the semantics.
3. How complex are the proofs of common properties such as determinism or progress? For instance, some semantics are inherently deterministic, because they are presented as a semantic function, while it is a lot more cumbersome to prove this property for a relational semantics.

We note that the paper does not only survey the above-mentioned semantics definition styles, but also implements a benchmark language in each of them, and makes a detailed comparison based on this case study. Namely, we make the following main contributions:
• Traditional big-step, pretty-big-step and functional big-step semantics definitions for a simple functional programming language resembling sequential Core Erlang, together with proofs of the equivalence of these definitions.
• Proofs of basic properties of each semantics and proofs of simple expression pattern equivalences (local refactorings) in each definition style.
• A systematic comparison of the approaches with respect to execution and proof complexity.

We will often quote Coq code to highlight the fact that all these concepts have been formalised in Coq [24]. Inductive constructors in the relational semantics are described as inference rules.

The rest of the paper is structured as follows. In Section 2 we describe the syntax and the necessary abstractions for our benchmark language. In Section 3 we discuss the traditional big-step and the pretty-big-step semantics, and in Section 4 we cover the functional approaches, in particular the functional big-step semantics. Section 5 evaluates the presented approaches, and also briefly summarises coinductive big-step semantics [16]. Finally, Section 6 concludes and discusses future work.
Throughout the paper, we define formal semantics for a simple but representative functional programming language which resembles Erlang; in fact, our case study language is a proper subset of Core Erlang. In this section, we introduce the syntax and a semantic domain for the language; the later sections build on these to define big-step operational semantics of different styles, in order to make a systematic comparison between them.
The case study language includes abstractions known from the functional paradigm (such as single-assignment variables, let-binding, lambda abstraction and function application), but we also incorporate impure expressions (such as I/O calls and exception handling). Furthermore, the language supports recursive function definitions (letrec-binding), but each such expression can bind only one name. Figure 1 defines the syntax of the language precisely, as an inductive type.
Inductive Expression : Type :=
| ELit (l : Literal)   (* literals are either atoms or integers *)
| EVar (v : Var)
| EFunId (f : FunctionIdentifier)
| EFun (vl : list Var) (e : Expression)
| ECall (f : string) (params : list Expression)
| EApp (exp : Expression) (params : list Expression)
| ELet (v : Var) (e b : Expression)
| ELetRec (fid : FunctionIdentifier) (params : list Var) (b e : Expression)
| ETry (e1 : Expression) (v : Var) (e2 : Expression) (vl : list Var) (e3 : Expression).
Figure 1: The syntax of our case study language (subset of Core Erlang)
Expressions of this language have three types: atoms, integers and functions. Therefore, values of expressions can only be literal values and closures (see Figure 2). Closures are the normal forms of functions: they store the function's parameter list, its body expression, the evaluation environment in which the body should be evaluated and, moreover, the collection of recursive functions defined simultaneously. (This approach is based on our previous work and is fairly general: it can handle multiple simultaneous function definitions, not only one; see [4] for more details.)

Inductive Value : Type :=
| VLit (l : Literal)
| VClos (ref : Environment)
        (ext : list (FunctionIdentifier * FunctionExpression))
        (vl : list Var) (e : Expression).
Definition Exception := ExceptionClass * Value * Value.
Figure 2: Semantic domain

Exceptions may also be the results of expression evaluations. In our formalisation, exceptions are represented as triples: the exception class (error, throw or exit) and two values describing the exception reason. In our case studies, we will use two common exceptions known from Erlang: badarity occurs when an application fails because the number of actual parameters differs from the number of formal parameters, and badfun is raised when the function expression of an application evaluates to a value that is not a function closure.
Finally, we define the semantic domain as the union of values and exception descriptions: Value + Exception. In the formalisation, we use Coq's built-in sum type with the standard inl and inr constructors to build elements of the semantic domain.
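As a small, self-contained illustration of how such a sum-typed semantic domain is used in Coq, the sketch below defines simplified stand-ins for Literal, Value, ExceptionClass and Exception. It rests on stated assumptions: values are literals only (no closures), natural numbers stand in for integers, and the helpers badfun and is_exception are hypothetical names, not taken from the formalisation.

```coq
Require Import Coq.Strings.String.
Open Scope string_scope.

(* Hypothetical, simplified stand-ins for the paper's types:
   literals are atoms or (natural-number) integers, values are literals only. *)
Inductive Literal : Type :=
| Atom (s : string)
| Integer (n : nat).

Inductive Value : Type :=
| VLit (l : Literal).

Inductive ExceptionClass : Type := Error | Throw | Exit.

(* An exception: its class and two values describing the reason. *)
Definition Exception : Type := (ExceptionClass * Value * Value)%type.

(* The semantic domain: inl wraps a value, inr an exception. *)
Definition Result : Type := (Value + Exception)%type.

(* Hypothetical helper building a badfun error for a non-closure value v. *)
Definition badfun (v : Value) : Result :=
  inr (Error, VLit (Atom "badfun"), v).

(* Distinguish the two halves of the semantic domain. *)
Definition is_exception (r : Result) : bool :=
  match r with
  | inl _ => false
  | inr _ => true
  end.
```

Keeping the result a plain sum means that every rule or function producing a result can pattern match on inl and inr directly, which is how the later semantics in this paper are written.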
In order to share as much as possible between the different semantics definitions, we not only fix the semantic domain, but also define a common type for the evaluation environment. Basically, this is a collection of variable names (and function identifiers) mapped to values. There are several helper functions to manage this environment, namely:
• get_value: Returns the value associated with a given name. If the name is unbound, it yields an exception.
• insert_value: Inserts a binding into the environment.
• append_vars_to_env: Inserts several variable bindings into the environment.
• append_funs_to_env: Inserts bindings from function identifiers to closures into the environment.
We remind the reader that the case study language allows for calling some built-in I/O functions, thus the semantics needs to address the meaning of these side effects. For this, we define a type (SideEffectList), the values of which log simple input-output effects produced by the evaluation of specific ECall expressions. While evaluating ECall expressions, we use the auxiliary eval function, which returns a value or an exception together with a side effect trace; only this operation can extend the side effect trace. In our previous work [4] we applied a slightly different method, using standard list append operations in every derivation rule; however, this former approach had to be refined in order to support automatic evaluation of expressions.
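The environment operations listed above could be sketched over an association list as follows. This is an illustration under explicit assumptions, not the formalisation's definition: variables are strings, function identifiers are hypothetical (name, arity) pairs, values are plain numbers, unbound names produce a hypothetical Unbound exception, and name_eqb is an illustrative helper. append_funs_to_env is omitted, since it would need closures.

```coq
Require Import Coq.Strings.String Coq.Lists.List.
Import ListNotations.
Open Scope string_scope.

(* Hypothetical simplified types: variables are strings, function
   identifiers are (name, arity) pairs, values are plain numbers. *)
Definition Var := string.
Definition FunctionIdentifier := (string * nat)%type.
Definition Value := nat.
Definition Name := (Var + FunctionIdentifier)%type.

(* Hypothetical exception for unbound names. *)
Inductive Exception : Type := Unbound (n : Name).

Definition Environment := list (Name * Value).

(* Boolean equality on names (variables or function identifiers). *)
Definition name_eqb (a b : Name) : bool :=
  match a, b with
  | inl x, inl y => String.eqb x y
  | inr (f, n), inr (g, m) => andb (String.eqb f g) (Nat.eqb n m)
  | _, _ => false
  end.

(* get_value: the most recent binding of the name, or an exception. *)
Fixpoint get_value (env : Environment) (key : Name) : Value + Exception :=
  match env with
  | [] => inr (Unbound key)
  | (k, v) :: rest => if name_eqb k key then inl v else get_value rest key
  end.

(* insert_value: a new binding shadows any earlier one for the same name. *)
Definition insert_value (env : Environment) (key : Name) (v : Value)
  : Environment :=
  (key, v) :: env.

(* append_vars_to_env: bind variables to values pairwise. *)
Fixpoint append_vars_to_env (vars : list Var) (vals : list Value)
  (env : Environment) : Environment :=
  match vars, vals with
  | x :: xs, v :: vs => append_vars_to_env xs vs (insert_value env (inl x) v)
  | _, _ => env
  end.
```

Shadowing by consing onto the front keeps insert_value trivial; a real implementation might instead replace an existing binding, which this sketch does not attempt.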
As mentioned already in the introduction, we will compare the different approaches to defining big-step semantics based on the following criteria:
• How complex is it to prove properties of the semantics? We will use the determinism property to investigate this.
• Is the approach executable? Is the semantics efficiently executable?
• How complex is it to prove the evaluation of an expression formally? We will use two small expressions to investigate this:

    let X = fun(Y, Z) -> Y in
      apply X('a', 'b')

    let X = 4 in
    let Y = 5 in
      apply (fun(X, Y) -> X + Y)(X, Y)

  Listing 1: Expression evaluation examples
• How complex is it to prove expression equivalence? We will use one unconditional and one conditional equivalence to investigate this:

    e  ⇔  let X = fun() -> e in apply X()

    let A = e1 in        ⇔   let B = e2 in
    let B = e2 in            let A = e1 in
      A + B                    A + B

  Listing 2: Expression equivalence examples

In the second example, the side effects produced by e1 and e2 are swapped during the evaluation of these expressions.

A traditional big-step operational semantics is a relation between the expression to be evaluated and its value, or, more generally, between initial and final configurations, where the configurations may include the evaluation environment or the side effects of the evaluation. Note that in big-step style, the intermediate stages of the evaluation are not visible in the relation [20]. This style of semantics originates from Kahn [14]. In Coq, such a relation can be formalised with an inductive type, where the data constructors represent the derivation rules (or judgements).
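To show what such an inductive definition looks like, the following minimal Coq sketch defines a big-step relation with side effect traces for a hypothetical toy language (literals, a print call that appends the value of its argument to the trace, and sequencing), not for the case study language. The judgement eval e eff1 v eff2 plays the role of a configuration (e, eff1) evaluating to (v, eff2), without an environment, and the example derivation is built by applying one constructor (derivation rule) at a time.

```coq
Require Import Coq.Lists.List.
Import ListNotations.

(* Hypothetical toy language: literals, a print call that logs its
   argument's value, and sequencing. *)
Inductive expr : Type :=
| Lit (n : nat)
| Print (e : expr)
| Seq (e1 e2 : expr).

(* eval e eff1 v eff2: starting from trace eff1, e evaluates to v and the
   final trace is eff2. Each constructor is one derivation rule. *)
Inductive eval : expr -> list nat -> nat -> list nat -> Prop :=
| E_Lit : forall n eff,
    eval (Lit n) eff n eff
| E_Print : forall e eff1 eff2 eff3 v,
    eval e eff1 v eff2 ->
    eff3 = eff2 ++ [v] ->   (* the extended trace, stated as a premise *)
    eval (Print e) eff1 v eff3
| E_Seq : forall e1 e2 eff1 eff2 eff3 v1 v2,
    eval e1 eff1 v1 eff2 ->
    eval e2 eff2 v2 eff3 ->
    eval (Seq e1 e2) eff1 v2 eff3.

(* A derivation built by hand, rule by rule. *)
Example print_twice :
  eval (Seq (Print (Lit 1)) (Print (Lit 2))) [] 2 [1; 2].
Proof.
  eapply E_Seq.
  - eapply E_Print. apply E_Lit. reflexivity.
  - eapply E_Print. apply E_Lit. reflexivity.
Qed.
```

E_Print states the extended trace through the equality premise eff3 = eff2 ++ [v] rather than writing eff2 ++ [v] in its conclusion, so that the conclusion can be matched against a goal whose final trace is already concrete.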
Traditional inductive big-step semantics are used in many projects, to mention but a few: deriving such a semantics from a small-step definition [9], call-by-need semantics of let and letrec calculi ($\lambda_{\mathit{let}}$, $\lambda_{\mathit{letrec}}$) [17], or the trace-based operational semantics for While [18] (this one is defined coinductively), as well as our project defining Core Erlang [3, 4, 19]. For the investigation of the different big-step definition styles, we reuse our Core Erlang formalisation mentioned above, but discard parts of it, since the case study language used in the comparison is a subset of it. The big-step semantics will be denoted by $\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_2\}$, where $\Gamma$ is the evaluation environment, $\mathit{exp}$ is the expression to be evaluated, $\mathit{eff}_1$ and $\mathit{eff}_2$ are the initial and final side effect traces, and $\mathit{res}$ is the result, which is either a value or an exception. Before describing the semantics, we introduce some predicates and notations for readability about evaluating a list of expressions (we use $|l|$ to denote the length of list $l$, $S\ i$ denotes the successor of $i$, and $l[i]$ denotes the $i$-th element of $l$). The function $\mathtt{nth\_def}\ l\ \mathit{default}\ i$ works the same way as $l[i-1]$ if $i > 0$, but for $i = 0$ it returns the default value. We also use Coq's standard last function [11].

$$\mathtt{eval\_all}\ \Gamma\ \mathit{es}\ \mathit{vs}\ \mathit{eff}_1\ \mathit{eff} := (|\mathit{es}| = |\mathit{vs}|) \land (|\mathit{es}| = |\mathit{eff}|) \land \big(\forall j < |\mathit{es}|,\ \langle \Gamma, \mathit{es}[j], \mathtt{nth\_def}\ \mathit{eff}\ \mathit{eff}_1\ j \rangle \Downarrow \{\mathtt{inl}\ \mathit{vs}[j], \mathtt{nth\_def}\ \mathit{eff}\ \mathit{eff}_1\ (S\ j)\}\big)$$

$$\mathtt{eval\_prefix}\ \Gamma\ \mathit{es}\ \mathit{vs}\ i\ \mathit{eff}_1\ \mathit{eff} := (i < |\mathit{es}|) \land (|\mathit{vs}| = i) \land (|\mathit{eff}| = i) \land \big(\forall j < i,\ \langle \Gamma, \mathit{es}[j], \mathtt{nth\_def}\ \mathit{eff}\ \mathit{eff}_1\ j \rangle \Downarrow \{\mathtt{inl}\ \mathit{vs}[j], \mathtt{nth\_def}\ \mathit{eff}\ \mathit{eff}_1\ (S\ j)\}\big)$$

The predicate eval_all states that an expression list $\mathit{es}$ evaluates to a value list $\mathit{vs}$ (note that this formula also expresses that the evaluation of the expressions does not produce any exceptions). The evaluation of the $j$-th expression starts with the side effect log $\mathit{eff}[j-1]$ (or the initial log $\mathit{eff}_1$ if $j = 0$) and produces the log $\mathit{eff}[j]$. The predicate eval_prefix describes the same behaviour, but only for the first $i$ elements. Now we can describe the big-step semantics of our case study language (Figure 3 shows the evaluation of expressions without exceptions, and Figure 4 shows the exceptional semantics).

The traditional, relational big-step semantics introduced in the previous section is not inherently executable or computable: for a given pair of initial and final configurations, it needs to be proven that they are related by the operational semantics. In Coq, such a proof can be given in terms of proof primitives, or one can write a program in the tactic language to construct the proof. Automatic execution of a relational semantics can be done with the latter. We can also see the evaluation tactics as machinery that turns the relational semantics into a functional one: a program in the tactic language can perform pattern matching, case distinction and even recursion, and can ultimately compute the results of the relation.
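The index arithmetic behind eval_all can be checked on a small example. In the following Coq sketch, nth_def follows the description given earlier in this section, while eval1 (which "evaluates" a number by appending it to the side effect log) and eval_logs (which collects the intermediate logs eff[j]) are hypothetical helpers introduced only for this illustration.

```coq
Require Import Coq.Lists.List.
Import ListNotations.

(* nth_def l default i: l[i - 1] for i > 0, and default for i = 0. *)
Definition nth_def {A : Type} (l : list A) (default : A) (i : nat) : A :=
  match i with
  | 0 => default
  | S j => nth j l default
  end.

(* Hypothetical toy evaluator: "evaluating" n appends n to the log. *)
Definition eval1 (n : nat) (log : list nat) : list nat := log ++ [n].

(* The list eff of intermediate logs: eff[j] is the log after es[j]. *)
Fixpoint eval_logs (es : list nat) (eff1 : list nat) : list (list nat) :=
  match es with
  | [] => []
  | e :: rest =>
      let eff' := eval1 e eff1 in
      eff' :: eval_logs rest eff'
  end.
```

With these definitions, the j-th step starts from nth_def eff eff1 j and ends in nth_def eff eff1 (S j), and last eff eff1 is the final log, exactly the shape used in the Call and App rules.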
Limitations of Coq tactics.
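Executing the relational semantics in Coq involves technical considerations: one needs to make sure that the operational semantics derivation rules do not contain auxiliary function calls in their conclusions. Otherwise, the Coq tactic language cannot perform simple pattern matching on the proof goals, which prevents syntax-directed evaluation. In our semantics, we needed to apply minor changes to the derivation rule of variables and to the uses of the append operation on side effect logs in our Core Erlang semantics [4]. The issue was solved by refactoring: we replaced the auxiliary function applications with fresh variables and added extra premises stating equality between the variables and the corresponding function applications.

For the side effects (and the append operations mentioned), however, this technique would have introduced an unreasonable number of new variables, so instead we changed how these traces are used. Note that currently only ECall expressions can cause new side effects; the other rules just have to propagate the logs. Instead of handling only the additional side effects of an evaluation step (i.e. only the difference, as in [4]), the rules always refer to the whole initial and final side effect traces. This way we could eliminate the append operations from the conclusions of the derivation rules.

In the following figures, the result res can be either a value or an exception, so its type is Value + Exception.

$$\frac{}{\langle \Gamma, \mathtt{ELit}\ l, \mathit{eff} \rangle \Downarrow \{\mathtt{inl}\ (\mathtt{VLit}\ l), \mathit{eff}\}}\ (\mathrm{Lit})$$

$$\frac{\mathit{res} = \mathtt{get\_value}\ \Gamma\ (\mathtt{inl}\ s)}{\langle \Gamma, \mathtt{EVar}\ s, \mathit{eff} \rangle \Downarrow \{\mathit{res}, \mathit{eff}\}}\ (\mathrm{Var})$$

$$\frac{}{\langle \Gamma, \mathtt{EFun}\ \mathit{vl}\ e, \mathit{eff} \rangle \Downarrow \{\mathtt{inl}\ (\mathtt{VClos}\ \Gamma\ []\ \mathit{vl}\ e), \mathit{eff}\}}\ (\mathrm{Fun})$$

$$\frac{\mathit{res} = \mathtt{get\_value}\ \Gamma\ (\mathtt{inr}\ \mathit{fid})}{\langle \Gamma, \mathtt{EFunId}\ \mathit{fid}, \mathit{eff} \rangle \Downarrow \{\mathit{res}, \mathit{eff}\}}\ (\mathrm{FunId})$$

$$\frac{\mathtt{eval\_all}\ \Gamma\ \mathit{params}\ \mathit{vals}\ \mathit{eff}_1\ \mathit{eff} \qquad \mathtt{eval}\ \mathit{fname}\ \mathit{vals}\ (\mathtt{last}\ \mathit{eff}\ \mathit{eff}_1) = (\mathit{res}, \mathit{eff}_2)}{\langle \Gamma, \mathtt{ECall}\ \mathit{fname}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_2\}}\ (\mathrm{Call})$$

$$\frac{\begin{array}{c}\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inl}\ (\mathtt{VClos}\ \mathit{ref}\ \mathit{ext}\ \mathit{var\_list}\ \mathit{body}), \mathit{eff}_2\} \qquad |\mathit{var\_list}| = |\mathit{vals}| \qquad \mathtt{eval\_all}\ \Gamma\ \mathit{params}\ \mathit{vals}\ \mathit{eff}_2\ \mathit{eff} \\ \langle \mathtt{append\_vars\_to\_env}\ \mathit{var\_list}\ \mathit{vals}\ (\mathtt{get\_env}\ \mathit{ref}\ \mathit{ext}), \mathit{body}, \mathtt{last}\ \mathit{eff}\ \mathit{eff}_2 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}\end{array}}{\langle \Gamma, \mathtt{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}\ (\mathrm{App})$$

$$\frac{\langle \Gamma, e, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inl}\ \mathit{val}, \mathit{eff}_2\} \qquad \langle \mathtt{insert\_value}\ \Gamma\ v\ \mathit{val}, b, \mathit{eff}_2 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}{\langle \Gamma, \mathtt{ELet}\ v\ e\ b, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}\ (\mathrm{Let})$$

$$\frac{\langle \mathtt{append\_funs\_to\_env}\ [\mathit{fid}]\ [\mathit{params}]\ [b]\ \Gamma, e, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_2\}}{\langle \Gamma, \mathtt{ELetRec}\ \mathit{fid}\ \mathit{params}\ b\ e, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_2\}}\ (\mathrm{LetRec})$$

Figure 3: The core traditional big-step definition of our case study language

$$\frac{\langle \Gamma, e_1, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inl}\ \mathit{val}', \mathit{eff}_2\} \qquad \langle \mathtt{insert\_value}\ \Gamma\ v\ \mathit{val}', e_2, \mathit{eff}_2 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}{\langle \Gamma, \mathtt{ETry}\ e_1\ v\ e_2\ \mathit{vl}\ e_3, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}\ (\mathrm{Try})$$

$$\frac{\begin{array}{c}\langle \Gamma, e_1, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ (\mathit{ex}_1, \mathit{ex}_2, \mathit{ex}_3), \mathit{eff}_2\} \\ \langle \mathtt{append\_try\_vars\_to\_env}\ \mathit{vl}\ [\mathtt{exclass\_to\_value}\ \mathit{ex}_1; \mathit{ex}_2; \mathit{ex}_3]\ \Gamma, e_3, \mathit{eff}_2 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}\end{array}}{\langle \Gamma, \mathtt{ETry}\ e_1\ v\ e_2\ \mathit{vl}\ e_3, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}\ (\mathrm{Catch})$$

For the next rule, let us consider

$$\mathtt{nonclosure}\ v := \forall\, \Gamma', \mathit{ext}, \mathit{var\_list}, \mathit{body}:\ v \neq \mathtt{VClos}\ \Gamma'\ \mathit{ext}\ \mathit{var\_list}\ \mathit{body}$$

$$\frac{\begin{array}{c}\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inl}\ v, \mathit{eff}_2\} \qquad \mathtt{eval\_all}\ \Gamma\ \mathit{params}\ \mathit{vals}\ \mathit{eff}_2\ \mathit{eff} \\ \mathtt{nonclosure}\ v \qquad \mathit{eff}_3 = \mathtt{last}\ \mathit{eff}\ \mathit{eff}_2\end{array}}{\langle \Gamma, \mathtt{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ (\mathtt{badfun}\ v), \mathit{eff}_3\}}\ (\mathrm{AppExc})$$

In the following rule, we denote VClos ref ext var_list body with v.

$$\frac{\begin{array}{c}\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inl}\ v, \mathit{eff}_2\} \qquad \mathtt{eval\_all}\ \Gamma\ \mathit{params}\ \mathit{vals}\ \mathit{eff}_2\ \mathit{eff} \\ |\mathit{var\_list}| \neq |\mathit{vals}| \qquad \mathit{eff}_3 = \mathtt{last}\ \mathit{eff}\ \mathit{eff}_2\end{array}}{\langle \Gamma, \mathtt{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ (\mathtt{badarity}\ v), \mathit{eff}_3\}}\ (\mathrm{AppExc})$$

$$\frac{\mathtt{eval\_prefix}\ \Gamma\ \mathit{params}\ \mathit{vals}\ i\ \mathit{eff}_1\ \mathit{eff} \qquad \langle \Gamma, \mathit{params}[i], \mathtt{last}\ \mathit{eff}\ \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}_2\}}{\langle \Gamma, \mathtt{ECall}\ \mathit{fname}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}_2\}}\ (\mathrm{CallExc})$$

$$\frac{\langle \Gamma, e, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}_2\}}{\langle \Gamma, \mathtt{ELet}\ v\ e\ b, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}_2\}}\ (\mathrm{LetExc})$$

$$\frac{\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}_2\}}{\langle \Gamma, \mathtt{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}_2\}}\ (\mathrm{AppExc})$$

$$\frac{\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inl}\ v, \mathit{eff}_2\} \qquad \mathtt{eval\_prefix}\ \Gamma\ \mathit{params}\ \mathit{vals}\ i\ \mathit{eff}_2\ \mathit{eff} \qquad \langle \Gamma, \mathit{params}[i], \mathtt{last}\ \mathit{eff}\ \mathit{eff}_2 \rangle \Downarrow \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}_3\}}{\langle \Gamma, \mathtt{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}_3\}}\ (\mathrm{AppExc})$$

Figure 4: The traditional big-step operational semantics of exception creation and propagation

Evaluation tactic.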
We use Coq's tactic sublanguage, Ltac [12], to automate proof construction. In our case, the evaluation of the semantics of the case study language without exceptions is syntax-directed, i.e. a tactic can be designed to evaluate any expression in any context based on pattern matching on the expression to be evaluated (e.g. an ECall expression can be evaluated with Call). On the other hand, after introducing exceptions, several derivation rules are applicable for evaluating a particular expression (e.g. there are two rules for ECall, five rules for function applications, etc.). We extended the evaluation tactic to try applying the applicable rules one after the other. This can be seen as a backtracking proof search for a successful evaluation path.

As it turned out, such evaluation tactics in Coq are rather inefficient in terms of time and space. To speed up the execution, we can create some helper functions and prove lemmas about specific expressions (e.g. the evaluation of parameters which are just literals), so that the evaluation tactic can apply these lemmas before falling back to the slow backtracking process. These lemmas can significantly speed up the evaluation of expressions which contain such specific sub-expressions; however, they only solve a small part of the problem.
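Both issues discussed above, avoiding function applications in rule conclusions and backtracking over the applicable rules, can be illustrated on a hypothetical toy language with literals, a throw expression and addition. The following Coq sketch is our own, not the evaluation tactic of the formalisation: E_Plus uses a fresh variable and an equality premise, and eval_tac tries the rules in turn, relying on Ltac's first and solve combinators for backtracking.

```coq
(* Hypothetical toy language with exceptions (not the case study language). *)
Inductive expr : Type :=
| Lit (n : nat)
| Throw
| Plus (e1 e2 : expr).

Inductive result : Type :=
| Val (n : nat)
| Exc.

(* Each constructor is a derivation rule. E_Plus uses a fresh variable n
   and the premise n = n1 + n2 instead of Val (n1 + n2) in its conclusion,
   so that the conclusion matches goals with a concrete result. *)
Inductive eval : expr -> result -> Prop :=
| E_Lit : forall n, eval (Lit n) (Val n)
| E_Throw : eval Throw Exc
| E_Plus : forall e1 e2 n1 n2 n,
    eval e1 (Val n1) -> eval e2 (Val n2) -> n = n1 + n2 ->
    eval (Plus e1 e2) (Val n)
| E_PlusExc1 : forall e1 e2,
    eval e1 Exc -> eval (Plus e1 e2) Exc
| E_PlusExc2 : forall e1 e2 n1,
    eval e1 (Val n1) -> eval e2 Exc -> eval (Plus e1 e2) Exc.

(* Backtracking evaluation tactic: try the rules one after the other.
   solve [...] fails unless the whole sub-derivation succeeds, so that
   first [...] then moves on to the next candidate rule. *)
Ltac eval_tac :=
  first
    [ apply E_Lit
    | apply E_Throw
    | solve [ eapply E_Plus; eval_tac ]
    | solve [ eapply E_PlusExc1; eval_tac ]
    | solve [ eapply E_PlusExc2; eval_tac ]
    | reflexivity ].

Example plus_ok : eval (Plus (Lit 1) (Lit 2)) (Val 3).
Proof. eval_tac. Qed.
```

Had E_Plus been stated with Val (n1 + n2) in its conclusion, eapply would typically fail on a goal such as eval (Plus (Lit 1) (Lit 2)) (Val 3), because the unifier cannot solve ?n1 + ?n2 = 3 syntactically. Note also that each failed alternative re-derives sub-terms from scratch, which hints at why this style of execution scales poorly.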
As seen before, the traditional definition contains several similar rules with the same premises. The idea of Charguéraud, called pretty-big-step semantics [8], focuses on eliminating this redundancy. Let us discuss his idea through our case study, using the evaluation rules for applications. First of all, Charguéraud identified two sources of duplication:
• The similar premises in the rules for exceptions, correct evaluation (and divergence).
• The duplication of the evaluation judgement for both values and exceptions. This is not present in our case study language; however, App could be described in the form of two rules: one for an exception and one for a value as the final result.
In the following paragraphs we focus on the first problem. Instead of using duplicated premises, Charguéraud suggests using “intermediate terms”, which contain the satisfied premises implicitly. These can also be seen as terms which remember the state of the evaluation, i.e. which sub-terms have already been evaluated (this resembles a small-step semantics in some respects).
Applications with intermediate terms.
Let us see how the idea applies to our semantics. First, we need to create the syntax for intermediate terms (see Figure 5). In our case, we need three additional constructors for applications: AApp1 corresponds to the evaluation of the function expression, AList to the evaluation of the parameters, and AApp2 to the creation of application exceptions and the evaluation of the function body.
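For comparison with the traditional style, the following Coq sketch applies the same idea to a hypothetical toy language with literals, a throw expression and addition; the names ext, APlus1 and APlus2 are our own and only loosely mirror AApp1 and AApp2. Each rule has at most two premises, the exception propagation rules have none, and because the shape of an intermediate term determines the applicable rule, the tactic peval_tac can dispatch on the goal instead of trying rules one after the other.

```coq
(* Hypothetical toy language (not the case study language). *)
Inductive expr : Type :=
| Lit (n : nat)
| Throw
| Plus (e1 e2 : expr).

Inductive result : Type :=
| Val (n : nat)
| Exc.

(* Extended terms: ordinary expressions plus intermediate terms that
   remember which sub-terms have already been evaluated. *)
Inductive ext : Type :=
| E (e : expr)
| APlus1 (r1 : result) (e2 : expr)   (* left operand evaluated *)
| APlus2 (n1 : nat) (r2 : result).   (* both operands evaluated *)

(* Pretty-big-step rules: at most two premises each, and the exception
   propagation rules have none. *)
Inductive peval : ext -> result -> Prop :=
| P_Lit : forall n, peval (E (Lit n)) (Val n)
| P_Throw : peval (E Throw) Exc
| P_Plus : forall e1 e2 r1 r,
    peval (E e1) r1 -> peval (APlus1 r1 e2) r -> peval (E (Plus e1 e2)) r
| P_Plus1Exc : forall e2, peval (APlus1 Exc e2) Exc
| P_Plus1 : forall n1 e2 r2 r,
    peval (E e2) r2 -> peval (APlus2 n1 r2) r -> peval (APlus1 (Val n1) e2) r
| P_Plus2Exc : forall n1, peval (APlus2 n1 Exc) Exc
| P_Plus2 : forall n1 n2 n,
    n = n1 + n2 -> peval (APlus2 n1 (Val n2)) (Val n).

(* Syntax-directed tactic: the shape of the (intermediate) term selects
   a single rule, so no search over alternative rules is needed. *)
Ltac peval_tac :=
  match goal with
  | |- peval (E (Lit _)) _ => apply P_Lit
  | |- peval (E Throw) _ => apply P_Throw
  | |- peval (E (Plus _ _)) _ => eapply P_Plus; peval_tac
  | |- peval (APlus1 Exc _) _ => apply P_Plus1Exc
  | |- peval (APlus1 (Val _) _) _ => eapply P_Plus1; peval_tac
  | |- peval (APlus2 _ Exc) _ => apply P_Plus2Exc
  | |- peval (APlus2 _ (Val _)) _ => eapply P_Plus2; reflexivity
  end.

Example pretty_ok : peval (E (Plus (Lit 1) (Lit 2))) (Val 3).
Proof. peval_tac. Qed.
```

The price is visible even in this toy: five rules for addition instead of three, in exchange for the removal of the repeated premise eval e1 (Val n1) from the exception rules.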
Inductive AuxExpression :=
| AApp1 (b : Value + Exception) (params : list Expression)
| AApp2 (v : Value) (b : list Value + Exception)
· · ·.
Inductive AuxList := AList (rest : list Expression) (b : list Value + Exception).
Figure 5: The syntax of intermediate terms

Having defined the intermediate terms, we can rewrite the semantics of applications (Figure 6). We decided not to include our side effect traces in the intermediate terms, because this way the effects can be handled just like before. First, we have to evaluate the function expression of the application (App_pretty). We create the intermediate term AApp1 with the result of this step. If this result is an exception, the evaluation is finished with ExcApp_pretty; otherwise, the evaluation of the parameters follows, using FinApp_pretty.

When there are parameters, we take the first one and evaluate it with StepList_pretty. If the result is a value, the mk_result function appends it to the end of the value list stored in the AList constructor; in case of an exception, this attribute of AList becomes the exception. We repeat this process until all parameter expressions are evaluated, or an exception occurs inside the AList. In the latter case, ExcList_pretty finishes the evaluation, and the stored exception is propagated. When there is no exception, we use FinList_pretty to finish the evaluation of the parameter list. Note that this is a general approach to evaluating a list of expressions, so it can be used for ECall expressions too.

In this figure, lres is either a list of Values or an exception, so its type is list Value + Exception.

$$\frac{\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}_1, \mathit{eff}_2\} \qquad \langle \Gamma, \mathtt{AApp1}\ \mathit{res}_1\ \mathit{params}, \mathit{eff}_2 \rangle \Downarrow_p \{\mathit{res}_2, \mathit{eff}_3\}}{\langle \Gamma, \mathtt{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}_2, \mathit{eff}_3\}}\ (\mathrm{App}_{\mathit{pretty}})$$

$$\frac{}{\langle \Gamma, \mathtt{AApp1}\ (\mathtt{inr}\ \mathit{ex})\ \mathit{params}, \mathit{eff} \rangle \Downarrow_p \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}\}}\ (\mathrm{ExcApp}_{\mathit{pretty}})$$

$$\frac{\langle \Gamma, \mathtt{AList}\ \mathit{params}\ (\mathtt{inl}\ []), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{lres}, \mathit{eff}_2\} \qquad \langle \Gamma, \mathtt{AApp2}\ v\ \mathit{lres}, \mathit{eff}_2 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_3\}}{\langle \Gamma, \mathtt{AApp1}\ (\mathtt{inl}\ v)\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_3\}}\ (\mathrm{FinApp}_{\mathit{pretty}})$$

$$\frac{}{\langle \Gamma, \mathtt{AApp2}\ v\ (\mathtt{inr}\ \mathit{ex}), \mathit{eff} \rangle \Downarrow_p \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}\}}\ (\mathrm{ExcApp}_{\mathit{pretty}})$$

$$\frac{|\mathit{var\_list}| = |\mathit{vals}| \qquad \langle \mathtt{append\_vars\_to\_env}\ \mathit{var\_list}\ \mathit{vals}\ (\mathtt{get\_env}\ \mathit{ref}\ \mathit{ext}), \mathit{body}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_2\}}{\langle \Gamma, \mathtt{AApp2}\ (\mathtt{VClos}\ \mathit{ref}\ \mathit{ext}\ \mathit{var\_list}\ \mathit{body})\ (\mathtt{inl}\ \mathit{vals}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_2\}}\ (\mathrm{FinApp}_{\mathit{pretty}})$$

$$\frac{\mathtt{nonclosure}\ v}{\langle \Gamma, \mathtt{AApp2}\ v\ (\mathtt{inl}\ \mathit{vals}), \mathit{eff} \rangle \Downarrow_p \{\mathtt{inr}\ (\mathtt{badfun}\ v), \mathit{eff}\}}\ (\mathrm{ExcApp}_{\mathit{pretty},\mathit{badfun}})$$

In the following rule, we denote VClos ref ext var_list body with v.

$$\frac{|\mathit{var\_list}| \neq |\mathit{vals}|}{\langle \Gamma, \mathtt{AApp2}\ v\ (\mathtt{inl}\ \mathit{vals}), \mathit{eff} \rangle \Downarrow_p \{\mathtt{inr}\ (\mathtt{badarity}\ v), \mathit{eff}\}}\ (\mathrm{ExcApp}_{\mathit{pretty},\mathit{badarity}})$$

$$\frac{}{\langle \Gamma, \mathtt{AList}\ []\ (\mathtt{inl}\ \mathit{vals}), \mathit{eff} \rangle \Downarrow_p \{\mathtt{inl}\ \mathit{vals}, \mathit{eff}\}}\ (\mathrm{FinList}_{\mathit{pretty}})$$

$$\frac{}{\langle \Gamma, \mathtt{AList}\ \mathit{rest}\ (\mathtt{inr}\ \mathit{ex}), \mathit{eff} \rangle \Downarrow_p \{\mathtt{inr}\ \mathit{ex}, \mathit{eff}\}}\ (\mathrm{ExcList}_{\mathit{pretty}})$$

$$\frac{\langle \Gamma, r, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_2\} \qquad \langle \Gamma, \mathtt{AList}\ \mathit{rest}\ (\mathtt{mk\_result}\ \mathit{res}\ \mathit{vals}), \mathit{eff}_2 \rangle \Downarrow_p \{\mathit{lres}, \mathit{eff}_3\}}{\langle \Gamma, \mathtt{AList}\ (r :: \mathit{rest})\ (\mathtt{inl}\ \mathit{vals}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{lres}, \mathit{eff}_3\}}\ (\mathrm{StepList}_{\mathit{pretty}})$$

Figure 6: Pretty-big-step semantics for applications

Finally, if there was an exception during parameter evaluation, an exception is stored in AApp2 instead of the parameter values, and it is propagated with ExcApp_pretty. Otherwise, all parameters were evaluated correctly, and FinApp_pretty can be applied if the first saved value (the value of the function expression of the application) is a closure and the number of its formal parameters equals the number of actual parameters, whose values are stored as a value list in AApp2. If the first stored value is not a closure, ExcApp_pretty,badfun applies and the result is a badfun exception; if it is a closure but the numbers of formal and actual parameters differ, ExcApp_pretty,badarity applies and creates a badarity exception.

Brief evaluation.
Compared to the traditional semantics, the pretty-big-step approach increases the number of inference rules, while the premise redundancy is eliminated and the number of premises drops to at most two. Obviously, pretty-big-step semantics cannot overcome all weaknesses of the big-step approach, but it provides a good alternative in terms of readability and usability.

Transforming the big-step semantics to pretty-big-step style was a straightforward process, except for the transformation of expression lists: if there were two or more derivation steps in the big-step premises, the intermediate results were turned into terms like AApp1, and the rule was split. These steps could have been automated; however, in case of expression lists, replacing the eval_all predicate with the accumulation in AList was not as simple.

We should also note that the pretty-big-step definition is a relational semantics, just like the traditional one. This means that we again need a tactic to execute this semantics; however, unlike in the traditional case, backtracking is not needed here, because the evaluation is syntax-directed (except for the last step of application evaluation: FinApp_pretty, ExcApp_pretty,badfun and ExcApp_pretty,badarity). Nevertheless, this evaluation tactic is still not efficient enough.

There are interesting applications of the method in related research, such as deriving pretty-big-step style semantics from small-step semantics [2] or certified abstract interpretation using this definition style [6].

We have summarised two ways to create a relational big-step semantics; however, both of them suffer mainly from the same problem: they cannot be executed efficiently. In this section, we discuss a functional approach, called functional big-step semantics [21], and its origin, the definitional interpreter [23].
The idea of functional big-step semantics was developed by Owens et al. [21]. A semantics in this style is basically a recursive function. In order to ensure its termination for arbitrary inputs (e.g. for diverging expressions too), there is also a “clock” argument which decreases in the steps of the execution. We note that the functional big-step semantics is essentially a definitional interpreter [23] equipped with a clock, and it is defined in a “higher-order logic rather than a programming language” [21].

This approach is also used in research: for example, the FEther project [26] and the type soundness proof for System F by Amin and Rompf [1] use definitional interpreters, while a verified compiler backend for CakeML [21, 25] is based on functional big-step semantics. Now, let us see how we can create such a semantics for our case study language.

When we define a function in Coq, it has to implement behaviour explicitly for all inputs (i.e. the function is total). However, in practice, there can be programs or expressions with undefined or unspecified behaviour (naturally, relational semantics are partial too). Moreover, because of the “clock” argument, which ensures the termination of the function, an expression evaluation could terminate before finding the right result when the clock is too small. This means that we can have three different results: correct termination, failure, and timeout (see Figure 7). In our case study language, there is no undefined behaviour, so we never get failure as a result; however, we keep it in the result definition, because it becomes necessary once the language is extended, e.g. with Core Erlang-like case expressions.

Inductive ResultType : Type :=
| Result (res : Value + Exception) (eff : SideEffectList)
| Timeout
| Failure.
Inductive ResultListType : Type :=
| LResult (res : list Value + Exception) (eff : SideEffectList)
| LTimeout
| LFailure.
Figure 7: The possible results of the functional big-step semantics
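The clock mechanism and the kinds of results can be illustrated with a small Coq sketch on a hypothetical toy language (literals, a throw expression and a Sum construct that evaluates a parameter list). It is an illustration under these assumptions rather than the paper's definition: side effects and the Failure result are omitted (the toy language, like the case study language, has no undefined behaviour), and eval_elems takes the partially applied evaluator eval c as an argument, in the spirit of the list evaluation in Figure 8.

```coq
Require Import Coq.Lists.List.
Import ListNotations.

(* Hypothetical toy language: Sum plays the role of a call whose
   parameters are evaluated left to right. *)
Inductive expr : Type :=
| Lit (n : nat)
| Throw
| Sum (es : list expr).

Inductive result : Type :=
| RVal (n : nat)
| RExc
| RTimeout.

Inductive list_result : Type :=
| LVals (ns : list nat)
| LExc
| LTimeout.

(* List evaluation, parameterised by the single-expression evaluator f;
   it does not touch the clock itself. *)
Fixpoint eval_elems (f : expr -> result) (es : list expr) : list_result :=
  match es with
  | [] => LVals []
  | x :: xs =>
      match f x with
      | RVal v =>
          match eval_elems f xs with
          | LVals vs => LVals (v :: vs)
          | r => r
          end
      | RExc => LExc
      | RTimeout => LTimeout
      end
  end.

(* The clock is the structurally decreasing argument: every recursive
   call, including the partially applied eval c passed to eval_elems,
   receives the smaller clock c. *)
Fixpoint eval (clock : nat) (e : expr) {struct clock} : result :=
  match clock with
  | 0 => RTimeout
  | S c =>
      match e with
      | Lit n => RVal n
      | Throw => RExc
      | Sum es =>
          match eval_elems (eval c) es with
          | LVals vs => RVal (fold_right Nat.add 0 vs)
          | LExc => RExc
          | LTimeout => RTimeout
          end
      end
  end.
```

Because eval is an ordinary Coq function, results can be checked by computation (e.g. with reflexivity) instead of by proof search, and too small a clock shows up as RTimeout rather than as a failed derivation.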
Fixpoint eval elems ( f : Environment → Expression → SideEffectList → ResultType )(Γ :
Environment ) ( exps : list Expression ) ( eff : SideEffectList ) :
ResultListType := match exps with | [] ⇒ LResult ( inl []) eff | x :: xs ⇒ match f Γ x eff with | Result ( inl v ) eff ’ ⇒ let res := eval elems f Γ xs eff ’ inmatch res with | LResult ( inl xs’ ) eff ’’ ⇒ LResult ( inl ( v :: xs’ )) eff ’’ | r ⇒ r end | Result ( inr ex ) eff ’ ⇒ LResult ( inr ex ) eff ’ | Failure ⇒ LFailure | Timeout ⇒ LTimeout endend . Figure 8: The functional big-step semantics of list evaluationthen the guards of these expressions cannot produce observable side effects [7], so the semantics of case expressions with such guards would be failure .As we have seen before, we have to define the semantics for lists of expressions too. For theselists, we can define the functional big-step semantics distinctly, just like in case of the other dis-cussed semantics (see the eval all , eval prefix predicates in Section 3.1 and the AList constructor,
StepList pretty , ExcList pretty and
FinList pretty in Section 3.2). So we define also a result type forlist evaluation (Figure 7).Now, we have the result types, we can define the functional semantics (Figure 9 shows a rep-resentative part of it). The first step of this function is to check whether the clock is alreadyconsumed, in this case, the function returns the
Timeout value. Otherwise, the expression eval-uation can begin. For calls and applications, we need the above-mentioned list evaluation. Thisproblem is solved by the other semantics function (see Figure 8), where we pass the curried versionof the original functional big-step semantics as an argument (we decrease clock only in the originalfunctional big-step semantics, so that Coq can find the decreasing argument of the function toensure termination, moreover it enables us to use simple, yet powerful induction over the clock).Note that we decided to decrease the clock value on every nested recursive call, otherwise Coqcannot find the decreasing argument of the semantics function (however, this problem could besolved with the
Program construction possibly too). We also note, that exceptions, failures andtimeouts were be handled together (except in the semantics of
ETry ), because these results justneeded to be propagated resulting in a short semantics definition.2
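The clock mechanism and the curried list evaluation described above can be tried out in isolation. The following is a minimal, self-contained sketch under simplifying assumptions, not the definition from our formalisation: it uses a hypothetical toy language (ELit, ESum, EThrow, EUndef, ELoop), omits environments and side effects, and all names in it are ours. As in Figures 8 and 9, the helper eval_elems receives the partially applied evaluator (eval clock'), so only eval consumes the clock, and Coq accepts the definition as structurally recursive on clock.

```coq
Require Import List.
Import ListNotations.

(* Hypothetical toy language, NOT the Core Erlang subset of the paper. *)
Definition Value := nat.
Inductive Exception : Type := Thrown.

Inductive Exp : Type :=
| ELit (n : nat)
| ESum (l : list Exp)  (* evaluates l left to right, then adds the values *)
| EThrow               (* raises an exception *)
| EUndef               (* hypothetical construct with unspecified behaviour *)
| ELoop.               (* diverges *)

(* Three kinds of result: value or exception, timeout, failure. *)
Inductive ResultType : Type :=
| Result (res : Value + Exception)
| Timeout
| Failure.

Inductive ResultListType : Type :=
| LResult (res : list Value + Exception)
| LTimeout
| LFailure.

(* List evaluation is parameterised by the (curried) expression evaluator
   and never touches the clock itself. *)
Fixpoint eval_elems (f : Exp -> ResultType) (exps : list Exp) : ResultListType :=
  match exps with
  | [] => LResult (inl [])
  | x :: xs =>
    match f x with
    | Result (inl v) =>
      match eval_elems f xs with
      | LResult (inl vs) => LResult (inl (v :: vs))
      | r => r  (* propagate exception, failure or timeout *)
      end
    | Result (inr ex) => LResult (inr ex)
    | Failure => LFailure
    | Timeout => LTimeout
    end
  end.

(* The clock is the structurally decreasing argument. *)
Fixpoint eval (clock : nat) (e : Exp) {struct clock} : ResultType :=
  match clock with
  | 0 => Timeout
  | S clock' =>
    match e with
    | ELit n => Result (inl n)
    | EThrow => Result (inr Thrown)
    | EUndef => Failure
    | ELoop => eval clock' ELoop
    | ESum l =>
      (* pass the curried evaluator with the decreased clock *)
      match eval_elems (eval clock') l with
      | LResult (inl vs) => Result (inl (fold_right Nat.add 0 vs))
      | LResult (inr ex) => Result (inr ex)
      | LFailure => Failure
      | LTimeout => Timeout
      end
    end
  end.

Compute eval 10 (ESum [ELit 1; ELit 2; ELit 3]).
```

In this sketch an exception inside a list stops the list evaluation, so an ELoop placed after an EThrow is never reached. Raising the clock of eval 1 (ESum [ELit 1]) from 1 to 2 turns its Timeout into Result (inl 1).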
    Fixpoint eval_fbos_expr (clock : nat) (Γ : Environment) (exp : Expression)
                            (eff : SideEffectList) {struct clock} : ResultType :=
      match clock with
      | 0 => Timeout
      | S clock' =>
        match exp with
        | ELit l => Result (inl (VLit l)) eff
        | EVar v => Result (get_value Γ (inl v)) eff
        | EFunId f => Result (get_value Γ (inr f)) eff
        | EFun vl e => Result (inl (VClos Γ [] vl e)) eff
        | EApp exp l =>
          match eval_fbos_expr clock' Γ exp eff with
          | Result (inl v) eff' =>
            let res := eval_elems (eval_fbos_expr clock') Γ l eff' in
            match res with
            | LResult (inl vl) eff'' =>
              match v with
              | VClos ref ext varl body =>
                if Nat.eqb (length varl) (length vl)
                then eval_fbos_expr clock' (append_vars_to_env varl vl (get_env ref ext)) body eff''
                else Result (inr (badarity v)) eff''
              | _ => Result (inr (badfun v)) eff''
              end
            | LResult (inr ex) eff'' => Result (inr ex) eff''
            | LFailure => Failure
            | LTimeout => Timeout
            end
          | r => r
          end
        (* ... remaining cases omitted ... *)
        end
      end.

Figure 9: Functional big-step semantics of our case study language

In this section we evaluate and compare the semantics definitions given in the previous sections, and we also discuss a coinductive approach to handle divergence.
First of all, we can notice that all three semantics handle the evaluation of lists of expressions separately from the body of the semantics: eval_all and eval_prefix in the traditional big-step, ExcList_pretty, FinList_pretty and StepList_pretty in the pretty-big-step, and eval_elems in the functional big-step semantics.

In terms of definition size and complexity, the functional big-step definition is superior: it is much more compact than the other two. Besides, the pretty-big-step definition uses more inference rules than the traditional big-step semantics, but these rules are much simpler (they have at most two premises). This difference increases the number of subgoals in proofs about the pretty-big-step semantics, but these goals are usually simpler than in the traditional case. In turn, simpler goals could mean simpler proofs, but since the pretty-big-step semantics is defined by mutually inductive types, the related proofs can in some cases become rather complex because they involve mutual induction.

Figure 10: Comparison of formal expression evaluation
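To illustrate how pretty-big-step rules stay small, here is a minimal sketch in the style of Charguéraud [8] for a hypothetical toy language with literals, addition and a throw construct. It is not the rule set of our Core Erlang semantics, and all names are ours. Intermediate terms (Plus1, Plus2) carry the outcome of an already evaluated subterm, every rule has at most two evaluation premises, and exceptions are propagated by separate abort rules. For brevity, it uses a single inductive relation, whereas our definition needs mutually inductive types to handle expression lists.

```coq
(* Hypothetical toy language. *)
Inductive Exp : Type :=
| ELit (n : nat)
| EPlus (e1 e2 : Exp)
| EThrow.

Inductive Outcome : Type :=
| OVal (v : nat)
| OExc.

(* Extended terms: source expressions plus intermediate terms that store
   the outcome of an already evaluated subterm. *)
Inductive ExtExp : Type :=
| Base (e : Exp)
| Plus1 (o1 : Outcome) (e2 : Exp)
| Plus2 (v1 : nat) (o2 : Outcome).

(* Outcomes that are propagated unchanged. *)
Inductive abort : Outcome -> Prop :=
| abort_exc : abort OExc.

Inductive pbs : ExtExp -> Outcome -> Prop :=
| PLit : forall n, pbs (Base (ELit n)) (OVal n)
| PThrow : pbs (Base EThrow) OExc
| PPlus : forall e1 e2 o1 o,
    pbs (Base e1) o1 -> pbs (Plus1 o1 e2) o -> pbs (Base (EPlus e1 e2)) o
| PPlus1Abort : forall o1 e2, abort o1 -> pbs (Plus1 o1 e2) o1
| PPlus1 : forall v1 e2 o2 o,
    pbs (Base e2) o2 -> pbs (Plus2 v1 o2) o -> pbs (Plus1 (OVal v1) e2) o
| PPlus2Abort : forall v1 o2, abort o2 -> pbs (Plus2 v1 o2) o2
| PPlus2 : forall v1 v2, pbs (Plus2 v1 (OVal v2)) (OVal (v1 + v2)).

(* The derivation is built one rule at a time. *)
Example pbs_sum : pbs (Base (EPlus (ELit 2) (ELit 3))) (OVal 5).
Proof.
  eapply PPlus.
  - apply PLit.
  - eapply PPlus1.
    + apply PLit.
    + exact (PPlus2 2 3).  (* 2 + 3 reduces to 5 by conversion *)
Qed.
```

In pbs_sum the intermediate outcomes (o1 of PPlus and o2 of PPlus1) are found by unification through eapply; only the final step is closed with an explicit instance of PPlus2.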
Expression evaluation.
The first problem we encounter is that the traditional big-step and pretty-big-step semantics are relational semantics (defined by an inductive type) and are not inherently executable. To describe an expression evaluation, we need to prove the evaluation step by step using the inference rules (the constructors of the inductive type). As discussed before, we can also create an evaluation tactic which can find the proof for an expression evaluation; however, this tactic is not efficient: it takes unreasonable amounts of memory and time (see Figure 10). The pretty-big-step semantics is more efficient to use than the traditional one, because for a given goal at most one derivation rule can match syntactically (except in the case of the different application exceptions), thus no backtracking is needed. However, it is still not efficient enough, especially compared to the functional approach. We can also see (Figure 10) that the proof length of simple expression evaluations in the traditional and pretty-big-step semantics is similar. However, in the pretty-big-step semantics we used many more inference rules to reach the result, while with the traditional semantics we had to use the inversion tactic several times and specify results by hand. This is because expression list evaluation is not described step by step but by universally quantified predicates, so we needed to input the resulting list of values and side effects (e.g. eff and vs in eval_all) during formal evaluation (alternatively, list unfolding lemmas based on the length can also solve this problem). This issue is not present in the pretty-big-step semantics, because lists are handled step by step by StepList_pretty (again, resembling small-step evaluation).
All in all, the complexity of these proofs is similar in both relational approaches. On the other hand, the functional big-step semantics is inherently executable (because it is just a recursive function), so expressions can simply be evaluated with it; we just have to pick an appropriate initial clock value (recursion limit).

Figure 11: Complexity of expression equivalence proofs
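The difference in executability can be seen on a hypothetical two-construct language (ELit and EPlus, without exceptions or side effects; names ours). The sketch below defines a traditional relational big-step semantics eval_rel and a clocked functional counterpart eval_fun, in which None stands in for Timeout (a simplification of the result types of Figure 7). To prove an evaluation with the relation, the intermediate values of RPlus cannot be inferred from the conclusion v1 + v2 and have to be supplied by hand; the function is simply run with a large enough clock.

```coq
(* Hypothetical toy language. *)
Inductive Exp : Type :=
| ELit (n : nat)
| EPlus (e1 e2 : Exp).

(* Traditional (relational) big-step semantics. *)
Inductive eval_rel : Exp -> nat -> Prop :=
| RLit : forall n, eval_rel (ELit n) n
| RPlus : forall e1 e2 v1 v2,
    eval_rel e1 v1 -> eval_rel e2 v2 -> eval_rel (EPlus e1 e2) (v1 + v2).

(* Functional big-step counterpart; None plays the role of Timeout. *)
Fixpoint eval_fun (clock : nat) (e : Exp) : option nat :=
  match clock with
  | 0 => None
  | S c =>
    match e with
    | ELit n => Some n
    | EPlus e1 e2 =>
      match eval_fun c e1, eval_fun c e2 with
      | Some v1, Some v2 => Some (v1 + v2)
      | _, _ => None
      end
    end
  end.

Definition sample : Exp := EPlus (ELit 2) (EPlus (ELit 3) (ELit 4)).

(* Relational: a proof in which the intermediate results 2, 7, 3 and 4
   are given by hand. *)
Example rel_ex : eval_rel sample 9.
Proof.
  unfold sample.
  apply (RPlus _ _ 2 7).
  - apply RLit.
  - apply (RPlus _ _ 3 4); apply RLit.
Qed.

(* Functional: just run it with a sufficiently large clock. *)
Example fun_ex : eval_fun 10 sample = Some 9.
Proof. reflexivity. Qed.
```

If the clock is too small, eval_fun returns None; a larger clock does not change a result that has already been reached.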
Expression equivalence proofs.
As we can see (Figure 11), surprisingly, the expression equivalence proofs were the most complex in the traditional big-step semantics, while the functional big-step style performed very well. This is because in the traditional semantics we had to use list unfolding lemmas several times, which quickly increased the size of the proofs.

The use of pretty-big-step semantics was quite straightforward, and the equivalence proofs were not too complex. Once again, this is partly because no list unfolding lemmas were needed.

Based on the diagrams, one could think that there is no disadvantage to using functional big-step semantics, but that is not the case. While interactively proving the equivalences (and also the semantics properties), the intermediate subgoals and assumptions were hard to read and understand, because Coq usually oversimplified the function definition, and we often saw the whole definition of this semantics rather than just the necessary parts of it. We used the remember tactic [12] on the clock values, which prevented the oversimplification; however, this solution is not the most convenient one.
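Both the clock increasing lemma and the divergence of an expression (Timeout for every clock) are proved by simple induction over the clock. The following is a minimal sketch of these two proofs for a hypothetical toy language (ELit, EPlus, ELoop; results reduced to Val and Timeout; names ours), not our actual Coq development. To keep the goals readable without relying on how far simpl unfolds the definition, the sketch rewrites with a one-step unfolding lemma, eval_plus, which holds by computation; this is an alternative to the remember-based workaround described above, not the one used in our proofs.

```coq
(* Hypothetical toy language. *)
Inductive Exp : Type :=
| ELit (n : nat)
| EPlus (e1 e2 : Exp)
| ELoop.

Inductive Res : Type :=
| Val (v : nat)
| Timeout.

Fixpoint eval (clock : nat) (e : Exp) : Res :=
  match clock with
  | 0 => Timeout
  | S c =>
    match e with
    | ELit n => Val n
    | ELoop => eval c ELoop
    | EPlus e1 e2 =>
      match eval c e1 with
      | Val v1 =>
        match eval c e2 with
        | Val v2 => Val (v1 + v2)
        | Timeout => Timeout
        end
      | Timeout => Timeout
      end
    end
  end.

(* One-step unfolding lemma; it holds by computation. *)
Lemma eval_plus : forall c e1 e2,
  eval (S c) (EPlus e1 e2) =
  match eval c e1 with
  | Val v1 =>
    match eval c e2 with
    | Val v2 => Val (v1 + v2)
    | Timeout => Timeout
    end
  | Timeout => Timeout
  end.
Proof. reflexivity. Qed.

(* Clock increasing lemma: a successful result survives a larger clock. *)
Lemma clock_increase :
  forall clock e v, eval clock e = Val v -> eval (S clock) e = Val v.
Proof.
  induction clock as [| c IH]; intros e v H.
  - (* clock = 0: eval 0 e is Timeout, contradiction *)
    simpl in H. discriminate H.
  - destruct e as [n | e1 e2 |].
    + (* ELit: both sides compute to Val n *)
      exact H.
    + (* EPlus: unfold one step on both sides, then use IH twice *)
      rewrite eval_plus in H. rewrite eval_plus.
      destruct (eval c e1) as [v1 |] eqn:E1; simpl in H; [| discriminate H].
      destruct (eval c e2) as [v2 |] eqn:E2; simpl in H; [| discriminate H].
      rewrite (IH e1 v1 E1), (IH e2 v2 E2).
      exact H.
    + (* ELoop: both sides compute one step down *)
      exact (IH ELoop v H).
Qed.

(* Divergence in the functional style: Timeout for every clock. *)
Lemma loop_diverges : forall clock, eval clock ELoop = Timeout.
Proof.
  induction clock as [| c IH].
  - reflexivity.
  - exact IH.
Qed.
```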
Semantics property proofs.
For semantics property proofs, we chose determinism in the case of the pretty-big-step and traditional big-step semantics, and a clock increasing lemma in the case of the functional big-step semantics.

Footnote (list unfolding lemmas): for example, a list of length n can be described as [x_1, x_2, .., x_n], with x_1, x_2, .., x_n being existential variables.

The determinism proof for the traditional approach is very complex. We needed to use various helper theorems about (partial) evaluation of lists of expressions and a lot of case distinctions. Still, the proof is quite long.

Proving determinism of the pretty-big-step approach was very simple: we did not need to create any helper lemmas, we used only a few case distinctions, and the proof is short in spite of having to use a mutual induction principle.

The proof complexity of the clock increasing lemma for the functional big-step semantics is between the previous two. We had to create and prove one helper theorem, and use several case distinctions. However, this proof is not as complex as the determinism proof for the traditional style, because we used simple induction over the clock variable. This style of induction is usable because the clock is decreased in every recursive call of the semantics. We should also note that while interactively proving this theorem, the subgoals were difficult to understand for the reasons mentioned before.

In addition, we also proved the equivalence of these approaches by induction: between the pretty-big-step and functional approaches, and between the traditional big-step and functional approaches. Thereafter, using these two results, we also proved the equivalence of the traditional and pretty-big-step semantics. We encountered one difficulty: while proving the equivalence of the pretty-big-step and functional big-step semantics, mutual induction could not be used (in the functional big-step semantics, we cannot give a meaning to “intermediate terms”).
To solve this problem, we followed in the footsteps of Charguéraud's [22] formalisation and defined another (equivalent) version of the pretty-big-step semantics, equipped with a counter which is increased whenever a derivation rule of the semantics is used, so that we could use induction over this counter. We proved the equivalence using this semantics as an intermediate step.

There are two concepts which were not investigated in detail: concurrency and divergence. In general, a big-step semantics cannot express concurrency efficiently, because it cannot handle interleaving. For this purpose, a small-step approach is more suitable.

Footnote (clock increasing lemma): when we evaluate an expression and get a result with the constructor Result, we can increase the initial clock and get the same result.

Footnote (induction over the clock): alternatively, we could have used functional induction (similar to the one mentioned by Owens et al. [21]); however, we faced the limitations of Coq when trying to generate the induction principle.

In the functional big-step semantics, divergence can be characterised as Owens et al. [21] described: the evaluation is divergent when for any possible clock value the result is
Timeout. We also proved an expression evaluation divergent using this idea and an induction on the clock.

The previously described relational approaches are suitable to describe the semantics of terminating expressions; however, they cannot effectively express divergence. If one wants to reason about divergence too, a coinductive big-step semantics can be used. We have found the work of Leroy and Grall [16] the most influential: they define a semantics for the λ-calculus extended with constants, and they also extend this semantics with traces, a feature similar to our side-effect logging approach. Moreover, they implemented these semantics in Coq, and the source is publicly available.

We followed in their footsteps to define a coinductive big-step semantics for our case study language (in particular, for applications) with a distinct relation. For the divergence rules, we also needed infinite traces for side effects. However, this approach is not straightforward to use because of the guardedness of subgoals, and we are still investigating this issue.

In conclusion, we used various approaches (primarily traditional big-step, pretty-big-step and functional big-step semantics) to define the semantics of a functional programming language, and used a small subset of sequential Core Erlang as a case study. We proved the equivalence of these semantics, and evaluated and compared them from different aspects in order to choose the most fitting way to reason about refactoring correctness. Each of them has its advantages and disadvantages. Our three main aspects were the executability of the approach, the complexity of expression equivalence proofs, and the complexity of proofs about the properties of the semantics; from this point of view, the functional big-step style semantics proved to be the most useful. We highlight the fact that these semantics and proofs are all formalised in Coq [24].

In the future, we plan to formalise functional big-step semantics for sequential Core Erlang to enable effective testing of the semantics, and then use comparative testing of our Core Erlang semantics and a small-step Erlang semantics [15] defined by one of our former project members. Naturally, we also plan to prove the existing big-step and the mentioned functional big-step semantics equivalent after finishing the implementation. We also plan to investigate the coinductive approach in more detail. Our long-term goal is to formalise the entire Core Erlang and Erlang languages in Coq to reason about refactoring correctness of Erlang programs.
Acknowledgements
This work was supported by the project “Integrált kutatói utánpótlás-képzési program az informatika és számítástudomány diszciplináris területein (Integrated program for training new generation of researchers in the disciplinary fields of computer science)”, No. EFOP-3.6.3-VEKOP-16-2017-00002. The project has been supported by the European Union and co-funded by the European Social Fund.

The “Application Domain Specific Highly Reliable IT Solutions” project has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the Thematic Excellence Programme TKP2020-NKA-06 (National Challenges Subprogramme) funding scheme.
References

[1] Nada Amin and Tiark Rompf. Type Soundness Proofs with Definitional Interpreters. SIGPLAN Not., 52(1):666–679, January 2017.
[2] Casper Bach Poulsen and Peter D. Mosses. Deriving Pretty-Big-Step Semantics from Small-Step Semantics. In Zhong Shao, editor, Programming Languages and Systems, pages 270–289, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg.
[3] Péter Bereczky, Dániel Horpácsi, and Simon Thompson. A Proof Assistant Based Formalisation of a Subset of Sequential Core Erlang. In Aleksander Byrski and John Hughes, editors, Trends in Functional Programming, pages 139–158, Cham, 2020. Springer International Publishing.
[4] Péter Bereczky, Dániel Horpácsi, and Simon J. Thompson. Machine-Checked Natural Semantics for Core Erlang: Exceptions and Side Effects. In Proceedings of the 19th ACM SIGPLAN International Workshop on Erlang, Erlang 2020, pages 1–13, New York, NY, USA, 2020. Association for Computing Machinery.
[5] Sandrine Blazy and Xavier Leroy. Mechanized Semantics for the Clight Subset of the C Language. Journal of Automated Reasoning, 43(3):263–288, July 2009.
[6] Martin Bodin, Thomas Jensen, and Alan Schmitt. Certified Abstract Interpretation with Pretty-Big-Step Semantics. In Proceedings of the 2015 Conference on Certified Programs and Proofs, CPP ’15, pages 29–40, New York, NY, USA, 2015. Association for Computing Machinery.
[7] Richard Carlsson, Björn Gustavsson, Erik Johansson, Thomas Lindgren, Sven-Olof Nyström, Mikael Pettersson, and Robert Virding. Core Erlang 1.0.3 language specification. Technical report, 2004.
[8] Arthur Charguéraud. Pretty-Big-Step Semantics. In Matthias Felleisen and Philippa Gardner, editors, Programming Languages and Systems, pages 41–60, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[9] Ștefan Ciobâcă. From Small-Step Semantics to Big-Step Semantics, Automatically. In Einar Broch Johnsen and Luigia Petre, editors, Integrated Formal Methods, pages 347–361, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[10] Ștefan Ciobâcă, Dorel Lucanu, Vlad Rusu, and Grigore Roșu. A language-independent proof system for mutual program equivalence. In Stephan Merz and Jun Pang, editors, Formal Methods and Software Engineering, pages 75–90, Cham, 2014. Springer International Publishing.
[11] The Coq Proof Assistant Documentation. https://coq.inria.fr/documentation, 2020. Accessed on October 1st, 2020.
[12] Ltac documentation. https://coq.inria.fr/refman/proof-engine/ltac.html, 2020. Accessed on September 22nd, 2020.
[13] A. Garrido and J. Meseguer. Formal specification and verification of Java refactorings. In , pages 165–174, 2006.
[14] G. Kahn. Natural semantics. In Franz J. Brandenburg, Guy Vidal-Naquet, and Martin Wirsing, editors, STACS 87, pages 22–39, Berlin, Heidelberg, 1987. Springer Berlin Heidelberg.
[15] Judit Kőszegi. KErl: Executable semantics for Erlang. CEUR Workshop Proceedings, 2046:144–160, 2018.
[16] Xavier Leroy and Hervé Grall. Coinductive big-step operational semantics. Information and Computation, 207(2):284–304, 2009. Special issue on Structural Operational Semantics (SOS).
[17] Keiko Nakata and Masahito Hasegawa. Small-step and big-step semantics for call-by-need. Journal of Functional Programming, 19(6):699–722, 2009.
[18] Keiko Nakata and Tarmo Uustalu. Trace-Based Coinductive Operational Semantics for While. In Stefan Berghofer, Tobias Nipkow, Christian Urban, and Makarius Wenzel, editors, Theorem Proving in Higher Order Logics, pages 375–390, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
[19] Core Erlang Formalization. https://github.com/harp-project/Core-Erlang-Formalization, 2020. Accessed on September 24th, 2020.
[20] Tobias Nipkow and Gerwin Klein. Concrete Semantics: with Isabelle/HOL. Springer, 2014.
[21] Scott Owens, Magnus O. Myreen, Ramana Kumar, and Yong Kiam Tan. Functional Big-Step Semantics. In Peter Thiemann, editor, Programming Languages and Systems, pages 589–615, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg.
[22] Pretty-big-step semantics formalisation. . Accessed on October 26th, 2020.
[23] John C. Reynolds. Definitional Interpreters for Higher-Order Programming Languages. In Proceedings of the ACM Annual Conference - Volume 2, ACM ’72, pages 717–740, New York, NY, USA, 1972. Association for Computing Machinery.
[24] Semantics comparison. https://github.com/harp-project/Semantics-comparison, 2020. Accessed on October 26th, 2020.
[25] Yong Kiam Tan, Magnus O. Myreen, Ramana Kumar, Anthony Fox, Scott Owens, and Michael Norrish. A New Verified Compiler Backend for CakeML. SIGPLAN Not., 51(9):60–73, September 2016.
[26] Z. Yang and H. Lei. FEther: An Extensible Definitional Interpreter for Smart-Contract Verifications in Coq.