A Comparison of Big-step Semantics Definition Styles

Abstract

Formal semantics provides rigorous, mathematically precise definitions of programming languages, with which we can argue about program behaviour and program equivalence by formal means; in particular, we can describe and verify our arguments with a proof assistant. There are various approaches to giving formal semantics to programming languages, at different abstraction levels and applying different mathematical machinery: the reason for using the semantics determines which approach to choose. In this paper we investigate some of the approaches that share their roots with traditional relational big-step semantics, such as (a) functional big-step semantics (or, equivalently, a definitional interpreter), (b) pretty-big-step semantics and (c) traditional natural semantics. We compare these approaches with respect to the following criteria: executability of the semantics definition, proof complexity for typical properties (e.g. determinism) and the conciseness of expression equivalence proofs in that approach. We also briefly discuss the complexity of these definitions and the coinductive big-step semantics, which enables reasoning about divergence. To enable the comparison in practice, we present an example language for comparing the semantics: a sequential subset of Core Erlang, a functional programming language, which is used in the intermediate steps of the Erlang/OTP compiler. We have already defined a relational big-step semantics for this language that includes treatment of exceptions and side effects. The aim of this current work is to compare our big-step definition for this language with a variety of other equivalent semantics in different styles from the point of view of testing and verifying code refactorings.


Péter Bereczky, Dániel Horpácsi, Simon Thompson

ELTE Eötvös Loránd University, Department of Programming Languages and Compilers; University of Kent, School of Computing

This work is part of a wider project that aims to reason about the correctness of code refactoring. To this end, a rigorous, formal definition is needed for the programming language under refactoring: in our case, Erlang. In earlier work, we developed a relational big-step semantics for sequential Core Erlang, including exceptions and side effects. This semantics is used in general proofs of characteristic properties (e.g. determinism) as well as proofs of equivalence between pairs of pattern expressions. The latter are important from the refactoring point of view: pattern equivalences can be interpreted as simple, correct refactorings for Core Erlang [3, 4]. Building on these simple equivalences, we plan to prove compound code transformations correct.

Formalising Core Erlang in the big-step operational style was a somewhat ad hoc decision, supported by the following facts: it is not as detailed as small-step definitions, offering shorter proofs, and, at the same time, unlike in denotational definitions, semantics and proofs of nondeterministic and divergent programs do not need special treatment in the proof assistant embedding. Nonetheless, relational big-step semantics comes with its drawbacks: in general, it is not directly executable, the proof of determinism is complex, and we cannot use this style of semantics to argue about concurrency. After working with the relational big-step semantics formalisation for a while, the shortcomings of this approach became apparent, and we decided to investigate whether other semantics definition styles would be more suitable.

It seems to be a simple choice: the purpose of defining the semantics should determine the applied definition approach. However, conflicting requirements can make the decision unclear. For instance, in related work, different approaches have been applied to reason about program transformations: Grigore et al. [10] and Garrido et al. [13] use (reduction style) small-step semantics, whilst Owens et al. [21] use (functional) big-step semantics. Both of these are executable and can be used to argue about program equivalence, but show different characteristics in general. There are a number of ways to create a testable and usable formal semantics, as, for example, addressed in a related discussion by Blazy and Leroy [5], but it is not obvious which is the best option for our purposes. Moreover, this choice is not only about the different mathematical approaches, but also about how easy it is to implement them in the Coq proof assistant.

In this paper we analyse and compare different methods of defining big-step style semantics for a small, Erlang-like programming language. We do this to answer the question of which method should be used when creating a semantic description of sequential Core Erlang, when the description should support equivalence proofs and be efficiently executable. In doing this we survey the following methods: (a) traditional relational big-step semantics [14], (b) pretty-big-step semantics [8], and (c) functional big-step semantics [21], which can be seen as equivalent to supplying a definitional interpreter [23]. We also briefly discuss a coinductive approach to defining big-step semantics [16]. When comparing the semantic approaches, we aim to answer the following questions:

1. Does the semantics definition scale in terms of the complexity of expression equivalence proofs? Since our primary purpose is to prove expression pattern equivalences, the semantics has to be especially supportive of constructing such proofs.

2. Is the semantics effectively executable, allowing for automatic evaluation of expressions? Is this automatic execution efficient, with a performance comparable to a reference implementation? Execution of the semantics definition is crucial when it comes to validation: testing the semantics against a reference implementation needs the semantics to be executed.

3. How complex are the proofs for the common properties such as determinism or progress? For instance, some semantics are inherently deterministic, because they are presented as a semantic function, while it is a lot more cumbersome to prove this property in a relational semantics.

We note that the paper not only surveys the above-mentioned semantics definition styles, but also implements a benchmark language in each of them and bases the detailed comparison on this case study. Namely, we make the following main contributions:

• Traditional big-step, pretty-big-step and functional big-step semantics definitions for a simple functional programming language resembling sequential Core Erlang; moreover, we prove the equivalence of these definitions.

• Proofs of basic properties of each semantics and proofs for simple expression pattern equivalences (local refactorings) in each definition style.

• A systematic comparison of the approaches with respect to execution and proof complexity.

We will often quote Coq code to highlight the fact that all these concepts have been formalized in Coq [24]. Inductive constructors in the relational semantics are described as inference rules.

The rest of the paper is structured as follows. In Section 2 we describe the syntax and the necessary abstractions for our benchmark language. In Section 3 we discuss the traditional big-step and the pretty-big-step semantics, and in Section 4 we cover the functional approaches, in particular the functional big-step semantics. Section 5 evaluates the presented approaches, and also briefly summarises coinductive big-step semantics [16]. Finally, Section 6 concludes and discusses future work.

Throughout the paper, we define formal semantics for a simple but representative functional programming language, which resembles Erlang; in fact, our case study language is a proper subset of Core Erlang. In this section, we introduce the syntax and a semantic domain for the language, based on which the later sections will define big-step operational semantics of different styles in order to make a systematic comparison between them.

The case study language includes abstractions known from the functional paradigm (such as single assignment variables, let-binding, lambda abstraction and function application), but we also incorporate impure expressions (such as I/O calls and exception handling). Furthermore, the language supports recursive function definitions (letrec-binding), but only one name can be bound by each expression. Figure 1 defines the syntax of the language precisely, as an inductive type.

    Inductive Expression : Type :=
    | ELit (l : Literal)
    | EVar (v : Var)
    | EFunId (f : FunctionIdentifier)
    | EFun (vl : list Var) (e : Expression)
    | ECall (f : string) (params : list Expression)
    | EApp (exp : Expression) (params : list Expression)
    | ELet (v : Var) (e b : Expression)
    | ELetRec (fid : FunctionIdentifier) (params : list Var) (b e : Expression)
    | ETry (e1 : Expression) (v : Var) (e2 : Expression) (vl : list Var) (e3 : Expression).

(Literals are either atoms or integers.)

Figure 1: The syntax of our case study language (subset of Core Erlang)

This language has expressions of three types: atoms, integers and functions. Therefore, values of expressions can only be literal values and closures (see Figure 2). Closures are the normal forms of functions, and store the function's parameter list, body expression and an evaluation environment in which the body should be evaluated, as well as the collection of recursive functions defined simultaneously. (The presented approach is based on our previous work and is fairly general: it can handle multiple simultaneous function definitions, not only one; see [4] for more details.)

    Inductive Value : Type :=
    | VLit (l : Literal)
    | VClos (ref : Environment) (ext : list (FunctionIdentifier × FunctionExpression))
            (vl : list Var) (e : Expression).

    Definition Exception := ExceptionClass × Value × Value.

Figure 2: Semantic domain

Exceptions may also be the results of expression evaluations. In our formalisation, exceptions are represented as triples: an exception class (error, throw or exit) and two values describing the exception reason. In our case studies, we will use two often seen exceptions known from Erlang: badarity happens when an application evaluation fails due to the wrong number of parameters, and badfun is encountered when the main expression of the application evaluates to a value that is not a function closure.

Finally, we define the semantic domain as the union of values and exception descriptions: Value + Exception. In the formalisation, we use Coq's built-in union type with the standard inl and inr constructors to make elements of the semantic domain.

In order to share as much as possible between the different semantics definitions, we not only fix the semantic domain, but also define a common type for the evaluation environment. Basically, this is a collection of variable names (and function identifiers) mapped to values. There are several helper functions to manage this environment, namely:

• get_value: Returns the value associated with a given name. If the name is unbound, it yields an exception.

• insert_value: Inserts a binding into the environment.

• append_vars_to_env: Inserts several variable bindings into the environment.

• append_funs_to_env: Inserts function identifier-closure bindings into the environment.

We remind the reader that the case study language allows for calling some built-in I/O functions, thus the semantics will need to address the meaning of these side effects. For this, we define a type (SideEffectList), the values of which log simple input-output effects produced by the evaluation of specific ECall expressions. While evaluating ECall expressions, we use the auxiliary eval function, which returns a value or exception and a side effect trace; only this operation can extend the side effect trace. In our previous work [4] we applied a slightly different method using standard list append operations in every derivation rule; however, this former approach had to be refined in order to support automatic evaluation of expressions.
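To make the role of the environment helpers concrete, the following is a minimal Coq sketch of an association-list environment. It is not the formalisation used in the paper: the placeholder type `Val`, the comparison function `key_eqb`, and the use of `option` (instead of returning an exception for unbound names) are simplifying assumptions made only for this illustration.

```coq
Require Import Coq.Strings.String Coq.Lists.List Coq.Arith.PeanoNat.
Import ListNotations.
Open Scope string_scope.

(* Assumed representations: variables are strings, function identifiers
   are name/arity pairs. *)
Definition Var := string.
Definition FunctionIdentifier := (string * nat)%type.
Definition Key := (Var + FunctionIdentifier)%type.

(* Placeholder value type; the paper's Value type has literals and closures. *)
Inductive Val := VInt (n : nat) | VAtom (s : string).

Definition Env := list (Key * Val).

(* Hypothetical helper: boolean equality on environment keys. *)
Definition key_eqb (k1 k2 : Key) : bool :=
  match k1, k2 with
  | inl v1, inl v2 => String.eqb v1 v2
  | inr (f1, n1), inr (f2, n2) => andb (String.eqb f1 f2) (Nat.eqb n1 n2)
  | _, _ => false
  end.

(* Lookup; None stands in for the exception produced on unbound names. *)
Fixpoint get_value (env : Env) (k : Key) : option Val :=
  match env with
  | [] => None
  | (k', v) :: rest => if key_eqb k k' then Some v else get_value rest k
  end.

(* Insert a binding, overwriting an existing binding of the same key. *)
Fixpoint insert_value (env : Env) (k : Key) (v : Val) : Env :=
  match env with
  | [] => [(k, v)]
  | (k', v') :: rest =>
      if key_eqb k k' then (k, v) :: rest
      else (k', v') :: insert_value rest k v
  end.

Example lookup_after_insert :
  get_value (insert_value [] (inl "X") (VInt 4)) (inl "X") = Some (VInt 4).
Proof. reflexivity. Qed.
```

Because both helpers are ordinary structurally recursive functions, lookups on concrete environments reduce by computation, which is what an executable semantics needs.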

As mentioned already in the introduction, we will compare the different approaches of defining big-step semantics based on the following criteria:

• How complex is it to prove the properties of the semantics? We will use the determinism property to investigate this.

• Is the approach executable? Is the semantics efficiently executable?

• How complex is it to prove expression evaluation formally? We will use two smaller expressions to investigate this:

    let X = fun(Y, Z) -> Y in
      apply X('a', 'b')

    let X = 4 in
      let Y = 5 in
        apply (fun(X, Y) -> X + Y)(X, Y)

Listing 1: Expression evaluation examples

• How complex is it to prove expression equivalence? We will use one unconditional and one conditional equivalence to investigate this:

    e  ⇔  let X = fun() -> e in apply X()

    let A = e1 in            let B = e2 in
      let B = e2 in     ⇔      let A = e1 in
        A + B                    A + B

Listing 2: Expression equivalence examples

(In the second example, the side effects produced by e1 and e2 are swapped during the evaluation of these expressions.)

A traditional big-step operational semantics is a relation between the evaluable expression and its value, or more generally, between initial and final configurations, where the configurations may include the evaluation environment or the side effects of the evaluation. Note that in big-step style, the intermediate stages of the evaluation are not visible from the relation [20]. This style of semantics originates from Kahn [14]. In Coq, such a relation can be formalised with an inductive type, where the data constructors represent the derivation rules (or judgements).

Traditional inductive big-step semantics are used in many projects, to mention but a few: deriving such a semantics from a small-step definition [9], call-by-need semantics of let and letrec calculus (λ_let, λ_letrec) [17], or the trace-based operational semantics for While [18] (this one is defined coinductively), as well as our project defining Core Erlang [3, 4, 19].

For the investigation of the different big-step definition styles, we reuse our Core Erlang formalisation mentioned above, but discard parts of it since the case study language used in the comparison is a subset of it. The big-step semantics will be denoted by $\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_2\}$, where $\Gamma$ is the evaluation environment, $\mathit{exp}$ is the evaluable expression, $\mathit{eff}_1$ and $\mathit{eff}_2$ are the initial and final side effect traces and $\mathit{res}$ is the result, which is either a value or an exception. Before describing the semantics, we introduce some predicates and notations for readability about evaluating a list of expressions (we use $|l|$ to denote the length of list $l$, $S\ i$ denotes the successor of $i$ and $l[i]$ denotes the $i$th element of $l$). The function $\mathit{nth\_def}\ l\ \mathit{default}\ i$ works the same way as $l[i-1]$ for $i > 0$, but for $i = 0$ it returns the default value. We also use Coq's standard last function [11].

\[
\begin{array}{l}
\mathit{eval\_all}\ \Gamma\ \mathit{es}\ \mathit{vs}\ \mathit{eff}_1\ \mathit{eff} := (|\mathit{es}| = |\mathit{vs}|) \Rightarrow (|\mathit{es}| = |\mathit{eff}|) \Rightarrow {} \\
\quad (\forall j < |\mathit{es}|,\ \langle \Gamma, \mathit{es}[j], \mathit{nth\_def}\ \mathit{eff}\ \mathit{eff}_1\ j \rangle \Downarrow \{\mathit{inl}\ \mathit{vs}[j], \mathit{nth\_def}\ \mathit{eff}\ \mathit{eff}_1\ (S\ j)\})
\end{array}
\]

\[
\begin{array}{l}
\mathit{eval\_prefix}\ \Gamma\ \mathit{es}\ \mathit{vs}\ i\ \mathit{eff}_1\ \mathit{eff} := (i < |\mathit{es}|) \Rightarrow (|\mathit{vs}| = i) \Rightarrow (|\mathit{eff}| = i) \Rightarrow {} \\
\quad (\forall j < i,\ \langle \Gamma, \mathit{es}[j], \mathit{nth\_def}\ \mathit{eff}\ \mathit{eff}_1\ j \rangle \Downarrow \{\mathit{inl}\ \mathit{vs}[j], \mathit{nth\_def}\ \mathit{eff}\ \mathit{eff}_1\ (S\ j)\})
\end{array}
\]

The eval_all predicate states that an expression list $\mathit{es}$ evaluates to a value list $\mathit{vs}$ (note that this formula also expresses that the evaluation of the expressions does not produce any exceptions). The evaluation for the $j$th step starts with the side effect log $\mathit{eff}[j-1]$ (or the default initial log $\mathit{eff}_1$) and the result log is $\mathit{eff}[j]$. The eval_prefix predicate describes the same behaviour, but only for the first $i$ elements. Now we can describe the big-step semantics for our case study language (Figure 3 shows the evaluation of expressions without exceptions, and Figure 4 explains the exceptional semantics).

The traditional, relational big-step semantics introduced in the previous section is not inherently executable or computable: for a given pair of starting and final configurations, it must be proven that they are related by the operational semantics. In Coq, such a proof can be given in terms of proof primitives, or one can write a program in the tactic language to construct the proof. Automatic execution of relational semantics can be done with the latter. We could also see the evaluation tactics as a machinery that can turn the relational semantics into a functional one: a program in the tactic language can perform pattern matching, case distinction and even recursion, and can ultimately compute the results of the relation.
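The difference between stating and executing a relational semantics can be seen on a deliberately tiny example. The sketch below is an assumption-laden illustration, not the Core Erlang semantics: the language (`Exp` with numerals and addition), the relation `eval` and the rule names `E_Num` and `E_Plus` are all hypothetical.

```coq
(* Toy expression language: numerals and addition. *)
Inductive Exp := Num (n : nat) | Plus (e1 e2 : Exp).

(* Relational big-step semantics: each constructor is a derivation rule. *)
Inductive eval : Exp -> nat -> Prop :=
| E_Num : forall n, eval (Num n) n
| E_Plus : forall e1 e2 n1 n2,
    eval e1 n1 -> eval e2 n2 -> eval (Plus e1 e2) (n1 + n2).

(* "Running" the semantics means building a derivation by hand ... *)
Example eval_manual : eval (Plus (Num 4) (Num 5)) (4 + 5).
Proof. apply E_Plus; apply E_Num. Qed.

(* ... or letting tactics search for both the result and its derivation. *)
Example eval_search :
  exists n, eval (Plus (Num 4) (Plus (Num 1) (Num 2))) n.
Proof. eexists. repeat econstructor. Qed.
```

In the second example the result is left as an existential variable, and the proof search instantiates it while applying the constructors; this is the sense in which tactics can act as an evaluator for a relation.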

Limitations of Coq tactics.

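Executing the relational semantics in Coq involves technical considerations: one needs to make sure that the operational semantics derivation rules do not contain auxiliary function calls in their conclusions. Otherwise, the Coq tactic language cannot do simple pattern matching on the proof goals, which prevents syntax-directed evaluation. In our semantics, we needed to apply minor changes in the derivation rule of variables and at uses of the append operation on side effect logs in our Core Erlang semantics [4]. The issue has been solved by refactoring: we replaced the auxiliary function applications with fresh variables and added extra premises stating equality between the variables and the corresponding function applications.

On the other hand, in case of the side effects (and the mentioned append operations), to avoid the introduction of unreasonable numbers of new variables, we changed the use of these traces. Note that currently only ECall expressions can cause new side effects; the other rules just have to propagate the logs. Instead of handling only the additional side effects of an expression evaluation step, we always use the whole initial and final side effect traces (i.e. not only the difference as in [4]). This way we could dispose of the append operations in the conclusions of the derivation rules.

(In the following figures, the result $\mathit{res}$ could be either a value or an exception, so its type is Value + Exception.)

\[
\frac{}{\langle \Gamma, \mathit{ELit}\ l, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inl}\ (\mathit{VLit}\ l), \mathit{eff}_1\}}\ (\textsc{Lit})
\qquad
\frac{\mathit{res} = \mathit{get\_value}\ \Gamma\ (\mathit{inl}\ s)}{\langle \Gamma, \mathit{EVar}\ s, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_1\}}\ (\textsc{Var})
\]

\[
\frac{}{\langle \Gamma, \mathit{EFun}\ \mathit{vl}\ e, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inl}\ (\mathit{VClos}\ \Gamma\ \langle\rangle\ \mathit{vl}\ e), \mathit{eff}_1\}}\ (\textsc{Fun})
\qquad
\frac{\mathit{res} = \mathit{get\_value}\ \Gamma\ (\mathit{inr}\ \mathit{fid})}{\langle \Gamma, \mathit{EFunId}\ \mathit{fid}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_1\}}\ (\textsc{FunId})
\]

\[
\frac{\mathit{eval\_all}\ \Gamma\ \mathit{params}\ \mathit{vals}\ \mathit{eff}_1\ \mathit{eff} \qquad \mathit{eval}\ \mathit{fname}\ \mathit{vals}\ (\mathit{last}\ \mathit{eff}\ \mathit{eff}_1) = (\mathit{res}, \mathit{eff}_2)}{\langle \Gamma, \mathit{ECall}\ \mathit{fname}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_2\}}\ (\textsc{Call})
\]

\[
\frac{\begin{array}{c}
\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inl}\ (\mathit{VClos}\ \mathit{ref}\ \mathit{ext}\ \mathit{var\_list}\ \mathit{body}), \mathit{eff}_2\} \qquad |\mathit{var\_list}| = |\mathit{vals}| \qquad \mathit{eval\_all}\ \Gamma\ \mathit{params}\ \mathit{vals}\ \mathit{eff}_2\ \mathit{eff} \\
\langle \mathit{append\_vars\_to\_env}\ \mathit{var\_list}\ \mathit{vals}\ (\mathit{get\_env}\ \mathit{ref}\ \mathit{ext}), \mathit{body}, \mathit{last}\ \mathit{eff}\ \mathit{eff}_2 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}
\end{array}}{\langle \Gamma, \mathit{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}\ (\textsc{App})
\]

\[
\frac{\langle \Gamma, e, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inl}\ \mathit{val}, \mathit{eff}_2\} \qquad \langle \mathit{insert\_value}\ \Gamma\ v\ \mathit{val}, b, \mathit{eff}_2 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}{\langle \Gamma, \mathit{ELet}\ v\ e\ b, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}\ (\textsc{Let})
\]

\[
\frac{\langle \mathit{append\_funs\_to\_env}\ [\mathit{fid}]\ [\mathit{params}]\ [b]\ \Gamma, e, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_2\}}{\langle \Gamma, \mathit{ELetRec}\ \mathit{fid}\ \mathit{params}\ b\ e, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_2\}}\ (\textsc{LetRec})
\]

Figure 3: The core traditional big-step definition of our case study language

\[
\frac{\langle \Gamma, e_1, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inl}\ \mathit{val'}, \mathit{eff}_2\} \qquad \langle \mathit{insert\_value}\ \Gamma\ v\ \mathit{val'}, e_2, \mathit{eff}_2 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}{\langle \Gamma, \mathit{ETry}\ e_1\ v\ e_2\ \mathit{vl}\ e_3, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}\ (\textsc{Try})
\]

\[
\frac{\begin{array}{c}
\langle \Gamma, e_1, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ (\mathit{ex}_1, \mathit{ex}_2, \mathit{ex}_3), \mathit{eff}_2\} \\
\langle \mathit{append\_try\_vars\_to\_env}\ \mathit{vl}\ [\mathit{exclass\_to\_value}\ \mathit{ex}_1; \mathit{ex}_2; \mathit{ex}_3]\ \Gamma, e_3, \mathit{eff}_2 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}
\end{array}}{\langle \Gamma, \mathit{ETry}\ e_1\ v\ e_2\ \mathit{vl}\ e_3, \mathit{eff}_1 \rangle \Downarrow \{\mathit{res}, \mathit{eff}_3\}}\ (\textsc{Catch})
\]

For the next rule, let us consider $\mathit{nonclosure}\ v := \forall\, \Gamma', \mathit{ext}, \mathit{var\_list}, \mathit{body} : v \neq \mathit{VClos}\ \Gamma'\ \mathit{ext}\ \mathit{var\_list}\ \mathit{body}$.

\[
\frac{\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inl}\ v, \mathit{eff}_2\} \qquad \mathit{eval\_all}\ \Gamma\ \mathit{params}\ \mathit{vals}\ \mathit{eff}_2\ \mathit{eff} \qquad \mathit{nonclosure}\ v \qquad \mathit{eff}_3 = \mathit{last}\ \mathit{eff}\ \mathit{eff}_2}{\langle \Gamma, \mathit{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ (\mathit{badfun}\ v), \mathit{eff}_3\}}\ (\textsc{AppExc})
\]

In the following rule, we denote $\mathit{VClos}\ \mathit{ref}\ \mathit{ext}\ \mathit{var\_list}\ \mathit{body}$ with $v$.

\[
\frac{\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inl}\ v, \mathit{eff}_2\} \qquad \mathit{eval\_all}\ \Gamma\ \mathit{params}\ \mathit{vals}\ \mathit{eff}_2\ \mathit{eff} \qquad |\mathit{var\_list}| \neq |\mathit{vals}| \qquad \mathit{eff}_3 = \mathit{last}\ \mathit{eff}\ \mathit{eff}_2}{\langle \Gamma, \mathit{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ (\mathit{badarity}\ v), \mathit{eff}_3\}}\ (\textsc{AppExc})
\]

\[
\frac{\mathit{eval\_prefix}\ \Gamma\ \mathit{params}\ \mathit{vals}\ i\ \mathit{eff}_1\ \mathit{eff} \qquad \langle \Gamma, \mathit{params}[i], \mathit{last}\ \mathit{eff}\ \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_2\}}{\langle \Gamma, \mathit{ECall}\ \mathit{fname}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_2\}}\ (\textsc{CallExc})
\]

\[
\frac{\langle \Gamma, e, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_2\}}{\langle \Gamma, \mathit{ELet}\ v\ e\ b, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_2\}}\ (\textsc{LetExc})
\qquad
\frac{\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_2\}}{\langle \Gamma, \mathit{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_2\}}\ (\textsc{AppExc})
\]

\[
\frac{\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inl}\ v, \mathit{eff}_2\} \qquad \mathit{eval\_prefix}\ \Gamma\ \mathit{params}\ \mathit{vals}\ i\ \mathit{eff}_2\ \mathit{eff} \qquad \langle \Gamma, \mathit{params}[i], \mathit{last}\ \mathit{eff}\ \mathit{eff}_2 \rangle \Downarrow \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_3\}}{\langle \Gamma, \mathit{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_3\}}\ (\textsc{AppExc})
\]

Figure 4: The traditional big-step operational semantics of exception creation and propagation

Evaluation tactic.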

We use Coq's tactic sublanguage called Ltac [12] to automate proof construction. In our case, the evaluation of the semantics of the case study language without exceptions is syntax-directed, i.e. a tactic can be designed to evaluate any expression in any context based on pattern matching on the expression to be evaluated (e.g. an ECall expression can be evaluated with Call). On the other hand, after introducing exceptions, several derivation rules are applicable for evaluating a particular expression (e.g. there are two rules for ECall, five rules for function applications, etc.). We extended the evaluation tactic to try applying the applicable rules one after the other. This can be seen as a backtracking proof search for a successful evaluation path.

As it turned out, such evaluation tactics in Coq are rather inefficient in terms of time and space. To speed up the execution, we can create some helper functions and prove lemmas about specific expressions (e.g. the evaluation of parameters which are just literals), so that the evaluation tactic can apply these lemmas before trying to evaluate an expression with the mentioned slow backtracking process. These lemmas can significantly speed up the evaluation of expressions which contain such specific sub-expressions; however, they only solve a small part of the problem.
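The two techniques discussed above, moving auxiliary function calls out of rule conclusions and evaluating by backtracking over the applicable rules, can be sketched on a toy language with a single exceptional outcome. Everything in this example (`Exp`, `Res`, the rule names and the tactic `solve_eval`) is hypothetical and far simpler than the tactic used for the Core Erlang semantics.

```coq
Require Import Coq.Arith.PeanoNat.

(* Toy language: numerals and division, which may fail. *)
Inductive Exp := Num (n : nat) | Div (e1 e2 : Exp).
Inductive Res := Ok (n : nat) | Err.

Inductive eval : Exp -> Res -> Prop :=
| E_Num : forall n, eval (Num n) (Ok n)
| E_Div : forall e1 e2 n1 n2 r,
    eval e1 (Ok n1) -> eval e2 (Ok (S n2)) ->
    (* the auxiliary call lives in an equality premise, so the
       conclusion stays a plain pattern that tactics can match *)
    r = Ok (Nat.div n1 (S n2)) ->
    eval (Div e1 e2) r
| E_DivZero : forall e1 e2 n1,
    eval e1 (Ok n1) -> eval e2 (Ok 0) -> eval (Div e1 e2) Err
| E_DivExc1 : forall e1 e2,
    eval e1 Err -> eval (Div e1 e2) Err
| E_DivExc2 : forall e1 e2 n1,
    eval e1 (Ok n1) -> eval e2 Err -> eval (Div e1 e2) Err.

(* Try the applicable rules one after the other; a failing premise
   makes the whole alternative fail, so `first` moves to the next rule. *)
Ltac solve_eval :=
  match goal with
  | |- eval (Num _) _ => apply E_Num
  | |- eval (Div _ _) _ =>
      first
        [ eapply E_Div; [ solve_eval | solve_eval | reflexivity ]
        | eapply E_DivZero; [ solve_eval | solve_eval ]
        | eapply E_DivExc1; solve_eval
        | eapply E_DivExc2; [ solve_eval | solve_eval ] ]
  end.

Example div_ok : exists r, eval (Div (Num 6) (Num 2)) r.
Proof. eexists. solve_eval. Qed.

(* The divisor evaluates to 0, so E_Div fails and E_DivZero is tried. *)
Example div_zero : exists r, eval (Div (Num 6) (Div (Num 0) (Num 3))) r.
Proof. eexists. solve_eval. Qed.
```

In the second example the sub-derivations for both operands are rebuilt after the first alternative fails. This repeated work is one plausible source of the inefficiency of backtracking evaluation tactics.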

As seen before, the traditional definition contains several similar rules with the same premises. The idea of Charguéraud, called pretty-big-step semantics [8], focuses on eliminating this redundancy. Let us discuss his idea through our case study using the evaluation rules for applications. First of all, Charguéraud identified two sources of duplication:

• The similar premises in the rules for exceptions, correct evaluation (and divergence).

• The duplication of the evaluation judgement both for values and exceptions. This is not present in our case study language; however, App could be described in the form of two rules: one for an exception and one for a value as the final result.

In the following paragraphs we focus on the first problem. Instead of using duplicated conditions, Charguéraud suggests using "intermediate terms" which contain the satisfied conditions implicitly. These can also be seen as terms which remember the state of the evaluation, i.e. which sub-terms have already been evaluated (this resembles a small-step semantics in some aspects).

Applications with intermediate terms.

Let us see how the idea applies to our semantics. First, we need to create the syntax for intermediate terms (see Figure 5). In our case, we need three additional constructors for applications: AApp1 corresponds to the function expression evaluation, AList to the evaluation of the parameters, while AApp2 corresponds to the application exception creation and function body evaluation.

    Inductive AuxExpression :=
    | AApp1 (b : Value + Exception) (params : list Expression)
    | AApp2 (v : Value) (b : list Value + Exception)
    · · · .

    Inductive AuxList := AList (rest : list Expression) (b : list Value + Exception).

Figure 5: The syntax of intermediate terms

After having the intermediate terms defined, we can rewrite the semantics of applications (Figure 6). We decided not to include our side effect traces in the intermediate terms, because this way the effects can be handled just like before. First, we have to evaluate the function expression of the application ($\mathrm{App}_{pretty}$). We create the intermediate term AApp1 with the result of this step. If this result was an exception, then the evaluation is finished with $\mathrm{ExcApp}_{pretty}$; otherwise, the parameters follow after using $\mathrm{FinApp}_{pretty}$.

When there are parameters, we can take the first one and evaluate it with $\mathrm{StepList}_{pretty}$. If the result is a value, the mk_result function appends it to the end of the value list in the constructor AList; however, in case of an exception this attribute of AList becomes the mentioned exception. We repeat this process until all parameter expressions are evaluated, or an exception occurs inside the AList. In the latter case, $\mathrm{ExcList}_{pretty}$ finishes the evaluation, and the stored exception will be propagated. When there is no exception, we use $\mathrm{FinList}_{pretty}$ to finish the parameter list evaluation. At this point, we can notice that this is a general approach to evaluating a list of expressions, so it can be used for ECall expressions too.

(In this figure, $\mathit{lres}$ is either a list of Values, or an exception, so its type is list Value + Exception.)

\[
\frac{\langle \Gamma, \mathit{exp}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}_1, \mathit{eff}_2\} \qquad \langle \Gamma, \mathit{AApp1}\ \mathit{res}_1\ \mathit{params}, \mathit{eff}_2 \rangle \Downarrow_p \{\mathit{res}_2, \mathit{eff}_3\}}{\langle \Gamma, \mathit{EApp}\ \mathit{exp}\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}_2, \mathit{eff}_3\}}\ (\mathrm{App}_{pretty})
\]

\[
\frac{}{\langle \Gamma, \mathit{AApp1}\ (\mathit{inr}\ \mathit{ex})\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_1\}}\ (\mathrm{ExcApp}_{pretty})
\]

\[
\frac{\langle \Gamma, \mathit{AList}\ \mathit{params}\ (\mathit{inl}\ []), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{lres}, \mathit{eff}_2\} \qquad \langle \Gamma, \mathit{AApp2}\ v\ \mathit{lres}, \mathit{eff}_2 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_3\}}{\langle \Gamma, \mathit{AApp1}\ (\mathit{inl}\ v)\ \mathit{params}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_3\}}\ (\mathrm{FinApp}_{pretty})
\]

\[
\frac{}{\langle \Gamma, \mathit{AApp2}\ v\ (\mathit{inr}\ \mathit{ex}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_1\}}\ (\mathrm{ExcApp}_{pretty})
\]

\[
\frac{|\mathit{var\_list}| = |\mathit{vals}| \qquad \langle \mathit{append\_vars\_to\_env}\ \mathit{var\_list}\ \mathit{vals}\ (\mathit{get\_env}\ \mathit{ref}\ \mathit{ext}), \mathit{body}, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_2\}}{\langle \Gamma, \mathit{AApp2}\ (\mathit{VClos}\ \mathit{ref}\ \mathit{ext}\ \mathit{var\_list}\ \mathit{body})\ (\mathit{inl}\ \mathit{vals}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_2\}}\ (\mathrm{FinApp}_{pretty})
\]

\[
\frac{\mathit{nonclosure}\ v}{\langle \Gamma, \mathit{AApp2}\ v\ (\mathit{inl}\ \mathit{vals}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{inr}\ (\mathit{badfun}\ v), \mathit{eff}_1\}}\ (\mathrm{ExcApp}_{pretty,\,badfun})
\]

In the following rule, we denote $\mathit{VClos}\ \mathit{ref}\ \mathit{ext}\ \mathit{var\_list}\ \mathit{body}$ with $v$.

\[
\frac{|\mathit{var\_list}| \neq |\mathit{vals}|}{\langle \Gamma, \mathit{AApp2}\ v\ (\mathit{inl}\ \mathit{vals}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{inr}\ (\mathit{badarity}\ v), \mathit{eff}_1\}}\ (\mathrm{ExcApp}_{pretty,\,badarity})
\]

\[
\frac{}{\langle \Gamma, \mathit{AList}\ []\ (\mathit{inl}\ \mathit{vals}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{inl}\ \mathit{vals}, \mathit{eff}_1\}}\ (\mathrm{FinList}_{pretty})
\qquad
\frac{}{\langle \Gamma, \mathit{AList}\ \mathit{rest}\ (\mathit{inr}\ \mathit{ex}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{inr}\ \mathit{ex}, \mathit{eff}_1\}}\ (\mathrm{ExcList}_{pretty})
\]

\[
\frac{\langle \Gamma, r, \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{res}, \mathit{eff}_2\} \qquad \langle \Gamma, \mathit{AList}\ \mathit{rest}\ (\mathit{mk\_result}\ \mathit{res}\ \mathit{vals}), \mathit{eff}_2 \rangle \Downarrow_p \{\mathit{lres}, \mathit{eff}_3\}}{\langle \Gamma, \mathit{AList}\ (r :: \mathit{rest})\ (\mathit{inl}\ \mathit{vals}), \mathit{eff}_1 \rangle \Downarrow_p \{\mathit{lres}, \mathit{eff}_3\}}\ (\mathrm{StepList}_{pretty})
\]

Figure 6: Pretty-big-step semantics for applications

Finally, if there was an exception during parameter evaluation, instead of the parameter values, an exception is stored in AApp2, and this can be propagated with $\mathrm{ExcApp}_{pretty}$. Otherwise, all parameters were correctly evaluated and $\mathrm{FinApp}_{pretty}$ can be applied, when the first saved value (the evaluated application function expression) is a closure and the number of formal parameters in this closure is the same as the number of actual parameters, which are also stored in a value list in AApp2. However, if the first stored value is not a closure, we use $\mathrm{ExcApp}_{pretty,\,badfun}$ and a badfun exception will be the result; otherwise, we can apply $\mathrm{ExcApp}_{pretty,\,badarity}$ if the numbers of formal and actual parameters mismatch to create a badarity exception.

Brief evaluation.

Compared to the traditional semantics, in the pretty-big-step approach we see an increase in the number of inference rules, while the premise redundancy is eliminated and the number of premises drops to two at most. Obviously, pretty-big-step semantics cannot overcome all weaknesses of the big-step approach, but it provides a good alternative in terms of readability and usability.

Transforming the big-step semantics to pretty-big-step style was a straightforward process, except for the transformation of expression lists: if there were two or more derivation steps in the big-step premises, the intermediate results were turned into terms like AApp1, and the rule was split. These steps could have been automated; however, in case of expression lists, the use of accumulation in AList instead of the eval_all predicate was not as simple.

We should also note that the pretty-big-step definition is a relational semantics just like the traditional one. This means we need a tactic again to execute this semantics; however, unlike in the traditional case, here backtracking is not needed, because the evaluation is syntax-driven (with the exception of the last step of application evaluation: $\mathrm{FinApp}_{pretty}$, $\mathrm{ExcApp}_{pretty,\,badfun}$ and $\mathrm{ExcApp}_{pretty,\,badarity}$). Nevertheless, the use of this evaluation tactic is still not efficient enough.

There are interesting applications of the method in related research, such as deriving pretty-big-step style semantics from small-step semantics [2] or certified abstract interpretation using this definition style [6].

We have summarised two ways to create a relational big-step semantics; however, both of them suffer mainly from the same problem: they cannot be executed efficiently. In this section, we discuss a functional approach, called functional big-step semantics [21], and its origin, the definitional interpreter [23].
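As a small-scale illustration of the functional style, the following sketch defines an evaluator as a Coq function that recurses on a fuel argument. The toy language, the `Loop` construct standing in for a diverging expression, and the names `Result`, `Res` and `Timeout` are assumptions made for this example only; they are not the definitions used for the case study language.

```coq
(* Toy language with a hypothetical diverging construct. *)
Inductive Exp :=
| Num (n : nat)
| Plus (e1 e2 : Exp)
| Loop.

Inductive Result := Res (n : nat) | Timeout.

(* The evaluator is structurally recursive on the clock, so Coq accepts
   it even though Loop never produces a value. *)
Fixpoint eval (clock : nat) (e : Exp) : Result :=
  match clock with
  | 0 => Timeout
  | S clock' =>
      match e with
      | Num n => Res n
      | Plus e1 e2 =>
          match eval clock' e1, eval clock' e2 with
          | Res n1, Res n2 => Res (n1 + n2)
          | _, _ => Timeout
          end
      | Loop => eval clock' Loop
      end
  end.

(* Evaluation is plain computation: no derivation has to be searched for. *)
Example enough_fuel : eval 3 (Plus (Num 4) (Num 5)) = Res 9.
Proof. reflexivity. Qed.

Example too_little_fuel : eval 1 (Plus (Num 4) (Num 5)) = Timeout.
Proof. reflexivity. Qed.

Example diverges : forall c, eval c Loop = Timeout.
Proof. induction c; simpl; auto. Qed.

(* Determinism is immediate, because eval is a function. *)
Lemma eval_deterministic : forall c e r1 r2,
  eval c e = r1 -> eval c e = r2 -> r1 = r2.
Proof. intros c e r1 r2 H1 H2. congruence. Qed.
```

The last lemma shows why determinism needs essentially no proof effort in this style, in contrast to the rule-by-rule case analysis that a relational definition requires.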

The idea of functional big-step semantics was developed by Owens et al. [21]. A semantics in this style is basically a recursive function. To ensure that it terminates for arbitrary inputs (including diverging expressions), it takes an additional “clock” argument which decreases in the steps of the execution. We note that the functional big-step semantics is essentially a definitional interpreter [23] equipped with a clock, and it is defined in a “higher-order logic rather than a programming language” [21].

This approach is also used in research: for example, the FEther project [26] and the type soundness proof for System F by Amin and Rompf [1] use definitional interpreters, while a verified compiler backend for CakeML [21, 25] is based on functional big-step semantics. Now, let us see how we can create such a semantics for our case study language.

When we define a function in Coq, it has to explicitly implement behaviour for all inputs (i.e. the function is total). However, in practice, there can be programs or expressions with undefined or unspecified behaviour (naturally, the relational semantics are partial on these too). Moreover, because of the clock which ensures the termination of the function, an expression evaluation could stop before finding the right result if the clock is too small. This means that we can have three different results: correct termination, failure, and timeout (see Figure 7). In our case study language, there is no undefined behaviour, so we never get failure as a result. However, we do not omit it from the result definition: if we extend this definition, e.g. with Core Erlang-like case expressions, then the guards of these expressions cannot produce observable side effects [7], so the semantics of a case expression whose guard does produce side effects would be failure.

```coq
(* Result of evaluating a single expression *)
Inductive ResultType : Type :=
| Result (res : Value + Exception) (eff : SideEffectList)
| Timeout
| Failure.

(* Result of evaluating a list of expressions *)
Inductive ResultListType : Type :=
| LResult (res : list Value + Exception) (eff : SideEffectList)
| LTimeout
| LFailure.
```

Figure 7: The possible results of the functional big-step semantics

As we have seen before, we have to define the semantics for lists of expressions too. For these lists, we can define the functional big-step semantics separately, just as in the case of the other semantics discussed (see the eval_all and eval_prefix predicates in Section 3.1, and the AList constructor, StepList pretty, ExcList pretty and FinList pretty in Section 3.2). So we also define a result type for list evaluation (Figure 7).

```coq
Fixpoint eval_elems (f : Environment -> Expression -> SideEffectList -> ResultType)
                    (Γ : Environment) (exps : list Expression) (eff : SideEffectList)
  : ResultListType :=
  match exps with
  | [] => LResult (inl []) eff
  | x :: xs =>
    match f Γ x eff with
    | Result (inl v) eff' =>
      (* the head evaluated to a value: continue with the tail *)
      let res := eval_elems f Γ xs eff' in
      match res with
      | LResult (inl xs') eff'' => LResult (inl (v :: xs')) eff''
      | r => r
      end
    (* exceptions, failures and timeouts are propagated *)
    | Result (inr ex) eff' => LResult (inr ex) eff'
    | Failure => LFailure
    | Timeout => LTimeout
    end
  end.
```

Figure 8: The functional big-step semantics of list evaluation

Now that we have the result types, we can define the functional semantics (Figure 9 shows a representative part of it). The first step of this function is to check whether the clock has already been consumed; in this case, the function returns the Timeout value. Otherwise, the expression evaluation can begin. For calls and applications, we need the above-mentioned list evaluation. This is handled by the other semantics function (see Figure 8), to which we pass the curried version of the original functional big-step semantics (already applied to the clock) as an argument. We decrease the clock only in the original functional big-step semantics, so that Coq can find the decreasing argument of the function to ensure termination; moreover, this enables us to use simple, yet powerful induction over the clock.

Note that we decided to decrease the clock value on every nested recursive call, otherwise Coq cannot find the decreasing argument of the semantics function (however, this problem could possibly be solved with the Program construction too). We also note that exceptions, failures and timeouts are handled together (except in the semantics of ETry), because these results just need to be propagated, resulting in a short semantics definition.
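The clock mechanism described above can be illustrated on a much smaller language. The following Coq sketch is not the definition used in this paper: the types Expr and Res, the function eval and the Loop construct are hypothetical names chosen only for illustration, and side effects, environments and exceptions are left out. It shows the initial check on the clock, the decrease of the clock on every recursive call, and the propagation of non-value results.

```coq
(* Hypothetical toy language: literals, addition, and a term that never terminates. *)
Inductive Expr : Type :=
| Lit (n : nat)
| Plus (e1 e2 : Expr)
| Loop.

(* Three outcomes, mirroring the Result / Timeout / Failure split of Figure 7.
   Failure is never produced by this toy language; it is kept only for the analogy. *)
Inductive Res : Type :=
| Result (n : nat)
| Timeout
| Failure.

Fixpoint eval (clock : nat) (e : Expr) {struct clock} : Res :=
  match clock with
  | 0 => Timeout                        (* the clock is consumed *)
  | S clock' =>
    match e with
    | Lit n => Result n
    | Plus e1 e2 =>
      match eval clock' e1 with
      | Result n1 =>
        match eval clock' e2 with
        | Result n2 => Result (n1 + n2)
        | r => r                        (* propagate Timeout / Failure *)
        end
      | r => r
      end
    | Loop => eval clock' Loop          (* diverges: every clock runs out *)
    end
  end.

Compute eval 3 (Plus (Lit 1) (Lit 2)).  (* = Result 3 : Res *)
Compute eval 1 (Plus (Lit 1) (Lit 2)).  (* = Timeout : Res *)
Compute eval 100 Loop.                  (* = Timeout : Res *)
```

Because the clock is the structurally decreasing argument, Coq accepts the definition even though Loop has no terminating evaluation. The second Compute shows the other side of this design: a terminating expression also yields Timeout when the initial clock is too small.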

```coq
Fixpoint eval_fbos_expr (clock : nat) (Γ : Environment) (exp : Expression)
                        (eff : SideEffectList) {struct clock} : ResultType :=
  match clock with
  | 0 => Timeout
  | S clock' =>
    match exp with
    | ELit l => Result (inl (VLit l)) eff
    | EVar v => Result (get_value Γ (inl v)) eff
    | EFunId f => Result (get_value Γ (inr f)) eff
    | EFun vl e => Result (inl (VClos Γ [] vl e)) eff
    | EApp exp l =>
      match eval_fbos_expr clock' Γ exp eff with
      | Result (inl v) eff' =>
        (* evaluate the parameters with the list semantics of Figure 8 *)
        let res := eval_elems (eval_fbos_expr clock') Γ l eff' in
        match res with
        | LResult (inl vl) eff'' =>
          match v with
          | VClos ref ext varl body =>
            if Nat.eqb (length varl) (length vl)
            then eval_fbos_expr clock' (append_vars_to_env varl vl (get_env ref ext)) body eff''
            else Result (inr (badarity v)) eff''
          | _ => Result (inr (badfun v)) eff''
          end
        | LResult (inr ex) eff'' => Result (inr ex) eff''
        | LFailure => Failure
        | LTimeout => Timeout
        end
      | r => r
      end
    (* ... *)
    end
  end.
```

Figure 9: Functional big-step semantics of our case study language

In this section we evaluate and compare the semantics definitions given in the previous sections, and we also discuss a coinductive approach to handle divergence.

First of all, we can notice that all three semantics handle the evaluation of lists of expressions separately from the main body of the semantics: eval_all and eval_prefix in the traditional big-step, ExcList pretty, FinList pretty and StepList pretty in the pretty-big-step, and eval_elems in the functional big-step semantics.

In terms of definition size and complexity, the functional big-step definition is superior: it is much more compact than the other two. Besides, the pretty-big-step definition uses more inference rules than the traditional big-step semantics, but these rules are much simpler (they have at most two premises). This difference increases the number of subgoals in proofs in the case of the pretty-big-step semantics, but these goals are usually simpler than in the other case. In turn, simpler goals could mean simpler proofs, but since the pretty-big-step semantics is defined by mutually inductive types, the related proofs can in some cases become rather complex because they involve mutual induction.

Figure 10: Comparison of formal expression evaluation

Expression evaluation.

The first problem we encounter is that the traditional big-step and pretty-big-step semantics are relational semantics (defined by an inductive type) and are not inherently executable. To describe an expression evaluation, we need to prove the evaluation step by step using the inference rules (the constructors of the inductive type). As discussed before, we can also create an evaluation tactic which finds the proof for an expression evaluation; however, the use of this tactic is not efficient: it takes unreasonable amounts of memory and time (see Figure 10). The pretty-big-step semantics is more efficient to use than the traditional one, because at most one derivation rule can match a goal syntactically (except in the case of the different application exceptions), thus no backtracking is needed. However, it is still not efficient enough, especially compared to the functional approach.

We can also see (Figure 10) that the proof length of simple expression evaluations in the traditional and pretty-big-step semantics is similar. However, in the pretty-big-step semantics we used many more inference rules to reach the result, while with the traditional semantics we had to use the inversion tactic several times and specify results by hand. This is because expression list evaluation is not described step by step but by universally quantified predicates, so we needed to supply the result list of values and side effects (e.g. eff and vs in eval_all) during formal evaluation (alternatively, list unfolding lemmas based on the length can also solve this problem). This issue is not present in the pretty-big-step semantics, because lists are handled step by step by StepList pretty (again, resembling small-step evaluation). All in all, the complexity of these proofs is similar in both relational approaches.

On the other hand, the functional big-step semantics is inherently executable (because it is just a recursive function), so expressions can simply be evaluated with it; we just have to pick an appropriate initial clock value (recursion limit).

Figure 11: Complexity of expression equivalence proofs
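The difference in executability discussed under "Expression evaluation" can be seen on a toy language. The Coq sketch below is only an illustration under simplifying assumptions: Expr, eval_rel, E_Lit, E_Plus and eval_fun are hypothetical names, not the definitions of this paper, and because every expression of the toy language terminates, the functional version needs no clock. With the relational definition, evaluating an expression means constructing a derivation and supplying the intermediate results by hand; with the function, evaluation is plain computation.

```coq
Inductive Expr : Type :=
| Lit (n : nat)
| Plus (e1 e2 : Expr).

(* Relational (traditional big-step) semantics: one inference rule per construct. *)
Inductive eval_rel : Expr -> nat -> Prop :=
| E_Lit : forall n, eval_rel (Lit n) n
| E_Plus : forall e1 e2 n1 n2,
    eval_rel e1 n1 -> eval_rel e2 n2 -> eval_rel (Plus e1 e2) (n1 + n2).

(* Evaluation is a proof: the intermediate results 1 and 2 are given explicitly. *)
Example rel_example : eval_rel (Plus (Lit 1) (Lit 2)) 3.
Proof.
  apply (E_Plus (Lit 1) (Lit 2) 1 2).
  - apply E_Lit.
  - apply E_Lit.
Qed.

(* Functional semantics: a recursive function that Coq can simply run. *)
Fixpoint eval_fun (e : Expr) : nat :=
  match e with
  | Lit n => n
  | Plus e1 e2 => eval_fun e1 + eval_fun e2
  end.

Example fun_example : eval_fun (Plus (Lit 1) (Lit 2)) = 3.
Proof. reflexivity. Qed.
```

For a language with possibly diverging expressions, the function would additionally take a clock argument, as in Figure 9.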

Expression equivalence proofs.

As we can see (Figure 11), surprisingly, the expression equivalence proofs were the most complex in the traditional big-step semantics, while the functional big-step style performed very well. This is because in the traditional semantics we had to use list unfolding lemmas several times, which quickly increased the size of the proofs.

The use of pretty-big-step semantics was quite straightforward, and the equivalence proofs were not too complex. Once again, this is partly because no list unfolding lemmas were needed.

Based on the diagrams, one could think that there is no disadvantage of using functional big-step semantics, but that is not the case. While interactively proving the equivalences (and also semantics properties), the intermediate subgoals and assumptions were hard to read and understand, because Coq usually oversimplified the function definition, and we often saw the whole definition of the semantics rather than just the necessary parts of it. We used remember tactics [12] on the clock values, which prevented the oversimplification; however, this solution is not the most convenient one.
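To give an impression of what an expression equivalence proof looks like in the functional style, the following Coq sketch proves that the operands of an addition can be swapped in a toy clocked semantics. It is an illustration only: Expr, Res, eval and plus_comm_equiv are hypothetical names, the toy language has no side effects or exceptions, and the proofs compared in Figure 11 concern the case study language, not this one.

```coq
Require Import PeanoNat.

Inductive Expr : Type :=
| Lit (n : nat)
| Plus (e1 e2 : Expr).

Inductive Res : Type :=
| Result (n : nat)
| Timeout.

(* Toy clocked semantics: the clock decreases on every recursive call. *)
Fixpoint eval (clock : nat) (e : Expr) {struct clock} : Res :=
  match clock with
  | 0 => Timeout
  | S clock' =>
    match e with
    | Lit n => Result n
    | Plus e1 e2 =>
      match eval clock' e1, eval clock' e2 with
      | Result n1, Result n2 => Result (n1 + n2)
      | _, _ => Timeout
      end
    end
  end.

(* Swapping the operands does not change the result for any clock value. *)
Theorem plus_comm_equiv : forall clock e1 e2,
  eval clock (Plus e1 e2) = eval clock (Plus e2 e1).
Proof.
  intros clock e1 e2.
  destruct clock as [| clock']; simpl.
  - reflexivity.
  - (* case analysis on the results of the two sub-evaluations *)
    destruct (eval clock' e1) as [n1 |];
      destruct (eval clock' e2) as [n2 |];
      simpl; try reflexivity.
    rewrite (Nat.add_comm n1 n2). reflexivity.
Qed.
```

The proof proceeds by unfolding one step of the function and analysing the results of the sub-evaluations; no list unfolding lemmas or hand-supplied intermediate values are involved. In a definition as small as this one, simpl produces readable goals, so the readability problem described above does not show up here.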

Note on the list unfolding lemmas mentioned above: for example, a list of length n can be described as [x_1, x_2, ..., x_n] with existential variables x_1, x_2, ..., x_n.

Semantics property proofs.

For semantics property proofs, we chose determinism in the case of the pretty-big-step and traditional big-step semantics, and a clock increasing lemma in the case of the functional big-step semantics: when we evaluate an expression and get a result with the constructor Result, then we can increase the initial clock and get the same result.

The determinism proof for the traditional approach is very complex. We needed to use various helper theorems about (partial) evaluation of lists of expressions and a lot of case distinctions, and the proof is still quite long.

Proving determinism of the pretty-big-step approach was very simple: we did not need to create any helper lemmas, we used only a few case distinctions, and the proof is short in spite of having to use the mutual induction principle.

The proof complexity of the clock increasing lemma for the functional big-step semantics is between the previous two. We had to create and prove one helper theorem, and use several case distinctions. However, this proof is not as complex as the determinism of the traditional style, since we used simple induction over the clock variable. This style of induction is usable because the clock is decreased in every recursive call of the semantics. (Alternatively, we could have used functional induction, similar to the one mentioned by Owens et al. [21]; however, we faced the limitations of Coq when trying to generate the induction principle.) We should also note that while interactively proving this theorem, the subgoals were difficult to understand for the reasons mentioned before.

In addition, we also proved the equivalence of these approaches: between the pretty-big-step and functional approaches, and between the traditional big-step and functional approaches, by induction. Thereafter, using the previous two, we also proved the equivalence of the traditional and pretty-big-step semantics. We encountered one difficulty: while proving the equivalence of the pretty and functional big-step semantics, mutual induction could not be used (in functional big-step semantics, we cannot give a meaning to “intermediate terms”). To solve this problem, we followed the footsteps of Charguéraud's formalisation [22], and defined another (equivalent) version of the pretty-big-step semantics, equipped with a counter which is increased when using the derivation rules of the semantics, in order to use induction over this counter. We proved the equivalence using this semantics as an intermediate step.

There are two concepts which were not investigated in detail: concurrency and divergence. In general, a big-step semantics cannot express concurrency efficiently, because it cannot handle interleaving. For this purpose, a small-step approach is more suitable.

As Owens et al. [21] described, an evaluation is divergent when the result is Timeout for every possible clock value. We also proved an expression evaluation divergent using this idea and an induction on the clock.

The previously described relational approaches are suitable for describing the semantics of terminating expressions; however, they cannot effectively express divergence. If one wants to reason about divergence too, a coinductive big-step semantics can be used. We have found the work of Leroy and Grall [16] the most influential, where they define a semantics for λ-calculus extended with constants. They also extend this semantics with traces, a feature similar to our side effect logging approach. Moreover, they also implemented these semantics in Coq, and the source is publicly available.

We followed their footsteps to define a coinductive big-step semantics for our case study language (in particular, for applications) with a distinct relation. For the divergence rules, we needed infinite traces for side effects too. However, this approach is not straightforward to use because of the guardedness of subgoals, and we are still investigating this issue.

In conclusion, we defined the semantics of a functional programming language in various styles (primarily traditional big-step, pretty-big-step and functional big-step semantics), using a small subset of sequential Core Erlang as a case study. We proved the equivalence of these semantics, and evaluated and compared them from different aspects in order to choose the most fitting way to reason about refactoring correctness. Each of them has its advantages and disadvantages. Our three main aspects were the executability of the approach, the complexity of expression equivalence proofs, and the complexity of proofs about the properties of the semantics; from this point of view, the functional big-step style semantics proved to be the most useful.
We highlight the fact that these semantics and proofs are all formalised in Coq [24].

In the future, we are planning to formalise functional big-step semantics for sequential Core Erlang to enable effective testing of the semantics, and then use comparative testing of our Core Erlang semantics and a small-step Erlang semantics [15] defined by one of our former project members. Naturally, we also plan to prove the existing big-step and the mentioned functional big-step semantics equivalent after having finished the implementation. We also plan to investigate the coinductive approach in more detail. Our long-term goal is to formalise the entire Core Erlang and Erlang in Coq to reason about refactoring correctness on Erlang programs.

Acknowledgements

This work was supported by the project “Integrált kutatói utánpótlás-képzési program az informatika és számítástudomány diszciplináris területein (Integrated program for training new generation of researchers in the disciplinary fields of computer science)”, No. EFOP-3.6.3-VEKOP-16-2017-00002. The project has been supported by the European Union and co-funded by the European Social Fund.

“Application Domain Specific Highly Reliable IT Solutions” project has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the Thematic Excellence Programme TKP2020-NKA-06 (National Challenges Subprogramme) funding scheme.

References

[1] Nada Amin and Tiark Rompf. Type Soundness Proofs with Definitional Interpreters. SIGPLAN Not., 52(1):666–679, January 2017.
[2] Casper Bach Poulsen and Peter D. Mosses. Deriving Pretty-Big-Step Semantics from Small-Step Semantics. In Zhong Shao, editor, Programming Languages and Systems, pages 270–289, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg.
[3] Péter Bereczky, Dániel Horpácsi, and Simon Thompson. A Proof Assistant Based Formalisation of a Subset of Sequential Core Erlang. In Aleksander Byrski and John Hughes, editors, Trends in Functional Programming, pages 139–158, Cham, 2020. Springer International Publishing.
[4] Péter Bereczky, Dániel Horpácsi, and Simon J. Thompson. Machine-Checked Natural Semantics for Core Erlang: Exceptions and Side Effects. In Proceedings of the 19th ACM SIGPLAN International Workshop on Erlang, Erlang 2020, page 1–13, New York, NY, USA, 2020. Association for Computing Machinery.
[5] Sandrine Blazy and Xavier Leroy. Mechanized Semantics for the Clight Subset of the C Language. Journal of Automated Reasoning, 43(3):263–288, Jul 2009.
[6] Martin Bodin, Thomas Jensen, and Alan Schmitt. Certified Abstract Interpretation with Pretty-Big-Step Semantics. In Proceedings of the 2015 Conference on Certified Programs and Proofs, CPP '15, page 29–40, New York, NY, USA, 2015. Association for Computing Machinery.
[7] Richard Carlsson, Björn Gustavsson, Erik Johansson, Thomas Lindgren, Sven-Olof Nyström, Mikael Pettersson, and Robert Virding. Core Erlang 1.0.3 language specification. Technical report, 2004.
[8] Arthur Charguéraud. Pretty-Big-Step Semantics. In Matthias Felleisen and Philippa Gardner, editors, Programming Languages and Systems, pages 41–60, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[9] Ștefan Ciobâcă. From Small-Step Semantics to Big-Step Semantics, Automatically. In Einar Broch Johnsen and Luigia Petre, editors, Integrated Formal Methods, pages 347–361, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[10] Ștefan Ciobâcă, Dorel Lucanu, Vlad Rusu, and Grigore Roșu. A language-independent proof system for mutual program equivalence. In Stephan Merz and Jun Pang, editors, Formal Methods and Software Engineering, pages 75–90, Cham, 2014. Springer International Publishing.
[11] The Coq Proof Assistant Documentation. https://coq.inria.fr/documentation, 2020. Accessed on October 1st, 2020.
[12] Ltac documentation. https://coq.inria.fr/refman/proof-engine/ltac.html, 2020. Accessed on September 22nd, 2020.
[13] A. Garrido and J. Meseguer. Formal specification and verification of java refactorings. In , pages 165–174, 2006.
[14] G. Kahn. Natural semantics. In Franz J. Brandenburg, Guy Vidal-Naquet, and Martin Wirsing, editors, STACS 87, pages 22–39, Berlin, Heidelberg, 1987. Springer Berlin Heidelberg.
[15] Judit Kőszegi. KErl: Executable semantics for Erlang. CEUR Workshop Proceedings, 2046:144–160, 2018.
[16] Xavier Leroy and Hervé Grall. Coinductive big-step operational semantics. Information and Computation, 207(2):284–304, 2009. Special issue on Structural Operational Semantics (SOS).
[17] Keiko Nakata and Masahito Hasegawa. Small-step and big-step semantics for call-by-need. Journal of Functional Programming, 19(6):699–722, 2009.
[18] Keiko Nakata and Tarmo Uustalu. Trace-Based Coinductive Operational Semantics for While. In Stefan Berghofer, Tobias Nipkow, Christian Urban, and Makarius Wenzel, editors, Theorem Proving in Higher Order Logics, pages 375–390, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
[19] Core Erlang Formalization. https://github.com/harp-project/Core-Erlang-Formalization, 2020. Accessed on September 24th, 2020.
[20] Tobias Nipkow and Gerwin Klein. Concrete semantics: with Isabelle/HOL. Springer, 2014.
[21] Scott Owens, Magnus O. Myreen, Ramana Kumar, and Yong Kiam Tan. Functional Big-Step Semantics. In Peter Thiemann, editor, Programming Languages and Systems, pages 589–615, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg.
[22] Pretty-big-step semantics formalisation. Accessed on October 26th, 2020.
[23] John C. Reynolds. Definitional Interpreters for Higher-Order Programming Languages. In Proceedings of the ACM Annual Conference - Volume 2, ACM '72, page 717–740, New York, NY, USA, 1972. Association for Computing Machinery.
[24] Semantics comparison. https://github.com/harp-project/Semantics-comparison, 2020. Accessed on October 26th, 2020.
[25] Yong Kiam Tan, Magnus O. Myreen, Ramana Kumar, Anthony Fox, Scott Owens, and Michael Norrish. A New Verified Compiler Backend for CakeML. SIGPLAN Not., 51(9):60–73, September 2016.
[26] Z. Yang and H. Lei. FEther: An Extensible Definitional Interpreter for Smart-Contract Verifications in Coq.