Certifying Choreography Compilation
Luís Cruz-Filipe, Fabrizio Montesi, and Marco Peressotti
Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
{lcfilipe,fmontesi,peressotti}@imada.sdu.dk

Abstract.
Choreographic programming is a paradigm for developing concurrent and distributed systems, where programs are choreographies that define, from a global viewpoint, the computations and interactions that communicating processes should enact.
Choreography compilation translates choreographies into the local definitions of process behaviours, given as terms in a process calculus. Proving choreography compilation correct is challenging and error-prone, because it requires relating languages in different paradigms (global interactions vs local actions) and dealing with a combinatorial explosion of proof cases. We present the first certified program for choreography compilation for a nontrivial choreographic language supporting recursion.
Keywords:
Choreographic Programming · Formalisation · Compilation.
Choreographic programming is an emerging programming paradigm, where the desired communication behaviour of a system of communicating processes can be defined from a global viewpoint in programs known as choreographies [23]. Then, a provably-correct compiler can automatically generate executable code for each process, providing the guarantee that executing these processes together will implement the communications prescribed in the choreography [5,7]. The theory of such compilers is typically called EndPoint Projection (EPP).

Choreographies are inspired by the “Alice and Bob” notation for security protocols [25]. The key idea is to have a linguistic primitive for a communication from a process (also called role, or participant) to another. For example, the statement
Alice.e → Bob.x reads “Alice evaluates expression e and sends the result to Bob, who stores it in variable x”. This syntax has two main advantages. The first is that it makes the desired communications syntactically manifest in a choreography, making choreographic programming suitable for making interaction protocols precise. The second is that the syntax does not allow for writing mismatched send and receive actions by design, which makes code generated from a choreography enjoy progress (the system never gets stuck) [7].

The potential of choreographic programming has motivated the study of choreographic languages and EPP definitions for different applications, including self-adaptive systems [14], information flow [20], system integration [15], parallel algorithms [9], cyber-physical systems [16,21,22], and security protocols [16].

⋆ Work partially supported by Villum Fonden, grant no. 29518.

Proving a methodology for EPP correct involves three elements: the syntax and semantics of the source choreography language and of the target language, and the definition of the compiler. A single instruction at the choreographic level might be implemented by multiple instructions in the target language. All this makes formulating a theory of choreographic programming error-prone: for even simpler approaches, like abstract choreographies without computation, it has been recently discovered that a few key results published in peer-reviewed articles do not hold and their theories required adjustments [26], raising concerns about the soundness of these methods.

In this article, we present a certified program for EPP, which translates terms of a Turing complete choreographic language into terms of a distributed process calculus. Our main result is the formalisation of the hallmark result of choreographic programming, the “EPP Theorem”: an operational correspondence between choreographies and their endpoint projections.
This is the first time that such a result has been formalised in a theorem prover, increasing our confidence in the methodology of choreographies.
Structure and Related Work.
We use Coq for our development, assuming some familiarity with it. We start from a previous formalisation [12] of a choreographic language (Core Choreographies [11]), which we recap in Sect. 2. In Sect. 3, we formalise our target language: a distributed process calculus inspired by the informal presentation in [24]. The calculus has communication primitives that recall those commonly used for implementing choreography languages, e.g., as found in (multiparty) session types [19]. In Sect. 4, we define merging [5]: a partial operator that addresses the standard problem (for choreographies) of checking that each process implementation eventually agrees on the choice between alternative behaviours in protocols [3,5]. Building on merging, in Sect. 5, we define EPP. Then, in Sect. 6, we explore pruning [5]: a preorder induced by merging that plays a key role in the EPP Theorem, which is proved in Sect. 7.

Choreographies are used in industry for the specification and definition of web services and business processes [18,27]. These languages feature recursion or loops, which are not present in the only other formalisation work on choreographies that we are aware of [17]. We have validated the theory of EPP from [24], and made explicit several properties that are typically only implicitly assumed. Our results show that the ideas developed by researchers on choreographies, like merging, can be relied upon for languages of practical appeal.
We use the choreographic language of [24], which is inspired by Core Choreographies (CC) [11]. So far, this is the only Turing complete choreographic language that has been formalised [12]. In this section, we recap this formalisation, which our work builds upon. We refer the reader to [12] for a discussion of the design options both behind the choreographic language and the formalisation. Some of these are relevant for our development, and we explain them when needed.

The purpose of CC is to model communication protocols involving several participants, called processes. Each process is equipped with memory cells, identified by variables. Communications are of two kinds: value communications, where a process evaluates an expression that may depend on the values stored in its memory and sends the result to a (distinct) process; and label selections, where a process selects from different behaviours available at another process by means of an identifier (the label). Label selections are typically used to communicate local choices made by a process to other processes. In order to model recursive or infinite behaviour, CC supports the definition of procedures.
Syntax.
The formalisation of CC is parametric on the types of processes (Pid), variables (Var), expressions (Expr), values (Value, stored in memory), Boolean expressions (BExpr) and procedure names (RecVar). These are required to have decidable syntactic equality. Labels are restricted to a two-element set (left and right), a common choice in choreographies and session types [4,6,8,28]. The syntax of choreographies is defined by the following BNF grammar.
Eta ::= p e −→ q $ x | p −→ q [ l ]
C ::= Eta ;; C | If p ?? b Then Ct Else Ce | Call X | RT_Call X ps C | End

Eta is the type of communication actions, where p e −→ q $ x is a value communication and p −→ q [ l ] is a label selection. Choreography eta ;; C can execute a communication eta and continue as C. Conditionals If p ?? b Then Ct Else Ce evaluate b at p and continue as either Ct or Ce, according to whether b evaluates to true or false. Call X denotes a call to a procedure X. Since processes execute independently, the runtime term RT_Call X ps C is used to denote the situation where execution of X has started and reduced to C, but the processes in ps still have not entered X. Term End is the terminated choreography. This grammar is implemented as a Coq inductive type, e.g., eta ;; C stands for Interaction eta C, where Interaction : Eta → Choreography → Choreography is a constructor of type Choreography.

Intuitively,
Call X behaves as the definition of X. However, the semantics includes internal actions at each process p, corresponding to p calling the procedure. Thus, when the first process p calls X, choreography Call X transitions to RT_Call X ps C, where C is the definition of X and ps are the processes used in X (other than p). This term can then execute either by another process entering the procedure (and this process is removed from ps), or by executing some action in C that does not involve processes in ps (and C is updated). When the last process in ps enters the procedure call, the term transitions simply to C.

This informal explanation illustrates two important properties of CC. First, the semantics is parametric on the set of procedure definitions, modelled as a function Defs : DefSet, where DefSet = RecVar → (list Pid) ∗ Choreography, fst (Defs X) is the set of processes involved in X, and snd (Defs X) is the choreography defining X. A program is a pair consisting of a set of procedure definitions, Procedures P, and a Choreography, Main P. We write
Vars P X for fst (Procedures P X) and Procs P X for snd (Procedures P X).

The second property is that CC supports out-of-order execution: interactions involving distinct processes can be executed in any order, reflecting concurrency. For example, p e −→ q $ x ;; r −→ s [ left ] can execute as a communication from p to q followed by a label selection from r to s, but also as the latter label selection followed by the former communication. The semantics of CC is formalised in [12]; this informal explanation suffices for the remainder of this article.

The runtime term RT_Call X ps C is not meant to be used when writing a choreography, and in particular it should not occur in procedure definitions. A choreography that does not include any such term is called initial.

There are several restrictions when writing choreographies, of three kinds.
(i) Restrictions on the intended use of choreographies. Interactions must have distinct processes (no self-communication), e.g., p e −→ p $ x is disallowed.
(ii) Restrictions on the intended use of runtime terms. All choreographies Procs P X must be initial. As for Main P, for any subterm RT_Call X ps C, ps must be nonempty and include only process names that occur in Vars P X.
(iii) Restrictions that arise from design choices in the formalisation. Informally, Vars P X contains the processes that are used in Procs P X. This introduces constraints that are encapsulated in the definition of well-formed program.

The constraints in the last category are particularly relevant for the proof of the EPP Theorem, so we discuss them in that context in Sect. 7.
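To make the grammar concrete, the following self-contained sketch encodes it as Coq inductive types with every parameter type instantiated to nat (which has decidable equality, as required). This instantiation, the constructor names Com, Sel, Cond and CEnd (standing for End), and the helper functions initial and distinct_processes are illustrative assumptions, not the definitions of the formalisation [12]. The two helpers check, respectively, whether a choreography is initial and restriction (i) for a single action.

```coq
(* Hypothetical instantiation of CC's parameters: every parameter type is
   nat, which has decidable equality as required. *)
Definition Pid := nat.
Definition Var := nat.
Definition Expr := nat.
Definition BExpr := nat.
Definition RecVar := nat.

(* The two selection labels. *)
Inductive Label := LLeft | LRight.

(* Communication actions: p e --> q $ x and p --> q [l]. *)
Inductive Eta :=
| Com : Pid -> Expr -> Pid -> Var -> Eta
| Sel : Pid -> Pid -> Label -> Eta.

(* Choreographies; CEnd plays the role of End. *)
Inductive Choreography :=
| Interaction : Eta -> Choreography -> Choreography   (* eta ;; C *)
| Cond : Pid -> BExpr -> Choreography -> Choreography -> Choreography
| Call : RecVar -> Choreography
| RT_Call : RecVar -> list Pid -> Choreography -> Choreography
| CEnd : Choreography.

(* A choreography is initial if it contains no runtime term RT_Call. *)
Fixpoint initial (C : Choreography) : bool :=
  match C with
  | Interaction _ C' => initial C'
  | Cond _ _ Ct Ce => andb (initial Ct) (initial Ce)
  | Call _ => true
  | RT_Call _ _ _ => false
  | CEnd => true
  end.

(* Restriction (i): the two processes of an action are distinct. *)
Definition distinct_processes (eta : Eta) : bool :=
  match eta with
  | Com p _ q _ => negb (Nat.eqb p q)
  | Sel p q _ => negb (Nat.eqb p q)
  end.
```

For example, initial returns false as soon as an RT_Call appears anywhere, even inside one branch of a conditional.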
Semantics.
The semantics of CC is defined as a labelled transition system using inductive types. It uses a notion of state, which is simply a function mapping process variables to their values:
State : Pid → Var → Value.
The semantics is structured in three layers. The first layer specifies transitions with the following relation, parameterised on a set of procedure definitions.
CCC_To : DefSet → Choreography → State → RichLabel → Choreography → State → Prop
Rules in this set include that p e −→ q $ x ;; C in state s transitions to C in state s', where s' coincides with s except that s' q x now stores the value obtained by evaluating e at p in state s. (The semantics of CC only requires extensional equality of states, which is why s' is quantified over rather than directly defined from s.) The label in the transition includes information about the executed term – the above communication is labelled R_Com p v q x.

The second layer raises transitions to the level of programs, abstracting from unobservable details. It is defined by a single rule: if CCC_To Defs C s t C' s', then ({| Defs ; C |}, s) —[ forget t ]−→ ({| Defs ; C' |}, s'). (This is a slightly simplified version of the actual Coq notation.) Here, {| Defs ; C |} denotes the program built from Defs and C, and forget removes unobservable details from transition labels (e.g., forget (R_Com p v q x) is L_Com p v q). Labels corresponding to conditionals and procedure calls all simplify to L_Tau p, denoting an internal action at p. The third layer defines the reflexive and transitive closure of program transitions.
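The following sketch illustrates how the second layer simplifies labels. Only the constructors R_Com, R_Sel, R_Cond, R_Call, L_Com and L_Tau appear in the text; the type name TransitionLabel, the constructor L_Sel, and the choice of nat for every parameter are assumptions made for the example.

```coq
(* Hypothetical instantiation: every parameter type is nat. *)
Definition Pid := nat.
Definition Value := nat.
Definition Var := nat.
Definition RecVar := nat.
Inductive Label := LLeft | LRight.

(* Rich labels record everything about the executed term. *)
Inductive RichLabel :=
| R_Com : Pid -> Value -> Pid -> Var -> RichLabel
| R_Sel : Pid -> Pid -> Label -> RichLabel
| R_Cond : Pid -> RichLabel
| R_Call : RecVar -> Pid -> RichLabel.

(* Observable labels (type name and L_Sel are assumptions). *)
Inductive TransitionLabel :=
| L_Com : Pid -> Value -> Pid -> TransitionLabel
| L_Sel : Pid -> Pid -> Label -> TransitionLabel
| L_Tau : Pid -> TransitionLabel.

(* forget drops the receiving variable, and turns conditionals and
   procedure calls into internal actions of the process involved. *)
Definition forget (t : RichLabel) : TransitionLabel :=
  match t with
  | R_Com p v q _ => L_Com p v q
  | R_Sel p q l => L_Sel p q l
  | R_Cond p => L_Tau p
  | R_Call _ p => L_Tau p
  end.
```

For instance, forget (R_Com 0 42 1 7) is L_Com 0 42 1: the variable 7 at the receiver is not observable.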
Important properties of CC include deadlock-freedom by design (any program P such that Main P ≠ End reduces), confluence (if P can execute two different sequences of actions due to out-of-order execution, then the resulting programs can always reduce to a common program and state), and Turing completeness. All these results are formalised in [12].

We now move to the contributions of this article. In this section, we formalise SP (Stateful Processes), the process calculus for implementing choreographies in [24], which is inspired by the process calculi in [10,11].
SP is defined as a Coq functor with the same parameters as CC. Its syntax contains three main ingredients: behaviours, which define the actions performed by the individual processes; networks, which consist of several processes running in parallel; and programs, which (as in CC) make available a common set of procedure definitions that each process can use.
Behaviours.
Behaviours are sequences of local actions, corresponding to the possible terms that can be written in CC, defined by the following BNF grammar.
B ::= End | p ! e ; B | p ? x ; B | p (+) l ; B | p & mBl // mBr | If b Then Bt Else Be | Call X
This grammar is again formalised as a Coq inductive type Behaviour. End denotes a terminated behaviour. Value communications are implemented by p ! e ; B, which evaluates e, sends the result to p, and continues as B, and by p ? x ; B, which receives a value from p, storing it at x, and continues as B. (Processes communicate by name. In practice, these can be either process identifiers (cf. actors), network addresses, or session correlation data.) Label selections are divided into p (+) l ; B – sending the label l to p and continuing as B – and p & mBl // mBr, where one of mBl or mBr is chosen according to the label selected by p. Both mBl and mBr have type option Behaviour: a process does not need to offer behaviours corresponding to all possible labels. Informally, branching terms are partial functions from labels to behaviours. Our definition capitalises on the fact that there are only two labels to simplify the formalisation. A conditional If b Then Bt Else Be evaluates the Boolean expression b to decide whether to continue as Bt or Be. Finally, Call X simply continues as the behaviour defining procedure X.

The automatically generated induction principles for Behaviour do not allow us to inspect the subterms of branching terms. Therefore, we prove a stronger induction principle that allows us to apply the induction hypothesis to these terms.
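The following sketch shows, on a cut-down behaviour type with only BEnd (for End), sends and branchings, why a hand-written principle is needed and what a stronger one can look like. For Branching, Coq's generated Behaviour_ind only offers a hypothesis of the form forall p o o0, P (Branching p o o0), with no induction hypothesis for the options. Behaviour_ind' below adds one through the helper opt_P, using a nested match on the options. The type and all names are assumptions for illustration, not the formalisation's own principle.

```coq
(* Cut-down behaviours (hypothetical): BEnd stands for End. *)
Inductive Behaviour :=
| BEnd : Behaviour
| Send : nat -> nat -> Behaviour -> Behaviour
| Branching : nat -> option Behaviour -> option Behaviour -> Behaviour.

(* P holds of the behaviour stored in an option, if there is one. *)
Definition opt_P (P : Behaviour -> Prop) (mB : option Behaviour) : Prop :=
  match mB with
  | Some B => P B
  | None => True
  end.

(* Stronger induction principle: in the branching case we may assume P
   for every behaviour actually offered by the branching term. *)
Fixpoint Behaviour_ind' (P : Behaviour -> Prop)
  (HEnd : P BEnd)
  (HSend : forall p e B, P B -> P (Send p e B))
  (HBr : forall p mBl mBr, opt_P P mBl -> opt_P P mBr ->
         P (Branching p mBl mBr))
  (B : Behaviour) {struct B} : P B :=
  match B with
  | BEnd => HEnd
  | Send p e B' => HSend p e B' (Behaviour_ind' P HEnd HSend HBr B')
  | Branching p mBl mBr =>
      HBr p mBl mBr
        (match mBl as mB return opt_P P mB with
         | Some Bl => Behaviour_ind' P HEnd HSend HBr Bl
         | None => I
         end)
        (match mBr as mB return opt_P P mB with
         | Some Br => Behaviour_ind' P HEnd HSend HBr Br
         | None => I
         end)
  end.
```

The recursive calls on Bl and Br are accepted because Coq's guard condition treats the contents of an option argument of a nested inductive type as structural subterms.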
Networks.
A network is a function with type Pid → Behaviour. We define extensional equality for networks, written N == N', and prove that it is an equivalence relation. In practice, networks only use a finite number of processes, and are written as (finite) parallel compositions of behaviours. We introduce these particular constructions: N | N' is the parallel composition of N and N'; p [ B ] is the network where p is mapped to B and all other processes to End; and N ∼∼ p denotes network N where p is now mapped to End. The following are examples of properties useful for proving results such as confluence.

Lemma Network_rm_out : ∀ N p p', p ≠ p' → (N ∼∼ p) p' = N p'.
Lemma Network_eq_cross : ∀ N N1 N2 p q r s Bp Bq Br Bs,
  p ≠ q → p ≠ r → p ≠ s → q ≠ r → q ≠ s → r ≠ s →
  N1 == N ∼∼ p ∼∼ q | p [ Bp ] | q [ Bq ] →
  N2 == N ∼∼ r ∼∼ s | r [ Br ] | s [ Bs ] →
  N2 ∼∼ p ∼∼ q | p [ Bp ] | q [ Bq ] == N1 ∼∼ r ∼∼ s | r [ Br ] | s [ Bs ].

Most of these results do not hold for (intensional) equality of Networks.

Programs.
Finally, a Program is a pair consisting of a set of procedure definitions (of type DefSetB = RecVar → Behaviour) and a Network. Programs, networks and behaviours should satisfy well-formedness properties similar to those of CC. However, these properties are automatically ensured when networks are generated from choreographies, so we do not discuss them here. They are included in the formalisation.
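The network constructions above can be sketched with networks as plain Coq functions. The cut-down Behaviour type, the names net_eq, remove, single and par, and in particular the pointwise definition of parallel composition are assumptions of this sketch; the text does not fix how N | N' is defined. The lemma remove_other is the analogue of Network_rm_out.

```coq
From Coq Require Import Arith.

(* Cut-down behaviours (hypothetical): BEnd stands for End. *)
Inductive Behaviour :=
| BEnd : Behaviour
| Send : nat -> nat -> Behaviour -> Behaviour
| Recv : nat -> nat -> Behaviour -> Behaviour.

(* A network maps every process identifier (here: nat) to a behaviour. *)
Definition Network := nat -> Behaviour.

(* Extensional equality of networks, N == N'. *)
Definition net_eq (N N' : Network) : Prop := forall p, N p = N' p.

(* N ~~ p: the network N in which p is now mapped to BEnd. *)
Definition remove (N : Network) (p : nat) : Network :=
  fun r => if Nat.eqb p r then BEnd else N r.

(* p[B]: p runs B and every other process is terminated. *)
Definition single (p : nat) (B : Behaviour) : Network :=
  fun r => if Nat.eqb p r then B else BEnd.

(* N | N': one plausible pointwise reading (an assumption), which takes
   N's behaviour unless N has terminated at that process. *)
Definition par (N N' : Network) : Network :=
  fun r => match N r with BEnd => N' r | B => B end.

(* Analogue of Network_rm_out: removing p does not affect other processes. *)
Lemma remove_other : forall N p p', p <> p' -> remove N p p' = N p'.
Proof.
  intros N p p' Hneq. unfold remove. cbv beta.
  destruct (Nat.eqb_spec p p') as [Heq | _].
  - contradiction.
  - reflexivity.
Qed.
```

With this reading, the pattern N ∼∼ p | p [ B ] used in the transition rules replaces the behaviour of p by B and leaves every other process untouched.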
The semantics of SP is defined by a labelled transition system. Transitions for communications match dual actions in two processes, while conditionals and procedure calls simply run locally. We report the representative cases – the complete definition can be found in the source formalisation [13].
Inductive SP_To (Defs : DefSetB) :
  Network → State → RichLabel → Network → State → Prop :=
| S_Com N p e B q x B' N' s s' :
    let v := (eval_on_state e s p) in
    N p = (q ! e ; B) → N q = (p ? x ; B') →
    N' == N ∼∼ p ∼∼ q | p [ B ] | q [ B' ] →
    eq_state_ext s' (update s q x v) →
    SP_To Defs N s (R_Com p v q x) N' s'
| S_LSel N p B q Bl Br N' s s' :
    N p = (q (+) left ; B) → N q = (p & Some Bl // Br) →
    N' == N ∼∼ p ∼∼ q | p [ B ] | q [ Bl ] →
    eq_state_ext s s' →
    SP_To Defs N s (R_Sel p q left) N' s'
| S_Then N p b B1 B2 N' s s' :
    N p = (If b Then B1 Else B2) → beval_on_state b s p = true →
    N' == N ∼∼ p | p [ B1 ] →
    eq_state_ext s s' →
    SP_To Defs N s (R_Cond p) N' s'
| S_Call N p X N' s s' :
    N p = Call X → N' == N ∼∼ p | p [ Defs X ] →
    eq_state_ext s s' →
    SP_To Defs N s (R_Call X p) N' s'
(...)

Coq's type inference mechanism allows us to omit the types of most universally quantified variables. We abuse this possibility to lighten the presentation, and use variable names that are suggestive of their type.
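For intuition about rule S_Com, the following sketch turns it into a function for a chosen pair of processes: it fires only when p's next action is a send to q and q's next action is a receive from p, and it then advances both processes and updates q's variable. States, expression evaluation (an expression is read as a variable of the sender), the function com_step and the pointwise parallel composition are all hypothetical simplifications; the formalisation uses a relation and extensional state equality instead.

```coq
From Coq Require Import Arith.

(* Hypothetical instantiation: processes, variables, expressions and
   values are nat; an expression e is read as variable e of the sender. *)
Definition State := nat -> nat -> nat.

Definition eval_on_state (e : nat) (s : State) (p : nat) : nat := s p e.

(* update s q x v: q's variable x now stores v. *)
Definition update (s : State) (q x v : nat) : State :=
  fun r y => if andb (Nat.eqb r q) (Nat.eqb y x) then v else s r y.

Inductive Behaviour :=
| BEnd : Behaviour
| Send : nat -> nat -> Behaviour -> Behaviour   (* q ! e ; B *)
| Recv : nat -> nat -> Behaviour -> Behaviour.  (* p ? x ; B *)

Definition Network := nat -> Behaviour.

Definition remove (N : Network) (p : nat) : Network :=
  fun r => if Nat.eqb p r then BEnd else N r.
Definition single (p : nat) (B : Behaviour) : Network :=
  fun r => if Nat.eqb p r then B else BEnd.
(* Pointwise parallel composition: an assumption of this sketch. *)
Definition par (N N' : Network) : Network :=
  fun r => match N r with BEnd => N' r | B => B end.

(* Functional reading of S_Com for a chosen pair (p, q). On success the
   result is N ~~ p ~~ q | p[B] | q[B'] with the updated state. *)
Definition com_step (N : Network) (s : State) (p q : nat)
  : option (Network * State) :=
  match N p, N q with
  | Send q' e B, Recv p' x B' =>
      if andb (Nat.eqb q q') (Nat.eqb p p') then
        let v := eval_on_state e s p in
        Some (par (par (remove (remove N p) q) (single p B)) (single q B'),
              update s q x v)
      else None
  | _, _ => None
  end.
```

If p sends variable 7 to q, which stores it in x = 3, and p's variable 7 holds 42, then after com_step both processes are terminated and q's variable 3 holds 42; calling com_step with the roles swapped returns None.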
As for CC, this relation is then lifted to programs and closed under reflexivity and transitivity, with similar notations. Transitions are compatible with network equality and state equivalence.
Lemma SP_To_eq : ∀ Defs N tl s1 N' s2 s1' s2', eq_state_ext s1 s1' → eq_state_ext s2 s2' → SP_To Defs N s1 tl N' s2 → SP_To Defs N s1' tl N' s2'.
Lemma SP_To_Network_eq : ∀ N1 N1' N2 SPDefs s s' t, (N1 == N2) → SP_To SPDefs N1 s t N1' s' → SP_To SPDefs N2 s t N1' s'.
Lemma SP_To_Defs_wd : ∀ Defs Defs', (∀ X, Defs X = Defs' X) → ∀ N s tl N' s', SP_To Defs N s tl N' s' → SP_To Defs' N s tl N' s'.

These results are instrumental in some of the later proofs. The first two are also proven for programs and many-step transitions, while the last one has the following counterpart.
Lemma SPP_To_Defs_stable : ∀ Defs' N N' tl s s', {| Defs N, s |} —[ tl ]−→ {| Defs' N', s' |} → Defs = Defs'.

We also prove that transitions are completely determined by the label.
Lemma SP_To_deterministic_1 : ∀ N N1 N2 tl s s1 s2, SP_To Defs N s tl N1 s1 → SP_To Defs N s tl N2 s2 → N1 == N2.
Lemma SP_To_deterministic_2 : ∀ N N1 N2 tl s s1 s2, SP_To Defs N s tl N1 s1 → SP_To Defs N s tl N2 s2 → eq_state_ext s1 s2.

Finally, we show that the semantics of SP is confluent. Although this is not required for our main theorem, it is a nice result that confirms our expectations. The formalisation of SP consists of 37 definitions, 80 lemmas, and 2000 lines.
Intuitively, process implementations can be generated from choreographies recursively, by projecting each action in the choreography to the corresponding process action – for example, a value communication p e −→ q $ x should be projected as a send action q ! e at p, as a receive action p ? x at q, and not projected to any other processes. However, this causes a problem with conditionals. Projecting a choreography If p ?? b Then Ct Else Ce for any process other than p, say q, requires combining the projections obtained for Ct and Ce, such that q can “react” to whichever choice p will make. This combination is called merging [5].

Merging is typically defined as a partial function mapping pairs of behaviours to behaviours that returns a behaviour combining all possible executions of the two input behaviours (if possible). For SP, two behaviours can be merged if they are structurally similar, with the possible exception of branching terms: we can merge branchings that offer options on distinct labels. For example, merging p & (Some B) // None with p & None // (Some B') yields p & (Some B) // (Some B'), allowing If p ?? b Then (p −→ q [ left ];; q e −→ p $ x ;; End) Else (p −→ q [ right ];; End) to be projected for q as p & Some (p ! e ; End) // Some End.
Formalising merging poses the problem that all functions in Coq are total. Since the type of arguments to branching terms is option Behaviour, trying to define merging with type Behaviour → Behaviour → option Behaviour fails. Instead, we define a type XBehaviour of extended behaviours, defined as Behaviour with an extra constructor XUndefined : XBehaviour. Thus, an XBehaviour is a Behaviour that may contain XUndefined subterms.

The connection between Behaviour and XBehaviour is established by means of two functions: inject : Behaviour → XBehaviour, which isomorphically injects each constructor of Behaviour into the corresponding one in XBehaviour, and collapse : XBehaviour → XBehaviour, which maps every XBehaviour with XUndefined as a subterm to XUndefined. The most relevant properties of these functions are:
Lemma inject_elim : ∀ B, ∃ B', inject B = B' ∧ B' ≠ XUndefined.
Lemma collapse_inject : ∀ B, collapse (inject B) = inject B.
Lemma collapse_char'' : ∀ B, collapse B = XUndefined → ∀ B', B ≠ inject B'.
Lemma collapse_exists : ∀ B, collapse B ≠ XUndefined → ∃ B', B = inject B'.

Using this type, we first define Xmerge on XBehaviours as follows, where we report only representative cases.
Fixpoint Xmerge (B1 B2 : XBehaviour) : XBehaviour :=
  match B1, B2 with
  | XEnd, XEnd ⇒ XEnd
  | XSend p e B, XSend p' e' B' ⇒
      if Pid_dec p p' && Expr_dec e e' then
        match Xmerge B B' with
        | XUndefined ⇒ XUndefined
        | _ ⇒ XSend p e (Xmerge B B')
        end
      else XUndefined
  | XBranching p Bl Br, XBranching p' Bl' Br' ⇒
      if Pid_dec p p' then
        let BL := match Bl with
                  | None ⇒ Bl'
                  | Some B ⇒ match Bl' with None ⇒ Bl | Some B' ⇒ Some (Xmerge B B') end
                  end in
        let BR := match Br with
                  | None ⇒ Br'
                  | Some B ⇒ match Br' with None ⇒ Br | Some B' ⇒ Some (Xmerge B B') end
                  end in
        match BL, BR with
        | Some XUndefined, _ ⇒ XUndefined
        | _, Some XUndefined ⇒ XUndefined
        | _, _ ⇒ XBranching p BL BR
        end
      else XUndefined
  | XCond e B1 B2, XCond e' B1' B2' ⇒
      if BExpr_dec e e' then
        match Xmerge B1 B1', Xmerge B2 B2' with
        | XUndefined, _ ⇒ XUndefined
        | _, XUndefined ⇒ XUndefined
        | Bt, Be ⇒ XCond e Bt Be
        end
      else XUndefined
  (...)
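The following self-contained sketch reproduces the shown send and branching cases of Xmerge on a cut-down behaviour type (only End, sends and branchings; BEnd stands for End, and nat is used for Pid and Expr), together with inject, collapse and merge. Decidable equality is Nat.eqb rather than Pid_dec and Expr_dec, and every pair of constructors not covered yields XUndefined; these are simplifications made for the example.

```coq
(* Cut-down behaviours (hypothetical). *)
Inductive Behaviour :=
| BEnd : Behaviour
| Send : nat -> nat -> Behaviour -> Behaviour
| Branching : nat -> option Behaviour -> option Behaviour -> Behaviour.

(* Extended behaviours: the same constructors plus XUndefined. *)
Inductive XBehaviour :=
| XEnd : XBehaviour
| XSend : nat -> nat -> XBehaviour -> XBehaviour
| XBranching : nat -> option XBehaviour -> option XBehaviour -> XBehaviour
| XUndefined : XBehaviour.

(* inject maps each constructor to its extended counterpart. *)
Fixpoint inject (B : Behaviour) : XBehaviour :=
  match B with
  | BEnd => XEnd
  | Send p e B' => XSend p e (inject B')
  | Branching p mBl mBr =>
      XBranching p
        (match mBl with Some Bl => Some (inject Bl) | None => None end)
        (match mBr with Some Br => Some (inject Br) | None => None end)
  end.

(* collapse turns any term with an XUndefined subterm into XUndefined. *)
Fixpoint collapse (B : XBehaviour) : XBehaviour :=
  match B with
  | XEnd => XEnd
  | XUndefined => XUndefined
  | XSend p e B' =>
      match collapse B' with XUndefined => XUndefined | B'' => XSend p e B'' end
  | XBranching p mBl mBr =>
      let cl := match mBl with Some Bl => Some (collapse Bl) | None => None end in
      let cr := match mBr with Some Br => Some (collapse Br) | None => None end in
      match cl, cr with
      | Some XUndefined, _ => XUndefined
      | _, Some XUndefined => XUndefined
      | _, _ => XBranching p cl cr
      end
  end.

Fixpoint Xmerge (B1 B2 : XBehaviour) {struct B1} : XBehaviour :=
  match B1, B2 with
  | XEnd, XEnd => XEnd
  | XSend p e B, XSend p' e' B' =>
      if andb (Nat.eqb p p') (Nat.eqb e e') then
        match Xmerge B B' with XUndefined => XUndefined | B'' => XSend p e B'' end
      else XUndefined
  | XBranching p Bl Br, XBranching p' Bl' Br' =>
      if Nat.eqb p p' then
        (* A branch missing on one side is taken from the other side. *)
        let BL := match Bl, Bl' with
                  | None, _ => Bl'
                  | Some _, None => Bl
                  | Some B, Some B' => Some (Xmerge B B')
                  end in
        let BR := match Br, Br' with
                  | None, _ => Br'
                  | Some _, None => Br
                  | Some B, Some B' => Some (Xmerge B B')
                  end in
        match BL, BR with
        | Some XUndefined, _ => XUndefined
        | _, Some XUndefined => XUndefined
        | _, _ => XBranching p BL BR
        end
      else XUndefined
  | _, _ => XUndefined  (* mismatched constructors, or XUndefined *)
  end.

Definition merge (B1 B2 : Behaviour) : XBehaviour :=
  Xmerge (inject B1) (inject B2).
```

For instance, merging Branching 0 (Some (Send 0 5 BEnd)) None with Branching 0 None (Some BEnd) gives XBranching 0 (Some (XSend 0 5 XEnd)) (Some XEnd), mirroring the example above, while merging Send 2 5 BEnd with BEnd gives XUndefined.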
Using Xmerge, we can straightforwardly define merging.

Definition merge B1 B2 := Xmerge (inject B1) (inject B2).

Function merge is shown to be idempotent and commutative. It is also associative, but this has not been formally proven: it is not relevant for the remaining results, and its proof illustrates the major challenge experienced in this stage of the formalisation: all proofs are by induction on behaviours, and the number of cases generated quickly becomes unmanageable. Automation only works to a limited extent, and therefore the proof of associativity was dropped. (This proof generated 84 subcases that were not automatically proven. These had to be divided into further subcases, which could then be dealt with relatively simply. The final case, when all behaviours are branching terms, must then be divided into the 64 possible combinations of defined/undefined branches, which, although similar, are subtly different. Three of these cases were proven, leading to our claim that the proof can be completed given sufficient patience.)

(The attempt, mentioned earlier, to define merging with type Behaviour → Behaviour → option Behaviour fails essentially because it is not possible to distinguish whether the behaviour assigned to a label is None because it was not defined, or because a recursive call to merge failed.)

The largest set of lemmas about merge deals with inversion results, such as:
Lemma merge_inv_Send : ∀ B B' p e X, merge B B' = XSend p e X →
  ∃ B1 B1', B = p ! e ; B1 ∧ B' = p ! e ; B1' ∧ merge B1 B1' = X.
Lemma merge_inv_Branching : ∀ B B' p Bl Br, merge B B' = XBranching p Bl Br →
  ∃ Bl' Bl'' Br' Br'', B = p & Bl' // Br' ∧ B' = p & Bl'' // Br'' ∧
  (Bl = None → Bl' = None ∧ Bl'' = None) ∧ (...) ∧
  (∀ BL, Bl = Some BL →
    (Bl' = None → ∃ BL'', Bl'' = Some BL'' ∧ BL = inject BL'') ∧
    (Bl'' = None → ∃ BL', Bl' = Some BL' ∧ BL = inject BL') ∧
    (∀ BL' BL'', Bl' = Some BL' ∧ Bl'' = Some BL'' → merge BL' BL'' = BL))

The omitted parts of lemma merge_inv_Branching state properties of Br analogous to those of Bl. Although these proofs amount mostly to induction and case analysis, they suffer from the problems described for the proof of associativity (thankfully, not to such a dramatic level). Automation works better here, and the effect of the large number of subcases is mostly felt in the time required by the auto tactic. Still, the formalisation of merging consists of 6 definitions, 42 lemmas, and 1950 lines – giving an average proof length of over 40 lines.

We are now ready to define EndPoint Projection (EPP): a partial function that maps programs in CC to programs in SP. The target instance of SP has the same parameters as CC, except for the set of procedure names: this is defined as
RecVar ∗ Pid – we project an implementation of each procedure from the point of view of each process involved in it.

EPP is partial because of the standard problem of choreography realisability [3]. Consider the choreography If p ?? b Then (q e −→ r $ x) Else End. This cannot be implemented without additional communications between p, q and r, since the latter processes need to know the result of the conditional to decide whether to communicate. We say that this choreography is not projectable [11].

We define EPP in several layers. First, we define a function bproj : DefSet → Choreography → Pid → XBehaviour for projecting the behaviour of a single process. Intuitively, bproj Defs C p attempts to construct p's behaviour as specified by C; the parameter Defs is used to project procedure calls, which depend on whether p participates in the procedure. Returning an XBehaviour instead of an option Behaviour has the advantage of giving information about where exactly merging fails (the location of
XUndefined subterms). This information can be used for debugging, providing information to programmers, or for automatic repair of choreographies [2,11], which we plan to exploit in future work.

We show some illustrative cases of the definition of bproj.

Fixpoint bproj (Defs : DefSet) (C : Choreography) (r : Pid) : XBehaviour :=
  match C with
  | p e −→ q $ x ;; C' ⇒
      if Pid_dec p r then XSend q e (bproj Defs C' r)
      else if Pid_dec q r then XRecv p x (bproj Defs C' r)
      else bproj Defs C' r
  | p −→ q [ left ];; C' ⇒
      if Pid_dec p r then XSel q left (bproj Defs C' r)
      else if Pid_dec q r then XBranching p (Some (bproj Defs C' r)) None
      else bproj Defs C' r
  | If p ?? b Then C1 Else C2 ⇒
      if Pid_dec p r then XCond b (bproj Defs C1 r) (bproj Defs C2 r)
      else Xmerge (bproj Defs C1 r) (bproj Defs C2 r)
  | CCBase.Call X ⇒
      if In_dec P.eq_dec r (fst (Defs X)) then XCall (X, r) else XEnd
  (...)

The next step is generating projections for all relevant processes. We take the set of processes as a parameter, and collapse all individual projections.
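The communication, selection and conditional cases of bproj shown above can be exercised in the following sketch, which omits procedure calls (and hence Defs) and flattens Eta into the choreography type. Its Xmerge is deliberately incomplete: it merges only End, sends, receives and branchings, and returns XUndefined for everything else, including selections and conditionals that the full definition can merge. The names keep, Com, Sel, Cond and CEnd, and the use of nat for all parameters, are hypothetical.

```coq
(* Hypothetical cut-down setting: processes, expressions, variables and
   Boolean expressions are nat; procedure calls are left out. *)
Inductive Label := LLeft | LRight.

Inductive Choreography :=
| Com : nat -> nat -> nat -> nat -> Choreography -> Choreography
    (* Com p e q x C  is  p e --> q $ x ;; C *)
| Sel : nat -> nat -> Label -> Choreography -> Choreography
    (* Sel p q l C    is  p --> q [l] ;; C *)
| Cond : nat -> nat -> Choreography -> Choreography -> Choreography
| CEnd : Choreography.

Inductive XBehaviour :=
| XEnd : XBehaviour
| XSend : nat -> nat -> XBehaviour -> XBehaviour
| XRecv : nat -> nat -> XBehaviour -> XBehaviour
| XSel : nat -> Label -> XBehaviour -> XBehaviour
| XBranching : nat -> option XBehaviour -> option XBehaviour -> XBehaviour
| XCond : nat -> XBehaviour -> XBehaviour -> XBehaviour
| XUndefined : XBehaviour.

(* Apply k to B only if the side condition holds and B is defined. *)
Definition keep (ok : bool) (B : XBehaviour) (k : XBehaviour -> XBehaviour)
  : XBehaviour :=
  if ok then match B with XUndefined => XUndefined | _ => k B end
  else XUndefined.

(* Partial Xmerge: selections and conditionals are never merged here. *)
Fixpoint Xmerge (B1 B2 : XBehaviour) {struct B1} : XBehaviour :=
  match B1, B2 with
  | XEnd, XEnd => XEnd
  | XSend q e B, XSend q' e' B' =>
      keep (andb (Nat.eqb q q') (Nat.eqb e e')) (Xmerge B B') (XSend q e)
  | XRecv p x B, XRecv p' x' B' =>
      keep (andb (Nat.eqb p p') (Nat.eqb x x')) (Xmerge B B') (XRecv p x)
  | XBranching p Bl Br, XBranching p' Bl' Br' =>
      let BL := match Bl, Bl' with
                | None, _ => Bl'
                | Some _, None => Bl
                | Some B, Some B' => Some (Xmerge B B')
                end in
      let BR := match Br, Br' with
                | None, _ => Br'
                | Some _, None => Br
                | Some B, Some B' => Some (Xmerge B B')
                end in
      match BL, BR with
      | Some XUndefined, _ => XUndefined
      | _, Some XUndefined => XUndefined
      | _, _ => if Nat.eqb p p' then XBranching p BL BR else XUndefined
      end
  | _, _ => XUndefined
  end.

Fixpoint bproj (C : Choreography) (r : nat) : XBehaviour :=
  match C with
  | Com p e q x C' =>
      if Nat.eqb p r then XSend q e (bproj C' r)
      else if Nat.eqb q r then XRecv p x (bproj C' r)
      else bproj C' r
  | Sel p q l C' =>
      if Nat.eqb p r then XSel q l (bproj C' r)
      else if Nat.eqb q r then
        match l with
        | LLeft => XBranching p (Some (bproj C' r)) None
        | LRight => XBranching p None (Some (bproj C' r))
        end
      else bproj C' r
  | Cond p b C1 C2 =>
      if Nat.eqb p r then XCond b (bproj C1 r) (bproj C2 r)
      else Xmerge (bproj C1 r) (bproj C2 r)
  | CEnd => XEnd
  end.
```

Projecting the conditional example from Sect. 4 (with p = 0 and q = 1) onto q yields XBranching 0 (Some (XSend 0 5 XEnd)) (Some XEnd), whereas projecting If p ?? b Then (q e −→ r $ x) Else End onto q yields XUndefined, in line with the discussion of projectability above.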
Definition epp_list (Defs : DefSet) (C : Choreography) (ps : list Pid) : list (Pid ∗ XBehaviour) :=
  map (fun p ⇒ (p, collapse (bproj Defs C p))) ps.

A choreography C is projectable wrt Defs and ps iff epp_list Defs C ps does not contain XUndefined. Set Defs : DefSet is projectable wrt a set of procedure names Xs if snd (Defs X) is projectable wrt Defs and fst (Defs X) for each X in Xs. Projectability of programs is a bit more involved, and we present its Coq formalisation before discussing it.

Definition projectable Xs ps P :=
  projectable_C (Procedures P) ps (Main P) ∧
  projectable_D Xs (Procedures P) ∧
  (∀ p, In p (CCC_pn (Main P) (fun _ ⇒ nil)) → In p ps) ∧
  (∀ p X, In X Xs → In p (fst (Procedures P X)) → In p ps) ∧
  (∀ p X, In X Xs → In p (CCC_pn (snd (Procedures P X)) (fun _ ⇒ nil)) → In p ps).

The first two conditions simply state that
Main P and Procedures P are projectable wrt the appropriate parameters. The remaining conditions state that the sets ps and Xs include all processes used in P and all procedures needed to execute P. (These sets are not necessarily computable, since Xs is not required to be finite. However, in practice, these parameters are known – so it makes sense to include them in the definition.) Function CCC_pn returns the set of processes used in a choreography, given the sets of processes each procedure call is supposed to use.

We now define epp_C, for compiling projectable choreographies to networks, epp_D, for compiling projectable sets of procedure definitions, and epp, for compiling projectable programs. As these definitions depend on proof terms whose structure needs to be explored, they are done interactively; afterwards, we prove that these definitions are independent of the proof terms, and otherwise work as expected. We give a few examples.
Lemma epp_C_wd : ∀ Defs C ps H H', (epp_C Defs ps C H) == (epp_C Defs ps C H').
Lemma epp_C_Com_p : ∀ Defs ps C p e q x HC HC', In p ps → epp_C Defs ps (p e −→ q $ x ;; C) HC p = q ! e ; epp_C Defs ps C HC' p.
Lemma epp_C_Cond_r : ∀ Defs ps p b C1 C2 HC HC1 HC2 r, p ≠ r → inject (epp_C Defs ps (If p ?? b Then C1 Else C2) HC r) = merge (epp_C Defs ps C1 HC1 r) (epp_C Defs ps C2 HC2 r).

Furthermore, we need a number of inversion lemmas for branching and conditionals that restrict possible projections of subterms of a choreography.
Lemma epp_C_Sel_Branching_l : ∀ Defs ps C HC p q Bp Bl Br, epp_C Defs ps C HC p = q (+) left ; Bp → epp_C Defs ps C HC q = p & Bl // Br → Bl = None ∧ Br = None.
Lemma epp_C_Cond_Send_inv : ∀ Defs ps p b C1 C2 HC HC1 HC2 r q e B, epp_C Defs ps (If p ?? b Then C1 Else C2) HC r = q ! e ; B → ∃ B1 B2, epp_C Defs ps C1 HC1 r = q ! e ; B1 ∧ epp_C Defs ps C2 HC2 r = q ! e ; B2 ∧ merge B1 B2 = inject B.

As defined so far, projectability of C does not imply projectability of choreographies that C can transition to. This is due to the way runtime terms are projected: RT_Call X ps C' is projected as a call to (X, p) if p is in ps, and as the projection of C' otherwise. Our definition of projectability allows in principle for C' to be unprojectable for a process in ps, which would make it unprojectable after a transition. That this situation does not arise is a consequence of the intended usage of runtime terms: initially C' is obtained from the definition of a procedure, and ps is the set of processes used in this procedure. Afterwards ps only shrinks, while C' may change due to execution of actions outside ps. We capture these conditions in the notion of strong projectability, whose representative case is:

Fixpoint strongly_projectable Defs (C : Choreography) (r : Pid) : Prop :=
  match C with
  | RT_Call X ps C ⇒ strongly_projectable Defs C r ∧
      (∀ p, In p ps → In p (fst (Defs X)) ∧
         Xmore_branches (bproj Defs (snd (Defs X)) p) (bproj Defs C p))
  (...)

The relation Xmore_branches is explained in the next section: it is a semantic characterisation of how the projection bproj Defs (snd (Defs X)) p may change due to execution of actions not involving p in snd (Defs X).

Projectability and strong projectability coincide for initial choreographies. Furthermore, we state and prove lemmas that show that strong projectability of C implies strong projectability of any choreography that C can transition to.

The key ingredient for our correspondence result is a relation on behaviours usually called pruning [5,7]. Pruning relates two behaviours that differ only in that one offers more options in branching terms than the other; we formalise this relation with the more suggestive name of more_branches (in line with [24]), and we include some illustrative cases of its definition.
Inductive more_branches : Behaviour → Behaviour → Prop :=
| MB_Send p e B B' : more_branches B B' → more_branches (p ! e; B) (p ! e; B')
| MB_Branching_None_None p mBl mBr : more_branches (p & mBl // mBr) (p & None // None)
| MB_Branching_Some_Some p Bl Bl' Br Br' : more_branches Bl Bl' → more_branches Br Br' →
    more_branches (p & Some Bl // Some Br) (p & Some Bl' // Some Br')
(...)

The need for pruning arises naturally when one considers what happens when a choreography executes a conditional at a process p. In the continuation, only one of the branches is kept. However, no process other than p knows that this action has been executed; therefore, in the projection of the choreography, both behaviours are still available to all other processes. Given the definition of merging, this means that the behaviours of these processes may contain more branches than those in the projection of the choreography after the reduction.

Pruning is naturally related to merging, as stated in the following lemmas.

Lemma more_branches_char : ∀ B B', more_branches B B' ↔ merge B B' = inject B.

Lemma merge_more_branches : ∀ B1 B2 B, merge B1 B2 = inject B → more_branches B B1.

Pruning is also reflexive and transitive. Further, if two behaviours have an upper bound with respect to pruning, then their merge is defined and is their least upper bound.
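To make the interplay between merging and pruning concrete, the following is a minimal Python sketch, not part of the Coq development, under simplifying assumptions: behaviours only have sends, binary branchings with optional branches, and termination; the merge operator follows the usual definition (branches are combined, everything else must coincide); and the cases of more_branches elided by (...) above are completed in the natural way. All class and function names are hypothetical. The sketch replays the conditional scenario just described and checks the statements of more_branches_char and merge_more_branches on a handful of sample behaviours.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical, simplified model of behaviours (not the Coq development):
# End, sends "p ! e; B" and branchings "p & mBl // mBr" with optional branches.


@dataclass(frozen=True)
class End:
    pass


@dataclass(frozen=True)
class Send:
    to: str
    expr: str
    cont: "Behaviour"


@dataclass(frozen=True)
class Branching:
    frm: str
    left: Optional["Behaviour"]
    right: Optional["Behaviour"]


Behaviour = Union[End, Send, Branching]


def merge(b1, b2):
    """Merge of two behaviours, or None when the merge is undefined."""
    if isinstance(b1, End) and isinstance(b2, End):
        return End()
    if isinstance(b1, Send) and isinstance(b2, Send):
        if (b1.to, b1.expr) != (b2.to, b2.expr):
            return None
        cont = merge(b1.cont, b2.cont)
        return None if cont is None else Send(b1.to, b1.expr, cont)
    if isinstance(b1, Branching) and isinstance(b2, Branching) and b1.frm == b2.frm:
        sides = []
        for m1, m2 in ((b1.left, b2.left), (b1.right, b2.right)):
            if m1 is None or m2 is None:
                sides.append(m1 if m2 is None else m2)  # keep the branch that exists
            else:
                m = merge(m1, m2)
                if m is None:
                    return None
                sides.append(m)
        return Branching(b1.frm, sides[0], sides[1])
    return None


def more_branches(b, b2):
    """True if b offers at least the branches of b2 (b2 is a pruning of b)."""
    if isinstance(b, End) and isinstance(b2, End):
        return True
    if isinstance(b, Send) and isinstance(b2, Send):
        return (b.to, b.expr) == (b2.to, b2.expr) and more_branches(b.cont, b2.cont)
    if isinstance(b, Branching) and isinstance(b2, Branching) and b.frm == b2.frm:
        return all(m2 is None or (m is not None and more_branches(m, m2))
                   for m, m2 in ((b.left, b2.left), (b.right, b2.right)))
    return False


# q's projections of the two branches of a conditional at p, where p selects
# "left" (resp. "right") at q and q then sends "1" (resp. "2") to r.
then_q = Branching("p", Send("r", "1", End()), None)
else_q = Branching("p", None, Send("r", "2", End()))
merged_q = merge(then_q, else_q)            # what q runs before p decides
assert merged_q == Branching("p", Send("r", "1", End()), Send("r", "2", End()))
assert more_branches(merged_q, then_q)      # after the conditional: a pruning

SAMPLES = [End(), Send("r", "1", End()), Send("r", "2", End()),
           then_q, else_q, merged_q, Branching("p", None, None)]

for x in SAMPLES:
    for y in SAMPLES:
        # more_branches_char: more_branches B B' <-> merge B B' = inject B
        assert more_branches(x, y) == (merge(x, y) == x)
        # merge_more_branches: merge B1 B2 = inject B -> more_branches B B1
        m = merge(x, y)
        assert m is None or more_branches(m, x)
```

Representing an undefined merge by None mirrors the option type returned by merge in the text (inject B standing for a defined result); absent branches use None as well, but only inside Branching, so the two uses never clash.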
Lemma more_branches_merge : ∀ B B1 B2, more_branches B B1 → more_branches B B2 →
  ∃ B', merge B1 B2 = inject B' ∧ more_branches B B'.

Finally, prunings of two mergeable behaviours are themselves mergeable, and their merge is a pruning of the original merge.
Lemma more_branches_merge_extend : ∀ B1 B2 B1' B2' B,
  more_branches B1 B1' → more_branches B2 B2' → merge B1 B2 = inject B →
  ∃ B', merge B1' B2' = inject B' ∧ more_branches B B'.

These two results are key ingredients in the cases dealing with conditionals in the proof of the EPP Theorem. They require extensive case analysis (512 cases for the last lemma, of which 81 are not automatically solved by Coq's inversion tactic, even though most are contradictory). Analogous versions of some of these lemmas also need to be proven for XBehaviours, which is straightforward.

Pruning extends pointwise to networks, which we denote as N ≫ N'. The key result is that, due to how the semantics of SP is defined, pruning a network cannot add new transitions.

Lemma SP_To_more_branches_N : ∀ Defs N1 s N2 s' Defs' N1' tl,
  SP_To Defs N1 s tl N2 s' → N1' ≫ N1 → (∀ X, Defs X = Defs' X) →
  ∃ N2', SP_To Defs' N1' s tl N2' s' ∧ N2' ≫ N2.

The converse of this result only holds for choreography projections, and is proven after the definition of EPP. The formalisation of pruning includes 3 definitions, 25 lemmas, and 950 lines of Coq code. This file is imported in the file containing the definition of EPP (previous section), but we delayed its presentation since its motivation is clearer after seeing those definitions.
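The asymmetry between SP_To_more_branches_N and its converse can be seen on a toy example. The sketch below is a hypothetical Python model, not SP itself: behaviours are termination, selections "q (+) l; B" and branchings "p & mBl // mBr", networks are dictionaries from process names to behaviours, and the only transitions are selections. It checks that a pruned network's steps are matched by the larger network, and exhibits a pair of networks where the larger one can move but its pruning is stuck. In this model the stuck network is not the projection of any choreography (p selects left, but q offers no left branch), which is consistent with the restriction of the converse to projections.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Hypothetical toy model (not SP itself): behaviours are End, selections
# "q (+) l; B" and branchings "p & mBl // mBr"; networks map process names
# to behaviours, and the only transitions are selections.


@dataclass(frozen=True)
class End:
    pass


@dataclass(frozen=True)
class Sel:
    to: str
    label: str          # "left" or "right"
    cont: "Behaviour"


@dataclass(frozen=True)
class Branch:
    frm: str
    left: Optional["Behaviour"]
    right: Optional["Behaviour"]


Behaviour = Union[End, Sel, Branch]
Network = Dict[str, Behaviour]


def more_branches(b, b2):
    """True if b offers at least the branches of b2."""
    if isinstance(b, End) and isinstance(b2, End):
        return True
    if isinstance(b, Sel) and isinstance(b2, Sel):
        return (b.to, b.label) == (b2.to, b2.label) and more_branches(b.cont, b2.cont)
    if isinstance(b, Branch) and isinstance(b2, Branch) and b.frm == b2.frm:
        return all(m2 is None or (m is not None and more_branches(m, m2))
                   for m, m2 in ((b.left, b2.left), (b.right, b2.right)))
    return False


def net_more_branches(n: Network, n2: Network) -> bool:
    """Pointwise pruning, written N >> N2 in the text."""
    return n.keys() == n2.keys() and all(more_branches(n[p], n2[p]) for p in n)


def steps(n: Network):
    """All selection transitions of n, as (label, successor network) pairs."""
    result = []
    for p, b in n.items():
        if isinstance(b, Sel) and isinstance(n.get(b.to), Branch) and n[b.to].frm == p:
            chosen = n[b.to].left if b.label == "left" else n[b.to].right
            if chosen is not None:
                succ = dict(n)
                succ[p], succ[b.to] = b.cont, chosen
                result.append(((p, b.to, b.label), succ))
    return result


def simulates(big: Network, small: Network) -> bool:
    """Every step of `small` is matched by a step of `big` with the same
    label, and the successors are again related by pruning."""
    return all(any(lbl == lbl2 and net_more_branches(succ2, succ)
                   for lbl2, succ2 in steps(big))
               for lbl, succ in steps(small))


# A pruned network cannot do anything the larger network cannot do.
big = {"p": Sel("q", "right", End()), "q": Branch("p", End(), End())}
small = {"p": Sel("q", "right", End()), "q": Branch("p", None, End())}
assert net_more_branches(big, small) and simulates(big, small)

# The converse fails for arbitrary networks: big2 can select "left", but its
# pruning small2 offers no left branch and is stuck.
big2 = {"p": Sel("q", "left", End()), "q": Branch("p", End(), End())}
small2 = {"p": Sel("q", "left", End()), "q": Branch("p", None, End())}
assert net_more_branches(big2, small2)
assert steps(big2) and not steps(small2)
```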
We now prove the operational correspondence between choreographies and their projections, in two directions: if a choreography can make a transition, then its projection can make the same transition; and if the projection of a choreography can make a transition, then so can the choreography. The results of the transitions are not directly related by projection, since choreography transitions may eliminate some branches in the projection; thus, establishing the correspondence for multi-step transitions requires some additional lemmas on pruning.
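The two directions of the correspondence can be tested mechanically on a fragment where no pruning is needed. The following Python sketch is a hypothetical toy, not the formalised languages: a choreography is a tuple of communications (p, e, q, x) with no conditionals, selections or procedure calls; values are not evaluated; and a communication may fire out of order when none of its processes occurs in an earlier communication, an assumption in the spirit of the usual choreography semantics. For this fragment both directions hold with exact equality of projections, which the code checks along every execution.

```python
from typing import Dict, Tuple

# Hypothetical toy fragment: a choreography is a tuple of communications
# (p, e, q, x), read "p.e -> q.x". There are no conditionals, selections or
# procedure calls, so no pruning is involved, and values are not evaluated.
Com = Tuple[str, str, str, str]
Action = Tuple[str, str, str]            # ("send", q, e) or ("recv", p, x)
Network = Dict[str, Tuple[Action, ...]]


def epp(c: Tuple[Com, ...], procs) -> Network:
    """Endpoint projection: each process keeps the actions it takes part in."""
    net = {}
    for r in procs:
        actions = []
        for p, e, q, x in c:
            if r == p:
                actions.append(("send", q, e))
            elif r == q:
                actions.append(("recv", p, x))
        net[r] = tuple(actions)
    return net


def chor_steps(c):
    """Choreography transitions; a communication may fire early if none of its
    processes occurs in a preceding communication (out-of-order execution)."""
    busy, result = set(), []
    for i, (p, e, q, x) in enumerate(c):
        if p not in busy and q not in busy:
            result.append(((p, e, q, x), c[:i] + c[i + 1:]))
        busy.update((p, q))
    return result


def net_steps(net: Network):
    """Network transitions: a send at the head of p meets a matching receive
    at the head of q."""
    result = []
    for p, acts in net.items():
        if acts and acts[0][0] == "send":
            _, q, e = acts[0]
            acts_q = net.get(q, ())
            if acts_q and acts_q[0][0] == "recv" and acts_q[0][1] == p:
                succ = dict(net)
                succ[p], succ[q] = acts[1:], acts_q[1:]
                result.append(((p, e, q, acts_q[0][2]), succ))
    return result


def correspondence(c, procs) -> bool:
    """One-step completeness and soundness, with exact equality of projections."""
    n_steps, c_steps = net_steps(epp(c, procs)), chor_steps(c)
    complete = all(any(lbl == lbl2 and n2 == epp(c2, procs) for lbl2, n2 in n_steps)
                   for lbl, c2 in c_steps)
    sound = all(any(lbl == lbl2 and n2 == epp(c2, procs) for lbl2, c2 in c_steps)
                for lbl, n2 in n_steps)
    return complete and sound


def check_all(c, procs) -> bool:
    """Check the correspondence along every execution of c."""
    return correspondence(c, procs) and all(check_all(c2, procs) for _, c2 in chor_steps(c))


C = (("a", "v1", "b", "x"), ("c", "v2", "d", "y"), ("b", "x", "c", "z"))
assert check_all(C, {"a", "b", "c", "d"})
```

Adding conditionals to this toy would break the exact equality in the checks: after a conditional, the network would only be related to the projection of the continuation by pruning, which is why the lemmas below state their conclusions with ≫.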
Preliminaries.
Both directions of the correspondence depend on a number of results relating choreography transitions and their projections. These results follow a pattern: the results for communications state a precise correspondence; the ones for conditionals include pruning in their conclusions; and the ones for procedure calls require additional hypotheses on the set of procedure definitions.
Lemma CCC_To_bproj_Sel_p : ∀ Defs C s C' s' p q l,
  strongly_projectable Defs C p → CCC_To Defs C s (CCBase.TL.R_Sel p q l) C' s' →
  ∃ Bp, bproj Defs C p = XSel q l Bp ∧ bproj Defs C' p = Bp.

Lemma CCC_To_bproj_Call_p : ∀ Defs C s C' s' p X Xs,
  strongly_projectable Defs C p →
  (∀ Y, In Y Xs → strongly_projectable Defs (snd (Defs Y)) p) →
  (∀ Y, set_incl_pid (CCC_pn (snd (Defs Y)) (fun X ⇒ fst (Defs X))) (fst (Defs Y))) →
  In X Xs → CCC_To Defs C s (CCBase.TL.R_Call X p) C' s' →
  bproj Defs C p = XCall (X, p) ∧ Xmore_branches (bproj Defs (snd (Defs X)) p) (bproj Defs C' p).

These lemmas are simple to prove by induction on C. The tricky part is getting the hypotheses strong enough that the conclusion holds, and weak enough that any well-formed program satisfies them throughout its entire execution.

From these results, it follows that projectability is preserved by choreography reductions. This property is needed even to state the EPP Theorem, since we can only compute projections of projectable programs.

Lemma CCC_To_projectable : ∀ P Xs ps, Program_WF Xs P → well_ann P → projectable Xs ps P →
  (∀ p, In p ps → strongly_projectable (Procedures P) (Main P) p) →
  (∀ p, In p (CCC_pn (Main P) (Vars P)) → In p ps) →
  (∀ p X, In X Xs → In p (Vars P X) → In p ps) →
  ∀ s tl P' s', (P, s) —[tl]−→ (P', s') → projectable Xs ps P'.

Some of the hypotheses of the previous lemmas are encapsulated in the first two conditions: well-formedness of P, which ensures that any runtime term RT_Call X qs C in Main P only includes in qs processes that are declared to be used by X (this trivially holds if Main P is initial, and is preserved throughout execution); and well-annotation of P, i.e., the processes used in any procedure are a subset of those it declares. The remaining hypotheses state, as before, that ps and Xs include, respectively, all processes and all procedures relevant for executing Main P.

Similarly, we prove that strong projectability is preserved by transitions.
Completeness.
In the literature, completeness of EPP is proven by structural induction on the derivation of the transition performed by the choreography C. For each case, we look at how a transition for C can be derived, and show that the projection of C can make the same transition to a network with more branches than the projection of C′. The proof is lengthy, but poses no surprises.

Lemma EPP_Complete : ∀ P Xs ps, Program_WF Xs P → well_ann P → ∀ HP,
  (∀ p, In p ps → strongly_projectable (Procedures P) (Main P) p) →
  (∀ p, In p (CCC_pn (Main P) (Vars P)) → In p ps) →
  (∀ p X, In X Xs → In p (Vars P X) → In p ps) →
  ∀ s tl P' s', (P, s) —[tl]−→ (P', s') →
  ∃ N tl', (epp Xs ps P HP, s) —[tl']−→ (N, s') ∧ Procs N = Procs (epp Xs ps P HP) ∧
    ∀ H, Net N ≫ Net (epp Xs ps P' H).

By combining this lemma with the earlier results on pruning, we immediately obtain the generalisation to multi-step transitions.
Soundness.
We prove soundness by case analysis on the transition made by the network, and then by induction on the choreography inside each case. For convenience, we split this proof into separate lemmas, one for each kind of transition. Again, the results for communications are stronger than those for conditionals, which in turn have fewer hypotheses than those for procedure calls.
Lemma SP_To_bproj_Com : ∀ Defs Defs' ps C HC s N' s' p x q v,
  (∀ p, In p ps → strongly_projectable Defs C p) →
  (∀ p, In p (CCC_pn C (fun X ⇒ fst (Defs X))) → In p ps) →
  SP_To Defs' (epp_C Defs ps C HC) s (R_Com p v q x) N' s' →
  ∃ C', CCC_To Defs C s (CCBase.TL.R_Com p v q x) C' s' ∧ ∀ HC', N' == (epp_C Defs ps C' HC').

Lemma SP_To_bproj_Call : ∀ Defs Defs' ps C HC s N' s' p X Xs,
  Choreography_WF C → within_Xs Xs C → In X Xs →
  (∀ p, In p ps → strongly_projectable Defs C p) →
  (∀ p, In p (CCC_pn C (fun X ⇒ fst (Defs X))) → In p ps) →
  (∀ p HX, Defs' (X, p) = epp_C Defs ps (snd (Defs X)) HX p) →
  (∀ p X, In p (CCC_pn (snd (Defs X)) (fun Y ⇒ fst (Defs Y))) → In p (fst (Defs X))) →
  (∀ X, In X Xs → projectable_C Defs ps (snd (Defs X))) →
  (∀ X, In X Xs → initial (snd (Defs X)) ∧ ∀ p, In p (fst (Defs X)) → In p ps) →
  SP_To Defs' (epp_C Defs ps C HC) s (R_Call (X, p) p) N' s' →
  ∃ C', CCC_To Defs C s (CCBase.TL.R_Call X p) C' s' ∧ ∀ HC', N' ≫ epp_C Defs ps C' HC'.

Equality is not necessary, and it would make this property harder to prove.
In contrast with completeness, all these lemmas are complex to prove: each requires around 300 lines of Coq code. The proofs are similar, but still different enough that the task is not completely mechanical.

The last ingredient is a lemma of practical interest on procedure names: each process only uses "its" copy of the original procedure names.
Lemma SP_To_bproj_Call_name : ∀ Defs Defs' ps C HC s N' s' p X,
  SP_To Defs' (epp_C Defs ps C HC) s (R_Call X p) N' s' →
  ∃ Y, X = (Y, p) ∧ X_Free Y C.

This lemma is not only crucial in the proof of the next theorem, but also interesting in itself: it shows that the set of procedure definitions can be fully distributed among the processes with no duplication.
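The naming scheme (Y, p) suggests a simple picture of this distribution, sketched below in Python under hypothetical assumptions: a procedure body is a list of communications, defs maps each procedure name to the processes it uses and its body, and a stand-in projection keeps only a process's own sends and receives (echoing the hypothesis Defs' (X, p) = epp_C Defs ps (snd (Defs X)) HX p of SP_To_bproj_Call). Since every entry of the process-level table is tagged with exactly one process, the table is partitioned among the processes.

```python
from typing import Dict, List, Tuple

# Hypothetical toy: a procedure body is a list of communications (p, e, q, x),
# and defs maps each procedure name X to (processes used by X, body).
Com = Tuple[str, str, str, str]


def project(body: List[Com], r: str):
    """Stand-in projection: keep only r's own sends and receives."""
    return tuple(("send", q, e) if r == p else ("recv", p, x)
                 for p, e, q, x in body if r in (p, q))


def distribute(defs: Dict[str, Tuple[List[str], List[Com]]]):
    """Process-level table Defs'(X, p): one copy of X for each process p it uses."""
    return {(name, p): project(body, p)
            for name, (ps, body) in defs.items() for p in ps}


def definitions_of(table, p):
    """The definitions process p may call: exactly the entries tagged with p."""
    return {key: beh for key, beh in table.items() if key[1] == p}


defs = {"X": (["a", "b"], [("a", "v", "b", "y")]),
        "Y": (["b", "c"], [("b", "y", "c", "z")])}
table = distribute(defs)
procs = {"a", "b", "c"}
# Every entry belongs to exactly one process: the table is partitioned.
assert sum(len(definitions_of(table, p)) for p in procs) == len(table)
assert table[("X", "b")] == (("recv", "a", "y"),)
```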
Lemma EPP_Sound : ∀ P Xs ps, Program_WF Xs P → well_ann P → ∀ HP,
  (∀ p, In p ps → strongly_projectable (Procedures P) (Main P) p) →
  (∀ p, In p (CCC_pn (Main P) (Vars P)) → In p ps) →
  (∀ p X, In X Xs → In p (Vars P X) → In p ps) →
  ∀ s tl N' s', (epp Xs ps P HP, s) —[tl]−→ (N', s') →
  ∃ P' tl', (P, s) —[tl']−→ (P', s') ∧ ∀ H, Net N' ≫ Net (epp Xs ps P' H).

Generalising this result to multi-step transitions requires showing that pruning does not eliminate possible transitions of a network. This is not true in general, but it holds when the pruned network is the projection of a choreography.
Lemma SP_To_more_branches_N_epp : ∀ Defs N1 s N2 s' tl Defs' ps C HC,
  N1 ≫ epp_C Defs' ps C HC → SP_To Defs N1 s tl N2 s' →
  ∃ N2', SP_To Defs (epp_C Defs' ps C HC) s tl N2' s' ∧ N2 ≫ N2'.

The formalisation of EPP and the proof of the EPP Theorem consist of 13 definitions, 110 lemmas, and 4960 lines of Coq code. The proof of the EPP Theorem and related lemmas make up around 75% of this size.
We have successfully formalised a translation from a Turing complete choreographic language into a process calculus and proven its correctness in terms of an operational correspondence result. This formalisation showed that the proof techniques used in the literature are correct, and identified only minor missing assumptions about runtime terms that trivially hold when these are used as intended. To the best of our knowledge, this is the first time such a correspondence has been formalised for a full-fledged (Turing complete) choreographic language.

The complexity of the formalisation, combined with the similarities between several of the proofs, means that future extensions would benefit from semi-automatic generation of proof scripts.

Combining these results with those from [12] would yield a proof that SP is also Turing complete. Unfortunately, the choreographies used in the proof of Turing completeness in [12] are not projectable, but they can be made so automatically by means of an amendment (repair) procedure [11]. In future work, we plan to formalise amendment in order to obtain this result.
References
1. Albert, E., Lanese, I. (eds.): Formal Techniques for Distributed Objects, Components, and Systems - 36th IFIP WG 6.1 International Conference, FORTE 2016, Held as Part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, Heraklion, Crete, Greece, June 6-9, 2016, Proceedings, Lecture Notes in Computer Science, vol. 9688. Springer (2016)
2. Basu, S., Bultan, T.: Automated choreography repair. In: Stevens, P., Wasowski, A. (eds.) Fundamental Approaches to Software Engineering - 19th International Conference, FASE 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9633, pp. 13–30. Springer (2016). https://doi.org/10.1007/978-3-662-49665-7_2
3. Basu, S., Bultan, T., Ouederni, M.: Deciding choreography realizability. In: Field, J., Hicks, M. (eds.) Procs. POPL. pp. 191–202. ACM (2012). https://doi.org/10.1145/2103656.2103680
4. Caires, L., Pfenning, F.: Session types as intuitionistic linear propositions. In: Gastin, P., Laroussinie, F. (eds.) Procs. CONCUR. Lecture Notes in Computer Science, vol. 6269, pp. 222–236. Springer (2010). https://doi.org/10.1007/978-3-642-15375-4_16
5. Carbone, M., Honda, K., Yoshida, N.: Structured communication-centered programming for web services. ACM Trans. Program. Lang. Syst. (2), 8:1–8:78 (2012). https://doi.org/10.1145/2220365.2220367
6. Carbone, M., Lindley, S., Montesi, F., Schürmann, C., Wadler, P.: Coherence generalises duality: A logical explanation of multiparty session types. In: Desharnais, J., Jagadeesan, R. (eds.) 27th International Conference on Concurrency Theory, CONCUR 2016, August 23-26, 2016, Québec City, Canada. LIPIcs, vol. 59, pp. 33:1–33:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016). https://doi.org/10.4230/LIPIcs.CONCUR.2016.33
7. Carbone, M., Montesi, F.: Deadlock-freedom-by-design: multiparty asynchronous global programming. In: Giacobazzi, R., Cousot, R. (eds.) Procs. POPL. pp. 263–274. ACM (2013). https://doi.org/10.1145/2429069.2429101
8. Carbone, M., Montesi, F., Schürmann, C.: Choreographies, logically. Distributed Comput. (1), 51–67 (2018). https://doi.org/10.1007/s00446-017-0295-1
9. Cruz-Filipe, L., Montesi, F.: Choreographies in practice. In: Albert and Lanese [1], pp. 114–123. https://doi.org/10.1007/978-3-319-39570-8_8
10. Cruz-Filipe, L., Montesi, F.: Procedural choreographic programming. In: Bouajjani, A., Silva, A. (eds.) Procs. FORTE. Lecture Notes in Computer Science, vol. 10321, pp. 92–107. Springer (2017). https://doi.org/10.1007/978-3-319-60225-7_7
11. Cruz-Filipe, L., Montesi, F.: A core model for choreographic programming. Theor. Comput. Sci., 38–66 (2020). https://doi.org/10.1016/j.tcs.2019.07.005
12. Cruz-Filipe, L., Montesi, F., Peressotti, M.: Formalising a Turing-complete choreographic language in Coq. CoRR abs/2102.02627 (2021)
13. Cruz-Filipe, L., Montesi, F., Peressotti, M.: A Formalisation of a Turing-Complete Choreographic Language in Coq. https://doi.org/10.5281/zenodo.4548709
14. Dalla Preda, M., Gabbrielli, M., Giallorenzo, S., Lanese, I., Mauro, J.: Dynamic choreographies: Theory and implementation. Log. Methods Comput. Sci. (2) (2017). https://doi.org/10.23638/LMCS-13(2:1)2017
15. Giallorenzo, S., Lanese, I., Russo, D.: Chip: A choreographic integration process. In: Panetto, H., Debruyne, C., Proper, H.A., Ardagna, C.A., Roman, D., Meersman, R. (eds.) Procs. OTM, part II. Lecture Notes in Computer Science, vol. 11230, pp. 22–40. Springer (2018). https://doi.org/10.1007/978-3-030-02671-4_2
16. Giallorenzo, S., Montesi, F., Peressotti, M.: Choreographies as objects. CoRR abs/2005.09520
… (1), 9 (2016). https://doi.org/10.1145/2827695, also: POPL, pages 273–284, 2008
20. Lluch-Lafuente, A., Nielson, F., Nielson, H.R.: Discretionary information flow control for interaction-oriented specifications. In: Martí-Oliet, N., Ölveczky, P.C., Talcott, C.L. (eds.) Logic, Rewriting, and Concurrency. Lecture Notes in Computer Science, vol. 9200, pp. 427–450. Springer (2015). https://doi.org/10.1007/978-3-319-23165-5_20
21. López, H.A., Heussen, K.: Choreographing cyber-physical distributed control systems for the energy sector. In: Seffah, A., Penzenstadler, B., Alves, C., Peng, X. (eds.) Procs. SAC. pp. 437–443. ACM (2017). https://doi.org/10.1145/3019612.3019656
22. López, H.A., Nielson, F., Nielson, H.R.: Enforcing availability in failure-aware communicating systems. In: Albert and Lanese [1], pp. 195–211. https://doi.org/10.1007/978-3-319-39570-8_13
23. Montesi, F.: Choreographic Programming. Ph.D. Thesis, IT University of Copenhagen (2013)
24. Montesi, F.: Introduction to Choreographies (2021), accepted for publication by Cambridge University Press
25. Needham, R.M., Schroeder, M.D.: Using encryption for authentication in large networks of computers. Commun. ACM (12), 993–999 (1978). https://doi.org/10.1145/359657.359659
26. Scalas, A., Yoshida, N.: Less is more: multiparty session types revisited. Proc. ACM Program. Lang.