[PDF] ManyDSL: A Host for Many Languages

Abstract

Domain-specific languages are becoming increasingly important. Almost every application touches multiple domains. But how to define, use, and combine multiple DSLs within the same application? The most common approach is to split the project along the domain boundaries into multiple pieces and files. Each file is then compiled separately. Alternatively, multiple languages can be embedded in a flexible host language: within the same syntax a new domain semantic is provided. In this paper we follow a less explored route of metamorphic languages. These languages are able to modify their own syntax and semantics on the fly, thus becoming a more flexible host for DSLs. Our language allows for dynamic creation of grammars and switching languages where needed. We achieve this through a novel concept of Syntax-Directed Execution. A language grammar includes semantic actions that are pieces of functional code executed immediately during parsing. By avoiding additional intermediate representation, connecting actions from different languages and domains is greatly simplified. Still, actions can generate highly specialized code though lambda encapsulation and Dynamic Staging.

Full PDF

MManyDSL: A Host for Many Languages

Piotr Danilewski

Philipp Slusallek Saarland University, Germany Intel Visual Computing Institute, Germany Theoretical Computer Science, Jagiellonian University, Poland Deutsches Forschungszentrum für Künstliche Intelligenz, Germany [email protected] [email protected]

Abstract

Domain-speciﬁc languages are becoming increasingly impor-tant. Almost every application touches multiple domains. Buthow to deﬁne, use, and combine multiple DSLs within thesame application?The most common approach is to split the project alongthe domain boundaries into multiple pieces and ﬁles. Each ﬁleis then compiled separately. Alternatively, multiple languagescan be embedded in a ﬂexible host language: within the samesyntax a new domain semantic is provided.In this paper we follow a less explored route of metamor-phic languages. These languages are able to modify theirown syntax and semantics on the ﬂy, thus becoming a moreﬂexible host for DSLs.Our language allows for dynamic creation of grammarsand switching languages where needed. We achieve thisthrough a novel concept of Syntax-Directed Execution. Alanguage grammar includes semantic actions that are piecesof functional code executed immediately during parsing. Byavoiding additional intermediate representation, connectingactions from different languages and domains is greatlysimpliﬁed. Still, actions can generate highly specialized codethough lambda encapsulation and Dynamic Staging.

1. Introduction “Languages shape thought”. This Sapir-Whorf Hypothesisrefers to human-speaking languages and the perception ofthe real world [Who40; Sap29]. This is no less true forprogramming languages [Pau04]. A language style and theparadigms it supports can inadvertently put us in one mindset,preventing us from seeing alternative solutions [Dij82].

For that reason, in an ideal situation each domain should haveits own Domain Speciﬁc Language (DSL) that best representsthe concepts of the domain and guides the way of thinking ofthe programmer in the right direction. However, creating sucha language from scratch is not an easy task. Available tools,such as YACC [Joh75; LMB92] or ANTLR [PF11] helpdeﬁne custom grammars and perform basic operations onthe created Abstract Syntax Tree (AST). However, more ad-vanced aspects of language creation, such as variable lookup rules, type checking, or translation to other representationstill needs to be performed by the language designer.Languages created in such a way are independent one fromanother. Yet, it is rare for a computer project to touch only asingle domain. Most applications deal with multiple domains,such as UI, database, communication, work scheduling, etc...In order for a DSL to be used in practice, it needs toexchange data with other DSLs. Such communication is oftenlimited, inefﬁcient, and unsafe: external protocols, functions,or raw strings and ﬁles are used.Inter-DSL communication is simpler when the DSL is embedded within a more generic host language [Hud96]through a set of overloaded functions, operators or macros.Such languages are executed as part of the host code, and canexchange data between different languages through the host.However, such DSLs are constrained by the syntax of thehost, and cannot use their most natural and expected syntacticform. In Table 1 for example we show how the same SQLsyntax is realized in various embedded database languages –in all these cases, the obtained syntax is more complex andcluttered, hiding the original meaning of the code.A metamorphic language combines the strength of bothstand-alone and embedded languages, by allowing itself tobe modiﬁed on the ﬂy. As a host it offers a common baseconnecting all DSLs while at the same time permits eachlanguage to have its own syntax and semantic.

In this paper we present a metamorphic language designedwith multi-DSL support in mind. We combine the syntacticﬂexibility of stand-alone languages with the composabilityand inter-DSL communication of an embedded approach: • We facilitate the dynamic creation of grammars. Newgrammars can be deﬁned, combined, and used as a library. • We deﬁne a Language Programming Interface (LPI),deﬁning how one DSL can be used by another. Theinternal structure of a language does not impact the LPI. • Our implementation permits language switching and com-munication between different DSLs through the LPI. a r X i v : . [ c s . P L ] D ec mbedding Host ExampleSQL SELECT Name, Surname FROM Members WHERE Age = 18

Haskell/DB [LM99] Haskell do r <- table Members;restrict $ r!Age .==. constant 18project $ Name << r!Name

LINQ [BMT07] C

Members.Where(row => row.Age == 18).Select(row => new {row.Name, row.Surname}); jOOQ [Joo] Java create.select(MEMBERS.NAME,MEMBERS.SURNAME).from(MEMBERS).where(MEMBERS.AGE.eq(18)).fetch();

Slick [DF15] Scala for {m <- Members if m.Age === 18} yield (m.Name, m.Surname)

Table 1:

Comparison of different SQL embeddings into general-purpose languages. The same semantic meaning given in asyntax not speciﬁc to the domain makes the message harder to comprehend by an inexperienced programmer. Note that thegrammar of C

2. Related Work

A common tool for generating parsers, that now has manyderivatives, is YACC [Joh75]. It is able to produce anLALR(1) parser [AJ74] — a practical approach to pars-ing of a subset of LR(1) grammars [Knu65]. Further toolssupport more powerful classes of grammars: GeneralizedLR (GLR) [Tom85], packrat parsing [For02] or LL(*) pars-ing [PF11].While the classes of grammars change, the basic princi-ple of specifying the language remains. The description isgiven in a format similar to a Backus-Naur form [Bac+63],augmented with attributes and actions . Each production is aconcrete top-level entry and cannot be reused in another gram-mar. The generated code is represented within the actions astext or as an AST that is being generated, or both. These toolsare typically used to produce stand-alone languages. Theyprovide no facilities for communication between DSLs orlanguage switching.

Instead of using a dedicated tool, a parser can be deﬁned in afunctional language as a parser combinator [Wad85; Fok95;HM98]. Each parser is a ﬁrst-class entity that can be createddynamically and combined together to form bigger, morecomplex parsers.Classical parser combinators create a parser that is highlyredundant and may backtrack multiple times. Although extracomputation can be avoided through lazy evaluation and acareful design of the combinators, in practice inefﬁcient oreven ambiguous parsers can be easily created by accident.Many attempts have been made to address these issues [KP98;LM01]. Furthermore, combinators can use staging to opti-mize themselves early, producing a simpler representation ofthe grammar that performs faster [Jon+14].

Languages can be created by embedding them into a general-purpose host language, such as Haskell, ML, or Scala. These DSLs use the same syntax as the host, but the names areoverloaded to serve the new semantics. Having the commonbase, DSLs can use it to relay information between the do-mains. For this reason, this approach is particularly commonwhen deﬁning small, embedded DSLs (EDSLs). For example,Haskell has been used extensively to deﬁne geometric opera-tions [HJ94], COM component scripting [JML98], hardwaredesign [Bje+98], or server-side web scripting [Mei00; Thi05].The semantics of an EDSL is embedded within functionsof the host. Depending on the implementation two things canhappen: • In shallow embedding the associated semantics is repre-sented directly in the host language. • In deep embedding a structure representing the domain-speciﬁc construct is created, that can later be translated,optimized and run separately. The type of embedding is closely related to the form of stag-ing supported by the host language. Staging is a mechanismthat controls the execution order of the code: • A piece of code may execute within a body of a func-tion that has not been called, often leading to symboliccomputation. • A piece of code may be kept as a code, despite itssurrounding context being executed.The former case is a form of function specialization, thelatter – deferring and code generation.The simplest approach is textual staging , where stringsrepresent fragments of programs. In structural staging , theprogram code is represented explicitly as a data structure, typ-ically as an Abstract Syntax Tree (AST) or graph. The struc-ture can be created explicitly, for example through the LLVMinstruction builders [Llv]. Languages such as ‘C [EHK96]and MetaML [TS99] use a dedicated syntax to represent acode object. Alternatively, the process can be hidden behindoverloaded functions, operators, or templates [Vel96]. e identify functional staging as a case of structural stag-ing where the structure is represented entirely by ordinaryfunctions [Rey75; CKS09]. In Lightweight Modular Stag-ing [RO12] these builder functions are hidden by overloadingordinary functions over a higher-kinded type Rep[T] .All the above techniques represent staged code as anobject and lead to deep embedding. MetaML and LightModular Staging are particularly popular in the context ofEDSLs [Cha+10; Ofe+13; SBP99].Not all staging techniques require such a code represen-tation. Impala [Lei+15] and DeepCPS [Dan+14] languagesachieve the same result by extending the core lambda calcu-lus, rather than introducing new data types.With this approach, there is no inherent difference betweenstages. For example, DeepCPS can be used to assemble afunction incrementally through continuation calls, and thenuse staging to transform it to an efﬁcient function as if it waswritten by hand in one go [DS16]. In this paper we rely onthis functional approach to code building.

The syntactic limitation of embedded DSLs can be lifted, atleast partially, through macro processing. Simplest macrolanguages, such as C-preprocessor and m4 operate lexi-cally [KR77]. Lexical macro languages can be fully pro-grammable, such as TeX [Knu84]. Advanced macro systemsoperate with additional syntax knowledge, for example byexamining and transforming an AST. Lisp and Scheme in-troduced the concept of hygienic macros [Ada+98; Koh+86],that are referentially transparent and prevent accidental namecapturing.The macro system of the compiler [BS02]allow grammar extensions to the host language. The changescan be packaged and templated through so-called meta-morphs . However, the macro system is not programmable andrecursion is explicitly forbidden. While macros are hygienic,metamorphs are not.Macros can be even more powerful when using semanticknowledge. For example, XL [Mad89] can be enrichedsemantically, but its syntax cannot be altered.

Most ﬂexibility is given in what we call a metamorphiclanguage — a language that allows its user to alter nearlyevery aspect of the language on the ﬂy, including its syntaxand semantics.This is achieved for example in Racket [TH+11], a de-scendant of Scheme. The programmer can change its syntax,semantic, type system, linking and optimizations. Racket cre-ates an IR that the user can explicitly alter through syntaxtransformers . Unfortunately, controlling the transformationsis cumbersome. Several library functions are attributed to dojust that . http://docs.racket-lang.org/reference/stxtrans.html, retrieved on 23.03.2015 The macro system of Fortress [All+09] allows the userto deﬁne nearly arbitrary syntactic constructs, using theformalism of Parsing Expression Grammars [For04]. The newconstructs can be used within the grammar deﬁnition as well,even if it leads to recursion. SugarJ [Erd+11] is a languagebuild on top of Java, SDF [Hee+89], and Stratego [Vis04],capable of extending Java syntax in a similar way through sugar libraries . The downside of these approaches is thatthe new constructs must appear at the top level of the source.They cannot be created dynamically, conditionally, or beparametrized.To our knowledge, all metamorphic languages operate ona single AST to connect the languages, either directly or usingthe deep embedding approaches.

3. Overview

Our overarching goal is to create a ﬂexible yet comprehen-sible metamorphic language. We want syntactic ﬂexibilitytypical to stand-alone languages. At the same time, the DSLsshould be embeddable in a single host language, permittingcommunication with other DSLs. Finally, we expect to gener-ate highly efﬁcient code, without any overhead coming fromthe language embedding.Our solution, which we name

ManyDSL , realizes thesegoals through the following means: • A grammar is used to describe the syntax. • The grammar is augmented with parameters passed downand up the productions, as well as semantic action func-tions put within the rules. We call this a syntax-directedexecution scheme, which we describe in Section 5.1. • The semantic actions are executed immediately duringparsing. We use Dynamic Staging, a feature of DeepCPS,to guide the code generation. • Parsing and code execution is interleaved and intercon-nected, allowing one to affect another. In particular, newlanguages can be loaded on the ﬂy and parsing can beswitched to those new languages.

4. The Host Language

We have chosen DeepCPS as the host language in ManyDSL.It is a functional language that enforces the ContinuationPassing Style (CPS) and enables Dynamic Staging [Dan+14]. • CPS provides high ﬂexibility when designing custom con-trol ﬂow structures. All branches and loops can be ex-pressed as functions. Any DSL embedded in DeepCPS isnot limited by the control structures of the host language. • Dynamic Staging introduces staging as ﬁrst-class con-struct. Staging allows the user to specify domain optimiza-tion, but also let the new DSLs expose staging in variousdegrees on their own. • DeepCPS allows for incremental building of code, withoutoverhead, through the fragment functions [DS16]. asic syntax: λ [ y ] x.b (x , x , ...)[y]{ b } (lambda function) (cid:62) ⊥ and or not always never & | ! (staging constants, operators) [ e ] v v @e:v v v ... ( λ body: application) [ e ]fix [ y ] x = v in b @e: fix [y]x v b ( λ body: ﬁx-point combinator)Syntactic sugar: λ [ y ] x. [ y ] vv (x , x , ...) { v v v ... } (natural staging) [ e ] ( λ [ y ] x.b ) v @e: let [y] x v b (let construct) [ e ] v v ( λ [ y ] x.b ) @e:v v v ... (x , x , ...)[y] b (last argument) ( λ [ y ] x.b ) p β −→ ( λ [ y ] x.b ) v p (x)[y] b (non-CPS expression p ) v | , v | , ... !v (tuple splicing) λ [ s ] x (cid:124)(cid:123)(cid:122)(cid:125) y .b (!y)[s]{ b } (tuple aggregate) Figure 1:

Comparison between the lambda calculus with Dynamic Staging (left), and the actual syntax of DeepCPS (right).

In the Figure 1 we summarize the syntax of DeepCPS.Apart from standard lambda calculus semantics with CPSrestrictions, DeepCPS adds Dynamic Staging. It is realized bythe implicit staging parameter [y] present in every lambdaand a staging expression @e: present in each lambda body.When a lambda is invoked, the implicit staging parameteris always replaced by a special staging constant (cid:62) . Stagingexpressions use these staging parameters to form booleanexpressions. When an expression @e: evaluates to (cid:62) , theannotated body is considered active and is scheduled forexecution. During the execution process, the DeepCPS inter-preter maintains a set of all bodies that are active. At eachexecution step the deepest active body (containing no nestedactive bodies) is executed.Staging variables can be used as normal arguments, andnormal variables can appear in staging expressions as well.Non-stage constants are equivalent to (cid:62) , while variablesthat are still represented only as symbolic values are ⊥ . Theprecise formal deﬁnition of the syntax and semantics is givenin the original DeepCPS paper [Dan+14].The ! operator is an extension to DeepCPS that weheavily depend upon. A parameter preceded by ! acceptsany excessive arguments given to a function, and packs themall into a tuple under the given name. The symbol ! in frontof an argument unpacks all elements of a tuple and splicesthem as an argument list to a function call. For convenience,all arguments that are packed or unpacked through ! , arehighlighted in italics. One of the beneﬁts of DeepCPS that we rely on in this paperis the ability to form staging chains. Each link within thechain is a piece of code that is staged upon some parameter @s1: . The last continuation introduces a new implicit stagingparameter [s2] that is used to stage another link A series oflinks of that kind put together, as for example in Listing 1,form a chain s . After the ﬁrst link is executed the followinglinks become invoked as well, in sequence. The links in the example follow one another. However,because the staging variables can be freely passed betweenfunctions, the links can originate from completely indepen-dent functions. Staging can be used to build arbitrary function incrementally.Each incremental addition is kept in a separate lambda of theform: (!args, cont) ... computation ... cont !args2

The !args is a set of arbitrary arguments passed betweenthe incremental additions. The cont is the continuationrepresenting the rest of the function we construct. We referto these lambdas as subject code as they contain the code ofa function we build. These lambdas are connected togetherthrough the builder functions: • The build function takes the subject code as a lambdaargument and encapsulates it in a fragment function . • The merge function takes two fragment functions andconnects them together. The end result is a new, biggerfragment function containing the subject code of botharguments merged together.Usually, each code needs a single continuation. However,when the subject code represents a branch or a jump, thenumber of continuations may differ. We refer to that number @s1: fct !args (!params)[s2]...@s2: fct2 !args2 (!params)[s3]...@s3: fct3 !args3 (!params)[s4]...

Listing 1:

A series of functions calls chained together by aseries of staging parameters s1 , s2 , s3 , s4 . Such a stagingchain s is executed by triggering the s1 variable. Afterward,all pieces are executed in order, sequentially activating thenext staging parameter. et create_signum(return) {build 2 (ft, val, exit, cont1, cont2)[bt] {@ft: val>0 (positive)if positive ()[ft] {@bt: cont1 ft val exit} ()[ft] {@bt: cont2 ft val exit}} (Fif_pos)build 2 (ft, val, exit, cont1, cont2)[bt] {@ft: val<0 (negative)if negative ()[ft] {@bt: cont1 ft val exit} ()[ft] {@bt: cont2 ft val exit}} (Fif_neg)build 0 (ft,val,exit)[bt] { @ft: exit 1 } (Fp)build 0 (ft,val,exit)[bt] { @ft: exit 0 } (Fz)build 0 (ft,val,exit)[bt] { @ft: exit -1 } (Fn)merge Fif_pos Fp (F)merge F Fif_neg (F)merge F Fn (F)merge F Fz (F)finalize F Preturn (arg, exit)[ft] {@P: P ft arg exit}} Listing 2:

Example of building a code for a function signum .as arity of a fragment function. The arity must be providedby the user into the build function.In order to see how the builders work in practice, consideran example in Listing 2. Here a function signum is beingconstructed, deﬁned as: signum( x ) =  x > x = 0 − x < The body of the function consists of two conditionals, check-ing if the argument is positive, negative or neither. Dependingon a result, a different value is returned through the exit continuation.All the function-related computation is performed in ft staging chain. The staging chain bt is responsible for callingall the merged continuations early, so that they are removedfrom the produced code. In the end, we obtain a functiongiven in Listing 3.The precise deﬁnition of build and merge , togetherwith the explanation how they work is given in [DS16].

5. Functional Grammar

When discussing functional parsing, one typically thinksabout parser combinators (Section 2.2). The generated parseris treated as an ordinary function. It is up to the parsercreator to ensure that it is efﬁcient, but this is non-trivial. Theprogrammer must understand when exactly lazy evaluation is (arg, exit)[ft] {@ft: val>0 (positive)if positive ()[ft] {@ft: exit 1} ()[ft] {@ft: val<0 (negative)if negative ()[ft] {@ft: exit -1} ()[ft]@ft: exit 0}}

Listing 3:

The code produced by the builders from Listing 2.Only code in the ft staging chain remains. We get a tightrepresentation and no overhead from the construction process.triggered and how combinators need to be connected to avoidambiguities.In our approach, we use a functional language to create a grammar instead of a parser. Once created, the grammar isthen processed in a traditional way to create an LL(1) parser.Although it may seem as a step backwards, this allows us tomaintain important practical properties: • Backtracking is guaranteed to never to occur. • Any ambiguities are detected when the parser is generated. • The parsing process is straightforward and easy to follow.On the other hand, since the grammar is deﬁned within afunctional language, we still maintain composability similarto when using parser combinators.

We use a new syntax-directed execution scheme (SDE) asa basis of our parsing process. On the surface, it is verysimilar to syntax-directed translation scheme (SDT) [LS68;Paa95]. SDE however puts emphasis on the execution ofcode. There are no objects representing the parse tree, AST,or IR that would be generated. Instead, productions andactions are treated as functions that are executed as a part ofthe parsing process. Code can be generated during parsingthrough lambda encapsulation and staging.The language is deﬁned by an L-attributed [LRS74] LL(1)grammar augmented by semantic actions that may appear at p ( i p ) → → ( o p ) t ( i ) → → ( o ) t ( i ) → → ( o ) t ( i ) → → ( o ) Figure 2:

A fragment of attributed Abstract Syntax Treecreated for a production p → t t t . An L-attributed tree canbe traversed in a depth-ﬁrst left-to-right fashion (green path).Each attribute along the path depends only on the valuesencountered earlier on the path. ny position in the production body. A hypothetical parsetree can be evaluated with a single depth-ﬁrst left-to-righttraversal, as shown in Figure 2. In our SDE scheme, however,no tree is ever generated. Grammar terms are treated asfunction calls. Attributes are replaced with an equivalentnotion of parameters and arguments that are passed intoand from the terms. We represent each attributed term t as ( i ) → t → ( o ) to indicate the ﬂow of the data — inputarguments (i) are passed into t and the results are returnedinto ( o ) . A complete parametrized production of the form p ::= t t t looks as: ( i p ) → p → ( o p ) ::= ( i ) → t → ( o )( i ) → t → ( o )( i ) → t → ( o ) The head of the production acts as a function header(or signature): The input values are parameters (red) — alist of names, that are set to concrete values when theproduction is taken. The output values are return arguments(green) that must be concrete values themselves by the timethe production is resolved. This distinction is reversed foreach term within the body: input values are arguments andoutput entries are parameters which become set by the calledproduction. A name at argument position always refers to thesame name last seen at the parameter position.A grammar may feature many productions for the samenonterminal. However, the signature of all productions forgiven nonterminal must be equivalent: the name and numberof the input parameters must be the same, as well as thenumber of output values.The set of all productions for a given nonterminal p isrefered to as a rule of p . When such a rule is invoked throughits head nonterminal, the parser performs a standard LL1lookahead to decide which particular production should betaken. Language grammar deﬁnes one or more entry rules . Theseare the possible rules where the parsing in given languagebegins.Entry rules may be referred within productions of otherlanguages. For example, if in language A a language B isused through an entry rule of nonterminal n , we denote it as: ( i ) → B.n → ( o ) When such a foreign nonterminal is encountered, the parsercompletely switches the language: The rules of A becomenonexistent and only rules of grammar B are in effect. Whenthe entry rule of B ﬁnishes, the parser switches back tolanguage A .In general, foreign nonterminal may appear anywhere anormal nonterminal would. The only restriction is that a tokenthat is a foreign nonterminal cannot be used to compute theparse table for LL1. Grammar A has no knowledge of tokens function lassoc {N ::= elem R;R ::= epsilon ;R ::= op elem action R;return N;} function rassoc {N ::= elem R;R ::= epsilon ;R ::= op elem R action;return N;} grammar MinusDiv {Diff ::= lassoc< Quotient, "-", epsilon >;Quotient ::= lassoc< Value, "/", epsilon >;}

Listing 4:

Grammar functions for a left and right associativebinary operators. The difference is in the position of theaction use. We use lassoc to create a grammar supportingleft-associative - and / operators and taking precedence intothe account.of grammar B . These two grammars may use a completelydifferent set of tokens.The parser does not create any global AST. The onlyinformation that is exchanged between the languages isdeﬁned by the foreign nonterminal call and the respectiveentry rule. For that reason, we refer to the set of all entry rulesfor given language as a Language Programming Interface (LPI). Knowing the LPI sufﬁces to use the given languagewith others. Other language rules and actions are private tothat language.

We created a library and a language on top of DeepCPS thatfacilitates the creation of new grammars. Our new language,LangDSL, uses a syntax similar to one used to describe theSDE in Section 5.1. The productions do not have to appear atany speciﬁc point in the code. While grammar, productions,and terms carry additional semantic value, they can be passedfreely between functions. This allows us to deﬁne abstractionsover portions of the grammar. The grammar abstraction f canbe deﬁned through function f syntax andlater used within the context of another production throughthe invocation f .A typical problem for LL parser is the handling of leftrecursion. This is most common when parsing an expressionthat uses left-associative binary operators. A well knownsolution is to take the offending production: A ::= Aα | β and rewrite it as: A ::= βRR ::= (cid:15) | αR The new grammar can parse the same input, but creates adifferent AST that resembles the use of a right-associativeoperator. unction lassoc {alias v = elem:out;N->(v) ::= elem->(v) (v)->R->(v);(v)->R->(v) ::= epsilon ;(v)->R->(v) ::= opelem->(r.v)(v,r.v)->action->(v)(v)->R->(v);return N;} (S)->Value->(S)lassoc reduces to: (S)->N->(S) ::= (S)->Value->(S) (S)->R->(S);(S)->R->(S) ::= epsilon ;(S)->R->(S) ::= op(S)->Value->(r_S)(S,r_S)->action->(S)(S)->R->(S); Listing 5:

The lassoc abstraction taking the rule arguments into the account. On the right, an example derivation of theabstraction for the element rule (S)->Value->(S) . Almost all term arguments are obtained through simple substitution, butthe input arguments for

Value (marked in pink) are added later as the default arguments. The input parameter for N is addedtoo, because otherwise the ﬁrst S would be undeﬁned in the production. Note that two stacks are passed into action , whileonly the second one should actually be used.In SDE however we do not create an AST thus the aboveis not a problem. The distinction between left and rightassociation is achieved not through the shape of the tree,but through the position of the semantic action. Dependingon its position, the actions are executed in different order,taking different arguments.Consider an example in the Listing 4 were we createtwo grammar abstractions — one for each: left and rightassociative binary operator. In the left-associative operator,the action is performed before the recursive call. As a result,actions are performed in the same order as input is being read.In the right-associative operator, action is performed after therecursion is complete. They are performed in the order ofproductions returning, starting from the bottom.We use the abstractions to create concrete grammar forbinary - and / . In our example, the grammar recognizes anyexpression using the two operators, but performs no semanticaction yet. Grammar abstractions are higher-order functions. Each ar-gument, such as elem in the example above is a name ofa nonterminal, which in turn acts as another function. The elem argument may take some arguments, perform parsingoperation, and return a new set of values. Notice however,that in the abstraction in Listing 4 we did not specify whatarguments elem may have.This is not an accedient. We would like the functionssuch as lassoc to be generic enough so that all versions of elem are accepted. Consider, for example different ﬂavorsof

MinusDiv grammar having values of different kind: • A value may simply be a single integer returned by the

Value rule: ->(integer) • A value may be a quotient represented by a pair of integers.A

Value would return a pair: ->(num, denom) . • A value may be a string name. In order to retrieve a valuean environment parameter env may be needed. Such useexample is discussed in Section 7.1. • Elements may be added to a stack S . We would then havea production of the form (S)->Value->(S’) .LangDSL helps achieve that goal in three ways: • First, when an argument or parameter is missing, LangDSLadds a default argument – it is the name of the correspond-ing parameter causing the error. This is performed after allgrammar abstractions are resolved. Only the entry rules,that deﬁne the LPI, cannot be altered in that way. • Secondly, a grammar abstraction can take an argumentdeﬁning the name list. Whenever the name list is usedas a nonterminal’s input or output, the content of thelist is spliced. Moreover, for a name tuple n , a value prefix.v is a new name tuple with a prefix_ addedto each element of the original n . • Finally, for any grammar term t , its input and outputargument list can be checked throuh t:in and t:out .While such string manipulation is a simple solution tothe problem, in most cases it sufﬁces. Any potential nameclashes can be avoided by adding the preﬁx, and the scopeof the names is local. With this help lassoc can be madegeneric enough, as shown in Listing 5, to handle all the casesgiven above.

6. Semantic Actions

Semantic actions are small DeepCPS functions embeddedwithin the grammar. These actions are invoked whenever theparser reaches the term within the production. Upon invoca-tion the parsing process is halted and the action function isexecuted.As shown in Listing 6, an action has an arbitrary number( n , m ) of input and output attributes. The DeepCPS functionmust take n + 1 parameters. It also provides an implicit stag-ing parameter parse that is set to (cid:62) when the parser invokesthe function. The action’s input arguments are mapped to theﬁrst n parameters. The n +1 argument is a return continua-tion function provided by the parser. Calling the continuationreturns the control back to the parser. The return accepts i , i , ..., i n ) -> ( o , o , ..., o m ) {@parse: ... body ...return o o ... o m } Listing 6:

A semantic action with n input and m outputattributes, which are mapped to DeepCPS function andits continuation. For convenience, the action body is putafter, and not in between input and output parameters. TheDeepCPS section between the curly braces (green) skips thehead of a lambda, as it is auto-generated as: (i , i ,..., i n , return)[parse] . grammar MinusDiv {Diff->(v) ::= lassoc< Quotient, "+",(l,r)->(v) { l-r (diff)return diff} >;Quotient->(v) ::= lassoc< Integer, "*",(l,r)->(v) { l/r (quot)return quot} >;}

Listing 7:

Grammar for binary - and / , performing thecomputation immediately, during parsing. The ﬁnal resultis a single number. m arguments that are passed back into the output parametersof the action. The most straightforward use of the semantic actions is toprovide the intended semantic meaning of the parsed codedirectly. The

MinusDiv grammar with all the necessaryparameters and actions is given in Listing 7. The actiontakes two arguments l and r , and applies the correspondingmathematical operation at that point in time, during parsing.The result is returned as v back to the parser and used insubsequent production calls. The SDE scheme can also be used for code generation,through means of Dynamic Staging and builders given inSection 4.3. In Listing 8 there is no change in the grammaritself, but the parameters and actions are a bit different.The bodies of the actions still contain the same code: amathematical operation followed by a continuation call. Thistime however, these operations are embedded as a subjectcode within the fragment functions.Note how number literals are handled in the

Number production. A single line within the builder takes the newvalue v and concatenates it into the recurring !args tupleby calling the continuation cont with arguments ft v!args . As a result, each time a number is read, the arity of !args increases by one. The - and / operations pop twolast elements of !args and push the result, decreasing thearity of the tuple by one. let finalize(F,return)[bt] { return(!args)[ft]@bt: F always ft !args} grammar MinusDiv {Expr->(P) ::=()->(F) {build 1 (!args, cont){cont !args} return}(F)->Diff->(F)(F)->(P) {build 0 (ft, v, end) {@ft: end v} (Fend)merge F Fend (F)finalize F return};(F)->Diff->(F) ::= lassoc< Quotient, "+",(F)->(F) {build 1 (ft, l, r, !args, cont)[bt] {@ft: l-r (diff)[ft]@bt: cont ft diff !args} (Fnext)merge F Fnext return} >;(F)->Quotient->(F) ::= lassoc< Number, "/",(F)->(F) {build 1 (ft, l, r, !args, cont)[bt] {@ft: l/r (quot)[ft]@bt: cont ft quot !args} (Fnext)merge F Fnext return} >;(F)->Number->(F) ::= Integer->(v)(F,v)->(F) {build 1 (ft !args, cont) {cont ft v !args} (Fnext)merge F Fnext return};}

Listing 8:

Grammar for binary - and / with a deferredcomputation. The result is a function that, when invoked,performs the computation.Note that all these operations on !args are not staged inthe ft chain. They are resolved early, during the construction,substituting the respective arguments of the - and / operators.As a result, the ﬁnal code contains only the mathematicaloperators, without any overhead.The new production Expr initializes a new functionfragment representing the expression. At the end, it ﬁnishesit with the call to finalize , which triggers the bt chain(set to always ), but leaves the ft chain intact. It returns alambda (!args)[ft] with only function-time code in it.For the input we obtain: (end)[ft] {@ft: (quot)[ft2]@ft2: (diff1)[ft3]@ft3: diff1-3 (diff2)[ft4]@ft4: end diff2} .3 Multi-domain code Fragment functions, same as any other data, may be passedthrough the LPI into another language. For each fragment F the languages must agree on the type of fragments accepted.This entails: • the minimum expected arity of F • the arguments that are passed into the subject code of F through the !args .Note that the languages do not need to agree on the innerrepresentation of the code. The grammar of one language, aswell as the structure of the fragment functions that were usedto construct F does not matter for the other.For that reason, even in the context of fragment functions,it sufﬁces to check the signatures of the entry rule and relyon the LPI. The designers of the languages may choose toexchange more complex data structures, such as a representa-tion of compound or recursive data types. This is however adecision to be made by the language designers, independentlyfrom ManyDSL core. It is not forced by ManyDSL itself.

7. Challenges

In the previous section we have explained the basic mecha-nism of the Syntax-Directed Execution scheme with the useof Dynamic Staging. Let us now focus on more pragmaticchallenges when designing a small DSL.

One of important aspects of almost any language is namebinding. What scopes does a DSL provide and how can namesbe mapped to values? How is the name lookup performedbetween languages?With our SDE scheme we do not have to rely on asingle approach. Name binding can be handled by user code,possibly in an early stage to avoid run-time overhead. Letus assume that the environment is represented in an object-oriented style, through a mutable object env with methods insert and lookup .Custom name binding can be realized by such environ-ment, within the builders, staged in the build-time chain. Bydoing so, function-time values are stored symbolically withinthe environment and are referenced as such in other frag-ments. For example, in Listing 9 we use the environmentto extend the

MinusDiv grammar to support assignmentstatement id = Expr . The

Expr rule builds a function P that is used to compute a value v at function-time. However,at build time we take the symbolic name v and include it inthe environment under a new name id .The identiﬁers can be used as a part of an Expr , replacingthe constant integers. When an identiﬁer is encountered, stillat build time, the symbolic name v is retrieved.In the above example we use a single environment typewithin a single DSL. However, the implementation of eachenvironment is independent and may be very different from another one. As long as they share the signature, they can becombined together.Consider an env object with a dynamic dispatch for itsmethods that implement language A lookup rules. It is thenpassed through the LPI to another language B . Then, if thelanguage B uses the object, it can access values deﬁnedin language A using A ’s scoping rules provided by thepolymorphic object env . Some DSLs require multiple passes over their AST for seman-tic analysis and code generation. This is most common whena DSL supports some form of recursion where all declaredterms must be visible before processing their deﬁnitions. Inour SDE scheme we do not create an AST that could be tra-versed. However, each action can create multiple fragmentsthat can be connected in some different order.Consider a simple example DSL for specifying directedgraphs. Each entry consist of a head vertex, and an edge listnaming the vertices where the head connects to. We want torepresent the graph as an adjacency tuple, with each vertexbeing implicitly represented as an index. With the input ofthe form

Start -> X, Y;X -> Y;Y -> X, Start; we want to create an environment: [["Start",1],["X",2],["Y",3]] and an indexed list describing the graph, such as: [[2,3],[3],[2,1]]

The full grammar is given in Listing 11 in the Appendix.Let us consider here only actions that must be performedwhen reading the head vertex and when reading the edge list.We create two fragments:

Decl and

Def . Upon reading thehead, we change our environment by assigning a new indexto the vertex name: (Decl,name)->(Decl) {(F)->Assgn->(F) ::= Identifier->(id) "="Expr->(P) ";"(id,F,P)->(F) {build 1 (ft, !args, env, cont)[bt] {@ft: P env (v)[ft]@bt: env.insert(id,v) (env)cont ft !args env} (Fnext)merge F Fnext return};(F)->Number->(F) ::= Identifier->(id)(F,id)->(F) {build 1 (ft, !args, env, cont) { env.lookup(id) (v)cont ft v !args env} (Fnext)merge F Fnext return};

Listing 9:

Example use of an environment within builders.This grammar extends Listing 8 to support named values. Inthe produced code name binding is already resolved as noenvironment operations are present in the staging chain ft . uild 1 (ft, env, idx, end, cont) { env.insert(name,idx) (env) idx+1 (idx)cont ft env idx end} (DeclNext)merge Decl DeclNext return} When reading a name within the edge list we look-up theindex within the environment and update the adjacency list: (Def,name)->(Def) {build 1 (ft, env, graph, adjacent, end, cont) { env.lookup(name) (idx)[bt]@ft: concat(adjacent,[idx]) (adjacent)[ft]@bt: cont ft env graph adjacent end} (DefNext)merge Def DefNext return};

Note that

Decl fragments are merged to other

Decl -s, and

Def only to other

Def -s. Only at the end, when the wholegraph has been parsed, the

Decl and

Def fragments areconnected. This way we obtain a single function where alldeclarations appear before the adjacent list deﬁnitions.

The type system of DeepCPS is rudimentary and provides nostatic correctness with the respect to staging. Using it directlyin a higher-level DSL would be limiting, and the producederror messages originating from within the semantic actionscould be confusing to the DSL user.However, type system need not be limited to a ﬁxed ruleset of a language. Type interference can be seen as a formof partial evaluation of the code with respect to its typeannotations [Her10]. In a DSL embedded in DeepCPS werepresent this as auxiliary values (e.g. types) and auxiliarycomputation performed in an early stage.For a simple example, consider an extension to the

MinusDiv language supporting different types of num-bers. Each DSL value is represented by two variables: theactual value and its type. Before each mathematical operationwe ﬁrst check the operand types and compute the type of theresult. For example, within the action of

Diff production ofListing 8 we have: build 1(ft,lval,ltype,rval,rtype,!args,cont)[bt] {@ltype & rtype: // as soon as types are known ltype != rtype (error)if error () { // error occurred print "Type mismatch!" ()exit} () // else — no type error let [ok] difftype ltype@ok & ft: lval-rval (diff)[ft]@bt: cont ft diff difftype !args} (Fnext) ...

This way type checking is performed in the code itself. Inthis simple example it boils down to a simple comparison,but a custom DSL may perform more involved checks.Note that we chose not to create a dedicated stage chainfor type checking (e.g. tc ). Instead, the check is performedas soon as all the necessary information is available. This way, the same function can be used in a statically-typedand dynamically-typed DSL, as well as in a context of type-dependent functions.

8. Implementation

ManyDSL is the name we have given to our implementationof the metamorphic language described in this paper. TheManyDSL workﬂow is sketched in Figure 3. The ﬁrst stepis parsing the source code into Target Representation (TR).TR is interpreted and partially evaluated using DynamicStaging. By calling a built-in $compile , any function canbe compiled, through Thorin [LKH15] and LLVM [LA04],into highly efﬁcient machine code.Initially, ManyDSL can parse only DeepCPS source,which has almost 1:1 correspondence to TR. The user how-ever can introduce a new language using the SDE schemepresented in this paper. When the language deﬁnition is exe-cuted by ManyDSL and LL parser is created, it can be usedto parse the remainder of the source code.While technically possible, the user never directly createsnor modiﬁes TR. It is transformed solely through the seman-tics of lambda calculus with dynamic staging. The partiallyevaluated TR code can also be emitted back as DeepCPS. Weuse this mechanism for bootstrapping LangDSL (Section 8.3),but it is not needed in the normal usage of ManyDSL.

It would be pointless to change the parser when all sourceis already read. For that reason, in ManyDSL the parsingprocess can be interrupted letting the ManyDSL interpreterexecute the part that has been already translated into TR. Thisinterleaved parsing and execution gives ManyDSL a uniquepossibility where the code and the parser can communicatewith each other.This communication is primarily used for user-guidedparsing, but it can beneﬁt other pragmatic situations. Forexample, the name of an include ﬁle may be the result of source TRparser ThorinLLVMinterpretcompile

Figure 3:

The structure of ManyDSL. Source code is parseddirectly into Target Representation (TR). TR can be inter-preted and partially evaluated, producing a more efﬁcient TRcode. It can also modify the parser of ManyDSL, so that dif-ferent DSLs can be read. Finally, TR code can be translatedto Thorin [LKH15] and then compiled to LLVM [LA04]. ome computation — such as inspecting the operating systemenvironment or the target hardware architecture. This wayDeepCPS code may contain not only the program to becompiled, but the whole build system around it.In DeepCPS the interruption is achieved through a specialsyntax . However, as explained in Section 6, semanticactions are executed immediately during parsing. That means,switching between parsing and execution occurs implicitly atevery action. The parser generator of ManyDSL is available through C API.The grammar is build incrementally by building its structure,piece-by-piece by invoking these C functions from DeepCPS.The C functions can take higher-order DeepCPS values asarguments.We used this low-level approach to bootstrap LangDSLwith the syntax explained in this paper. With LangDSL,the use of C API is entirely hidden from the user. Circularreferences between nonterminals are handled the same way asshown in Section 7.2. Since the bodies of semantic actions arewritten in DeepCPS, we instruct ManyDSL to switch parsersfrom LangDSL to native DeepCPS whenever an action isencountered.LangDSL includes a few additional syntactic sugar con-structs that were omitted in this paper to further simplifylanguage creation. It includes a dedicated syntax for thebuild+merge pattern. Commonly recurring production pa-rameters, such as F in Listing 8, can be skipped at the usesite of a nonterminal. The output of LangDSL is a function describing a newlanguage. When invoked, the parser generator of ManyDSLis used to create a new LL(1) parser. Afterward, when theuser orders ManyDSL to change the parser, the next token inthe input stream is processed by the new language.At low level, switching is performed by calling C functionsfrom DeepCPS. This can be encapsulated within an action ofa grammar. For example, when LangDSL deﬁnes itself, therule for the semantic action (see Listing 6) is deﬁned similarlyto the code in Listing 10.In the above production, the ﬁrst action invokes C func-tions to switch language from LangDSL to DeepCPS. A non-terminal

LambdaBody from within DeepCPS is invoked.The ! indicates a foreign nonterminal, which does not existwithin the language where it is used. When LambdaBody has completed, the second action switches the currentlyparsed language back to LangDSL.

9. Discussion

We have shown a unique compiler system that allows creationof multiple DSL with custom syntax, yet embedding it inthe functional language DeepCPS to express its semantics.Our embedded DSLs are not limited by the syntax of the

Action->(action) ::=ParamList->(p_in) "->" ParamList->(p_out) "{"(parin)->(head,lang) {TRCreateLambdaHead parin (head)getCurrentLanguage $parser (lang)setCurrentLanguage $parser $DeepCPS ()return head lang}(head)->!LambdaBody->(lambda)(lang,lambda,p_in,p_out)->(action) {setCurrentLanguage $parser lang ()LangDSLCreateAction p_in p_out lambda return}"}";

Listing 10:

A LangDSL semantic action construct deﬁnedin LangDSL itself. The grammar calls a foreign production

LambdaBody of the

DeepCPS language.host language. The semantic actions of a DSL can be usedto build code directly without any additional intermediaterepresentation.The grammar speciﬁcation itself is embedded in Deep-CPS and is entirely function-based. We have shown howportions of a grammar can be abstracted, creating reusableand parametrizable fragments of a grammar.The ﬂexibility and staging of DeepCPS allows for creatingarbitrary control ﬂow and introducing multiple passes for thetarget DSL. The additional stages can be used to programdomain-speciﬁc optimizations. Staging can also be used toperform early program checks, implementing a custom typesystem or adding auxiliary computations [Her10].The code we generate is represented with fragment func-tions. These fragments can be connected together akin to ASTnodes. Unlike an AST however, the functions are opaque:their behavior cannot be inadvertently changed in any way,by other nodes or tree transformations. The only operationspossible on a fragment functions is merging and execution.Still, the execution may trigger optimization that is deﬁnedwithin the fragments through Dynamic Staging.The opaqueness of the fragment functions gives each cre-ated language a unique possibility to deﬁne its own LanguageProgramming Interface – a set of entry rules that other lan-guages may use. The LPI of the language deﬁnes entirely howit can be used and what can be produced with it. The userdoes not need to worry about the internals of that language.

Future Work

Within the paper, as well as in ManyDSL, we limit ourselvesto LL(1) grammars. Our main focus was to introduce theSDE scheme and language switching. We plan to explore ifand when this constraint can be lifted. While SDE requirestop-down parsing, supporting LL(*) or PEG is a possibilityas long as backtracking is limited or avoided and languageswitching is suitably handled. Alternatively, our LL(1) parsercould be extended to support productions predicated byarbitrary DeepCPS code. e continue to search for a good set of grammar abstrac-tions within LangDSL. We hope that nearly all aspects ofDSL building can be deferred to a few grammar-buildingfunction calls. Only the most unique syntactic constructs,speciﬁc for given domain, would require a direct grammardescription.Moreover, the handling of rule parameters in the abstrac-tion is less than ideal. We hope to ﬁnd a more robust solutionin the future.Currently, the grammar actions can be expressed onlydirectly in DeepCPS. However, any language embedded inManyDSL is suitable. We want to increase productivity ofLangDSL by permitting higher-level languages deﬁne thesemantic actions.In Section 7.3 we described how Dynamic Staging canbe used to deﬁne auxiliary computation and a simple typesystem. In theory, nearly any type system can be deﬁnedas a staged computation and used in a custom DSL. To ourknowledge, however, this possibility has not yet been fullyexplored and require further research. References [Ada+98] N. I. Adams IV et al. “Revised(5) Report on the Algo-rithmic Language Scheme”. In:

ACM SIGPLAN Notices

ACMComputing Surveys

FOOL . 2009.[Bac+63] J. W. Backus et al. “Revised Report on the AlgorithmLanguage ALGOL 60”. In:

Commun. of the ACM

ICFP . 1998, pp. 174–184.[BMT07] Gavin M. Bierman, Erik Meijer, and Mads Torgersen.“Lost in Translation: Formalizing Proposed Exten-sions to C

SIGPLAN Notices

PEPM . 2002, pp. 31–40.[Cha+10] Hassan Chaﬁ et al. “Language Virtualization for Het-erogeneous Parallel Computing”. In:

OOPSLA . 2010,pp. 835–847.[CKS09] Jacques Carette, Oleg Kiselyov, and Chung-chieh Shan.“Finally Tagless, Partially Evaluated: Tagless StagedInterpreters for Simpler Typed Languages”. In:

Journalof Functional Programming

GPCE . 2014.[DF15] Richard Dallaway and Jonathan Ferguson.

EssentialSlick . Underscore, 2015.[Dij82] Edsger W. Dijkstra. “How do we tell truths that mighthurt?” In:

Selected Writings on Computing: A PersonalPerspective . EWD 498. Springer-Verlag, 1982, pp. 129–131. [DS16] Piotr Danilewski and Philipp Slusallek. “Building Codewith Dynamic Staging”. In:

ArXiv e-prints (Dec. 2016).arXiv: .[EHK96] Dawson R Engler, Wilson C Hsieh, and M FransKaashoek. “‘C: A Language for High-level, Efﬁcient,and Machine-independent Dynamic Code Generation”.In:

POPL . 1996, pp. 131–144.[Erd+11] Sebastian Erdweg et al. “SugarJ: Library-based Syntac-tic Language Extensibility”. In:

OOPSLA . 2011.[Fok95] J. Fokker. “Functional Parsers”. In:

Advanced Func-tional Programming: 1st International Spring School onAdvanced Functional Programming Techniques . Ed. byJ. Jeuring and E. Meijer. Springer, 1995, pp. 1–23.[For02] Bryan Ford. “Packrat Parsing:: Simple, Powerful, Lazy,Linear Time, Functional Pearl”. In:

ICFP . 2002, pp. 36–47.[For04] Bryan Ford. “Parsing Expression Grammars: A Recognition-based Syntactic Foundation”. In:

POPL . 2004, pp. 111–122.[Hee+89] J. Heering et al. “The Syntax Deﬁnition Formalism SDF— Reference Manual”. In:

SIGPLAN Notices

Haskell vs. Ada vs. C++ vs.Awk vs. ... An Experiment in Software Prototyping Pro-ductivity . Tech. rep. Department of Computer Science,Yale University, 1994.[HM98] Graham Hutton and Erik Meijer. “Monadic Parsing inHaskell”. In:

Journal of Functional Programming

ACM Computing Surveys

DOI : .[JML98] Simon P. Jones, E. Meijer, and D. Leijen. “ScriptingCOM Components in Haskell”. In: ICSR . 1998.[Joh75] Stephen C. Johnson.

YACC – Yet Another Compiler-Compiler . Tech. rep. Bell Laboratories, 1975.[Jon+14] Manohar Jonnalagedda et al. “Staged Parser Combina-tors for Efﬁcient Data Processing”. In:

OOPSLA . 2014,pp. 637–653.[Joo]

The jOOQ User Manual . 2016.

URL : . retrieved on 01.06.2016.[Knu65] Donald E. Knuth. “On the Translation of Languagesfrom Left to Right”. In: Information and Control

The TeXbook . Addison-Wesley Pro-fessional, 1984.[Koh+86] Eugene Kohlbecker et al. “Hygienic Macro Expansion”.In:

LFP . 1986, pp. 151–161.[KP98] Pieter Koopman and Rinus Plasmeijer. “Efﬁcient Com-binator Parsers”. In:

IFL . 1998, pp. 122–138. KR77] Brian W. Kernighan and Dennis M. Ritchie.

The M4Macro Processor . Tech. rep. Bell Laboratories, July1977.[LA04] Chris Lattner and Vikram Adve. “LLVM: A Compila-tion Framework for Lifelong Program Analysis & Trans-formation”. In:

CGO . 2004, pp. 75–86.[Lei+15] Roland Leißa et al. “Shallow Embedding of DSLs viaOnline Partial Evaluation”. In:

Proceedings of the 14thInternational Conference on Generative Programming:Concepts & Experiences (GPCE) . ACM. Pittsburgh, PA,USA, 2015, pp. 11–20.[LKH15] Roland Leißa, Marcel Köster, and Sebastian Hack. “AGraph-Based Higher-Order Intermediate Representa-tion”. In:

CGO . 2015.[Llv]

Kaleidoscope: Code generation to LLVM IR . http :/ / llvm . org / docs / tutorial / LangImpl3 .html . Accessed on 12.05.2015.[LM01] Daan Leijen and Erik Meijer. Parsec: Direct StyleMonadic Parser Combinators for the Real World . Tech.rep. Department of Computer Science, UniversiteitUtrecht, 2001.[LM99] Daan Leijen and Erik Meijer. “Domain Speciﬁc Embed-ded Compilers”. In:

DSL . 1999, pp. 109–122.[LMB92] John R. Levine, Tony Mason, and Doug Brown.

Lex&Amp; Yacc (2Nd Ed.)

O’Reilly & Associates, Inc.,1992.

ISBN : 1-56592-000-7.[LRS74] P.M. Lewis, D.J. Rosenkrantz, and R.E. Stearns. “At-tributed translations”. In:

Journal of Computer and Sys-tem Sciences

Journal of the ACM

Semantically-Sensitive Macroprocess-ing . Tech. rep. EECS Department, University of Califor-nia, Berkeley, 1989.[Mei00] Erik Meijer. “Server Side Web Scripting in Haskell”. In:

Journal of Functional Programming

GPCE . 2013, pp. 125–134.[Paa95] Jukka Paakki. “Attribute Grammar Paradigms — a High-level Methodology in Language Implementation”. In:

ACM Computing Surveys

Hackers &Painters: Big Ideas from the Computer Age . O’ReillyMedia, Inc., 2004, pp. 169–180.[PF11] Terence Parr and Kathleen Fisher. “LL(*): The Founda-tion of the ANTLR Parser Generator”. In:

PLDI . 2011,pp. 425–436.[Rey75] John Reynolds. “User-deﬁned Types and ProceduralData Structures as Complementary Approaches to DataAbstraction.” In:

New Directions in Algorithmic Lan-guages . 1975.[RO12] Tiark Rompf and Martin Odersky. “Lightweight Mod-ular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs”. In:

Communicationsof the ACM

Language . Vol. 5. 4. Linguistic Society of America,1929, pp. 207–214.[SBP99] Tim Sheard, Zine el-abidine Benaissa, and Emir Pasalic.“DSL Implementation Using Staging and Monads”. In:

DSL . 1999, pp. 81–94.[TH+11] Sam Tobin-Hochstadt et al. “Languages as Libraries”.In:

PLDI . 2011, pp. 132–141.[Thi05] Peter Thiemann. “An Embedded Domain-speciﬁc Lan-guage for Type-safe Server-side Web Scripting”. In:

ACM Transactions on Internet Technology

Efﬁcient Parsing for Natural Language:A Fast Algorithm for Practical Systems . Kluwer Aca-demic Publishers, 1985.[TS99] Walid Taha and Tim Sheard. “MetaML and Multi-Stage Programming with Explicit Annotations”. In:

Theoretical Computer Science . 1999.[Vel96] Todd Veldhuizen. “C++ Gems”. In: ed. by Stanley B.Lippman. SIGS Publications, Inc., 1996. Chap. Expres-sion Templates, pp. 475–487.[Vis04] Eelco Visser. “Program Transformation with Strate-go/XT”. In:

Domain-Speciﬁc Program Generation . Ed.by Christian Lengauer et al. Vol. 3016. Lecture Notes inComputer Science. 2004, pp. 216–238.[Wad85] Philip Wadler. “How to Replace Failure by a List ofSuccesses”. In:

ICFP . 1985, pp. 113–128.[Who40] Benjamin Lee Whorf. “Science and Linguistics”. In:

Technology Review . MIT, 1940. ppendix Graph->(P) ::= ()->(Decl,Def) {build 1 (ft, end, cont) {newEnv (env) cont ft env 1 end} (decl)build 1 (ft, env, end, cont) {cont ft env [] end} (def)return decl def}(Decl,Def)->Vertex->(Decl,Def)(Decl,Def)->(G) {merge Decl Def (Descr)build 0 (ft, env, graph, end) {@ft: end graph} (Fend)merge Descr Fend (Descr)finalize Descr return};(Decl,Def)->Graph->(Decl,Def) ::= lassoc;(Decl,Def)->Vertex->(Decl,Def) ::=Name->(name)(Decl,name)->(Decl) {build 1 (ft, env, idx, end, cont) { env.insert(name,idx) (env) idx+1 (idx)cont ft env idx end} (DeclNext)merge Decl DeclNext return}(Def)->(Def) {build 1 (ft, env, graph, end, cont) {cont ft env graph [] end // adjacent list starting empty } (DefNext)merge Def DefNext return}"->"lassoc(Def)->(Def) {build 1 (ft, env, graph, adjacent, end, cont)[bt] {@ft: concat(graph,[adjacent]) (graph)[ft]@bt: cont ft env graph end} (DefNext)merge Def DefNext return};(Def)->Edge->(Def) ::=Name->(name)(Def,name)->(Def) {build 1 (ft, env, graph, adjacent, end, cont) { env.lookup(name) (idx)[bt]@ft: concat(adjacent,[idx]) (adjacent)[ft]@bt: cont ft env graph adjacent end} (DefNext)merge Def DefNext return};

Listing 11:

The complete grammar for a graph-describing language from Section 7.2. Two series of fragment functions arecreated:

Decl and

Def . In

Decl , each new vertex is given a new index and added to an environment. Within

Def it is assumedthat all vertices are already given a number. These fragment functions are then merged in such a way that all

Decl -s precede all

Def -s.14