Generating Code with Polymorphic let: A Ballad of Value Restriction, Copying and Sharing

Abstract

Getting polymorphism and effects such as mutation to live together in the same language is a tale worth telling, under the recurring refrain of copying vs. sharing. We add new stanzas to the tale, about the ordeal to generate code with polymorphism and effects, and be sure it type-checks. Generating well-typed-by-construction polymorphic let-expressions is impossible in the Hindley-Milner type system: even the author believed that. The polymorphic-let generator turns out to exist. We present its derivation and the application for the lightweight implementation of quotation via a novel and unexpectedly simple source-to-source transformation to code-generating combinators. However, generating let-expressions with polymorphic functions demands more than even the relaxed value restriction can deliver. We need a new deal for let-polymorphism in ML. We conjecture the weaker restriction and implement it in a practically-useful code-generation library. Its formal justification is formulated as the research program.


Jeremy Yallop and Damien Doligez (Eds.): ML/OCaml 2015, EPTCS 241, 2017, pp. 1–22, doi:10.4204/EPTCS.241.1. This work is licensed under the Creative Commons Attribution-No Derivative Works License.


Oleg Kiselyov

Tohoku University, Japan


This paper revolves around code generation, namely, generating typed, higher-order code for languages such as OCaml. Specifically, we deal with one approach to code generation: staging (recalled in §). However, staging here is the lens through which to look at the old problem of let-generalization. The unexpected interactions of polymorphism and staging bring into focus the ‘too obvious’ and hence rarely mentioned assumptions of the value restriction. Generating code that contains polymorphic let-expressions is a non-contrived, real-life application that requires let-generalization of effectful expressions – going beyond what even the relaxed value restriction offers. Staging thus motivates further work on the seemingly closed topic of let-generalization in the presence of effects.

Although program generation is a vast area, surprisingly there has been very little research on type-assured code generation with polymorphic let. To our knowledge, [15] is the first paper that brings up a staged calculus that has both polymorphism and mutable cells. It is motivated by the unexpected interaction of polymorphism and staging that we describe in §

Footnote: Post-validation is hence similar to run-time failure of ill-typed code in dynamically-typed languages. However, with a run-time error we can get a stack trace, etc. On the other hand, when post-validating the (typically large and obfuscated) generated code, the generator is long gone and its state can no longer be examined.

code generation (see [11, §5] for the recent overview) yet polymorphic-let expressions are not included in the target language. The only related work, albeit quite remotely, is [3] on typed self-interpreters, which does include the representation of polymorphic expressions as code – but lacks any effects. That work is based on System Fω, which is difficult to use in practice, in part because it lacks type inference. In contrast, in our code generation approach all types are inferred.

Contributions

First, the paper presents a new translation from the staged code – with quotations, unquotations and cross-stage persistence – to quotation-free expressions over code-generating combinators. The translation is remarkably simpler than the other unstaging translations. It also translates quoted let-expressions to let-expressions, for the first time giving the chance to generate polymorphic let-expressions, well-typed by construction. Second, we present the first library of typed code combinators whose target language includes polymorphic let-bindings and effects. The library requires no first-class polymorphism, no type annotations and, combined with the unstaging translation, is suitable for implementing staging by source-to-source translation to combinators. The library solves the problem that the author claimed in 2013 to be unsolvable [13].

Although the translation and the library are already practically useful, their formalization requires deeper understanding of polymorphism and effects. The paper proposes a research program, which will have to open the old value-restriction wounds and could finally heal them. Thus we end up posing more questions – the questions that could not have been asked before.

The paper starts with extensive background.

http://okmij.org/ftp/meta-programming/polylet.ml

This background section recalls let-generalization; its problems in the presence of effects; staging; and the unexpected interaction of generalization and staging that calls up the assumptions of the value restriction. The section introduces the running examples used later in the paper.

Since the early days of LISP and ISWIM [16], let-expressions let us introduce and name common or notable expressions which are used, often repetitively, later in the code. Here is a simple example:

    (1) let x = [1] in (2::x, 3::x)

It may be regarded as a sort of a ‘macro’ that expands into

    (2) (2::[1], 3::[1])

In fact, such a ‘macro-expansion’ – copying (inlining) the let-bound expression into the places marked by the let-bound variable – is the meaning given to let-expressions in Landin’s ISWIM [16]. The alternative to this copying, or substitution-based semantics is sharing. It views (1) as introducing the expression [1] that is shared across all occurrences of x. Hence the two lists in (2) share the common tail. Copying vs. sharing reverberates throughout the paper as the constant refrain. If our program simply prints out (2), the two semantics are indistinguishable. The equivalence lets the compiler choose inlining or sharing as fits.

Likewise, the code

    (3) let x = [] in (2::x, "3"::x)

may be viewed as a macro that expands into

    (4) (2::[], "3"::[])

It is tempting to also regard (3) as the sharing of the empty list across the two components of the returned pair. Unlike (2), however, [] in (4) has different types: namely, int list in the first component vs. string list in the second. Thus comes the problem of what type to give to the shared value and to x.

The answer developed by Milner [20] was polymorphism: the same expression that occurs in – has been copied into – differently typed contexts may be shared and given the common, the most general, polymorphic type (see also the extended discussion in [5]). The empty list [] has the type α list to be fully determined by the context; α is the placeholder: a (unique) type variable. In (4), the contexts determine the type as int list and string list, respectively. In (3), the context of the right-hand side (RHS) of the let-binding has not determined what α list should be.
In that case, the type is generalized to the polymorphic type schema: ∀α. α list.

Formally, the typing of let-expressions is represented by the (GenLet) rule below. The rule is written in terms of the judgments Γ ⊢ e : t that an expression e has the type t in the type environment Γ (which lists the free variables of e and their types).

\[
\frac{\Gamma \vdash e : t \qquad \Gamma,\, x : \mathrm{GEN}(\Gamma, t) \vdash e' : t'}
     {\Gamma \vdash \mathtt{let}\ x = e\ \mathtt{in}\ e' : t'}\ (\text{GenLet})
\qquad
\frac{x : \forall \alpha_1 \ldots \alpha_n.\ t \in \Gamma}
     {\Gamma \vdash x : t\{\alpha_1 = t_1 \ldots \alpha_n = t_n\}}\ (\text{Inst})
\]

The generalization function GEN(Γ, t) for the type t with respect to the type environment Γ quantifies the free type variables of t that do not occur as free in Γ:

\[
\mathrm{GEN}(\Gamma, t) = \forall \alpha_1 \ldots \alpha_n.\ t
\quad\text{where}\quad \{\alpha_1 \ldots \alpha_n\} = \mathrm{FV}(t) - \mathrm{FV}(\Gamma)
\]

where FV(·) denotes the set of free variables. When a variable with the polymorphic type schema such as x: ∀α. α list in (3) is used in an expression, e.g., 2::x, the schema is converted to a more specific type, int list in our example: see the rule (Inst). The underlying assumption is that the value named by x indeed has the same representation for all instances of the polymorphic type schema and hence may be shared, even across differently-typed contexts; the instantiation is a purely type-checking-time operation that behaves like identity at run-time. One may say that the motivation of polymorphism is to extend the equivalence of the copying and sharing semantics to the cases like (3).

Side-effects break the equivalence of copying and sharing.

    (5) let x = begin printf "bound"; [1] end in (2::x, 3::x)

If the right-hand side is copied, substituting for the occurrences of x in (5), the string "bound" is printed twice. If the RHS is first reduced, however, (5) turns to the earlier (1), where x is bound to the value that can be either shared or copied. Hence the copying/sharing equivalence holds even in the case of (5), if we regard variables as bound to values – as we do in call-by-value languages. The polymorphic case should likewise be unproblematic:

    (6) let x = begin printf "bound"; [] end in (2::x, "3"::x)

Footnote: Another way to restore the equivalence is to regard x as bound to an expression that is evaluated only at the places of x’s occurrence. That was the idea of Leroy’s call-by-name polymorphism [18].

The physical equality (==) of OCaml can distinguish sharing and copying:

    (7) let x = [1] in x == x

Whereas (7) returns the result true, the expression [1] == [1] produces false. Another, universal way to distinguish sharing and copying uses mutable data structures, in particular, mutable cells [1]. Let’s define

    (8) let rset : 'a list ref -> 'a -> 'a list = fun r v ->
          let vs' = v :: !r in r := vs'; vs'

that prepends the value v to the list stored in the reference cell r, stores the new list in the cell and returns it. Then

    (9) let x = ref [1] in (rset x 2, rset x 3)
        ([2; 3; 1], [3; 1])

        (rset (ref [1]) 2, rset (ref [1]) 3)
        ([2; 1], [3; 1])

produce the different results as shown underneath the expressions. Since the distinction between copying and sharing is generally visible, there is no longer freedom of choosing between the two. OCaml uses sharing for let-expressions, performing inlining (copying) only when it can see the equivalence.

The example paralleling (3) however does not type-check.

    (10) let x = ref [] in (rset x 2, rset x "3")    (* Does not type-check! *)

As we have just seen, with reference cells, sharing and copying differ and the OCaml compiler has to use the default sharing. Had the expression type-checked, at run-time rset x "3" would modify the empty list stored in x by prepending the string "3" to it. The expression rset x 2 will then try to prepend the integer to the contents of x, which by that time is the string list ["3"]. Clearly that is a program that has “gone wrong”. We should well remember this example: we shall be seeing it, in different guises, all throughout the paper.

Although the RHS of the let-binding in (10) has the type α list ref with the type variable that could be generalized, it should not be, to prevent (10) from type-checking. Intuitively, sharing and copying of a reference cell have different semantics, hence it should not get the polymorphic type schema.

The danger of giving reference cells a polymorphic type has been recognized early on [26]. So has the problem of how to restrict (GenLet) from applying to “dangerous” expressions. The most straightforward solution, used in the early ML and OCaml for a long time, was to limit reference cells to base types only. The restriction made it impossible however to write any polymorphic function that internally uses reference cells. A good overview of less draconian approaches is given in [8]. The most widely implemented, because of its balance of expressiveness with sheer simplicity, is the value restriction [27]: applying (GenLet) only to those let-expressions whose RHS is syntactically a value. Since ref [] in (10) is not a value, x is not generalized and its occurrences in differently typed contexts will raise the type error. On the other hand, [] in (3) is a value and x there does get the polymorphic type. Strictly speaking, x in (6) should not be generalized either. However, it is syntactically obvious that the printing effect has no contribution to the result of the containing expression. The RHS of (6) is what is called ‘non-expansive’. OCaml generalizes non-expansive expressions, not just values.

Although the value restriction on balance proved expressive enough for many ML programs, as OCaml gained features such as objects, polymorphic variants and a form of first-class polymorphism (enough to support polymorphic recursion), the restrictiveness of the value restriction was felt more and more acutely. Against this backdrop, Garrigue introduced the ‘relaxed value restriction’ [8], which we briefly overview below as we will be relying on it.

The relaxed value restriction explores the close analogy between type instantiation and subtyping. It can also be justified from the point of view of copying-sharing: a value occurring in differently-typed contexts may be let-bound and shared if it can be given the ‘common type’, the supertype of the types expected by the contexts. The coercion to a subtype, like the type instantiation, is a compile-time-only operation, behaving as identity at run-time.
Suppose a value has the type zero c where zero is the empty type, and it can be coerced by subtyping to the type t c for any t. We may as well then give the variable that is let-bound to the value the type ∀α. α c, which can then be instantiated to t c. Since zero is (vacuously) coercible to any type, a value of the type zero c can be coerced to t c only when the type zero occurs covariantly in zero c. Hence the relaxed value restriction: If the expression e in let x = e in e' has a type with covariant type variables (which do not occur in the typing context), they are generalized in the type inferred for x. (The actual implementation is somewhat more restrictive: see [8] for details.)

For example, x below is generalized

    (11) let x = let r = ref [] in !r in (2::x, "3"::x)

despite the fact the RHS is an expression – moreover, the expression whose result comes right from a reference cell. Still, the result has the type α list whose type variable is covariant (with List.map being the witness of it). On the other hand, the type of reference cells α ref is non-variant and hence x in (10) remains ungeneralized. The relaxed value restriction applies not only to built-in data types but also to user-defined and abstract ones:

    (12) type +'a mylist = List of 'a list
         let mklist : 'a list -> 'a mylist = fun x -> List x
         let mycons : 'a -> 'a mylist -> 'a mylist = fun x -> function List l -> mklist (x :: l)
         let x = mklist [] in (mycons 2 x, mycons "3" x)

Although the RHS of the let-binding is an expression, x is generalized because the type α mylist is covariant in α. It is declared to be covariant, by the +'a covariance annotation. The compiler will check that the RHS of the type declaration really uses the type variable α covariantly. The compiler can also infer the variance, hence the annotations are normally omitted. They are necessary only for abstract types, whose declaration lacks the RHS.
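The running examples of this section can be checked with a stock OCaml compiler. The following is a minimal sketch that collects examples (3), (9), (11) and (12) in one file; the top-level names ex3, shared, copied, ex11 and ex12 are introduced here only for illustration. Since the evaluation order of tuple components is unspecified in OCaml, the sketch sequences the two rset calls with explicit let-bindings, in the order that matches the results shown under (9).

```ocaml
(* rset from (8): prepend v to the list in the cell r, store and return it *)
let rset : 'a list ref -> 'a -> 'a list = fun r v ->
  let vs' = v :: !r in
  r := vs';
  vs'

(* (3): the RHS [] is syntactically a value, so x is generalized *)
let ex3 = let x = [] in (2 :: x, "3" :: x)

(* (9), sharing: both calls mutate the same cell.
   The second component is computed first (assumed order, made explicit). *)
let shared =
  let x = ref [1] in
  let second = rset x 3 in
  let first = rset x 2 in
  (first, second)

(* (9), copying: each call gets its own fresh cell *)
let copied =
  let second = rset (ref [1]) 3 in
  let first = rset (ref [1]) 2 in
  (first, second)

(* (10) is rejected by the type checker, because x is not generalized:
   let bad = let x = ref [] in (rset x 2, rset x "3") *)

(* (11): the RHS is not a value, but its type has only a covariant
   type variable, so the relaxed value restriction generalizes x *)
let ex11 = let x = (let r = ref [] in !r) in (2 :: x, "3" :: x)

(* (12): the same for a user-defined covariant type *)
type +'a mylist = List of 'a list
let mklist : 'a list -> 'a mylist = fun x -> List x
let mycons : 'a -> 'a mylist -> 'a mylist =
  fun x -> function List l -> mklist (x :: l)
let ex12 = let x = mklist [] in (mycons 2 x, mycons "3" x)
```

Uncommenting the binding for (10) should make the file fail to compile, which is the point of the value restriction: the reference cell keeps a single, not-yet-determined element type.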
Overall, the relaxed value restriction turned out even better balanced, accommodating not just polymorphic functions but polymorphic data (including row-polymorphic data such as extensible records and variants), whose construction often involves computations. The relaxed value restriction was almost enough for implementing staging via code combinators – but not quite, as we see in §

Staging is an approach to writing programs that generate programs; it may be regarded as a principled version of Lisp quotation. For example, MetaOCaml lets us quote any OCaml expression, by enclosing it within .< and >. brackets:

    (1) let c = .<1 + 2>.
        val c : int code = .<1 + 2>.

A quoted expression, representing the generated code, is the value of the type α code, and can be bound, passed around, printed – as well as saved to a file and afterwards compiled or evaluated. For that reason, the code value such as c is called a ‘future-stage’ expression (or, an expression at level 1), in contrast to the code that is being evaluated now, at the present, or 0, stage. An expression that evaluates to a code value can be spliced-in (or, unquoted, in Lisp terminology) into a bigger quotation:

    (2) let cb = .<fun x -> .~c + x>.
        val cb : (int -> int) code = .<fun x_1 -> (1 + 2) + x_1>.

The spliced-in expression is marked with .~, which is called an escape. The generated code can be executed by the function run (in the module Runcode), reminiscent of Lisp’s eval:

    (3) open Runcode
        val run : 'a code -> 'a
        let cbr = run cb
        val cbr : int -> int = <fun>
        cbr 2
        - : int = 5

Running cb hence compiled the cb code of a function into an int -> int function that can be invoked at the present level. As one expects, running the code indeed invokes the compiler and the dynamic linker.
The run operation hence lets us generate code at run-time and then use it – in other words, it offers run-time code specialization.

When generating functions it is natural to require that the behavior of the resulting program should not depend on the choice of names for bound variables. For example,

    (4) let c1 = .<fun x -> .~(let body = .<x>. in .<fun x -> .~body>.)>.
        val c1 : ('a -> 'b -> 'a) code = .<fun x_1 -> fun x_2 -> x_1>.

        let c2 = .<fun y -> .~(let body = .<y>. in .<fun x -> .~body>.)>.
        val c2 : ('a -> 'b -> 'a) code = .<fun y_3 -> fun x_4 -> y_3>.

The expressions c1 and c2 should build the code that behaves the same when evaluated. This is indeed the case, as one can see from the generated code, printed underneath. If we write this example with quotations in Lisp, the expressions are no longer equivalent: whereas c2 generates the code for the K combinator, c1 builds a function that takes two arguments returning the second one. Lisp quotations are hence not hygienic.

When generating code for a typed language, it is also natural to require that the produced code is type-correct by construction. For that reason, the code type is parametrized by the type of the generated expression, as we saw for c, cb, etc. The formal treatment of type soundness is well covered in [24, 4] and will be briefly recalled in §

.<1 + 2>. and ponder the addition operation there. In the ordinary OCaml expression 1 + 2, the addition is the ordinary function, defined in the Pervasives module. The addition in .<1 + 2>. refers to exactly the same function: MetaOCaml permits any value of the generator to appear in the generated code. This is called “cross-stage persistence” (CSP) (see [24] for more discussion).
One may think of CSP identifiers as references to ‘external libraries’.

The trivial code for the addition of two numbers has already demonstrated how wide-spread CSP is. Let us show a more explicit example of CSP, brought about by the function

    (5) let lift : 'a -> 'a code = fun x -> .<x>.

The following example then produces the code as shown (compare with (2)):

    (6) .<fun x -> .~(lift (1 + 2)) + x>.
        - : (int -> int) code = .<fun x_1 -> (* CSP x *) Obj.magic 3 + x_1>.

In contrast to (2), here the addition of (1 + 2) is done at the code generation time; the generated code includes the computed value. CSP hence lets us do some of the future-stage computations at the present stage, and hence generate more efficient code. The bizarre Obj.magic appearing in the generated code is the artifact of printing. The following code

    (7) .<fun x -> .~(let y = 1 + 2 in .<y>.) + x>.
        - : (int -> int) code = .<fun x_2 -> 3 + x_2>.

(where the CSP identifier y is known to be of the int type) produces the more obvious result.

Our refrain of copying vs. sharing repeats for CSP. When a value from the present stage is used at a future stage, do the two stages share the value or does the future stage get a separate copy? Unfortunately this issue is not discussed in the literature let alone formally addressed – which is a pity since it is responsible for the unexpected soundness problem to be described in the next section. The case of a global (library) identifier seems clear: code such as .<succ 3>. contains the identifier succ that refers to the same library function it does at the present level. Whether that function is shared or copied between the present-stage and the generated code depends on the inlining strategy of the compiler and the static vs. dynamic linking. One could expect sharing/copying to be equivalent in this case.

The cross-stage persistence of a locally-created value is much less clear:

    (8) let cs = let z = string_of_float @@ Sys.time () in .<print_endline z>.
        val cs : unit code = .<Pervasives.print_endline "0.051">.

One may imagine that the code value .<print_endline z>. (represented, say, as an AST) contains the pointer to a string allocated in the running-program heap – the same pointer that is denoted by the local variable z. Then run cs will print the value of that string on the heap. The present and the future stage hence share the string. Rather than running cs however, we may save it to a file, as library code for use in other programs. In this case, when the generated code is evaluated the generator program is long gone, along with its heap. Therefore, when storing a code value to a file we must serialize all its CSP values, creating copies. In the upshot, cross-staged persistent library identifiers are always shared; other CSP values are shared if the code value is run, and copied otherwise. The semantics of CSP is indeed intricate. We have just described the CSP implementation in the extant MetaOCaml; there is an ongoing discussion of it and its possible improvements.

Footnote: the exception being the work [15] which was inspired by the problem we discuss in §

Footnote: The right-associative infix operator @@ of low precedence is application: f @@ x + 1 is the same as f (x + 1) but avoids the parentheses. The operator is the analogue of $ in Haskell.

\[
\frac{\Gamma \vdash_{n} e : t' \to t \qquad \Gamma \vdash_{n} e' : t'}
     {\Gamma \vdash_{n} e\ e' : t}
\qquad
\frac{\Gamma,\, x^{n} : t' \vdash_{n} e : t}
     {\Gamma \vdash_{n} \mathtt{fun}\ x \to e : t' \to t}
\]
\[
\frac{\Gamma \vdash_{n} v : t \qquad \Gamma,\, x^{n} : \mathrm{GEN}(\Gamma, t) \vdash_{n} e' : t'}
     {\Gamma \vdash_{n} \mathtt{let}\ x = v\ \mathtt{in}\ e' : t'}\ (\text{GenLet})
\]
\[
\frac{\Gamma \vdash_{n+1} e : t}
     {\Gamma \vdash_{n} \langle e \rangle : t\ \mathtt{code}}\ (\text{Bracket})
\qquad
\frac{\Gamma \vdash_{n} e : t\ \mathtt{code}}
     {\Gamma \vdash_{n+1} {\sim} e : t}\ (\text{Escape})
\qquad
\frac{\Gamma \vdash_{n} x : t}
     {\Gamma \vdash_{n+1} x : t}\ (\text{CSP})
\]

Figure 1: Type system of a staged language

The question of sharing vs. copying CSP becomes non-trivial when the CSP value is mutable:

    (9) let r = ref 0 in
        let cr = .<incr r>. in
        run cr; run cr; !r
        - : int = 2

Mutable CSP values naturally arise when run-time specializing imperative code. They can be used for cross-stage communication, e.g., counting how many times the code is run, as shown in (9) – which works as intended only with the shared CSP. Sharing of mutable CSP values is also responsible for the unexpected problem with let-polymorphism, detailed next.
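The difference between shared and copied mutable CSP values in the counting example (9) can be imitated in plain OCaml, without MetaOCaml. The sketch below is only an analogy under stated assumptions: a code value is modelled as a thunk (the type code_sketch and the function run_sketch are hypothetical stand-ins, not the MetaOCaml code type or Runcode.run), and the copying that happens when code is stored to a file is modelled by a round trip through the standard Marshal module.

```ocaml
(* Hypothetical stand-in for a code value: a suspended computation.
   This is an analogy, not the MetaOCaml 'a code type. *)
type 'a code_sketch = unit -> 'a

(* Hypothetical stand-in for run: force the suspended computation *)
let run_sketch (c : 'a code_sketch) : 'a = c ()

(* Sharing: the "generated code" refers to the very cell r of the generator,
   so running it twice is visible through r *)
let shared_count : int =
  let r = ref 0 in
  let cr : unit code_sketch = fun () -> incr r in
  run_sketch cr; run_sketch cr; !r

(* Copying: the cell is serialized and deserialized, as it would have to be
   when the code is saved to a file; the "generated code" then increments
   the copy, and the generator's cell r stays untouched *)
let copied_counts : int * int =
  let r = ref 0 in
  let r_copy : int ref = Marshal.from_string (Marshal.to_string r []) 0 in
  let cr : unit code_sketch = fun () -> incr r_copy in
  run_sketch cr; run_sketch cr;
  (!r, !r_copy)
```

With sharing, the generator observes both increments (shared_count is 2); with copying, the generator's own cell still holds 0 while the copy holds 2. Cross-stage communication through a mutable cell therefore depends on the shared reading of CSP.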

For a long time let-polymorphism and staging were considered orthogonal features. It is not until 2009 that their surprising interaction was discovered; it has not been formally published. Before describing this interaction, we first briefly recall the type system of a staged language, on a representative subset of MetaOCaml.

Staging adds to the base language the expression forms for brackets <e> and escapes ~e and the type of code values t code. We use the meta-variable x for variables, e for expressions, v for values, and t for types. The type system is essentially the standard, Figure 1. It is derived from the type system of [24] by replacing the sequence of no-longer used classifiers with the single number, the stage level. (Since brackets may nest, there may be an arbitrary number of future stages.) The judgments have the form Γ ⊢ₙ e : t: they are now indexed by the level of the expression; the level is incremented when type-checking the expression within brackets and decremented for escapes. The identifiers within the typing environment Γ are now indexed by the level at which they are bound. The (GenLet) rule reflects the value restriction.

Staging thus contributes the three rules (Bracket), (Escape) and (CSP) and the indexing of the environment and the judgments by the stage level. If the program has no brackets, the stage level stays at 0 and the type system degenerates to the one for the (subset of the) ordinary OCaml. Moreover, except for the three staging-specific rules, the rest are the ordinary OCaml typing rules, uniformly indexed by the stage level. Thus, aside from brackets, escapes and CSP, the type-checking of the staged code proceeds identically to that for the ordinary code. In particular, let-expressions within brackets are handled and generalized the same way they are outside brackets. For example:

    (1) .<let x = [] in (2::x, "3"::x)>.
        .<let f = fun x -> x in (f 2, f "3")>.
    (2) .<let x = ref [] in (rset x 2, rset x "3")>.    (* Does not type-check! *)

Footnote: http://okmij.org/ftp/ML/MetaOCaml.html

Footnote: http://okmij.org/ftp/meta-programming/calculi.html

It appears hence that let-generalization and staging are orthogonal features.

Consider however the following code

    (3) .<let f = fun () -> ref [] in (rset (f ()) 2, rset (f ()) "3")>.

The type-checker accepts it and infers the type (int list * string list) code. The variable f hence gets the polymorphic type. After all, the RHS of the let-binding is syntactically the (functional) value. There is really nothing wrong with (3): f can be either copied or shared across its uses without the change in semantics: the invocation f () in either case will produce a fresh reference cell holding an empty list, later modified by prepending either 2 or "3" to its contents.

Now consider the simple modification, along the lines of (6) in §

    (4) let cbad = .<let f = fun () -> .~(lift (ref [])) in (rset (f ()) 2, rset (f ()) "3")>.
        run cbad
        Segmentation fault

It is also accepted, with the same inferred type – for any version of MetaOCaml including the current one. The RHS of the let-binding is still syntactically a function; we merely modified its body. Running that code however ends in the segmentation fault. One should not be surprised: we have managed to generate and type-check (10) from § t ref is not syntactically a value. Cross-stage persistence, however, lets one stage share its values with a future one. Suddenly there are literals of the reference types: these are values imported from the generator into the generated code.

The problem has been overlooked for more than a decade because none of the formalizations of staging have been complete enough, and hence do not handle let-polymorphism along with reference cells and shared CSP. There is currently no fix for the unsound let-generalization problem. One solution is proposed in [15] but it is restrictive. Another possible solution is to force the CSP locally-created values to follow the copying semantics. One may also prohibit generalization if the RHS of a future-stage let-binding contains an escape, thus introducing the explicit correlation of staging and let-polymorphism. Along with bad programs, all these proposals outlaw good ones. Investigating these trade-offs and finding better ones is the subject of future work. The present paper does not solve the unsound staged let-generalization problem either. However, we build a simpler framework to deal with it, reducing the problem to non-staged generalization.

The stymieing problems encountered in the previous section come at the confluence of staging, let-polymorphism and effects. It is only natural to wish to investigate them in a simpler setting; for example, to find a way to translate a staged calculus into the ordinary one.
There have been indeed proposed several ‘unstaging translations’ [28, 9, 7], with similar motivations.

Translating the staging away is also practically significant, as the method for implementing staged languages. The most attractive is a source-to-source translation: it lets us implement MetaOCaml just as a pre-processor to OCaml, fully reusing the existing OCaml compiler without modifying it (and having to bear the burden of maintaining the modifications, in sync with the mainline OCaml). This practical application is the main reason to be interested in unstaging translations.

Unfortunately, none of the existing unstaging translations deal with polymorphic let-expressions. Furthermore, an attempt to add them, described in §

The simplest approach for adding quotation to an existing language is to write a pre-processor that translates quoted expressions into ordinary ones, which use pre-defined functions that build and combine code values, so-called code combinators [25, 28, 22]. Code combinators may of course be used for code generation directly, rather than through quotation, as has been well demonstrated in Scala [22]. That said, we will explain code combinators in the context of an unstaging translation, from the language with quotations to the language without them – motivated by the practical benefits of such translation.

Our source language, Figure 2, is a simple subset of MetaOCaml (for now, without let-expressions). From now on, we restrict staging to two levels only – in other words, considering brackets without nesting – as this turns out to be the overwhelmingly common use of staged languages. The constants of the language are integer i and string s literals and the empty list. Besides abstraction and application the language includes pairs, consing to a list and the creation and dereference of reference cells. We take the mutation function rset: 'a list ref -> 'a -> 'a list defined in § + are worked out into the syntax. On the other hand, cross-stage persistence of
On the other hand, cross-stage persistence ofother identiﬁers must be explicitly marked with the % syntax. (The marking is inferred in MetaOCaml.)Constants c ::= i | s | [] Variables x,y,z,f

Expressions e ::= x | c | e e | fun x → e | e + e | (e,e) | e :: e | ref e | !r | rset r Staged expressions e +::= . < e > . | .˜e | %x Figure 2: Source and target languages for the unstaging translationThe target language of the translation is OCaml, without ‘Meta’, i.e., without the staged expressions.On the other hand, it has additional constants for code generation, deﬁned by the following signature (1) module type

Code = sigtype + a cod legKiselyov 11Translation at the present-stage ⌊ e ⌋⌊ x ⌋ 7→ x ⌊ c ⌋ 7→ c ⌊ e1 e2 ⌋ 7→ ⌊ e1 ⌋ ⌊ e2 ⌋ ⌊ fun x → e ⌋ 7→ fun x → ⌊ e ⌋ . . . ⌊ . < e > . ⌋ 7→ ⌈ e ⌉ Translation at the future-stage ⌈ e ⌉⌈ x ⌉ 7→ x ⌈ i ⌉ 7→ int i ⌈ s ⌉ 7→ str i ⌈ [] ⌉ 7→ nil ⌈ e1 + e2 ⌉ 7→ add ⌈ e1 ⌉ ⌈ e2 ⌉⌈ (e1,e2) ⌉ 7→ pair ⌈ e1 ⌉ ⌈ e2 ⌉⌈ e1 :: e2 ⌉ 7→ cons ⌈ e1 ⌉ ⌈ e2 ⌉ ⌈ ref e ⌉ 7→ ref ⌈ e ⌉⌈ !e ⌉ 7→ rget ⌈ e ⌉⌈ rset e ⌉ 7→ rset ⌈ e ⌉⌈ e1 e2 ⌉ 7→ app ⌈ e1 ⌉ ⌈ e2 ⌉⌈ fun x → e ⌉ 7→ lam ( fun x → ⌈ e ⌉ ) ⌈ .˜e ⌉ 7→ ⌊ e ⌋⌈ %x ⌉ 7→ csp x Figure 3: Unstaging translation val int : int → int cod val str : string → string cod val add: int cod → int cod → int cod val lam: ( a cod → b cod) → ( a → b ) cod val app: ( a → b ) cod → ( a cod → b cod) val pair : a cod → b cod → ( a ∗ b ) cod val nil : a list cod val cons: a cod → a list cod → a list cod val ref : a cod → a ref cod val rget : a ref cod → a cod val rset : a list ref cod → a cod → a list cod val csp: a → a cod ( ∗ CSP local values ∗ )end The signature speciﬁes the collection of typed combinators to generate code for our subset of OCaml: int 1 builds the literal code, add combines two pieces of code into the addition expression, etc. Thecombinator lam builds the code of a function; its argument is an OCaml function that returns the codefor the body upon receiving the code for the bound variable. A MetaOCaml expression like (2) fun x → . < fun y → (y +

1) :: .˜x > . then corresponds to the plain OCaml expression with the code combinators: (3) fun x → lam ( fun y → cons (add y ( int 1)) x) Formally the unstaging translation is speciﬁed in Figure 3, with two sets of mutually recursive rules: ⌊ e ⌋ deals with the present-stage expressions of the source language and ⌈ e ⌉ handles expressionswithin brackets. The former is essentially identity, with the single non-trivial rule for brackets. Thetranslation seems straightforward, which is a great surprise since the related unstaging translations [6, §

3] and [9, 7] are all excruciatingly more complex and type-directed. The shown translation is novel,which will become apparent as we discuss the implementation of the

Code signature later.Our translation is clearly syntax-directed but not type-directed. Hence it is a source-to-source trans-lation, which can be done by a macro-processor such as camlp4 or a stand-alone pre-processor. The rest2 Generating Code withPolymorphic let ⌈ t code ⌉ 7→ ⌈ t ⌉ cod identity otherwise ⌈ x : t ⌉ 7→ x : t ⌈ x : t ⌉ 7→ x : t cod ⌈ G ⊢ e : t ⌉ 7→ ⌈ G ⌉ ⊢ ⌊ e ⌋ : ⌈ t ⌉⌈ G ⊢ e : t ⌉ 7→ ⌈ G ⌉ ⊢ ⌈ e ⌉ : ⌈ t ⌉ cod Figure 4: Translation for types, typing environments and judgmentsof the language system (type-checking, code-generation, standard and user-deﬁned libraries) is used asit is.The second property of the translation is that bindings within brackets are translated to ordinarylambda-bindings. Coupled with the appropriate implementation of the lam combinator, this propertymakes it easy to ensure hygiene. Correspondingly, variables bound within brackets are translated to theordinary, present-stage variables – with the change in type from t to t cod . One can see that change fromthe type of lam , and more clearly from Figure 4, which extends the translation to the typing judgmentsand environments described Figure 1. The translation is typing-preserving: Proposition 1 If G ⊢ n e : t holds then ⌈ G ⊢ n e : t ⌉ holds as well In other words, a well-typed two-stage MetaOCaml expression is translated into a well-typed OCamlexpression. The proposition is easily proven by induction on the typing derivation. If we also ensurethat individual code combinators produce well-typed code (see below), any typing errors in the quotedcode manifest themselves as OCaml type errors emitted when type-checking the translated expression.Absent such errors, the quoted expression, and hence the generated code, are type-correct.The following ﬁgure shows two implementations of the

Code signature.

CodeString combinators generate ML code as text strings, justifying their name 'code-generating combinators'.

(4) module CodeString = struct
      type 'a cod = string
      let int = string_of_int
      let str x = "\"" ^ String.escaped x ^ "\""
      let add x y = paren @@ x ^ " + " ^ y
      let lam body = let var = gensym "x" in "fun " ^ var ^ " -> " ^ body var
      let app f x = paren @@ f ^ " " ^ x
      ...
      let csp x = ... marshaling / unmarshaling ...
    end

CodeReal is a meta-circular interpreter, representing a code value as an OCaml thunk (which is also a value).

(5) module CodeReal = struct
      type 'a cod = unit -> 'a
      open DynBindRef
      let int x = fun () -> x
      let str x = fun () -> x
      let add x y = fun () -> x () + y ()
      let lam body =
        let r = dnew () in
        let b = body (fun () -> dref r) in
        fun () -> let denv = denv_get () in
                  fun x -> dlet denv r x b
      let app f x = fun () -> f () (x ())
      let nil = fun () -> []
      ...
      let csp x = fun () -> x
    end

The code is utterly trivial, with the exception of lam, which does what a closure has to do: capture the environment at the point of its creation. We rely on the simple interface for dynamic binding:

(6) module type DynBind = sig
      type 'a dref
      type denv
      val dnew : unit -> 'a dref
      val dref : 'a dref -> 'a
      val dlet : denv -> 'a dref -> 'a -> (unit -> 'w) -> 'w
      val denv_get : unit -> denv
    end

where dnew creates a new unbound variable, dref dereferences it, denv_get captures the current environment, and dlet denv r x body sets the current environment to denv, binds r to x in it and evaluates the body, whose result is returned after the original environment is restored. The implementation, using either reference cells or delimited control, is straightforward; see the accompanying source code for details. The source code contains more examples of the staged translation, including the obligatory factorial: although our Code interface offers neither conditional branching nor recursive bindings (nor multiplication, for that matter), they are all obtainable via CSP.
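To make the role of DynBind in lam concrete, here is a minimal sketch of one possible reference-cell realization, together with a cut-down CodeReal. It is not the accompanying source code: the module names DynBindSketch and CodeRealSketch, the global association list, and the use of a local exception as a universal type are all assumptions made for this illustration.

```ocaml
(* A sketch of DynBind with a global reference cell (an assumption,
   not the author's DynBindRef). *)
module DynBindSketch : sig
  type 'a dref
  type denv
  val dnew : unit -> 'a dref
  val dref : 'a dref -> 'a
  val dlet : denv -> 'a dref -> 'a -> (unit -> 'w) -> 'w
  val denv_get : unit -> denv
end = struct
  (* Each variable carries an injection into exn, used as a universal type. *)
  type 'a dref = { id : int; inj : 'a -> exn; prj : exn -> 'a option }
  type denv = (int * exn) list
  let counter = ref 0
  let current : denv ref = ref []
  let dnew (type a) () : a dref =
    let module M = struct exception E of a end in
    incr counter;
    { id = !counter;
      inj = (fun x -> M.E x);
      prj = (function M.E x -> Some x | _ -> None) }
  let dref r =
    match List.assoc_opt r.id !current with
    | None -> failwith "dref: unbound dynamic variable"
    | Some u ->
      (match r.prj u with
       | Some x -> x
       | None -> failwith "dref: ill-typed binding")
  let denv_get () = !current
  (* Run body in denv extended with r = x, then restore the old environment. *)
  let dlet denv r x body =
    let saved = !current in
    current := (r.id, r.inj x) :: denv;
    match body () with
    | v -> current := saved; v
    | exception e -> current := saved; raise e
end

(* A fragment of CodeReal over the sketch above. *)
module CodeRealSketch = struct
  open DynBindSketch
  type 'a cod = unit -> 'a
  let int (x : int) : int cod = fun () -> x
  let add (x : int cod) (y : int cod) : int cod = fun () -> x () + y ()
  let nil : 'a list cod = fun () -> []
  let cons (h : 'a cod) (t : 'a list cod) : 'a list cod =
    fun () -> h () :: t ()
  let lam (body : 'a cod -> 'b cod) : ('a -> 'b) cod =
    let r = dnew () in
    let b = body (fun () -> dref r) in
    fun () -> let denv = denv_get () in
              fun x -> dlet denv r x b
end

(* Expression (3) applied to nil, and a nested function. *)
let inc_cons =
  CodeRealSketch.((fun x -> lam (fun y -> cons (add y (int 1)) x)) nil)
let addc = CodeRealSketch.(lam (fun x -> lam (fun y -> add x y)))
```

Running the thunks gives ordinary OCaml functions: inc_cons () 5 evaluates to [6] and addc () 1 2 to 3. The nested case shows why lam captures the environment when its thunk is run: the inner function must still see the binding of the outer variable after dlet has restored the original environment.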

Proposition 2. If e : t code is a program in our subset of MetaOCaml, then ⌊e⌋ : unit → t is the plain OCaml program (assuming the Code interface is implemented by CodeReal) such that run e is observationally equivalent to ⌊e⌋ ().

Although the intuitions are clear, the rigorous proof of this proposition is a serious and interesting task. We leave the proof as a PhD topic. The proposition justifies the name 'unstaging translation': translating staged OCaml code to plain OCaml. Our translation is remarkably simple because of the novel implementation of lam in CodeReal. The earlier translations had to explicitly represent and translate the typing and the value environments of an expression. DynBind lets us piggy-back on the typing environment of OCaml.

One can also intuitively see that CodeString and CodeReal correspond: the behavior of the code produced by CodeString is the same as the behavior of running the thunk of CodeReal (modulo the difference in the copying/sharing semantics of CSP). The OCaml type-checker ensures that any thunk built by CodeReal combinators is well-typed; therefore, it "will not go wrong" thanks to the soundness of OCaml. Hence the code generated by CodeString will also be well-typed and will not go wrong either. The existence of the CodeReal implementation is thus crucial to assuring the soundness of code generation. Yet another proof of soundness is obtained through another implementation of Code, back into MetaOCaml:

(7) module CodeCode = struct
      type 'a cod = 'a code
      let int (x : int) = .<x>.
      let str (x : string) = .<x>.
      let add x y = .<.~x + .~y>.
      let lam body = .<fun x -> .~(body .<x>.)>.
      let app x y = .<.~x .~y>.
      ...
      let csp x = .<x>.
    end

Proposition 3. If e : t code is a program in our subset of MetaOCaml, then ⌊e⌋ : t code is the equivalent MetaOCaml program (assuming the Code interface is implemented by CodeCode): that is, e and ⌊e⌋ have the same side effects and either both diverge, or return identical (modulo α-conversion) code values.

The proof is left as another PhD topic.

Implementing staging by the translation into code combinators works surprisingly well: Scala's Lightweight Modular Staging (LMS) is based on similar ideas [22]. Scheme's implementation of quasi-quote is also quite alike; only it pays no attention to quoted bindings and is hence non-hygienic. The translation becomes more complex as we add to the target language more special forms such as loops, pattern matching, type annotations, etc. They pose problems, but they can be and have been dealt with, e.g., in [22]. What could not be dealt with is let-polymorphism.

The staging translation runs into the roadblock once we add polymorphic let-bindings, to handle expressions such as those shown earlier:

(1) .<let x = [] in (2::x, "3"::x)>.
(2) .<let f = fun x -> x in (f 2, f "3")>.

It may seem we merely need to add to the Code signature the combinator that combines app and lam:

(3) val let_ : 'a cod -> ('a cod -> 'b cod) -> 'b cod

and the corresponding translation rule

    ⌈let x = e1 in e2⌉ ↦ let_ ⌈e1⌉ (fun x → ⌈e2⌉)

analogous to lam. Then (1) is translated to

(4) let_ nil (fun x -> pair (cons (int 2) x) (cons (str "3") x))

which, unfortunately, does not type-check.

Recall that our unstaging translation maps bindings in the quoted code to ordinary lambda-bindings. This exactly is the problem: unlike let-bindings, lambda-bindings in ML are not generalizable. First-class polymorphism, if available, does not help since it requires type annotations, which preclude the source-to-source translation, done before type checking.

Let-polymorphism hence is the show-stopper for the unstaging translation. However attractive, we cannot use the translation for implementing MetaOCaml (unless we give up on polymorphic let within brackets, which is unpalatable). Therefore, MetaOCaml currently takes the steep implementation route: modifying the OCaml front-end to account for brackets and escapes, and the painful patching of the type-checker to implement the staged type system of Figure 1. After the type-checking, the staging constructs are eliminated by a variant of the unstaging translation [14]. That translation manipulates OCaml's Typedtree, which represents the AST after type-checking. Although the tree bears OCaml types, it is 'untyped': it is the ordinary data structure that does not enforce any typing or scoping invariants. Manipulating the tree is error-prone, with no (mechanically checked) assurances of correctness.

let-expressions

We now present the new translation for quoted let-expressions, which works even with polymorphic let-bindings. We attempt a 'rational derivation' of the translation, with our constant refrain of copying vs. sharing.

The translation of the previous section maps

(1) .<let x = [1] in (2::x, 3::x)>.

(the quoted version of the first example shown earlier) to

(2) let_ (cons (int 1) nil) @@ fun x -> pair (cons (int 2) x) (cons (int 3) x)

This example does not have let-polymorphism. But if it did, we would be in trouble: the let-binding of x in (1) is converted to the lambda-binding of x in (2). In the Hindley-Milner type system lambda-bindings, unlike let-bindings, are not generalizable. We see the dead end, regardless of how the let_ combinator is implemented.

To have any hope of generalization, we need a translation that could map a let-binding in the quoted code to a let-binding. The putative translation should convert (1) into something like

(3) comb1 (let x = comb2 (cons (int 1) nil) in
           comb3 (pair (cons (int 2) x) (cons (int 3) x)))

where comb1, comb2, and comb3 are yet to be determined combinators. This proposal seems to be the most general compositional, syntax-directed translation that has the desired let-binding. It fits within the unstaging translation presented earlier: the variable x in (1), of the type int list, is mapped in (3) to the present-stage variable of the expected (see Figure 4) type int list cod. After all, this is the only type that makes, say, cons (int 2) x well-typed.

All that is left is to appropriately implement comb1, comb2 and comb3, for all realizations of the Code interface. Proposition 3 imposes a constraint: evaluating (3) with the CodeCode implementation should give back (1). And here we notice something odd. The expression (cons (int 1) nil) evaluates to .<[1]>., according to the existing code combinators of CodeCode. The result of comb2 (cons (int 1) nil) should hence be or contain that singleton list; let us write it as .<...[1]...>. The let-expression in (3) then produces comb3 .<(2::(...[1]...)), (3::(...[1]...))>. Code-generating combinators may only combine pieces of code received as arguments but can never deconstruct or examine them. Therefore, it does not seem possible that our result can lead to (1), regardless of what comb1 or comb3 might do. We have already inlined .<[1]>., which we should have let-bound and shared instead.

The only way forward is to have comb2 .<[1]>. somehow generate something like .<let y = [1] in body>. and return the let-bound variable as a code value, that is, .<y>. That does not seem possible either. To build code for a let-expression we need the code for the RHS of the binding, and the code for the body. The combinator comb2 does get the RHS code as the argument; but where is the body?

Fortunately we are stuck at the opportune place: the problem we are facing is real, but it was solved long ago in the partial-evaluation community. The solution is called 'let-insertion' [2, 17] and requires access to continuations. The delimcc library of OCaml [12] has exactly the control operators needed to implement the let-insertion interface:

(4) type 'a scope
    val new_scope : ('w scope -> 'w cod) -> 'w cod
    val genlet    : 'w scope -> 'a cod -> 'a cod

This let-insertion interface is introduced here for the sake of translating quoted expressions, and hence the pattern of use for genlet and new_scope is determined by the translation. These combinators can be used as follows:

(5) new_scope @@ fun p -> lam (fun x -> add x (genlet p (add (int 1) (int 2))))
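The effect of new_scope and genlet on the generated code can be observed with a small string-generating sketch. It deliberately swaps the technique: instead of the delimited control of delimcc, the scope here is a mutable list of pending bindings that new_scope wraps around the body once the body has been built. The module name LetInsertSketch, the to_string function and the variable-naming scheme are assumptions of this illustration.

```ocaml
(* A string backend with let-insertion; a mutable buffer stands in for
   the delimcc prompts of the paper (an assumption of this sketch). *)
module LetInsertSketch : sig
  type +'a cod
  type 'w scope
  val int : int -> int cod
  val add : int cod -> int cod -> int cod
  val lam : ('a cod -> 'b cod) -> ('a -> 'b) cod
  val new_scope : ('w scope -> 'w cod) -> 'w cod
  val genlet : 'w scope -> 'a cod -> 'a cod
  val to_string : 'a cod -> string
end = struct
  type 'a cod = string
  (* Pending let-bindings of the scope, newest first. *)
  type 'w scope = (string * string) list ref
  let counter = ref 0
  let gensym base = incr counter; base ^ string_of_int !counter
  let int = string_of_int
  let add x y = "(" ^ x ^ " + " ^ y ^ ")"
  let lam body =
    let v = gensym "x" in
    "(fun " ^ v ^ " -> " ^ body v ^ ")"
  (* Build the body, then wrap it in the bindings recorded by genlet;
     the earliest binding ends up outermost. *)
  let new_scope body =
    let p = ref [] in
    let b = body p in
    List.fold_left
      (fun acc (v, e) -> "let " ^ v ^ " = " ^ e ^ " in " ^ acc) b !p
  (* Record the binding and return the code of the fresh variable. *)
  let genlet p e =
    let v = gensym "t" in
    p := (v, e) :: !p;
    v
  let to_string c = c
end

(* Example (5). *)
let ex5 =
  let open LetInsertSketch in
  to_string
    (new_scope @@ fun p ->
     lam (fun x -> add x (genlet p (add (int 1) (int 2)))))
(* ex5 = "let t2 = (1 + 2) in (fun x1 -> (x1 + t2))" *)
```

The binding for 1 + 2 lands outside the generated function although genlet was called while building the function body, which is the sharing that let-insertion is meant to provide.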

With the CodeCode implementation below, example (5) generates .<let y = 1 + 2 in fun x -> x + y>., which shares the result of the subexpression 1 + 2 across all invocations of the function. In other words, genlet p e inserts, at the place marked by the corresponding new_scope, a let statement that binds e to a fresh variable, and returns the code with the name of that variable. We can finally complete the tentative translation (3):

(6) new_scope @@ fun p ->
      let x = genlet p (cons (int 1) nil) in
      pair (cons (int 2) x) (cons (int 3) x)

With the CodeCode implementation of the combinators that expression indeed evaluates to (1). Formally, the new translation of let-expressions takes the form

(7) ⌈let x = e1 in e2⌉ ↦ new_scope (fun p → let x = genlet p ⌈e1⌉ in ⌈e2⌉)

Our running example with let-polymorphism, example (2) of an earlier section,

(8) .<let x = [] in (2::x, "3"::x)>.

is hence translated to

(9) new_scope @@ fun p ->
      let x = genlet p nil in
      pair (cons (int 2) x) (cons (str "3") x)

which type-checks, and (with the CodeCode combinators) gives back (8). Incidentally, the combinator code without genlet

(10) new_scope @@ fun p ->
       let x = nil in
       pair (cons (int 2) x) (cons (str "3") x)

also type-checks. However, it generates

(11) .<(2::[], "3"::[])>.

where [] is inlined rather than shared. The genlet combinator hence implements the sharing in the generated code rather than in the generator. The fact that the let-variable x in (9) gets the polymorphic type is the indication, and the vindication, of the equivalence of copying and sharing in this case. Although the RHS of the let-binding in (9) is an expression (moreover, an effectful expression, as we are about to see), the generalization happens anyway, thanks to the relaxed value restriction recalled earlier: the type variable 'a in the type 'a list cod occurs in the covariant position; note the covariance annotation +'a cod in the Code signature.

The code that should not type-check in MetaOCaml

(12) .<let x = ref [] in (rset x 2, rset x "3")>.   (* Does not type-check! *)

is translated to

(13) new_scope @@ fun p ->
       let x = genlet p (ref nil) in
       pair (rset x (int 2)) (rset x (str "3"))   (* Does not type-check! *)

and is rejected by OCaml as expected: the type variable 'a in the inferred type 'a list ref cod for x is non-variant and is not generalized; x does not get the polymorphic type and hence cannot be used in the differently typed contexts.

We implemented genlet, directly based on [17], for all three realizations of the Code signature: not just for CodeString but also for CodeReal and CodeCode, to demonstrate soundness:

(14) module CodeLetReal = struct
       include CodeReal
       open Delimcc
       type 'a scope = 'a cod prompt
       let new_scope body = let p = new_prompt () in push_prompt p (fun () -> body p)
       let genlet p e = shift0 p (fun k -> let t = e () in k (fun () -> t))
     end

(15) module CodeLetString = struct
       include CodeString
       open Delimcc
       type 'a scope = 'a cod prompt
       let new_scope body = ... the same ...
       let genlet p e =
         let tvar = gensym "t" in
         shift0 p (fun k -> "let " ^ tvar ^ " = " ^ e ^ " in " ^ k tvar)
     end

(16) module CodeLetCode = struct
       include CodeCode
       open Delimcc
       type 'a scope = 'a code prompt
       let new_scope body = ... the same ...
       let genlet p e = shift0 p (fun k -> .<let t = .~e in .~(k .<t>.)>.)
     end

(The code of new_scope is identical in all three implementations, although the realizations of the abstract type 'a scope differ.) The type 'a prompt and the delimited control operators push_prompt and shift0 are provided by the delimcc library [12].

The genlet is so powerful that it easily moves bound variables out of their scope:

(17) new_scope @@ fun p -> lam (fun x -> add x (genlet p (add x (int 2))))

resulting in the generated code let y = x + 2 in fun x -> x + y with the unbound variable x. One may prevent such undesirable behavior either with a complex type system (whose glimpse can be caught in [11]) or with a dynamic test, as implemented in MetaOCaml [14]. In our case, however, genlet appears in the code solely as the result of the translation of a quoted expression. Fortunately, our translation of let-expressions puts new_scope "right above" genlet, never letting them be separated by a lam binding. In this case, delimited control, which underlies genlet, is safe (for proofs, see [10]).

Alas, our new translation stumbles on the common case of polymorphic function bindings such as the following:

(1) .<let f = fun x -> x in (f 2, f "3")>.

The translation

(2) new_scope @@ fun p ->
      let f = genlet p (lam (fun x -> x)) in
      pair (app f (int 1)) (app f (str "3"))   (* Does not type-check! *)

is rejected by OCaml: the genlet expression has the type ('a -> 'a) cod, which is not covariant in 'a. Generalizing expressions of such types is unsound: otherwise, we would have to accept the following

(3) .<let f = let r = ref [] in fun x -> rset r x in (f 1, f "3")>.   (* Does not type-check! *)

whose translation

(4) new_scope @@ fun p1 ->
      let f = genlet p1 (new_scope @@ fun p2 ->
                let r = genlet p2 (ref nil) in
                lam (fun x -> rset r x)) in
      pair (app f (int 1)) (app f (str "3"))   (* Does not type-check! *)

would have type-checked had we allowed generalization for the genlet p1 expression. (However, if the target language of code generation has no 'dangerous' effects and does not need value restriction, we may as well allow generalizing expressions of the type t cod regardless of the variance of type variables in t.)

The problematic staged code (3) does not type-check according to the system of Figure 1 (and in MetaOCaml): the (GenLet) rule does not apply because the RHS of the let-binding in (3) is not syntactically a value. Hence we need something like the value restriction to likewise prevent generalization in (4) while still allowing it in (2).

Therefore, we amend the translation of let-expressions, (7) in §4, with the following

(5) ⌈let x = fun z → e1 in e2⌉ ↦ new_scope (fun p → let x = genletfun p (fun z → ⌈e1⌉) in ⌈e2⌉)

where

(6) val genletfun : 'w scope -> ('a cod -> 'b cod) -> ('a -> 'b) cod   (* provisional! *)

is a new code combinator to be added to the let-insertion interface. In other words, our translation should recognize when a let-bound expression is syntactically a function, and use genletfun rather than the general genlet combinator.

With the amended translation, the good example (1) is translated as

(7) new_scope @@ fun p ->
      let f = genletfun p (fun x -> x) in
      pair (app f (int 1)) (app f (str "3"))   (* See the refined version below! *)

and will type-check. The translation (4) of the bad example (3) will have to use genlet rather than genletfun since the RHS of the let-expression in (3) is not syntactically a function. As we said, (4) does not actually type-check.

We have thus separated the let-insertion combinators into the general genlet and the specific genletfun, which applies only to the translation of what looks like a function. (We need similar genletX for other polymorphic values of non-covariant types, which are rare.) For genlet, generalization occurs only for covariant type variables; for genletfun, the generalization should occur always.

There remains the question of how to make the generalization always occur for genletfun expressions like those in (7), short of modifying the OCaml compiler. Incidentally, even Obj.magic does not seem to help us with expressions that the relaxed value restriction cannot generalize: an application of Obj.magic is not syntactically a value. The answer is admittedly a hack; nevertheless, it gives us another standpoint, however awkward, to hear the refrain of copying and sharing. And it also works with the extant OCaml compiler.

Let us step back to look at the clearly flawed translation of (1)

(8) let f = fun () -> (lam (fun x -> x)) in
    pair (app (f ()) (int 1)) (app (f ()) (str "3"))

and contemplate what is wrong with it. On the upside, the translated expression (8) does type-check: f is bound to a thunk (syntactically a value) and its type is hence generalized through the ordinary value restriction. Since f is bound to a thunk we have to add explicit () applications at each place it is used. Evaluating (8) with the CodeString implementation of code combinators shows the generated code ((fun x2 -> x2 1), (fun x1 -> x1 "3")), with the inlined rather than shared identity function. We would rather the identity function be let-bound and shared. Having learned that genlet introduces let-bindings into the generated code, the next attempt at the translation of (1) is

(9) new_scope @@ fun p ->
      let f = fun () -> genlet p (lam (fun x -> x)) in
      pair (app (f ()) (int 1)) (app (f ()) (str "3"))

It also type-checks, since f is still bound to a thunk. The generated code

(10) let t2 = fun x1 -> x1 in
     let t4 = fun x3 -> x3 in
     ((t4 1), (t2 "3"))

is still unsatisfactory: we would rather the two applications in the pair used the same binding of the identity function. When f () in (9) is first evaluated, it generates a let-binding and returns the code with the bound variable. We want the second invocation of f () to return the code for the very same bound variable. In other words, we would like to memoize f. Memoization [19] indeed was meant to make copying behave like sharing.

The trick hence is introducing a thunk into the let-binding in the translation to get around the generalization problem and introducing memoization to restore the sharing destroyed by thunking.
In effect, we do 'double memoization': using genlet to 'memoize' the identity function in the generated code and memoize the invocation of genlet at the present stage. Once this is understood, the rest is straightforward. To make the translation similar to (7), we combine genlet with the memoization into genletfun:

(11) type 'w funscope
     val new_funscope : ('w funscope -> 'w cod) -> 'w cod
     val genletfun    : 'w funscope -> ('a cod -> 'b cod) -> ('a -> 'b) cod

The final translation of (1) then reads:

(12) new_funscope @@ fun p ->
       let f = fun () -> genletfun p (fun x -> x) in
       pair (app (f ()) (int 1)) (app (f ()) (str "3"))

Unlike (7), we had to replace the occurrence of f with f (), explicitly marking the type instantiation, so to speak. This complication is still possible to implement with the source-to-source translation (call-by-name let-binding of [18] would be really handy here).

The double-memoizing genletfun can be easily and generically implemented, with a small bit of magic:

(13) type afun =
       | AFun  : ('a -> 'b) cod -> afun
       | ANone : afun
     type 'w funscope = 'w scope * afun ref
     let new_funscope body = new_scope (fun p -> body (p, ref ANone))
     let genletfun : 'w funscope -> ('a cod -> 'b cod) -> ('a -> 'b) cod =
       fun (p, r) body ->
         match !r with
         | ANone  -> let fn = lam body in
                     let x = genlet p fn in
                     r := AFun x; x
         | AFun x -> Obj.magic x
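The interplay of the thunk, the memo cell and let-insertion in (12) can be tried out with a string-generating sketch. As assumptions of this illustration: the module name FunLetSketch is invented, the scope is a mutable list of pending bindings rather than a delimcc prompt, and no Obj.magic is needed because the memo cell stores an untyped string, with the type parameter of cod serving only as a phantom.

```ocaml
(* A string backend with a memoizing genletfun (a sketch, not the
   implementation (13): mutable buffer instead of delimcc, no Obj.magic). *)
module FunLetSketch : sig
  type +'a cod
  type 'w funscope
  val int : int -> int cod
  val str : string -> string cod
  val pair : 'a cod -> 'b cod -> ('a * 'b) cod
  val app : ('a -> 'b) cod -> 'a cod -> 'b cod
  val new_funscope : ('w funscope -> 'w cod) -> 'w cod
  val genletfun : 'w funscope -> ('a cod -> 'b cod) -> ('a -> 'b) cod
  val to_string : 'a cod -> string
end = struct
  type 'a cod = string
  (* Pending let-bindings (newest first) and the memo cell. *)
  type 'w funscope = (string * string) list ref * string option ref
  let counter = ref 0
  let gensym base = incr counter; base ^ string_of_int !counter
  let int = string_of_int
  let str s = "\"" ^ String.escaped s ^ "\""
  let pair x y = "(" ^ x ^ ", " ^ y ^ ")"
  let app f x = "(" ^ f ^ " " ^ x ^ ")"
  let lam body =
    let v = gensym "x" in
    "(fun " ^ v ^ " -> " ^ body v ^ ")"
  let new_funscope body =
    let binds = ref [] in
    let b = body (binds, ref None) in
    List.fold_left
      (fun acc (v, e) -> "let " ^ v ^ " = " ^ e ^ " in " ^ acc) b !binds
  (* First call: generate the binding and remember its variable.
     Later calls: return the remembered variable. *)
  let genletfun (binds, memo) body =
    match !memo with
    | Some v -> v
    | None ->
      let fn = lam body in
      let v = gensym "t" in
      binds := (v, fn) :: !binds;
      memo := Some v;
      v
  let to_string c = c
end

(* Translation (12); the pair components are let-sequenced only to fix
   the order in which fresh names are drawn. *)
let ex12 =
  let open FunLetSketch in
  to_string
    (new_funscope @@ fun p ->
     let f = fun () -> genletfun p (fun x -> x) in
     let a = app (f ()) (int 1) in
     let b = app (f ()) (str "3") in
     pair a b)
(* ex12 = "let t2 = (fun x1 -> x1) in ((t2 1), (t2 \"3\"))" *)
```

Because f is a thunk, OCaml generalizes its type and the two uses f () are accepted at int and at string; because of the memo cell, both uses refer to the single generated binding t2.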

The code uncannily resembles (4) of an earlier section. The question is whether to present genletfun to the OCaml type-checker as an ad hoc, always-generalize case. The answer at present should be "no": genletfun is still unsound, in the edge case of (4) of an earlier section:

(14) .<let f = fun () -> .~(lift (ref [])) in (rset (f ()) 2, rset (f ()) "3")>.

Its translation

(15) new_funscope @@ fun p ->
       let f = fun () -> genletfun p (fun _ -> csp (ref [])) in
       pair (rset (app (f ()) (csp ())) (int 1))
            (rset (app (f ()) (csp ())) (str "3"))

type-checks, and when run with CodeReal exhibits the same segmentation fault it does in the case of the corresponding MetaOCaml code.

It seems our unstaging translation is just as sound, or unsound, as MetaOCaml. Solving the soundness problem of MetaOCaml described earlier ...

We have presented a new, typing-preserving translation from a higher-order typed staged language, with hygienic quotations and unquotations, to the language without quotations. Code generation is accomplished through a library of code-generation combinators. Our translation is remarkably simpler than other unstaging translations: it is not type-directed and can be accomplished as a source-to-source transformation. Mainly, the translation works for polymorphic let: let-expressions within quotes are also transformed to let-expressions, hence preserving generalization. Throughout the presentation we emphasized the deep connections between polymorphism and sharing.

Our translation is already a viable method of implementing staged languages. Yet the theoretical work has just begun. Yet another feature of our translation is 'bug-preservation': the restrictions and unsound edge cases of let-polymorphic expressions are preserved in the translation. The problems hence can be investigated in a simpler setting, without staging.

We thus propose a research program:

1. Formally establishing the equivalence properties of CodeReal, CodeCode and CodeString and formally justifying the translation;
2. Generalizing from two stages to multiple stages, that is, to multiple levels of quotations;
3. Proving that the edge case described earlier ... genletfun is unsound;
4. Relaxing the value restriction even more so that genletfun could be implemented without magic;
5. Investigating trade-offs of various solutions to the unsoundness problem described earlier.

Acknowledgments

I thank Jacques Garrigue and Atsushi Igarashi for many helpful discussions. Comments and suggestions by Yukiyoshi Kameyama and anonymous reviewers are gratefully appreciated. This work was partially supported by JSPS KAKENHI Grant Numbers 22300005, 25540001, 15H02681.

References

[1] Henry G. Baker (1993): Equal Rights for Functional Objects or, The More Things Change, The More They Are the Same. ACM SIGPLAN OOPS Messenger 4(4), pp. 2–27. Available at http://home.pipeline.com/~hbaker1/ObjectIdentity.html.
[2] Anders Bondorf (1992): Improving Binding Times Without Explicit CPS-Conversion. In: Lisp & Functional Programming, pp. 1–10.
[3] Matt Brown & Jens Palsberg (2016): Breaking through the normalization barrier: a self-interpreter for f-omega. In Rastislav Bodik & Rupak Majumdar, editors: POPL '16: Conference Record of the Annual ACM Symposium on Principles of Programming Languages, ACM Press, New York, pp. 5–17.
[4] Cristiano Calcagno, Eugenio Moggi & Walid Taha (2004): ML-Like Inference for Classifiers. In: ESOP, LNCS 2986, pp. 79–93.
[5] Luca Cardelli (1987): Basic Polymorphic Typechecking. Science of Computer Programming 8(2), pp. 147–172.
[6] Chiyan Chen & Hongwei Xi (2005): Meta-Programming Through Typeful Code Representation. Journal of Functional Programming 15(6), pp. 797–835.
[7] Wontae Choi, Baris Aktemur, Kwangkeun Yi & Makoto Tatsuta (2011): Static Analysis of Multi-staged Programs via Unstaging Translation. In Thomas Ball & Mooly Sagiv, editors: POPL '11: Conference Record of the Annual ACM Symposium on Principles of Programming Languages, ACM Press, New York, pp. 81–92.
[8] Jacques Garrigue (2004): Relaxing the Value Restriction. In: FLOPS, LNCS 2998, Springer-Verlag, Berlin, pp. 196–213.
[9] Yukiyoshi Kameyama, Oleg Kiselyov & Chung-chieh Shan (2008): Closing the Stage: From Staged Code to Typed Closures. In: PEPM, pp. 147–157.
[10] Yukiyoshi Kameyama, Oleg Kiselyov & Chung-chieh Shan (2011): Shifting the Stage: Staging with Delimited Control. Journal of Functional Programming 21(6), pp. 617–662.
[11] Yukiyoshi Kameyama, Oleg Kiselyov & Chung-chieh Shan (2015): Combinators for impure yet hygienic code generation. Science of Computer Programming 112 (part 2), pp. 120–144.
[12] Oleg Kiselyov (2012): Delimited control in OCaml, abstractly and concretely. Theor. Comp. Sci. 435, pp. 56–76.
[13] Oleg Kiselyov (2013): MetaOCaml Lives On: Lessons from implementing a staged dialect of a functional language. ACM SIGPLAN Workshop on ML.
[14] Oleg Kiselyov (2014): The Design and Implementation of BER MetaOCaml - System Description. In: FLOPS, LNCS 8475, Springer, pp. 86–102.
[15] Megumi Kobayashi & Atsushi Igarashi (2015): Polymorphic type system for the multi-stage calculus with references. 32nd Meeting of Japan Society for Software Science and Technology (in Japanese).
[16] Peter J. Landin (1966): The Next 700 Programming Languages. Communications of the ACM 9(3), pp. 157–166.
[17] Julia L. Lawall & Olivier Danvy (1994): Continuation-Based Partial Evaluation. In: Lisp & Functional Programming, pp. 227–238.
[18] Xavier Leroy (1993): Polymorphism by name for references and continuations. In: POPL '93: Conference Record of the Annual ACM Symposium on Principles of Programming Languages, ACM Press, New York, pp. 220–231.
[19] Donald Michie (1968): "Memo" Functions and Machine Learning. Nature 218, pp. 19–22.
[20] Robin Milner (1978): A Theory of Type Polymorphism in Programming. Journal of Computer and System Sciences 17, pp. 348–375.
[21] (2003): POPL '03: Conference Record of the Annual ACM Symposium on Principles of Programming Languages.
[22] Tiark Rompf, Nada Amin, Adriaan Moors, Philipp Haller & Martin Odersky (2013): Scala-Virtualized: linguistic reuse for deep embeddings. Higher-Order and Symbolic Computation (September).
[23] Tim Sheard & Simon L. Peyton Jones (2002): Template Meta-programming for Haskell. In Manuel M. T. Chakravarty, editor: Haskell Workshop, pp. 1–16.
[24] Walid Taha & Michael Florentin Nielsen (2003): Environment Classifiers. In POPL [21], pp. 26–37.
[25] Peter Thiemann (1999): Combinators for Program Generation. Journal of Functional Programming 9(5), pp. 483–525.
[26] Mads Tofte (1987): Operational Semantics and Polymorphic Type Inference. Ph.D. thesis, Edinburgh University. Laboratory for Foundations of Computer Science Technical Report ECS-LFCS-88-54.
[27] Andrew K. Wright (1995): Simple Imperative Polymorphism. Lisp and Symbolic Computation 8(4), pp. 343–355.
[28] Hongwei Xi, Chiyan Chen & Gang Chen (2003): Guarded Recursive Datatype Constructors. In POPL [21], pp. 224–235, doi:10.1145/640128.604150