Beyond Notations: Hygienic Macro Expansion for Theorem Proving Languages

Abstract

In interactive theorem provers (ITPs), extensible syntax is not only crucial to lower the cognitive burden of manipulating complex mathematical objects, but plays a critical role in developing reusable abstractions in libraries. Most ITPs support such extensions in the form of restrictive "syntax sugar" substitutions and other ad hoc mechanisms, which are too rudimentary to support many desirable abstractions. As a result, libraries are littered with unnecessary redundancy. Tactic languages in these systems are plagued by a seemingly unrelated issue: accidental name capture, which often produces unexpected and counterintuitive behavior. We take ideas from the Scheme family of programming languages and solve these two problems simultaneously by proposing a novel hygienic macro system custom-built for ITPs. We further describe how our approach can be extended to cover type-directed macro expansion resulting in a single, uniform system offering multiple abstraction levels that range from supporting simplest syntax sugars to elaboration of formerly baked-in syntax. We have implemented our new macro system and integrated it into the new version of the Lean theorem prover, Lean 4. Despite its expressivity, the macro system is simple enough that it can easily be integrated into other systems.


Sebastian Ullrich (Karlsruhe Institute of Technology, Germany) and Leonardo de Moura (Microsoft Research, USA)


Mixfix notation systems have become an established part of many modern ITPs for attaching terse and familiar syntax to functions and predicates of arbitrary arity.

    Agda:      _⊢_:_ = Typing
    Coq:       Notation "Ctx ⊢ E : T" := (Typing Ctx E T).
    Isabelle:  notation typing ("_ ⊢ _ : _")
    Lean 3:    notation Γ ` ⊢ ` e `:` τ := Typing Γ e τ

As a further extension, all shown systems also allow binding names inside mixfix notations.

    Agda:      syntax ∃ A (λ x → P) = ∃[ x ∈ A ] P
    Coq:       Notation "∃ x , P" := (exists (fun x => P)).
    Isabelle:  notation exists (binder "∃")
    Lean 3:    notation `∃` binder `,` r:(scoped P, Exists P) := r

While these extensions differ in the exact syntax used, what is true about all of them is that at the time of the notation declaration, the system already statically knows what parts of the term are bound by the newly introduced variable. This is in stark contrast to macro systems in Lisp and related languages, where the expansion of a macro (a syntactic substitution) can be specified not only by a template expression with placeholders like above, but also by arbitrary syntax transformers, i.e. code evaluated at compile time that takes and returns a syntax tree. As we move to more and more expressive notations and ideally remove the boundary between built-in and user-defined syntax, we argue that we should no longer be limited by the static nature of existing notation systems and should instead introduce syntax transformers to the world of ITPs.

However, as usual, with greater power comes greater responsibility. By using arbitrary syntax transformers, we lose the ability to statically determine what parts of the macro template can be bound by the macro input (and vice versa). Thus it is no longer straightforward to avoid hygiene issues (i.e. accidental capturing of identifiers; [11]) by automatically renaming identifiers. We propose to learn from and adapt the macro hygiene systems implemented in the Scheme family of languages for interactive theorem provers in order to obtain more general but still well-behaved notation systems.

After giving a practical overview of the new, macro-based notation system we implemented in the upcoming version of Lean (Lean 4) in Section 2, we describe the issue of hygiene and our general hygiene algorithm, which should be just as applicable to other ITPs, in Section 3. Section 4 gives a detailed description of the implementation of this algorithm in Lean 4. In Section 5, we extend the use case of macros from mere syntax substitutions to type-aware elaboration. Finally, we have already encountered hygiene issues in the current version of Lean in a different part of the system: the tactic framework. We discuss how these issues are inevitable when implementing reusable tactic scripts and how our macro system can be applied to this hygiene problem as well in Section 6.

Contributions. We present a system for hygienic macros optimized for theorem proving languages as implemented in the next version of the Lean theorem prover, Lean 4.

– We describe a novel, efficient hygiene algorithm to employ macros in ITP languages at large: a combination of a white-box, effect-based approach for detecting newly introduced identifiers and an efficient encoding of scope metadata.
– We show how such a macro system can be seamlessly integrated into existing elaboration designs to support type-directed expansion even if they are not based on homogeneous source-to-source transformations.
– We show how hygiene issues also manifest in tactic languages and how they can be solved with the same macro system. To the best of our knowledge, the tactic language in Lean 4 is the first tactic language in an established theorem prover that is automatically hygienic in this regard.

Footnote: These two macro declaration styles (template expressions vs. arbitrary syntax transformers) are commonly referred to as pattern-based vs. procedural.
Footnote: https://github.com/leanprover/lean4/blob/IJCAR20/src/Init/Lean/Elab

Lean's current notation system as shown in Section 1 is still supported in Lean 4, but based on a much more general macro system; in fact, the notation keyword itself has been reimplemented as a macro, more specifically as a macro-generating macro making use of our tower of abstraction levels. The corresponding Lean 4 command for the example from the previous section

    notation Γ "⊢" e ":" τ => Typing Γ e τ

expands to the macro declaration

    macro Γ:term "⊢" e:term ":" τ:term : term => `(Typing $Γ $e $τ)

where the syntactic category (term) of placeholders and of the entire macro is now specified explicitly. The right-hand side uses an explicit syntax quasiquotation to construct the syntax tree, with syntax placeholders (antiquotations) prefixed with $.
As suggested by the explicit use of quotations, the right-hand side may now be an arbitrary Lean term computing a syntax object; in other words, there is no distinction between pattern-based and procedural macros in our system. We can now use this abstraction level to implement simple command-level macros, for example.

    macro "defthunk" id:ident ":=" e:term : command =>
      `(def $id:ident := Thunk.mk (fun _ => $e))
    defthunk big := mkArray 100000 true

Syntactic categories can be specified explicitly for antiquotations as in $id:ident where otherwise ambiguous. macro itself is another command-level macro that, for our notation example, expands to two commands

    syntax term "⊢" term ":" term : term
    macro_rules
    | `($Γ ⊢ $e : $τ) => `(Typing $Γ $e $τ)

that is, a pair of parser extension (which we will not further discuss in this paper) and syntax transformer. Our reason for ultimately separating these two concerns is that we can now obtain a well-structured syntax tree pre-expansion, i.e. a concrete syntax tree, and use it to implement source code tooling such as auto-completion, go-to-definition, and refactorings. Implementing even just the most basic of these tools for the Lean 3 frontend that combined parsing and notation expansion meant that they had to be implemented right inside the parser, which was not an extensible or even maintainable approach in our experience.

Footnote: All examples including full context can be found in the supplemental material at https://github.com/Kha/macro-supplement
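For readers following along with a current toolchain, the command-level macro can be sketched as below. This is a minimal sketch that assumes a recent Lean 4 release, whose surface syntax differs in places from the snapshot described here; the binder name `name` and the body of `big` are choices made for this illustration only.

```lean
-- Sketch for a recent Lean 4 release, not the original snapshot.
-- `defthunk x := e` defines `x` as a lazily evaluated `Thunk`.
macro "defthunk " name:ident " := " e:term : command =>
  `(def $name:ident := Thunk.mk (fun _ => $e))

-- The body is only evaluated when the thunk is forced.
defthunk big := (List.range 1000).length

-- Forces the thunk and prints the length of the list.
#eval big.get
```

The explicit `$name:ident` annotation mirrors the `$id:ident` antiquotation in the text: it pins the placeholder to the identifier category at a position where the parser would otherwise expect a larger syntactic unit.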


Both syntax and macro_rules are in fact further macros for regular Lean definitions encoding procedural metaprograms, though users should rarely need to make use of this lowest abstraction level explicitly. Both commands can only be used at the top level; we are not currently planning support for local macros.

There is no more need for the complicated scoped syntax since the desired translation can now be specified naturally, without any need for further annotations.

    notation "∃" b "," P => Exists (fun b => P)

The lack of static restrictions on the right-hand side ensures that this works just as well with custom binding notations, even ones whose translation cannot statically be determined before substitution.

    syntax "{" term "|" term "}" : term
    macro_rules
    | `({$x ∈ $s | $p}) => `(setOf (fun $x => $x ∈ $s ∧ $p))
    | `({$b | $p}) => `(setOf (fun $b => $p))
    notation "⋃" b "," p => Union {b | p}

Here we explicitly make use of the macro_rules abstraction level for its convenient syntactic pattern matching syntax. macro_rules are "open" in the sense that multiple transformers for the same syntax declaration can be defined; they are tried in reverse declaration order by default up to the first match (though this can be customized using explicit priority annotations).

    macro_rules
    | `({$x ≤ $e | $p}) => `(setOf (fun $x => $x ≤ $e ∧ $p))

As a final example, we present a partial reimplementation of the arithmetic "bigop" notations found in Coq's Mathematical Components library [12] such as

    \sum_ (i <- [0, 2, 4] | i != 2) i

for summing over a filtered sequence of elements. The specific bigop notations are defined in terms of a single \big_ fold operator; however, because Coq's notation system is unable to abstract over this new indexing syntax, every specific bigop notation has to redundantly repeat every specific index notation before delegating to \big_. In total, the 12 index notations for \big_ are duplicated for 3 different bigops in the file.

    Notation "\sum_ ( i <- r ) F" := (\big[addn/0]_(i <- r) F).
    Notation "\sum_ ( i <- r | P ) F" := (\big[addn/0]_(i <- r | P) F).
    ...

In contrast, using our system, we can introduce a new syntactic category for index notations, interpret it once in \big_, and define new bigops on top of it without any redundancy.

    declare_syntax_cat index
    syntax ident "<-" term : index
    syntax ident "<-" term "|" term : index
    ...
    macro "Σ" "(" idx:index ")" F:term : term =>
      `(\big_ [HasAdd.add, 0] ($idx:index) $F)

The full example is included in the supplement.

Footnote: https://github.com/math-comp/math-comp/blob/master/mathcomp/ssreflect/bigop.v
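The idea of a shared index category can be tried out in a self-contained form. The following is a hedged sketch assuming a recent Lean 4 release: the keyword `bigsum`, the restriction to `Nat`, and the translation to `List.foldr` are stand-ins invented for this illustration and are not the supplement's `\big_` implementation.

```lean
-- Sketch for a recent Lean 4 release. `bigsum` and the use of `List.foldr`
-- are illustrative stand-ins for the `Σ` notation and `\big_` operator.
declare_syntax_cat index
syntax ident " <- " term : index
syntax ident " <- " term " | " term : index

-- One notation that accepts any `index`; further bigops could reuse the category.
syntax "bigsum " "(" index ")" term : term

macro_rules
  | `(bigsum ($i:ident <- $r) $f) =>
    `(List.foldr (α := Nat) (β := Nat) (fun $i:ident acc => $f + acc) 0 $r)
  | `(bigsum ($i:ident <- $r | $p) $f) =>
    `(List.foldr (α := Nat) (β := Nat)
        (fun $i:ident acc => if $p then $f + acc else acc) 0 $r)

-- Sums the elements of the list that are different from 2.
#eval bigsum (i <- [0, 2, 4] | i != 2) i
```

The accumulator `acc` is introduced by the macro and is therefore invisible to the user-supplied body and filter, while the user's index variable `i` binds inside both because it arrives through an antiquotation.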

In this section, we will give a mostly self-contained description of our algorithm for automatic hygiene applied to a simple recursive macro expander; we postpone comparisons to existing hygiene algorithms to Section 7.

Hygiene issues occur when transformations such as macro expansions lead to an unexpected capture (rebinding) of identifiers. For example, given the notation

    notation "const" e => fun x => e

we would not expect the term const x to be closed because intuitively there is no x in scope at the argument position of const; that the implementation of the macro makes use of the name internally should be of no concern to the macro user.

Thus hygiene issues can also be described as a confusion of scopes when syntax parts are removed from their original context and inserted into new contexts, which makes name resolution strictly after macro expansion (such as in a compiler preceded by a preprocessor) futile. Instead we need to track scopes as metadata before and during macro expansion so as not to lose information about the original context of identifiers. Specifically,

1. when an identifier captured in a syntax quotation matches one or more top-level symbols, the identifier is annotated with a list of these symbols as top-level scopes to preserve its extra-macro context (which, because of the lack of local macros, can only contain top-level bindings), and
2. when a macro is expanded, all identifiers freshly introduced by the expansion are annotated with a new macro scope to preserve the intra-macro context. Macro scopes are appended to a list, i.e. ordered by expansion time. This full "history of expansions" is necessary to treat macro-producing macros correctly, as we shall see in Section 3.2.

Thus, the expansion of the above term is (an equivalent of)

    fun x.1 => x

where 1 is a fresh macro scope appended to the macro-introduced x, preventing it from capturing the x from the original input. In general, we will style hygienic identifiers in the following as

    n.msc_1.msc_2. ... .msc_n{tsc_1, ..., tsc_n}

where n is the original name, msc are macro scopes, and tsc top-level scopes, eliding the braces if there are no top-level scopes as in the example above. We use the dot notation to suggest both the ordered nature of macro scopes and their eventual implementation in Section 4. We will now describe how to implement these operations in a standard macro expander.

Footnote: Lean allows overloaded top-level bindings whereas local bindings are shadowing.

The macro expander described in this section bundles the execution of macros and insertion of their results with interspersed name resolution to track scopes and ensure hygiene of identifiers. As we shall see below, top-level scopes on binding names are always discarded by it. Thus we will define a symbol more formally as an identifier together with a list of macro scopes, such as x.1 above.

Given a global context (a set of symbols), the expander does a conventional top-down expansion, keeping track of an initially-empty local context (another set of symbols). When a binding is encountered, the local context is extended with that symbol; top-level scopes on bindings are discarded since they are only meaningful on references. When a reference, i.e. an identifier not in binding position, is encountered, it is resolved according to the following rules:

1. If the local context has an entry for the same symbol, the reference binds to the corresponding local binding; any top-level scopes are ignored.
2. Otherwise, if the identifier is annotated with one or more top-level scopes or matches one or more symbols in the global context, it binds to all of these (to be disambiguated by the elaborator).
3. Otherwise, the identifier is unbound and an error is generated.

In the common incremental compilation mode of ITPs, every command is fully processed before subsequent commands. Thus, an expander for such a system will not extend the global context by itself, but pass the fully expanded command to the next compilation step before being called again with the next command's unexpanded syntax tree and a possibly extended global context.

Notably, our expander does not add macro scopes to identifiers by itself, either, much in contrast to other expansion algorithms. We instead delegate this task to the macro itself, though in a completely transparent way for all pattern-based and for many procedural macros. We claim that a macro should in fact be interpreted as an effectful computation since two expansions of the same identifier-introducing macro should not return the same syntax tree to avoid unhygienic interactions between them. Thus, as a side effect, it should apply a fresh macro scope to each captured identifier. In particular, a syntax quotation should not merely be seen as a datum, but implemented as an effectful value that obtains and applies this fresh scope to all the identifiers contained in it to immediately ensure hygiene for pattern-based macros. Procedural macros producing identifiers not originating from syntax quotations might need to obtain and make use of the fresh macro scope explicitly. We give a specific monad-based [14] implementation of effectful syntax quotations as a regular macro in Section 4.
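The claim that every expansion must receive its own scope can be observed directly. The sketch below assumes a recent Lean 4 release; the macro `addOne` and the helper name `tmp` are hypothetical and exist only for this illustration.

```lean
-- Sketch for a recent Lean 4 release; `addOne` and `tmp` are names chosen
-- for this illustration only.
macro "addOne " e:term : term => `(let tmp := 1; $e + tmp)

-- The user's `tmp` is not captured by the macro-introduced `tmp`.
#eval let tmp := 10; addOne tmp

-- Nested expansions each receive their own scope, so the inner and outer
-- `tmp` do not interact.
#eval addOne (addOne 5)
```

With hygiene in effect, the two commands should evaluate to 11 and 7 respectively. An unhygienic textual substitution would instead let the macro's `tmp` shadow the user's in the first command and yield 2.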

Given the following input,

    def x := 1
    def e := fun y => x
    notation "const" e => fun x => e
    def y := const x

we incrementally parse, expand, and elaborate each declaration before advancing to the next one. For a first, trivial example, let us focus on the expansion of the second line. At this point, the global context contains the symbol x (plus any default imports that we will ignore here). Descending into the right-hand side of the definition, we first add y to the local context. The reference x does not match any local definitions, so it binds to the matching top-level definition.

In the next line, the built-in notation macro expands to the definitions

    syntax "const" term : term
    macro_rules
    | `(const $e) => `(fun x => $e)

When a top-level macro application unfolds to multiple declarations, we expand and elaborate these incrementally as well to ensure that declarations are in the global context of subsequent declarations. When recursively expanding the macro_rules declaration (we will assume for this example that macro_rules itself is primitive) in the global context {x, e}, we first visit the syntax quotation on the left-hand side. The identifier e inside of it is in an antiquotation and thus not captured by the quotation. It is in binding position for the right-hand side, so we add e to the local context. Visiting the right-hand side, we find the quotation-captured identifier x and annotate it with the matching top-level definition of the same name; we do not yet know that it is in a binding position. When visiting the reference e, we see that it matches a local binding and do not add top-level scopes.

    macro_rules
    | `(const $e) => `(fun x{x} => $e)

Visiting the last line

    def y := const x

with the global context {x, e}, we descend into the right-hand side. We expand the const macro given a fresh macro scope 2, which is applied to any captured identifiers.

    def y := fun x.2{x} => x

We add the symbol x.2 (discarding the top-level scope x) to the local context and finally visit the reference x. The reference does not match the local binding x.2 but does match the top-level binding x, so it binds to the latter.

    def y := fun x.2 => x
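The outcome of this walkthrough can be checked in a current toolchain. The following is a minimal sketch assuming a recent Lean 4 release; `constFn` stands in for const to avoid reserving a common identifier as a keyword, and the binder type `Unit` is an addition so that the definition elaborates without further annotations.

```lean
-- Sketch for a recent Lean 4 release. `constFn` stands in for `const`, and
-- the binder type `Unit` is added so the definition elaborates.
def x := 1

macro "constFn " e:term : term => `(fun (x : Unit) => $e)

-- The argument `x` refers to the top-level `x`, not to the macro's binder.
def y := constFn x

-- Applying `y` to any argument returns the top-level `x`, i.e. 1.
example : y () = 1 := rfl
```

If the macro's binder had captured the argument, `y` would be the identity function on `Unit` and the final example would not type check.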

Now let us briefly look at a more complex macro-macro example demonstrating use of the macro scopes stack:

    macro "m" n:ident : command => `(
      def f := 1
      macro "mm" : command => `(
        def $n:ident := f
        def f := $n:ident))

If we call m f, we apply a macro scope 1 to all captured identifiers, then incrementally process the two new declarations.

    def f.1 := 1
    macro "mm" : command => `(
      def f := f.1{f.1}
      def f.1{f.1} := f)

If we call the new macro mm, we apply one more macro scope 2.

    def f.2 := f.1.2{f.1}
    def f.1.2{f.1} := f.2

When processing these new definitions, we see that the scopes ensure the expected name resolution. In particular, we now have global declarations f.1, f.2, and f.1.2 that show that storing only a single macro scope would have led to a collision.

Syntax objects in Lean 4 are represented as an inductive type of nodes (or nonterminals), atoms (or terminals), and, as a special case of nonterminals, identifiers.

    inductive Syntax
    | node (kind : Name) (args : Array Syntax)
    | atom (info : Option SourceInfo) (val : String)
    | ident (info : Option SourceInfo) (rawVal : String) (val : Name)
            (preresolved : List (Nat × List String))
    | missing

An additional constructor represents missing parts from syntax error recovery. Atoms and identifiers are annotated with source location metadata unless generated by a macro. Identifiers carry macro scopes inline in their Name while top-level scopes are held in a separate list. The additional Nat is an implementation detail of Lean's hierarchical name resolution.

The type Name of hierarchical names precedes the implementation of the macro system and is used throughout Lean's implementation for referring to (namespaced) symbols.

    inductive Name
    | anonymous : Name
    | str : Name → String → Name
    | num : Name → Nat → Name

The syntax `a.b is a literal of type Name for use in meta-programs. The numeric part of Name is not accessible from the surface syntax and reserved for internal names; similar designs are found in other ITPs. By reusing Name for storing macro scopes, but not top-level scopes, we ensure that the new definition of symbol from Section 3.1 coincides with the existing Lean type and no changes to the implementation of the local or global context are necessary for adopting the macro system.

A Lean 4 implementation of the expansion algorithm described in the previous section is given in Fig. 1; the full implementation including examples is included in the supplement. As a generalization, syntax transformers have the type Syntax → TransformerM Syntax where the TransformerM monad gives access to the global context and a fresh macro scope per macro expansion. The expander itself uses an extended ExpanderM monad that also stores the local context and the set of registered macros. We use the Lean equivalent of Haskell's do notation [13] to program in these monads.

As usual, the expander has built-in knowledge of some "core forms" (lines 3-17) with special expansion behavior, while all other forms are assumed to be macros and expanded recursively (lines 20-22). Identifiers form one base case of the recursion. As described in the algorithm, they are first looked up in the local context (recall that the val of an identifier includes macro scopes), then as a fallback in the global context plus its own top-level scopes. mkTermId : Name → Syntax creates an identifier without source information or top-level scopes, which are not needed after expansion. mkOverloadedIds implements the Lean special case of overloaded symbols to be disambiguated by elaboration; systems without overloading support should throw an ambiguity error instead in this case.

As an example of a core binding form, the expansion of a single-parameter fun is shown in lines 13-17 of Fig. 1. It recursively expands the given parameter type, then expands the body in a new local context extended with the value of id. Here getIdentVal : Syntax → Name in particular implements the discarding of top-level scopes from binders.

Finally, in the macro case, we fetch the syntax transformer for the given node kind, call it in a new context with a fresh current macro scope, and recurse.

Syntax quotations are given as one example of a macro: they do not have built-in semantics but transform into code that constructs the appropriate syntax tree (expandStxQuot in Fig. 2). More specifically, a syntax quotation will, at runtime, query the current macro scope msc from the surrounding TransformerM monad and apply it to all captured identifiers, which is done in quoteSyntax. quoteSyntax recurses through the quoted syntax tree, reflecting its constructors. Basic datatypes such as String and Name are turned into Syntax via the typeclass method quote. For antiquotations, we return their contents unreflected. In the case of identifiers, we resolve possible global references at compile time and reflect them, while msc is applied at runtime. Thus a quotation `(a + $b) inside a global context where the symbol a matches declarations a.a and b.a is transformed to the equivalent of

    do msc ← getCurrMacroScope;
       pure (Syntax.node `plus
         [Syntax.ident none "a" (addMacroScope `a msc) [`a.a, `b.a],
          Syntax.atom none "+", b])

    partial def expand : Syntax → ExpanderM Syntax
    | stx => match_syntax stx with
      | `($id:ident) => do
        let val := getIdentVal id;
        gctx ← getGlobalContext;
        lctx ← getLocalContext;
        if lctx.contains val then
          pure (mkTermId val)
        else match resolve gctx val ++ getPreresolved id with
          | [] => throw ("unknown identifier " ++ toString val)
          | [(id, _)] => pure (mkTermId id)
          | ids => pure (mkOverloadedIds ids)
      | `(fun ($id:ident : $ty) => $e) => do
        let val := getIdentVal id;
        ty ← expand ty;
        e ← withLocal val (expand e);
        `(fun ($(mkTermId val) : $ty) => $e)
      | ... -- other core forms
      | _ => do
        t ← getTransformerFor stx.getKind;
        stx ← withFreshMacroScope (t stx);
        expand stx

Fig. 1. Abbreviated implementation of a recursive expander for our macro system

This implementation of syntax quotations itself makes use of syntax quotations for simplicity and thus is dependent on its own implementation in the previous stage of the compiler. Indeed, the helper variable msc must be renamed should the name already be in scope and used inside an antiquotation. Note that quoteSyntax is allowed to reference the same msc as expandStxQuot because they are part of the same macro call and the current macro scope is unchanged between them.

The macro system as described so far can handle most syntax sugars of Lean 3 except for ones requiring type information. For example, the anonymous constructor ⟨e, ...⟩ is sugar for (c e ...) if the expected type of the expression is known and it is an inductive type with a single constructor c. While trivial to parse, there is no way to implement this syntax as a macro if expansion is done strictly prior to elaboration. To the best of our knowledge, none of the ITPs listed in the introduction support hygienic elaboration extensions of this kind, but we will show how to extend their common elaboration scheme in that way in this section.

    partial def quoteSyntax : Syntax → TransformerM Syntax
    | Syntax.ident info rawVal val preresolved => do
      gctx ← getGlobalContext;
      let preresolved := resolve gctx val ++ preresolved;
      `(Syntax.ident none $(quote rawVal) (addMacroScope $(quote val) msc)
          $(quote preresolved))
    | stx@(Syntax.node k args) =>
      if isAntiquot stx then pure (getAntiquotTerm stx)
      else do
        args ← args.mapM quoteSyntax;
        `(Syntax.node $(quote k) $(quote args))
    | Syntax.atom info val => `(Syntax.atom none $(quote val))
    | Syntax.missing => pure Syntax.missing

    def expandStxQuot (stx : Syntax) : TransformerM Syntax := do
      stx ← quoteSyntax (stx.getArg 1);
      `(do msc ← getCurrMacroScope; pure $stx)

Fig. 2. Simplified syntax transformer for syntax quotations
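The inline encoding of macro scopes in Name can be probed from within Lean itself. The following is a hedged sketch assuming a recent Lean 4 release, where `addMacroScope` additionally takes the name of the current module and the concrete encoding carries more bookkeeping than the x.1 form used in this text; `Demo` and `hygienicX` are placeholder names for this illustration.

```lean
import Lean
open Lean

-- `Demo` plays the role of the current module name; it is a placeholder.
def hygienicX : Name := addMacroScope `Demo `x 1

-- Does the name carry macro scopes?
#eval hygienicX.hasMacroScopes

-- Does stripping the scopes recover the plain name `x`?
#eval hygienicX.eraseMacroScopes == `x

-- Is the scoped name the same symbol as the plain `x`?
#eval hygienicX == `x
```

Because scopes live inside the Name, comparing symbols in the local and global context remains an ordinary equality check on names, which is the property the expander in Fig. 1 relies on.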

Elaboration can be thought of as a function elabTerm : Syntax → ElabM Expr (at the term level; other levels work analogously but with different output types) in an appropriate monad ElabM (or some other encoding of effects) from a (concrete or abstract) surface-level syntax tree type Syntax to a fully-specified core term type Expr [15]. We have presented the (concrete) definition of Syntax in Lean 4 in Section 4; the particular definition of Expr is not important here. While such an elaboration system could readily be composed with a type-insensitive macro expander such as the one presented in Section 3, we would rather like to intertwine the two to support type-sensitive but still hygienic-by-default macros (henceforth called elaborators) without having to reimplement macros of the kind discussed so far. Indeed, these can automatically be adapted to the new type given an adapter between the two monads, similarly to the adaption of macros to expanders in [6]:

    def transformerToElaborator (m : Syntax → TransformerM Syntax) :
        Syntax → ElabM Expr :=
      fun stx => do
        stx' ← (transformerMToElabM m) stx;
        elabTerm stx'

Because most parts of our hygiene system are implemented by the expander for syntax quotations, the only changes to an elaboration system necessary for supporting hygiene are storing the current macro scope in the elaboration monad (to be passed to the expansion monad in the adapter) and allocating a fresh macro scope in elabTerm and other recursion points, which morally now represent the starting point of a macro's expansion. Thus elaborators immediately benefit from hygiene as well whenever they use syntax quotations to construct unelaborated helper syntax objects to pass to elabTerm. In order to support syntax quotations in these two and other monads, we generalize their implementation to a new monad typeclass implemented by both monads.

    class MonadQuotation (m : Type → Type) :=
    (getCurrMacroScope : m MacroScope)
    (withFreshMacroScope {α : Type} : m α → m α)

The second operation is not used by syntax quotations directly, but can be used by procedural macros to manually enter new macro call scopes.

As an example, the following is a simplified implementation of the anonymous constructor syntax mentioned above.

    @[termElab anonymousCtor]
    def elabAnonymousCtor (stx : Syntax) : ElabM Expr :=
    match_syntax stx with
    | `(⟨$args*⟩) => do
      expectedType ← getExpectedType;
      match Expr.getAppFn expectedType with
      | Expr.const constName _ _ => do
        ctors ← getCtors constName;
        match ctors with
        | [ctor] => do
          stx ← `($(mkCTermId ctor) $(getSepElems args)*);
          elabTerm stx
        ... -- error handling

The [termElab] attribute registers this elaborator for the given syntax node kind. $args* is an antiquotation splice that extracts/injects a syntactic sequence of elements into/from an Array Syntax. The array by default includes separators such as "," as Syntax.atoms in order to be lossless, which we here filter out using getSepElems. The function mkCTermId : Name → Syntax synthesizes a hygienic reference to the given constant name by storing it as a top-level scope and applying a reserved macro scope to the constructed identifier.

This implementation fails if the expected type is not yet sufficiently known at this point. The actual implementation of this elaborator extends the code by postponing elaboration in this case. When an elaborator requests postponement, the system returns a fresh metavariable as a placeholder and associates the input syntax tree with it. Before finishing elaboration, postponed elaborators associated with unsolved metavariables are retried until they all ultimately succeed, or else elaboration is stuck because of cyclic dependencies and an error is signaled.

Footnote: https://github.com/leanprover/lean4/blob/IJCAR20/src/Init/Lean/Elab/BuiltinNotation.lean

Lean 3 includes a tactic framework that, much like macros, allows users to write custom automation either procedurally inside a Tactic monad (renamed to TacticM in Lean 4) or "by example" using tactic language quotations, or in a mix of both [9]. For example, Lean 3 uses a short tactic block to prove injection lemmas for data constructors.

    def mkInjEq : Tactic Unit :=
    `[intros; apply propext; apply Iff.intro; ...]

Unfortunately, this code unexpectedly broke in Lean 3 when used from a library for homotopy type theory that defined its own propext and Iff.intro declarations; in other words, Lean 3 tactic quotations are unhygienic and required manual intervention in this case. Just like with macros, the issue with tactics is that binding structure in such embedded terms is not known at declaration time. Only at tactic run time do we know all local variables in the current context that preceding tactics may have added or removed, and therefore the scope of each captured identifier.

Arguably, the Lean 3 implementation also exhibited a lack of hygiene in the handling of tactic-introduced identifiers: it did not prevent users from referencing such an identifier outside of the scope it was declared in.

    def myTac : Tactic Unit := `[intro h]
    lemma triv (p : Prop) : p → p := begin myTac; exact h end

Coq's similar Ltac tactic language [5] exhibits the same issue and users are advised not to introduce fixed names in tactic scripts but to generate fresh names using the fresh tactic first, which can be considered a manual hygiene solution.

Lean 4 instead extends its automatically hygienic macro implementation to tactic scripts by allowing regular macros in the place of tactic invocations.

    macro "myTac" : tactic => `(intro h; exact h)
    theorem triv (p : Prop) : p → p := begin myTac end

By the same hygiene mechanism described above, introduced identifiers such as h are renamed so as not to be accessible outside of their original scope, while references to global declarations are preserved as top-level scope annotations. Thus Lean 4's tactic framework resolves both hygiene issues discussed here without requiring manual intervention by the user.

Expansion of tactic macros in fact does not precede but is integrated into the tactic evaluator evalTactic : Syntax → TacticM Unit such that recursive macro calls are expanded lazily.

    syntax "repeat" tactic : tactic
    macro_rules
    | `(tactic| repeat $t) => `(tactic| try ($t; repeat $t))

Here the quotation kind tactic followed by a pipe symbol speciﬁes the parserto use for the quotation, since tactic syntax may otherwise overlap with termsyntax. macro automatically infers it from the given syntax category, but cannotbe used here because the parser for repeat would not yet be available in theright-hand side. When $t eventually fails, the recursion is broken without visiting https://github.com/leanprover/lean/pull/1913 https://github.com/coq/coq/issues/9474 and expanding the subsequent repeat macro call. The try tactical is used toignore this eventual failure.While we believe that macros will cover most use cases of tactic quotationsin Lean 3, their use within larger TacticM metaprograms can be recovered bypassing such a quotation to evalTactic : def myTac2 : TacticM Unit :=do stx ← `(tactic|intro h; exact h); evalTactic stxTacticM implements the MonadQuotation typeclass for this purpose.
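The tactic snippets in this section use the surface syntax of the 2020 Lean 4 prototype. As a rough sketch only, assuming a recent Lean 4 toolchain (where tactic blocks are written with by rather than begin ... end), the same ideas might be rendered as follows. The names introH, myRepeat, and runMyTac2 are hypothetical and chosen here merely to avoid clashing with built-in tactics and identifiers; parenthesizing the tactic sequences inside quotations is a conservative choice of this sketch, not something the text prescribes.

```lean
import Lean
open Lean Elab Tactic

-- A tactic macro whose expansion introduces the name `h`.
-- (`introH` is a hypothetical name used only for this illustration.)
macro "introH" : tactic => `(tactic| intro h)

example (p : Prop) : p → p := by
  introH
  -- Writing `exact h` here is expected to be rejected: the `h` introduced
  -- inside the quotation carries a macro scope and is inaccessible at the
  -- call site. The hypothesis itself is still in the context.
  assumption

-- Within a single quotation, the binder `h` and its use share a macro scope,
-- so the macro can refer to its own introduced name.
macro "myTac" : tactic => `(tactic| (intro h; exact h))

theorem triv (p : Prop) : p → p := by
  myTac

-- Recursive tactic macro in the style of `repeat`; `myRepeat` is a
-- hypothetical name that avoids the built-in `repeat` tactic.
syntax "myRepeat " tactic : tactic
macro_rules
  | `(tactic| myRepeat $t) => `(tactic| try ($t; myRepeat $t))

example (p : Prop) (hp : p) : p ∧ p := by
  constructor
  -- Closes both goals; the final failing `exact hp` is absorbed by `try`.
  myRepeat exact hp

-- The same kind of quotation used inside a larger `TacticM` metaprogram.
def myTac2 : TacticM Unit := do
  let stx ← `(tactic| (intro h; exact h))
  evalTactic stx

-- Hypothetical front end that exposes `myTac2` as a tactic.
elab "runMyTac2" : tactic => myTac2

example (p : Prop) : p → p := by
  runMyTac2
```

If assumption in the first example is replaced by exact h, elaboration is expected to fail with an unknown identifier error, which is the second hygiene property discussed above. The myRepeat example terminates because the recursive call is only expanded when the evaluator reaches it, after the preceding $t has succeeded.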

7 Related Work

The main inspiration behind our hygiene implementation was Racket’s new Sets of Scopes [10] hygiene algorithm. Much like in our approach, Racket annotates identifiers both with scopes from their original context as well as with additional macro scopes when introduced by a macro expansion. However, there are some significant differences: Racket stores both types of scopes in a homogeneous, unordered set and does name resolution via a maximum-subset check. For both simplicity of implementation and performance, we have reduced scopes to the bare minimal representation using only strict equality checks, which we can easily encode in our existing Name implementation. In particular, we only apply scopes to matching identifiers and only inside syntax quotations. This optimization is of special importance because top-level declarations in Lean and other ITPs are not part of a single, mutually recursive scope as in Racket, but each open their own scope over all subsequent declarations, which would lead to a total number of scope annotations quadratic in the number of declarations using the Sets of Scopes algorithm. Finally, Racket detects macro-introduced identifiers using a “black-box” approach without the macro’s cooperation following the marking approach of [11]: a fresh macro scope is applied to all identifiers in the macro input, then inverted on the macro output. While elegant, a naive implementation of this approach can result in quadratic runtime compared to unhygienic expansion and requires further optimizations in the form of lazy scope propagation [7], which is difficult to implement in a pure language such as Lean. Our “white-box” approach based on the single primitive of an effectful syntax quotation, while slightly easier to escape from in procedural syntax transformers, is simple to implement, incurs minimal overhead, and is equivalent for pattern-based macros.

The idea of automatically handling hygiene in the macro, and not in the expander, was introduced in [4], though only for pattern-based macros. MetaML [18] refined this idea by tying hygiene more specifically to syntax quotations that could be used in larger metaprogram contexts, which Template Haskell [17] interpreted as effectful (monadic) computations requiring access to a fresh-names generator, much like in our design. However, both of the latter systems should perhaps be characterized more as metaprogramming frameworks than Scheme-like macro systems: there are no “macro calls” but only explicit splices and so only built-in syntax with known binding semantics can be captured inside syntax quotations.
Thus the question of which captured identifiers to rename becomes trivial again, just like in the basic notation systems discussed in Section 1.

While the vast majority of research on hygienic macro systems has focused on S-expression-based languages, there have been previous efforts on marrying that research with non-parenthetical syntax, with different solutions for combining syntax tree construction and macro expansion. The Dylan language requires macro syntax to use predefined terminators and eagerly scans for the end of a macro call using this knowledge [2], while in Honu [16] the syntactic structure of a macro call is discovered during expansion by a process called “enforestation”. The Fortress [1] language strictly separates the two concerns into grammar extensions and transformer declarations, much like we do. Dylan and Fortress are restricted to pattern-based macro declarations and thus can make use of simple hygiene algorithms while Honu uses the full generality of the Racket macro expander. On the other hand, Honu’s authors “explicitly trade expressiveness for syntactic simplicity” [16]. In order to express the full Lean language and desirable extensions in a macro system, we require both unrestricted syntax of macros and procedural transformers.

Many theorem provers such as Coq, Agda, Idris, and Isabelle not already based on a macro-powered language provide restricted syntax extension mechanisms, circumventing hygiene issues by statically determining binding as seen in Section 1. Extensions that go beyond that do not come with automatic hygiene guarantees. Agda’s macros (https://agda.readthedocs.io/en/v2.6.0.1/language/reflection.html), for example, operate on the De Bruijn index-based core term level and are not hygienic (https://github.com/agda/agda/issues/3819). The ACL2 prover in contrast uses a subset of Common Lisp as its input language and adapts the hygiene algorithm of [7] based on renaming [8]. The experimental Cur [3] theorem prover is a kind of dual to our approach: it takes an established language with hygienic macros, Racket, and extends it with a dependent type system and theorem proving tools. ACL2 does not support tactic scripts, while in Cur they can be defined via regular macros. However, this approach does not currently provide tactic hygiene as defined in Section 6 (https://github.com/wilbowma/cur/issues/104).

8 Conclusion

We have proposed a new macro system for interactive theorem provers that enables syntactic abstraction and reuse far beyond the usual support of mixfix notations. Our system is based on a novel hygiene algorithm designed with a focus on minimal runtime overhead as well as ease of integration into pre-existing codebases, including integration into standard elaboration designs to support type-directed macro expansion. Despite that, the algorithm is general enough to provide a complete hygiene solution for pattern-based macros and provides flexible hygiene for procedural macros. We have also demonstrated how our macro system can address unexpected name capture issues that haunt existing tactic frameworks. We have implemented our method in the upcoming version (v4) of the Lean theorem prover; it should be sufficiently attractive and straightforward to implement to be adopted by other interactive theorem proving systems as well.

Acknowledgments. We are very grateful to the anonymous reviewers, David Thrane Christiansen, Gabriel Ebner, Matthew Flatt, Sebastian Graf, Alexis King, Daniel Selsam, and Max Wagner for extensive comments, corrections, and advice.

References

1. Allen, E., Chase, D., Hallett, J., Luchangco, V., Maessen, J.W., Ryu, S., Steele Jr, G.L., Tobin-Hochstadt, S., Dias, J., Eastlund, C., et al.: The Fortress language specification. Sun Microsystems (140), 116 (2005)
2. Bachrach, J., Playford, K., Street, C.: D-expressions: Lisp power, Dylan style. Style DeKalb IL (1999)
3. Chang, S., Ballantyne, M., Turner, M., Bowman, W.J.: Dependent type systems as macros. Proceedings of the ACM on Programming Languages (POPL), 1–29 (2019)
4. Clinger, W., Rees, J.: Macros that work. In: Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. pp. 155–162 (1991)
5. Delahaye, D.: A tactic language for the system Coq. In: Logic for Programming and Automated Reasoning, 7th International Conference, LPAR 2000, Proceedings. pp. 85–95 (2000)
6. Dybvig, R.K., Friedman, D.P., Haynes, C.T.: Expansion-passing style: Beyond conventional macros. In: Proceedings of the 1986 ACM conference on LISP and functional programming. pp. 143–150 (1986)
7. Dybvig, R.K., Hieb, R., Bruggeman, C.: Syntactic abstraction in Scheme. Lisp and symbolic computation (4), 295–326 (1993)
8. Eastlund, C., Felleisen, M.: Hygienic macros for ACL2. In: International Symposium on Trends in Functional Programming. pp. 84–101. Springer (2010)
9. Ebner, G., Ullrich, S., Roesch, J., Avigad, J., de Moura, L.: A metaprogramming framework for formal verification. Proc. ACM Program. Lang. (ICFP) (Sep 2017). https://doi.org/10.1145/3110278
10. Flatt, M.: Binding as sets of scopes. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 705–717. POPL ’16, ACM, New York, NY, USA (2016). https://doi.org/10.1145/2837614.2837620, http://doi.acm.org/10.1145/2837614.2837620
11. Kohlbecker, E., Friedman, D.P., Felleisen, M., Duba, B.: Hygienic macro expansion. In: Proceedings of the 1986 ACM conference on LISP and functional programming. pp. 151–161 (1986)
12. Mahboubi, A., Tassi, E.: Mathematical components, https://math-comp.github.io/mcb/
13. Marlow, S., et al.: Haskell 2010 language report (2010)
14. Moggi, E.: Notions of computation and monads. Information and computation (1), 55–92 (1991)
15. de Moura, L., Avigad, J., Kong, S., Roux, C.: Elaboration in dependent type theory (2015)
16. Rafkind, J., Flatt, M.: Honu: syntactic extension for algebraic notation through enforestation. In: ACM SIGPLAN Notices. vol. 48, pp. 122–131. ACM (2012)
17. Sheard, T., Jones, S.P.: Template meta-programming for Haskell. In: Proceedings of the 2002 ACM SIGPLAN workshop on Haskell. pp. 1–16. ACM (2002)
18. Taha, W., Sheard, T.: MetaML and multi-stage programming with explicit annotations. Theoretical computer science 248