A graded dependent type system with a usage-aware semantics (extended version)
Pritam Choudhury, Harley Eades III, Richard A. Eisenberg, Stephanie C Weirich
PRITAM CHOUDHURY, University of Pennsylvania, USA
HARLEY EADES III, Augusta University, USA
RICHARD A. EISENBERG, Tweag I/O, France and Bryn Mawr College, USA
STEPHANIE WEIRICH, University of Pennsylvania, USA

Graded Type Theory provides a mechanism to track and reason about resource usage in type systems. In this paper, we develop GraD, a novel version of such a graded dependent type system that includes functions, tensor products, additive sums, and a unit type. Since standard operational semantics is resource-agnostic, we develop a heap-based operational semantics and prove a soundness theorem that shows correct accounting of resource usage. Several useful properties, including the standard type soundness theorem, non-interference of irrelevant resources in computation, and the single pointer property for linear resources, can be derived from this theorem. We hope that our work will provide a base for integrating linearity, irrelevance and dependent types in practical programming languages like Haskell.

CCS Concepts: • Theory of computation → Type theory; Linear logic.

Additional Key Words and Phrases: Irrelevance, linearity, quantitative reasoning, heap semantics.
Consider this typing judgement.

x :1 Bool, y :1 Int, z :0 Bool ⊢ if x then y + 1 else y − 1 : Int
Here, the numbers in the context indicate that the variable x is used once in the expression, the variable y is also used only once (although it appears twice), and the variable z is never used at all. This sentence is a judgement of a graded type system, which ensures that the grades or quantities annotating each in-scope variable reflect how it is used at run time. Graded type systems have been explored in much detail in the literature [Atkey 2018; Brunel et al. 2014; Gaboardi et al. 2016; Ghica and Smith 2014; McBride 2016; Orchard et al. 2019; Petricek et al. 2014]. The process of tracking usage through grades is straightforward, but this is a powerful method of instrumenting type systems with analyses of irrelevance and linearity that have practical benefits like erasure of irrelevant terms (resulting in speed-up) and compiler optimizations (such as in-place update of linear resources). This approach is also versatile. By abstracting over a domain of resources, the same form of type system can be used to guarantee safe memory usage, prevent insecure information flow, quantify information leakage, identify irrelevant computations, or combine various modal logics. Several research languages, such as Idris 2 [Brady 2020] and Agda [Agda-Team 2020], are starting to adopt ideas from this domain, and new systems like Granule [Orchard et al. 2019] are being developed to explore its possibilities.

Our concrete motivation for studying graded type systems is a desire to merge Haskell's current form of a linear type system [Bernardy et al. 2018] with dependent types [Weirich et al. 2017] in a clean manner. Crucially, the combined system must support type erasure: the compiler must be able to eliminate type arguments to polymorphic functions.
Type erasure is key both to support parametric polymorphism and to efficiently execute Haskell programs. We discuss this in more detail in Section 2. Although Haskell is our eventual goal, our work remains general. Our designs are compatible with the current approaches in GHC, but are not specialized to Haskell.

We make the following contributions in this paper:

• Our system flexibly abstracts over an algebraic structure used to count resources. Section 3 describes this structure, a partially-ordered semiring, and its properties. This use of a resource algebra is standard, although we identify subtle differences in its specification.

• Section 4 presents a simple graded type system, with standard algebraic types and a graded modal type. This system is not novel; instead, it establishes a foundation for the dependent system. However, even at this stage, we identify subtleties in the design space.

• Because the standard operational semantics does not track resources, type safety does not imply that usage tracking is correct. Section 5 describes a heap-based operational semantics, inspired by Turner and Wadler [1999]. Every variable in the heap has an associated resource tag from our abstract structure, modelling how resources are used during computation. We prove that our type system is sound with respect to this instrumented semantics. This theorem tells us that well-typed terms will not get stuck by running out of resources.
In the process of showing that this result holds, we identify a key restriction on case analysis that was not forced by the non-resourced version of type safety.

• Using soundness, we show (a generalization of) the single pointer property for linear resources in Section 6. The single pointer property says that a linear resource is referenced by precisely one pointer at runtime. Such a property would enable in-place updates of linear resources.

• Our key contribution is the design of the language, GraD, extending our ideas to dependent types. In contrast to other approaches [Atkey 2018], we use the same rules to check relevant and irrelevant phrases (that is, terms and types). When computing the resources used by the entire term, we discard irrelevant usages. Types are irrelevant to computation, so our system ignores these usages. We describe the design of the type system in Section 7 and extend the soundness proof with respect to a heap semantics in Section 8. Our system is thus both simpler and more uniform than prior work that combines usage tracking with dependent types. In particular, Quantitative Type Theory (QTT) [Atkey 2018; McBride 2016] disables resource checking in types, leading to limitations on the sorts of reasoning that can be done in the type system. On the other hand, Resourceful Dependent Types [Abel 2018] and GrTT [Moon et al. 2020] maintain separate counts of usages in types and terms, incurring additional bookkeeping for less benefit. Section 9.3 provides a detailed comparison of our work with QTT.

• While the exploration of graded type systems in this paper is applicable to a wide array of examples (see Section 3.2), we were originally motivated to study such systems in the context of GHC/Haskell, where we wish to combine its existing support for linearity [Bernardy et al. 2018] (available as of GHC 9.0) with support for dependent types [Eisenberg 2016; Gundry 2013; Weirich et al. 2019, 2017].

A key advantage of a successful combination of graded and dependent types for Haskell is that it allows us to use the quantity 0 to mean irrelevant usage, where an irrelevant sub-term is one that is not needed while computing the reduct of the concerned term. Irrelevant sub-terms are quite common in terms derived in dependent type systems. They are essential for type-checking the terms, but if left as such, they can make programs run much slower. So we need to track irrelevant sub-terms and erase them before running a program. Weirich et al. [2017] use a relevance tag +/− on the Π for this purpose. On the other hand, Bernardy et al. [2018] use a linearity tag 1/ω on the function domain type to track linear usage. We can combine these two together, using 0, 1 and ω to track irrelevant, linear and unrestricted usages respectively. This has an added advantage. It will allow us to provide Haskell programmers the option of annotating arguments with a usage tag that subsumes relevance and linearity tags [Eisenberg 2018]. So the use of 0 to mark irrelevance fits in swimmingly with Haskell's current story around linear types.

Furthermore, given that we plan to implement these ideas concretely inside GHC, it is essential that the system be as simple as possible. As discussed in more detail in Section 9.3, our system eliminates features that are not necessary in our case. Doing so will aid in integration with the rest of the GHC implementation.

Our intentions laid out, we start our exploration by reviewing semirings, the key algebraic structure used to abstractly represent grades.

The goal of a graded type theory is to track the demands that computations make on variables that appear in the context. In other words, the type system enables a static accounting of runtime resources "used" in the evaluation of terms. This form of type system generalizes linear types (where linear resources must be used exactly once) [Wadler 1990] and bounded linear types (where bounded resources must be used a finite number of times) [Girard et al. 1992], as well as many, many other type systems [Abadi et al. 1999; Miquel 2001; Pfenning 2001; Reed and Pierce 2010; Volpano et al. 1996].

This generality derives from the fact that the type system is parametrized over an abstract algebraic structure of grades to model resources. The abstract algebraic structure enables addition and multiplication of resources, and these operations conform to our general understanding of resource arithmetic.
A partially-ordered semiring is one such algebraic structure that captures this idea of resource modelling nicely.

A semiring is a set Q with two binary operations, _ + _ : Q × Q → Q (addition) and _ · _ : Q × Q → Q (multiplication), and two distinguished elements, 0 and 1, such that (Q, +, 0) is a commutative monoid and (Q, ·, 1) is a monoid; furthermore, multiplication is both left and right distributive over addition, and 0 is an annihilator for multiplication. Note that a semiring is not a full ring because addition does not have an inverse: we cannot subtract.

We mark the variables in our contexts with quantities drawn from a semiring to represent demand of resources. (Grades are also called quantities, modalities, resources, coeffects or usages.) In other words, if we have a typing derivation for a term a with free variable x marked with q, we know that a demands q uses of x.

We can weaken the precision of our type system (but increase its flexibility) by allowing the judgement to express higher demand than is actually necessary. For example, we may need to use some variable only once, but it may be convenient to declare that the usage of that variable is unrestricted. To model this notion of sub-usage, we need an ordering on the elements of the abstract semiring, reflecting our notion of leniency. A partial order captures the idea nicely. Since we work with a semiring, such an order should also respect the binary operations of the semiring. Concretely, for a partial order ≤ on Q, if q1 ≤ q2, then for any q ∈ Q, we should have q + q1 ≤ q + q2, q · q1 ≤ q · q2, and q1 · q ≤ q2 · q. A semiring with a partial order satisfying this condition is called a partially-ordered semiring.

This abstract structure captures the operations and properties that the type system needs for resource accounting. Because we are working abstractly, we are limited to exactly these assumptions.
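These requirements can be checked mechanically on a candidate structure. The following sketch is our own illustration, not part of the paper (the `Semiring` class and the saturating natural-number instance are assumptions); `check_laws` brute-forces the monoid, distributivity, annihilation and monotonicity conditions over a small carrier set.

```python
from itertools import product

# Illustrative sketch (ours, not the paper's): a partially-ordered semiring
# packaged as plain functions, with a brute-force check of the laws.
class Semiring:
    def __init__(self, elems, add, mul, zero, one, leq):
        self.elems, self.add, self.mul = elems, add, mul
        self.zero, self.one, self.leq = zero, one, leq

    def check_laws(self):
        E, add, mul, leq = self.elems, self.add, self.mul, self.leq
        for a, b, c in product(E, repeat=3):
            assert add(a, b) == add(b, a)                          # + commutes
            assert add(add(a, b), c) == add(a, add(b, c))          # + associative
            assert mul(mul(a, b), c) == mul(a, mul(b, c))          # · associative
            assert mul(a, add(b, c)) == add(mul(a, b), mul(a, c))  # left distrib.
            assert mul(add(a, b), c) == add(mul(a, c), mul(b, c))  # right distrib.
        for a in E:
            assert add(a, self.zero) == a                          # 0 is unit of +
            assert mul(a, self.one) == a == mul(self.one, a)       # 1 is unit of ·
            assert mul(a, self.zero) == self.zero == mul(self.zero, a)  # 0 annihilates
        # The order must respect + and · on both sides (monotonicity).
        for q1, q2, q in product(E, repeat=3):
            if leq(q1, q2):
                assert leq(add(q, q1), add(q, q2))
                assert leq(mul(q, q1), mul(q, q2))
                assert leq(mul(q1, q), mul(q2, q))
        return True

# Natural numbers saturated at 4, so the finite sample is closed under + and ·.
nats = Semiring(range(5),
                lambda a, b: min(a + b, 4),
                lambda a, b: min(a * b, 4),
                0, 1, lambda a, b: a <= b)
```

Instantiating such a structure with different carriers (booleans, the linearity elements below, a security lattice) is what specializes the type system to different analyses.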
In practice, it means our design is applicable to settings beyond the simple use of natural numbers to count resources.

Looking ahead, there are a few semirings that we are interested in. The trivial semiring has a single element, and all operations just return that element. Our type system, when specialized to this semiring, degenerates to the usual form of types, as the quantities are uninformative.

The boolean semiring has two elements, 0 and 1, with the property that 1 + 1 = 1. A type system drawing quantities from this semiring distinguishes between variables that are used (marked with one) and ones that are unused (marked with zero). In such a system, the quantity 1 does not correspond to a linear usage: this system does not count usage, but instead checks whether a variable is used or not.

There are two different partial orders that make sense for the boolean semiring. If we use the reflexive relation, then this type system tracks relevance precisely. If a variable is marked with 0 in the context, then we know that the variable must not be used at runtime, and if it is marked with 1, then we know that it must be used. On the other hand, if the partial ordering declares that 0 ≤ 1, then we still can determine that 0-marked variables are unused, but we do not know anything about the usage of 1-marked variables.

The linearity semiring has three elements, 0, 1 and ω, where addition and multiplication are defined in the usual way after interpreting ω as "greater than 1". So, we have 1 + 1 = ω, ω + 1 = ω, and ω · ω = ω. A system using the linearity semiring tracks linearity by marking linear variables with 1 and unrestricted variables with ω. A suitable ordering in this semiring is the reflexive closure of {(0, ω), (1, ω)}. We do not want 0 ≤ 1, since then we would not be able to guarantee that linear variables in the context are used exactly once.
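The linearity semiring in particular is small enough to write out as tables. A sketch in our own encoding (the string "w" stands in for ω; none of these names come from the paper):

```python
# The linearity semiring {0, 1, ω}, with "w" standing in for ω.
W = "w"

def ladd(a, b):
    """Addition: 0 is the unit; any other combination overflows to ω."""
    if a == 0:
        return b
    if b == 0:
        return a
    return W          # 1 + 1 = 1 + ω = ω + ω = ω

def lmul(a, b):
    """Multiplication: 0 annihilates, 1 is the unit, ω · ω = ω."""
    if a == 0 or b == 0:
        return 0
    if a == 1:
        return b
    if b == 1:
        return a
    return W

def lleq(a, b):
    """Partial order: the reflexive closure of {(0, ω), (1, ω)}."""
    return a == b or b == W

assert ladd(1, 1) == W and ladd(W, 1) == W and lmul(W, W) == W
assert lleq(0, W) and lleq(1, W)
assert not lleq(0, 1)   # deliberately absent: it would break exactly-once linearity
```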
This semiring is the one that makes the most sense for Haskell, as it integrates linearity (1) with irrelevance (0) and unrestricted usage (ω).

The five-point linearity semiring has five elements, 0, 1, Aff, Rel and ω, where addition and multiplication are defined in the usual way after interpreting Aff as "1 or less", Rel as "1 or more", and ω as unrestricted. An ordering reflecting this interpretation is the reflexive transitive closure of {(0, Aff), (1, Aff), (1, Rel), (Aff, ω), (Rel, ω)}. This semiring can be used to track irrelevant, linear, affine, relevant, and unrestricted usage.

A security semiring is based on a lattice of security levels, with increasing order representing decreasing security. The + and · correspond to the join and meet operations of the lattice respectively. The partial order corresponds to the lattice order, and 0 and 1 are the Private and Public security levels respectively. Public can never be as or more secure than Private, i.e. Public ≰ Private. This lattice may include additional elements besides Private and Public, corresponding to multiple levels of secrecy. As Abel and Bernardy [2020] describe, security type systems defined in this way differ from the usual convention (such as Abadi et al. [1999]) in that security levels are relative to 1, the level of the program under execution.

(Grammar)
types            A, B ::= Unit | q A → B | ✷q A | A ⊗ B | A ⊕ B
terms            a, b ::= x | λx:q A. a | a b | unit | let unit = a in b | box_q a
                        | let box x = a in b | (a, b) | let (x, y) = a in b
                        | inj1 a | inj2 a | case_q a of b1; b2
usage contexts   Γ    ::= ∅ | Γ, x :q A
contexts         Δ    ::= ∅ | Δ, x : A
typing judgement        Δ ; Γ ⊢ a : A

Fig. 1. The simply typed graded λ-calculus

Many other examples of semirings are possible. Orchard et al. [2019] and Abel and Bernardy [2020] include comprehensive lists of several other applications, including a type system for differential privacy [Reed and Pierce 2010] and a type system that tracks covariant/contravariant use of assumptions.

Partially-ordered semirings have been used to track resource usage in many type systems [Abel and Bernardy 2020; Atkey 2018; Brunel et al. 2014; Gaboardi et al. 2016; Ghica and Smith 2014; McBride 2016; Orchard et al. 2019; Petricek et al. 2014], but there are some variations with respect to the formal requirements. For example: Brunel et al. [2014] require the underlying set (of the semiring) along with the order to form a bounded sup-semilattice, while Abel and Bernardy [2020] define the order using an additional meet operation on the underlying set; McBride [2016] uses a hemiring (a semiring without 1) while Atkey [2018] uses a semiring where zero-usage satisfies a certain condition. Our theory is parametrized by a partially ordered semiring as defined in Section 3.1. We add additional constraints as required only while deriving specific properties in Section 6.

Our goal is to design a dependent usage-aware type system. But, for simplicity, we start with a simply-typed usage-aware system similar to the system of Petricek et al.
[2014]. The grammar for this system appears in Figure 1 on page 5. It is parametrized over an arbitrary partially-ordered semiring (Q, 1, ·, 0, +, ≤) with grades q ∈ Q.

The typing judgement for this system has the form Δ ; Γ ⊢ a : A; the rules appear inline below. This judgement includes both a standard typing context Δ and a usage context Γ, a copy of the typing context annotated with grades. For brevity in examples, we often elide the standard typing context, as the information is subsumed by the usage context. Indeed, in any derivation, the typing context and the usage context correspond:

Notation 4.1.
• The notation ⌊Γ⌋ denotes a typing context Δ, the same as Γ but with no grades.
• The notation Γ̄ denotes the vector of grades in Γ.
• The notation Δ ⊢ Γ denotes that Δ = ⌊Γ⌋.

Lemma 4.2 (Typing context correspondence). If Δ ; Γ ⊢ a : A, then Δ ⊢ Γ.

This style of including both a plain typing context Δ and its usage counterpart Γ in the judgement is merely for convenience; it allows us to easily tell when two usage contexts differ only in their quantities. There are many alternative ways to express the same information in the type system: we could have only one usage context Γ and add constraints, or have a typing context Δ and a separate vector of quantities. We are now ready to start our tour of the typing rules of this system.
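The grammar of Figure 1 transcribes directly into an algebraic datatype. The sketch below is our own encoding (constructor and field names are assumptions, not the paper's); the `q` fields hold grades drawn from the ambient semiring.

```python
from dataclasses import dataclass
from typing import Any

# Terms of the simply typed graded λ-calculus of Figure 1 (our encoding).
@dataclass
class Var:     x: str                                 # x
@dataclass
class Lam:     x: str; q: Any; ty: Any; body: Any     # λx:q A. a
@dataclass
class App:     fun: Any; arg: Any                     # a b
@dataclass
class UnitV:   pass                                   # unit
@dataclass
class LetUnit: scrut: Any; body: Any                  # let unit = a in b
@dataclass
class Box:     q: Any; body: Any                      # box_q a
@dataclass
class LetBox:  x: str; scrut: Any; body: Any          # let box x = a in b
@dataclass
class Pair:    fst: Any; snd: Any                     # (a, b)
@dataclass
class LetPair: x: str; y: str; scrut: Any; body: Any  # let (x, y) = a in b
@dataclass
class Inj1:    body: Any                              # inj1 a
@dataclass
class Inj2:    body: Any                              # inj2 a
@dataclass
class Case:    q: Any; scrut: Any; b1: Any; b2: Any   # case_q a of b1; b2

# Example term: (λx:1 A. x) unit
ex = App(Lam("x", 1, "A", Var("x")), UnitV())
```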
Variables.
ST-Var
x ∉ dom Δ    Δ ⊢ Γ
(Δ, x : A) ; (0 · Γ, x :1 A) ⊢ x : A

ST-Weak
x ∉ dom Δ    Δ ; Γ ⊢ a : B
Δ, x : A ; Γ, x :0 A ⊢ a : B

We see here that a variable x has type A if it has type A in the context; that part is unsurprising. However, as is typical in this style of system, the rest of the context is scaled to 0 · Γ: this notation means that all variables in Γ must have a quantity of 0.

Notation 4.3 (Context scaling).
The notation q · Γ denotes a context Γ′ such that, for each x :r A ∈ Γ, we have x :(q·r) A ∈ Γ′.

The rule
ST-Var states that all variables other than x are not used in the expression x; that is why their quantity is zero. Note also that x : A occurs last in the context. If we wish to use a variable that is not the last item in the context, the rule ST-Weak allows us to remove (reading from bottom to top) zero-usage variables at the end of a context.
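Context scaling (Notation 4.3), together with the context addition used by the multi-premise rules later in this section, is a pointwise operation on the grade vector. The following sketch is our own encoding over the linearity semiring (contexts as insertion-ordered dicts from a variable to a (grade, type) pair; "w" stands for ω; the example grades at the end are our own assumption, not the paper's):

```python
# Sketch (ours): usage contexts as dicts x -> (grade, type), linearity semiring.

def lplus(a, b):                       # linearity-semiring addition
    if a == 0:
        return b
    if b == 0:
        return a
    return "w"

def lmul(a, b):                        # linearity-semiring multiplication
    if a == 0 or b == 0:
        return 0
    return "w" if "w" in (a, b) else 1

def scale(q, ctx):
    """q · Γ: multiply every grade in Γ by q (Notation 4.3)."""
    return {x: (lmul(q, r), ty) for x, (r, ty) in ctx.items()}

def add(ctx1, ctx2):
    """Γ1 + Γ2: pointwise addition, defined only when ⌊Γ1⌋ = ⌊Γ2⌋."""
    assert ctx1.keys() == ctx2.keys()
    out = {}
    for x in ctx1:
        (r1, ty1), (r2, ty2) = ctx1[x], ctx2[x]
        assert ty1 == ty2              # the underlying typing contexts must agree
        out[x] = (lplus(r1, r2), ty1)
    return out

# Scaling by 0 zeroes every grade, so G2 + 0·G3 collapses back to G2: this is
# how an argument to a constant function ends up costing nothing.
G2 = {"f": (1, "B->A->A"), "x": (0, "B")}
G3 = {"f": (0, "B->A->A"), "x": (1, "B")}
assert add(G2, scale(0, G3)) == G2
```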
Sub-usage.
ST-Sub
Δ ; Γ1 ⊢ a : A    Γ1 ≤ Γ2
Δ ; Γ2 ⊢ a : A

Sub-usaging, as this is commonly referred to, allows our contexts to provide (and assume) more resources than are strictly necessary.
Notation 4.4 (Context sub-usage).
The notation Γ1 ≤ Γ2 means ⌊Γ1⌋ = ⌊Γ2⌋ where, for every corresponding pair of assumptions x :q1 A ∈ Γ1 and x :q2 A ∈ Γ2, the condition q1 ≤ q2 holds.

Functions.

ST-Lam
Δ, x : A ; Γ, x :q A ⊢ a : B
Δ ; Γ ⊢ λx:q A. a : (q A → B)

ST-App
Δ ; Γ1 ⊢ a : (q A → B)    Δ ; Γ2 ⊢ b : A
Δ ; Γ1 + q · Γ2 ⊢ a b : B

Any quantitative type system must be careful around expressions that contain multiple sub-expressions. Function application is a prime example, so we examine rule
ST-App next. In this rule, we see that the function a has type q A → B, meaning that it uses its argument, of type A, q times to produce a result of type B. Accordingly, we must make sure that the argument expression b can be used q times. Put another way, we must multiply the usage required for b, as recorded in the usage context Γ2, by q. We see this in the context used in the rule's conclusion: Γ1 + q · Γ2. This introduces another piece of important notation:

Notation 4.5 (Context addition).
Adding contexts Γ1 + Γ2 is defined only when ⌊Γ1⌋ = ⌊Γ2⌋. The result context is obtained by point-wise addition of quantities; i.e. for every x :q1 A ∈ Γ1 and x :q2 A ∈ Γ2, we have x :(q1+q2) A ∈ Γ1 + Γ2.

Our approach using two contexts Δ and Γ works nicely here. Because both premises to rule ST-App use the same Δ, we know that the required precondition of context addition is satisfied. The high-level idea here is common in sub-structural type systems: whenever we use multiple sub-expressions within one expression, we must split the context. One part of the context checks one sub-expression, and the remainder checks the other sub-expression(s).

Example 4.6. Before considering the rest of the system, it is instructive to step through an example involving a function that does not use its argument, in the context of the linearity semiring. We say that such arguments are irrelevant. Suppose that we have a function f, of type 0 B → 1 A → A. (Just from this type, we can see that f must be constant in B.) Suppose also that we want to apply this function to some variable x. In this case, define the usage contexts

Γ1 = f :1 (0 B → 1 A → A)
Γ2 = Γ1, x :0 B
Γ3 = f :0 (0 B → 1 A → A), x :1 B

and construct a typing derivation for the application:

ST-Var:   ⌊Γ1⌋ ; Γ1 ⊢ f : 0 B → 1 A → A
ST-Weak:  ⌊Γ2⌋ ; Γ2 ⊢ f : 0 B → 1 A → A
ST-Var:   ⌊Γ3⌋ ; Γ3 ⊢ x : B
ST-App:   ⌊Γ2⌋ ; Γ2 + 0 · Γ3 ⊢ f x : 1 A → A

Working through the context expression Γ2 + 0 · Γ3, we see that the computed final context, derived in the conclusion of the application rule, is just Γ2 again. Although the variable x appears free in the expression f x, because it is the argument to a constant function here, this use does not contribute to the overall result.

Unit.
ST-Unit
∅ ; ∅ ⊢ unit : Unit
ST-UnitE
Δ ; Γ1 ⊢ a : Unit    Δ ; Γ2 ⊢ b : B
Δ ; Γ1 + Γ2 ⊢ let unit = a in b : B

The
Unit type has a single element, unit. To eliminate a term of this type, we just match it with unit. Since the elimination form requires the resources used for both the terms, we add the two contexts in the conclusion of rule
ST-UnitE.

The graded modal type.
ST-Box
Δ ; Γ ⊢ a : A
Δ ; q · Γ ⊢ box_q a : ✷q A

ST-LetBox
Δ ; Γ1 ⊢ a : ✷q A    Δ, x : A ; Γ2, x :q A ⊢ b : B
Δ ; Γ1 + Γ2 ⊢ let box x = a in b : B

The type ✷q A is called a graded modal type or usage modal type. It is introduced by the construct box_q a, which uses the expression q times to build the box. This box can then be passed around as an entity. When unboxed (rule ST-LetBox), the continuation has access to q copies of the contents.

Products.
ST-Pair
Δ ; Γ1 ⊢ a : A    Δ ; Γ2 ⊢ b : B
Δ ; Γ1 + Γ2 ⊢ (a, b) : A ⊗ B

ST-Spread
Δ ; Γ1 ⊢ a : A1 ⊗ A2    Δ ; Γ2, x :1 A1, y :1 A2 ⊢ b : B
Δ ; Γ1 + Γ2 ⊢ let (x, y) = a in b : B

The type system includes (multiplicative) products, also known as tensor products. The two components of these pairs do not share variable usages. Therefore the introduction rule adds the two contexts together. These products must be eliminated via pattern matching, because both components must be used in the continuation. An elimination form that projects only one component of the tuple would lose the usage constraints from the other component. Note that even though both components of the tuple must be used exactly once, by nesting a modal type within the tuple, programmers can construct data structures with components of varying usage.

Sums.
ST-Inj1
Δ ; Γ ⊢ a : A1
Δ ; Γ ⊢ inj1 a : A1 ⊕ A2

ST-Inj2
Δ ; Γ ⊢ a : A2
Δ ; Γ ⊢ inj2 a : A1 ⊕ A2

ST-Case
1 ≤ q    Δ ; Γ1 ⊢ a : A1 ⊕ A2    Δ ; Γ2 ⊢ b1 : q A1 → B    Δ ; Γ2 ⊢ b2 : q A2 → B
Δ ; q · Γ1 + Γ2 ⊢ case_q a of b1; b2 : B

Last, the system includes (additive) sums and case analysis. The introduction rules for the first and second injections are no different from a standard type system. However, in the elimination form, rule
ST-Case, the quantities used for the scrutinee can be different from the quantities used by (and shared between) the two branches. Furthermore, the case expression may be annotated with a quantity q that indicates how many copies of the scrutinee may be demanded in the branches. Both branches of the case analysis must use the scrutinee at least once, as indicated by the 1 ≤ q constraint.

For the language presented above, we define an entirely standard call-by-name reduction relation a ❀ a′, included in Appendix A.2. With this operational semantics, a syntactic proof of type soundness follows in the usual manner, via the entirely standard progress and preservation lemmas. The substitution lemma that is part of this proof is of particular interest to us, as it must account for the number of times the substituted variable (x, in our statement) is used when computing the contexts used in the conclusion of the lemma:

Lemma 4.7 (Substitution). If Δ ; Γ1 ⊢ a : A and Δ, x : A, Δ′ ; Γ2, x :q A, Γ′2 ⊢ b : B, then Δ, Δ′ ; (Γ2 + q · Γ1), Γ′2 ⊢ b{a/x} : B.

At this point, the language that we have developed recalls systems found in prior work, such as Brunel et al. [2014], Orchard et al. [2019], Wood and Atkey [2020] and Abel and Bernardy [2020]. Most differences are cosmetic, especially in the treatment of usage contexts. Of these, the most similar is the concurrently developed Abel and Bernardy [2020], which we compare below.

• First, Abel and Bernardy [2020] include a slightly more expressive form of pattern matching. Their elimination forms for the box modality and products multiply each scrutinee by a quantity q, providing that many copies of its subcomponents to the continuation, as in our rule ST-Case. For simplicity, we have omitted this feature; it is not difficult to add.

• Second, Abel and Bernardy [2020] require that the semiring include least upper bounds for the partial order of the semiring. This allows them to compose case branches with differing usages.
• Third, in the rule for case, like Abel and Bernardy [2020], we need the requirement that 1 ≤ q. In our system as well as theirs, it turns out that this requirement is not motivated by the standard type soundness theorem: the theorem holds without it. Their condition was instead motivated by their parametricity theorems. Our condition is motivated by the heap soundness theorem that we present in the next section.

The standard type soundness theorem is not very informative because it does not show that the quantities are correctly used. Therefore, to address this issue, we turn to a heap-based semantics, based on Launchbury [1993] and Turner and Wadler [1999], to account for resource usage during computation.

A heap semantics shows how a term evaluates when the free variables of the term are assigned other terms. The assignments are stored in a heap, represented here as an ordered list. We associate an allowed usage, basically an abstract quantity of resources, to each assignment. We change these quantities as the evaluation progresses. For example, a typical call-by-name reduction goes like this:

  [x ↦^3 1, y ↦^1 x + x] (x + y)
⇒ [x ↦^2 1, y ↦^1 x + x] 1 + y             (look up value of x, decrement its usage)
⇒ [x ↦^2 1, y ↦^0 x + x] 1 + (x + x)       (look up value of y, decrement its usage)
⇒ [x ↦^1 1, y ↦^0 x + x] 1 + (1 + x)       (look up value of x, decrement its usage)
⇒ [x ↦^0 1, y ↦^0 x + x] 1 + (1 + 1)       (look up value of x, decrement its usage)
⇒ [x ↦^0 1, y ↦^0 x + x] 3                 (addition step)

The reduction above is expressed informally as a sequence of pairs of heap H and expression a. We formalize this relation using the following judgement, which appears in Fig 2.

[H] a ⇒^r_S [H′; u′; Γ′] a′

The meaning of this relation is that r copies of the term a use the resources of the heap H and step to r copies of the term a′, with H′ being the new heap. The relation also maintains additional
The relation also maintains additionalinformation, which we explain below.Heap assignments are of the form x 𝑞 ↦→ Γ ⊢ a : A , associating an assignee variable with its allowedusage 𝑞 and assignment 𝑎 . The embedded context Γ and type A help in the proof of our soundnesstheorem (5.11). For a heap 𝐻 , we use ⌊ 𝐻 ⌋ to represent 𝐻 excluding the allowed usages and T 𝐻 U torepresent just the list of underlying assignments. We call ⌊ 𝐻 ⌋ and T 𝐻 U the erased and bare viewsof 𝐻 respectively. For example, for 𝐻 = [ x 𝑞 ↦→ Γ ⊢ a : A ] , the erased view ⌊ 𝐻 ⌋ = [ 𝑥 ↦→ Γ ⊢ a : A ] and the bare view T 𝐻 U = [ 𝑥 ↦→ 𝑎 ] and Γ is the embedded context. The vector of allowed usagesof the variables in 𝐻 is denoted by H .Because we use a call-by-name reduction, we don’t evaluate the terms in the heap; we just mod-ify the quantities associated with the assignments as they are retrieved. Therefore, after any step, H ′ will contain all the previous assignments of H , possibly with different usages. Furthermore, abeta-reduction step may also add new assignments to 𝐻 . To allocate new variable names appropri-ately, we need a support set S in this relation; fresh names are chosen avoiding the variables in thisset. We keep track of these new variables that are added to the heap along with the allowed usages We don’t have
Int type and + function in our language, but we use them for the sake of explanation. H ] a ⇒ 𝑟 S [ H ′ ; u ′ ; Γ ′ ] a ′ (Small-step reduction relation (excerpt)) Small-Var ≤ 𝑟 [ H , x ( 𝑞 + 𝑟 ) ↦→ Γ ⊢ a : A , H ] x ⇒ 𝑟 S [ H , x 𝑞 ↦→ Γ ⊢ a : A , H ; | H | ⋄ 𝑟 ⋄ | H | ; ∅] a Small-AppL [ H ] a ⇒ 𝑟 S ∪ fv b [ H ′ ; u ′ ; Γ ] a ′ [ H ] a b ⇒ 𝑟 S [ H ′ ; u ′ ; Γ ] a ′ b Small-AppBeta x ∉ Var H ∪ fv b ∪ fv a − { y } ∪ Sa ′ = a { x / y }[ H ] ( 𝜆 y : 𝑞 A . a ) b ⇒ 𝑟 S [ H , x 𝑟 · 𝑞 ↦→ Γ ⊢ b : A ; | H | ⋄ x : 𝑟 · 𝑞 A ] a ′ Small-CaseL [ H ] a ⇒ 𝑟 · 𝑞 S ∪ fv b ∪ fv b [ H ′ ; u ′ ; Γ ] a ′ [ H ] case 𝑞 a of b ; b ⇒ 𝑟 S [ H ′ ; u ′ ; Γ ] case 𝑞 a ′ of b ; b Small-Case1 [ H ] case 𝑞 ( inj a ) of b ; b ⇒ 𝑟 S [ H ; | H | ; ∅] b a Small-Case2 [ H ] case 𝑞 ( inj a ) of b ; b ⇒ 𝑟 S [ H ; | H | ; ∅] b a Small-Sub [ H ] a ⇒ 𝑟 S [ H ′ ; u ′ ; Γ ] a ′ H ≤ H [ H ] a ⇒ 𝑟 S [ H ′ ; u ′ ; Γ ] a ′ Fig. 2. Heap semantics (excerpt)
Notation 5.1. The notation 0^n denotes a vector of 0's of length n. The notation u1 ⋄ u2 denotes concatenation. Here fv a stands for the free variables of a, while Var H stands for the domain of H and the free variables of the terms appearing in the assignments of H.

of their assignments using the added context Γ′. Therefore, after a step [H] a ⇒^r_S [H′; u′; Γ′] a′, the length of H′ is the sum of the lengths of H and Γ′.

Now, because we work with an arbitrary semiring (possibly without subtraction), this heap semantics is non-deterministic. For example, consider a step [x ↦^q a] x ⇒ [x ↦^q′ a] a, where q = q′ + 1. Here, we are using x once, so we need to reduce its usage by 1. But in an arbitrary semiring, there may exist multiple new quantities, q″ ≠ q′, such that q = q′ + 1 = q″ + 1. For example, in the linearity semiring, we have ω = 1 + 1 = ω + 1. In this case, [x ↦^ω a] x ⇒ [x ↦^1 a] a and [x ↦^ω a] x ⇒ [x ↦^ω a] a.

The absence of subtraction also means that given an initial heap and a final heap, we really don't know how much resources have been used by the computation. The only way to know this is to keep track of resources while they are being used. The amount of resources used up can be expressed as a quantity vector u′, called the consumption vector, with its components showing usage at the corresponding variables in H′. (The length of u′ will always be the same as that of H′.) Instead of full contexts Γ′, we could have just used a list of variable/usage pairs here; but we pass dummy types along with them for ease of presentation later.

Finally, owing to the presence of case expressions that can use the scrutinee more than once, we need to be able to evaluate several copies of the scrutinee in parallel before passing them on to the appropriate branch. So we step r copies of a term a in parallel to get r copies of a′. We call r the copy quantity of the step. For the most part, we shall be interested in a copy quantity of 1.

Notation 5.2.
We use [H] a ⇒_S [H′; u′; Γ′] a′ to denote [H] a ⇒^1_S [H′; u′; Γ′] a′.
Small-Var thatallows a variable look-up, provided its usage permits. The look-up consumes the copy quantityfrom the allowed usage. But the copy quantity cannot be arbitrary here – we are consuming theresource at least once. So we restrict the copy quantity to be or more. This is the only rule thatmodifies the usage of an existing variable in the heap.In this relation, rule AppBeta loads new assignment into the heap instead of immediate substi-tution. The substitution happens in steps through variable look-ups. To avoid conflict, we choosenew variables excluding the ones already in use. Since we are evaluating 𝑟 copies, we set the al-lowed usage of the variable to 𝑟 · 𝑞 where 𝑞 is the usage annotation on the term.Let us look at rule Small-CaseL . This is interesting since the copy quantity in the premise andthe conclusion are different. In fact, we introduced copy quantity to properly handle usages whileevaluating case expressions. For evaluating 𝑟 copies of the case expression, we need to evaluate 𝑟 · 𝑞 copies of the scrutinee since the scrutinee gets used 𝑞 times in either branch.The rule Small-Sub reduces the allowed usages in the heap and then lets the term take a step.Here, we use H ≤ H to mean ⌊ H ⌋ = ⌊ H ⌋ and for corresponding pair of assignments x 𝑞 ↦→ Γ ⊢ a : A and x 𝑞 ↦→ Γ ⊢ a : A in H and H respectively, the condition 𝑞 ≤ 𝑞 holds.The multi-step reduction relation is the transitive closure of the single-step relation. In rule Multi-Many ,the consumption vectors from the steps are added up and the added contexts of new variables areconcatenated. The copy quantity is the same in both the premises and the conclusion since therule represents parallel multi-step evaluation of 𝑟 copies. 
[H] a ⇒⇒r_S [H′; u′; Γ′] b   (Multi-Step relation)

Multi-One
[H] a ⇒r_S [H′; u′; Γ′] b
────────────────────────────
[H] a ⇒⇒r_S [H′; u′; Γ′] b

Multi-Many
[H] a ⇒r_S [H′; u′₁; Γ′₁] b₁    [H′] b₁ ⇒⇒r_S [H′′; u′₂; Γ′₂] b₂
────────────────────────────────────────────────────────────────
[H] a ⇒⇒r_S [H′′; (u′₁ ⋄ 0^|Γ′₂|) + u′₂; Γ′₁, Γ′₂] b₂

The reduction relation enforces fair usage of resources, leading to the following theorem.
Theorem 5.3 (Conservation). If [H] a ⇒⇒r_S [H′; u′; Γ′] a′, then H′ + u′ ≤ H ⋄ Γ′.

Here, H represents the initial resources and Γ′ the newly added resources, whereas H′ represents the resources left over and u′ the resources that were consumed. So the theorem says that the initial resources, concatenated with those added during evaluation, equal or exceed the remaining resources plus those that were used up. Note that if the partial order is the trivial reflexive order, ≤ becomes an equality; in that scenario, the reduction relation enforces strict conservation of resources. More generally, this theorem states that we never use more resources than we are entitled to.

Unlike the substitution-based semantics, in this heap semantics terms can "get stuck" due to a lack of resources. Consider the following evaluation:

  [x 2↦ 1, y 1↦ x + x] x + y          look up value of x, decrement its usage
⇒ [x 1↦ 1, y 1↦ x + x] 1 + y          look up value of y, decrement its usage
⇒ [x 1↦ 1, y 0↦ x + x] 1 + (x + x)    look up value of x, decrement its usage
⇒ [x 0↦ 1, y 0↦ x + x] 1 + (1 + x)    look up value of x, stuck!

The evaluation gets stuck because the starting heap does not contain enough resources for the evaluation of the term: the term needs to use x thrice, whereas the heap contains only two copies of x.

But this is not the only way in which an evaluation can run out of resources. It may also happen through "unwise usage", even when the starting heap contains enough resources. For example, over the linearity semiring, the evaluation

[x ω↦ 1] x + (x + x) ⇒ [x 1↦ 1] 1 + (x + x) ⇒ [x 0↦ 1] 1 + (1 + x)

gets stuck because, in the first step, ω was "unwisely" split as 1 + 1 instead of being split as ω + 1.

Our aim, then, is to show that, given a heap that contains enough resources, a well-typed term that is not a value can always take a step such that the resulting heap contains enough resources for the evaluation of the resulting term.
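Stuck evaluations of this kind can be replayed with a tiny usage-counting interpreter over natural-number grades. The term representation and names below are our own; the point is only that a look-up decrements an allowed usage and fails at 0.

```python
# A minimal usage-counting evaluator over natural-number grades: each
# variable look-up decrements the allowed usage and fails at 0.
# Illustrative sketch; the term representation is an assumption.

class Stuck(Exception):
    pass

def lookup(heap, x):
    usage, body = heap[x]
    if usage == 0:
        raise Stuck(f"no remaining copies of {x}")
    heap[x] = (usage - 1, body)
    return body

def eval_(t, heap):
    kind = t[0]
    if kind == "lit":
        return t[1]
    if kind == "var":
        return eval_(lookup(heap, t[1]), heap)
    if kind == "add":
        return eval_(t[1], heap) + eval_(t[2], heap)
    raise ValueError(kind)

# [x 2|-> 1, y 1|-> x + x] x + y : needs x thrice, but only two copies exist.
heap = {"x": (2, ("lit", 1)), "y": (1, ("add", ("var", "x"), ("var", "x")))}
try:
    eval_(("add", ("var", "x"), ("var", "y")), heap)
except Stuck as e:
    print("stuck:", e)

# With x 3|-> 1 instead, the same term evaluates to 1 + (1 + 1) = 3.
heap = {"x": (3, ("lit", 1)), "y": (1, ("add", ("var", "x"), ("var", "x")))}
print(eval_(("add", ("var", "x"), ("var", "y")), heap))  # 3
```

The soundness theorem developed below guarantees that heaps blessed by the compatibility relation never reach the `Stuck` case.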
We shall formalize what it means for a heap to contain enough resources. But before that, let us explore the relationship between the various possible steps a term can take when provided with a heap.

Earlier, we pointed out that the step relation is non-deterministic. But on closer inspection, the non-determinism is limited, more or less, to the usages. If a term steps in two different ways when provided with a heap, the resulting terms are the same; the resulting heaps, though, may have different allowed-usage vectors. Here, we formulate a precise version of this statement.

A term, when provided with a heap, can step either by looking up a variable or by adding a new assignment. Now, if the heap does not contain duplicate assignments for the same variable, look-up will always produce the same result. We call such heaps proper. Note that the reduction relation maintains this property of heaps, so hereafter we restrict our attention to proper heaps. Next, if a term steps by adding a new assignment, we may choose different fresh variables, leading to different resulting terms. But such a difference is reconcilable: viewed as closures, such heap/term pairs are α-equivalent.

Given a heap H and a term a, let us call (⌈H⌉, a) a machine configuration, where ⌈H⌉ denotes H with its usage annotations and typing information erased, leaving a map from variables to terms. Two heap/term pairs are α-equivalent if the corresponding machine configurations are identical up to systematic renaming of assignee variables. We denote α-equivalence by ∼α.

The step relation, then, is deterministic in the following sense:

Lemma 5.4 (Determinism). If [H₁] a₁ ⇒r_S [H′₁; u′₁; Γ′₁] a′₁ and [H₂] a₂ ⇒r_S [H′₂; u′₂; Γ′₂] a′₂ and (H₁, a₁) ∼α (H₂, a₂), then (H′₁, a′₁) ∼α (H′₂, a′₂).

The lemma above says that if a term, when provided with a heap, takes a step in two different ways, then the resulting heap/term pairs are essentially the same. We know that the ordinary small-step semantics is deterministic.
The inclusion of allowed usages in the heap semantics is meant not to produce multiple reducts but to block evaluation at the point where consumption reaches its permitted limit. This point is important and needs elaboration.

Call a reduction consisting of n steps an n-chain reduction. Also, for a reduction [H] a ⇒r_S [H′; u′; Γ′] a′, call [⌈H⌉] a ⇒ [⌈H′⌉] a′ the machine view of the reduction. Then the machine view of every n-chain reduction of a term in a heap is the same, modulo α-equivalence. So, if there exists an n-chain reduction of a to a′ starting with heap H, we know that there is a way by which a can reduce to a′ without running out of resources, implying the validity of the reduction. In such a scenario, we may as well forget all the usage annotations and evaluate a for n steps starting with ⌈H⌉. By the above lemma, such an evaluation in this machine environment is deterministic and hence unique; the reduced heap/term pair that we get is the same (modulo α-equivalence). Along with the soundness theorem, this shall give us a deterministic reduction strategy that is correct.

Now that we have seen the equivalence of all the possible reducts, we explore their relation with the ordinary small-step reduct. The ordinary and the heap-based reduction relations are bisimilar, in a way we make precise below. To compare them, we need some definitions. We call a heap acyclic iff the term assigned to a variable refers neither to itself nor to any other variable appearing subsequently in the heap. Note that the reduction relation preserves acyclicity. Hereafter, we restrict our attention to proper, acyclic heaps. Now, for a heap H, define a{H} as the term obtained by substituting into a, in reverse order, the corresponding terms for the variables in the heap. Then we have the following lemmas:

Lemma 5.5. If [H] a ⇒r_S [H′; u′; Γ′] a′, then a{H} = a′{H′} or a{H} ❀ a′{H′}.
Further, if [∅] a ⇒r_S [H′; u′; Γ′] a′, then a ❀ a′{H′}.

Lemma 5.6. If a ❀ b, then for any heap H, we have [H] a ⇒r_S [H, H′; u′; Γ′] a′ where a′{H′} = b.

The heap reduction relation splits the ordinary β-reduction rules into an assignment-addition rule and a variable look-up rule. This enables the heap-based rules to substitute one occurrence of a variable at a time, while the ordinary β-rules substitute all occurrences of a variable at once. If we perform substitution immediately after loading a new assignment into the heap, then the heap-based rules and the ordinary step rules are essentially the same. The above lemmas formalize this idea.

The heap-based rules substitute one occurrence of a variable at a time, keep track of usage, and obstruct unfair usage. With this constraint in place, we ought to know how many resources shall be necessary before evaluating a term; this will tell us how many resources the starting heap should contain. The type system helps us know this, as we see next.

The key idea behind this language design is that, if the resources contained in a heap are judged to be "right" for a term by the type system, then the evaluation of the term in such a heap does not get stuck. With the heap-based reduction rules enforcing fairness of usage, this would mean that the type system does a proper accounting of the resource usage of terms.

The compatibility relation H ⊢ Δ; Γ presented below expresses the judgement that the heap H contains enough resources to evaluate any term that type-checks in the usage context Γ. A heap that is compatible with some context is called a well-formed heap.

H ⊢ Δ; Γ   (Heap Compatibility)

Compat-Empty
────────────
∅ ⊢ ∅; ∅

Compat-Cons
H ⊢ Δ; Γ + (q · Γ₁)    Δ; Γ₁ ⊢ a : A    x ∉ dom H
──────────────────────────────────────────────────
H, x q↦ Γ₁ ⊢ a : A ⊢ Δ, x : A; Γ, x :q A

The rule
Compat-Cons is reminiscent of the substitution lemma 4.7; in a way, this rule is the converse of the substitution lemma. It loads q potential single-substitutions into the heap and lets the context use the variable q times.

Example 5.7. Consider the following derivation (built bottom-up with rule Compat-Cons; we elide the types of the heap assignments):

∅ ⊢ ∅; ∅        ∅; ∅ ⊢ 1 : Int
x₁ 7↦ 1 ⊢ x₁ :7 Int        x₁ :2 Int ⊢ x₁ + x₁ : Int
x₁ 7↦ 1, x₂ 3↦ x₁ + x₁ ⊢ x₁ :1 Int, x₂ :3 Int        x₁ :1 Int, x₂ :2 Int ⊢ x₁ + (x₂ + x₂) : Int
x₁ 7↦ 1, x₂ 3↦ x₁ + x₁, x₃ 1↦ x₁ + (x₂ + x₂) ⊢ x₁ :0 Int, x₂ :1 Int, x₃ :1 Int

The context x₁ :7 Int splits its resources amongst the derivations of x₂ = x₁ + x₁ (loaded thrice, each copy using x₁ twice) and x₃ = x₁ + (x₂ + x₂) (loaded once, using x₁ once). The heap keeps a record, in the form of allowed usages, of how the context gets split. A heap compatible with a context therefore satisfies the resource demands of a term derived in that context.

We pointed out earlier that the rule Compat-Cons is like a converse substitution lemma. The following lemma formalizes this idea:
Lemma 5.8 (Multi-substitution).
If H ⊢ Δ; Γ and Δ; Γ ⊢ a : A, then ∅; ∅ ⊢ a{H} : A.

Because rule Compat-Cons leads to expansion (or reverse substitution), we can re-substitute while maintaining well-typedness. The compatibility relation is crucial to our development, so we explore it in more detail below.
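The multi-substitution a{H} can be sketched as reverse-order textual substitution. The heap below mirrors the shape of example 5.7 with assumed concrete grades and an assumed base value; the string-based representation is our own simplification.

```python
# Sketch of the multi-substitution a{H}: substitute heap assignments into a
# term in reverse order, so later variables (which may mention earlier ones)
# are eliminated first. Terms are plain strings; illustrative only.
import re

def subst(term, var, body):
    # Whole-word textual substitution, parenthesizing the body.
    return re.sub(rf"\b{var}\b", f"({body})", term)

def multi_subst(term, heap):
    """heap: ordered list of (variable, assigned-term) pairs, oldest first."""
    for var, body in reversed(heap):
        term = subst(term, var, body)
    return term

heap = [("x1", "1"), ("x2", "x1 + x1"), ("x3", "x1 + (x2 + x2)")]
closed = multi_subst("x3", heap)
print(closed)  # a closed arithmetic term; eval(closed) == 5
```

The result is closed, so (as in the multi-substitution lemma) it can be checked in the empty context.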
A heap can be viewed as a memory graph, where the assignee variables correspond to memory locations and the assigned terms to data stored in those locations. The allowed usage, then, is the number of ways the location can be referenced. This gives us a graphical view of the heap.

A heap H where H ⊢ Δ; Γ can be viewed as a weighted directed acyclic graph G_{H,Γ}. Let H contain n assignments, the j-th one being x_j q_j↦ Γ_j ⊢ a_j : A_j. Then G_{H,Γ} is a DAG with (n + 1) nodes: n nodes corresponding to the n variables in H and one extra node for Γ, referred to as the source node. Let v_j be the node corresponding to x_j and v_g be the source node. For x_i :q_ji A_i ∈ Γ_j (where i < j), add an edge with weight w(v_j, v_i) := q_ji from v_j to v_i. (Note that Γ_j only contains the variables x₁ through x_{j−1}.) We do this for all nodes, including v_g. This gives us a DAG with the topological ordering v_g, v_n, v_{n−1}, ..., v₂, v₁. (For simplicity, we omit the Δs from the compatibility and the typing judgements.)

For example 5.7, we have the following memory graph (0-weight edges, such as the one from v_g to v₁, are omitted): the source node v_g has edges of weight 1 to v₂ and to v₃; v₂ has an edge of weight 2 to v₁; and v₃ has edges of weight 2 to v₂ and of weight 1 to v₁.

For a heap compatible with some context, we can express the allowed usages of the assignee variables in terms of the edge weights of the memory graph. Let us define the length of a path to be the product of the weights along the path. Then the allowed usage of a variable is the sum of the lengths of all paths from the source node to the node corresponding to that variable. Note that this is so for the example graph.

A path p from v_g to v_j represents a chain of references, the last of which points at v_j. The length of p shows how many times this path is used to reference v_j. The sum of the lengths of all the paths from v_g to v_j then gives a (static) count of the total number of times location v_j is referenced. And this is equal to q_j, the allowed usage of the assignment for v_j in the heap.
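Assuming the concrete reading of example 5.7 used here (heap x₁ 7↦ 1, x₂ 3↦ x₁ + x₁, x₃ 1↦ x₁ + (x₂ + x₂), with the outer context granting x₂ and x₃ once each; these grades are our assumption), the path-sum characterization of counts can be checked mechanically. The graph encoding is our own.

```python
# Check, for an assumed concrete reading of example 5.7 (grades 7, 3, 1),
# that each allowed usage equals the sum, over all source-to-node paths,
# of the product of edge weights, and that the counts balance.

# edges[u] = list of (target, weight); "g" is the source node.
edges = {
    "g":  [("x2", 1), ("x3", 1)],   # the context grants x2 and x3 once each
    "x3": [("x1", 1), ("x2", 2)],   # x3's body uses x1 once and x2 twice
    "x2": [("x1", 2)],              # x2's body uses x1 twice
    "x1": [],
}

def path_sum(node, target):
    """Sum over all paths node -> target of the product of edge weights."""
    if node == target:
        return 1
    return sum(w * path_sum(nxt, target) for nxt, w in edges[node])

counts = {x: path_sum("g", x) for x in ("x1", "x2", "x3")}
print(counts)  # {'x1': 7, 'x2': 3, 'x3': 1}

# Count balance at each node: q_i = sum_j q_j * w(v_j, v_i) + w(v_g, v_i)
for i in counts:
    demand = sum(counts[j] * w for j in counts for t, w in edges[j] if t == i)
    demand += sum(w for t, w in edges["g"] if t == i)
    assert counts[i] == demand, i
print("count balance holds")
```

For instance, the three paths into v₁ have lengths 1·2, 1·1, and 1·2·2, which sum to the allowed usage 7.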
This means that the allowed usage of an assignment is equal to the (static) count of the number of times the concerned location is referenced. So we also call q_j the count of v_j, and we call this property count balance. Below, we present an algebraic formalization of this property of well-formed heaps.

Notation 5.9.
We use 0 to denote a row vector of 0s of length n (when n is clear from the context) and use 0⊺ to denote a column vector of 0s.

For a well-formed heap H containing n assignments of the form x_i q_i↦ Γ_i ⊢ a_i : A_i, we write ⟨H⟩ to denote the n × n matrix whose i-th row is Γ_i ⋄ 0. We call ⟨H⟩ the transformation matrix corresponding to H. The transformation matrix for example 5.7 is:

( 0 0 0
  2 0 0
  1 2 0 )

For a well-formed heap H, the matrix ⟨H⟩ is strictly lower triangular. Note that this is also the adjacency matrix of the memory graph, excluding node v_g. The strict lower-triangular property of the matrix corresponds to the acyclicity of the graph. With the matrix operations over a semiring defined in the usual way, the count balance property is:

Lemma 5.10 (Count Balance). If H ⊢ Δ; Γ, then H = H × ⟨H⟩ + Γ.

Here, H and Γ are read as the row vectors of allowed usages and of context usages, respectively.

Proof. We show this by induction on H ⊢ Δ; Γ. The base case is trivial. For the Cons case, let H′, x q↦ Γ₁ ⊢ a : A ⊢ Δ′, x : A; Γ, x :q A, where H′ ⊢ Δ′; Γ + (q · Γ₁). By the inductive hypothesis, H′ = H′ × ⟨H′⟩ + Γ + (q · Γ₁). Therefore,

H′ ⋄ q = (H′ ⋄ q) × [ ⟨H′⟩ 0⊺ ; Γ₁ 0 ] + Γ ⋄ q,

where the block matrix [ ⟨H′⟩ 0⊺ ; Γ₁ 0 ] (rows separated by ";") is the transformation matrix of the extended heap. □

For example 5.7, we can check that H = (7 3 1) satisfies the above equation. Let us understand this equation. For a node v_i in G_{H,Γ}, we can express the count q_i in terms of the counts of the incoming neighbours and the weights of the corresponding edges: q_i = Σ_j q_j · w(v_j, v_i) + w(v_g, v_i). The right-hand side of this equation represents a static estimate of demand, the amount of resources we shall need, while the left-hand side represents a static estimate of supply, the amount of resources we shall have. So H ⊢ Δ; Γ is a static guarantee that the heap H shall supply the resource demands of the context Γ.

Therefore, if H ⊢ Δ; Γ and Δ; Γ ⊢ a : A, we should be able to evaluate a in H without running out of resources. This is the gist of the soundness theorem.

5.8 Soundness

Theorem 5.11 (Soundness).
If H ⊢ Δ; Γ and Δ; Γ ⊢ a : A and S ⊇ dom Δ, then either a is a value or there exist a′, Γ′, H′, u′, Γ₁ such that:
• [H] a ⇒_S [H′; u′; Γ₁] a′
• H′ ⊢ Δ, ⌊Γ₁⌋; Γ′
• Δ, ⌊Γ₁⌋; Γ′ ⊢ a′ : A
• Γ′ + u′ + (0 ⋄ Γ₁) × ⟨H′⟩ ≤ Γ ⋄ 0 + u′ × ⟨H′⟩ + 0 ⋄ Γ₁

The soundness theorem states that our computations can go forward with the available resources, without ever getting stuck. Note that as the term a steps to a′, the typing context changes from Γ to Γ′. This is to be expected because, during the step, resources from the heap may have been consumed or new resources may have been added. For example, [x 1↦ unit] x ⇒ [x 0↦ unit] unit, and x :1 Unit ⊢ x : Unit while x :0 Unit ⊢ unit : Unit. Though the typing context may change, the new context, which type-checks the reduct, must be compatible with the new heap. This means that we can apply the soundness theorem again and again until we reach a value. At every step of the evaluation, the dynamics of our language aligns perfectly with the statics of the language. Graphically, as the evaluation progresses, the weights in the memory graph change, but the count balance property is maintained.

Furthermore, the old context and the new context are related according to the fourth clause of the theorem. For the moment, let the partial order be the trivial reflexive order. Then the clause reads:

Γ′ + u′ + (0 ⋄ Γ₁) × ⟨H′⟩ = Γ ⋄ 0 + u′ × ⟨H′⟩ + 0 ⋄ Γ₁

We can understand this equation through the following analogy. The contexts can be seen as engaged in a transaction with the heap. The heap pays the context 0 ⋄ Γ₁ and gets (0 ⋄ Γ₁) × ⟨H′⟩ resources in return; the context pays the heap u′ and gets u′ × ⟨H′⟩ resources in return. The equation is the "balance sheet" of this transaction.

For an arbitrary partial order, the transaction gets skewed in favour of the heap; meaning, the context gets less from the heap for what it pays.
This is so because the heap contains more resources than necessary, so it may "throw away" the extra resources.

This soundness theorem subsumes ordinary type soundness. In fact, we can derive the ordinary preservation and progress lemmas from this soundness theorem, using the bisimilarity of the two reduction relations and the multi-substitution property.

Corollary 5.12. If ∅; ∅ ⊢ a : A and a ❀ b, then ∅; ∅ ⊢ b : A.

Proof.
Since a ❀ b, for any S we have [∅] a ⇒_S [H′; u′; Γ′] b′ such that b′{H′} = b, by lemma 5.6. Since ∅; ∅ ⊢ a : A and ∅ ⊢ ∅; ∅ and a is not a value, we have H₁, Γ₁, Γ₂, u₁, a′ such that H₁ ⊢ ⌊Γ₂⌋; Γ₁ and [∅] a ⇒_S [H₁; u₁; Γ₂] a′ and ⌊Γ₂⌋; Γ₁ ⊢ a′ : A, by the soundness theorem. Now, since [∅] a ⇒_S [H′; u′; Γ′] b′ and [∅] a ⇒_S [H₁; u₁; Γ₂] a′, determinism gives us b′{H′} = a′{H₁}. Since H₁ ⊢ ⌊Γ₂⌋; Γ₁ and ⌊Γ₂⌋; Γ₁ ⊢ a′ : A, by multi-substitution we have ∅; ∅ ⊢ a′{H₁} : A. But a′{H₁} = b′{H′} and b′{H′} = b. Therefore, ∅; ∅ ⊢ b : A. □

Corollary 5.13. If ∅; ∅ ⊢ a : A, then a is a value or there exists b such that a ❀ b. (We present the proof of its dependent counterpart later.)

Proof. Since ∅; ∅ ⊢ a : A and ∅ ⊢ ∅; ∅, either a is a value or there exist H₁, Γ₁, u₁, a′ such that [∅] a ⇒_S [H₁; u₁; Γ₁] a′, in which case, by lemma 5.5, a ❀ a′{H₁}. □

Next, we apply the soundness theorem to derive some useful properties about usage.
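The restrictions on semirings developed next (whether 0 can ever be "used", and whether 1 behaves linearly) can be explored by brute force on small instances. The three finite encodings below, the linearity semiring, the boolean semiring, and {0, 1} with 1 + 1 = 0, are our own models.

```python
# Brute-force checks of two usage criteria over finite semirings, each
# encoded as explicit tables (the encodings are our own illustrations).

def make_semiring(elems, add, mul, leq):
    return {"elems": elems, "add": add, "mul": mul, "leq": leq}

# Linearity semiring {0, 1, w}: q <= w for all q; otherwise discrete.
LIN = make_semiring(
    ["0", "1", "w"],
    lambda a, b: b if a == "0" else (a if b == "0" else "w"),
    lambda a, b: "0" if "0" in (a, b) else ("w" if "w" in (a, b) else "1"),
    lambda a, b: a == b or b == "w",
)

# Boolean semiring {0, 1} with 1 + 1 = 1, ordered 0 <= 1.
BOOL = make_semiring(
    ["0", "1"],
    lambda a, b: "1" if "1" in (a, b) else "0",
    lambda a, b: "1" if a == b == "1" else "0",
    lambda a, b: a == b or (a, b) == ("0", "1"),
)

# {0, 1} with 1 + 1 = 0 (addition modulo 2), trivial reflexive order.
GF2 = make_semiring(
    ["0", "1"],
    lambda a, b: "0" if a == b else "1",
    lambda a, b: "1" if a == b == "1" else "0",
    lambda a, b: a == b,
)

def zero_unusable(s):
    """No q with q + 1 <= 0, i.e. 0 is not positive-or-more."""
    return not any(s["leq"](s["add"](q, "1"), "0") for q in s["elems"])

def one_linear(s):
    """No q != 0 with q + 1 <= 1."""
    return not any(s["leq"](s["add"](q, "1"), "1")
                   for q in s["elems"] if q != "0")

print(zero_unusable(LIN), one_linear(LIN))    # True True
print(zero_unusable(BOOL), one_linear(BOOL))  # True False: 1 + 1 = 1
print(zero_unusable(GF2))                     # False: 1 + 1 = 0
```

As discussed below, the boolean semiring fails the one-linear criterion because 1 + 1 = 1, and the modulo-2 semiring fails the zero-unusable criterion because 1 + 1 = 0.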
Till now, we have developed our theory over an arbitrary partially-ordered semiring. But an arbitrary semiring is too general a structure for deriving the theorems we are interested in. For example, the set {0, 1} with 1 + 1 = 0 and all other operations defined in the usual way is also a semiring. But such a semiring does not capture our notion of usage, since 0 is supposed to mean no usage and q (whenever q ≠ 0) is supposed to mean some usage. For 0 to mean no usage in a semiring Q, the equation q + 1 = 0 must have no solution. We call an element q′ ∈ Q positive (respectively, positive-or-more) iff q′ = q + 1 (respectively, q + 1 ≤ q′) for some q ∈ Q. The above condition then means that 0 is not positive. If we also have a partial order, the constraint q + 1 ≤ 0 must be unsatisfiable; meaning, 0 should not be positive-or-more. We call this the zero-unusable criterion. Henceforth, we restrict our attention to semirings that meet this criterion. The following lemmas formalize the ideas discussed here.

Lemma 6.1.
In a zero-unusable semiring, if [H] a ⇒_S [H′; u′; Γ′] a′ and x_i 0↦ Γ_i ⊢ a_i : A_i ∈ H, then the component u′(x_i) = 0 and x_i 0↦ Γ_i ⊢ a_i : A_i ∈ H′.

We see above that locations with count 0 cannot be referenced during computation; moreover, the count for such locations always remains 0. Now, if they cannot be referenced, what they contain should not matter. In other words, 0-graded variables do not affect the result of computation: two initial configurations that differ only in the assignments of some 0-graded variables produce identical results. This means that such assignments do not interfere with evaluation and are irrelevant.

Lemma 6.2 (Zero non-interference).
Let H_i1 = x_i 0↦ Γ₁ ⊢ a₁ : A and H_i2 = x_i 0↦ Γ₂ ⊢ a₂ : A. Then, in a zero-unusable semiring, if [H₁, H_i1, H₂] b ⇒_{S ∪ fv a₂} [H′₁, H_i1, H′₂; u′; Γ′] b′, then [H₁, H_i2, H₂] b ⇒_{S ∪ fv a₁} [H′₁, H_i2, H′₂; u′; Γ′] b′.

Note that not just 0-graded resources may be unusable: any s-graded resource for which the constraint q + 1 ≤ s is unsatisfiable is unusable. With respect to the security semirings described in Section 3.2, this means that data from any security level s for which 1 ⋢ s is unusable. This makes sense, since the default view of the type system is 1, or Public, so that data judged to be more secure (or incomparable) cannot be used at this level. This gives us the following lemma for the class of security lattices described in Section 3.2:
Lemma 6.3 (s non-interference). Let 1 ⋢ s in a security lattice. Let H_i1 = x_i s↦ Γ₁ ⊢ a₁ : A and H_i2 = x_i s↦ Γ₂ ⊢ a₂ : A. If [H₁, H_i1, H₂] b ⇒_{S ∪ fv a₂} [H′₁, H_i1, H′₂; u′; Γ′] b′, then [H₁, H_i2, H₂] b ⇒_{S ∪ fv a₁} [H′₁, H_i2, H′₂; u′; Γ′] b′.

Now let us look at locations with count 0 in memory graphs. The sum of the lengths of all paths from the source node to such a node must be 0. The zero-unusable criterion, along with the count balance property, implies that none of these paths has positive-or-more length. This means that, on any such path, the edge weights cannot all be positive-or-more.

The condition that 0 is not positive-or-more is a weaker version of a well-known constraint put on semirings. If 0 is a minimal element, a stronger constraint is zerosumfreeness. A semiring Q is said to be zerosumfree if, for any q₁, q₂ ∈ Q, the equation q₁ + q₂ = 0 implies q₁ = q₂ = 0. If we work with a zerosumfree semiring with 0 as a minimal element, we know that the length of any path from the source node to a node with count 0 is itself 0. But the edge weights along such a path may all be non-zero, because the product of two non-zero elements may be 0. If we disallow this, then there is no path at all from the source node to such a node (0-weight edges being omitted). Semirings which satisfy q₁ · q₂ = 0 ⟹ q₁ = 0 or q₂ = 0 are called entire. With these constraints on the semiring, we have the following lemma:

Lemma 6.4.
In a zerosumfree, entire semiring with 0 as a minimal element, if H ⊢ Δ; Γ and x_i 0↦ Γ_i ⊢ a_i : A_i ∈ H, then v_i (the node corresponding to x_i) belongs to an isolated subgraph (of G_{H,Γ}) that does not contain the source node.

The lemma above says that all the 0-count assignments lie in isolated islands, disconnected from the line of computation. So, at any point, it is safe to garbage-collect all such assignments.

Let us now look at linearity. Just having a 1 in the semiring is not enough to capture a notion of linearity. For example, 1 does not really represent linear usage in the boolean semiring, since 1 + 1 = 1. If 1 must mean linear usage, then it cannot be equal to or greater than the successor of any quantity other than 0, where the successor of q is defined as q + 1. Formally, the pair of constraints q + 1 ≤ 1 and q ≠ 0 must have no solution. We call this the one-linear criterion. In semirings that meet the zero-unusable and one-linear criteria, 1 represents single usage.

Mirroring our discussion of 0-usage, we strengthen the one-linear criterion to derive a useful property about nodes with a count of 1 in memory graphs. Let us call semirings obeying the following constraints linear:
• q₁ + q₂ = 1 ⟹ (q₁ = 0 and q₂ = 1) or (q₁ = 1 and q₂ = 0)
• q₁ · q₂ = 1 ⟹ q₁ = q₂ = 1

For entire, zerosumfree, linear semirings with 0 and 1 as minimal elements, we have the following property:

Lemma 6.5 (Quantitative single-pointer property).
If H ⊢ Δ; Γ and x_i 1↦ Γ_i ⊢ a_i : A_i ∈ H, then in G_{H,Γ} there is a single path p from the source node to v_i, and all the weights on p are 1. Further, for any node v_j on p, the subpath of p ending at v_j is the only path from the source node to v_j.

Along with the soundness theorem, this gives us a quantitative version of the single-pointer property. In words, it means that there is one and only one way to reference a linear resource, and any resource along the way has a single pointer to it. This property would enable one to carry out safe in-place updates of linear resources.

Now that we have explored a graded simple type system, we move on to dependent types. (Our terminology follows Golan [1999]. The zerosumfree property is sometimes called "positivity" and the entire property the "zero-product" property.)

7 GRADED DEPENDENT TYPES
In this section we define
GraD, a language with graded dependent types. The syntax is presented below.
GraD uses a single syntactic category for terms and types.

terms, types  a, b, A, B ::= Type | x
    | Unit | unit | let unit = a in b
    | Π x :q A. B | λ x :q A. a | a b
    | Σ x :q A. B | (a, b) | let (x, y) = a in b
    | A₁ ⊕ A₂ | inj₁ a | inj₂ a | case_q a of b₁; b₂

The rules of this type system, shown in Figure 3 on page 20, are inspired by the Pure Type Systems of Barendregt [Barendregt 1993]. However, for simplicity, this system includes only a single sort,
Type and a single axiom
Type : Type. We annotate Barendregt's system with quantities, and we add the unit type, sums, and sigma types. Note that the rule
T-Convert uses the definitional equivalence relation, which is essentially β-equivalence. This relation is axiomatically specified in Section 9.1.

The key idea of this design is that quantities count only the runtime usage of variables. In a judgement Δ; Γ ⊢ a : A, the quantities recorded in Γ should be derived only from the parts of a that are needed during computation. All other uses of these variables, whether in the type A, in irrelevant parts of a, or in types that appear later in the context, should not contribute to this count.

This distinction is significant because, in a dependently-typed system, terms may appear in types. As a result, the typing rules must ensure that both terms and types are well-formed during type checking. Therefore, the type system must include premises of the form Δ; Γ ⊢ A : Type that hold when A is a well-formed type. But we do not want to add this usage to the usage of the term. What this means for the type system is that any usage of a context to check an irrelevant component, like a type, should be multiplied by 0, just like the irrelevant argument in example 4.6. For example, in the rule for variables, rule T-Var, any uses of the context Γ to check the type A are discarded (multiplied by 0) in the resulting derivation. Similarly, in the rule for weakening, rule T-Weak, we check that the type of the weakened variable is well-formed using some context Γ₂ that is compatible with Γ (same Δ). Since Γ + 0 · Γ₂ = Γ, the usage Γ₂ does not appear in the conclusion of the rule. Many rules follow this pattern of checking types with some usage-unconstrained context, including Γ₂ in rule T-convert and rule
T-lam, and Γ₃ in rule T-UnitElim. This rule
T-UnitElim implements a form of dependent pattern matching. Here, the type of the branch can observe that the eliminated term a is equal to the pattern unit. To support this refinement, the result type B must type check with a free variable y of type Unit. The other elimination rules, rule
T-SigmaElim and rule
T-SumElim , also follow this style of dependent pattern matching.
Irrelevant Quantification.
Now consider the rule
T-pi.

T-pi
Δ; Γ₁ ⊢ A : Type    Δ, x : A; Γ₂, x :r A ⊢ B : Type
────────────────────────────────────────────────────
Δ; Γ₁ + Γ₂ ⊢ Π x :q A. B : Type

(This definition corresponds to λ∗, which is "inconsistent" in the sense that all types are inhabited. However, this inconsistency does not interfere with the syntactic properties of the system that we are interested in as a core for Dependent Haskell.)

Δ; Γ ⊢ a : A   (Typing rules for
GraD , a graded dependent type system)
T-sub
Δ; Γ₁ ⊢ a : A    Γ₁ ≤ Γ₂
─────────────────────────
Δ; Γ₂ ⊢ a : A

T-weak
x ∉ dom Δ    Δ; Γ ⊢ a : B    Δ; Γ₂ ⊢ A : Type
──────────────────────────────────────────────
Δ, x : A; Γ, x :0 A ⊢ a : B

T-convert
Δ; Γ ⊢ a : A    Δ; Γ₂ ⊢ B : Type    A ≡ B
──────────────────────────────────────────
Δ; Γ ⊢ a : B

T-type
──────────────────────
∅; ∅ ⊢ Type : Type

T-var
x ∉ dom Δ    Δ; Γ ⊢ A : Type
─────────────────────────────
Δ, x : A; 0 · Γ, x :1 A ⊢ x : A

T-Unit
──────────────────────
∅; ∅ ⊢ Unit : Type

T-unit
──────────────────────
∅; ∅ ⊢ unit : Unit

T-UnitElim
Δ; Γ₁ ⊢ a : Unit    Δ; Γ₂ ⊢ b : B{unit/y}    Δ, y : Unit; Γ₃, y :r Unit ⊢ B : Type
───────────────────────────────────────────────────────────────────────────────────
Δ; Γ₁ + Γ₂ ⊢ let unit = a in b : B{a/y}

T-pi
Δ; Γ₁ ⊢ A : Type    Δ, x : A; Γ₂, x :r A ⊢ B : Type
────────────────────────────────────────────────────
Δ; Γ₁ + Γ₂ ⊢ Π x :q A. B : Type

T-lam
Δ, x : A; Γ, x :q A ⊢ a : B    Δ; Γ₂ ⊢ A : Type
────────────────────────────────────────────────
Δ; Γ ⊢ λ x :q A. a : Π x :q A. B

T-app
Δ; Γ₁ ⊢ a : Π x :q A. B    Δ; Γ₂ ⊢ b : A
─────────────────────────────────────────
Δ; Γ₁ + q · Γ₂ ⊢ a b : B{b/x}

T-Sigma
Δ; Γ₁ ⊢ A : Type    Δ, x : A; Γ₂, x :r A ⊢ B : Type
────────────────────────────────────────────────────
Δ; Γ₁ + Γ₂ ⊢ Σ x :q A. B : Type

T-Tensor
Δ; Γ₁ ⊢ a : A    Δ; Γ₂ ⊢ b : B{a/x}    Δ, x : A; Γ₃, x :r A ⊢ B : Type
───────────────────────────────────────────────────────────────────────
Δ; q · Γ₁ + Γ₂ ⊢ (a, b) : Σ x :q A. B

T-SigmaElim
Δ; Γ₁ ⊢ a : Σ x :q A₁. A₂
Δ, x : A₁, y : A₂; Γ₂, x :q A₁, y :1 A₂ ⊢ b : B{(x, y)/z}
Δ, z : (Σ x :q A₁. A₂); Γ₃, z :r (Σ x :q A₁. A₂) ⊢ B : Type
────────────────────────────────────────────────────────────
Δ; Γ₁ + Γ₂ ⊢ let (x, y) = a in b : B{a/z}

T-sum
Δ; Γ₁ ⊢ A₁ : Type    Δ; Γ₂ ⊢ A₂ : Type
───────────────────────────────────────
Δ; Γ₁ + Γ₂ ⊢ A₁ ⊕ A₂ : Type
T-inj1
Δ; Γ ⊢ a : A₁    Δ; Γ₂ ⊢ A₂ : Type
───────────────────────────────────
Δ; Γ ⊢ inj₁ a : A₁ ⊕ A₂

T-inj2
Δ; Γ ⊢ a : A₂    Δ; Γ₂ ⊢ A₁ : Type
───────────────────────────────────
Δ; Γ ⊢ inj₂ a : A₁ ⊕ A₂

T-SumElim
1 ≤ q    Δ; Γ₁ ⊢ a : A₁ ⊕ A₂
Δ; Γ₂ ⊢ b₁ : Π x :q A₁. B{inj₁ x/y}    Δ; Γ₂ ⊢ b₂ : Π x :q A₂. B{inj₂ x/y}
Δ, y : A₁ ⊕ A₂; Γ₃, y :r A₁ ⊕ A₂ ⊢ B : Type
───────────────────────────────────────────────────────────────────────────
Δ; q · Γ₁ + Γ₂ ⊢ case_q a of b₁; b₂ : B{a/y}

Fig. 3. Typing rules for the dependent, quantitative type system

In particular, note that the usage annotation on the type itself (q) is different from r, which records how many times x is used in B. The annotation q tracks the usage of the argument in the body of a function with this type, and this usage may have no relation to the usage of x in the body of the type itself. This difference between q and r allows GraD to represent parametric polymorphism by marking type arguments with usage 0. For example, the analogue of the System F type ∀α. α → α can be expressed in this system as Π x :0 Type. x → x. This type is well-formed because, even though the annotation on the variable x is 0, the rule allows x to be used any number of times in the body of the type.

Some versions of irrelevant quantifiers in type theories constrain r to be equal to q [Abel and Scherer 2012]. By coupling the usage of variables in the body of the lambda with the result type of the Π, these systems rule out the representation of polymorphic types, such as the one shown above. Here, we can model this more restricted form of quantifier with the assistance of the box modality. If, instead of using the type Π x :0 A. B, we use the type Π x :1 (✷₀ A). B, we can force the result type to also make no (relevant) use of the argument within B: the box x can be unboxed as many times as desired, but each unboxing must be used exactly 0 times.

It is this distinction between the types Π x :0 A. B and Π x :1 (✷₀ A). B (and a similar distinction between Σ x :0 A. B and Σ x :1 (✷₀ A). B) that motivates our inclusion of the usage annotation on the Π and Σ types directly.
In the simple type system, we could derive usage-annotated functions from linear functions and the box modality: there was no need to annotate the arrow with any quantity other than 1. But here, due to dependency, we cannot have parametrically polymorphic types without this additional form. On the other hand, in the presence of usage-annotated Σ-types, we do not need to include the box modality: we can encode the type ✷_q A as the non-dependent tensor Σ x :q A. Unit. Thus, we can eliminate this special form from the language.
We have proven, in Coq, the following properties of the dependently-typed system. First, well-formed terms have well-formed types; however, the resources used by such types are unrelated to those of the terms.
Lemma 7.1 (Regularity). If Δ ; Γ ⊢ a : A then there exists some Γ ′ such that Δ ; Γ ′ ⊢ A : Type . Next, we generalize the substitution lemma for the simple version to this system, by propagatingit through the context and type.
Lemma 7.2 (Substitution). If Δ; Γ₁ ⊢ a : A and Δ, x : A, Δ₂; Γ₂, x :q A, Γ₃ ⊢ b : B, then Δ, Δ₂{a/x}; (Γ₂ + q · Γ₁), Γ₃{a/x} ⊢ b{a/x} : B{a/x}.

Furthermore, even though we have an explicit weakening rule in this system, we also show that we can weaken with a zero-annotated fresh variable anywhere in the judgement.
Lemma 7.3 (Weakening). If Δ₁, Δ₂; Γ₁, Γ₂ ⊢ a : A and Δ₁; Γ ⊢ B : Type, then Δ₁, x : B, Δ₂; Γ₁, x :0 B, Γ₂ ⊢ a : A.

With a small-step relation that is identical to that of the simply-typed version, we have the following type soundness theorems.
Theorem 7.4 (Preservation). If Δ; Γ ⊢ a : A and a ❀ a′, then Δ; Γ ⊢ a′ : A.

Theorem 7.5 (Progress). If ∅; ∅ ⊢ a : A, then either a is a value or there exists some a′ such that a ❀ a′.

Now, akin to the simple version, we develop a heap semantics for the dependent version.

8 HEAP SEMANTICS FOR GRAD
The presence of dependent types causes one issue for the heap semantics: because substitutions are delayed through the heap, the terms and their types can "get out of sync". For example, if we apply the polymorphic identity function λ x :0 Type. λ y :1 x. y to the type argument Unit, then the result (λ x :0 Type. λ y :1 x. y) Unit should have type Π y :1 Unit. Unit. By the rule
Small-AppBeta, [∅] (λ x :0 Type. λ y :1 x. y) Unit ⇒ [x 0↦ Unit] λ y :1 x. y. The term λ y :1 x. y has type Π y :1 x. x. But since x = Unit, we see that Π y :1 x. x ≡ Π y :1 Unit. Unit; as such, λ y :1 x. y can also be assigned the type Π y :1 Unit. Unit. So, to align the types of the redex and the reduct, we need to know about the new assignments loaded into the heap. This issue did not exist in the simple setting, since there the types did not depend on term variables. Note that this is not a usage-related issue: any heap-based reduction that delays substitution will need to address it while proving soundness. The good news is that it can be resolved with a simple extension to the type system.
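The realignment of delayed substitutions with types can be previewed concretely: unfold recorded assignments into a type, in reverse order, before comparing for equality. The string-based representation below is purely illustrative.

```python
# Sketch of definitional unfolding: substitute each recorded definiens for
# its variable, in reverse order, before comparing two types.
# String-based and illustrative only.
import re

def unfold(ty, defs):
    """defs: ordered list of (variable, definiens) pairs, newest last."""
    for var, body in reversed(defs):
        ty = re.sub(rf"\b{var}\b", body, ty)
    return ty

# After ([] (\x:Type. \y:x. y) Unit  =>  [x |-> Unit] \y:x. y),
# the reduct's type mentions x, but the context records x = Unit:
defs = [("x", "Unit")]
lhs = unfold("Pi y : x . x", defs)        # type of the reduct
rhs = unfold("Pi y : Unit . Unit", defs)  # expected type of the redex
print(lhs == rhs, lhs)  # prints: True Pi y : Unit . Unit
```

After unfolding, both sides are syntactically equal, which is exactly what the modified conversion rule introduced below exploits.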
We extend our contexts with definitions that mimic delayed substitutions. These definitions are used only in deriving type equalities. From the type system perspective, they are essentially a bookkeeping device added to enable reasoning with respect to the heap semantics.

usage contexts Γ ::= ∅ | Γ , x :𝑞 A | Γ , x = a :𝑞 A
contexts Δ ::= ∅ | Δ , x : A | Δ , x = a : A

Along with this extension to the context, we modify the conversion rule and add two new typing rules to the system, as shown below. (In rule
T-conv , A { Δ } denotes the type obtained by substituting in 𝐴 , in reverse order, the definiens in place of the variables for the definitions in Δ .) Δ ; Γ ⊢ a : A (Typing rules for dependent system with definitions)
T-conv Δ ; Γ ⊢ a : A Δ ; Γ ⊢ B : Type A { Δ } ≡ B { Δ } Δ ; Γ ⊢ a : B T-def x ∉ dom Δ Δ ; Γ ⊢ a : A Δ , x = a : A ; 0 · Γ , x = a : A ⊢ x : A T-weak-def x ∉ dom Δ Δ ; Γ ⊢ b : B Δ ; Γ ⊢ a : A Δ , x = a : A ; Γ , x = a : A ⊢ b : B The definitions act like usual variable assumptions: rule
T-def and rule
T-weak-def mirror rule
T-var and rule
T-weak respectively. They are applied only during the conversion rule
T-conv that substitutes out these definitions before comparing for 𝛽 -equivalence. This modified rule means that the term 𝜆 y : x . y can be given the type Π y : Unit . Unit in a context that defines x to be Unit . The extended type system too has the syntactic soundness properties mentioned in Section 7.2. Furthermore, because definitions act only on types, definitions do not add extra resource demands to the typing derivation. As a result, we can always convert a normal variable assumption to include some definition as long as the definiens type checks. Furthermore, the resources used by the definiens ( Γ below) are unimportant.

Lemma 8.1 (InsertEq). If Δ , x : A , Δ ; Γ , x :𝑞 A , Γ ⊢ b : B and Δ ; Γ ⊢ a : A, then Δ , x = a : A , Δ ; Γ , x = a :𝑞 A , Γ ⊢ b : B.

Contexts can also be weakened with new (unused) definitions, analogous to lemma 7.3.
Lemma 8.2 (Weakening with Definitions). If Δ , Δ ; Γ , Γ ⊢ b : B and Δ ; Γ ⊢ a : A then Δ , x = a : A , Δ ; Γ , x = a : A , Γ ⊢ b : B.

Because we have modified the contexts to include definitions, we need to modify the heap reduction and compatibility relations. Only the reduction rules that load new assignments into the heap need the added context of new variables ( Γ ) to remember their assignments. For example, the rule Small-AppBeta is modified as below.

[ H ] a ⇒𝑞S [ H′ ; u′ ; Γ ] a′ (SmallStep with definitions)

Small-DAppBeta x ∉ Var H ∪ fv b ∪ fv a − { y } ∪ S a′ = a { x / y } [ H ] ( 𝜆 y :𝑞 A . a ) b ⇒𝑟S [ H , x 𝑟·𝑞 ↦→ Γ ⊢ b : A ; | H | ⋄ x = b :𝑟·𝑞 A ] a′

Similarly, rule
Compat-Cons needs to track more information in the context.

H ⊢ Δ ; Γ (Compatibility with definitions)

Compat-ConsDef H ⊢ Δ ; Γ + ( 𝑞 · Γ ) Δ ; Γ ⊢ a : A x ∉ dom H H , x 𝑞 ↦→ Γ ⊢ a : A ⊢ Δ , x = a : A ; Γ , x = a :𝑞 A

Note that with this modification, if H ⊢ Δ ; Γ then b { H } = b { Δ } for any term 𝑏 . These are all the changes we need. Since the added context of new variables does not play a major role in the step relation, all the previously stated lemmas regarding this relation hold. But with dependency, the multi-substitution lemma 5.8 needs to be modified to also substitute into the type (in addition to the term).

Lemma 8.3 (Multi-substitution).
If H ⊢ Δ ; Γ and Δ ; Γ ⊢ a : A, then ∅ ; ∅ ⊢ a { H } : A { H }.

Another point worth noting here is that, if [ H ] a ⇒𝑞S [ H′ ; u′ ; Γ′ ] a′ then a { H } ≡ a′ { H′ } by lemma 5.5 and definition 9.1. Before moving further, let us reflect on how the original typing and heap compatibility judgements relate to their extended counterparts. For the sake of distinction, let us denote the original relations by ⊢𝑜 . Now, for H ⊢𝑜 Δ ; Γ , let ΔH and ΓH be Δ and Γ respectively with their variables defined according to (assignments in) 𝐻 . Also, let HH denote 𝐻 with the variables in the embedded contexts in 𝐻 defined according to 𝐻 . Then, we have:

Lemma 8.4 (Elaboration).
If H ⊢𝑜 Δ ; Γ and Δ ; Γ ⊢𝑜 a : A, then HH ⊢ ΔH ; ΓH and ΔH ; ΓH ⊢ a : A.

By virtue of this elaboration, soundness for the extended system implies soundness for the original one.
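As a very rough intuition for the accounting that the upcoming soundness theorem enforces, here is a toy usage-tracking heap in Python. This is an illustrative model only: the class, names, and error-raising discipline are our own, and the formal small-step relation below is far more refined.

```python
# A toy usage-tracking heap (an illustration only; the paper's small-step
# relation is much more refined). Each entry carries a remaining-usage
# grade; a lookup consumes one unit and overuse raises an error, mirroring
# the idea that well-graded programs never overdraw their resources.

OMEGA = float("inf")  # the unrestricted grade

class UsageError(Exception):
    pass

class Heap:
    def __init__(self):
        self.entries = {}  # name -> [remaining_grade, value]

    def alloc(self, name, grade, value):
        self.entries[name] = [grade, value]

    def lookup(self, name):
        entry = self.entries[name]
        if entry[0] < 1:
            raise UsageError(name + " used beyond its grade")
        if entry[0] != OMEGA:  # ω-graded entries are never depleted
            entry[0] -= 1
        return entry[1]

heap = Heap()
heap.alloc("x", 1, "a linear resource")       # grade 1: use exactly once
heap.alloc("z", 0, "an irrelevant resource")  # grade 0: erased at run time
heap.alloc("w", OMEGA, "unrestricted")

assert heap.lookup("x") == "a linear resource"
assert heap.lookup("w") == heap.lookup("w")   # ω admits repeated use
try:
    heap.lookup("x")  # a second use of a linear entry is rejected
    overused = False
except UsageError:
    overused = True
assert overused
```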
Now we prove the heap soundness theorem for
GraD . However, we first state some subordinate lemmas that are required in the proof. The following lemma allows us to throw away resources from heaps.
Lemma 8.5 (Sub-heaping).
If H ⊢ Δ ; Γ and Γ′ ≤ Γ , then there exists 𝐻′ such that H′ ⊢ Δ ; Γ′ and 𝐻′ ≤ 𝐻 .

We can insert new definitions into the heap.

Lemma 8.6 (SmallStep weakening). If [ H , H ] a ⇒𝑟S∪{x} [ H′ , H′ ; u ⋄ u ; Γ ] a′ and | H′ | = | u | = | H | and x ∉ dom H , H and fv a ∩ dom H = ∅ , then [ H , x 𝑞 ↦→ Γ ⊢ a : A , H ] a ⇒𝑟S [ H′ , x 𝑞 ↦→ Γ ⊢ a : A , H′ ; u′ ⋄ ⋄ u ; Γ ] a′

Lemma 8.7 (Compatibility weakening).
Let H , H ⊢ Δ , Δ′ ; (( 𝑟 · Γ ) + Γ ) , Γ and Δ ; Γ ⊢ a : A and | H | = | Γ | = | Δ | and x ∉ dom H , H . Let H′ be H with the embedded contexts weakened by inserting x = a : A at the | Δ | position. Then, H , x 𝑟 ↦→ Γ ⊢ a : A , H′ ⊢ Δ , x = a : A , Δ′ ; Γ , x = a :𝑟 A , Γ

The soundness theorem follows as an instance of the invariance lemma below. The lemma provides a strong enough hypothesis for the induction to go through.
Lemma 8.8 (Invariance).
If H ⊢ Δ ; Γ + 𝑞 · Γ and Δ ; Γ ⊢ a : A and 1 ≤ 𝑞 and dom Δ ⊆ S, then either 𝑎 is a value or there exists Γ′ , H′ , u′ , Γ and a′ such that:
• [ H ] a ⇒𝑞S [ H′ ; u′ ; Γ ] a′
• H′ ⊢ Δ , ⌊ Γ ⌋ ; ( Γ , · Γ ) + 𝑞 · Γ′
• Δ , ⌊ Γ ⌋ ; Γ′ ⊢ a′ : A
• 𝑞 · Γ′ + u′ + ⋄ Γ × ⟨ 𝐻′ ⟩ ≤ 𝑞 · ( Γ ⋄ ) + u′ × ⟨ 𝐻′ ⟩ + ⋄ Γ
• dom Γ is disjoint from 𝑆

Proof.
Let H ⊢ Δ ; Γ + 𝑞 · Γ and Δ ; Γ ⊢ a : A . We prove this lemma by induction on the typing judgement Δ ; Γ ⊢ a : A .
• rule T-sub
Let Δ ; Γ ⊢ a : A where Δ ; Γ ⊢ a : A and Γ ≤ Γ . Further, H ⊢ Δ ; Γ + 𝑞 · Γ . Since H ⊢ Δ ; Γ + 𝑞 · Γ and Γ ≤ Γ , by lemma 8.5, there exists 𝐻′ such that H′ ⊢ Δ ; Γ + 𝑞 · Γ and 𝐻′ ≤ 𝐻 . By the inductive hypothesis, [ H′ ] a ⇒𝑞S [ H′′ ; u ; Γ ] a′′ . Since 𝐻′ ≤ 𝐻 , we have [ H ] a ⇒𝑞S [ H′′ ; u ; Γ ] a′′ . The remaining clauses follow from the inductive hypothesis and the fact that Γ ≤ Γ .
• We don’t need to consider rule
T-var and rule
T-weak since for H ⊢ Δ ; Γ , any variable 𝑥 ∈ dom Δ should be a definition of the form x = a : A . • rule T-def
Let Δ , x = a : A ; 0 · Γ , x = a : A ⊢ x : A where Δ ; Γ ⊢ a : A and x ∉ dom Δ . Further, H ⊢ Δ , x = a : A ; Γ + 𝑞 · ( · Γ , x = a : A ) . Let Γ = Γ ′ , x = a : 𝑟 A . So H ⊢ Δ , x = a : A ; Γ ′ , x = a : ( 𝑟 + 𝑞 ) A .Therefore, 𝐻 = H , x ( 𝑟 + 𝑞 ) ↦→ Γ ⊢ a : A where Δ ; Γ ⊢ a : A .Since ≤ 𝑞 , we have, [ H , x ( 𝑟 + 𝑞 ) ↦→ Γ ⊢ a : A ] x ⇒ 𝑞 S [ H , x 𝑟 ↦→ Γ ⊢ a : A ; | H | ⋄ 𝑞 ; ∅] a .Also, H , x 𝑟 ↦→ Γ ⊢ a : A ⊢ Δ , x = a : A ; ( Γ ′ , x = a : 𝑟 A ) + 𝑞 · ( Γ , x = a : A ) . By weakening,we have, Δ , x = a : A ; Γ , x = a : A ⊢ a : A . The fourth clause: 𝑞 · ( Γ ⋄ ) + ( ⋄ 𝑞 ) ≤ ⋄ 𝑞 + ( ⋄ 𝑞 ) × (cid:0) h 𝐻 i ⊺ Γ (cid:1) follows by reflexivity. • rule T-weak-def
Let Δ , x = a : A ; Γ , x = a : A ⊢ b : B where Δ ; Γ ⊢ b : B and Δ ; Γ ⊢ a : A and x ∉ dom Δ . Further, H ⊢ Δ , x = a : A ; Γ + 𝑞 · ( Γ , x = a : A ) . Let Γ = Γ ′ , x = a : 𝑟 A . So H ⊢ Δ , x = a : A ; Γ ′ + 𝑞 · Γ , x = a : 𝑟 A . Therefore, 𝐻 = H , x 𝑟 ↦→ Γ ⊢ a : A where Δ ; Γ ⊢ a : A .Also, H ⊢ Δ ; Γ ′ + 𝑞 · Γ + 𝑟 · Γ .Applying the inductive hypothesis, we get, [ H ] b ⇒ 𝑞 S ∪{ x } [ H ′ , H ; u ⋄ u ; Γ ] b ′ and H ′ , H ⊢ Δ , ⌊ Γ ⌋ ; ( Γ ′ + 𝑟 · Γ , · Γ ) + 𝑞 · ( Γ ′ , Γ ′′ ) and Δ , ⌊ Γ ⌋ ; Γ ′ , Γ ′′ ⊢ b ′ : B . Here, | H ′ | = | u | = | H | = | Γ ′ | = | Δ | . Now, note that 𝑥 does not appear in H . Let H ′ be H with the embedded contexts eakened by inserting x = a : A at the | Δ | position. Then, by 8.7, H ′ , x 𝑟 ↦→ Γ ⊢ a : A , H ′ ⊢ Δ , x = a : A , ⌊ Γ ⌋ ; ( Γ ′ , x = a : 𝑟 A , · Γ ) + 𝑞 · ( Γ ′ , x = a : A , Γ ′′ ) . Because extra assign-ments do not impact the evaluation, we have, by 8.6, [ H , x 𝑟 ↦→ Γ ⊢ a : A ] b ⇒ 𝑞 S [ H ′ , x 𝑟 ↦→ Γ ⊢ a : A , H ′ ; u ⋄ ⋄ u ; Γ ] b ′ . Also, by weakening lemma 8.2, Δ , x = a : A , ⌊ Γ ⌋ ; Γ ′ , x = a : A , Γ ′′ ⊢ b ′ : B . The fourth clause follows from the inductive hypothesis after inserting atthe | Δ | -position on both sides. • rule T-app
Let Δ ; Γ + 𝑟 · Γ ⊢ b a : B { a / x } where Δ ; Γ ⊢ b : Π x : 𝑟 A . B and Δ ; Γ ⊢ a : A . Further, H ⊢ Δ ; Γ + 𝑞 · ( Γ + 𝑟 · Γ ) . Now, there are two cases to consider depending on whether 𝑏 isa value or not. – 𝑏 is not a value.In this case, we get from the inductive hypothesis, [ H ] b ⇒ 𝑞 S ∪ fv a [ H ′ ; u ′ ; Γ ] b ′ and Δ , ⌊ Γ ⌋ ; Γ ′ ⊢ b ′ : Π x : 𝑟 A . B and H ′ ⊢ Δ , ⌊ Γ ⌋ ; ( Γ + ( 𝑞 · 𝑟 ) · Γ , · Γ ) + 𝑞 · Γ ′ . So [ H ] b a ⇒ 𝑞 S [ H ′ ; u ′ ; Γ ] b ′ a By weakening, we get, Δ , ⌊ Γ ⌋ ; Γ , · Γ ⊢ a : A . Therefore, by App,we have, Δ , ⌊ Γ ⌋ ; Γ ′ + 𝑟 · ( Γ , · Γ ) ⊢ b ′ a : B { a / x } . Also, by rearranging, we get H ′ ⊢ Δ , ⌊ Γ ⌋ ; ( Γ , · Γ ) + 𝑞 · ( Γ ′ + 𝑟 · ( Γ , · Γ )) . The fourth clause follows from the correspondingclause of inductive hypothesis. – 𝑏 is a value.Since 𝑏 has a Π -type, it must be headed by a 𝜆 . Let b = 𝜆 y : 𝑠 A . b for some sufficientlyfresh variable 𝑦 . Then, we have, Δ ; Γ ⊢ 𝜆 y : 𝑠 A . b : Π x : 𝑟 A . B . By inversion, thereexists B such that Δ , y : A ; Γ , y : 𝑠 A ⊢ b : B and Δ ; Γ ⊢ Π y : 𝑠 A . B : Type and ( Π y : 𝑠 A . B ){ Δ } ≡ ( Π x : 𝑟 A . B ){ Δ } . By definition 9.1, 𝑠 = 𝑟 and A { Δ } ≡ A { Δ } and B { Δ } ≡ B { y / x }{ Δ } . Now, by rule T-Conv , Δ ; Γ ⊢ a : A . Therefore, by lemma 8.1, Δ , y = a : A ; Γ , y = a : 𝑟 A ⊢ b : B .We have, [ H ] ( 𝜆 y : 𝑟 A . b ) a ⇒ 𝑞 S [ H , y ( 𝑞 · 𝑟 ) ↦→ Γ ⊢ a : A ; ; y = a : ( 𝑞 · 𝑟 ) A ] b . Again, since H ⊢ Δ ; Γ + 𝑞 · Γ + ( 𝑞 · 𝑟 ) · Γ , we get, H , y ( 𝑞 · 𝑟 ) ↦→ Γ ⊢ a : A ⊢ Δ , y = a : A ; ( Γ , y = a : A ) + 𝑞 · ( Γ , y = a : 𝑟 A ) .By regularity, we know that Δ ; Γ ′ ⊢ B { a / x } : Type . By weakening, Δ , y = a : A ; Γ ′ , y = a : A ⊢ B { a / x } : Type . But B { a / x }{ Δ , y = a : A } = B { a / x }{ Δ } = B { y / x }{ a / y }{ Δ } = B { y / x }{ Δ }{ a { Δ }/ y } ≡ B { Δ }{ a { Δ }/ y } = B { a / y }{ Δ } = B { Δ , y = a : A } . Hence, byrule T-Conv , Δ , y = a : A ; Γ , y = a : 𝑟 A ⊢ b : B { a / x } . 
The fourth clause: 𝑞 · ( Γ ⋄ 𝑟 ) + +( ⋄( 𝑞 · 𝑟 )) × (cid:0) h 𝐻 i ⊺ Γ (cid:1) ≤ 𝑞 · (( Γ + 𝑟 · Γ ) ⋄ ) + + ( ⋄( 𝑞 · 𝑟 )) follows by reflexivity. • rule T-conv
Let Δ ; Γ ⊢ a : B where Δ ; Γ ⊢ a : A and Δ ; Γ ⊢ B : Type and A { Δ } ≡ B { Δ } . Further, H ⊢ Δ ; Γ + 𝑞 · Γ . Therefore, by inductive hypothesis, [ H ] a ⇒ 𝑞 S [ H ′ ; u ′ ; Γ ] a ′ and Δ , ⌊ Γ ⌋ ; Γ ′ ⊢ a ′ : A and H ′ ⊢ Δ , ⌊ Γ ⌋ ; ( Γ , · Γ ) + 𝑞 · Γ ′ .Now, since fv A ⊆ dom Δ and fv B ⊆ dom Δ ; we have, A { Δ , ⌊ Γ ⌋} = A { Δ } and B { Δ , ⌊ Γ ⌋} = B { Δ } . Therefore, Δ , ⌊ Γ ⌋ ; Γ ′ ⊢ a ′ : B . The other clauses follow from the inductive hypothesis. • rule T-UnitElim
Let Δ ; Γ + Γ ⊢ let unit = a in b : B { a / y } where Δ ; Γ ⊢ a : Unit and Δ ; Γ ⊢ b : B { unit / y } and Δ , y : Unit ; Γ , y : 𝑟 Unit ⊢ B : Type . Further, H ⊢ Δ ; Γ + 𝑞 · ( Γ + Γ ) . Now, there are twocases to consider depending on whether 𝑎 is a value or not. – 𝑎 is not a value.In this case, we get from the inductive hypothesis, [ H ] a ⇒ 𝑞 S ∪ fv b ∪{ y } [ H ′ ; u ; Γ ] a ′ and Δ , ⌊ Γ ⌋ ; Γ ′ ⊢ a ′ : Unit and H ′ ⊢ Δ , ⌊ Γ ⌋ ; (( Γ + 𝑞 · Γ ) , · Γ ) + 𝑞 · Γ ′ . There-fore, [ H ] let unit = a in b ⇒ 𝑞 S [ H ′ ; u ; Γ ] let unit = a ′ in b . And, by weakening and ule UnitElim , we have, Δ , ⌊ Γ ⌋ ; Γ ′ + ( Γ , · Γ ) ⊢ let unit = a ′ in b : B { a ′ / y } .Now, by regularity, Δ ; Γ ′ ⊢ B { a / y } : Type . By weakening, Δ , ⌊ Γ ⌋ ; Γ ′ , · Γ ⊢ B { a / y } : Type . But then, B { a / y }{ Δ , ⌊ Γ ⌋} = B { a / y }{ Δ } = B { Δ }{ a { Δ }/ y } . Also, B { a ′ / y }{ Δ , ⌊ Γ ⌋} = B { Δ , ⌊ Γ ⌋}{ a ′ { Δ , ⌊ Γ ⌋}/ y } = B { Δ }{ a ′ { Δ , ⌊ Γ ⌋}/ y } . Again, a { Δ } = a { H } ≡ a ′ { H ′ } = a ′ { Δ , ⌊ Γ ⌋} . Hence, by rule T-Conv , Δ , ⌊ Γ ⌋ ; Γ ′ + ( Γ , · Γ ) ⊢ let unit = a ′ in b : B { a / y } .The fourth clause: 𝑞 · ( Γ ′ + ( Γ ⋄ )) + u + ( ⋄ Γ ) × h 𝐻 ′ i ≤ 𝑞 · (( Γ + Γ ) ⋄ ) + u × h 𝐻 ′ i + ⋄ Γ follows from the inductive hypothesis. – 𝑎 is a value.Since 𝑎 has type Unit , we have, a = unit and · Γ ≤ Γ , for some Γ . Now, [ H ] let unit = unit in b ⇒ 𝑞 S [ H ; ; ∅] b . By sub-usaging, Δ ; Γ + Γ ⊢ b : B { unit / y } . The fourth clause: 𝑞 · ( Γ + Γ ) ≤ 𝑞 · ( Γ + Γ ) follows by reflexivity. • rule T-SumElim
Let Δ ; 𝑟 · Γ + Γ ⊢ case 𝑟 a of b ; b : B { a / y } where Δ ; Γ ⊢ a : A ⊕ A and Δ ; Γ ⊢ b : Π x : 𝑟 A . B { inj x / y } and Δ ; Γ ⊢ b : Π x : 𝑟 A . B { inj x / y } and Δ , y : A ⊕ A ; Γ , y : 𝑠 A ⊕ A ⊢ B : Type and ≤ 𝑟 . Further, H ⊢ Δ ; Γ + 𝑞 · ( 𝑟 · Γ + Γ ) . Now, there are two cases to considerdepending on whether 𝑎 is a value or not. – 𝑎 is not a value.By inductive hypothesis, we get [ H ] a ⇒ 𝑞 · 𝑟 S ∪ fv b ∪ fv b ∪{ y } [ H ′ ; u ; Γ ] a ′ and Δ , ⌊ Γ ⌋ ; Γ ′ ⊢ a ′ : A ⊕ A and H ′ ⊢ Δ , ⌊ Γ ⌋ ; (( Γ + 𝑞 · Γ ) , · Γ )+( 𝑞 · 𝑟 )· Γ ′ . This gives us, [ H ] case 𝑟 a of b ; b ⇒ 𝑞 S [ H ′ ; u ; Γ ] case 𝑟 a ′ of b ; b . By weakening and using rule T-SumElim thereafter, we get, Δ , ⌊ Γ ⌋ ; 𝑟 · Γ ′ +( Γ , · Γ ) ⊢ case 𝑟 a ′ of b ; b : B { a ′ / y } . By following the argument presentedearlier, B { a / y }{ Δ } ≡ B { a ′ / y }{ Δ , ⌊ Γ ⌋} ; as such, by rule T-Conv , Δ , ⌊ Γ ⌋ ; 𝑟 · Γ ′ + ( Γ , · Γ ) ⊢ case 𝑟 a ′ of b ; b : B { a / y } .By inductive hypothesis, (( 𝑞 · 𝑟 ) · Γ ′ + u ) + Γ × h 𝐻 ′ i ≤ ( 𝑞 · 𝑟 ) · ( Γ ⋄ ) + u × h 𝐻 ′ i + ⋄ Γ . Fromthis, we have, ( 𝑞 · ( 𝑟 · Γ ′ + ( Γ , · Γ )) + u ) + Γ × h 𝐻 ′ i ≤ 𝑞 · (( 𝑟 · Γ + Γ ) ⋄ ) + u × h 𝐻 ′ i + ⋄ Γ . – 𝑎 is a value.Since 𝑎 has type A ⊕ A , so a = inj a or a = inj a .Let a = inj a . Now, [ H ] case 𝑟 ( inj a ) of b ; b ⇒ 𝑞 S [ H ; ; ∅] b a . By inverting thetyping judgement, we have Δ ; Γ ⊢ a : A . By rule T-app , Δ ; Γ + 𝑟 · Γ ⊢ b a : B { inj a / y } .The fourth clause follows by reflexivity. The case where a = inj a follows similarly. • rule T-SigmaElim
Let A = Σ x : 𝑟 A . A and Δ ; Γ + Γ ⊢ let ( x , y ) = a in b : B { a / z } where Δ ; Γ ⊢ a : A and Δ , x : A , y : A ; Γ , x : 𝑟 A , y : A ⊢ b : B {( x , y )/ z } and Δ , z : A ; Γ , z : 𝑠 A ⊢ B : Type . Further, H ⊢ Δ ; Γ + 𝑞 · ( Γ + Γ ) . Now, there are two cases to consider depending on whether 𝑎 is avalue or not. – 𝑎 is not a value.By inductive hypothesis, [ H ] a ⇒ 𝑞 S ∪ fv b ∪{ x } ∪{ y } ∪{ z } [ H ′ ; u ; Γ ] a ′ and Δ , ⌊ Γ ⌋ ; Γ ′ ⊢ a ′ : A . Therefore, [ H ] let ( x , y ) = a in b ⇒ 𝑞 S [ H ′ ; u ; Γ ] let ( x , y ) = a ′ in b . By weakeningand rule SigmaElim , we have, Δ , ⌊ Γ ⌋ ; Γ ′ + ( Γ + · Γ ) ⊢ let ( x , y ) = a ′ in b : B { a ′ / z } .Using an argument presented before, we get B { a / z }{ Δ } ≡ B { a ′ / z }{ Δ , ⌊ Γ ⌋} ; as such, byrule T-conv , Δ , ⌊ Γ ⌋ ; Γ ′ + ( Γ + · Γ ) ⊢ let ( x , y ) = a ′ in b : B { a / z } . The other clausesfollow from the inductive hypothesis. – 𝑎 is a value. ince 𝑎 has Σ -type, a = ( a , a ) where Δ ; Γ ⊢ a : A and Δ ; Γ ⊢ a : A { a / x } and Γ = 𝑟 · Γ + Γ . Assuming 𝑥 ′ and 𝑦 ′ are fresh enough, [ H ] let ( x , y ) = ( a , a ) in b ⇒ 𝑞 S [ H , x ′ ( 𝑞 · 𝑟 ) ↦→ a : A , y ′ 𝑞 ↦→ a : A { x ′ / x } ; ; x ′ = a : ( 𝑞 · 𝑟 ) A , y ′ = a : 𝑞 A { x ′ / x }] b { x ′ / x }{ y ′ / y } . Since H ⊢ Δ ; Γ + 𝑞 · ( 𝑟 · Γ + Γ + Γ ) , so H , x ′ ( 𝑞 · 𝑟 ) ↦→ Γ ⊢ a : A , y ′ 𝑞 ↦→ ( Γ , x ′ = a : A ) ⊢ a : A { x ′ / x } ⊢( Γ , x ′ = a : A , y ′ = a : A { x ′ / x }) + 𝑞 · ( Γ , x ′ = a : 𝑟 A , y ′ = a : A { x ′ / x }) .By 8.1, Γ , x ′ = a : 𝑟 A , y ′ = a : A { x ′ / x } ⊢ b { x ′ / x }{ y ′ / y } : B {( x ′ , y ′ )/ z } . But then, byconversion, Γ , x ′ = a : 𝑟 A , y ′ = a : A { x ′ / x } ⊢ b { x ′ / x }{ y ′ / y } : B {( a , a )/ z } .The fourth clause: 𝑞 · ( Γ ⋄( 𝑟 ⋄ )) + ⋄( 𝑞 · 𝑟 ) ⋄ 𝑞 × (cid:0) h 𝐻 i ⊺ ⊺ Γ Γ (cid:1) ≤ 𝑞 · ( 𝑟 · Γ + Γ + Γ ⋄ ⋄ ) +( ⋄(( 𝑞 · 𝑟 ) ⋄ 𝑞 )) follows by reflexivity.In all the other cases, 𝑎 is a value. (cid:3) Theorem 8.9 (Soundness).
If H ⊢ Δ ; Γ and Δ ; Γ ⊢ a : A and 𝑆 ⊇ dom Δ , then either 𝑎 is a value or there exists Γ′ , H′ , u′ , Γ , A′ such that:
• [ H ] a : A ⇒S [ H′ ; u′ ; Γ ] a′ : A′
• H′ ⊢ Δ , ⌊ Γ ⌋ ; Γ′
• Δ , ⌊ Γ ⌋ ; Γ′ ⊢ a′ : A′
• Γ′ + u′ + ⋄ Γ × ⟨ 𝐻′ ⟩ ≤ Γ ⋄ + u′ × ⟨ 𝐻′ ⟩ + ⋄ Γ

Proof.
Follows from 8.8 with 𝑞 = 1 and Γ = 0 · Γ . □

The soundness theorem is similar in spirit to theorems showing correctness of usage in graded type systems via operational methods. But this theorem can be proved by simple induction on the typing derivation; it does not require much extra machinery over and above the reduction relation, unlike the proof of soundness in Brunel et al. [2014], which requires a realizability model on top of the reduction relation. In this regard, our soundness theorem is more in line with the modality preservation theorem in Abel and Bernardy [2020]. We can use the soundness theorem to prove the usual preservation and progress lemmas. The proofs are similar to the corresponding ones for the simple version (5.12 and 5.13). This means that the ordinary semantics is sound with respect to a resource-aware semantics. Below, we show an example application of this theorem. Other similar examples can be worked out using the lemmas from Section 6.
Example 8.10. Consider any security lattice 𝑄 , as described in Section 3.2. Let 𝑠 be any element such that 1 ≰ 𝑠 . Also, let ∅ ⊢ A : Type → Type such that A Unit = Int . Now, let x : Type , y :𝑞 A x ⊢ B : Type for some 𝑞 ∈ 𝑄 . In the empty context, consider a term 𝑓 of type Π x :𝑟 Type . Π y :𝑠 A x . B for some 𝑟 ∈ 𝑄 . (Note that 𝑟 , 𝑠 and 𝑞 need not be equal, as explained in irrelevant quantification.) Then, for any two arguments 𝑏1 and 𝑏2 of appropriate type, we can show that f Unit 𝑏1 and f Unit 𝑏2 either diverge or produce equal values. If f Unit diverges, then both diverge. Otherwise, [∅] f Unit ⇒⇒ [ H ] 𝜆 y :𝑠 C . b where C { H } ≡ Int . Now, [ H ] ( 𝜆 y :𝑠 C . b ) 𝑏1 ⇒ [ H , y 𝑠 ↦→ 𝑏1 ] b and [ H ] ( 𝜆 y :𝑠 C . b ) 𝑏2 ⇒ [ H , y 𝑠 ↦→ 𝑏2 ] b . By 6.3, we know that both (( H , y 𝑠 ↦→ 𝑏1 ) , b ) and (( H , y 𝑠 ↦→ 𝑏2 ) , b ) either diverge or reduce to the same value. We omit some extraneous components to keep things clean.

9 DISCUSSION

9.1 Definitional Equivalence and Irrelevance
The terms “irrelevance” and “irrelevant quantification” have multiple meanings in the literature. Our primary focus is on erasability, the ability to quantify over arguments that need not be present at runtime. However, this terminology often includes compile-time irrelevance, or the blindness of type equality to such erasable parts of terms. These terms are also related to, but not the same as, “parametricity” or “parametric quantification”, which characterizes functions that map equivalent arguments to equivalent results.

One difference between our formulation and a more traditional dependently-typed calculus is that the conversion rule (rule
T-conv ) is specified in terms of an abstract equivalence relation on terms, written A ≡ B . Our proofs about this system work for any relation that satisfies the following properties.

Definition 9.1.
We say that the relation A ≡ B is sound if it:
(1) is an equivalence relation,
(2) contains the small step relation, in other words, if a ❀ a′ then a ≡ a′,
(3) is closed under substitution, in other words, if a1 ≡ a2 then b { a1 / x } ≡ b { a2 / x } and a1 { b / x } ≡ a2 { b / x },
(4) is injective for type constructors, for example, if Π x :𝑞1 A1 . B1 ≡ Π x :𝑞2 A2 . B2 then 𝑞1 = 𝑞2 and A1 ≡ A2 and B1 ≡ B2 (and similarly for Σ x :𝑟 A . B and A ⊕ B),
(5) and is consistent, in other words, if A ≡ B and both are values, then they have the same head form.

The standard 𝛽 -conversion relation, defined as the reflexive, symmetric, transitive and congruent closure of the step relation, is a sound relation. However, 𝛽 -conversion is not the only relation that would work. Dependent type systems with irrelevance sometimes erase irrelevant parts of terms before comparing them up to 𝛽 -equivalence [Barras and Bernardo 2008]. Alternatively, a typed definition of equivalence might use the total relation when equating irrelevant components [Pfenning 2001]. In future work, we hope to show that any sound definition of equivalence can be coarsened by ignoring irrelevant components in terms during comparison. We conjecture that such a relation would also satisfy the properties above. In particular, our results from Section 6 tell us that such coarsening of the equivalence relation is consistent with evaluation, and therefore contains the step relation.

The current design of linear types in GHC/Haskell is essentially an instance of the type system described in this paper, one that uses the linearity semiring. Haskell users can mark arguments with grades 1 or 𝜔 , but a grade of 0 is sometimes needed internally. Haskell’s kind system supports irrelevance, but not linearity, so the two features do not yet interact. It is with dependent types that we need a uniform treatment, which we can achieve through this graded type system.
The current Haskell structure will be able to migrate to a graded type system with little, if any, backward compatibility trouble for users.

One feature of Haskell’s linear types does cause a small wrinkle, though: Haskell supports multiplicity polymorphism. An easy example is the type of map , which is forall m a b. (a %m-> b) -> [a] %m-> [b] . We see that the function argument to map can be either linear or unrestricted, and that this choice affects whether the input list is restricted. We cannot support quantity polymorphism in our type theory, as quantifying over whether or not an argument is relevant would mean that we could no longer compile a quantity-polymorphic function: would the compiled function take the argument in a register or not? The solution is to tweak the meaning of quantity polymorphism slightly: instead of quantifying over all possible quantities, we would be polymorphic only over quantities 𝑞 such that 1 ≤ 𝑞 . That is, we would quantify over only relevant quantities. This reinterpretation of multiplicity polymorphism avoids the mentioned trouble with static compilation. Furthermore, we see no difficulty in extending our graded type system with this kind of quantity polymorphism; in the linear Haskell work, multiplicity polymorphism is nicely straightforward, and we expect the same to be true here, too.

Commentary on the practicalities of type checking Haskell based on GraD appears in Appendix B.
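For intuition, the three-grade linearity semiring mentioned above can be sketched as follows. This is a Python illustration; the string encoding of 0, 1, ω and the exact sub-usaging order shown are assumptions of this sketch, not definitions taken from the paper.

```python
# A sketch of the three-grade linearity semiring {0, 1, ω}. The string
# encoding and the sub-usaging order (reflexivity plus 1 <= ω) are
# assumptions of this illustration.

ZERO, ONE, OMEGA = "0", "1", "w"
GRADES = (ZERO, ONE, OMEGA)

def add(p, q):
    """Semiring addition: counts usages, saturating at ω (so 1 + 1 = ω)."""
    if p == ZERO:
        return q
    if q == ZERO:
        return p
    return OMEGA

def mul(p, q):
    """Semiring multiplication: scales usages; 0 annihilates."""
    if p == ZERO or q == ZERO:
        return ZERO
    if p == ONE:
        return q
    if q == ONE:
        return p
    return OMEGA

def leq(p, q):
    """Assumed sub-usaging order: reflexive, plus 1 <= ω."""
    return p == q or (p == ONE and q == OMEGA)

# "Relevant" multiplicity polymorphism quantifies only over q with 1 <= q,
# so a polymorphic argument is never erased and can be compiled uniformly.
relevant = [q for q in GRADES if leq(ONE, q)]
assert relevant == [ONE, OMEGA]
```

With this reading, instantiating a multiplicity variable only at quantities above 1 sidesteps the register-or-not compilation question, since the argument is always passed.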
Quantitative Type Theory QTT [Atkey 2018; McBride 2016] uses elements of a resource semiring to track the usage of variables in a dependent type system. This system has a typing judgement of the form: 𝑥1 :𝜌1 𝐴1 , 𝑥2 :𝜌2 𝐴2 , . . . , 𝑥𝑛 :𝜌𝑛 𝐴𝑛 ⊢ 𝑎 :𝜎 𝐴 , where the 𝜌𝑖 s and 𝜎 are elements of a semiring. Roughly speaking, this judgement means that using 𝜌𝑖 copies of 𝑥𝑖 of type 𝐴𝑖 , with 𝑖 varying from 1 to 𝑛 , we get 𝜎 copies of 𝑎 of type 𝐴 .

In QTT, 𝜎 can be either 0 or 1. When 𝜎 is 1, the system is similar to GraD . GraD does not have the 0-fragment of QTT but this is not a limitation per se: to express the requirement of 0 copies of 𝑎 , one need only multiply the context by 0. This approach implies that our system treats types the same as any other irrelevant component of terms. In contrast, QTT disables resource checking for the 0-fragment. In other words, in the 0-fragment, the resource annotations are not meaningful. This difference has both positive and negative effects on the design of the language.

On the positive side, because linear tensor types are turned into normal (non-linear) products, QTT can support strong Σ-types, allowing projections that violate the usage requirements of their construction. In contrast, GraD supports weak Σ-types only, with resource-checked pattern matching as the only elimination form.

On the negative side, however, QTT is restricted to semirings that are zerosumfree ( 𝑞1 + 𝑞2 = 0 ⇒ 𝑞1 = 𝑞2 = 0 ) and entire ( 𝑞1 · 𝑞2 = 0 ⇒ 𝑞1 = 0 ∨ 𝑞2 = 0 ). (These properties are necessary to prove substitution.) This limits QTT’s applicability. For example, QTT cannot be applied to the class of semirings described in Section 3.2 that are not entire. On the other hand, our soundness theorem places no constraint on the semiring, allowing us to work with such semirings, as lemma 6.3 and example 8.10 show.

Furthermore, because QTT ignores usages in the 0-fragment, its internal logic is limited in its reasoning about the resource usage of programs.
For example, the following proposition is not provable in QTT: ∀ 𝑓 : Bool →0 Bool . 𝑓 True = 𝑓 False
The above proposition says that for any constant boolean function, the result of applying it to
True is the same as the result of applying it to
False . This proposition is not provable in QTT because 𝑓 ranges over many functions, including those that examine the argument. In the 0-fragment, the type system cannot prevent a function that uses its argument from being given a type that says that it does not: ⊢ 𝜆𝑥 : 𝐴 . 𝑥 : Π 𝑥 :0 𝐴 . 𝐴
Abel [2018] also lists additional ramifications of eliminating resource checking in types. In particular, he notes that in QTT, it is not possible to use resource usage to optimize the computation of types during type checking. Erasing irrelevant terms not only optimizes the output of a compiler for a dependently-typed language; it is also an optimization that is useful during compilation, when types are normalized for comparison. Finally, we note that GraD includes case expressions and sub-usaging, while QTT does not.
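The entireness restriction can be seen to fail for lattice-style semirings with a small check. The sketch below assumes the four-point diamond lattice { bot, a, b, top } with join as semiring addition (zero = bot) and meet as multiplication (unit = top); this particular lattice is an illustrative choice, not one fixed by the paper.

```python
# A check that a simple lattice semiring is zerosumfree but not entire.
# Assumption of this sketch: the four-point diamond lattice
# {bot, a, b, top}, with join as semiring addition (zero = bot) and meet
# as multiplication (unit = top). The middle elements a and b are
# incomparable.

ELEMS = ["bot", "a", "b", "top"]

def join(x, y):  # semiring addition: least upper bound
    if x == "bot":
        return y
    if y == "bot":
        return x
    return x if x == y else "top"

def meet(x, y):  # semiring multiplication: greatest lower bound
    if x == "top":
        return y
    if y == "top":
        return x
    return x if x == y else "bot"

# Not entire: a · b = bot even though neither factor is the zero (bot),
# so QTT's substitution proof does not apply to this semiring.
assert meet("a", "b") == "bot"

# But zerosumfree: q1 + q2 = bot forces q1 = q2 = bot.
assert all(join(x, y) != "bot"
           for x in ELEMS for y in ELEMS if (x, y) != ("bot", "bot"))
```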
Our type system with graded contexts has operations for addition ( Γ1 + Γ2 ) and scalar multiplication ( 𝑞 · Γ ) defined over an arbitrary partially-ordered semiring. Further, the partial ordering from the semiring was lifted to contexts. However, we can provide reasonable alternative definitions for these operations and relations and all our proofs would still work the same. Here, we lay out what constitutes a reasonable definition.

Our contexts are an example of a general algebraic structure, called a partially-ordered left semimodule. Additionally, vectors and matrices of quantities can also be seen through this abstract mathematical lens. This may help in future extensions and applications of the work presented in this paper. We follow Golan [1999] in our terminology and definitions here.

Definition 9.2 (Left 𝑄 -semimodule). Given a semiring ( 𝑄 , + , · , 0 , 1 ) , a left 𝑄 -semimodule is a commutative monoid ( 𝑀 , ⊕ , 0 ) along with a left multiplication function _ ⊙ _ : 𝑄 × 𝑀 → 𝑀 such that the following properties hold.
• for 𝑞1 , 𝑞2 ∈ 𝑄 and 𝑚 ∈ 𝑀 , we have ( 𝑞1 + 𝑞2 ) ⊙ 𝑚 = 𝑞1 ⊙ 𝑚 ⊕ 𝑞2 ⊙ 𝑚
• for 𝑞 ∈ 𝑄 and 𝑚1 , 𝑚2 ∈ 𝑀 , we have 𝑞 ⊙ ( 𝑚1 ⊕ 𝑚2 ) = 𝑞 ⊙ 𝑚1 ⊕ 𝑞 ⊙ 𝑚2
• for 𝑞1 , 𝑞2 ∈ 𝑄 and 𝑚 ∈ 𝑀 , we have ( 𝑞1 · 𝑞2 ) ⊙ 𝑚 = 𝑞1 ⊙ ( 𝑞2 ⊙ 𝑚 )
• for 𝑚 ∈ 𝑀 , we have 1 ⊙ 𝑚 = 𝑚
• for 𝑞 ∈ 𝑄 and 𝑚 ∈ 𝑀 , we have 0 ⊙ 𝑚 = 𝑞 ⊙ 0 = 0.

Graded contexts Γ (with the same ⌊ Γ ⌋ ) satisfy this definition, with the operations as defined before. Another example of a semimodule is 𝑄 itself, with ⊕ := + and ⊙ := · .

Next, let us consider the partial ordering of our contexts. The ordering is basically a lifting of the partial ordering in the semiring. But in general, a partial order on a left semimodule needs to satisfy only the following properties.

Definition 9.3 (Partially-ordered left 𝑄 -semimodule). Given a partially-ordered semiring ( 𝑄 , ≤ ) , a left 𝑄 -semimodule 𝑀 is said to be partially-ordered iff there exists a partial order ≤𝑀 on 𝑀 such that the following properties hold.
• for 𝑚1 , 𝑚2 , 𝑚 ∈ 𝑀 , if 𝑚1 ≤𝑀 𝑚2 , then 𝑚1 ⊕ 𝑚 ≤𝑀 𝑚2 ⊕ 𝑚
• for 𝑞 ∈ 𝑄 and 𝑚1 , 𝑚2 ∈ 𝑀 , if 𝑚1 ≤𝑀 𝑚2 , then 𝑞 ⊙ 𝑚1 ≤𝑀 𝑞 ⊙ 𝑚2
• for 𝑞1 , 𝑞2 ∈ 𝑄 and 𝑚 ∈ 𝑀 , if 𝑞1 ≤ 𝑞2 , then 𝑞1 ⊙ 𝑚 ≤𝑀 𝑞2 ⊙ 𝑚 .

Note that our ordering of contexts Γ satisfies these properties.

We use matrices on several occasions. Matrices can be seen as homomorphisms between semimodules. Given a semiring 𝑄 , an 𝑚 × 𝑛 matrix with elements drawn from 𝑄 is basically a 𝑄 -homomorphism from 𝑄𝑚 to 𝑄𝑛 . For 𝑄 -semimodules 𝑀 , 𝑁 , a function _ 𝛼 : 𝑀 → 𝑁 is said to be a 𝑄 -homomorphism iff:
• for 𝑚1 , 𝑚2 ∈ 𝑀 , we have ( 𝑚1 ⊕ 𝑚2 ) 𝛼 = 𝑚1 𝛼 ⊕ 𝑚2 𝛼
• for 𝑞 ∈ 𝑄 and 𝑚 ∈ 𝑀 , we have ( 𝑞 ⊙ 𝑚 ) 𝛼 = 𝑞 ⊙ ( 𝑚 𝛼 ) .

So the matrix ⟨ 𝐻 ⟩ for a heap 𝐻 is an endomorphism from 𝑄𝑛 to 𝑄𝑛 where 𝑛 = | 𝐻 | . Also, an identity matrix is an identity homomorphism. Next, for natural numbers 𝑖 , 𝑗 , 𝑘 and 𝑄 -homomorphisms _ 𝛼 : 𝑄𝑖 → 𝑄𝑗 and _ 𝛽 : 𝑄𝑗 → 𝑄𝑘 , the composition _ ( 𝛼 ◦ 𝛽 ) : 𝑄𝑖 → 𝑄𝑘 can be given by matrix multiplication, 𝛼 × 𝛽 . The composition is associative and obeys the identity laws. This makes the set 𝑉𝑄 = { 𝑄𝑛 | 𝑛 ∈ ℕ } with Hom ( 𝑄𝑚 , 𝑄𝑛 ) = M𝑚,𝑛 ( 𝑄 ) , the set of 𝑚 × 𝑛 matrices over 𝑄 , a category. We worked in this category. There may be other such categories worth exploring.
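The semimodule and homomorphism laws above are easy to spot-check for a concrete instance. The sketch below (illustrative encodings, with Q taken to be the natural-number semiring) checks the left Q-semimodule laws for usage vectors Q^n and the Q-homomorphism laws for matrix application.

```python
# Spot-checking the left Q-semimodule laws for usage vectors Q^n, and the
# Q-homomorphism laws for matrix application, with Q taken to be the
# natural-number semiring (all encodings here are illustrative).

def vadd(u, v):
    """The monoid operation ⊕ on Q^n: componentwise semiring addition."""
    return [a + b for a, b in zip(u, v)]

def smul(q, v):
    """Scalar multiplication q ⊙ v: componentwise left multiplication."""
    return [q * a for a in v]

def mat_apply(M, v):
    """Apply an m×n matrix M (a Q-homomorphism Q^m -> Q^n) to v in Q^m."""
    return [sum(M[i][j] * v[i] for i in range(len(v)))
            for j in range(len(M[0]))]

u, v, q1, q2 = [1, 0, 2], [0, 3, 1], 2, 5

# Left Q-semimodule laws from Definition 9.2:
assert smul(q1 + q2, u) == vadd(smul(q1, u), smul(q2, u))   # (q1+q2) ⊙ m
assert smul(q1, vadd(u, v)) == vadd(smul(q1, u), smul(q1, v))
assert smul(q1 * q2, u) == smul(q1, smul(q2, u))            # (q1·q2) ⊙ m
assert smul(1, u) == u and smul(0, u) == [0, 0, 0]

# Q-homomorphism laws for a 3×2 matrix:
M = [[1, 0], [2, 1], [0, 3]]
assert mat_apply(M, vadd(u, v)) == vadd(mat_apply(M, u), mat_apply(M, v))
assert mat_apply(M, smul(q1, u)) == smul(q1, mat_apply(M, u))
```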
10 OTHER RELATED WORK

10.1 Heap Semantics for Linear Logic
Computational and operational interpretations of linear logic have been explored in several works, especially Chirimar et al. [1996] and Turner and Wadler [1999]. In Turner and Wadler [1999], the authors provide a heap-based operational interpretation of linear logic. They show that a call-by-name calculus enjoys the single pointer property, meaning a linear resource has exactly one reference, while a call-by-need calculus satisfies a weaker version of this property, guaranteeing only the maintenance of a single pointer. This system considers only linear and unrestricted resources. We generalize this operational interpretation of linear logic to a graded type system by allowing resources to be drawn from an arbitrary semiring. We derive a quantitative version of the single pointer property in Section 6. We can develop a quantitative version of the weak single pointer property for call-by-need reduction, but for this, we need to modify the typing rules to allow sharing of resources.
Perhaps the earliest work studying the combination of linear and dependent types was proposed in the form of a categorical model by Bonfante et al. [2001], who were interested in characterizing how a linear dependent type system should be designed. A year later, Cervesato and Pfenning [2002] proposed the Linear Logical Framework (LLF) that combined non-dependent linear types with dependent types. This paper spurred a number of publications, but most relevant is the line of work extending dependent types with Girard et al. [1992]’s and Dal Lago and Hofmann [2009]’s bounded linear types. For example, Dal Lago and Gaboardi [2011]’s d𝑙PCF is a sound and complete system for reasoning about evaluation bounds of PCF programs. Dal Lago and Petit [2012] also show that d𝑙PCF can be used to reason about call-by-value execution. Gaboardi et al. [2013] develop a similar system called DFuzz for analyzing differential privacy of queries involving sensitive information. In the same vein, Krishnaswami et al. [2015] show how to combine non-dependent linear types with dependent types by generalizing Benton [1995]’s linear/non-linear logic. But all of these works had some separation between the linear and non-linear parts of their languages. Quantitative type theory [Atkey 2018; McBride 2016] provided a fresh way to look at this problem by combining the linear and non-linear parts using a resource semiring.
Orchard et al. [2019] introduced a system with a notion of graded necessity modalities over an arbitrary semiring—here called usage modalities—in a practical programming language with usage polymorphism and indexed types. However, their system does not have full dependent types. They show that usage modalities can be used to encode a large number of graded coeffects in the style of Gaboardi et al. [2016] and Brunel et al. [2014].

Abel and Bernardy [Abel and Bernardy 2020] use a graded type system to provide an abstract view of modalities. Their type system is similar in structure to ours, but its features and requirements differ. It includes usage polymorphism and parametric polymorphism, but lacks dependent types. Their system is also strongly normalizing. Furthermore, Abel and Bernardy define a relational interpretation for their system and use it to derive parametricity theorems. Due to our inclusion of the Type : Type axiom, this parametricity proof technique is unavailable to us, so we must use more syntactic methods to reason about our programs. But this axiom does not play a major role in our proofs. We conjecture that our approach to graded dependent types would work equally well in normalizing type theories.
There are several approaches to adding irrelevant quantification to dependently typed languages. Miquel [2001] first added "implicit" quantification to a Curry-style version of the extended Calculus of Constructions. Implicit arguments are those that do not appear free in the body of their abstractions. In Miquel's system, only the relevant parts of the computation may be explicit in terms; everything else must be implicit. Barras and Bernardo [2008] showed how to support decidable type checking by allowing type annotations and other irrelevant subcomponents to appear in terms. In this setting, irrelevant arguments must not be free in the erasure of the body of their abstractions. Mishra-Linger and Sheard [2008] extended this approach to pure type systems. More recently, Weirich et al. [2017] used these ideas as part of a proposal for a core language for Dependent Haskell. We have followed their design in making the usage of irrelevant variables in the co-domain of Π-types unrestricted. Specifically, the irrelevance (−) tag in their language corresponds to the 0 grade in our language.
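As a small illustration of irrelevant quantification via grades (our own example, not drawn from the cited works): in a graded system, the polymorphic identity function can annotate its type argument with grade 0, recording that the argument is never used at run time and may be erased.

```latex
% Our example: the type argument x carries grade 0, so the typing
% rules require no run-time uses of x; only y is used, exactly once.
\emptyset;\ \emptyset \;\vdash\;
  \lambda x :^{0} \mathsf{Type}.\ \lambda y :^{1} x.\ y
  \;:\; \Pi x :^{0} \mathsf{Type}.\ \Pi y :^{1} x.\ x
```

Here x occurs only inside the type annotation of y, an irrelevant position, so the grade-0 annotation is satisfied and the compiler may erase the Type argument, which is exactly the type-erasure requirement described in the introduction.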
11 FUTURE WORK AND CONCLUSIONS
Graded type systems are a generic framework for expressing the flow and usage of resources in programs. This work provides a new way of incorporating this framework into dependently typed languages, with the goal of supporting both erasure and linearity in the same system. We designed a graded dependent type system, GraD, and presented both a standard substitution-based semantics and a usage-aware heap-based semantics. The standard semantics cannot model the use of resources, but the heap-based semantics can track usage during the evaluation of terms. Further, the heap-based reduction relation enforces fair usage of resources. We show that the type system is sound with respect to this heap semantics, which implies that the type system does a proper static accounting of resource usage.

As always, there is more to explore: What additional reasoning principles can we get from our heap semantics? What happens when we add imperative features, such as arrays, to our language? What would a general form of equality up to erasure look like? What happens when we add multiple modalities, all of them graded, to our language?

The answers to these questions may have theoretical as well as practical implications. Currently, languages such as Haskell, Rust, Idris, and Agda are experimenting with dependent and linear types, as well as the more general applications of graded type theories. We hope that this work will provide guidance in these language designs and extensions.
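To make the usage accounting of the heap semantics concrete, the bookkeeping performed by a single variable lookup (rule Small-Var in Appendix A.3) can be sketched as follows. This is a drastic simplification of ours, not the semantics itself: grades are modeled as Ints, stored terms as strings, and a lookup under demand r succeeds only if the variable's remaining allowance covers r, deducting r from it.

```haskell
-- Sketch (our own simplification): a heap maps each variable to its
-- remaining usage allowance and a placeholder for the stored term.
type Heap = [(String, (Int, String))]

-- Look up x under demand r: mirrors Small-Var, which requires 1 <= r
-- and an allowance of the form q + r, leaving allowance q behind.
lookupVar :: Int -> String -> Heap -> Maybe (String, Heap)
lookupVar r x h =
  let adjust (y, (q, t))
        | y == x    = (y, (q - r, t))   -- deduct the demanded grade
        | otherwise = (y, (q, t))
  in case lookup x h of
       Just (allow, a) | r >= 1 && allow >= r -> Just (a, map adjust h)
       _ -> Nothing                      -- insufficient allowance: stuck
```

The soundness theorem says that well-typed programs never reach the stuck `Nothing` case: the static grades always cover the dynamic demands.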
ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under Grant No. 1521539 and Grant No. 1704041. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES
Martín Abadi, Anindya Banerjee, Nevin Heintze, and Jon G. Riecke. 1999. A Core Calculus of Dependency. In Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (San Antonio, Texas, USA) (POPL '99). Association for Computing Machinery, New York, NY, USA, 147–160. https://doi.org/10.1145/292540.292555
Andreas Abel. 2018. Resourceful Dependent Types. Presentation at the 24th International Conference on Types for Proofs and Programs (TYPES 2018), Braga, Portugal.
Andreas Abel and Jean-Philippe Bernardy. 2020. A Unified View of Modalities in Type Systems. Proc. ACM Program. Lang. 4, ICFP (2020). To appear.
Andreas Abel and Gabriel Scherer. 2012. On Irrelevance and Algorithmic Equality in Predicative Type Theory. Logical Methods in Computer Science 8, 1 (2012). https://doi.org/10.2168/LMCS-8(1:29)2012
The Agda Team. 2020. Run-time Irrelevance. https://agda.readthedocs.io/en/v2.6.1.1/language/runtime-irrelevance.html
Robert Atkey. 2018. The Syntax and Semantics of Quantitative Type Theory. In LICS '18: 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, July 9–12, 2018, Oxford, United Kingdom. https://doi.org/10.1145/3209108.3209189
H. P. Barendregt. 1993. Lambda Calculi with Types. Oxford University Press, Inc., USA, 117–309.
Bruno Barras and Bruno Bernardo. 2008. The Implicit Calculus of Constructions as a Programming Language with Dependent Types. In Foundations of Software Science and Computational Structures (FOSSACS 2008), Roberto Amadio (Ed.). Springer Berlin Heidelberg, Budapest, Hungary, 365–379.
P. N. Benton. 1995. A Mixed Linear and Non-Linear Logic: Proofs, Terms and Models (Extended Abstract). In Selected Papers from the 8th International Workshop on Computer Science Logic (CSL '94). Springer-Verlag, London, UK, 121–135.
Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack. 2018. Linear Haskell: Practical Linearity in a Higher-Order Polymorphic Language. Proc. ACM Program. Lang. 2, POPL (2018), 5:1–5:29. https://doi.org/10.1145/3158093
Guillaume Bonfante, François Lamarche, and Thomas Streicher. 2001. A Model of a Dependent Linear Calculus.
Aloïs Brunel, Marco Gaboardi, Damiano Mazza, and Steve Zdancewic. 2014. A Core Quantitative Coeffect Calculus. In Programming Languages and Systems, Zhong Shao (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 351–370.
Iliano Cervesato and Frank Pfenning. 2002. A Linear Logical Framework. Information and Computation 179, 1 (2002), 19–75.
Jawahar Chirimar, Carl A. Gunter, and Jon G. Riecke. 1996. Reference Counting as a Computational Interpretation of Linear Logic. Journal of Functional Programming 6, 2 (March 1996), 195–244. https://doi.org/10.1017/S0956796800001660
Ugo Dal Lago and Marco Gaboardi. 2011. Linear Dependent Types and Relative Completeness. In 2011 IEEE 26th Annual Symposium on Logic in Computer Science. 133–142.
Ugo Dal Lago and Martin Hofmann. 2009. Bounded Linear Logic, Revisited. In Typed Lambda Calculi and Applications, Pierre-Louis Curien (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 80–94.
Ugo Dal Lago and Barbara Petit. 2012. Linear Dependent Types in a Call-by-Value Scenario. In Proceedings of the 14th Symposium on Principles and Practice of Declarative Programming (PPDP '12). Association for Computing Machinery, New York, NY, USA, 115–126.
Richard A. Eisenberg. 2016. Dependent Types in Haskell: Theory and Practice. Ph.D. Dissertation. University of Pennsylvania.
Richard A. Eisenberg. 2018. Quantifiers for Dependent Haskell. GHC Proposal.
Marco Gaboardi, Andreas Haeberlen, Justin Hsu, Arjun Narayan, and Benjamin C. Pierce. 2013. Linear Dependent Types for Differential Privacy. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '13). Association for Computing Machinery, New York, NY, USA, 357–370.
Marco Gaboardi, Shin-ya Katsumata, Dominic A. Orchard, Flavien Breuvart, and Tarmo Uustalu. 2016. Combining Effects and Coeffects via Grading. In ICFP. 476–489.
Dan R. Ghica and Alex I. Smith. 2014. Bounded Linear Types in a Resource Semiring. In European Symposium on Programming Languages and Systems. Springer, 331–350.
Jean-Yves Girard, Andre Scedrov, and Philip J. Scott. 1992. Bounded Linear Logic: A Modular Approach to Polynomial-Time Computability. Theoretical Computer Science 97, 1 (1992), 1–66.
Jonathan S. Golan. 1999. Semirings and their Applications. Springer Netherlands. https://doi.org/10.1007/978-94-015-9333-5
Adam Gundry. 2013. Type Inference, Haskell and Dependent Types. Ph.D. Dissertation. University of Strathclyde.
Neelakantan R. Krishnaswami, Pierre Pradic, and Nick Benton. 2015. Integrating Linear and Dependent Types. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '15). ACM, New York, NY, USA, 17–30.
John Launchbury. 1993. A Natural Semantics for Lazy Evaluation. In POPL '93: Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 144–154. https://doi.org/10.1145/158511.158618
Conor McBride. 2016. I Got Plenty o' Nuttin'. Springer International Publishing, Cham, 207–233.
Alexandre Miquel. 2001. The Implicit Calculus of Constructions: Extending Pure Type Systems with an Intersection Type Binder and Subtyping. Springer Berlin Heidelberg, Berlin, Heidelberg, 344–359. https://doi.org/10.1007/3-540-45413-6_27
Nathan Mishra-Linger and Tim Sheard. 2008. Erasure and Polymorphism in Pure Type Systems. In Foundations of Software Science and Computational Structures (FoSSaCS). Springer.
Benjamin Moon, Harley Eades III, and Dominic Orchard. 2020. Graded Modal Dependent Type Theory (Extended Abstract). TyDe (May 2020).
Dominic Orchard, Vilem-Benjamin Liepelt, and Harley Eades III. 2019. Quantitative Program Reasoning with Graded Modal Types. Proc. ACM Program. Lang. 3, ICFP, Article 110 (July 2019), 30 pages. https://doi.org/10.1145/3341714
Tomas Petricek, Dominic Orchard, and Alan Mycroft. 2014. Coeffects: A Calculus of Context-Dependent Computation. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming (Gothenburg, Sweden) (ICFP '14). Association for Computing Machinery, New York, NY, USA, 123–135. https://doi.org/10.1145/2628136.2628160
Frank Pfenning. 2001. Intensionality, Extensionality, and Proof Irrelevance in Modal Type Theory. In Proceedings of the 16th Annual IEEE Symposium on Logic in Computer Science (LICS '01). IEEE Computer Society, Washington, DC, USA, 221–. http://dl.acm.org/citation.cfm?id=871816.871845
Jason Reed and Benjamin C. Pierce. 2010. Distance Makes the Types Grow Stronger: A Calculus for Differential Privacy. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming (Baltimore, Maryland, USA) (ICFP '10). Association for Computing Machinery, New York, NY, USA, 157–168. https://doi.org/10.1145/1863543.1863568
David N. Turner and Philip Wadler. 1999. Operational Interpretations of Linear Logic. Theoretical Computer Science 227, 1–2 (1999), 231–248.
Dennis Volpano, Cynthia Irvine, and Geoffrey Smith. 1996. A Sound Type System for Secure Flow Analysis. J. Comput. Secur. 4, 2–3 (Jan. 1996), 167–187.
Philip Wadler. 1990. Linear Types Can Change the World!. In IFIP TC 2 Working Conference on Programming Concepts and Methods. 347–359.
Stephanie Weirich, Pritam Choudhury, Antoine Voizard, and Richard A. Eisenberg. 2019. A Role for Dependent Types in Haskell. Proc. ACM Program. Lang. 3, ICFP, Article 101 (July 2019), 29 pages. https://doi.org/10.1145/3341705
Stephanie Weirich, Antoine Voizard, Pedro Henrique Azevedo de Amorim, and Richard A. Eisenberg. 2017. A Specification for Dependent Types in Haskell. Proc. ACM Program. Lang. 1, ICFP, Article 31 (Aug. 2017), 29 pages. https://doi.org/10.1145/3110275
James Wood and Robert Atkey. 2020. A Linear Algebra Approach to Linear Metatheory. arXiv:2005.02247 [cs.PL]

A FULL JUDGEMENTS

A.1 Simple Graded Type System

Δ; Γ ⊢ a : A   (Simple graded type system)
ST-Sub
  Δ; Γ₁ ⊢ a : A    Γ₁ ≤ Γ₂
  ─────────────────────────
  Δ; Γ₂ ⊢ a : A

ST-Var
  x ∉ dom Δ    Δ ⊢ Γ
  ──────────────────────────────────
  (Δ, x : A); (0·Γ, x :^1 A) ⊢ x : A

ST-Weak
  x ∉ dom Δ    Δ; Γ ⊢ a : B
  ─────────────────────────────
  Δ, x : A; Γ, x :^0 A ⊢ a : B

ST-Unit
  ∅; ∅ ⊢ unit : Unit
ST-UnitE
  Δ; Γ₁ ⊢ a : Unit    Δ; Γ₂ ⊢ b : B
  ──────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ let unit = a in b : B

ST-Lam
  Δ, x : A; Γ, x :^q A ⊢ a : B
  ─────────────────────────────
  Δ; Γ ⊢ λx :^q A. a : A →^q B

ST-App
  Δ; Γ₁ ⊢ a : A →^q B    Δ; Γ₂ ⊢ b : A
  ────────────────────────────────────
  Δ; Γ₁ + q·Γ₂ ⊢ a b : B

ST-Box
  Δ; Γ ⊢ a : A
  ────────────────────────
  Δ; q·Γ ⊢ box_q a : □_q A

ST-LetBox
  Δ; Γ₁ ⊢ a : □_q A    Δ, x : A; Γ₂, x :^q A ⊢ b : B
  ──────────────────────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ let box x = a in b : B

ST-Pair
  Δ; Γ₁ ⊢ a : A₁    Δ; Γ₂ ⊢ b : A₂
  ────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ (a, b) : A₁ ⊗ A₂

ST-Spread
  Δ; Γ₁ ⊢ a : A₁ ⊗ A₂    Δ, x : A₁, y : A₂; Γ₂, x :^1 A₁, y :^1 A₂ ⊢ b : B
  ────────────────────────────────────────────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ let (x, y) = a in b : B

ST-Inj1
  Δ; Γ ⊢ a : A₁
  ─────────────────────────
  Δ; Γ ⊢ inj₁ a : A₁ ⊕ A₂

ST-Inj2
  Δ; Γ ⊢ a : A₂
  ─────────────────────────
  Δ; Γ ⊢ inj₂ a : A₁ ⊕ A₂

ST-Case
  1 ≤ q    Δ; Γ₁ ⊢ a : A₁ ⊕ A₂
  Δ; Γ₂ ⊢ b₁ : A₁ →^q B    Δ; Γ₂ ⊢ b₂ : A₂ →^q B
  ──────────────────────────────────────────────
  Δ; q·Γ₁ + Γ₂ ⊢ case_q a of b₁; b₂ : B

A.2 Operational Semantics for the Simple Graded Type System
The operational semantics depends on a notion of values:

  values  v ::= unit | λx :^q A. a | box_q a | (a, b) | inj₁ a | inj₂ a

a ⤳ a′   (Small-step operational semantics)

S-AppCong
  a ⤳ a′
  ───────────
  a b ⤳ a′ b

S-Beta
  (λx :^q A. a) b ⤳ a{b/x}

S-UnitCong
  a ⤳ a′
  ───────────────────────────────────────
  let unit = a in b ⤳ let unit = a′ in b

S-UnitBeta
  let unit = unit in b ⤳ b

S-BoxCong
  a ⤳ a′
  ─────────────────────────────────────────
  let box x = a in b ⤳ let box x = a′ in b

S-BoxBeta
  let box x = box_q a in b ⤳ b{a/x}

S-SpreadCong
  a ⤳ a′
  ───────────────────────────────────────────
  let (x, y) = a in b ⤳ let (x, y) = a′ in b

S-SpreadBeta
  let (x, y) = (a₁, a₂) in b ⤳ b{a₁/x}{a₂/y}

S-CaseCong
  a ⤳ a′
  ─────────────────────────────────────────
  case_q a of b₁; b₂ ⤳ case_q a′ of b₁; b₂

S-Case1Beta
  case_q (inj₁ a) of b₁; b₂ ⤳ b₁ a

S-Case2Beta
  case_q (inj₂ a) of b₁; b₂ ⤳ b₂ a

Theorem A.1 (Preservation). If Δ; Γ ⊢ a : A and a ⤳ a′ then Δ; Γ ⊢ a′ : A.

Theorem A.2 (Progress). If ∅; ∅ ⊢ a : A then either a is a value or there exists some a′ such that a ⤳ a′.

A.3 Heap Semantics

[H] a ⇒^r_S [H′; u′; Γ′] a′   (Small-step reduction relation (part 1))

Small-Var
  1 ≤ r
  ───────────────────────────────────────────────────────────────────────────────────────────
  [H₁, x ↦_{q+r} (Γ ⊢ a : A), H₂] x ⇒^r_S [H₁, x ↦_q (Γ ⊢ a : A), H₂; 0^{|H₁|} ⋄ r ⋄ 0^{|H₂|}; ∅] a

Small-AppL
  [H] a ⇒^r_{S ∪ fv b} [H′; u′; Γ] a′
  ────────────────────────────────────
  [H] a b ⇒^r_S [H′; u′; Γ] a′ b

Small-AppBeta
  x ∉ Var H ∪ fv b ∪ (fv a − {y}) ∪ S    a′ = a{x/y}
  ──────────────────────────────────────────────────────────────────────────────
  [H] (λy :^q A. a) b ⇒^r_S [H, x ↦_{r·q} (Γ ⊢ b : A); 0^{|H|} ⋄ 0; x :^{r·q} A] a′

Small-UnitL
  [H] a ⇒^r_{S ∪ fv b} [H′; u′; Γ] a′
  ──────────────────────────────────────────────────────────────
  [H] let unit = a in b ⇒^r_S [H′; u′; Γ] let unit = a′ in b

Small-UnitBeta
  [H] let unit = unit in b ⇒^r_S [H; 0^{|H|}; ∅] b

Small-CaseL
  [H] a ⇒^{r·q}_{S ∪ fv b₁ ∪ fv b₂} [H′; u′; Γ] a′
  ────────────────────────────────────────────────────────────
  [H] case_q a of b₁; b₂ ⇒^r_S [H′; u′; Γ] case_q a′ of b₁; b₂

Small-Case1
  [H] case_q (inj₁ a) of b₁; b₂ ⇒^r_S [H; 0^{|H|}; ∅] b₁ a

Small-Case2
  [H] case_q (inj₂ a) of b₁; b₂ ⇒^r_S [H; 0^{|H|}; ∅] b₂ a

Small-Sub
  [H₁] a ⇒^r_S [H′; u′; Γ] a′    H₂ ≤ H₁
  ──────────────────────────────────────
  [H₂] a ⇒^r_S [H′; u′; Γ] a′

[H] a ⇒^r_S [H′; u′; Γ′] a′   (Small-step reduction relation (part 2))

Small-LetBoxL
  [H] a ⇒^r_{S ∪ fv b} [H′; u′; Γ] a′
  ──────────────────────────────────────────────────────────────
  [H] let box x = a in b ⇒^r_S [H′; u′; Γ] let box x = a′ in b

Small-LetBoxBeta
  x ∉ Var H ∪ fv a ∪ (fv b − {y}) ∪ S    b′ = b{x/y}
  ──────────────────────────────────────────────────────────────────────────────────────
  [H] let box y = box_q a in b ⇒^r_S [H, x ↦_{r·q} (Γ ⊢ a : A); 0^{|H|} ⋄ 0; x :^{r·q} A] b′

Small-ProjL
  [H] a ⇒^r_{S ∪ fv b} [H′; u′; Γ] a′
  ──────────────────────────────────────────────────────────────────
  [H] let (x, y) = a in b ⇒^r_S [H′; u′; Γ] let (x, y) = a′ in b

Small-ProjBeta
  x′ ∉ Var H ∪ fv a₁ ∪ fv a₂ ∪ (fv b − {x} − {y}) ∪ S
  y′ ∉ Var H ∪ fv a₁ ∪ fv a₂ ∪ (fv b − {x} − {y}) ∪ S ∪ {x′}
  b′ = b{x′/x}{y′/y}
  ─────────────────────────────────────────────────────────────────────────────
  [H] let (x, y) = (a₁, a₂) in b ⇒^r_S
      [H, x′ ↦_r (Γ₁ ⊢ a₁ : A₁), y′ ↦_r (Γ₂ ⊢ a₂ : A₂); 0^{|H|} ⋄ 0 ⋄ 0; x′ :^r A₁, y′ :^r A₂] b′

A.4 Dependent Graded Type System
The full typing rules for
GraD are below. This system uses the same definition of values and operational semantics as the simple system.

Δ; Γ ⊢ a : A   (Typing rules for dependent system)
T-sub
  Δ; Γ₁ ⊢ a : A    Γ₁ ≤ Γ₂
  ─────────────────────────
  Δ; Γ₂ ⊢ a : A

T-weak
  x ∉ dom Δ    Δ; Γ₁ ⊢ a : B    Δ; Γ₂ ⊢ A : Type
  ───────────────────────────────────────────────
  Δ, x : A; Γ₁, x :^0 A ⊢ a : B

T-type
  ∅; ∅ ⊢ Type : Type

T-var
  x ∉ dom Δ    Δ; Γ ⊢ A : Type
  ──────────────────────────────
  Δ, x : A; 0·Γ, x :^1 A ⊢ x : A

T-Unit
  ∅; ∅ ⊢ Unit : Type

T-unit
  ∅; ∅ ⊢ unit : Unit

T-UnitElim
  Δ; Γ₁ ⊢ a : Unit    Δ; Γ₂ ⊢ b : B{unit/y}    Δ, y : Unit; Γ₃, y :^r Unit ⊢ B : Type
  ───────────────────────────────────────────────────────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ let unit = a in b : B{a/y}

T-pi
  Δ; Γ₁ ⊢ A : Type    Δ, x : A; Γ₂, x :^r A ⊢ B : Type
  ────────────────────────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ Π x :^q A. B : Type

T-lam
  Δ, x : A; Γ₁, x :^q A ⊢ a : B    Δ; Γ₂ ⊢ A : Type
  ─────────────────────────────────────────────────
  Δ; Γ₁ ⊢ λx :^q A. a : Π x :^q A. B

T-app
  Δ; Γ₁ ⊢ a : Π x :^q A. B    Δ; Γ₂ ⊢ b : A
  ─────────────────────────────────────────
  Δ; Γ₁ + q·Γ₂ ⊢ a b : B{b/x}

T-Sigma
  Δ; Γ₁ ⊢ A : Type    Δ, x : A; Γ₂, x :^r A ⊢ B : Type
  ────────────────────────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ Σ x :^q A. B : Type

T-Tensor
  Δ; Γ₁ ⊢ a : A    Δ; Γ₂ ⊢ b : B{a/x}    Δ, x : A; Γ₃, x :^r A ⊢ B : Type
  ───────────────────────────────────────────────────────────────────────
  Δ; q·Γ₁ + Γ₂ ⊢ (a, b) : Σ x :^q A. B

T-SigmaElim
  Δ; Γ₁ ⊢ a : Σ x :^q A₁. A₂
  Δ, x : A₁, y : A₂; Γ₂, x :^q A₁, y :^1 A₂ ⊢ b : B{(x, y)/z}
  Δ, z : (Σ x :^q A₁. A₂); Γ₃, z :^r (Σ x :^q A₁. A₂) ⊢ B : Type
  ──────────────────────────────────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ let (x, y) = a in b : B{a/z}

T-sum
  Δ; Γ₁ ⊢ A₁ : Type    Δ; Γ₂ ⊢ A₂ : Type
  ──────────────────────────────────────
  Δ; Γ₁ + Γ₂ ⊢ A₁ ⊕ A₂ : Type

T-inj1
  Δ; Γ₁ ⊢ a : A₁    Δ; Γ₂ ⊢ A₂ : Type
  ───────────────────────────────────
  Δ; Γ₁ ⊢ inj₁ a : A₁ ⊕ A₂

T-inj2
  Δ; Γ₁ ⊢ a : A₂    Δ; Γ₂ ⊢ A₁ : Type
  ───────────────────────────────────
  Δ; Γ₁ ⊢ inj₂ a : A₁ ⊕ A₂

T-case
  1 ≤ q    Δ; Γ₁ ⊢ a : A₁ ⊕ A₂
  B₁ = B{inj₁ x/y}    B₂ = B{inj₂ x/y}
  Δ; Γ₂ ⊢ b₁ : Π x :^q A₁. B₁    Δ; Γ₂ ⊢ b₂ : Π x :^q A₂. B₂
  Δ, y : A₁ ⊕ A₂; Γ₃, y :^r (A₁ ⊕ A₂) ⊢ B : Type
  ──────────────────────────────────────────────────────────
  Δ; q·Γ₁ + Γ₂ ⊢ case_q a of b₁; b₂ : B{a/y}

B TYPE-CHECKING A GRADED, DEPENDENT HASKELL
This paper concerns itself with an implicit, internal language. Yet, if we are to integrate with GHC, we must make these ideas practical. There are two type-checking challenges that will arise:
Producing
GraD via elaboration
A real-world compiler must support taking a surface language, performing type inference, and then producing well-typed GraD programs via an elaboration step. The key question here: is GraD a suitable target for elaboration? We claim that it is. One author of the current paper, Eisenberg, has been involved in the day-to-day implementation concerns of both linear and dependent types in GHC. While challenges surely remain in any task this substantial, Eisenberg believes the type-inference concerns of linear types and of dependent types to be largely orthogonal. The former have been worked out during the implementation of today's linear types [Bernardy et al. 2018], and the latter have been carefully studied in the context of Haskell previously [Eisenberg 2016].
Checking
GraD itself
GHC uses a typed intermediate language. Type-checking this language serves only as a check on the compiler itself, but a vital check it is. With the right compiler flags, GHC will repeat the check after every optimization pass, frequently discovering bugs that might have otherwise gone unnoticed. If we are to use GraD as GHC's intermediate language, it, too, must support reasonably efficient type-checking. Yet GraD as presented here does not. The solution is not to encode GraD into GHC directly, but instead to use an encoding of GraD's typing judgements as the internal language within GHC. The relationship between the implicit nature of our language and the explicit, implementable nature of a more detailed encoding is one focus of our previous work [Weirich et al. 2017]. A particular challenge is how to encode the context splitting in, say, the application rule. The solution is not to encode this at all, but to have grades be an output of the checking algorithm, not an input. The algorithm then checks that the grades line up with expectations at the binding sites of restricted variables, just as is done in the implementation today.
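The grades-as-outputs idea can be sketched for a tiny lambda fragment. The following is entirely our own toy, not GHC code: instead of splitting the context on the way down, the checker synthesizes a usage map bottom-up and compares the synthesized count against the declared grade at each binder.

```haskell
import qualified Data.Map.Strict as M

-- A toy term language (our own names): each binder declares a grade,
-- and each application records the grade q from the function's type.
data Term
  = Var String
  | Lam String Int Term   -- λx :^q . body
  | App Term Term Int     -- application at grade q

-- Synthesize the usage map of a term bottom-up. At a binder we look
-- up how often the bound variable was actually used and check it
-- against the declared grade; no context splitting is needed.
usages :: Term -> Maybe (M.Map String Int)
usages (Var x) = Just (M.singleton x 1)
usages (Lam x q body) = do
  u <- usages body
  if M.findWithDefault 0 x u == q
    then Just (M.delete x u)     -- grade matches: discharge the binder
    else Nothing                  -- grade mismatch: reject
usages (App f a q) = do
  uf <- usages f
  ua <- usages a
  -- Mirrors the Γ1 + q·Γ2 context of the application rule.
  Just (M.unionWith (+) uf (M.map (q *) ua))
```

A real checker would compare grades up to the subusage ordering rather than exact equality; we use equality here only to keep the sketch short.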