Intensional Datatype Refinement: For Scalable, Flow- and Context-Sensitive Verification of Pattern-Match Safety
EDDIE JONES, University of Bristol
STEVEN RAMSAY, University of Bristol

The pattern-match safety problem is to verify that a given functional program will never crash due to non-exhaustive patterns in its function definitions. We present a refinement type system that can be used to solve this problem. The system extends ML-style type systems with algebraic datatypes by a limited form of structural subtyping and environment-level intersection. We describe a fully automatic, sound and complete type inference procedure for this system which, under reasonable assumptions, is linear in the program size. A prototype implementation for Haskell is able to analyse a selection of packages from the Hackage database in a few hundred milliseconds.

Additional Key Words and Phrases: higher-order program verification, refinement types
The pattern-match safety problem asks: given a program with non-exhaustive (algebraic datatype) patterns in its function definitions, is it possible that the program crashes with a pattern-match exception? Consider the example Haskell code in Figure 1. This code defines the two main ingredients in a typical definition (see e.g. [21]) of conversion from arbitrary propositional formulas to propositional formulas in disjunctive normal form (represented as lists of lists of literals). Using these definitions, the conversion can be described as the composition dnf ≔ nnf2dnf ◦ nnf.

Notice that the definition of nnf2dnf is partial: it is expected only to be used on inputs that are in negation normal form (NNF). Consequently, unless it can be verified that nnf always produces a formula without any occurrence of Imp or Not, any application of dnf to an expression of type Fm a may result in a pattern-match failure exception. In this paper we present a new refinement type system that can be used to perform this verification statically and automatically. Type inference is compositional and incremental, so that it can be integrated with modern development environments: open program expressions can be analysed, and only the parts of the code that are modified need to be reanalysed as changes are made.

Whilst there are other analyses in the literature that can also verify instances of the foregoing example, ours is, as far as we are aware, the only one to offer strong guarantees on predictability, which we believe to be key to the usability of such systems in practice.

• The analysis is characterised by the type system, which is a natural, yet expressive, extension of ML-style type systems with algebraic datatypes. Hence, the programmer can reason about when it will succeed by reasoning about typing.
• The analysis runs in time that is, in the worst case, linear in the size of the program (under reasonable assumptions on the size of types and the nesting of matching).

We elaborate on these in the following.
Sound and terminating program analyses are conservative: there are always programs without bugs that, nevertheless, cannot be verified. Identifying a large fragment for which the analysis is complete, i.e. a class of safe programs for which verification is guaranteed, allows the programmer to reason about the behaviour of the analysis on their code. In particular, when an analysis fails
Authors' addresses: Eddie Jones, Department of Computer Science, University of Bristol; Steven Ramsay, Department of Computer Science, University of Bristol.

```haskell
data L a = Atom a | NegAtom a

data Fm a
  = Lit (L a)
  | Not (Fm a)
  | And (Fm a) (Fm a)
  | Or  (Fm a) (Fm a)
  | Imp (Fm a) (Fm a)

nnf (Lit (Atom x))          = Lit (Atom x)
nnf (Lit (NegAtom x))       = Lit (NegAtom x)
nnf (And p q)               = And (nnf p) (nnf q)
nnf (Or p q)                = Or (nnf p) (nnf q)
nnf (Imp p q)               = Or (nnf (Not p)) (nnf q)
nnf (Not (Not p))           = nnf p
nnf (Not (And p q))         = Or (nnf (Not p)) (nnf (Not q))
nnf (Not (Or p q))          = And (nnf (Not p)) (nnf (Not q))
nnf (Not (Imp p q))         = And (nnf p) (nnf (Not q))
nnf (Not (Lit (Atom x)))    = Lit (NegAtom x)
nnf (Not (Lit (NegAtom x))) = Lit (Atom x)

nnf2dnf (Lit a)   = [[a]]
nnf2dnf (Or p q)  = List.union (nnf2dnf p) (nnf2dnf q)
nnf2dnf (And p q) = distrib (nnf2dnf p) (nnf2dnf q)
  where distrib xss yss =
          List.nub [ List.union xs ys | xs <- xss, ys <- yss ]
```

Fig. 1. Conversion to disjunctive normal form.

to verify a program that the user believes to be safe, it gives them an opportunity to take action, such as programming more defensively, in order to put their program into the fragment and thus be certain of verification success.

However, for this to be most effective, the fragment must be easily understood by the average functional programmer. Our analysis is complete with respect to programs typable in a natural extension of ML-style type systems with algebraic datatypes. Indeed, it is characterised by this system: the force of Theorems 25 and 28 is to say that it forms a sound and complete inference procedure. The system is presented in full in Section 5, but the highlights are as follows:

(i) The datatype environment introduced by the programmer, e.g. L a and Fm a, is completed: every datatype whose definition can be obtained by erasing constructors from one of those given is added to the environment for the purpose of type assignment.
These new datatypes are called intensional refinements. These additional types allow the scrutinee of a match to be typed with a datatype that is more precise than the underlying type provided by the programmer. For example, the datatypes in Figure 2 are among the intensional refinements of Fm a, where data A a = Atom a is an intensional refinement of L a. Of course, the names of the datatypes are irrelevant.

(ii) There is a natural notion of subtyping between intensional refinement datatypes, which is incorporated into the type system through an unrestricted subsumption rule. For example, Clause a and Cube a are both subtypes of the intensional refinement:

```haskell
data NFm a = Lit (L a) | Or (NFm a) (NFm a) | And (NFm a) (NFm a)
```

which is itself a subtype of Fm a. However, Clause a, Cube a and STLC a are all incomparable.

(iii) The typing rule for the case analysis construct, by which pattern matching is represented, enforces that matching is exhaustive with respect to the type of the scrutinee. This ensures

```haskell
data Clause a = Lit (L a) | Or (Clause a) (Clause a)

data STLC a = Lit (A a) | And (STLC a) (STLC a) | Imp (STLC a) (STLC a)

data Cube a = Lit (L a) | And (Cube a) (Cube a)
```

Fig. 2. Some intensional refinements of Fm a.

that the analysis of matching is sound: programs for which the match is not exhaustive will not be typable. Moreover, the rule is flow-sensitive, with the type of the match only depending on the types of the branches corresponding to the type of the scrutinee. For example, the following function can be assigned the type (a → b) → Cube a → Cube b and the type (a → b) → Clause a → Clause b, but not the type (a → b) → STLC a → STLC b.

```haskell
map f (Lit (Atom x))    = Lit (Atom (f x))
map f (Lit (NegAtom x)) = Lit (NegAtom (f x))
map f (And p q)         = And (map f p) (map f q)
map f (Or p q)          = Or (map f p) (map f q)
```

Flow sensitivity is essential for handling typical use cases.
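The comparisons in item (ii) can be pictured concretely by modelling each intensional refinement by the set of constructors it retains at each level of the underlying definition. The following Python fragment is our own illustration of that reading, not part of the formal development; the names are those of Figure 2:

```python
# Each refinement of Fm a is modelled by the Fm-constructors it keeps,
# paired with the L-constructors kept at literal positions.
REFINEMENTS = {
    "Clause": ({"Lit", "Or"},         {"Atom", "NegAtom"}),
    "Cube":   ({"Lit", "And"},        {"Atom", "NegAtom"}),
    "STLC":   ({"Lit", "And", "Imp"}, {"Atom"}),
    "NFm":    ({"Lit", "Or", "And"},  {"Atom", "NegAtom"}),
    "Fm":     ({"Lit", "Not", "And", "Or", "Imp"}, {"Atom", "NegAtom"}),
}

def subtype(t1, t2):
    """t1 is below t2 iff t1 keeps a subset of t2's constructors at
    every level (refinements choose one constructor set per datatype)."""
    (fm1, l1), (fm2, l2) = REFINEMENTS[t1], REFINEMENTS[t2]
    return fm1 <= fm2 and l1 <= l2
```

Under this encoding, `subtype("Clause", "NFm")` and `subtype("Cube", "NFm")` hold, `NFm` sits below `Fm`, and `STLC` is incomparable with the others, exactly as claimed above.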
Often a single large datatype is defined but, locally, certain parts of the program work within a fragment (e.g. only on clauses). Flow sensitivity helps to ensure that transformations on values inside the fragment remain inside the correct datatype refinement — otherwise map could only advertise that it returns formulas of type NFm a. For example, Elm-style web applications typically define a single, global datatype of actions, although the constituent pages may only be prepared to handle certain (overlapping) subsets locally.

(iv) Finally, refinement polymorphism, and hence context-sensitivity, is provided by allowing environments that have more than a single refinement type binding for each free program variable, i.e. an environment-level intersection. For example, suppose trivial : Clause a → Bool checks a clause for complementary literals, isFunTy : STLC a → Bool checks if a formula from the simply typed fragment corresponds to a function type, and rn : String → String performs a renaming of propositional atoms. Then the following expression is well typed:

λx y. trivial (map rn x) || isFunTy (map rn y)

This is because the typing environment contains both of the aforementioned types for map. Note: this is polymorphism in the class of formulas, not only in the type a of their atoms.

To distinguish the typing assigned to the program by the programming language (which we consider part of the input to the analysis) from the types that can be assigned in our extended system, we call the former the underlying typing of the program.

Characterising the power of an analysis with a type system allows the programmer to reason about its behaviour by using typings as a kind of certificate. Returning to the above example, the programmer can be certain that uses of dnf will be verifiably safe because they can synthesize the intensional datatype refinement NFm, and quickly check the typings nnf : Fm a → NFm a and nnf2dnf : NFm a → [[L a]] in their mind. The example is rather contrived, but we may rather imagine such combinations occurring in different parts of the program.
Our analysis takes the form of a type inference procedure for the system described above. Inference proceeds by generating and solving typing constraints. The constraints are inclusions, representing the flow of data, guarded by requirements on the presence of certain datatype constructors, representing sensitivity to the context. For example, three of the constraints arising in the analysis of the first case of nnf above are:
Atom ∈ Y(L a)
Lit ∈ Z(Fm a)
Lit ∈ X(Fm a), Atom ∈ X(L a) ? Y(L a) ⊆ Z(L a)

and concern the refinement type variables X, Y and Z. The former two say that the constructor Atom must be provided by the refinement datatype Y at the level of literals, and that the constructor Lit must be provided by the refinement datatype Z, which will characterise the output of this case, at the level of formulas. The third, which relates to the application Lit (Atom x), says that whenever the refinement datatype X characterising the input to nnf is required to provide the Lit constructor at the level of formulas and the
Atom constructor at the level of literals, then it must be that the constructors provided by the intensional refinement Y are a subset of those provided by Z at the level of literals. By using a version of constrained types (e.g. as described by Odersky, Sulzmann, and Wehr [31]) to describe sets of solutions, inference is compositional.

Since our constraints can be viewed as definite inequalities over a finite semilattice in the sense of Rehof and Mogensen [37], sets of constraints can be solved in time bounded by a linear function of their size but, as is typical for constrained type inference, this may already be exponential in the size of the program. However, we show how to exploit compositionality and the restricted form of intensional refinements to localise this exponential complexity. Consequently, under some reasonable assumptions, our inference procedure is exponential in the size of the underlying types of the program and linear in the number of function definitions.

Inspired by the long and prolific line of work on set constraint based program analysis [2–4, 23], rather than building a model explicitly (as in [37]) we have designed a set of rules for putting constraints into a solved form. We show that our constraint sets in solved form have the following remarkable property, stated formally as Theorem 32.

Suppose C is a set of constraints in solved form over variables V and let I ⊆ V be arbitrary. Let C↾I, called the restriction of C to I, be those constraints in C in which occur only variables from I. Then every solution to C↾I can be extended to a solution of all of C.

In the restriction C↾I, entire constraints are culled, including those that involve a mixture of variables from I and V \ I. Such mixed constraints, intuitively, impose compatibility requirements on the different components of a solution to C.
What is significant about the above property is that it guarantees not only that the part of the constraint set concerned with V \ I is internally consistent but, moreover, that the mixed constraints will be satisfiable no matter which solution to C↾I is chosen. The reason is that, in our solved form, the force of any requirement placed on variables in I due to interaction with variables in V \ I in a mixed constraint is explicitly rephrased in a number of new constraints written purely over variables in I.

We exploit compositionality in order to choose a minimal set of variables I with which to restrict constraint sets. Constraint generation proceeds as a recursive traversal of each function definition, associating a set of constraints C with each term in context Γ ⊢ e : T. In general, C will contain refinement type variables that occur free in Γ and T — we call these variables the interface I — but it will also typically contain refinement type variables that do not; for example, because they are associated with program points internal to e (such variables are often existentially quantified in other presentations). For the purpose of typing, the particular assignment to variables not occurring in the interface is irrelevant: Γθ ⊢ e : Tθ and Γθ′ ⊢ e : Tθ′ are identical whenever θ and θ′ agree on I. Therefore, as a direct consequence of the aforementioned property, it is always sound to put constraints into solved form and restrict to the interface.

Putting the set of constraints associated with e into solved form and restricting to the interface of e ensures that the number of constraints only depends on the size of the types in the interface and the size of their definitions.
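The solved form and restriction operation can be imitated in miniature. The sketch below is our own toy model, not the paper's actual rules: constraints are guarded inclusions (guard ? lhs ⊆ rhs) between refinement variables, saturation closes them under guard-accumulating transitivity, and restriction keeps only constraints written purely over interface variables. After saturation, a requirement that flows through an internal variable survives restriction to the interface:

```python
from itertools import product

def saturate(constraints):
    """Close a set of guarded inclusions (guard ? lhs ⊆ rhs) under
    transitivity, unioning guards: a toy analogue of a solved form."""
    closed = set(constraints)
    changed = True
    while changed:
        changed = False
        for (g1, x, y), (g2, y2, z) in product(tuple(closed), repeat=2):
            derived = (g1 | g2, x, z)
            if y == y2 and derived not in closed:
                closed.add(derived)
                changed = True
    return closed

def restrict(constraints, interface):
    """Keep only constraints (including their guards) written purely
    over interface variables."""
    return {
        (g, x, y) for (g, x, y) in constraints
        if x in interface and y in interface
           and all(v in interface for (_k, v) in g)
    }

# X flows into an internal variable M, which flows (guardedly) into Z:
C = {(frozenset(), "X", "M"),
     (frozenset({("Lit", "X")}), "M", "Z")}
S = saturate(C)
# the induced requirement on X and Z alone survives restriction:
assert (frozenset({("Lit", "X")}), "X", "Z") in restrict(S, {"X", "Z"})
```

Without saturation, restricting C itself to {X, Z} would cull both constraints and lose the requirement entirely; this is the sense in which the solved form makes restriction sound.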
In particular, assuming the depth of nested pattern matching is fixed, the size of the constraint sets that are ever considered by the analysis is independent of the size of the program: we perform a small (but exponential in the size of types occurring in the interface) fixed point computation at every program point, rather than an enormous (exponential in the size of the program) fixed point computation when processing the program's entry point. If we consider the size of the underlying type assignment and the size of the largest function definition fixed, then the analysis scales linearly in the size of the program.

We have implemented our system in Haskell as a GHC plugin and run it on a selection of packages from the Hackage database. The plugin takes a Haskell package to be compiled and runs our type inference algorithm over the whole code to yield a constrained type assignment and a set of type errors. The average time taken to process each module is in the order of milliseconds, and the results show a very stark contrast between the number of refinement variables associated with the program points in the module (often > 10000) and the number of refinement variables in the interfaces (typically < 20).

The rest of the paper is structured as follows. In Section 2 we describe a Haskell-like functional programming language which forms the setting for our work. This is followed in Section 3 by our definitions of refinement, and in Sections 4 and 5 by the definition of the type system that characterises the analysis. In Sections 6, 7 and 8 we present our analysis as a type inference algorithm, generating and solving constraints. We discuss the restriction operation and its complexity in Section 9 and we report on our implementation in Section 10. Finally, we conclude and discuss related work in Section 11.
Preliminaries. Given sets X and Y, let us write X → Y for the set of all functions from X to Y and Map X Y for the set of all finite maps between X and Y. As usual, function arrows are assumed to associate to the right. Additionally, we define the indexing of function arguments, that is (X_1 → · · · → X_m → Y)[i] = X_i for all i ∈ [1..m]. Given a family of sets Y_x indexed by x ∈ X, let us write Π x ∈ X. Y_x for the subset of X → ⋃_{x ∈ X} Y_x that contains only functions that are guaranteed to map each x ∈ X to some element of Y_x, and let us write Σ x ∈ X. Y_x for the subset of X × ⋃_{x ∈ X} Y_x in which the second component of each pair (x, y) is guaranteed to belong to Y_x. Given a family of sets Y_x indexed by x ∈ X, let us write ∐_{x ∈ X} Y_x for their disjoint sum and inj_x for each of the canonical injections.

Types.
We assume a countable collection A of type variables, ranged over by α; a finite collection B of base types, ranged over by b; and a countable collection D of algebraic datatype identifiers, ranged over by d. These can be thought of as the names of first-order type constructors. Each datatype identifier has a fixed arity, and only forms a proper type when supplied with the appropriate number of type arguments. We refer to a datatype identifier applied to its arguments as a datatype and, when it is clear from the context, we will also write these as d.

(Monotypes) T, U, V ::= α | b | d T_1 · · · T_n | T → T
(Type schemes) S ::= T | ∀α. S

We write Dt D to stand for the set of all datatypes with datatype identifiers drawn from the set D. Ty D and Sch D are defined similarly for monotypes and schemes. We consider monotypes to be a trivial instance of type schemes where convenient. The purpose of distinguishing base types from datatypes is that the latter may have non-trivial refinements whereas the former may not. For example, we will consider Int to be a base type, Fm a datatype identifier, and Fm Int a datatype. Type schemes are identified up to renaming of bound variables.
Lifting over types. Given a relation on datatypes R ⊆ Dt D × Dt D, we write Ty(R) for the relation on Ty D × Ty D defined inductively by the following rules:

Ty(R)(α, α)
Ty(R)(b, b)
Ty(R)(d_1 T̄, d_2 Ū) whenever R(d_1 T̄, d_2 Ū)
Ty(R)(T_1 → T_2, T′_1 → T′_2) whenever Ty(R)(T_1, T′_1) and Ty(R)(T_2, T′_2)

Expressions and modules.
We assume a countable collection of term variables, ranged over by x, y, z, and variations, as well as a countable collection K of datatype constructors, ranged over by k. The arity of a constructor is denoted Arity(k). The expressions and modules of the language are:

m ::= ϵ | m · ⟨x = e⟩
e ::= c | k | e e | e T | λx:T. e | Λα. e | case e of { k_1 x̄_1 ↦ e_1 | · · · | k_m x̄_m ↦ e_m }

Expressions are identified up to renaming of bound variables and we will adopt the Barendregt variable convention in order to retain a simple notation. Since we are defining a refinement type system, we will assume that the input program already has a typing assigned by the underlying type system of the programming language. We assume that this is manifest, in part, by the insertion of appropriate type abstraction Λα. e and type application e T terms, and by the annotation of term abstractions with their argument type λx:T. e. We also assume, as is the case for GHC Core, that pattern matching has been preprocessed into case expressions in which patterns are one level deep, i.e. have the form k x_1 · · · x_n for some constructor k. Modules are simply a sequence of variable definitions ⟨x = e⟩ that may be empty (ϵ). For simplicity of presentation, we allow recursive definitions but not mutually recursive definition sets.

Datatype environments.
The meaning of datatypes is defined by an environment of datatype definitions. Each datatype definition introduces a new datatype identifier along with a collection of datatype constructors that can be used to build instances of the type.
Definition 1 (Datatype Environment). A datatype environment is a pair consisting of a set D ⊆ D of datatype identifiers and a function ∆ : D → Map K (Sch D) mapping each datatype identifier d to a finite map which records the associated constructors and their types. We assume these types only concern the type variables that appear in the datatype's definition, and so each has the shape ∀ᾱ. U_1 → · · · → U_m → d ᾱ, where m is the constructor's arity. For convenience, and as the return type of a constructor is predetermined, we will often identify ∆(d)(k) with just the sequence of types corresponding to the constructor's arguments, i.e. [U_1, .., U_m] where U_i = ∆(d)(k)[i].

Since datatype environments are partial functions on D, they inherit the natural partial order in which ∆_1 ⊆ ∆_2 just if the graph of the former is included in the graph of the latter. If ∆_1 ⊆ ∆_2, we say that ∆_1 is a subenvironment of ∆_2.

Note that the notion of subenvironment only concerns the datatypes that are defined in an environment and not the definitions of those datatypes (the constructors and their types), which will be treated by the notion of refinement in the sequel.
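Concretely, a datatype environment is just a nested finite map, and the subenvironment order is inclusion of graphs. The following is our own Python rendering of Definition 1, using the datatypes of Figure 1:

```python
# A datatype environment maps each defined identifier to a finite map
# from constructors to their argument types (return types being
# predetermined, they are omitted, as in Definition 1).
DELTA = {
    "L":  {"Atom": ["a"], "NegAtom": ["a"]},
    "Fm": {"Lit": ["L a"], "Not": ["Fm a"], "And": ["Fm a", "Fm a"],
           "Or": ["Fm a", "Fm a"], "Imp": ["Fm a", "Fm a"]},
}

def subenvironment(e1, e2):
    """e1 ⊆ e2: the graph of e1 is included in the graph of e2. Note
    this compares only WHICH datatypes are defined; where both are
    defined, the definitions must coincide exactly."""
    return all(d in e2 and e2[d] == defn for d, defn in e1.items())

assert subenvironment({"L": DELTA["L"]}, DELTA)
assert not subenvironment(DELTA, {"L": DELTA["L"]})
```

Dropping a constructor from a definition does not give a subenvironment in this sense; that relationship is the refinement order introduced below.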
Henceforth we will fix a particular datatype environment ∆ : D → Map K (Sch D), which we call the underlying datatype environment. We think of ∆ as the datatype environment provided by the programmer.

Example 2. We will use the following as a running example of an underlying datatype environment. Consider the datatype Lam of λ-terms with arithmetic using a locally nameless representation:

```haskell
data Arith = Lit Int | Add | Mul
data Lam   = Cst Arith | BVr Int | FVr String | Abs Lam | App Lam Lam
```

These datatypes are slightly artificial, but they allow us to illustrate several features of the definitions in one example. For simplicity, we will consider Int and String to be base types (which will, therefore, not be refined).

The underlying datatype environment contains definitions for all the datatypes declared by the programmer. Some datatype definitions require the definitions of other datatypes to be understood properly. For example, to understand Lam, one must also understand the definition of Arith, since one is defined in terms of the other. There is a notion of a subenvironment that contains all and only those definitions that are needed to understand one particular datatype.
Definition 3 (Slice). Suppose ∆ : D → Map K (Sch D). One can always construct a subenvironment of ∆ by starting from a given datatype d ∈ D and closing under transitive dependencies. Define the slice of d through ∆, written ⟨d⟩∆, as the least subenvironment of ∆ containing d.

For example, in the environment ∆ described in Example 2, ⟨Lam⟩∆ is the whole environment, and ⟨Arith⟩∆ is just the definition of Arith itself.
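The slice of Definition 3 is just a reachability computation over the dependency graph of datatype definitions. A sketch of our own, reusing a dict encoding of environments; the `deps` map, listing the datatype identifiers occurring in each definition, is an assumption of this illustration:

```python
def slice_env(delta, deps, root):
    """Least subenvironment of `delta` containing `root`, closed under
    transitive dependencies (Definition 3)."""
    reached, frontier = set(), [root]
    while frontier:
        d = frontier.pop()
        if d not in reached:
            reached.add(d)
            frontier.extend(deps[d])
    return {d: delta[d] for d in reached}

# The running example: Lam depends on Arith (and itself); Int and
# String are base types, so they do not appear in the environment.
DELTA = {"Arith": {"Lit": ["Int"], "Add": [], "Mul": []},
         "Lam":   {"Cst": ["Arith"], "BVr": ["Int"], "FVr": ["String"],
                   "Abs": ["Lam"], "App": ["Lam", "Lam"]}}
DEPS = {"Arith": [], "Lam": ["Arith", "Lam"]}

assert slice_env(DELTA, DEPS, "Lam") == DELTA
assert slice_env(DELTA, DEPS, "Arith") == {"Arith": DELTA["Arith"]}
```

The two assertions reproduce the observation above: ⟨Lam⟩∆ is the whole environment, while ⟨Arith⟩∆ contains only Arith.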
Definition 4 (Refinement). We say that a datatype environment ∆′ : D → Map K (Sch D) is a refinement of a datatype environment ∆ : D → Map K (Sch D) just if the definitions of the datatypes in ∆′ are bounded above by the definitions in ∆, that is: for all d ∈ D, ∆′(d) ⊆ ∆(d) (i.e. the graph of the first map is included in the graph of the second).

Suppose ∆(d)(k) is some type scheme S and ∆′ is a refinement of ∆; then ∆′(d)(k), when defined, is the same S. So, the refinements of ∆ : D → Map K (Sch D) are in one-to-one correspondence with choices of constructors for each of the constituent datatypes. More precisely, every function f : Π d ∈ D. P(dom ∆(d)) determines a refinement ∆_f satisfying:

∆_f(d)(k) = ∆(d)(k)   if k ∈ f(d)
∆_f(d)(k) = ⊥         otherwise

and each refinement arises in this way.

Example 5.
The following refinement of the underlying environment from Example 2 describes a type of closed, applicative terms over linear arithmetic.

```haskell
data Arith = Lit Int | Add
data Lam   = Cst Arith | App Lam Lam
```

This refinement is determined by the choice f_LL satisfying:

f_LL(Arith) = {Lit, Add}        f_LL(Lam) = {Cst, App}

For the purpose of assigning types to the program, we construct a new datatype environment consisting of all possible refinements of the underlying environment supplied by the programmer.

Definition 6 (The Intensional Refinement Environment).
Given a family of datatype environments ∆_i : D_i → Map K (Sch D_i), indexed by i ∈ I, we define their coproduct as the environment

∐_{i ∈ I} ∆_i : (∐_{i ∈ I} D_i) → Map K (Sch (∐_{i ∈ I} D_i))

whose domain is simply a disjoint sum of sets (as defined in the preliminaries). The coproduct comes equipped with canonical injections inj_i satisfying (∐_{i ∈ I} ∆_i)(inj_i d)(k) = inj_i(∆_i(d)(k)) wherever the latter is properly defined.

The intensional refinement environment, written ∆∗, is the coproduct of all the refinements of ∆:

∆∗ ≔ ∐_{f ∈ I} ∆_f : D∗ → Map K (Sch D∗)

where I is the set Π d ∈ D. P(dom ∆(d)) of functions from underlying datatypes d to appropriate subsets of their constructors, that is, the set of all possible refinements.

Note that, formally, the datatype identifiers whose definitions are given in the intensional refinement environment are of shape inj_{∆_f} d with d ∈ D. This is convenient because one can read such a name as "the refinement of d whose definition is ∆_f". However, we will continue to use more friendly names (and Haskell notation) in our examples.

Example 7.
The type of closed, applicative terms over linear arithmetic from Example 5 can be found in ∆∗ alongside the type LArith of linear arithmetic constants:

LATm ≔ inj_{f_LL} Lam        LArith ≔ inj_{f_LA} Arith

the latter being defined by f_LA(Arith) = {Lit, Add} and f_LA(Lam) = ∅. This datatype could equivalently be defined as inj_{f_LL} Arith (or in many other ways). Other refinement datatypes include the type LLam of closed λ-terms over linear arithmetic, the type ATm' of applicative terms and the type ATm of closed applicative terms, whose definitions we give in the more convenient notation:

```haskell
data ATm  = Cst Arith | App ATm ATm
data ATm' = FVr String | Cst Arith | App ATm' ATm'
data LLam = Cst LArith | BVr Int | Abs LLam | App LLam LLam
```

Definition 8 (Refinement Type).
A type (scheme) S ∈ Sch D∗ is said to be a refinement type. Refinement types come equipped with an underlying type, written U(S), which is a type (scheme) in Sch D defined recursively as follows:

U(α) = α
U(b) = b
U(inj_{∆_f} d T_1 · · · T_n) = d U(T_1) · · · U(T_n)
U(T_1 → T_2) = U(T_1) → U(T_2)
U(∀α. S) = ∀α. U(S)

(It would suffice to take all the refinements of all the slices, which itself still includes some redundancy, but this would complicate the definitions for no practical gain.)

(SShape) ⊢ T_1 ⋢ T_2 whenever U(T_1) ≠ U(T_2)

(SMis) ⊢ d_1 T̄ ⋢ d_2 Ū whenever dom(∆∗(d_1)) ⊈ dom(∆∗(d_2))

(SSim) ⊢ d_1 T̄ ⋢ d_2 Ū whenever, for some k with m = Arity(k) and some i ∈ [1..m],
  ∆∗(d_1)(k) = ∀ᾱ. U_1 → · · · → U_m → d_1 ᾱ,
  ∆∗(d_2)(k) = ∀β̄. U′_1 → · · · → U′_m → d_2 β̄,
  and ⊢ U_i[T̄/ᾱ] ⋢ U′_i[Ū/β̄]

(SArrL) ⊢ T_1 → T_2 ⋢ T′_1 → T′_2 whenever ⊢ T′_1 ⋢ T_1

(SArrR) ⊢ T_1 → T_2 ⋢ T′_1 → T′_2 whenever ⊢ T_2 ⋢ T′_2

Fig. 3. The complement of the subtyping relation on monotypes.
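The erasure map U of Definition 8 simply forgets which refinement was chosen at each datatype node. A sketch with a small tagged representation of types; the tuple encoding is our own, not the paper's:

```python
# Types: ("var", a) | ("base", b) | ("data", choice_f, d, args)
#      | ("arr", t1, t2) | ("forall", a, s)
def underlying(t):
    """Erase refinement choices, giving the underlying type (Definition 8)."""
    tag = t[0]
    if tag in ("var", "base"):
        return t
    if tag == "data":
        _f, d, args = t[1], t[2], t[3]
        return ("data", None, d, tuple(underlying(a) for a in args))
    if tag == "arr":
        return ("arr", underlying(t[1]), underlying(t[2]))
    if tag == "forall":
        return ("forall", t[1], underlying(t[2]))
    raise ValueError(f"unknown type tag: {tag}")
```

For instance, erasing LATm → String (with LATm encoded as the choice f_LL applied to Lam) yields Lam → String, the underlying type on which rule (SShape) operates.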
In the following, we will assume that we are given a program equipped with a complete underlying typing, that is: every subterm e has an associated type S ∈ Sch D. Our task will be to find a new refinement typing: an assignment to each subterm e of a refinement type S′ ∈ Sch D∗ that has the same "shape" as the underlying type S of e, in the sense that U(S′) = S.

Refinement induces a natural ordering on refinement datatypes according to which constructors are available in their definition. This ordering can then be lifted to all types built over those datatypes in the obvious way.
Definition 9 (Subtyping). The judgement ⊢ T_1 ⊑ T_2 is defined coinductively, using the system of rules in Figure 3. We extend subtyping to two schemes that have the same quantifier prefix, writing ⊢ ∀ᾱ. T_1 ⊑ ∀ᾱ. T_2 whenever ⊢ T_1 ⊑ T_2. We say that two types T_1 and T_2 are subtype equivalent, and write ⊢ T_1 ≡ T_2, just if ⊢ T_1 ⊑ T_2 and ⊢ T_2 ⊑ T_1.

It is straightforward to show the following by coinduction:

Lemma 1. Subtyping is a preorder.

Intuitively, refinement specifies the possible shapes of types, which are then interrelated by subtyping. Refinement is a covariant treatment of arrow types, since we have U(T_1 → T_2) = T′_1 → T′_2 iff U(T_1) = T′_1 and U(T_2) = T′_2. On the other hand, as can be seen from the definition, subtyping interprets the argument type contravariantly. Consequently, there are some T with U(T) = T′ for which T ⊑ T′ and others with U(T) = T′ for which T′ ⊑ T (viewing an underlying type as its own trivial refinement).

We give the definition coinductively because, as usual, there is a notion of simulation that arises naturally from our coalgebraic view of datatype environments. Consequently, it is most straightforward to think of the defining rules as providing a system in which to construct finite refutations of subtype inequalities T_1 ⊑ T_2, which will, ultimately, fail to hold either because the types T_1 and T_2 have a different shape, or because T_1 provides some constructor that T_2 does not.

Example 10.
Following the running example, the judgement ⊢ LATm → String ⋢ ATm → String follows by a simple refutation:

(SMis)  ⊢ Arith ⋢ LArith
(SSim)  ⊢ ATm ⋢ LATm
(SArrL) ⊢ LATm → String ⋢ ATm → String
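The refutation-based reading suggests a simple decision procedure: search for a finite refutation (rules SMis and SSim), treating pairs currently under consideration as provisionally related, which is the coinductive reading of the rules. The following Python sketch is our own first-order model of the running example; constructor arguments are just type names here, so arrow types are omitted:

```python
ENV = {
    "Arith":  {"Lit": ["Int"], "Add": [], "Mul": []},
    "LArith": {"Lit": ["Int"], "Add": []},
    "ATm":    {"Cst": ["Arith"], "App": ["ATm", "ATm"]},
    "LATm":   {"Cst": ["LArith"], "App": ["LATm", "LATm"]},
    "Int": None, "String": None,  # base types, never refined
}

def subtype(d1, d2, assumed=frozenset()):
    """d1 ⊑ d2 unless a finite refutation exists."""
    if ENV[d1] is None or ENV[d2] is None:
        return d1 == d2                   # base types only match themselves
    if (d1, d2) in assumed:
        return True                       # coinduction: no refutation so far
    assumed = assumed | {(d1, d2)}
    for k, args in ENV[d1].items():
        if k not in ENV[d2]:
            return False                  # SMis: constructor missing from d2
        if not all(subtype(u1, u2, assumed)
                   for u1, u2 in zip(args, ENV[d2][k])):
            return False                  # SSim: refutation at an argument
    return True
```

Under this model, `subtype("LATm", "ATm")` holds, foreshadowing the witness relation of Lemma 2 below, while `subtype("ATm", "LATm")` fails because Arith provides Mul, which LArith does not.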
Conversely, we can use the coinduction principle to show that ⊢ T_1 ⊑ T_2, in which case we require a model of ⊑ that contains (T_1, T_2).

Example 11.
For ⊢ ATm → String ⊑ LATm → String we provide the following witness:

{ (ATm → String, LATm → String), (String, String), (LATm, ATm), (LArith, Arith), (Int, Int) }

It can easily be verified that this set is a model of the defining rules for ⊑ and hence, by coinduction, is contained within it.

However, such models can be a bit unwieldy in general as the types involved get more complex. We can do better by observing that the definition can be approximated by a coinductive part, concerning datatypes, and an inductive part, by which a subtyping relationship between datatypes is lifted to all types. Consequently, we need only find a model of the coinductive part, which is much neater since it only concerns Dt D∗ × Dt D∗. The following can be shown by a straightforward coinduction.

Lemma 2 (Simulation).
Let R ⊆ Dt D∗ × Dt D∗ and suppose that, for all (d_1 T̄, d_2 Ū) ∈ R, and for all k such that ∆∗(d_1)(k) is defined:

• ∆∗(d_2)(k) is defined.
• And, moreover, Ty(R)(U_i, U′_i) for each i ∈ [1..Arity(k)], where Ū and Ū′ are the argument types of ∆∗(d_1)(k) and ∆∗(d_2)(k) instantiated at T̄ and Ū respectively.

Then it follows that Ty(R) is included in the subtype relation.

Using this result, it suffices to exhibit R ≔ {(LATm, ATm), (LArith, Arith)} in order to conclude ⊢ ATm → String ⊑ LATm → String of Example 11. Intuitively, this witness determines the model Ty(R), which contains the model of Example 11. (By a model we mean a set of pairs of types satisfying all of the ⊑-defining rules.)

In this section, we present a refinement type system whose purpose is to exclude the possibility of pattern-match failure. To achieve this, the typing rule for pattern matching requires that cases are exhaustive according to the type of the scrutinised expression. However, the system allows for all refinement datatypes and incorporates the above notion of subtyping, which allows for the scrutinised expression to be typed much more precisely than is possible in the underlying type system.

For the purpose of defining the refinement type system, we make some standard Hindley-Damas-Milner assumptions about the underlying type system, namely that type application happens immediately after introducing a variable of polymorphic type and type abstraction happens only at the point of definition. As a minor simplification, we assume that constants are monomorphic and write C(c) for the monotype assigned to c axiomatically. Since this is a refinement type system,

(TModE) Γ ⊢ ϵ : Γ

(TModD) Γ ⊢ m : Γ′    Γ′ ∪ {x : T} ⊢ e : T′    ⊢ T′ ⊑ T
        ⟹ Γ ⊢ m · ⟨x = Λᾱ. e⟩ : Γ′ ∪ {x : ∀ᾱ. T}

Fig. 4. Typing for modules.
we assume that all expressions have already been assigned an underlying type, which we will typ-ically write with an underline to aide readability. We will only need to consult these underlyingtypes when they appear in the syntax, e.g. abstraction and type application.Additionally, we relax the normal definition of a type environment from a function to a relation.Program variables may, therefore, have many types as long as they refine the same underlyingtype. This assumption is equivalent to allowing environment-level intersection types.
Definition 12 (Type assignment). A type environment, typically Γ or ∆, is a finite relation between program variables x and type schemes S, whose elements are typically written x : S. We require that x : S1 ∈ Γ and x : S2 ∈ Γ implies U(S1) = U(S2). This ensures that U(Γ) can be defined in the obvious way.

The type assignment system is divided into two sets of rules, for expressions (Figure 5) and for modules (Figure 4), defining judgements, respectively:

  Γ ⊢ e : T        Γ ⊢ m : ∆

in which U(Γ) ⊢ e : U(T) and U(Γ) ⊢ m : U(∆) are the underlying typings, provided by the programming language, for the expression e and the module m respectively.

The system is conceptually similar to an underlying ML-style system, but the following should be noted.
• Any suitable refinement datatype d can be used in order to type a datatype constructor or the scrutinee of a case statement.
• The notion of subtyping from the previous section is incorporated through a subsumption rule (recall that ⊢ T1 ⊑ T2 implies that T1 and T2 have the same shape according to U).
• The pattern-matching rule is restricted by a side condition requiring that the cases are exhaustive.
• The branches of a case expression only need to be typed if the branch is reachable, incorporating flow-sensitivity. This relaxation only makes sense for a refinement type system and, from an operational point of view, it makes no difference to the set of computations expressible.
• Finally, everywhere a particular underlying type is required by the syntax, an arbitrary choice of refinement type of the appropriate shape can be made in its place.

As discussed in the introduction, allowing several types for each term ensures they can be used in different contexts. This approach is more lightweight than an intersection type system, and arguably easier for programmers to reason about if types are to be considered as certificates.
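Concretely, treating the environment as a relation means a lookup may return several types for the same variable. A minimal sketch of this reading of Definition 12, with entirely hypothetical names and a simplified type grammar:

```haskell
-- A type environment as a finite relation: the same variable may be
-- recorded with several refinement types of the same underlying shape.
data Ty = TyData String | TyArr Ty Ty
  deriving (Eq, Show)

type TyEnv = [(String, Ty)]

-- All of the types recorded for a variable: an environment-level
-- intersection reading of the relation.
typesOf :: TyEnv -> String -> [Ty]
typesOf env x = [ t | (y, t) <- env, y == x ]
```

So an environment relating f to both ATm → ATm and Lam → Lam makes both typings available, each usable at a different occurrence of f.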
When it comes to algorithmic inference, however, the non-deterministic aspect would be problematic. Instead, in Section 7, we rely on refinement polymorphism to summarise every typing of a variable in some environment compactly by a single constrained type scheme. Polymorphism of this kind is no different from that of the Hindley-Milner system, which could equally be viewed as an infinite intersection type system, or indeed as allowing several typings of the same variable in an environment. Likewise, it is simpler to define polymorphic constructors and datatypes than to consider each instantiation separately.

Fig. 5. Type assignment for expressions: the rules (TVar), (TCst), (TCon), (TSub), (TAbs), (TApp) and (TCase).
Example 13. Recall the refinements of Example 7 and consider the function cloSub, with underlying type List (String × Lam) → Lam → Lam, whose purpose is to close an applicative term by substituting closed terms everywhere:

  cloSub m t =
    case t of
      FVr s   → lkup m s
      Cst c   → Cst c
      App u v → App (cloSub m u) (cloSub m v)
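The function can be run as written, given concrete definitions. In the self-contained sketch below, the datatype Lam and the helper lkup are hypothetical stand-ins for the paper's definitions (in particular, Cst carries an Int here rather than an Arith expression):

```haskell
-- Hypothetical concrete rendering of the applicative-term datatype.
data Lam = FVr String | Cst Int | App Lam Lam | Abs String Lam
  deriving (Eq, Show)

-- A total-by-convention lookup, failing loudly on unbound variables.
lkup :: [(String, a)] -> String -> a
lkup m s = case lookup s m of
  Just v  -> v
  Nothing -> error ("unbound variable: " ++ s)

-- Apply a closing substitution throughout an applicative term.
cloSub :: [(String, Lam)] -> Lam -> Lam
cloSub m t = case t of
  FVr s   -> lkup m s
  Cst c   -> Cst c
  App u v -> App (cloSub m u) (cloSub m v)
```

Note that cloSub is partial on terms containing Abs, which is exactly the partiality the refinement ATm of Lam is used to discharge.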
To keep the example simple, we assume that the lookup function lkup has the type ∀α. List (String × α) → String → α in the environment. Then the function cloSub can be assigned the refinement type:

  List (String × ATm) → ATm → ATm

thus expressing the fact that the application of a closing substitution to an arbitrary applicative term yields a closed applicative term. (Here, List and × can be understood as the trivial refinements of their namesakes, i.e. with all constructors available.) This is possible due to a combination of the features of the system. First, observe that it is possible, in the abstraction rule, to assume that the bound variable m of underlying type List (String × Lam) has type List (String × ATm) in (TAbs), since the latter can easily be seen to be a refinement of the former. Then it follows that lkup m s can be assigned the type ATm by choosing ATm for the instantiation in (TVar). Second, under the assumption that the bound variable t has refinement type ATm, it follows from (TCase) that the variable c that is bound by the case Cst c can be assigned the type Arith. Note that the rule (TCase) is applicable only because we have chosen the refinement ATm of Lam, which guarantees that the input will not contain any abstractions. Then, in the body of the case, we can choose instead the more specific typing Cst : Arith → ATm. Similarly, u and v are assigned the type ATm, so that the subexpressions cloSub m u and cloSub m v in the body of the final case can be assigned the type ATm. Then the type of the body as a whole, and therefore of the entire case analysis, is also ATm.

The central problem is typability, for closed expressions: given an underlying datatype environment ∆ and a closed module m which is typed in ∆, does there exist a refinement type assignment to the functions of m? Typically m will contain library functions whose source is not available to the system, but for which an underlying type is known. To incorporate such functions we interpret an underlying type environment Γ as containing trivial refinement types for each such function, i.e. each d occurring in such a type in Γ denotes the refinement of d that makes available all constructors.

Definition 14 (Typability).
A triple ∆, Γ and m constitutes a positive instance of the refinement typability problem just if there is a refinement type environment Γ′ such that Γ ⊢ m : Γ′. In such a case, we say that ∆; Γ ⊢ m is refinement typable.

The rest of the paper concerns the algorithmic solution of the typability problem.

We assume a countable set of refinement variables, ranged over by X, Y, Z and so on. The purpose of a refinement variable X is to represent a function in Π d ∈ D. P(dom ∆(d)). As described in Section 3, such functions are in 1-1 correspondence with refinements of ∆. We will abuse notation and write X for both uses (thus the following rather strange-looking equation X(d) = dom(X(d)) holds, by interpreting each of the two occurrences of X according to its context).

Definition 15 (Constraints). A constructor set expression, typically S, is either a finite set of constructors {k1, ..., km} or a pair X(d) consisting of a refinement variable X and an underlying datatype d. The underlying type of a constructor set expression is (partially) defined as follows:

  U(X(d)) = d        U({k1, ..., km}) = d  if ∀i ∈ [1..m]. ki ∈ dom ∆(d)

We consider only those constructor set expressions for which the underlying type is defined. We write
FRV(S) for the set of refinement variables occurring anywhere in S (which will either be empty or a singleton).

An inclusion constraint is an ordered pair of constructor set expressions, written (suggestively) as S1 ⊆ S2. When S1 is a singleton {k}, we will rather write the pair as k ∈ S2. We shall only consider inclusion constraints in which both set expressions have the same underlying type. The refinement variables FRV(S1 ⊆ S2) of an inclusion constraint are defined by extension from FRV(S1) and FRV(S2) in the obvious way.

A conditional constraint, hereafter just constraint, is a pair ϕ ? S1 ⊆ S2 consisting of a set of inclusion constraints ϕ and an inclusion constraint S1 ⊆ S2. The set ϕ is called the guard and the inclusion S1 ⊆ S2 the body. We will only consider conditional constraints in which each element of the guard has shape k ∈ X(d). When the guard of a constraint ∅ ? S1 ⊆ S2 is trivial, we shall usually omit it and write only the body S1 ⊆ S2. The set of refinement variables FRV(ϕ ? S1 ⊆ S2) of a constraint is defined as usual.

Sometimes we shall guard a constraint set C, and write ϕ ? C for the set {ψ ∪ ϕ ? S1 ⊆ S2 | ψ ? S1 ⊆ S2 an element of C}. We write FRV(C) for the set of refinement variables occurring in C.

Intuitively, an inclusion S1 ⊆ S2 is satisfied by any assignment to the refinement variables that makes S1 included in S2. A constraint ϕ ? S1 ⊆ S2 is satisfied if either some inclusion in the guard is not satisfied or the body is satisfied.

Fig. 6. Inference for subtype inequalities: the rules (ISBase), (ISTyVar), (ISArr) and (ISData).
Definition 16 (Satisfaction). A constructor set assignment, hereafter just assignment, is a total map θ taking each refinement variable X to a constructor choice function in Π d ∈ D. P(dom ∆(d)). The meaning of a constructor set expression S under an assignment θ is a set of constructors θ⟦S⟧ defined as follows:

  θ⟦X(d)⟧ = θ(X)(d)        θ⟦{k1, ..., km}⟧ = {k1, ..., km}

An inclusion constraint S1 ⊆ S2 is satisfied by an assignment θ, written θ ⊨ S1 ⊆ S2, just if θ⟦S1⟧ is included in θ⟦S2⟧. A constraint ϕ ? S1 ⊆ S2 is satisfied by an assignment θ, written θ ⊨ ϕ ? S1 ⊆ S2, just if, whenever θ ⊨ k ∈ X(d) for every inclusion constraint k ∈ X(d) in ϕ, then θ ⊨ S1 ⊆ S2.

Definition 17 (Solutions). A solution to a constraint set C is an assignment θ satisfying every constraint in C; we write θ ⊨ C. We say that C is solvable, or satisfiable, just if it has a solution.

Remark 1.
The full set constraint language is exactly the monadic class of first-order proposi-tions [8]. By applying the translation of that paper, it can be shown that guarded constraints of theform laid out above are (monadic) Horn clauses with constructors simply interpreted as constants.
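The constraint syntax of Definition 15 and the satisfaction relation of Definition 16 can be modelled very directly. The following sketch uses an entirely hypothetical concrete representation (strings for variables, datatypes and constructors):

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type RVar = String
type Dt   = String
type Ctor = String

-- A constructor set expression: X(d) or { k1, ..., km }.
data SetExpr = RV RVar Dt | Lits [Ctor]
  deriving (Eq, Show)

-- A guard atom k ∈ X(d).
type Atom = (Ctor, RVar, Dt)

-- An assignment gives each refinement variable, at each datatype,
-- a chosen set of constructors (Definition 16).
type Assignment = M.Map (RVar, Dt) (S.Set Ctor)

evalS :: Assignment -> SetExpr -> S.Set Ctor
evalS th (RV x d)  = M.findWithDefault S.empty (x, d) th
evalS _  (Lits ks) = S.fromList ks

-- θ ⊨ ϕ ? S1 ⊆ S2: the body must hold whenever every guard atom holds.
satisfies :: Assignment -> [Atom] -> SetExpr -> SetExpr -> Bool
satisfies th phi s1 s2 =
  any (\(k, x, d) -> not (S.member k (evalS th (RV x d)))) phi
    || evalS th s1 `S.isSubsetOf` evalS th s2
```

Note how a failed guard satisfies the constraint vacuously, matching the "either some inclusion in the guard is not satisfied or the body is satisfied" reading above.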
Since our system is effectively syntax-directed (the subsumption rule can be factored into the other syntax-directed rules), type inference follows a standard pattern of constraint generation and satisfiability checking (see e.g. [31]). The constraints are subtype inequalities over refinement variables but, in our restricted setting, it is easily seen that such inequalities are equivalent to conditional inclusion constraints between refinement variables and sets of datatype constructors. To enable this approach, we extend the language of types so as to allow datatypes parametrised by refinement variables.
Definition 18 (Extended Types).
The extended types are monotypes extended with datatypes built over refinement variables:

  T, U, V ::= · · · | inj X d T

Note that the type arguments to an injected datatype identifier are also extended. Expressions of the form (inj X List)(inj Z Int) are, therefore, well-formed. Recall from Section 3 that refinement datatype identifiers are of the form inj ∆ d, with d an underlying datatype identifier, and should be thought of as specifying the refinement of d whose datatype definition is given by ∆. The task of inference is to determine constraints on these ∆ that enable a typing to be assigned, and to check that the constraints have a solution.

For convenience, we shall implicitly lift injections to any type, or sequence of types, written inj X T, so that the injection is distributed over the datatypes in T. In the context of extended types, we will associate a substitution action θT with each constructor set assignment θ by lifting the definition

  θ(inj X d) ≔ inj θ(X) d

homomorphically over all extended types. Finally, we write FRV(T) for the set of refinement variables occurring in injections in T.

We also adopt an extension of type schemes that are constrained:

Definition 19 (Constrained Type Scheme).
We subsume type schemes S by constrained type schemes, which have the shape ∀α. ∀X. C ⊃ T, where C is a constraint set and T is an extended type. We define FRV(∀α. ∀X. C ⊃ T) = (FRV(C) ∪ FRV(T)) \ X. A constrained type environment is a finite mapping from program variables to constrained type schemes, whose elements are written x : S. We define FRV(Γ) in the obvious way.

As is typical, there is generally no "best" monotype solution to a set of inclusion constraints, so constrained type schemes give us an internal representation for the set of all types assignable to a module-level function. For example, assuming the constant combinator K is defined as usual, it can be seen that the module-level recursive function f = λx : Lam. K[Lam, Lam] x (f (f x)) can be assigned the constrained type scheme ∀X Y. C ⊃ inj X Lam → inj Y Lam, with C given by:

  X(Lam) ⊆ Y(Lam)
  Y(Lam) ⊆ X(Lam)
  Cst ∈ X(Lam) ? X(Arith) ⊆ Y(Arith)
  Cst ∈ Y(Lam) ? Y(Arith) ⊆ X(Arith)

Intuitively, its input flows to its output and conversely, so we require inj X Lam ⊑ inj Y Lam and inj Y Lam ⊑ inj X Lam (which is encoded by the above set constraints when we view the refinements X and Y as functions specifying the choice of constructors). However, there is obviously no "best" instantiation of the refinement variables X and Y.

Constrained type environments can be understood as compact descriptions of "ordinary" type environments (in the sense of Definition 12), which is made precise as follows.

Definition 20.
Define ⦅Γ⦆ to be the type environment obtained from the closed constrained type schemes in Γ by instantiating the refinement quantifiers with every possible solution; that is, supposing Γ is closed:

  ⦅Γ⦆ ≔ {x : ∀α. θT | θ ⊨ C ∧ (x : ∀α. ∀X. C ⊃ T) ∈ Γ}

Typical presentations of type inference by constraint generation involve choosing fresh type variables, which are then constrained. Since we work with refinement types, it is more convenient to choose fresh refinement type templates, which are just refinement types that are everywhere parametrised by fresh refinement variables; in the setting of refinement types, at the point at which inference would choose a fresh type, the underlying shape of the type is already known. We write
Fresh ( X ) to assert that X must be a fresh refinement variable (i.e. not already used in thecurrent scope). We extend the notion to fresh types T of underlying shape T . Definition 21 (Fresh Types).
We write
Fresh U(T) for the following inductive predicate, asserting that T is a fresh template of underlying shape U.
• For all α ∈ A, Fresh α(α).
• For all b ∈ B, Fresh b(b).
• For d ∈ D, if Fresh Ui(Ti) for each corresponding pair drawn from U and T, and Fresh(X), then Fresh d U(inj X d T).
• For all underlying types U1, U2 and extended types T1, T2, if Fresh U1(T1) and Fresh U2(T2), then Fresh U1 → U2(T1 → T2).

The definition guarantees that U(T) = U. We extend the notion to sequences of types, writing Fresh U(T) to denote that the two sequences U and T have the same length and are related pointwise by freshness.

We now have all the necessary structure to describe the constraint generation algorithm.

Fig. 7. Inference for expressions: the rules (ICst), (ICon), (IVar), (IAbs), (IApp) and (ICase).

Definition 22 (Inference).
Inference is split into three parts: for subtyping (Figure 6), for expressions (Figure 7) and for modules (Figure 8), using three judgement forms, respectively:

  ⊢ T1 ⊑ T2 =⇒ C        Γ ⊢ e : T =⇒ T, C        Γ ⊢ m =⇒ Γ′, C

Given two (extended) types T1 and T2, we infer a set of constraints C under which the former will be a subtype of the latter, using the system of judgements ⊢ T1 ⊑ T2 =⇒ C. For expressions in context Γ ⊢ e : T, we infer (extended) monotypes T and the constraints C under which they are permissible, using judgements of the form Γ ⊢ e : T =⇒ T, C. In such judgements, Γ is a constrained type environment, i.e. a finite map from term variables to constrained type schemes. We will omit the underlying type when it is not important. The rules are given in Figure 7. Constrained refinement type schemes are inferred for module-level definitions using a system of judgements of shape Γ ⊢ m =⇒ Γ′, C; the rules are given in Figure 8. The systems can be read algorithmically by regarding the quantities before the =⇒ as inputs and the quantities afterwards as outputs (however, it should be noted that, assuming regular datatypes only, the subtyping relation must be computed by an explicit fixed-point computation).

(IModE)  Γ ⊢ ϵ =⇒ Γ

(IModD)  Γ ⊢ m =⇒ Γ′    Γ′ ∪ {x : T} ⊢ e =⇒ T′, C1    ⊢ T′ ⊑ T =⇒ C2    Fresh T(T)    X = FRV(T)
         ───────────────────────────────────────────────────────────────
         Γ ⊢ m · ⟨x : ∀α. T = Λα. e⟩ =⇒ Γ′ ∪ {x : ∀α. ∀X. C1 ∪ C2 ⊃ T}

Fig. 8. Inference for modules.

Constraint generation via these systems of rules follows a well-established pattern for expressions and modules (see e.g. [31] for a general treatment of the non-refinement case), so we concentrate on the inference rules for subtyping.
Like the more standard inference rules for expressions and modules, the inference rules for subtyping generate a derivation tree and a system of constraints whose solution guarantees the correctness of the corresponding instance of the derivation tree. However, in the case of subtyping, the derivation tree is not a proof in the system of Figure 3, which is for the complement of the subtyping relation, but rather a proof that the solution constitutes a simulation in the sense of Lemma 2. For example, the conclusion of (ISData) yields the constraints {X(d) ⊆ Y(d)} ∪ ⋃k ⋃ni=1 (k ∈ X(d)) ? Cki. The first part of this constraint encodes the first bullet of Lemma 2: the environment ∆* at inj Y d must include all constructors included by the same environment at inj X d. Since, for any refinement X, dom(∆*(inj X d)) = X(d) (recall the notational abuse adopted at the start of Section 6), we arrive at X(d) ⊆ Y(d). The second part of this constraint encodes the second bullet of the lemma: if k ∈ dom(∆*(inj X d)) (and, therefore, k ∈ dom(∆*(inj Y d))), then the corresponding argument types are again related; in inference, we recursively infer constraints on the relationship between the types and guard those constraints by k ∈ X(d).

Theorem 23 (Soundness and completeness of ⊑-inference). Let T1 and T2 be extended types and suppose ⊢ T1 ⊑ T2 =⇒ C. Then, for all assignments θ:

  ⊢ θT1 ⊑ θT2 iff θ ⊨ C

The following states the correctness of type inference for expressions in a closed environment (e.g. for module-level definitions). The appendix contains a proof for the general case.
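The (ISData) case just described can be sketched as a recursive traversal of the datatype environment. In this hypothetical encoding (the same string-based representation as the earlier sketches), a visited set stands in for the explicit fixed-point computation needed for recursive datatypes, and guards accumulate along the path, so each argument's constraints are guarded by the constructor memberships that make them relevant:

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- d ↦ k ↦ datatype names of k's arguments (other arguments ignored).
type Env  = M.Map String (M.Map String [String])
type Atom = (String, String, String)          -- k ∈ X(d)
-- (guard, X, d, Y) encodes the guarded inclusion  guard ? X(d) ⊆ Y(d).
type VC   = ([Atom], String, String, String)

-- Constraints under which inj_X d ⊑ inj_Y d, in the style of (ISData).
subCon :: Env -> String -> String -> String -> [VC]
subCon env x y = go S.empty []
  where
    go seen phi d
      | d `S.member` seen = []
      | otherwise =
          (phi, x, d, y)
            : concat [ go (S.insert d seen) ((k, x, d) : phi) d'
                     | (k, ds) <- M.toList (M.findWithDefault M.empty d env)
                     , d' <- ds ]
```

Over an environment where ATm has constructors App (with two ATm arguments) and Cst (with an Arith argument), comparing inj_X ATm against inj_Y ATm yields the unconditional X(ATm) ⊆ Y(ATm) together with Cst ∈ X(ATm) ? X(Arith) ⊆ Y(Arith), matching the shape of the constrained scheme in the example above.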
Theorem 24 (Soundness and completeness of expression inference).
Let Γ be a constrained type environment, e an expression, V an extended refinement type, C a set of constraints, and suppose Γ ⊢ e =⇒ V, C. Then, for all refinement types T:

  ⦅Γ⦆ ⊢ e : T iff ∃θ. θ ⊨ C ∧ ⊢ θV ⊑ T

Finally, we can state the overall correctness of inference for modules.

(Transitivity)  from ϕ ? S1 ⊆ S2 and ψ ? S2 ⊆ S3, derive ϕ ∪ ψ ? S1 ⊆ S3
(Satisfaction)  from ϕ ? k ∈ X(d) and ψ, k ∈ X(d) ? S1 ⊆ S2, derive ϕ ∪ ψ ? S1 ⊆ S2
(Weakening)     from ϕ ? X(d) ⊆ Y(d) and ψ, k ∈ Y(d) ? S1 ⊆ S2, derive ϕ ∪ ψ, k ∈ X(d) ? S1 ⊆ S2

Fig. 9. Saturation rules.
Theorem 25 (Soundness and completeness of module inference).
Suppose Γ and Γ′ are closed constrained type environments, m a module and Γ ⊢ m =⇒ Γ′. Then, for all type environments ∆:

  ⦅Γ⦆ ⊢ m : ∆ iff ⦅Γ′⦆ ⊑ ∆

The solvability of constraints can be determined by a process of saturation under all possible consequences. This is a generalisation of the transitive closure of simple inclusion-constraint graphs and, more generally, a particular instance of Horn clause resolution. For our constraint language, saturated constraint sets have a remarkable property: they can be restricted to any subset of their variables whilst preserving solutions.
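The saturation process developed in the rest of this section can be sketched as a naive fixpoint computation over an atomic-constraint representation. All encodings here are hypothetical; memberships k ∈ X(d) are treated as singleton inclusions {k} ⊆ X(d) for the purposes of (Transitivity), and meeting an upper bound that excludes k produces the trivially unsatisfiable body k ∈ ∅:

```haskell
import qualified Data.Set as S

type RD   = (String, String)            -- X(d): a variable at a datatype
type Atom = (String, RD)                -- a guard atom k ∈ X(d)

data Body = Sub RD RD                   -- X(d) ⊆ Y(d)
          | Upper RD [String]           -- X(d) ⊆ { k1, ..., km }
          | Mem String RD               -- k ∈ X(d)
          | Bot String                  -- k ∈ ∅ (trivially unsatisfiable)
  deriving (Eq, Ord, Show)

type Con = (S.Set Atom, Body)

-- One round of the rules of Figure 9 over all pairs of constraints.
step :: S.Set Con -> S.Set Con
step cs = cs `S.union` S.fromList (concat [ derive c1 c2 | c1 <- l, c2 <- l ])
  where
    l = S.toList cs
    derive (phi, b1) (psi, b2) = trans ++ sat ++ weak
      where
        -- (Transitivity), with k ∈ X(d) read as {k} ⊆ X(d).
        trans = case (b1, b2) of
          (Sub a b, Sub b' c)    | b == b' -> [(S.union phi psi, Sub a c)]
          (Mem k b, Sub b' c)    | b == b' -> [(S.union phi psi, Mem k c)]
          (Sub a b, Upper b' ks) | b == b' -> [(S.union phi psi, Upper a ks)]
          (Mem k b, Upper b' ks) | b == b', k `notElem` ks
                                           -> [(S.union phi psi, Bot k)]
          _                                -> []
        -- (Satisfaction): drop a guard atom whose body is established.
        sat = case b1 of
          Mem k x | S.member (k, x) psi
            -> [(S.union phi (S.delete (k, x) psi), b2)]
          _ -> []
        -- (Weakening): given X(d) ⊆ Y(d), replace k ∈ Y(d) by k ∈ X(d).
        weak = case b1 of
          Sub x y -> [ (S.union phi (S.insert (k, x) (S.delete (k, y) psi)), b2)
                     | (k, y') <- S.toList psi, y' == y ]
          _       -> []

-- Iterate to a fixpoint; termination follows from the finite universe
-- of atoms and bodies generated from the initial set.
saturate :: S.Set Con -> S.Set Con
saturate cs = let cs' = step cs in if cs' == cs then cs else saturate cs'
```

A solution for a saturated set with no Bot body can then be read off as in the construction below, by collecting the unguarded membership facts for each variable.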
Definition 26 (Atomic constraints).
A constraint is said to be atomic just if its body has one of the following four shapes:

  X(d) ⊆ Y(d)        X(d) ⊆ {k1, ..., km}        k ∈ X(d)        k ∈ ∅

An atomic constraint is said to be trivially unsatisfiable if it is of shape ∅ ? k ∈ ∅. A constraint set is said to be trivially unsatisfiable just if it contains a trivially unsatisfiable constraint.

By applying standard identities of basic set theory, every constraint is equivalent to a set of atomic constraints. In particular, a constraint of the form k ∈ {k1, ..., km} is equivalent to the empty set of atomic constraints (i.e. can be eliminated) whenever k is one of the ki.

Definition 27 (Saturated constraint sets).
An atomic constraint set, i.e. one that only containsatomic constraints, is said to be saturated just if it is closed under the saturation rules in Figure 9.We write
Sat(C) for the saturated atomic constraint set obtained by iteratively applying the saturation rules to the set C.

Remark 2.
From the perspective of Remark 1, all three rules of Figure 9 correspond to special cases of resolution.

The (Transitivity) rule closes subset inequalities under transitivity, but must keep track of the associated guards by taking their union. The (Satisfaction) rule allows a guard atom k ∈ X(d) to be dropped whenever the same atom constitutes the body of another constraint in the set (but the other guards from both must be preserved). Finally, the (Weakening) rule allows Y(d) in a guard to be replaced by X(d) when the latter is known to be no larger, thus weakening the constraint. Saturation under these rules preserves and reflects solutions:

Theorem 28 (Saturation equivalence).
For any assignment θ, θ ⊨ C iff θ ⊨ Sat(C).

If there are no trivially unsatisfiable constraints in Sat(C), then a solution can be constructed as follows. For each variable X occurring in C, define a function θX by:

  (θX)(d) ≔ {k | k ∈ X(d) is in Sat(C)}

Then θ solves Sat(C). Conversely, if there is a trivially unsatisfiable constraint in Sat(C), then Sat(C) is unsolvable and, by the equivalence theorem, it follows that C has no solution either.

Theorem 29. C is unsatisfiable iff Sat(C) is trivially unsatisfiable.

Hence, solvability can be determined by saturation.

In practice, having established that a constraint set is solvable, we are only interested in the solutions for a certain subset of the refinement variables. For example if, as we have seen, the constraints C describe a set of types {θT | θ ⊨ C} of a module-level function, then we may consider two solutions θ1 and θ2 to be the same whenever they agree on FRV(T). In this case, we call the free refinement variables of T the interface variables.

Definition 30.
Let C be a saturated constraint set and let I be some set of refinement variables, called the interface variables. Then define the restriction of C to I, written C ↾ I, as the set {ϕ ? S1 ⊆ S2 ∈ C | FRV(ϕ) ∪ FRV(S1) ∪ FRV(S2) ⊆ I}.

The restriction of C to I is quite severe, since it simply discards any constraint not solely comprised of interface variables. However, a remarkably strong property of the rules in Figure 9 is that, whenever C is solvable, every solution of Sat(C) ↾ I may be extended to a solution of Sat(C) (and therefore of C), independently of the choice of I! Since every solution of Sat(C) trivially restricts to a solution of Sat(C) ↾ I (the latter has fewer constraints over fewer variables), it follows that the solutions of Sat(C) ↾ I are exactly the restrictions of the solutions of C.

Example 31.
Consider the following constraint set C by way of an illustration:

  ⋆ Cst ∈ X1(Lam)
    X1(Lam) ⊆ X2(Lam)
    FVr ∈ X2(Lam) ? X2(Lam) ⊆ X3(Lam)
  ⋆ X3(Lam) ⊆ {FVr, Cst}

This set is not saturated and, consequently, there is no guarantee that the restriction of this set to an interface results in a constraint system whose solutions can generally be extended to solutions of the original set C. For example, if we restrict this set to the interface I = {X1, X3}, the effect will be to retain the two starred constraints. But then

  θ(X)(d) ≔ {Cst, FVr, App}  if X = X1 and d = Lam
             ∅               otherwise

is a solution of C ↾ I that does not extend (extension here is in the sense of Theorem 32) to any solution of C. However, after saturation, Sat(C) consists of the following:

  ⋆ Cst ∈ X1(Lam)
    X1(Lam) ⊆ X2(Lam)
    Cst ∈ X2(Lam)
    FVr ∈ X2(Lam) ? X2(Lam) ⊆ X3(Lam)
    FVr ∈ X1(Lam) ? X2(Lam) ⊆ X3(Lam)
    FVr ∈ X2(Lam) ? Cst ∈ X3(Lam)
    FVr ∈ X2(Lam) ? X1(Lam) ⊆ X3(Lam)
  ⋆ FVr ∈ X1(Lam) ? X1(Lam) ⊆ X3(Lam)
  ⋆ FVr ∈ X1(Lam) ? Cst ∈ X3(Lam)
  ⋆ X3(Lam) ⊆ {FVr, Cst}
    FVr ∈ X2(Lam) ? X2(Lam) ⊆ {FVr, Cst}
    FVr ∈ X1(Lam) ? X2(Lam) ⊆ {FVr, Cst}
    FVr ∈ X2(Lam) ? X1(Lam) ⊆ {FVr, Cst}
  ⋆ FVr ∈ X1(Lam) ? X1(Lam) ⊆ {FVr, Cst}

In particular, the constraint FVr ∈ X1(Lam) ? X1(Lam) ⊆ X3(Lam) will be retained in the restriction Sat(C) ↾ I, which consists of the starred constraints. Consequently, the above assignment θ is ruled out. Indeed, one can easily verify that every solution of Sat(C) ↾ I extends to a solution of Sat(C) and hence of C.

Theorem 32 (Restriction/Extension).
Suppose C is a saturated constraint set and I is a subset of its variables. Let θ be a solution of C ↾ I. Then there is a solution θ′ of C satisfying, for all X ∈ I:

  θ′(X)(d) = θ(X)(d)

Although our inference procedure is compositional, i.e. it breaks modules down into top-level definitions, and terms down into sub-terms that can be analysed in isolation, this is no guarantee of its efficiency. As we have described it in Section 7, the number of constraints associated with a function definition depends on the size of the definition: constraints are generated at most syntax nodes and propagated to the root. In fact, as is well known for constrained type inference, the situation is worse than this, because a whole set of constraints is imported from the environment when inferring a type for a program variable x. The number of constraints associated with x will again depend on the size of the definition of x and on the number of constraints associated with any functions that x depends on, and so the number of constraints can become exponential in the number of function definitions.

Let us fix N to be the number of module-level function definitions, K the maximum number of constructors associated with any datatype and D the maximum number of datatypes associated with any slice (for Lam, this is 2). A simple analysis of the shape of constraints yields the following bound:
Lemma 3.
There are O(Kv^D · vKD + K) atomic constraints over v refinement variables.

Suppose Γ ⊢ e =⇒ T, C. As it stands, the number of variables v occurring in C will depend upon the size of e and the size of every definition that e depends on. (Although it is known that this blow-up can be avoided by a clever representation in the case of constraints that are only simple variable/variable inclusions [20].) We use restriction to break this dependency between the size of constraint sets and the size of the program. We compute C ↾ I for each constraint set C generated by an inference (i.e. at every step), with the interface I taken to be the free variables of the context
FRV(Γ) ∪ FRV(T). Consequently, the number of variables v occurring in C ↾ I depends only on the number of refinement variables that are free in Γ and T.

Since all the refinement variables of module-level function types are generalised by (IModD) (assuming, as is usual, that inference for modules occurs in a closed environment), it follows that the only free refinement variables in Γ are those introduced during the inference of e, as a result of inferring under abstractions and case expressions. Thus v becomes independent of the number of function definitions. Moreover, if we assume that function definitions are in β-normal form and that the maximum case-expression nesting depth within any given definition is fixed (i.e. does not grow with the size of the program), then it follows that the number of refinement variables free in the environment is bounded by the size of the underlying type of e. Clearly, the number of free refinement variables in T is also bounded by its underlying type.

Consequently, for a constraint set C arising from an inference Γ ⊢ e =⇒ T, C and then restricted to its context, the number of constraints given by Lemma 3 depends only on the size of the underlying types assigned to e and, in the case of datatypes, the size of their definitions (slices). If we take scaling our analysis to larger and larger programs to mean programs consisting of more and more functions, with bounded growth in the size of types and the size of individual function definitions, then we may reasonably consider all these parameters fixed. Consequently:
Under the assumption that the size of types and the size of individual function definitions are bounded, the complexity of type inference is O(N).

A more fine-grained analysis is given in the appendix.
10 IMPLEMENTATION
We implemented a prototype of our inference algorithm for Haskell as a GHC plugin; the user can run our type checker as another stage of compilation with an additional command-line flag. It is available from:

  https://github.com/bristolpl/intensional-datatys

In addition to running the type checker on individual modules, an interface binary file is generated, enabling other modules to use the constraint information in separate compilations.

Our plugin processes GHC's core language [40], which is significantly more powerful than the small language presented here. Specifically, it must account for higher-rank types (including existentials), casts and coercions, and type classes. We have not implemented a treatment of these features in our prototype, and so any occurrences are not analysed. Furthermore, we disallow empty refinements of single-constructor datatypes (e.g. records). This relatively small departure from the theory is a substantial improvement to the efficiency of the tool, due to the number of records and newtypes that are found in typical Haskell programs.

Since we do not analyse the dependencies of packages, datatypes that are defined outside the current package are treated as base types and are not refined. The resulting analysis provides a certificate of safety for a package modulo the safe use of its dependencies.

In addition to missing cases, the tool uses the results of internal analyses in GHC to identify pattern-matching cases that will throw an exception. For example, the following code will be considered potentially unsafe:

  nnf2dnf (Lit a)   = [[a]]
  nnf2dnf (Or p q)  = List.union (nnf2dnf p) (nnf2dnf q)
  nnf2dnf (And p q) = distrib (nnf2dnf p) (nnf2dnf q)
  nnf2dnf _         = error "Impossible case!"
We recorded benchmarks on a 2.20GHz Intel® Core™ i5-5200U with 4 cores and 8.00GB RAM. We used the following selection of projects from the Hackage database:

• aeson is a performant JSON serialisation library.
• The containers package provides a selection of classic functional data structures, such as sets and finite maps. The Data.Sequence module from this package contains machine-generated code that lacks the typical modularity and structure of hand-written code. For example, it contains an automatically generated set of 6 mutually recursive functions, each with a complex type and deeply nested matching. The corresponding interface is in excess of 80 refinement variables. This module could not be processed to completion in a small amount of time and so we have omitted it from the results. We will explore how best to process examples that violate our complexity assumptions in follow-up work.
• extra is a collection of common combinators for datatypes and control flow.
• fgl (Functional Graph Library) provides an inductive representation of graphs and associated operations.
• haskeline is a command-line interface library.
• parallel is Haskell's default library for parallel programming.
• sbv is an SMT-based verification tool for automatically proving properties of Haskell programs.
• The time library contains several representations of time, clocks and calendars.
• unordered-containers provides hashing-based containers, for either performant code or datatypes without a natural ordering.

For each module we recorded the average time elapsed in milliseconds across 10 runs and the number of top-level definitions (N). We note both the total number of refinement variables generated during inference (V) and the largest interface (I).
The contrast between these two figures gives some indication of how intractable the analysis could become without the restriction operator. Naturally, constant factors will vary considerably between modules (not in correspondence with their size) and so our results also include the number of constructors (K) that appear in the largest datatype, and the number of datatypes (D) in the largest slice. The benchmarks in Table 1 provide a summary of the results for each project, i.e. the total time taken, the total number of top-level definitions, the total number of refinement variables, the maximum interface size, the largest number of constructors associated with a datatype, and the largest slice. The full dataset can be found in the appendices.
11 CONCLUSION AND RELATED WORK
The goal of our system is to automatically and statically verify that a given program is free of pattern-match exceptions, and we have phrased it as a type inference procedure for a certain refinement type system with recursive datatype constraints. We have shown that it works well in practice, although a more extensive investigation is needed. Our primary motivation has been to ensure predictability by giving concrete guarantees on its expressive power and algorithmic complexity.
Name                 | N    | K  | V      | D  | I  | Time (ms)
aeson                | 728  | 13 | 20466  | 6  | 14 | 79.37
containers           | 1792 | 5  | 25237  | 2  | 23 | 118.26
extra                | 332  | 3  | 5438   | 3  | 7  | 61.53
fgl                  | 700  | 2  | 18403  | 2  | 12 | 94.32
haskeline            | 1384 | 15 | 29389  | 19 | 27 | 111.67
parallel             | 110  | 1  | 959    | 2  | 18 | 10.18
pretty               | 222  | 8  | 3675   | 4  | 16 | 23.86
sbv                  | 5076 | 44 | 171869 | 49 | 46 | 518.91
time                 | 484  | 7  | 9753   | 6  | 10 | 134.16
unordered-containers | 474  | 5  | 7761   | 3  | 24 | 30.56

Table 1. Benchmark Summaries

(The total time reported for each project is the sum of the times taken to analyse each module independently; it therefore does not include startup costs and the like. As our tool is just another phase in the GHC compiler, it would be unfair to record the full compilation time, which is predominantly due to GHC.)

Our work sits within a large body of literature on recursive types and subtyping. As a type system, ours is not directly comparable to others in the literature: on the one hand, the intensional refinement restriction is quite severe, but on the other we allow for flow sensitivity. One of the first works to consider subtyping in the setting of recursive types was that of Amadio and Cardelli [7]. They proposed an exponential-time procedure for subtype checking, but this was later improved to quadratic by Kozen, Palsberg, and Schwartzbach [29]. Neither of these works gave a treatment of the combination with polymorphism, which is the subject of e.g. Castagna and Xu [11], Dolan and Mycroft [12], Hoang and Mitchell [25], and Pottier [34]. However, to the best of our knowledge, all the associated type inference algorithms are exponential time in the size of the program. In particular, Hoang and Mitchell [25] show that a general formulation of typing with recursive subtyping constraints has a PSPACE-hard typability problem. However, we mention as a counterpoint that, when constraints are restricted to simple variable-variable inequalities, Gustavsson and Svenningsson [20] show that there is a cubic-time algorithm. Being based on unification, inference for polymorphic variants is efficient [19], but Castagna, Petrucciani, and Nguyen [10] point out instances where programmers find the results to be unpredictable. None of the above works allow for a flow-sensitive treatment of matching.

Our main inspiration has been the seminal body of literature on set constraints in program analysis, see particularly Aiken, Wimmers, and Lakshman [5], Aiken [1] and Heintze [24], and in particular the line of work on making the cubic-time fragments scale in practice [15, 16, 22, 39]. Through an impressive array of sophisticated optimisations, the fragment can be made to run efficiently on many programs. However, the fundamental worst-case complexity is not changed, and implementing and tuning heuristics requires a large engineering effort.
Moreover, this fragment does not accommodate flow sensitivity. Many of the analyses and/or type inference procedures discussed so far are compositional, i.e. parts of the program are analysed independently to yield summaries of their behaviour, and the summaries are later combined. However, it has been frequently observed that compositionality does not lead to scalability if the summaries are themselves large and complicated. In particular, it is not uncommon for "summaries" to grow with the square of the size of the program in the worst case. This has led to many works that attempt to simplify summaries, typically according to ingenious heuristics [6, 12, 15, 17, 35, 36, 42]. Since our primary motivation was predictability, we have designed our system so that heuristics are avoided: in particular, the size of summaries (i.e. constrained type schemes) depends only on the size of the underlying types and not on the size of the program. (Heuristic-based optimisations can be the enemy of predictability, since small changes in the program can lead to great changes in performance if a change causes the program to fall outside of the domain on which the heuristic is tuned.) It is plausible that many of these heuristic optimisations are nevertheless applicable in order to help improve the overall efficiency.
Refinement types originate with the works of Freeman and Pfenning [18] and Xi and Pfenning [47]. Their distinguishing feature is that they attempt to assign types to program expressions for which an underlying type is already available. Typically, as here, the refinement type is also required to respect the shape of the underlying type. One can use this restriction, as in loc. cit., to ensure some independence of the size of the type from the size of the program. However, as remarked in the final section, the constant factors are enormous, since there is unrestricted intersection and union of refinements of the same underlying type, which is represented explicitly. The work of Freeman and Pfenning [18] requires that the programmer declare the universe of refinement types up-front (whereas our universe is determined automatically as a completion of the underlying datatype environment). A disadvantage of this requirement is that it burdens the programmer with a kind of annotation that, in many simple cases, they would rather not have to clutter their program with. A great advantage is that, by defining a refinement datatype explicitly, the programmer can indicate formally in the code her intention that a certain invariant is (somehow) important within a certain part of the program. It seems like a very fruitful idea to allow the programmer this freedom also in our system, and we are actively working on such an extension as part of our future work. In particular, we would like to take advantage of several new advances in this line that relieve a lot of programmer burden (Dunfield [13, 14]). An incredibly fruitful recent evolution of refinement types are the Liquid Types of Rondon, Kawaguci, and Jhala [38] (see especially Vazou, Bakst, and Jhala [44] for a version with constrained type schemes) and similar systems (e.g. those of Terauchi [41], Unno and Kobayashi [43]).
Such technology is already accessible, to the benefit of the average programmer, through the Liquid Haskell system of Vazou, Seidel, Jhala, Vytiniotis, and Peyton-Jones [45]. Due to the rich expressive power of these systems, which typically include dependent products, efficient and fully automatic type inference is not typically a primary concern, and predictability can be ensured by liberal use of annotations.
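The contrast drawn above between declared and inferred refinement universes can be made concrete. In the style of Freeman and Pfenning, the programmer might encode the NNF invariant as its own datatype, making the consumer total by construction at the cost of declaring and maintaining the extra type. The following sketch is our own illustration, not code from any of the cited works:

```haskell
-- The invariant "no Imp, and negation only at literals" declared as a
-- datatype in its own right (literal polarity is our assumption).
data Nnf a
  = NLit Bool a
  | NAnd (Nnf a) (Nnf a)
  | NOr  (Nnf a) (Nnf a)

-- Total by construction: there are no Imp or Not cases to miss.
nnfToDnf :: Nnf a -> [[(Bool, a)]]
nnfToDnf (NLit b a) = [[(b, a)]]
nnfToDnf (NOr p q)  = nnfToDnf p ++ nnfToDnf q
nnfToDnf (NAnd p q) = [ c ++ d | c <- nnfToDnf p, d <- nnfToDnf q ]
```

In the system of this paper, by contrast, the refinement of the formula datatype in which Imp and Not are absent is discovered automatically as part of the completion of the underlying datatype environment, with no extra declaration from the programmer.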
The pattern-match safety problem was also addressed by Mitchell and Runciman [30], whose system was used to verify a number of small Haskell programs and libraries. Its expressive power and algorithmic complexity are, however, unclear. Safety problems are within the scope of higher-order model checking (Kobayashi [26], Kobayashi and Ong [27], Ong [33]), and a system for verifying pattern-match safety, built on higher-order model checking, was presented in [32]. Higher-order model checking approaches reduce verification problems to model checking problems on a certain infinite tree generated by a higher-order grammar. Although the higher-order model checking problem is linear-time in the size of the grammar, the constant factors are enormous because, formally, it is n-EXPTIME complete (a tower of exponentials of height n) with n the type-theoretic order of the functions in the grammar. Moreover, many of the transformations from program to grammar incur a large blow-up in size. Two promising evolutions of higher-order model checking are the approach of Kobayashi, Tsukada, and Watanabe [28], based on the higher-order fixpoint logic of Viswanathan and Viswanathan [46], and the approach of Burn, Ong, and Ramsay [9], based on higher-order constrained Horn clauses.

ACKNOWLEDGMENTS
We gratefully acknowledge the support of the Engineering and Physical Sciences Research Council (EP/T006579/1) and the National Centre for Cyber Security via the UK Research Institute in Verified Trustworthy Software Systems. We thank our colleague Matthew Pickering for a lot of good Haskell advice and for helping us safely navigate the interior of the Glasgow Haskell Compiler.
REFERENCES
[1] Alexander Aiken. 1999. Introduction to set constraint-based program analysis. Science of Computer Programming.
[2] In Proceedings of the Seventh Annual Symposium on Logic in Computer Science (LICS '92), Santa Cruz, California, USA, June 22-25, 1992. 329–340.
[3] Alexander Aiken and Edward L. Wimmers. 1993. Type Inclusion Constraints and Type Inference. In FPCA. 31–41.
[4] Alexander Aiken, Edward L. Wimmers, and T. K. Lakshman. 1994. Soft Typing with Conditional Types. In Conference Record of POPL'94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Portland, Oregon, USA, January 17-21, 1994. 163–173.
[5] Alexander Aiken, Edward L. Wimmers, and T. K. Lakshman. 1994. Soft typing with conditional types. In Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 163–173. https://doi.org/10.1145/174675.177847
[6] Alexander Aiken, Edward L. Wimmers, and Jens Palsberg. 1999. Optimal Representations of Polymorphic Types with Subtyping. Higher-Order and Symbolic Computation 12, 3 (1999), 237–282. https://doi.org/10.1023/A:1010056315933
[7] Roberto M. Amadio and Luca Cardelli. 1993. Subtyping recursive types. ACM Trans. Program. Lang. Syst. 15, 4 (1993), 575–631. https://doi.org/10.1145/155183.155231
[8] Leo Bachmair, Harald Ganzinger, and Uwe Waldmann. 1993. Set constraints are the monadic class. In Proceedings of the Eighth Annual IEEE Symposium on Logic in Computer Science. IEEE, 75–83.
[9] Toby Cathcart Burn, C.-H. Luke Ong, and Steven J. Ramsay. 2017. Higher-order constrained Horn clauses for verification. Proc. ACM Program. Lang. 2, POPL (2017), Article 11. https://doi.org/10.1145/3158099
[10] Giuseppe Castagna, Tommaso Petrucciani, and Kim Nguyen. 2016. Set-theoretic types for polymorphic variants. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming. Association for Computing Machinery, 378–391. https://doi.org/10.1145/2951913.2951928
[11] Giuseppe Castagna and Zhiwu Xu. 2011. Set-theoretic foundation of parametric polymorphism and subtyping. In Proceedings of the 16th ACM SIGPLAN international conference on Functional programming. Association for Computing Machinery, 94–106. https://doi.org/10.1145/2034773.2034788
[12] Stephen Dolan and Alan Mycroft. 2017. Polymorphism, subtyping, and type inference in MLsub. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages. Association for Computing Machinery, 60–72. https://doi.org/10.1145/3009837.3009882
[13] Joshua Dunfield. 2007. Refined typechecking with Stardust. In Proceedings of the 2007 workshop on Programming languages meets program verification. Association for Computing Machinery, 21–32. https://doi.org/10.1145/1292597.1292602
[14] Joshua Dunfield. 2017. Extensible Datasort Refinements. In European Symposium on Programming Languages and Systems, Hongseok Yang (Ed.). Springer Berlin Heidelberg, 476–503.
[15] Manuel Fähndrich and Alexander Aiken. 1996. Making Set-Constraint Based Program Analyses Scale. In First Workshop on Set Constraints at CP'96.
[16] Manuel Fähndrich, Jeffrey S. Foster, Zhendong Su, and Alexander Aiken. 1998. Partial online cycle elimination in inclusion constraint graphs. In Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation. Association for Computing Machinery, 85–96. https://doi.org/10.1145/277650.277667
[17] Cormac Flanagan and Matthias Felleisen. 1999. Componential set-based analysis. ACM Trans. Program. Lang. Syst. 21, 2 (1999), 370–416. https://doi.org/10.1145/316686.316703
[18] Tim Freeman and Frank Pfenning. 1991. Refinement types for ML. In Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation. Association for Computing Machinery, 268–277. https://doi.org/10.1145/113445.113468
[19] Jacques Garrigue. 2002. Simple Type Inference for Structural Polymorphism. In International Workshop on Foundations of Object-Oriented Languages (FOOL).
[20] Jörgen Gustavsson and Josef Svenningsson. 2001. Constraint Abstractions. In Symposium on Programs as Data Objects, Olivier Danvy and Andrzej Filinski (Eds.). Springer Berlin Heidelberg, 63–83.
[21] John Harrison. 2009. Handbook of Practical Logic and Automated Reasoning. Cambridge University Press.
[22] Nevin Heintze. 1994. Set-based analysis of ML programs. In Proceedings of the 1994 ACM conference on LISP and functional programming. Association for Computing Machinery, 306–317. https://doi.org/10.1145/182409.182495
[23] Nevin Heintze, Spiro Michaylov, and Peter Stuckey. 1992. CLP(R) and some electrical engineering problems. Journal of Automated Reasoning 9, 2 (1992), 231–260.
[24] Nevin Charles Heintze. 1992. Set based program analysis. Thesis.
[25] My Hoang and John C. Mitchell. 1995. Lower bounds on type inference with subtypes. In Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 176–185. https://doi.org/10.1145/199448.199481
[26] Naoki Kobayashi. 2013. Model Checking Higher-Order Programs. J. ACM 60, 3 (2013), Article 20. https://doi.org/10.1145/2487241.2487246
[27] N. Kobayashi and C. L. Ong. 2009. A Type System Equivalent to the Modal Mu-Calculus Model Checking of Higher-Order Recursion Schemes. In IEEE Symposium on Logic In Computer Science. 179–188. https://doi.org/10.1109/LICS.2009.29
[28] Naoki Kobayashi, Takeshi Tsukada, and Keiichi Watanabe. 2018. Higher-Order Program Verification via HFL Model Checking. In European Symposium on Programming Languages and Systems, Amal Ahmed (Ed.). Springer International Publishing, 711–738.
[29] Dexter Kozen, Jens Palsberg, and Michael I. Schwartzbach. 1995. Efficient recursive subtyping. Mathematical Structures in Computer Science 5, 1 (1995), 113–125. https://doi.org/10.1017/S0960129500000657
[30] Neil Mitchell and Colin Runciman. 2008. Not all patterns, but enough: an automatic verifier for partial but sufficient pattern matching. In Proceedings of the first ACM SIGPLAN symposium on Haskell. Association for Computing Machinery, 49–60. https://doi.org/10.1145/1411286.1411293
[31] Martin Odersky, Martin Sulzmann, and Martin Wehr. 1999. Type Inference with Constrained Types. TAPOS 5, 1 (1999), 35–55.
[32] C.-H. Luke Ong and Steven J. Ramsay. 2011. Verifying higher-order functional programs with pattern-matching algebraic data types. In Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 587–598. https://doi.org/10.1145/1926385.1926453
[33] C. L. Ong. 2006. On Model-Checking Trees Generated by Higher-Order Recursion Schemes. In IEEE Symposium on Logic in Computer Science. 81–90. https://doi.org/10.1109/LICS.2006.38
[34] François Pottier. 1998. Type inference in the presence of subtyping: from theory to practice. Thesis.
[35] François Pottier. 2001. Simplifying Subtyping Constraints: A Theory. Information and Computation.
[36] In Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 278–291. https://doi.org/10.1145/263699.263738
[37] Jakob Rehof and Torben Æ. Mogensen. 1999. Tractable constraints in finite semilattices. Science of Computer Programming 35, 2 (1999), 191–221.
[38] Patrick M. Rondon, Ming Kawaguci, and Ranjit Jhala. 2008. Liquid types. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Machinery, 159–169. https://doi.org/10.1145/1375581.1375602
[39] Zhendong Su, Manuel Fähndrich, and Alexander Aiken. 2000. Projection merging: reducing redundancies in inclusion constraint graphs. In Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 81–95. https://doi.org/10.1145/325694.325706
[40] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. 2007. System F with Type Equality Coercions. In Proceedings of the 2007 ACM SIGPLAN International Workshop on Types in Languages Design and Implementation (TLDI '07). Association for Computing Machinery, New York, NY, USA, 53–66. https://doi.org/10.1145/1190315.1190324
[41] Tachio Terauchi. 2010. Dependent types from counterexamples. In Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 119–130. https://doi.org/10.1145/1706299.1706315
[42] Valery Trifonov and Scott Smith. 1996. Subtyping constrained types. In Static Analysis Symposium, Radhia Cousot and David A. Schmidt (Eds.). Springer Berlin Heidelberg, 349–365.
[43] Hiroshi Unno and Naoki Kobayashi. 2009. Dependent type inference with interpolants. In Proceedings of the 11th ACM SIGPLAN conference on Principles and practice of declarative programming. Association for Computing Machinery, 277–288. https://doi.org/10.1145/1599410.1599445
[44] Niki Vazou, Alexander Bakst, and Ranjit Jhala. 2015. Bounded refinement types. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming. Association for Computing Machinery, 48–61. https://doi.org/10.1145/2784731.2784745
[45] Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton-Jones. 2014. Refinement types for Haskell. In Proceedings of the 19th ACM SIGPLAN international conference on Functional programming. Association for Computing Machinery, 269–282. https://doi.org/10.1145/2628136.2628161
[46] Mahesh Viswanathan and Ramesh Viswanathan. 2004. A Higher Order Modal Fixed Point Logic. In CONCUR 2004 - Concurrency Theory, Philippa Gardner and Nobuko Yoshida (Eds.). Springer Berlin Heidelberg, 512–528.
[47] Hongwei Xi and Frank Pfenning. 1999. Dependent types in practical programming. In Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 214–227. https://doi.org/10.1145/292540.292560
A ADDITIONAL MATERIAL FOR SECTION 2: LANGUAGE
We write Sch(R)(∀α. T1, ∀α. T2) just if Ty(R)(T1, T2). Given a function f : Dt_D → Dt_D, we define Ty(f) : Ty_D → Ty_D recursively as follows:

    Ty(f)(α) = α
    Ty(f)(b) = b
    Ty(f)(d) = f(d)
    Ty(f)(T1 → T2) = Ty(f)(T1) → Ty(f)(T2)

For brevity, for a function f : Dt_D → Dt_D, we will usually just write f(T) and f(S) for Ty(f)(T) and Ty(f)(S) respectively.

B ADDITIONAL MATERIAL FOR SECTION 4: SUBTYPING
Lemma (1).
Subtyping is a preorder.
Proof.
The proof of reflexivity is an easy coinduction. We prove that, if there is T2 such that T1 ⊑ T2 and T2 ⊑ T3, then T1 ⊑ T3, by coinduction on ⊑. This amounts to showing that the set {(T1, T3) | ∃T2. T1 ⊑ T2 ⊑ T3} is closed under the given rules, so in each case we will show that the existence of such an intermediate T2 is in contradiction with the given premises in the case.

(SShape) Suppose there is T2 such that T1 ⊑ T2 and T2 ⊑ T3, and r(T1) ≠ r(T3). Since ⊑ satisfies (SShape), it follows that r(T1) = r(T2) and r(T2) = r(T3), from which we obtain the desired contradiction.

(SMis) Suppose there is T2 such that d1 T1 ⊑ T2 ⊑ d3 T3. It follows from (SShape) that T2 is some datatype d2 T2. Furthermore, suppose dom(∆*(d1)) ⊄ dom(∆*(d3)). By (SMis), dom(∆*(d1)) ⊆ dom(∆*(d2)) and dom(∆*(d2)) ⊆ dom(∆*(d3)). Hence, by the transitivity of set inclusion, we immediately have a contradiction.

(SSim) Again, suppose there is T2 such that d1 T1 ⊑ T2 ⊑ d3 T3, and thus T2 is some datatype d2 T2. Suppose k ∈ dom(∆*(d1)); by (SMis) it is also defined for d2 and d3. Suppose U1i[T1/α], U2i[T2/α], U3i[T3/α] are the ith arguments, instantiated with the corresponding type arguments, of the constructor in each datatype respectively. Suppose U1i[T1/α] ⋢ U3i[T3/α], so there is no intermediate type between U1i[T1/α] and U3i[T3/α]. However, it follows from our original assumption and (SSim) that U1i[T1/α] ⊑ U2i[T2/α], and similarly that U2i[T2/α] ⊑ U3i[T3/α]. Thus we reach a contradiction.

(SArrL) Suppose there is T such that T1 → T2 ⊑ T ⊑ T1′ → T2′, and suppose there is no intermediate U such that T1′ ⊑ U ⊑ T1. It follows from (SShape) that T must be of shape T3 → T4. It follows from (SArrL) that, therefore, T1′ ⊑ T3 and T3 ⊑ T1, in contradiction with the absence of such a U.

(SArrR) Follows analogously to the above case. □
Lemma (2, Simulation).
Let R ⊆ Dt_D × Dt_D and suppose that, for all (d1 T1, d2 T2) ∈ R, and for all k such that ∆(d1)(k) is defined:
• ∆(d2)(k) is also defined.
• And, moreover, Ty(R)(U1i, U2i) for each i ∈ [1..Arity(k)], where U1 and U2 are the argument types of ∆(d1)(k) and ∆(d2)(k) instantiated at T1 and T2 respectively.
Then it follows that Ty(R) is included in the subtype relation. Proof.
The proof is by coinduction.

(SShape) By definition, if Ty(R)(T1, T2) then T1 and T2 have the same shape.

(SMis) Suppose dom(∆*(d1)) ⊄ dom(∆*(d2)) and suppose, for the purpose of obtaining a contradiction, that Ty(R)(d1 T1, d2 T2). Then there is some k such that ∆*(d1)(k) is defined, but ∆*(d2)(k) is not, thus contradicting the first bullet of the definition of R.

(SData) Suppose (U1i[T1/α], U2i[T2/α]) ∉ Ty(R) and the side conditions on the rule hold. Then suppose, for contradiction, that Ty(R)(d1 T1, d2 T2). By definition, it must be that R(d1 T1, d2 T2), thus contradicting the second bullet.

(SArrL) Suppose Ty(R)(T1 → T2, T1′ → T2′). Then, by definition of Ty(R), it can only be because Ty(R)(T1′, T1).

(SArrR) Analogously. □
C ADDITIONAL MATERIAL FOR SECTION 5: REFINEMENT TYPE ASSIGNMENT
Lemma 4 (Weakening).
Suppose ⊢ Γ′ ⊑ Γ and ⊢ T ⊑ T′. If Γ ⊢ e : T then Γ′ ⊢ e : T′. Proof.
We prove that, for all Γ′, Γ′ ⊑ Γ implies Γ′ ⊢ e : T, by induction on Γ ⊢ e : T. Then, if also T ⊑ T′, Γ′ ⊢ e : T′ follows by (TSub).

(TVar) Suppose T :: T′ and x : ∀α. U ∈ Γ, and let Γ′ be such that Γ′ ⊑ Γ. Then x : ∀α. V ∈ Γ′ with V ⊑ U. By (TVar), Γ′ ⊢ x[T] : V[T/α]. It follows by the definition of subtyping that V[T/α] ⊑ U[T/α], and therefore the desired result follows by (TSub).

(TSub) (TCst) (TCon) (TApp) In these cases, the conclusion follows from the hypotheses independently of the environment.

(TAbs) Suppose T :: T′ and x ∉ dom(Γ), and then suppose that Γ′ is such that Γ′ ⊑ Γ. Then, by the definition and reflexivity of subtyping, also Γ′ ∪ {x : T} ⊑ Γ ∪ {x : T}. It follows from the induction hypothesis, therefore, that Γ′ ∪ {x : T} ⊢ e : T′. We may assume that x ∉ dom(Γ′) by the variable convention. Therefore, the result follows by (TAbs).

(TCase) Suppose dom(∆(d)) ⊆ {k1, ..., km}. Suppose Γ′ ⊑ Γ. It follows immediately from the induction hypothesis that Γ′ ⊢ e : d. It follows by reflexivity of subtyping that, for each i, Γ′ ∪ x : ∆(d)(ki) ⊑ Γ ∪ x : ∆(d)(ki). Hence, the induction hypothesis gives, for each i, Γ′ ∪ x : ∆(d)(ki) ⊢ ei : T. The result follows immediately by (TCase). □

D ADDITIONAL MATERIAL FOR SECTION 7: TYPE INFERENCE
Theorem (23, Soundness and completeness of ⊑-inference). Let T1 and T2 be extended types and suppose ⊢ T1 ⊑ T2 =⇒ C. Then, for all assignments θ: ⊢ θT1 ⊑ θT2 iff θ |= C. Proof.
The proof is by induction on the inference judgement.

(ISBase) (ISTyVar) Obvious.

(ISArr) In the forward direction, suppose ⊢ θT1 → θT2 ⊑ θT1′ → θT2′. By inversion, necessarily ⊢ θT1′ ⊑ θT1 and ⊢ θT2 ⊑ θT2′. In this case, C = C1 ∪ C2. By the induction hypothesis, θ |= C1 and θ |= C2, so θ |= C as required. In the backward direction, suppose θ |= C1 ∪ C2. Then θ |= C1 and θ |= C2, and it follows from the induction hypothesis that ⊢ θT1′ ⊑ θT1 and ⊢ θT2 ⊑ θT2′. Hence (SArr) is satisfied, and so ⊢ θT1 → θT2 ⊑ θT1′ → θT2′.

(ISData) In the forward direction, suppose ⊢ inj_θ(X) d T1 ⊑ inj_θ(Y) d T2, and suppose ∆(d)(k) is defined. Then it follows from (SMis) that dom(∆*(inj_θ(X) d)) ⊆ dom(∆*(inj_θ(Y) d)). Note that, by definition, dom(∆*(inj_θ(Z) d)) = θ(Z)(d) for any refinement variable Z. Hence, θ |= X(d) ⊆ Y(d). To see that, also, θ |= k ∈ X(d) ? C′, assume θ |= k ∈ X(d). Therefore, ∆*(inj_θ(X) d)(k) and ∆*(inj_θ(Y) d)(k) are also defined. Let us fix ∆*(inj_θ(X) d)(k) = ∀α. A^X_1 → ··· → A^X_m → d α, and ∆*(inj_θ(Y) d)(k) = ∀α. A^Y_1 → ··· → A^Y_m → d α. Hence, by (SSim), ⊢ A^X_i[T1/α] ⊑ A^Y_i[T2/α] for each i ∈ [1..m]. Note that, by definition, ∆*(inj_Z d)(k) = inj_Z(∆(k)) for any refinement variable Z. It follows from the induction hypothesis, therefore, that θ |= C′, as required. In the backward direction, suppose (i) θ |= (k ∈ X(d) ? C′) and (ii) θ |= X(d) ⊆ Y(d). Then, by (ii), (SMis) is satisfied. To see that (SSim) is also satisfied, suppose ∆*(θ(X))(k) and ∆*(θ(Y))(k) are defined. Then, by (i), θ |= C′ and it follows from the induction hypothesis that ⊢ A^X_i[T1/α] ⊑ A^Y_i[T2/α]. Since, by definition, ∆*(inj_Z d)(k) = inj_Z(∆(k)) for any refinement variable Z, (SSim) is satisfied, and so it must be that ⊢ inj_θ(X) d T1 ⊑ inj_θ(Y) d T2 as required.
□

Additionally, we define the equivalence of two refinement variable assignments σ and θ modulo a set of refinement variables S, written σ ≡_S θ, by requiring that they are identical on the variables in S, i.e. ∀X ∈ S. σ(X) = θ(X).

Lemma 5.
Let Γ be a closed constrained type environment, Γ′ a type environment, e an expression, V ∈ ETy, and C a constraint set. Suppose Γ ⊢ e =⇒ V, C, that Γ′ ⊑ ⟦Γ⟧, and that there exists a refinement variable substitution θ such that θ |= C. Then there exists a type T ∈ Ty_D* such that:
(i) Γ′ ⊢ e : T
(ii) ⊢ T ⊑ θV
The proof is by induction on =⇒.

(ICst) In this case, e is of shape c and we may assume C(c) = V. Let T = C(c). By (TCst), we have that Γ′ ⊢ e : T. As V is an unrefined type, it is invariant under any substitution, i.e. θV = V, and hence (ii) is trivially satisfied.

(ICon) In this case, e is of shape k, and V = inj_X(T1) → ··· → inj_X(Tm) → inj_X(d) for fresh X, where ∆(d)(k) = T1 → ··· → Tm. The constraint set C is {k ∈ X(d)}. We know, therefore, that θ is a substitution mapping X to a constructor choice function f, such that k ∈ dom(∆*(inj_f(d))). Let Ti′ = ∆*(inj_f(d))(k)(i) and T = T1′ → ··· → Tm′ → inj_f(d). By the definition of ∆*, θV = T. Finally, by (TCon), we have that Γ′ ⊢ k : T as required.

(IVar) In this case, e is of shape x T. Let us assume that x : ∀α. ∀X. C′ ⊃ U ∈ Γ. Hence, for fresh Y and T′, V is U[Y/X][T′/α], and θ solves C′[Y/X]. We may assume x : ∀α. σU ∈ Γ′, for some σ. Thus, by (IVar), Γ′ ⊢ x T : (σU)[T/α]. Define a new model θ′ as follows:

    θ′(Z) = θ(Yi)   if Z = Xi
            σ(Xi)   if Z = Yi
            Ti      if Z = Ti
            σ(Z)    otherwise

The consistency of this definition follows from the freshness of T′ and V. Clearly this solves C, and θ′V = θ′U[Y/X][T′/α] = (σU)[T/α], trivially satisfying (ii).

(IAbs) In this case e = λx. e′, and V = V1 → V2 where V1 is fresh. Suppose θ |= C. As C is also the constraint set of the hypothesis, by induction there exists a type T2 such that Γ′ ∪ {x : T1} ⊢ e′ : T2, T2 ⊑ θV2, and θV1 = T1. By (TAbs) we may derive Γ′ ⊢ λx. e′ : T1 → T2. Let T = T1 → T2; then clearly we have ⊢ T ⊑ θV as required.

(IApp) In this case e = e1 e2 and C = C1 ∪ C2 ∪ C3. Hence, by the rules of inference, necessarily Γ ⊢ e1 =⇒ V2 → V, C1, Γ ⊢ e2 =⇒ V1, C2, and ⊢ V1 ⊑ V2 =⇒ C3. Suppose θ |= C. As this is also a solution to C3, by Theorem 23 we have that θV1 ⊑ θV2.
By the induction hypothesis, Γ′ ⊢ e1 : T2 → T and Γ′ ⊢ e2 : T1, with θV2 ⊑ T2, T ⊑ θV, and T1 ⊑ θV1. Using the transitivity of subtyping, we have that T1 ⊑ T2. Hence, by (TSub), Γ′ ⊢ e2 : T2, and so by (TApp), Γ′ ⊢ e1 e2 : T as required.

(ICase) In this case e = case e0 of {ki xi ↦ ei | i = 1..m}, and C = C0 ∪ ⋃_{i=1}^{m} (ki ∈ X(d) ? (Ci ∪ Ci′)) ∪ {X(d) ⊆ {k1, ..., km}}. Let us suppose that θ maps X to the constructor choice function f. It follows from inference that Γ ⊢ e0 =⇒ inj_X(d), C0, and, as θ |= C0, thus Γ′ ⊢ e0 : T0 such that ⊢ T0 ⊑ θ inj_f(d). For each i ≤ m, let us consider the case when ki ∈ dom(∆*(inj_f(d))). By induction, Γ′ ∪ {xi : ∆*(inj_f(d))(ki)} ⊢ ei : Ti for some Ti ⊑ θVi. As Ci′ must be satisfied by θ, we also have that θVi ⊑ θV, and so Γ′ ∪ {xi : ∆*(inj_f(d))(ki)} ⊢ ei : θV. As dom(∆*(inj_f(d))) ⊆ {k1, ..., km}, by (TCase) we have that Γ′ ⊢ e : θV as required. □
Let Γ be a constrained type environment, Γ′ a type environment, e an expression, T ∈ Ty_{D*}, V ∈ ETy, σ a refinement variable substitution with domain FRV(Γ), and C a constraint set. Suppose Γ′ ⊢ e : T and Γ ⊢ e ⇒ V, C and ⟦σΓ⟧ ⊑ Γ′. Then there is θ such that:
(i) σ ≡_{FRV(Γ)} θ,
(ii) ⊢ θV ⊑ T, and
(iii) θ ⊨ C.

Proof.
The proof is by induction on the derivation of Γ′ ⊢ e : T.

(TVar) In this case e is of shape x T̄. Assume U(T̄) = T̄, (x : ∀ᾱ. U) ∈ Γ′ and U[T̄/ᾱ] = T. Assume ⟦σΓ⟧ ⊑ Γ′. By definition, in Γ there is some x : ∀ᾱ. ∀X̄. C′ ⊃ U′ and there is some τ such that ⊢ τ(σU′) ⊑ U and τ ⊨ σC′. By definition, V is of shape U′[Ȳ/X̄][T̄′/ᾱ] for fresh Ȳ and fresh types T̄′. Moreover, C is of the form C′[Ȳ/X̄]. Since the T̄′ are fresh, there is a substitution θ′ such that θ′T̄′ = T̄. Define θ as follows:

  θ(Z) ≔ θ′(Z)  if Z ∈ FRV(T̄′)
  θ(Z) ≔ σ(Z)   if Z ∈ FRV(Γ)
  θ(Z) ≔ τ(Xᵢ)  if Z = Yᵢ ∈ Ȳ
  θ(Z) ≔ τ(Z)   otherwise

Note that the freshness of Ȳ and T̄′ ensures the exclusivity of the four cases, and requirement (i) of the theorem is satisfied. Observe that:

  θ(U′[Ȳ/X̄][T̄′/ᾱ]) = (θ(U′[Ȳ/X̄]))[θT̄′/ᾱ] = (θ(U′[Ȳ/X̄]))[T̄/ᾱ]

Next, since θ(Xᵢ) = θ(Yᵢ) for any Xᵢ ∈ X̄ and the codomain of τ is closed, it follows that:

  θ(U′[Ȳ/X̄]) = (θU′)[θȲ/X̄] = (θU′)[θX̄/X̄] = θU′

Finally, by the disjointness of the cases, the fact that the codomain of σ is necessarily closed, and FRV(U′) ⊆ X̄ ∪ FRV(Γ), we have θU′ = τ(σU′). Hence, overall, θ(U′[Ȳ/X̄][T̄′/ᾱ]) = τ(σU′)[T̄/ᾱ], and it follows by the definition of subtyping that ⊢ τ(σU′)[T̄/ᾱ] ⊑ U[T̄/ᾱ], so that requirement (ii) is satisfied. Finally, note that θ ⊨ C′[Ȳ/X̄] iff θ ⊨ C′, since θ(Yᵢ) = θ(Xᵢ). Then, by definition and the closedness of the codomain of σ, θ ⊨ C′ iff θ ⊨ σC′. Then, since FRV(C′) ⊆ X̄ ∪ FRV(Γ), it follows that θ ⊨ σC′ iff τ ⊨ σC′, which was assumed. Hence, θ ⊨ C′[Ȳ/X̄] and requirement (iii) is satisfied.

(TCst) In this case, e is of shape c and we may assume C(c) = T. Since V = C(c) and C = ∅, any assignment θ extending σ will satisfy the three requirements.
(TCon) In this case, e is a constructor k T̄ and T is of the form T₁ → · · · → Tₘ → inj_f d T̄, with T₁ → · · · → Tₘ → inj_f d T̄ = ∆*(inj_f d)(k)[T̄/ᾱ] and k ∈ dom(∆*(inj_f d)), for some fresh T̄. Hence, by definition, Tᵢ = inj_f(∆(d)(k)(i)) (*) and k ∈ f(d) (**). In this case, V is inj_X T′ with T′ = ∆(d)(k) → d and X fresh. Moreover, C is {k ∈ X(d)}. Therefore, we can define θ as follows:

  θ(Z) = f     if Z = X
  θ(Z) = σ(Z)  otherwise

By the freshness of X, this guarantees requirement (i). By (*) above, we have inj_f T′ ⊑ T, ensuring requirement (ii). Finally, requirement (iii) is satisfied by (**).

(TAbs) In this case, e is an abstraction λx : T₁. e′ and T is of shape T₁ → T₂. We may assume that U(T₁) = T₁ and x ∉ dom(Γ). Assume ⊢ ⟦σΓ⟧ ⊑ Γ′. From inference we have V necessarily of shape V₁ → V₂ with Fresh_{T₁}(V₁) and Γ ∪ {x : V₁} ⊢ e′ ⇒ V₂, C. By the freshness of V₁, it follows that there is some σ′ such that σ′V₁ = T₁ and σ′(Z) = σ(Z) on any Z ∉ FRV(V₁). Hence, ⟦σΓ⟧ ∪ {x : T₁} = ⟦σ′(Γ ∪ {x : V₁})⟧ and, by definition, ⊢ ⟦σΓ⟧ ∪ {x : T₁} ⊑ Γ′ ∪ {x : T₁}. Therefore, it follows from the induction hypothesis that there is an assignment θ satisfying (a) θ ≡_{FRV(Γ ∪ {x : V₁})} σ′, (b) ⊢ θV₂ ⊑ T₂ and (c) θ ⊨ C. Then this θ works also as a witness to the main result, since (a) implies θ ≡_{FRV(Γ)} σ, (a) and (b) together imply ⊢ θ(V₁ → V₂) ⊑ T₁ → T₂, and (c) is exactly requirement (iii).

(TApp) In this case, e is an application e₁ e₂. From inference we have, necessarily, Γ ⊢ e₁ ⇒ V₁ → V, C₁, Γ ⊢ e₂ ⇒ V₂, C₂ and ⊢ V₂ ⊑ V₁ ⇒ C₃. It follows immediately from the induction hypothesis that there are assignments θ₁ and θ₂ such that (a) θᵢ ≡_{FRV(Γ)} σ, (b1) ⊢ θ₁(V₁ → V) ⊑ T₂ → T, (b2) ⊢ θ₂V₂ ⊑ T₂, (c1) θ₁ ⊨ C₁ and (c2) θ₂ ⊨ C₂.
Define θ as follows:

  θ(Z) = θ₁(Z)  if Z ∈ FRV(V₁ → V)
  θ(Z) = θ₂(Z)  if Z ∈ FRV(V₂)
  θ(Z) = σ(Z)   otherwise

This is well defined, since one can easily verify by inspection that the inference system guarantees that the only overlap in refinement variables between sibling branches is in the free refinement variables of the environment, and θ₁ and θ₂ agree on these by (a). This construction therefore satisfies requirement (i). Furthermore, (b1) implies ⊢ θV ⊑ T. Finally, it follows from (c1) and (c2) that θ ⊨ C₁ ∪ C₂. By (b1) and (b2) we have ⊢ θV₂ ⊑ T₂ and ⊢ T₂ ⊑ θV₁ so, by transitivity, also ⊢ θV₂ ⊑ θV₁. Hence, it follows from the completeness of subtype inference (Theorem 23) that θ ⊨ C₃.

(TCase) In this case, e is of shape case e′ of {kᵢ x̄ᵢ ↦ eᵢ | 1 ≤ i ≤ m} and we may assume that {i₁, . . . , iₙ} ⊆ {1, . . . , m} and dom(∆*(d₀)) = {kᵢ₁, . . . , kᵢₙ} (*). We may also assume that d₀ = inj_f d for some choice f. From inference, we have necessarily Γ ⊢ e′ ⇒ inj_X d, C₀ and, for all i ∈ {1 . . . m}, ⊢ Vᵢ ⊑ V ⇒ Cᵢ′ (**) and Γ ∪ {x̄ᵢ : inj_X(∆(d)(kᵢ))} ⊢ eᵢ ⇒ Vᵢ, Cᵢ; where X, V and each Vᵢ are fresh. Since V is fresh, there is some substitution θ′ such that θ′V = T (***). It follows from the induction hypothesis that there is θ₀ such that (a0) θ₀ ≡_{FRV(Γ)} σ, (b0) ⊢ θ₀(inj_X d) ⊑ d₀ and (c0) θ₀ ⊨ C₀. Set σ′(X) = θ₀(X) and σ′(Z) = σ(Z) for all other Z. Then, for all j ∈ {1 . . . n}:

  ⟦σΓ⟧ ∪ {x̄ᵢⱼ : ∆*(inj_{θ₀(X)} d)(kᵢⱼ)} = ⟦σ′(Γ ∪ {x̄ᵢⱼ : inj_X(∆(d)(kᵢⱼ))})⟧

By (b0) and (SSim), it follows that ⊢ ⟦σΓ⟧ ∪ {x̄ᵢⱼ : ∆*(inj_{θ₀(X)} d)(kᵢⱼ)} ⊑ ⟦σΓ⟧ ∪ {x̄ᵢⱼ : ∆*(d₀)(kᵢⱼ)}. Hence it follows from the induction hypothesis that there are substitutions θᵢⱼ (for each j ∈ {1 . . . n}) such that (ai) θᵢⱼ ≡_{FRV(Γ ∪ {x̄ᵢⱼ : inj_X(∆(d)(kᵢⱼ))})} σ′, (bi) ⊢ θᵢⱼVᵢⱼ ⊑ T and (ci) θᵢⱼ ⊨ Cᵢⱼ.
Define θ as follows:

  θ(Z) = θ′(Z)   if Z ∈ FRV(V)
  θ(Z) = θ₀(Z)   if Z = X or Z ∈ FRV(C₀)
  θ(Z) = θᵢⱼ(Z)  if Z ∈ FRV(Vᵢⱼ) ∪ FRV(Cᵢⱼ)
  θ(Z) = σ(Z)    otherwise

The use of freshness guarantees well-definedness, since refinement variables introduced in different branches are distinct and so the only variables shared are those in Γ, on which all agree by (a0) and (ai). Hence, requirement (i) is satisfied. It follows from (***) that θV = T and, by (bi), ⊢ θVᵢⱼ ⊑ T. Hence, ⊢ θVᵢⱼ ⊑ θV and it follows from the completeness of subtype inference, Theorem 23, that θ ⊨ Cᵢⱼ′. By (*) and (SMis), dom(∆*(inj_{θ(X)} d)) ⊆ {kᵢ₁, . . . , kᵢₙ} and so θ(X)(d) ⊆ {kᵢ₁, . . . , kᵢₙ}. Consequently, by the foregoing and (ci):

  θ ⊨ {X(d) ⊆ {k₁, . . . , kₘ}} ∪ ⋃ᵢ₌₁ᵐ kᵢ ∈ X(d) ? (Cᵢ ∪ Cᵢ′)

Finally, requirement (iii) is satisfied by also taking into account (c0).

(TSub)
In this case we may assume Γ′ ⊢ e : T′ with ⊢ T′ ⊑ T. It follows from the induction hypothesis that there is θ such that (i) θ ≡_{FRV(Γ)} σ, (ii) ⊢ θV ⊑ T′ and (iii) θ ⊨ C. Since, by transitivity, ⊢ θV ⊑ T, the same θ acts as witness to the result. □

Theorem 24 (Soundness and completeness of expression inference).
Let Γ be a constrained type environment, e an expression, V an extended refinement type, C a set of constraints, and let Γ ⊢ e ⇒ V, C. Then, for all refinement types T:

  ⟦Γ⟧ ⊢ e : T iff ∃θ. θ ⊨ C ∧ ⊢ θV ⊑ T

Proof.
Follows immediately from Lemmas 5 and 6. □
Lemma 7.
Let Γ and Γ′ be closed constrained type environments, ∆ ⊑ ⟦Γ⟧, and m a module. If Γ ⊢ m ⇒ Γ′ and ⟦Γ′⟧ is not empty, then there exists ∆′ ⊑ ⟦Γ′⟧ such that ∆ ⊢ m : ∆′.

Proof.
The proof is by induction on the derivation of Γ ⊢ m ⇒ Γ′.

(IModE) In this case the module is empty, i.e. m = ϵ, and Γ′ = Γ. Let ∆′ = ∆, which is a subenvironment of ⟦Γ′⟧. By (TModE) we can derive ∆ ⊢ m : ∆′ as required.

(IModD) In this case the module has shape m′ · ⟨x : ∀ᾱ. T = Λᾱ. e⟩, and Γ′ = Γ′′ ∪ {x : ∀ᾱ. ∀X̄. Sat(C₁ ∪ C₂)↾X̄ ⊃ V}. Let us assume Γ′′ ∪ {x : V} ⊢ e ⇒ V′, C₁ and ⊢ V′ ⊑ V ⇒ C₂, for some fresh V with X̄ = FRV(V). The induction hypothesis provides a ∆′′ ⊑ ⟦Γ′′⟧ such that ∆ ⊢ m′ : ∆′′. As ⟦Γ′⟧ is not empty, there must exist a solution to Sat(C₁ ∪ C₂)↾X̄. By Lemma 28, we therefore have that both C₁ and C₂ are solvable by some refinement variable substitution θ. Additionally, there must be some type T′ ⊑ θV′ such that ∆′′ ∪ {x : T} ⊢ e : T′, where T = θV, by Lemma 5. As θV′ ⊑ θV, we have that T′ ⊑ T. Thus, by (TModD), we have that ∆ ⊢ m′ · ⟨x : ∀ᾱ. T = Λᾱ. e⟩ : ∆′′ ∪ {x : ∀ᾱ. T} as required. □

Lemma 8.
Let Γ and Γ′ be constrained type environments, ∆ and ∆′ type environments, and m a module. Suppose ⟦Γ⟧ ⊑ ∆, ∆ ⊢ m : ∆′ and Γ ⊢ m ⇒ Γ′. Then ⟦Γ′⟧ ⊑ ∆′.

Proof.
The proof is by induction on the derivation of ∆ ⊢ m : ∆′.

(TModE) If the module is empty, then ∆ = ∆′ and Γ = Γ′. Therefore ⟦Γ′⟧ ⊑ ∆′ follows trivially from the hypothesis.

(TModD) Suppose the module is of shape m′ · ⟨x = Λᾱ. e⟩, and therefore ∆′ = ∆′′ ∪ {x : ∀ᾱ. T} for some ∆′′ and T such that ∆ ⊢ m′ : ∆′′, ∆′′ ∪ {x : ∀ᾱ. T} ⊢ e : T′, and ⊢ T′ ⊑ T. As Γ ⊢ m′ · ⟨x = Λᾱ. e⟩ ⇒ Γ′, we know that Γ′ has shape Γ′′ ∪ {x : ∀ᾱ. ∀X̄. C₁ ∪ C₂ ⊃ V} such that Γ ⊢ m′ ⇒ Γ′′, Γ′′ ∪ {x : V} ⊢ e ⇒ V′, C₁, and ⊢ V′ ⊑ V ⇒ C₂. By the induction hypothesis, therefore, we have that ⟦Γ′′⟧ ⊑ ∆′′. As V is fresh, we can instantiate Lemma 6 with a substitution that maps it to T. Therefore, there is some θ that solves C₁ with θV′ ⊑ T′ and θV = T. Hence ∆′ is ⟦Γ′⟧ with x instantiated by θ and all other variables instantiated as in ∆′′. As θV′ ⊑ T′ ⊑ T = θV trivially holds, we also have that θ ⊨ C₂. Thus ⟦Γ′⟧ ⊑ ∆′ as required. □

Theorem 25 (Soundness and completeness of module inference).
Suppose Γ and Γ′ are closed constrained type environments, m a module and Γ ⊢ m ⇒ Γ′. Then, for all type environments ∆:

  ⟦Γ⟧ ⊢ m : ∆ iff ⟦Γ′⟧ ⊑ ∆

Proof.
Follows immediately from Lemmas 7 and 8. □
E ADDITIONAL MATERIAL FOR SECTION 8: SATURATION
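As a concrete picture of the resolution machinery treated in this section, the following is a minimal, illustrative Haskell model of guarded atomic constraints and of the transitivity resolution step alone. All names here (Var, Sym, the :? constraint shape, saturate) are our own illustrative assumptions, not the implementation's, and the weakening and satisfaction rules are omitted.

```haskell
import qualified Data.Set as Set

type Var = String
type Datatype = String
type Con = String

-- A guard: a conjunction of membership hypotheses (X, d) ↦ k.
type Guard = Set.Set (Var, Datatype, Con)

-- The two sides of an atomic body S1 ⊆ S2.
data Sym = Lit (Set.Set Con)   -- a literal set of constructors
         | Dom Var Datatype    -- dom (X (d))
  deriving (Eq, Ord, Show)

-- A guarded atomic constraint phi ? S1 ⊆ S2.
data Constraint = Guard :? (Sym, Sym)
  deriving (Eq, Ord, Show)

-- Transitivity: from phi ? S1 ⊆ S2 and psi ? S2 ⊆ S3,
-- derive (phi ∪ psi) ? S1 ⊆ S3.
trans :: Constraint -> Constraint -> [Constraint]
trans (phi :? (s1, s2)) (psi :? (s2', s3))
  | s2 == s2' = [(phi `Set.union` psi) :? (s1, s3)]
trans _ _ = []

-- Naive saturation: close the set under the rule to a fixed point.
saturate :: Set.Set Constraint -> Set.Set Constraint
saturate cs
  | next == cs = cs
  | otherwise  = saturate next
  where
    next = cs `Set.union` Set.fromList
             [ c | a <- Set.toList cs, b <- Set.toList cs, c <- trans a b ]
```

Because the derived guard is the union of the premises' guards, any solution satisfying the derived guard satisfies both premises, which is exactly the preservation argument made case by case in the proof below.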
Theorem 28 (Saturation equivalence).
For any assignment θ, θ ⊨ C iff θ ⊨ Sat(C).

Proof.
We shall show that the resolution rules preserve solutions (naturally they reflect solutions too, since they do not remove constraints). In each case we shall assume the derived guard holds, otherwise there is nothing to show.

• Suppose ϕ ? S₁ ⊆ S₂ and ψ ? S₂ ⊆ S₃ appear in C, and θ satisfies both ϕ and ψ. Then we must have θS₁ ⊆ θS₂ and θS₂ ⊆ θS₃, as θ solves C. From the transitivity of the subset relation, it follows that θS₁ ⊆ θS₃.

• Suppose ϕ ? dom(X(d)) ⊆ dom(Y(d)) and ψ ∪ {(Y, d) ↦ k} ? S₁ ⊆ S₂ appear in C, and that θ satisfies ϕ ∪ ψ ∪ {(X, d) ↦ k}. Therefore dom((θX)(d)) ⊆ dom((θY)(d)), as θ solves C. Additionally, k ∈ dom((θX)(d)), and so k ∈ dom((θY)(d)). Thus ψ ∪ {(Y, d) ↦ k} holds under θ, and θS₁ ⊆ θS₂.

• Suppose ϕ ? k ∈ dom(X(d)) and {(X, d) ↦ k} ∪ ψ ? S₁ ⊆ S₂ appear in C, and θ satisfies both ϕ and ψ. Then we have that k ∈ dom((θX)(d)). Clearly the guard of S₁ ⊆ S₂ must then also hold, and θS₁ ⊆ θS₂ as required. □

Definition 34.
We say that τ is a partial solution of a constraint set C if τ solves the restriction of C to the domain of τ, i.e. τ solves C↾dom(τ).

Additionally, we construct the extended solution θ_τ (or just θ when τ is implied) as follows. For each refinement variable X not in the domain of τ, define (θX)(d)(k) = ∆(d)(k) whenever there exists some ϕ ? S ⊆ dom(X(d)) ∈ C such that ϕ holds under τ and k ∈ τS.

Lemma 9.
Suppose τ is a partial solution of a saturated constraint set C. For any constraint ϕ ? S₁ ⊆ S₂ ∈ C such that ϕ is satisfied by θ_τ ∘ τ, there is a constraint with the same body, i.e. of the form ψ ? S₁ ⊆ S₂, such that τ satisfies ψ.

Proof.
Our proof is by induction on the cardinality of {(X, d) ↦ k ∈ ϕ | X ∉ dom(τ)}.

If the cardinality is zero, then τ must satisfy ϕ and so we are done.

We now consider the case when the cardinality is n + 1, and let us assume the induction hypothesis holds for n. Suppose (X, d) ↦ k ∈ ϕ and X ∉ dom(τ). As θ_τ ∘ τ satisfies ϕ, we have that k ∈ dom((θ_τX)(d)). And so, by construction, there is some ψ ? S ⊆ dom(X(d)) ∈ C with k ∈ τS and τ satisfying ψ. As C is saturated, S must either be the singleton {k} or of the form dom(Y(d)):

• In the first case we may apply the rule (Satisfaction), as C is saturated, to conclude ψ ∪ ϕ − {(X, d) ↦ k} ? S₁ ⊆ S₂ ∈ C.

• In the second case we may apply the rule (Weakening) to conclude ψ ∪ ϕ ∪ {(Y, d) ↦ k} − {(X, d) ↦ k} ? S₁ ⊆ S₂ ∈ C.

In either case we are now left with a guard which θ_τ ∘ τ satisfies and which has n hypotheses whose refinement variables are not in the domain of τ; hence we can apply the induction hypothesis. □

This lemma expresses the fact that the extended solution θ_τ does not make any arbitrary choices: whenever it satisfies the guard of a constraint, the constraint must already have been satisfied according to resolution.

Lemma 10.
Suppose C is a saturated constraint set which does not contain ∅ ? k ∈ ∅, and suppose τ is a partial solution of C. Then θ_τ ∘ τ solves C.

Proof.
We need only consider the constraints in C which reference some refinement variable not in the domain of τ.

Let ϕ ? S₁ ⊆ S₂ be such a constraint. If θ_τ ∘ τ does not satisfy ϕ then there is nothing to show. Otherwise, we may apply the above lemma to deduce that there is a constraint ψ ? S₁ ⊆ S₂ ∈ C with ψ satisfied by τ.

We now consider the possible forms of S₁ and S₂:

• Suppose both S₁ and S₂ are in the domain of τ. Then ψ ? S₁ ⊆ S₂ is in the restriction of C to the domain of τ and, as τ satisfies ψ, we have that τS₁ ⊆ τS₂ as required.

• Suppose S₁ is in the domain of τ, but S₂ = dom(X(d)) for some X ∉ dom(τ). If k ∈ τS₁, then by construction k ∈ dom((θX)(d)), and so the subset relation holds.

• Suppose S₁ = dom(X(d)) for some X ∉ dom(τ). Let k ∈ dom((θX)(d)); we shall show that k ∈ (θ ∘ τ)S₂. By construction, there is some ψ′ ? S ⊆ dom(X(d)) ∈ C with k ∈ τS, for some ψ′ which τ satisfies. As C is saturated, we can deduce that ψ′ ∪ ψ ? S ⊆ S₂ ∈ C. Note that ψ′ and ψ hold under τ. If S₂ is in the domain of τ, then τS ⊆ τS₂ by the definition of a partial solution, and so k ∈ τS₂ as required. If S₂ = dom(Y(d)) for some Y ∉ dom(τ), however, then by the definition of θ_τ, k ∈ dom((θ_τY)(d)). □

Theorem 29. C is unsatisfiable iff Sat(C) is trivially unsatisfiable.

Proof.
The consistency of Sat(C) follows from Lemma 10, with the partial solution τ taken to be empty; Theorem 28 then gives the consistency of C. In the backward direction, Theorem 28 gives the consistency of Sat(C). Since no substitution can solve k ∈ ∅, it follows that k ∈ ∅ ∉ Sat(C). □

F ADDITIONAL MATERIAL FOR SECTION 9: RESTRICTION AND COMPLEXITY
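The restriction operation studied in this section can be pictured, at a high level, as a filter on saturated constraint sets: a constraint survives only if every refinement variable it mentions belongs to the interface (the free refinement variables of the environment and result type). The sketch below is an illustrative abstraction, not the implementation: constraints are reduced to just the sets of variables they mention.

```haskell
import qualified Data.Set as Set

type Var = String

-- A constraint abstracted to the set of refinement variables it mentions
-- (the real constraints also carry guards and bodies).
type Constraint = Set.Set Var

-- Restriction to an interface: after saturation has made all consequences
-- explicit, drop every constraint mentioning a variable outside it.
restrict :: Set.Set Var -> [Constraint] -> [Constraint]
restrict interface = filter (`Set.isSubsetOf` interface)
```

Keeping the surviving variables bounded by the context is what drives the counting in Lemma 11 below.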
Let S be the size of the largest function definition. Let K be the maximum number of constructors associated with any datatype, let D be the maximum number of datatypes in any slice (i.e. for Lam this is 2), and let Q be the maximum size of any underlying type.

Lemma 3.
There are O(Kv²D · 2^(KvD+K)) atomic constraints over v refinement variables.

Proof.
Each non-trivially-unsatisfiable atomic constraint ϕ ? S₁ ⊆ S₂ can be understood as a choice of one of the 2^K subsets of constructors for each of the vD pairs of refinement variable and underlying datatype (for ϕ; we assume that each variable is associated with a particular slice), followed by a choice of one of the vD variable and datatype pairs for the head, which will appear in either the S₁ or the S₂ position depending on the next choice, and then either: another variable (the underlying datatype is determined by the previous choice) for the other position, one of the K constructors (for S₁), or one of the 2^K subsets of constructors (for S₂). Therefore, there are at most 2^(KvD) · vD · (v + K + 2^K) possible constraints over these refinement variables, and:

  2^(KvD) · vD · (v + K + 2^K) = v²D · 2^(KvD) + KvD · 2^(KvD) + vD · 2^(KvD+K) = O(Kv²D · 2^(KvD+K)) □

Suppose the maximum level of case expression nesting is M and let V = M + Q.

Lemma 11.
Suppose Γ ⊢ e ⇒ T, C with C restricted to the context. The number of constraints in C is O(KV²D · 2^(KVD+K)).

Proof.
By applying restriction after each inference, we are guaranteed that the refinement variables occurring in C are a subset of those occurring in Γ and T. The only free refinement variables in Γ are those that were introduced as a result of inferring under a lambda abstraction (i.e. introduced by the (IAbs) rule) or those introduced as a result of inferring under a case statement (i.e. introduced by the (ICase) rule). There are at most Q introduced by abstractions and a further M introduced by case matching (each case only introduces a single refinement variable X, independently of the complexity of the datatype of the scrutinee). The type T introduces at most another Q many. □

Lemma 12.
The complexity of type inference is O(NSK⁵V⁴D² · 2^(K(VD(K+1)+2))).

Proof.
For each subterm of each function definition, inference must generate new constraints and apply the saturation and restriction operations to the set of constraints obtained from combining the outputs of its recursive calls. Using a standard fixpoint computation, and the number of possible constraints over a given set of variables calculated in Lemma 3, saturation of a constraint set of size n involving v refinement variables can be achieved in time O(nKv²D · 2^(KvD+K)). Hence the time taken to process each subterm will be dominated by the time taken by saturation. The number of constraints that constitutes the input to saturation for a given subterm is a function of the inference rule applied, its premises and side conditions. The worst case is (ICase), which has at most K premises and a number of side conditions. The total number of constraints C before saturation and restriction, in this case, is the sum of the sizes of each Cᵢ, Cᵢ′, C₀ and a single constraint contributed by the side condition. Each of these subsets is already restricted to its context so, by Lemma 11, their sizes are at most O(KV²D · 2^(KVD+K)), giving an overall upper bound for C of O(K²V²D · 2^(KVD+K)). The total number of refinement variables occurring in this set is given by those free in Γ, T and each Tᵢ. The context Γ and the type T contribute at most V many and the Tᵢ together contribute at most a further KQ. Consequently, the number is bounded above by KV. Hence, saturation can be computed in time O(K⁵V⁴D² · 2^(K(VD(K+1)+2))). □
Under the assumption that the size of types and the size of individual function definitions is bounded, the complexity of type inference is O(N).

Proof.
Follows immediately from Lemma 12 by fixing all other parameters. □
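The counting in the proof of Lemma 3 can be sanity-checked by transcribing the formula directly; the function names below are ours, and the formulas are the ones derived in that proof.

```haskell
-- The exact count from the proof of Lemma 3: a guard (2^(KvD) choices),
-- a head variable-datatype pair (vD choices), and the other position
-- (another variable, one of K constructors, or one of 2^K subsets).
atomicConstraintCount :: Integer -> Integer -> Integer -> Integer
atomicConstraintCount k v d = 2 ^ (k * v * d) * v * d * (v + k + 2 ^ k)

-- The asymptotic bound it is absorbed into: K v^2 D 2^(KvD + K).
bigOBound :: Integer -> Integer -> Integer -> Integer
bigOBound k v d = k * v ^ 2 * d * 2 ^ (k * v * d + k)
```

Each of the three terms of the expansion is at most Kv²D · 2^(KvD+K), so the exact count never exceeds three times the stated bound.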
G ADDITIONAL MATERIAL FOR SECTION 10: IMPLEMENTATION
The following is a listing of benchmark results with packages split by module.
G.1 Package aeson-1.5.2.0
Name N K V D I Time (ms)Data.Aeson 15 6 198 2 3 3.01Data.Aeson.Encode 2 0 1 0 1 2.75Data.Aeson.Encoding 0 0 0 0 0 2.72Data.Aeson.Encoding.Builder 71 6 1534 1 6 3.34Data.Aeson.Encoding.Internal 67 6 723 2 7 2.65Data.Aeson.Internal 0 0 0 0 0 2.85Data.Aeson.Internal.Functions 3 0 10 0 0 3.33Data.Aeson.Internal.Time 0 0 0 0 0 3.26Data.Aeson.Parser 0 0 0 0 0 2.62Data.Aeson.Parser.Internal 63 6 2252 2 9 3.20Data.Aeson.Parser.Time 7 1 35 1 1 2.81Data.Aeson.Parser.Unescape 0 0 0 0 0 3.01Data.Aeson.Parser.UnescapePure 19 13 1109 2 5 4.09Data.Aeson.QQ.Simple 2 6 111 1 2 7.12Data.Aeson.TH 271 4 10251 3 11 2.88Data.Aeson.Text 13 6 326 1 4 3.99Data.Aeson.Types 1 1 7 1 2 2.81Data.Aeson.Types.Class 0 0 0 0 0 2.76Data.Aeson.Types.FromJSON 96 6 2054 6 14 2.93Data.Aeson.Types.Generic 1 0 1 0 1 3.77Data.Aeson.Types.Internal 46 6 564 2 8 3.64Data.Aeson.Types.ToJSON 27 6 484 3 10 2.88Data.Attoparsec.Time 16 1 745 1 2 2.94Data.Attoparsec.Time.Internal 8 1 61 1 1 3.98 :38 Eddie Jones and Steven Ramsay
G.2 Package containers-0.6.2.1
Name N K V D I Time (ms)Data.Containers.ListUtils 11 3 61 1 4 3.97Data.Graph 69 2 1411 2 8 2.87Data.IntMap 8 0 162 0 2 2.83Data.IntMap.Internal 438 3 5519 2 23 2.62Data.IntMap.Internal.Debug 0 0 0 0 0 2.43Data.IntMap.Internal.DeprecatedDebug 4 0 84 0 1 2.95Data.IntMap.Lazy 0 0 0 0 0 2.62Data.IntMap.Merge.Lazy 0 0 0 0 0 15.55Data.IntMap.Merge.Strict 15 3 152 2 4 2.71Data.IntMap.Strict 0 0 0 0 0 3.08Data.IntMap.Strict.Internal 146 3 1541 2 14 9.76Data.IntSet 0 0 0 0 0 2.55Data.IntSet.Internal 288 5 3690 2 9 2.89Data.Map 10 0 200 0 2 2.86Data.Map.Internal 379 4 5697 2 15 2.85Data.Map.Internal.Debug 20 2 538 1 3 3.33Data.Map.Internal.DeprecatedShowTree 4 0 86 0 1 2.46Data.Map.Lazy 0 0 0 0 0 2.57Data.Map.Merge.Lazy 0 0 0 0 0 2.72Data.Map.Merge.Strict 0 0 0 0 0 1.77Data.Map.Strict 0 0 0 0 0 3.07Data.Map.Strict.Internal 137 2 2013 2 12 2.81Data.Set 0 0 0 0 0 2.75Data.Set.Internal 211 2 3362 2 11 2.86Data.Tree 24 1 497 1 7 2.84Utils.Containers.Internal.BitQueue 13 1 153 2 4 4.02Utils.Containers.Internal.BitUtil 5 0 22 0 0 3.40Utils.Containers.Internal.Coercions 2 0 6 0 0 4.58Utils.Containers.Internal.PtrEquality 2 0 19 0 0 3.04Utils.Containers.Internal.State 2 1 5 1 1 3.69Utils.Containers.Internal.StrictMaybe 3 2 15 1 1 3.69Utils.Containers.Internal.StrictPair 1 0 4 0 1 3.40Utils.Containers.Internal.TypeError 0 0 0 0 0 2.74 ntensional Datatype Refinement 1:39
G.3 Package extra-1.7.3
Name N K V D I Time (ms)Control.Concurrent.Extra 20 3 552 3 7 2.41Control.Exception.Extra 19 0 348 0 0 2.85Control.Monad.Extra 57 0 342 0 0 2.93Data.Either.Extra 9 0 94 0 0 2.91Data.IORef.Extra 4 0 40 0 0 3.50Data.List.Extra 119 3 1963 1 6 3.47Data.List.NonEmpty.Extra 17 0 84 0 0 2.36Data.Tuple.Extra 16 0 58 0 0 4.71Data.Typeable.Extra 0 0 0 0 0 2.76Data.Version.Extra 4 0 122 0 0 2.63Extra 0 0 0 0 0 3.26Numeric.Extra 5 0 19 0 0 3.46Partial 0 0 0 0 0 3.23System.Directory.Extra 8 0 265 0 0 2.72System.Environment.Extra 0 0 0 0 0 3.55System.IO.Extra 35 0 923 0 0 2.47System.Info.Extra 2 0 2 0 0 2.70System.Process.Extra 5 0 264 0 0 4.35System.Time.Extra 12 1 362 1 3 2.29Text.Read.Extra 0 0 0 0 0 2.98 :40 Eddie Jones and Steven Ramsay
G.4 Package fgl-5.7.0.2
Name N K V D I Time (ms)Data.Graph.Inductive 1 0 20 0 0 2.95Data.Graph.Inductive.Basic 17 0 452 0 0 2.47Data.Graph.Inductive.Example 53 1 5236 1 1 2.81Data.Graph.Inductive.Graph 102 1 1920 1 1 4.26Data.Graph.Inductive.Internal.Heap 19 2 314 1 6 3.71Data.Graph.Inductive.Internal.Queue 5 1 44 1 4 3.40Data.Graph.Inductive.Internal.RootPath 6 1 123 1 2 3.22Data.Graph.Inductive.Internal.Thread 20 0 133 0 0 3.66Data.Graph.Inductive.Monad 23 0 349 0 0 3.31Data.Graph.Inductive.Monad.IOArray 7 1 272 1 1 2.47Data.Graph.Inductive.Monad.STArray 8 1 282 1 1 2.93Data.Graph.Inductive.NodeMap 60 1 678 1 4 3.81Data.Graph.Inductive.PatriciaTree 23 1 871 1 2 2.79Data.Graph.Inductive.Query 0 0 0 0 0 2.74Data.Graph.Inductive.Query.ArtPoint 16 1 442 1 5 3.40Data.Graph.Inductive.Query.BCC 20 0 500 0 0 3.18Data.Graph.Inductive.Query.BFS 29 1 625 2 9 3.10Data.Graph.Inductive.Query.DFS 44 0 467 0 0 3.14Data.Graph.Inductive.Query.Dominators 36 0 655 0 0 4.05Data.Graph.Inductive.Query.GVD 10 2 209 2 3 2.88Data.Graph.Inductive.Query.Indep 9 0 208 0 0 4.08Data.Graph.Inductive.Query.MST 13 2 219 2 10 3.76Data.Graph.Inductive.Query.MaxFlow 20 0 322 0 1 2.63Data.Graph.Inductive.Query.MaxFlow2 51 2 2500 2 12 2.89Data.Graph.Inductive.Query.Monad 58 1 700 1 6 2.87Data.Graph.Inductive.Query.SP 12 2 177 2 10 3.19Data.Graph.Inductive.Query.TransClos 15 0 215 0 0 3.62Data.Graph.Inductive.Tree 8 1 306 1 1 3.87Paths_fgl 15 0 164 0 0 3.13 ntensional Datatype Refinement 1:41
G.5 Package haskeline-0.8.0.1
Name N K V D I Time (ms)System.Console.Haskeline 136 15 2343 17 18 3.55System.Console.Haskeline.Backend 6 15 55 7 2 4.28System.Console.Haskeline.Backend.DumbTerm 48 15 484 7 12 3.65System.Console.Haskeline.Backend.Posix 51 15 1843 8 10 3.96System.Console.Haskeline.Backend.Posix.Encoder 8 2 98 2 2 4.28System.Console.Haskeline.Backend.Terminfo 148 15 2108 7 13 3.98System.Console.Haskeline.Backend.WCWidth 10 1 168 1 6 4.25System.Console.Haskeline.Command 28 15 327 8 7 4.09System.Console.Haskeline.Command.Completion 47 15 807 13 13 3.68System.Console.Haskeline.Command.History 52 15 1131 11 12 2.90System.Console.Haskeline.Command.KillRing 35 15 360 9 6 3.59System.Console.Haskeline.Command.Undo 17 15 138 8 3 3.91System.Console.Haskeline.Completion 40 1 948 1 6 3.37System.Console.Haskeline.Directory 0 0 0 0 0 3.95System.Console.Haskeline.Emacs 73 15 2328 9 21 3.55System.Console.Haskeline.History 16 1 406 1 3 6.40System.Console.Haskeline.IO 11 15 185 19 6 4.53System.Console.Haskeline.InputT 65 15 1999 17 18 3.35System.Console.Haskeline.Internal 16 15 417 17 16 4.86System.Console.Haskeline.Key 26 15 647 3 2 5.61System.Console.Haskeline.LineState 67 2 710 3 6 5.61System.Console.Haskeline.Monads 12 0 21 0 0 4.84System.Console.Haskeline.Prefs 26 15 769 8 5 4.35System.Console.Haskeline.Recover 2 0 98 0 0 3.98System.Console.Haskeline.RunCommand 33 15 582 8 27 3.81System.Console.Haskeline.Term 44 15 628 7 4 3.82System.Console.Haskeline.Vi 367 15 9789 12 27 3.50 :42 Eddie Jones and Steven Ramsay
G.6 Package parallel-3.2.2.0
Name N K V D I Time (ms)Control.Parallel 2 0 0 0 0 2.98Control.Parallel.Strategies 87 1 769 2 18 3.47Control.Seq 21 0 190 0 0 3.74
G.7 Package pretty-1.1.3.6
Name N K V D I Time (ms)Text.PrettyPrint 0 0 0 0 0 3.09Text.PrettyPrint.Annotated 0 0 0 0 0 3.91Text.PrettyPrint.Annotated.HughesPJ 154 8 3148 3 16 4.23Text.PrettyPrint.Annotated.HughesPJClass 5 8 32 3 3 4.29Text.PrettyPrint.HughesPJ 58 8 472 4 7 5.23Text.PrettyPrint.HughesPJClass 5 8 23 4 3 3.11 ntensional Datatype Refinement 1:43
G.8 Package sbv-8.7.5
Name N K V D I Time (ms)Data.SBV 0 0 0 0 0 3.54Data.SBV.Char 48 44 709 43 6 3.11Data.SBV.Client 20 44 1007 43 5 3.38Data.SBV.Client.BaseIO 118 0 418 0 4 3.27Data.SBV.Compilers.C 262 44 12934 48 11 3.29Data.SBV.Compilers.CodeGen 67 44 1919 49 8 3.07Data.SBV.Control 1 2 5 1 3 3.75Data.SBV.Control.BaseIO 50 0 142 0 5 2.87Data.SBV.Control.Query 224 44 7932 43 20 3.52Data.SBV.Control.Types 7 31 283 1 2 5.38Data.SBV.Control.Utils 311 44 12922 44 33 4.02Data.SBV.Core.AlgReals 49 2 1444 2 4 4.11Data.SBV.Core.Concrete 55 14 2443 7 16 3.90Data.SBV.Core.Data 47 44 261 43 6 3.46Data.SBV.Core.Floating 69 44 1553 43 15 3.52Data.SBV.Core.Kind 16 14 724 1 3 3.46Data.SBV.Core.Model 229 44 3848 43 14 3.07Data.SBV.Core.Operations 284 44 9524 43 37 5.50Data.SBV.Core.Sized 53 44 701 43 7 3.72Data.SBV.Core.Symbolic 235 44 6695 43 46 3.84Data.SBV.Dynamic 17 44 274 44 9 3.59Data.SBV.Either 48 44 864 43 13 3.10Data.SBV.Internals 3 0 1 0 0 3.10Data.SBV.List 80 44 1184 43 12 3.54Data.SBV.Maybe 30 44 531 43 13 3.33Data.SBV.Provers.ABC 1 31 57 9 3 3.74Data.SBV.Provers.Boolector 1 31 71 9 3 3.58Data.SBV.Provers.CVC4 6 31 134 9 3 3.45Data.SBV.Provers.MathSAT 3 31 101 9 3 3.31Data.SBV.Provers.Prover 93 44 1374 48 11 3.46Data.SBV.Provers.Yices 1 31 50 9 3 3.36Data.SBV.Provers.Z3 2 31 89 9 3 3.07Data.SBV.RegExp 24 44 278 43 8 5.47Data.SBV.SMT.SMT 118 44 5193 42 8 3.55Data.SBV.SMT.SMTLib 9 44 785 14 43 3.33Data.SBV.SMT.SMTLib2 285 44 16859 13 36 3.32Data.SBV.SMT.SMTLibNames 1 0 379 0 0 5.25Data.SBV.SMT.Utils 33 31 563 9 2 3.31Data.SBV.Set 76 44 1481 43 10 3.88Data.SBV.String 53 44 1322 43 12 3.22 :44 Eddie Jones and Steven Ramsay
Data.SBV.Tools.BMC 11 44 549 44 8 3.20Data.SBV.Tools.BoundedFix 3 44 30 43 5 3.11Data.SBV.Tools.BoundedList 59 44 578 43 12 19.58Data.SBV.Tools.CodeGen 0 0 0 0 0 3.36Data.SBV.Tools.GenTest 104 44 5552 43 10 3.03Data.SBV.Tools.Induction 12 44 393 44 14 3.57Data.SBV.Tools.Overflow 77 44 1491 43 12 3.14Data.SBV.Tools.Polynomial 68 44 1586 43 12 3.11Data.SBV.Tools.Range 39 44 951 45 13 3.29Data.SBV.Tools.STree 19 44 488 44 20 3.39Data.SBV.Tools.WeakestPreconditions 86 44 2829 45 33 3.68Data.SBV.Trans 0 0 0 0 0 3.47Data.SBV.Trans.Control 1 2 4 1 3 3.16Data.SBV.Tuple 16 44 367 43 7 3.75Data.SBV.Utils.ExtractIO 0 0 0 0 0 4.96Data.SBV.Utils.Lib 35 3 1070 1 2 14.82Data.SBV.Utils.Numeric 62 0 353 0 0 4.65Data.SBV.Utils.PrettyNum 83 14 2387 6 9 3.78Data.SBV.Utils.SExpr 115 6 7919 3 22 3.67Data.SBV.Utils.TDiff 10 0 277 0 0 4.80Documentation.SBV.Examples.BitPrecise.BitTricks 18 44 379 43 5 3.64Documentation.SBV.Examples.BitPrecise.BrokenSearch 7 44 327 44 8 2.56Documentation.SBV.Examples.BitPrecise.Legato 72 44 1347 49 24 3.30Documentation.SBV.Examples.BitPrecise.MergeSort 19 44 428 49 9 3.05Documentation.SBV.Examples.BitPrecise.MultMask 3 44 142 44 5 3.34Documentation.SBV.Examples.BitPrecise.PrefixSum 17 44 269 44 4 3.57Documentation.SBV.Examples.CodeGeneration.AddSub 3 44 96 49 5 3.15Documentation.SBV.Examples.CodeGeneration.CRC_USB5 8 44 184 49 4 3.35Documentation.SBV.Examples.CodeGeneration.Fibonacci 6 44 216 49 11 3.35Documentation.SBV.Examples.CodeGeneration.GCD 7 44 192 49 11 3.66Documentation.SBV.Examples.CodeGeneration.PopulationCount 7 44 191 49 8 3.24Documentation.SBV.Examples.CodeGeneration.Uninterpreted 10 44 212 49 5 3.13Documentation.SBV.Examples.Crypto.AES 216 44 5018 49 32 3.62Documentation.SBV.Examples.Crypto.RC4 22 44 793 44 18 3.45Documentation.SBV.Examples.Crypto.SHA 108 44 5830 49 18 3.69Documentation.SBV.Examples.Existentials.CRCPolynomial 10 44 360 44 12 3.07Documentation.SBV.Examples.Existentials.Diophantine 26 44 682 44 8 
5.39Documentation.SBV.Examples.Lists.BoundedMutex 19 44 1375 44 22 3.21Documentation.SBV.Examples.Lists.Fibonacci 4 44 225 44 5 3.36Documentation.SBV.Examples.Lists.Nested 1 44 444 44 4 3.63Documentation.SBV.Examples.Misc.Auxiliary 4 44 155 44 5 3.38Documentation.SBV.Examples.Misc.Enumerate 7 44 205 44 11 3.26Documentation.SBV.Examples.Misc.Floating 13 44 587 44 9 3.04 ntensional Datatype Refinement 1:45
Documentation.SBV.Examples.Misc.ModelExtract 3 44 141 44 4 3.77
Documentation.SBV.Examples.Misc.Newtypes 3 44 95 44 7 3.07
Documentation.SBV.Examples.Misc.NoDiv0 3 44 113 43 5 3.70
Documentation.SBV.Examples.Misc.Polynomials 9 44 209 43 5 3.26
Documentation.SBV.Examples.Misc.SetAlgebra 0 0 0 0 0 3.53
Documentation.SBV.Examples.Misc.SoftConstrain 1 44 178 44 4 3.26
Documentation.SBV.Examples.Misc.Tuple 16 44 337 44 5 4.12
Documentation.SBV.Examples.Optimization.Enumerate 12 44 201 44 5 3.20
Documentation.SBV.Examples.Optimization.ExtField 1 44 130 44 8 2.77
Documentation.SBV.Examples.Optimization.LinearOpt 2 44 224 44 5 3.60
Documentation.SBV.Examples.Optimization.Production 7 44 205 44 4 3.65
Documentation.SBV.Examples.Optimization.VM 8 44 568 44 8 3.24
Documentation.SBV.Examples.ProofTools.BMC 5 44 140 44 9 3.89
Documentation.SBV.Examples.ProofTools.Fibonacci 11 44 297 44 10 3.12
Documentation.SBV.Examples.ProofTools.Strengthen 11 44 456 44 17 2.97
Documentation.SBV.Examples.ProofTools.Sum 9 44 207 44 10 3.17
Documentation.SBV.Examples.Puzzles.Birthday 21 44 807 44 8 6.62
Documentation.SBV.Examples.Puzzles.Coins 14 44 513 44 10 3.01
Documentation.SBV.Examples.Puzzles.Counts 12 44 481 44 5 2.75
Documentation.SBV.Examples.Puzzles.DogCatMouse 6 44 207 44 6 3.31
Documentation.SBV.Examples.Puzzles.Euler185 11 44 609 44 4 3.53
Documentation.SBV.Examples.Puzzles.Fish 48 44 1029 44 7 3.43
Documentation.SBV.Examples.Puzzles.Garden 12 44 494 44 10 3.64
Documentation.SBV.Examples.Puzzles.HexPuzzle 21 44 1187 45 15 3.97
Documentation.SBV.Examples.Puzzles.LadyAndTigers 3 44 277 44 4 3.48
Documentation.SBV.Examples.Puzzles.MagicSquare 19 44 571 44 6 3.44
Documentation.SBV.Examples.Puzzles.NQueens 10 44 411 44 4 4.09
Documentation.SBV.Examples.Puzzles.SendMoreMoney 4 44 361 44 4 2.87
Documentation.SBV.Examples.Puzzles.Sudoku 42 44 3980 44 5 3.69
Documentation.SBV.Examples.Puzzles.U2Bridge 56 44 1611 44 14 3.60
Documentation.SBV.Examples.Queries.AllSat 7 44 455 44 6 3.27
Documentation.SBV.Examples.Queries.CaseSplit 2 44 420 48 4 3.54
Documentation.SBV.Examples.Queries.Concurrency 8 44 1485 44 9 3.36
Documentation.SBV.Examples.Queries.Enums 8 44 234 44 12 3.55
Documentation.SBV.Examples.Queries.FourFours 29 44 1156 46 21 3.48
Documentation.SBV.Examples.Queries.GuessNumber 10 44 386 44 5 2.87
Documentation.SBV.Examples.Queries.Interpolants 2 44 431 44 5 3.71
Documentation.SBV.Examples.Queries.UnsatCore 2 44 248 44 4 3.25
Documentation.SBV.Examples.Strings.RegexCrossword 16 44 784 44 10 3.56
Documentation.SBV.Examples.Strings.SQLInjection 9 44 727 45 13 4.18
Documentation.SBV.Examples.Transformers.SymbolicEval 29 44 852 46 8 6.51
Documentation.SBV.Examples.Uninterpreted.AUF 6 44 219 45 7 2.80
Documentation.SBV.Examples.Uninterpreted.Deduce 7 44 244 44 6 3.46
Documentation.SBV.Examples.Uninterpreted.Function 2 44 52 43 5 3.36
Documentation.SBV.Examples.Uninterpreted.Multiply 5 44 215 44 9 3.24
Documentation.SBV.Examples.Uninterpreted.Shannon 18 44 489 43 9 3.17
Documentation.SBV.Examples.Uninterpreted.Sort 3 44 129 44 4 3.20
Documentation.SBV.Examples.Uninterpreted.UISortAllSat 3 44 220 44 4 3.18
Documentation.SBV.Examples.WeakestPreconditions.Append 10 44 538 47 6 3.19
Documentation.SBV.Examples.WeakestPreconditions.Basics 8 44 204 47 13 3.73
Documentation.SBV.Examples.WeakestPreconditions.Fib 13 44 573 47 7 3.22
Documentation.SBV.Examples.WeakestPreconditions.GCD 17 44 604 47 7 2.71
Documentation.SBV.Examples.WeakestPreconditions.IntDiv 12 44 447 47 13 3.65
Documentation.SBV.Examples.WeakestPreconditions.IntSqrt 14 44 509 47 13 2.95
Documentation.SBV.Examples.WeakestPreconditions.Length 11 44 305 47 8 2.75
Documentation.SBV.Examples.WeakestPreconditions.Sum 9 44 369 47 13 3.59
G.9 Package time-1.10
Name N K V D I Time (ms)
Data.Format 33 3 711 1 2 3.85
Data.Time 0 0 0 0 0 2.48
Data.Time.Calendar 0 0 0 0 0 2.92
Data.Time.Calendar.CalendarDiffDays 7 1 25 1 2 3.74
Data.Time.Calendar.Days 3 0 12 0 2 3.85
Data.Time.Calendar.Easter 7 1 102 1 3 2.62
Data.Time.Calendar.Gregorian 35 1 283 1 4 3.55
Data.Time.Calendar.Julian 35 1 283 1 4 2.94
Data.Time.Calendar.JulianYearDay 11 1 109 1 1 3.24
Data.Time.Calendar.MonthDay 9 0 220 0 0 2.76
Data.Time.Calendar.OrdinalDate 26 1 371 1 2 2.62
Data.Time.Calendar.Private 19 2 118 1 1 3.70
Data.Time.Calendar.Week 1 7 14 1 2 2.58
Data.Time.Calendar.WeekDate 16 1 233 1 2 2.85
Data.Time.Clock 0 0 0 0 0 3.72
Data.Time.Clock.Internal.AbsoluteTime 4 1 27 2 3 2.93
Data.Time.Clock.Internal.CTimespec 8 1 283 1 4 4.02
Data.Time.Clock.Internal.CTimeval 2 1 90 1 2 3.46
Data.Time.Clock.Internal.DiffTime 3 0 10 0 1 3.74
Data.Time.Clock.Internal.NominalDiffTime 3 0 7 0 1 2.84
Data.Time.Clock.Internal.POSIXTime 1 0 1 0 1 2.29
Data.Time.Clock.Internal.SystemTime 8 1 74 1 3 2.63
Data.Time.Clock.Internal.UTCDiff 2 1 14 3 3 2.41
Data.Time.Clock.Internal.UTCTime 2 1 6 1 2 3.83
Data.Time.Clock.Internal.UniversalTime 1 0 3 0 1 3.74
Data.Time.Clock.POSIX 6 1 62 3 3 2.97
Data.Time.Clock.System 9 1 143 3 3 2.62
Data.Time.Clock.TAI 8 1 167 3 9 2.73
Data.Time.Format 0 0 0 0 0 2.85
Data.Time.Format.Format.Class 24 2 614 3 7 4.04
Data.Time.Format.Format.Instances 3 1 29 2 2 2.85
Data.Time.Format.ISO8601 66 3 2137 6 10 2.81
Data.Time.Format.Internal 0 0 0 0 0 2.90
Data.Time.Format.Locale 11 1 605 1 2 2.63
Data.Time.Format.Parse 14 1 285 2 2 2.92
Data.Time.Format.Parse.Class 25 3 1375 2 6 2.99
Data.Time.Format.Parse.Instances 17 1 506 2 2 2.81
Data.Time.LocalTime 0 0 0 0 0 3.44
Data.Time.LocalTime.Internal.CalendarDiffTime 5 1 29 2 3 2.97
Data.Time.LocalTime.Internal.LocalTime 14 1 122 3 4 4.71
Data.Time.LocalTime.Internal.TimeOfDay 24 1 294 1 3 2.83
Data.Time.LocalTime.Internal.TimeZone 16 2 343 3 3 2.79
Data.Time.LocalTime.Internal.ZonedTime 6 1 46 5 4 2.96
G.10 Package unordered-containers-0.2.11.0