Intensional Datatype Refinement: For Scalable, Flow- and Context-Sensitive Verification of Pattern-Match Safety
EDDIE JONES, University of Bristol
STEVEN RAMSAY, University of Bristol

The pattern-match safety problem is to verify that a given functional program will never crash due to non-exhaustive patterns in its function definitions. We present a refinement type system that can be used to solve this problem. The system extends ML-style type systems with algebraic datatypes by a limited form of structural subtyping and environment-level intersection. We describe a fully automatic, sound and complete type inference procedure for this system which, under reasonable assumptions, is linear in the program size. A prototype implementation for Haskell is able to analyse a selection of packages from the Hackage database in a few hundred milliseconds.

Additional Key Words and Phrases: higher-order program verification, refinement types
The pattern-match safety problem asks: given a program with non-exhaustive (algebraic datatype) patterns in its function definitions, is it possible that the program crashes with a pattern-match exception? Consider the example Haskell code in Figure 1. This code defines the two main ingredients in a typical definition (see e.g. [21]) of conversion from arbitrary propositional formulas to propositional formulas in disjunctive normal form (represented as lists of lists of literals). Using these definitions, the conversion can be described as the composition dnf ≔ nnf2dnf ◦ nnf.

Notice that the definition of nnf2dnf is partial: it is expected only to be used on inputs that are in negation normal form (NNF). Consequently, unless it can be verified that nnf always produces a formula without any occurrence of Imp or Not, any application of dnf to an expression of type Fm a may result in a pattern-match failure exception. In this paper we present a new refinement type system that can be used to perform this verification statically and automatically. Type inference is compositional and incremental, so that it can be integrated with modern development environments: open program expressions can be analysed, and only the parts of the code that are modified need to be reanalysed as changes are made.

Whilst there are other analyses in the literature that can also verify instances of the foregoing example, ours is, as far as we are aware, the only one to offer strong guarantees on predictability, which we believe to be key to the usability of such systems in practice.

• The analysis is characterised by the type system, which is a natural, yet expressive, extension of ML-style type systems with algebraic datatypes. Hence, the programmer can reason about when it will succeed by reasoning about typing.
• The analysis runs in time that is, in the worst case, linear in the size of the program (under reasonable assumptions on the size of types and the nesting of matching).

We elaborate on these in the following.
Sound and terminating program analyses are conservative: there are always programs without bugs that, nevertheless, cannot be verified. Identifying a large fragment for which the analysis is complete, i.e. a class of safe programs for which verification is guaranteed, allows the programmer to reason about the behaviour of the analysis on their code. In particular, when an analysis fails
Authors' addresses: Eddie Jones, Department of Computer Science, University of Bristol; Steven Ramsay, Department of Computer Science, University of Bristol.

```haskell
data L a = Atom a | NegAtom a

data Fm a
  = Lit (L a)
  | Not (Fm a)
  | And (Fm a) (Fm a)
  | Or  (Fm a) (Fm a)
  | Imp (Fm a) (Fm a)

nnf (Lit (Atom x))          = Lit (Atom x)
nnf (Lit (NegAtom x))       = Lit (NegAtom x)
nnf (And p q)               = And (nnf p) (nnf q)
nnf (Or p q)                = Or (nnf p) (nnf q)
nnf (Imp p q)               = Or (nnf (Not p)) (nnf q)
nnf (Not (Not p))           = nnf p
nnf (Not (And p q))         = Or (nnf (Not p)) (nnf (Not q))
nnf (Not (Or p q))          = And (nnf (Not p)) (nnf (Not q))
nnf (Not (Imp p q))         = And (nnf p) (nnf (Not q))
nnf (Not (Lit (Atom x)))    = Lit (NegAtom x)
nnf (Not (Lit (NegAtom x))) = Lit (Atom x)

nnf2dnf (Lit a)   = [[a]]
nnf2dnf (Or p q)  = List.union (nnf2dnf p) (nnf2dnf q)
nnf2dnf (And p q) = distrib (nnf2dnf p) (nnf2dnf q)
  where distrib xss yss =
          List.nub [ List.union xs ys | xs <- xss, ys <- yss ]
```

Fig. 1. Conversion to disjunctive normal form.

to verify a program that the user believes to be safe, it gives them an opportunity to take action, such as programming more defensively, in order to put their program into the fragment and thus be certain of verification success.

However, for this to be most effective, the fragment must be easily understood by the average functional programmer. Our analysis is complete with respect to programs typable in a natural extension of ML-style type systems with algebraic datatypes. Indeed, it is characterised by this system: the force of Theorems 25 and 28 is to say that it forms a sound and complete inference procedure. The system is presented in full in Section 5, but the highlights are as follows:

(i) The datatype environment introduced by the programmer, e.g. L a and Fm a, is completed: every datatype whose definition can be obtained by erasing constructors from one of those given is added to the environment for the purpose of type assignment.
These new datatypes are called intensional refinements. These additional types allow the scrutinee of a match to be typed with a datatype that is more precise than the underlying type provided by the programmer. For example, the datatypes in Figure 2 are among the intensional refinements of Fm a, where data A a = Atom a is an intensional refinement of L a. Of course, the names of the datatypes are irrelevant.

(ii) There is a natural notion of subtyping between intensional refinement datatypes, which is incorporated into the type system through an unrestricted subsumption rule. For example, Clause a and Cube a are both subtypes of the intensional refinement:

```haskell
data NFm a = Lit (L a) | Or (NFm a) (NFm a) | And (NFm a) (NFm a)
```

which is itself a subtype of Fm a. However, Clause a, Cube a and STLC a are all incomparable.

(iii) The typing rule for the case analysis construct, by which pattern matching is represented, enforces that matching is exhaustive with respect to the type of the scrutinee. This ensures

```haskell
data Clause a = Lit (L a) | Or (Clause a) (Clause a)

data STLC a = Lit (A a) | And (STLC a) (STLC a) | Imp (STLC a) (STLC a)

data Cube a = Lit (L a) | And (Cube a) (Cube a)
```

Fig. 2. Some intensional refinements of Fm a.

that the analysis of matching is sound: programs for which the match is not exhaustive will not be typable. Moreover, the rule is flow-sensitive, with the type of the match only depending on the types of the branches corresponding to the type of the scrutinee. For example, the following function can be assigned the type (a → b) → Cube a → Cube b and the type (a → b) → Clause a → Clause b, but not the type (a → b) → STLC a → STLC b.

```haskell
map f (Lit (Atom x))    = Lit (Atom (f x))
map f (Lit (NegAtom x)) = Lit (NegAtom (f x))
map f (And p q)         = And (map f p) (map f q)
map f (Or p q)          = Or (map f p) (map f q)
```

Flow sensitivity is essential for handling typical use cases.
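The comparisons in item (ii) can be pictured concretely by modelling each intensional refinement by the set of constructors it retains at each level of the underlying definition. The following Python fragment is our own illustration of that reading, not part of the formal development; the names are those of Figure 2:

```python
# Each refinement of Fm a is modelled by the Fm-constructors it keeps,
# paired with the L-constructors kept at literal positions.
REFINEMENTS = {
    "Clause": ({"Lit", "Or"},         {"Atom", "NegAtom"}),
    "Cube":   ({"Lit", "And"},        {"Atom", "NegAtom"}),
    "STLC":   ({"Lit", "And", "Imp"}, {"Atom"}),
    "NFm":    ({"Lit", "Or", "And"},  {"Atom", "NegAtom"}),
    "Fm":     ({"Lit", "Not", "And", "Or", "Imp"}, {"Atom", "NegAtom"}),
}

def subtype(t1, t2):
    """t1 is below t2 iff t1 keeps a subset of t2's constructors at
    every level (refinements choose one constructor set per datatype)."""
    (fm1, l1), (fm2, l2) = REFINEMENTS[t1], REFINEMENTS[t2]
    return fm1 <= fm2 and l1 <= l2
```

Under this encoding, `subtype("Clause", "NFm")` and `subtype("Cube", "NFm")` hold, `NFm` sits below `Fm`, and `STLC` is incomparable with the others, exactly as claimed above.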
Often a single large datatype is defined but, locally, certain parts of the program work within a fragment (e.g. only on clauses). Flow sensitivity helps to ensure that transformations on values inside the fragment remain inside the correct datatype refinement — otherwise map could only advertise that it returns formulas of type NFm a. For example, Elm-style web applications typically define a single, global datatype of actions, although the constituent pages may only be prepared to handle certain (overlapping) subsets locally.

(iv) Finally, refinement polymorphism, and hence context-sensitivity, is provided by allowing environments that have more than a single refinement type binding for each free program variable, i.e. an environment-level intersection. For example, suppose trivial : Clause a → Bool checks a clause for complementary literals, isFunTy : STLC a → Bool checks if a formula from the simply typed fragment corresponds to a function type, and rn : String → String performs a renaming of propositional atoms. Then the following expression is well typed:

λx y. trivial (map rn x) || isFunTy (map rn y)

This is because the typing environment contains both of the aforementioned types for map. Note: this is polymorphism in the class of formulas, not only in the type a of their atoms.

To distinguish the typing assigned to the program by the programming language (which we consider part of the input to the analysis) from the types that can be assigned in our extended system, we call the former the underlying typing of the program.

Characterising the power of an analysis with a type system allows the programmer to reason about its behaviour by using typings as a kind of certificate. Returning to the above example, the programmer can be certain that uses of dnf will be verifiably safe because they can synthesize the intensional datatype refinement NFm, and quickly check the typings nnf : Fm a → NFm a and nnf2dnf : NFm a → [[L a]] in their mind. The example is rather contrived, but we may rather imagine such combinations occurring in different parts of the program.
Our analysis takes the form of a type inference procedure for the system described above. Inference proceeds by generating and solving typing constraints. The constraints are inclusions, representing the flow of data, guarded by requirements on the presence of certain datatype constructors, representing sensitivity to the context. For example, three of the constraints arising in the analysis of the first case of nnf above are:
Atom ∈ Y(L a)
Lit ∈ Z(Fm a)
Lit ∈ X(Fm a), Atom ∈ X(L a) ? Y(L a) ⊆ Z(L a)

and concern the refinement type variables X, Y and Z. The former two say that the constructor Atom must be provided by the refinement datatype Y at the level of literals, and that the constructor Lit must be provided by the refinement datatype Z, which will characterise the output of this case, at the level of formulas. The third, which relates to the application Lit (Atom x), says that whenever the refinement datatype X characterising the input to nnf is required to provide the Lit constructor at the level of formulas and the
Atom constructor at the level of literals, then it must be that the constructors provided by the intensional refinement Y are a subset of those provided by Z at the level of literals. By using a version of constrained types (e.g. as described by Odersky, Sulzmann, and Wehr [31]) to describe sets of solutions, inference is compositional.

Since our constraints can be viewed as definite inequalities over a finite semilattice in the sense of Rehof and Mogensen [37], sets of constraints can be solved in time bounded by a linear function of their size but, as is typical for constrained type inference, this may already be exponential in the size of the program. However, we show how to exploit compositionality and the restricted form of intensional refinements to localise this exponential complexity. Consequently, under some reasonable assumptions, our inference procedure is exponential in the size of the underlying types of the program and linear in the number of function definitions.

Inspired by the long and prolific line of work on set constraint based program analysis [2–4, 23], rather than building a model explicitly (as in [37]) we have designed a set of rules for putting constraints into a solved form. We show that our constraint sets in solved form have the following remarkable property, stated formally as Theorem 32.

Suppose C is a set of constraints in solved form over variables V and let I ⊆ V be arbitrary. Let C↾I, called the restriction of C to I, be those constraints in C in which occur only variables from I. Then every solution to C↾I can be extended to a solution of all of C.

In the restriction C↾I, entire constraints are culled, including those that involve a mixture of variables from I and V \ I. Such mixed constraints, intuitively, impose compatibility requirements on the different components of a solution to C.
What is significant about the above property is that it guarantees not only that the part of the constraint set concerned with V \ I is internally consistent but, moreover, that the mixed constraints will be satisfiable no matter which solution to C↾I is chosen. The reason is that, in our solved form, the force of any requirement placed on variables in I due to interaction with variables in V \ I in a mixed constraint is explicitly rephrased in a number of new constraints written purely over variables in I.

We exploit compositionality in order to choose a minimal set of variables I with which to restrict constraint sets. Constraint generation proceeds as a recursive traversal of each function definition, associating a set of constraints C with each term in context Γ ⊢ e : T. In general, C will contain refinement type variables that occur free in Γ and T — we call these variables the interface I — but it will also typically contain refinement type variables that do not; for example, because they are associated with program points internal to e (such variables are often existentially quantified in other presentations). For the purpose of typing, the particular assignment to variables not occurring in the interface is irrelevant: Γθ ⊢ e : Tθ and Γθ′ ⊢ e : Tθ′ are identical whenever θ and θ′ agree on I. Therefore, as a direct consequence of the aforementioned property, it is always sound to put constraints into solved form and restrict to the interface.

Putting the set of constraints associated with e into solved form and restricting to the interface of e ensures that the number of constraints only depends on the size of the types in the interface and the size of their definitions.
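The solved form and restriction operation can be imitated in miniature. The sketch below is our own toy model, not the paper's actual rules: constraints are guarded inclusions (guard ? lhs ⊆ rhs) between refinement variables, saturation closes them under guard-accumulating transitivity, and restriction keeps only constraints written purely over interface variables. After saturation, a requirement that flows through an internal variable survives restriction to the interface:

```python
from itertools import product

def saturate(constraints):
    """Close a set of guarded inclusions (guard ? lhs ⊆ rhs) under
    transitivity, unioning guards: a toy analogue of a solved form."""
    closed = set(constraints)
    changed = True
    while changed:
        changed = False
        for (g1, x, y), (g2, y2, z) in product(tuple(closed), repeat=2):
            derived = (g1 | g2, x, z)
            if y == y2 and derived not in closed:
                closed.add(derived)
                changed = True
    return closed

def restrict(constraints, interface):
    """Keep only constraints (including their guards) written purely
    over interface variables."""
    return {
        (g, x, y) for (g, x, y) in constraints
        if x in interface and y in interface
           and all(v in interface for (_k, v) in g)
    }

# X flows into an internal variable M, which flows (guardedly) into Z:
C = {(frozenset(), "X", "M"),
     (frozenset({("Lit", "X")}), "M", "Z")}
S = saturate(C)
# the induced requirement on X and Z alone survives restriction:
assert (frozenset({("Lit", "X")}), "X", "Z") in restrict(S, {"X", "Z"})
```

Without saturation, restricting C itself to {X, Z} would cull both constraints and lose the requirement entirely; this is the sense in which the solved form makes restriction sound.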
In particular, assuming the depth of nested pattern matching is fixed, the size of the constraint sets that are ever considered by the analysis is independent of the size of the program: we perform a small (but exponential in the size of types occurring in the interface) fixed point computation at every program point, rather than an enormous (exponential in the size of the program) fixed point computation when processing the program's entry point. If we consider the size of the underlying type assignment and the size of the largest function definition fixed, then the analysis scales linearly in the size of the program.

We have implemented our system in Haskell as a GHC plugin and run it on a selection of packages from the Hackage database. The plugin takes a Haskell package to be compiled and runs our type inference algorithm over the whole code to yield a constrained type assignment and a set of type errors. The average time taken to process each module is in the order of milliseconds, and the results show a very stark contrast between the number of refinement variables associated with the program points in the module (often > 10000) and the number of refinement variables in the interfaces (typically < 20).

The rest of the paper is structured as follows. In Section 2 we describe a Haskell-like functional programming language which forms the setting for our work. This is followed in Section 3 by our definitions of refinement, and in Sections 4 and 5 by the definition of the type system that characterises the analysis. In Sections 6, 7 and 8 we present our analysis as a type inference algorithm, generating and solving constraints. We discuss the restriction operation and its complexity in Section 9 and we report on our implementation in Section 10. Finally, we conclude and discuss related work in Section 11.
Preliminaries. Given sets X and Y, let us write X → Y for the set of all functions from X to Y and Map X Y for the set of all finite maps between X and Y. As usual, function arrows are assumed to associate to the right. Additionally, we define the indexing of function arguments, that is (X_1 → · · · → X_m → Y)[i] = X_i for all i ∈ [1..m]. Given a family of sets Y_x indexed by x ∈ X, let us write Π x ∈ X. Y_x for the subset of X → ⋃_{x ∈ X} Y_x that contains only functions that are guaranteed to map each x ∈ X to some element of Y_x, and let us write Σ x ∈ X. Y_x for the subset of X × ⋃_{x ∈ X} Y_x in which the second component of each pair (x, y) is guaranteed to belong to Y_x. Given a family of sets Y_x indexed by x ∈ X, let us write ∐_{x ∈ X} Y_x for their disjoint sum and inj_x for each of the canonical injections.

Types.
We assume a countable collection A of type variables, ranged over by α; a finite collection B of base types, ranged over by b; and a countable collection D of algebraic datatype identifiers, ranged over by d. These can be thought of as the names of first-order type constructors. Each datatype identifier has a fixed arity, and only forms a proper type when supplied with the appropriate number of type arguments. We refer to a datatype identifier applied to its arguments as a datatype and, when it is clear from the context, we will also write these as d.

(Monotypes) T, U, V ::= α | b | d T_1 · · · T_n | T → T
(Type schemes) S ::= T | ∀α. S

We write Dt D to stand for the set of all datatypes with datatype identifiers drawn from the set D. Ty D and Sch D are defined similarly for monotypes and schemes. We consider monotypes to be a trivial instance of type schemes where convenient. The purpose of distinguishing base types from datatypes is that the latter may have non-trivial refinements whereas the former may not. For example, we will consider Int to be a base type, Fm a datatype identifier, and Fm Int a datatype. Type schemes are identified up to renaming of bound variables.
Lifting over types. Given a relation on datatypes R ⊆ Dt D × Dt D, we write Ty(R) for the relation on Ty D × Ty D defined inductively by the following rules:

Ty(R)(α, α)
Ty(R)(b, b)
Ty(R)(d_1 T̄, d_2 Ū) whenever R(d_1 T̄, d_2 Ū)
Ty(R)(T_1 → T_2, T′_1 → T′_2) whenever Ty(R)(T_1, T′_1) and Ty(R)(T_2, T′_2)

Expressions and modules.
We assume a countable collection of term variables, ranged over by x, y, z, and variations, as well as a countable collection K of datatype constructors, ranged over by k. The arity of a constructor is denoted Arity(k). The expressions and modules of the language are:

m ::= ϵ | m · ⟨x = e⟩
e ::= c | k | e e | e T | λx:T. e | Λα. e | case e of { k_1 x̄_1 ↦ e_1 | · · · | k_m x̄_m ↦ e_m }

Expressions are identified up to renaming of bound variables and we will adopt the Barendregt variable convention in order to retain a simple notation. Since we are defining a refinement type system, we will assume that the input program already has a typing assigned by the underlying type system of the programming language. We assume that this is manifest, in part, by the insertion of appropriate type abstraction Λα. e and type application e T terms, and by the annotation of term abstractions with their argument type λx:T. e. We also assume, as is the case for GHC Core, that pattern matching has been preprocessed into case expressions in which patterns are one level deep, i.e. have the form k x_1 · · · x_n for some constructor k. Modules are simply a sequence of variable definitions ⟨x = e⟩ that may be empty (ϵ). For simplicity of presentation, we allow recursive definitions but not mutually recursive definition sets.

Datatype environments.
The meaning of datatypes is defined by an environment of datatype definitions. Each datatype definition introduces a new datatype identifier along with a collection of datatype constructors that can be used to build instances of the type.
Definition 1 (Datatype Environment). A datatype environment is a pair consisting of a set D ⊆ D of datatype identifiers and a function ∆ : D → Map K (Sch D) mapping each datatype identifier d to a finite map which records the associated constructors and their types. We assume these types only concern the type variables that appear in the datatype's definition, and so each has the shape ∀ᾱ. U_1 → · · · → U_m → d ᾱ, where m is the constructor's arity. For convenience, and as the return type of a constructor is predetermined, we will often identify ∆(d)(k) with just the sequence of types corresponding to the constructor's arguments, i.e. [U_1, .., U_m] where U_i = ∆(d)(k)[i].

Since datatype environments are partial functions on D, they inherit the natural partial order in which ∆_1 ⊆ ∆_2 just if the graph of the former is included in the graph of the latter. If ∆_1 ⊆ ∆_2, we say that ∆_1 is a subenvironment of ∆_2.

Note that the notion of subenvironment only concerns the datatypes that are defined in an environment and not the definitions of those datatypes (the constructors and their types), which will be treated by the notion of refinement in the sequel.
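Concretely, a datatype environment is just a nested finite map, and the subenvironment order is inclusion of graphs. The following is our own Python rendering of Definition 1, using the datatypes of Figure 1:

```python
# A datatype environment maps each defined identifier to a finite map
# from constructors to their argument types (return types being
# predetermined, they are omitted, as in Definition 1).
DELTA = {
    "L":  {"Atom": ["a"], "NegAtom": ["a"]},
    "Fm": {"Lit": ["L a"], "Not": ["Fm a"], "And": ["Fm a", "Fm a"],
           "Or": ["Fm a", "Fm a"], "Imp": ["Fm a", "Fm a"]},
}

def subenvironment(e1, e2):
    """e1 ⊆ e2: the graph of e1 is included in the graph of e2. Note
    this compares only WHICH datatypes are defined; where both are
    defined, the definitions must coincide exactly."""
    return all(d in e2 and e2[d] == defn for d, defn in e1.items())

assert subenvironment({"L": DELTA["L"]}, DELTA)
assert not subenvironment(DELTA, {"L": DELTA["L"]})
```

Dropping a constructor from a definition does not give a subenvironment in this sense; that relationship is the refinement order introduced below.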
Henceforth we will fix a particular datatype environment ∆ : D → Map K (Sch D), which we call the underlying datatype environment. We think of ∆ as the datatype environment provided by the programmer.

Example 2. We will use the following as a running example of an underlying datatype environment. Consider the datatype Lam of λ-terms with arithmetic using a locally nameless representation:

```haskell
data Arith = Lit Int | Add | Mul
data Lam   = Cst Arith | BVr Int | FVr String | Abs Lam | App Lam Lam
```

These datatypes are slightly artificial, but they allow us to illustrate several features of the definitions in one example. For simplicity, we will consider Int and String to be base types (which will, therefore, not be refined).

The underlying datatype environment contains definitions for all the datatypes declared by the programmer. Some datatype definitions require the definitions of other datatypes to be understood properly. For example, to understand Lam, one must also understand the definition of Arith, since one is defined in terms of the other. There is a notion of a subenvironment that contains all and only those definitions that are needed to understand one particular datatype.
Definition 3 (Slice). Suppose ∆ : D → Map K (Sch D). One can always construct a subenvironment of ∆ by starting from a given datatype d ∈ D and closing under transitive dependencies. Define the slice of d through ∆, written ⟨d⟩∆, as the least subenvironment of ∆ containing d.

For example, in the environment ∆ described in Example 2, ⟨Lam⟩∆ is the whole environment, and ⟨Arith⟩∆ is just the definition of Arith itself.
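The slice of Definition 3 is just a reachability computation over the dependency graph of datatype definitions. A sketch of our own, reusing a dict encoding of environments; the `deps` map, listing the datatype identifiers occurring in each definition, is an assumption of this illustration:

```python
def slice_env(delta, deps, root):
    """Least subenvironment of `delta` containing `root`, closed under
    transitive dependencies (Definition 3)."""
    reached, frontier = set(), [root]
    while frontier:
        d = frontier.pop()
        if d not in reached:
            reached.add(d)
            frontier.extend(deps[d])
    return {d: delta[d] for d in reached}

# The running example: Lam depends on Arith (and itself); Int and
# String are base types, so they do not appear in the environment.
DELTA = {"Arith": {"Lit": ["Int"], "Add": [], "Mul": []},
         "Lam":   {"Cst": ["Arith"], "BVr": ["Int"], "FVr": ["String"],
                   "Abs": ["Lam"], "App": ["Lam", "Lam"]}}
DEPS = {"Arith": [], "Lam": ["Arith", "Lam"]}

assert slice_env(DELTA, DEPS, "Lam") == DELTA
assert slice_env(DELTA, DEPS, "Arith") == {"Arith": DELTA["Arith"]}
```

The two assertions reproduce the observation above: ⟨Lam⟩∆ is the whole environment, while ⟨Arith⟩∆ contains only Arith.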
Definition 4 (Refinement). We say that a datatype environment ∆′ : D → Map K (Sch D) is a refinement of a datatype environment ∆ : D → Map K (Sch D) just if the definitions of the datatypes in ∆′ are bounded above by the definitions in ∆, that is: for all d ∈ D, ∆′(d) ⊆ ∆(d) (i.e. the graph of the first map is included in the graph of the second).

Suppose ∆(d)(k) is some type scheme S and ∆′ is a refinement of ∆; then ∆′(d)(k), when defined, is the same S. So, the refinements of ∆ : D → Map K (Sch D) are in one-to-one correspondence with choices of constructors for each of the constituent datatypes. More precisely, every function f : Π d ∈ D. P(dom ∆(d)) determines a refinement ∆_f satisfying:

∆_f(d)(k) = ∆(d)(k)   if k ∈ f(d)
∆_f(d)(k) = ⊥         otherwise

and each refinement arises in this way.

Example 5.
The following refinement of the underlying environment from Example 2 describes a type of closed, applicative terms over linear arithmetic.

```haskell
data Arith = Lit Int | Add
data Lam   = Cst Arith | App Lam Lam
```

This refinement is determined by the choice f_LL satisfying:

f_LL(Arith) = {Lit, Add}        f_LL(Lam) = {Cst, App}

For the purpose of assigning types to the program, we construct a new datatype environment consisting of all possible refinements of the underlying environment supplied by the programmer.

Definition 6 (The Intensional Refinement Environment).
Given a family of datatype environments ∆_i : D_i → Map K (Sch D_i), indexed by i ∈ I, we define their coproduct as the environment

∐_{i ∈ I} ∆_i : (∐_{i ∈ I} D_i) → Map K (Sch (∐_{i ∈ I} D_i))

whose domain is simply a disjoint sum of sets (as defined in the preliminaries). The coproduct comes equipped with canonical injections inj_i satisfying (∐_{i ∈ I} ∆_i)(inj_i d)(k) = inj_i(∆_i(d)(k)) wherever the latter is properly defined.

The intensional refinement environment, written ∆∗, is the coproduct of all the refinements of ∆:

∆∗ ≔ ∐_{f ∈ I} ∆_f : D∗ → Map K (Sch D∗)

where I is the set Π d ∈ D. P(dom ∆(d)) of functions from underlying datatypes d to appropriate subsets of their constructors, that is, the set of all possible refinements.

Note that, formally, the datatype identifiers whose definitions are given in the intensional refinement environment are of shape inj_{∆_f} d with d ∈ D. This is convenient because one can read such a name as "the refinement of d whose definition is ∆_f". However, we will continue to use more friendly names (and Haskell notation) in our examples.

Example 7.
The type of closed, applicative terms over linear arithmetic from Example 5 can be found in ∆∗ alongside the type LArith of linear arithmetic constants:

LATm ≔ inj_{f_LL} Lam        LArith ≔ inj_{f_LA} Arith

the latter being defined by f_LA(Arith) = {Lit, Add} and f_LA(Lam) = ∅. This datatype could equivalently be defined as inj_{f_LL} Arith (or in many other ways). Other refinement datatypes include the type LLam of closed λ-terms over linear arithmetic, the type ATm' of applicative terms and the type ATm of closed applicative terms, whose definitions we give in the more convenient notation:

```haskell
data ATm  = Cst Arith | App ATm ATm
data ATm' = FVr String | Cst Arith | App ATm' ATm'
data LLam = Cst LArith | BVr Int | Abs LLam | App LLam LLam
```

Definition 8 (Refinement Type).
A type (scheme) S ∈ Sch D∗ is said to be a refinement type. Refinement types come equipped with an underlying type, written U(S), which is a type (scheme) in Sch D defined recursively as follows:

U(α) = α
U(b) = b
U(inj_{∆_f} d T_1 · · · T_n) = d U(T_1) · · · U(T_n)
U(T_1 → T_2) = U(T_1) → U(T_2)
U(∀α. S) = ∀α. U(S)

(It would suffice to take all the refinements of all the slices, which itself still includes some redundancy, but this would complicate the definitions for no practical gain.)

(SShape) ⊢ T_1 ⋢ T_2 whenever U(T_1) ≠ U(T_2)

(SMis) ⊢ d_1 T̄ ⋢ d_2 Ū whenever dom(∆∗(d_1)) ⊈ dom(∆∗(d_2))

(SSim) ⊢ d_1 T̄ ⋢ d_2 Ū whenever, for some k with m = Arity(k) and some i ∈ [1..m],
  ∆∗(d_1)(k) = ∀ᾱ. U_1 → · · · → U_m → d_1 ᾱ,
  ∆∗(d_2)(k) = ∀β̄. U′_1 → · · · → U′_m → d_2 β̄,
  and ⊢ U_i[T̄/ᾱ] ⋢ U′_i[Ū/β̄]

(SArrL) ⊢ T_1 → T_2 ⋢ T′_1 → T′_2 whenever ⊢ T′_1 ⋢ T_1

(SArrR) ⊢ T_1 → T_2 ⋢ T′_1 → T′_2 whenever ⊢ T_2 ⋢ T′_2

Fig. 3. The complement of the subtyping relation on monotypes.
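The erasure map U of Definition 8 simply forgets which refinement was chosen at each datatype node. A sketch with a small tagged representation of types; the tuple encoding is our own, not the paper's:

```python
# Types: ("var", a) | ("base", b) | ("data", choice_f, d, args)
#      | ("arr", t1, t2) | ("forall", a, s)
def underlying(t):
    """Erase refinement choices, giving the underlying type (Definition 8)."""
    tag = t[0]
    if tag in ("var", "base"):
        return t
    if tag == "data":
        _f, d, args = t[1], t[2], t[3]
        return ("data", None, d, tuple(underlying(a) for a in args))
    if tag == "arr":
        return ("arr", underlying(t[1]), underlying(t[2]))
    if tag == "forall":
        return ("forall", t[1], underlying(t[2]))
    raise ValueError(f"unknown type tag: {tag}")
```

For instance, erasing LATm → String (with LATm encoded as the choice f_LL applied to Lam) yields Lam → String, the underlying type on which rule (SShape) operates.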
In the following, we will assume that we are given a program equipped with a complete underlying typing, that is: every subterm e has an associated type S ∈ Sch D. Our task will be to find a new refinement typing: an assignment to each subterm e of a refinement type S′ ∈ Sch D∗ that has the same "shape" as the underlying type S of e, in the sense that U(S′) = S.

Refinement induces a natural ordering on refinement datatypes according to which constructors are available in their definition. This ordering can then be lifted to all types built over those datatypes in the obvious way.
Definition 9 (Subtyping). The judgement ⊢ T_1 ⊑ T_2 is defined coinductively, using the system of rules in Figure 3. We extend subtyping to two schemes that have the same quantifier prefix, writing ⊢ ∀ᾱ. T_1 ⊑ ∀ᾱ. T_2 whenever ⊢ T_1 ⊑ T_2. We say that two types T_1 and T_2 are subtype equivalent, and write ⊢ T_1 ≡ T_2, just if ⊢ T_1 ⊑ T_2 and ⊢ T_2 ⊑ T_1.

It is straightforward to show the following by coinduction:

Lemma 1. Subtyping is a preorder.

Intuitively, refinement specifies the possible shapes of types, which are then interrelated by subtyping. Refinement is a covariant treatment of arrow types, since we have U(T_1 → T_2) = T′_1 → T′_2 iff U(T_1) = T′_1 and U(T_2) = T′_2. On the other hand, as can be seen from the definition, subtyping interprets the argument type contravariantly. Consequently, there are some T with U(T) = T′ for which T ⊑ T′ and others with U(T) = T′ for which T′ ⊑ T (viewing an underlying type as its own trivial refinement).

We give the definition coinductively because, as usual, there is a notion of simulation that arises naturally from our coalgebraic view of datatype environments. Consequently, it is most straightforward to think of the defining rules as providing a system in which to construct finite refutations of subtype inequalities T_1 ⊑ T_2, which will, ultimately, fail to hold either because the types T_1 and T_2 have a different shape, or because T_1 provides some constructor that T_2 does not.

Example 10.
Following the running example, the judgement ⊢ LATm → String ⋢ ATm → String follows by a simple refutation:

(SMis)  ⊢ Arith ⋢ LArith
(SSim)  ⊢ ATm ⋢ LATm
(SArrL) ⊢ LATm → String ⋢ ATm → String
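The refutation-based reading suggests a simple decision procedure: search for a finite refutation (rules SMis and SSim), treating pairs currently under consideration as provisionally related, which is the coinductive reading of the rules. The following Python sketch is our own first-order model of the running example; constructor arguments are just type names here, so arrow types are omitted:

```python
ENV = {
    "Arith":  {"Lit": ["Int"], "Add": [], "Mul": []},
    "LArith": {"Lit": ["Int"], "Add": []},
    "ATm":    {"Cst": ["Arith"], "App": ["ATm", "ATm"]},
    "LATm":   {"Cst": ["LArith"], "App": ["LATm", "LATm"]},
    "Int": None, "String": None,  # base types, never refined
}

def subtype(d1, d2, assumed=frozenset()):
    """d1 ⊑ d2 unless a finite refutation exists."""
    if ENV[d1] is None or ENV[d2] is None:
        return d1 == d2                   # base types only match themselves
    if (d1, d2) in assumed:
        return True                       # coinduction: no refutation so far
    assumed = assumed | {(d1, d2)}
    for k, args in ENV[d1].items():
        if k not in ENV[d2]:
            return False                  # SMis: constructor missing from d2
        if not all(subtype(u1, u2, assumed)
                   for u1, u2 in zip(args, ENV[d2][k])):
            return False                  # SSim: refutation at an argument
    return True
```

Under this model, `subtype("LATm", "ATm")` holds, foreshadowing the witness relation of Lemma 2 below, while `subtype("ATm", "LATm")` fails because Arith provides Mul, which LArith does not.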
Conversely, we can use the coinduction principle to show that ⊢ T_1 ⊑ T_2, in which case we require a model of ⊑ that contains (T_1, T_2).

Example 11.
For ⊢ ATm → String ⊑ LATm → String we provide the following witness:

{ (ATm → String, LATm → String), (String, String), (LATm, ATm), (LArith, Arith), (Int, Int) }

It can easily be verified that this set is a model of the defining rules for ⊑ and hence, by coinduction, is contained within it.

However, such models can be a bit unwieldy in general as the types involved get more complex. We can do better by observing that the definition can be approximated by a coinductive part, concerning datatypes, and an inductive part, by which a subtyping relationship between datatypes is lifted to all types. Consequently, we need only find a model of the coinductive part, which is much neater since it only concerns Dt D∗ × Dt D∗. The following can be shown by a straightforward coinduction.

Lemma 2 (Simulation).
Let R ⊆ Dt D∗ × Dt D∗ and suppose that, for all (d_1 T̄, d_2 Ū) ∈ R, and for all k such that ∆∗(d_1)(k) is defined:

• ∆∗(d_2)(k) is defined.
• And, moreover, Ty(R)(U_i, U′_i) for each i ∈ [1..Arity(k)], where Ū and Ū′ are the argument types of ∆∗(d_1)(k) and ∆∗(d_2)(k) instantiated at T̄ and Ū respectively.

Then it follows that Ty(R) is included in the subtype relation.

Using this result, it suffices to exhibit R ≔ {(LATm, ATm), (LArith, Arith)} in order to conclude ⊢ ATm → String ⊑ LATm → String of Example 11. Intuitively, this witness determines the model Ty(R), which contains the model of Example 11. (By a model we mean a set of pairs of types satisfying all of the ⊑-defining rules.)

In this section, we present a refinement type system whose purpose is to exclude the possibility of pattern-match failure. To achieve this, the typing rule for pattern matching requires that cases are exhaustive according to the type of the scrutinised expression. However, the system allows for all refinement datatypes and incorporates the above notion of subtyping, which allows for the scrutinised expression to be typed much more precisely than is possible in the underlying type system.

For the purpose of defining the refinement type system, we make some standard Hindley-Damas-Milner assumptions about the underlying type system, namely that type application happens immediately after introducing a variable of polymorphic type and type abstraction happens only at the point of definition. As a minor simplification, we assume that constants are monomorphic and write C(c) for the monotype assigned to c axiomatically. Since this is a refinement type system,

(TModE) Γ ⊢ ϵ : Γ

(TModD) Γ ⊢ m : Γ′    Γ′ ∪ {x : T} ⊢ e : T′    ⊢ T′ ⊑ T
        ⟹ Γ ⊢ m · ⟨x = Λᾱ. e⟩ : Γ′ ∪ {x : ∀ᾱ. T}

Fig. 4. Typing for modules.
we assume that all expressions have already been assigned an underlying type, which we will typ-ically write with an underline to aide readability. We will only need to consult these underlyingtypes when they appear in the syntax, e.g. abstraction and type application.Additionally, we relax the normal definition of a type environment from a function to a relation.Program variables may, therefore, have many types as long as they refine the same underlyingtype. This assumption is equivalent to allowing environment-level intersection types.
Definition 12 (Type assignment). A type environment, typically Γ or ∆, is a finite relation between program variables x and type schemes S, whose elements are typically written x : S. We require that x : S1 ∈ Γ and x : S2 ∈ Γ implies U(S1) = U(S2). This ensures that U(Γ) can be defined in the obvious way.

The type assignment system is divided into two sets of rules, for expressions (Figure 5) and for modules (Figure 4), defining judgements, respectively:

  Γ ⊢ e : T        Γ ⊢ m : ∆

in which U(Γ) ⊢ e : U(T) and U(Γ) ⊢ m : U(∆) are the underlying typings, provided by the programming language, for the expression e and the module m respectively.

The system is conceptually similar to an underlying ML-style system, but the following should be noted.
• Any suitable refinement datatype d can be used in order to type a datatype constructor or the scrutinee of a case statement.
• The notion of subtyping from the previous section is incorporated through a subsumption rule (recall that ⊢ T1 ⊑ T2 implies that T1 and T2 have the same shape according to U).
• The pattern-matching rule is restricted by a side condition requiring that the cases are exhaustive.
• The branches of a case expression only need to be typed if the branch is reachable, incorporating flow-sensitivity. This relaxation only makes sense for a refinement type system and, from an operational point of view, it makes no difference to the set of computations expressible.
• Finally, everywhere a particular underlying type is required by the syntax, an arbitrary choice of refinement type of the appropriate shape can be made in its place.

As discussed in the introduction, allowing several types for each term ensures they can be used in different contexts. This approach is more lightweight than an intersection type system, and arguably easier for programmers to reason about if types are to be considered as certificates.
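Concretely, treating the environment as a relation means a lookup may return several types for the same variable. A minimal sketch of this reading of Definition 12, with entirely hypothetical names and a simplified type grammar:

```haskell
-- A type environment as a finite relation: the same variable may be
-- recorded with several refinement types of the same underlying shape.
data Ty = TyData String | TyArr Ty Ty
  deriving (Eq, Show)

type TyEnv = [(String, Ty)]

-- All of the types recorded for a variable: an environment-level
-- intersection reading of the relation.
typesOf :: TyEnv -> String -> [Ty]
typesOf env x = [ t | (y, t) <- env, y == x ]
```

So an environment relating f to both ATm → ATm and Lam → Lam makes both typings available, each usable at a different occurrence of f.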
When it comes to algorithmic inference, however, the non-deterministic aspect would be problematic. Instead, in Section 7, we rely on refinement polymorphism to summarise every typing of a variable in some environment compactly by a single constrained type scheme. Polymorphism of this kind is no different from that of the Hindley-Milner system, which could equally be viewed as an infinite intersection type system, or indeed as allowing several typings of the same variable in an environment. Likewise, it is simpler to define polymorphic constructors and datatypes than to consider each instantiation separately.

Fig. 5. Type assignment for expressions: the rules (TVar), (TCst), (TCon), (TSub), (TAbs), (TApp) and (TCase).
Example 13. Recall the refinements of Example 7 and consider the function cloSub, with underlying type List (String × Lam) → Lam → Lam, whose purpose is to close an applicative term by substituting closed terms everywhere:

  cloSub m t =
    case t of
      FVr s   → lkup m s
      Cst c   → Cst c
      App u v → App (cloSub m u) (cloSub m v)
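The function can be run as written, given concrete definitions. In the self-contained sketch below, the datatype Lam and the helper lkup are hypothetical stand-ins for the paper's definitions (in particular, Cst carries an Int here rather than an Arith expression):

```haskell
-- Hypothetical concrete rendering of the applicative-term datatype.
data Lam = FVr String | Cst Int | App Lam Lam | Abs String Lam
  deriving (Eq, Show)

-- A total-by-convention lookup, failing loudly on unbound variables.
lkup :: [(String, a)] -> String -> a
lkup m s = case lookup s m of
  Just v  -> v
  Nothing -> error ("unbound variable: " ++ s)

-- Apply a closing substitution throughout an applicative term.
cloSub :: [(String, Lam)] -> Lam -> Lam
cloSub m t = case t of
  FVr s   -> lkup m s
  Cst c   -> Cst c
  App u v -> App (cloSub m u) (cloSub m v)
```

Note that cloSub is partial on terms containing Abs, which is exactly the partiality the refinement ATm of Lam is used to discharge.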
To keep the example simple, we assume that the lookup function lkup has the type ∀α. List (String × α) → String → α in the environment. Then the function cloSub can be assigned the refinement type:

  List (String × ATm) → ATm → ATm

thus expressing the fact that the application of a closing substitution to an arbitrary applicative term yields a closed applicative term. (Here, List and × can be understood as the trivial refinements of their namesakes, i.e. with all constructors available.) This is possible due to a combination of the features of the system. First, observe that it is possible, in the abstraction rule, to assume that the bound variable m of underlying type List (String × Lam) has type List (String × ATm) in (TAbs), since the latter can easily be seen to be a refinement of the former. Then it follows that lkup m s can be assigned the type ATm by choosing ATm for the instantiation in (TVar). Second, under the assumption that the bound variable t has refinement type ATm, it follows from (TCase) that the variable c that is bound by the case Cst c can be assigned the type Arith. Note that the rule (TCase) is applicable only because we have chosen the refinement ATm of Lam, which guarantees that the input will not contain any abstractions. Then, in the body of the case, we can choose instead the more specific typing Cst : Arith → ATm. Similarly, u and v are assigned the type ATm, so that the subexpressions cloSub m u and cloSub m v in the body of the final case can be assigned the type ATm. Then the type of the body as a whole, and therefore of the entire case analysis, is also ATm.

The central problem is typability, for closed expressions: given an underlying datatype environment ∆ and a closed module m which is typed in ∆, does there exist a refinement type assignment to the functions of m? Typically m will contain library functions whose source is not available to the system, but for which an underlying type is known. To incorporate such functions we interpret an underlying type environment Γ as containing trivial refinement types for each such function, i.e. each d occurring in such a type in Γ denotes the refinement of d that makes available all constructors.

Definition 14 (Typability).
A triple ∆, Γ and m constitutes a positive instance of the refinement typability problem just if there is a refinement type environment Γ′ such that Γ ⊢ m : Γ′. In such a case, we say that ∆; Γ ⊢ m is refinement typable.

The rest of the paper concerns the algorithmic solution of the typability problem.

We assume a countable set of refinement variables, ranged over by X, Y, Z and so on. The purpose of a refinement variable X is to represent a function in Π d ∈ D. P(dom ∆(d)). As described in Section 3, such functions are in 1-1 correspondence with refinements of ∆. We will abuse notation and write X for both uses (thus the following rather strange-looking equation X(d) = dom(X(d)) holds, by interpreting each of the two occurrences of X according to its context).

Definition 15 (Constraints). A constructor set expression, typically S, is either a finite set of constructors {k1, ..., km} or a pair X(d) consisting of a refinement variable X and an underlying datatype d. The underlying type of a constructor set expression is (partially) defined as follows:

  U(X(d)) = d        U({k1, ..., km}) = d  if ∀i ∈ [1..m]. ki ∈ dom ∆(d)

We consider only those constructor set expressions for which the underlying type is defined. We write
FRV(S) for the set of refinement variables occurring anywhere in S (which will either be empty or a singleton).

An inclusion constraint is an ordered pair of constructor set expressions, written (suggestively) as S1 ⊆ S2. When S1 is a singleton {k}, we will rather write the pair as k ∈ S2. We shall only consider inclusion constraints in which both set expressions have the same underlying type. The refinement variables FRV(S1 ⊆ S2) of an inclusion constraint are defined by extension from FRV(S1) and FRV(S2) in the obvious way.

A conditional constraint, hereafter just constraint, is a pair ϕ ? S1 ⊆ S2 consisting of a set of inclusion constraints ϕ and an inclusion constraint S1 ⊆ S2. The set ϕ is called the guard and the inclusion S1 ⊆ S2 the body. We will only consider conditional constraints in which each element of the guard has shape k ∈ X(d). When the guard of a constraint ∅ ? S1 ⊆ S2 is trivial, we shall usually omit it and write only the body S1 ⊆ S2. The set of refinement variables FRV(ϕ ? S1 ⊆ S2) of a constraint is defined as usual.

Sometimes we shall guard a constraint set C, and write ϕ ? C for the set {ψ ∪ ϕ ? S1 ⊆ S2 | ψ ? S1 ⊆ S2 an element of C}. We write FRV(C) for the set of refinement variables occurring in C.

Intuitively, an inclusion S1 ⊆ S2 is satisfied by any assignment to the refinement variables that makes S1 included in S2. A constraint ϕ ? S1 ⊆ S2 is satisfied if either some inclusion in the guard is not satisfied or the body is satisfied.

Fig. 6. Inference for subtype inequalities: the rules (ISBase), (ISTyVar), (ISArr) and (ISData).
Definition 16 (Satisfaction). A constructor set assignment, hereafter just assignment, is a total map θ taking each refinement variable X to a constructor choice function in Π d ∈ D. P(dom ∆(d)). The meaning of a constructor set expression S under an assignment θ is a set of constructors θ⟦S⟧ defined as follows:

  θ⟦X(d)⟧ = θ(X)(d)        θ⟦{k1, ..., km}⟧ = {k1, ..., km}

An inclusion constraint S1 ⊆ S2 is satisfied by an assignment θ, written θ ⊨ S1 ⊆ S2, just if θ⟦S1⟧ is included in θ⟦S2⟧. A constraint ϕ ? S1 ⊆ S2 is satisfied by an assignment θ, written θ ⊨ ϕ ? S1 ⊆ S2, just if, whenever θ ⊨ k ∈ X(d) for every inclusion constraint k ∈ X(d) in ϕ, then θ ⊨ S1 ⊆ S2.

Definition 17 (Solutions). A solution to a constraint set C is an assignment θ satisfying every constraint in C; we write θ ⊨ C. We say that C is solvable, or satisfiable, just if it has a solution.

Remark 1.
The full set constraint language is exactly the monadic class of first-order proposi-tions [8]. By applying the translation of that paper, it can be shown that guarded constraints of theform laid out above are (monadic) Horn clauses with constructors simply interpreted as constants.
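The constraint syntax of Definition 15 and the satisfaction relation of Definition 16 can be modelled very directly. The following sketch uses an entirely hypothetical concrete representation (strings for variables, datatypes and constructors):

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type RVar = String
type Dt   = String
type Ctor = String

-- A constructor set expression: X(d) or { k1, ..., km }.
data SetExpr = RV RVar Dt | Lits [Ctor]
  deriving (Eq, Show)

-- A guard atom k ∈ X(d).
type Atom = (Ctor, RVar, Dt)

-- An assignment gives each refinement variable, at each datatype,
-- a chosen set of constructors (Definition 16).
type Assignment = M.Map (RVar, Dt) (S.Set Ctor)

evalS :: Assignment -> SetExpr -> S.Set Ctor
evalS th (RV x d)  = M.findWithDefault S.empty (x, d) th
evalS _  (Lits ks) = S.fromList ks

-- θ ⊨ ϕ ? S1 ⊆ S2: the body must hold whenever every guard atom holds.
satisfies :: Assignment -> [Atom] -> SetExpr -> SetExpr -> Bool
satisfies th phi s1 s2 =
  any (\(k, x, d) -> not (S.member k (evalS th (RV x d)))) phi
    || evalS th s1 `S.isSubsetOf` evalS th s2
```

Note how a failed guard satisfies the constraint vacuously, matching the "either some inclusion in the guard is not satisfied or the body is satisfied" reading above.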
Since our system is effectively syntax-directed (the subsumption rule can be factored into the other syntax-directed rules), type inference follows a standard pattern of constraint generation and satisfiability checking (see e.g. [31]). The constraints are subtype inequalities over refinement variables but, in our restricted setting, it is easily seen that such inequalities are equivalent to conditional inclusion constraints between refinement variables and sets of datatype constructors. To enable this approach, we extend the language of types so as to allow datatypes parametrised by refinement variables.
Definition 18 (Extended Types).
The extended types are monotypes extended with datatypes built over refinement variables:

  T, U, V ::= · · · | inj X d T

Note that the type arguments to an injected datatype identifier are also extended. Expressions of the form (inj X List)(inj Z Int) are, therefore, well-formed. Recall from Section 3 that refinement datatype identifiers are of the form inj ∆ d, with d an underlying datatype identifier, and should be thought of as specifying the refinement of d whose datatype definition is given by ∆. The task of inference is to determine constraints on these ∆ that enable a typing to be assigned, and to check that the constraints have a solution.

For convenience, we shall implicitly lift injections to any type, or sequence of types, written inj X T, so that the injection is distributed over the datatypes in T. In the context of extended types, we will associate a substitution action θT with each constructor set assignment θ by lifting the definition

  θ(inj X d) ≔ inj θ(X) d

homomorphically over all extended types. Finally, we write FRV(T) for the set of refinement variables occurring in injections in T.

We also adopt an extension of type schemes that are constrained:

Definition 19 (Constrained Type Scheme).
We subsume type schemes S by constrained type schemes, which have the shape ∀α. ∀X. C ⊃ T, where C is a constraint set and T is an extended type. We define FRV(∀α. ∀X. C ⊃ T) = (FRV(C) ∪ FRV(T)) \ X. A constrained type environment is a finite mapping from program variables to constrained type schemes, whose elements are written x : S. We define FRV(Γ) in the obvious way.

As is typical, there is generally no "best" monotype solution to a set of inclusion constraints, so constrained type schemes give us an internal representation for the set of all types assignable to a module-level function. For example, assuming the constant combinator K is defined as usual, it can be seen that the module-level recursive function f = λx : Lam. K[Lam, Lam] x (f (f x)) can be assigned the constrained type scheme ∀X Y. C ⊃ inj X Lam → inj Y Lam, with C given by:

  X(Lam) ⊆ Y(Lam)
  Y(Lam) ⊆ X(Lam)
  Cst ∈ X(Lam) ? X(Arith) ⊆ Y(Arith)
  Cst ∈ Y(Lam) ? Y(Arith) ⊆ X(Arith)

Intuitively, its input flows to its output and conversely, so we require inj X Lam ⊑ inj Y Lam and inj Y Lam ⊑ inj X Lam (which is encoded by the above set constraints when we view the refinements X and Y as functions specifying the choice of constructors). However, there is obviously no "best" instantiation of the refinement variables X and Y.

Constrained type environments can be understood as compact descriptions of "ordinary" type environments (in the sense of Definition 12), which is made precise as follows.

Definition 20.
Define ⦅Γ⦆ to be the type environment obtained from the closed constrained type schemes in Γ by instantiating the refinement quantifiers with every possible solution; that is, supposing Γ is closed:

  ⦅Γ⦆ ≔ {x : ∀α. θT | θ ⊨ C ∧ (x : ∀α. ∀X. C ⊃ T) ∈ Γ}

Typical presentations of type inference by constraint generation involve choosing fresh type variables, which are then constrained. Since we work with refinement types, it is more convenient to choose fresh refinement type templates, which are just refinement types that are everywhere parametrised by fresh refinement variables; in the setting of refinement types, at the point at which inference would choose a fresh type, the underlying shape of the type is already known. We write
Fresh ( X ) to assert that X must be a fresh refinement variable (i.e. not already used in thecurrent scope). We extend the notion to fresh types T of underlying shape T . Definition 21 (Fresh Types).
We write
Fresh U(T) for the following inductive predicate, asserting that T is a fresh template of underlying shape U.
• For all α ∈ A, Fresh α(α).
• For all b ∈ B, Fresh b(b).
• For d ∈ D, if Fresh Ui(Ti) for each corresponding pair drawn from U and T, and Fresh(X), then Fresh d U(inj X d T).
• For all underlying types U1, U2 and extended types T1, T2, if Fresh U1(T1) and Fresh U2(T2), then Fresh U1 → U2(T1 → T2).

The definition guarantees that U(T) = U. We extend the notion to sequences of types, writing Fresh U(T) to denote that the two sequences U and T have the same length and are related pointwise by freshness.

We now have all the necessary structure to describe the constraint generation algorithm.

Fig. 7. Inference for expressions: the rules (ICst), (ICon), (IVar), (IAbs), (IApp) and (ICase).

Definition 22 (Inference).
Inference is split into three parts: for subtyping (Figure 6), for expressions (Figure 7) and for modules (Figure 8), using three judgement forms, respectively:

  ⊢ T1 ⊑ T2 =⇒ C        Γ ⊢ e : T =⇒ T, C        Γ ⊢ m =⇒ Γ′, C

Given two (extended) types T1 and T2, we infer a set of constraints C under which the former will be a subtype of the latter, using the system of judgements ⊢ T1 ⊑ T2 =⇒ C. For expressions in context Γ ⊢ e : T, we infer (extended) monotypes T and the constraints C under which they are permissible, using judgements of the form Γ ⊢ e : T =⇒ T, C. In such judgements, Γ is a constrained type environment, i.e. a finite map from term variables to constrained type schemes. We will omit the underlying type when it is not important. The rules are given in Figure 7. Constrained refinement type schemes are inferred for module-level definitions using a system of judgements of shape Γ ⊢ m =⇒ Γ′, C; the rules are given in Figure 8. The systems can be read algorithmically by regarding the quantities before the =⇒ as inputs and the quantities afterwards as outputs (however, it should be noted that, assuming regular datatypes only, the subtyping relation must be computed by an explicit fixed-point computation).

(IModE)  Γ ⊢ ϵ =⇒ Γ

(IModD)  Γ ⊢ m =⇒ Γ′    Γ′ ∪ {x : T} ⊢ e =⇒ T′, C1    ⊢ T′ ⊑ T =⇒ C2    Fresh T(T)    X = FRV(T)
         ───────────────────────────────────────────────────────────────
         Γ ⊢ m · ⟨x : ∀α. T = Λα. e⟩ =⇒ Γ′ ∪ {x : ∀α. ∀X. C1 ∪ C2 ⊃ T}

Fig. 8. Inference for modules.

Constraint generation via these systems of rules follows a well-established pattern for expressions and modules (see e.g. [31] for a general treatment of the non-refinement case), so we concentrate on the inference rules for subtyping.
Like the more standard inference rules for expressions and modules, the inference rules for subtyping generate a derivation tree and a system of constraints whose solution guarantees the correctness of the corresponding instance of the derivation tree. However, in the case of subtyping, the derivation tree is not a proof in the system of Figure 3, which is for the complement of the subtyping relation, but rather a proof that the solution constitutes a simulation in the sense of Lemma 2. For example, the conclusion of (ISData) yields the constraints {X(d) ⊆ Y(d)} ∪ ⋃k ⋃ni=1 (k ∈ X(d)) ? Cki. The first part of this constraint encodes the first bullet of Lemma 2: the environment ∆* at inj Y d must include all constructors included by the same environment at inj X d. Since, for any refinement X, dom(∆*(inj X d)) = X(d) (recall the notational abuse adopted at the start of Section 6), we arrive at X(d) ⊆ Y(d). The second part of this constraint encodes the second bullet of the lemma: if k ∈ dom(∆*(inj X d)) (and, therefore, k ∈ dom(∆*(inj Y d))), then the corresponding argument types are again related; in inference, we recursively infer constraints on the relationship between the types and guard those constraints by k ∈ X(d).

Theorem 23 (Soundness and completeness of ⊑-inference). Let T1 and T2 be extended types and suppose ⊢ T1 ⊑ T2 =⇒ C. Then, for all assignments θ:

  ⊢ θT1 ⊑ θT2 iff θ ⊨ C

The following states the correctness of type inference for expressions in a closed environment (e.g. for module-level definitions). The appendix contains a proof for the general case.
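The (ISData) case just described can be sketched as a recursive traversal of the datatype environment. In this hypothetical encoding (the same string-based representation as the earlier sketches), a visited set stands in for the explicit fixed-point computation needed for recursive datatypes, and guards accumulate along the path, so each argument's constraints are guarded by the constructor memberships that make them relevant:

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- d ↦ k ↦ datatype names of k's arguments (other arguments ignored).
type Env  = M.Map String (M.Map String [String])
type Atom = (String, String, String)          -- k ∈ X(d)
-- (guard, X, d, Y) encodes the guarded inclusion  guard ? X(d) ⊆ Y(d).
type VC   = ([Atom], String, String, String)

-- Constraints under which inj_X d ⊑ inj_Y d, in the style of (ISData).
subCon :: Env -> String -> String -> String -> [VC]
subCon env x y = go S.empty []
  where
    go seen phi d
      | d `S.member` seen = []
      | otherwise =
          (phi, x, d, y)
            : concat [ go (S.insert d seen) ((k, x, d) : phi) d'
                     | (k, ds) <- M.toList (M.findWithDefault M.empty d env)
                     , d' <- ds ]
```

Over an environment where ATm has constructors App (with two ATm arguments) and Cst (with an Arith argument), comparing inj_X ATm against inj_Y ATm yields the unconditional X(ATm) ⊆ Y(ATm) together with Cst ∈ X(ATm) ? X(Arith) ⊆ Y(Arith), matching the shape of the constrained scheme in the example above.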
Theorem 24 (Soundness and completeness of expression inference).
Let Γ be a constrained type environment, e an expression, V an extended refinement type, C a set of constraints, and suppose Γ ⊢ e =⇒ V, C. Then, for all refinement types T:

  ⦅Γ⦆ ⊢ e : T iff ∃θ. θ ⊨ C ∧ ⊢ θV ⊑ T

Finally, we can state the overall correctness of inference for modules.

(Transitivity)  from ϕ ? S1 ⊆ S2 and ψ ? S2 ⊆ S3, derive ϕ ∪ ψ ? S1 ⊆ S3
(Satisfaction)  from ϕ ? k ∈ X(d) and ψ, k ∈ X(d) ? S1 ⊆ S2, derive ϕ ∪ ψ ? S1 ⊆ S2
(Weakening)     from ϕ ? X(d) ⊆ Y(d) and ψ, k ∈ Y(d) ? S1 ⊆ S2, derive ϕ ∪ ψ, k ∈ X(d) ? S1 ⊆ S2

Fig. 9. Saturation rules.
Theorem 25 (Soundness and completeness of module inference).
Suppose Γ and Γ′ are closed constrained type environments, m a module and Γ ⊢ m =⇒ Γ′. Then, for all type environments ∆:

  ⦅Γ⦆ ⊢ m : ∆ iff ⦅Γ′⦆ ⊑ ∆

The solvability of constraints can be determined by a process of saturation under all possible consequences. This is a generalisation of the transitive closure of simple inclusion-constraint graphs and, more generally, a particular instance of Horn clause resolution. For our constraint language, saturated constraint sets have a remarkable property: they can be restricted to any subset of their variables whilst preserving solutions.
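The saturation process developed in the rest of this section can be sketched as a naive fixpoint computation over an atomic-constraint representation. All encodings here are hypothetical; memberships k ∈ X(d) are treated as singleton inclusions {k} ⊆ X(d) for the purposes of (Transitivity), and meeting an upper bound that excludes k produces the trivially unsatisfiable body k ∈ ∅:

```haskell
import qualified Data.Set as S

type RD   = (String, String)            -- X(d): a variable at a datatype
type Atom = (String, RD)                -- a guard atom k ∈ X(d)

data Body = Sub RD RD                   -- X(d) ⊆ Y(d)
          | Upper RD [String]           -- X(d) ⊆ { k1, ..., km }
          | Mem String RD               -- k ∈ X(d)
          | Bot String                  -- k ∈ ∅ (trivially unsatisfiable)
  deriving (Eq, Ord, Show)

type Con = (S.Set Atom, Body)

-- One round of the rules of Figure 9 over all pairs of constraints.
step :: S.Set Con -> S.Set Con
step cs = cs `S.union` S.fromList (concat [ derive c1 c2 | c1 <- l, c2 <- l ])
  where
    l = S.toList cs
    derive (phi, b1) (psi, b2) = trans ++ sat ++ weak
      where
        -- (Transitivity), with k ∈ X(d) read as {k} ⊆ X(d).
        trans = case (b1, b2) of
          (Sub a b, Sub b' c)    | b == b' -> [(S.union phi psi, Sub a c)]
          (Mem k b, Sub b' c)    | b == b' -> [(S.union phi psi, Mem k c)]
          (Sub a b, Upper b' ks) | b == b' -> [(S.union phi psi, Upper a ks)]
          (Mem k b, Upper b' ks) | b == b', k `notElem` ks
                                           -> [(S.union phi psi, Bot k)]
          _                                -> []
        -- (Satisfaction): drop a guard atom whose body is established.
        sat = case b1 of
          Mem k x | S.member (k, x) psi
            -> [(S.union phi (S.delete (k, x) psi), b2)]
          _ -> []
        -- (Weakening): given X(d) ⊆ Y(d), replace k ∈ Y(d) by k ∈ X(d).
        weak = case b1 of
          Sub x y -> [ (S.union phi (S.insert (k, x) (S.delete (k, y) psi)), b2)
                     | (k, y') <- S.toList psi, y' == y ]
          _       -> []

-- Iterate to a fixpoint; termination follows from the finite universe
-- of atoms and bodies generated from the initial set.
saturate :: S.Set Con -> S.Set Con
saturate cs = let cs' = step cs in if cs' == cs then cs else saturate cs'
```

A solution for a saturated set with no Bot body can then be read off as in the construction below, by collecting the unguarded membership facts for each variable.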
Definition 26 (Atomic constraints).
A constraint is said to be atomic just if its body has one of the following four shapes:

  X(d) ⊆ Y(d)        X(d) ⊆ {k1, ..., km}        k ∈ X(d)        k ∈ ∅

An atomic constraint is said to be trivially unsatisfiable if it is of shape ∅ ? k ∈ ∅. A constraint set is said to be trivially unsatisfiable just if it contains a trivially unsatisfiable constraint.

By applying standard identities of basic set theory, every constraint is equivalent to a set of atomic constraints. In particular, a constraint of the form k ∈ {k1, ..., km} is equivalent to the empty set of atomic constraints (i.e. can be eliminated) whenever k is one of the ki.

Definition 27 (Saturated constraint sets).
An atomic constraint set, i.e. one that only containsatomic constraints, is said to be saturated just if it is closed under the saturation rules in Figure 9.We write
Sat(C) for the saturated atomic constraint set obtained by iteratively applying the saturation rules to the set C.

Remark 2.
From the perspective of Remark 1, all three rules of Figure 9 correspond to special cases of resolution.

The (Transitivity) rule closes subset inequalities under transitivity, but must keep track of the associated guards by taking their union. The (Satisfaction) rule allows a guard atom k ∈ X(d) to be dropped whenever the same atom constitutes the body of another constraint in the set (but the other guards from both must be preserved). Finally, the (Weakening) rule allows Y(d) in a guard to be replaced by X(d) when the latter is known to be no larger, thus weakening the constraint. Saturation under these rules preserves and reflects solutions:

Theorem 28 (Saturation equivalence).
For any assignment θ, θ ⊨ C iff θ ⊨ Sat(C).

If there are no trivially unsatisfiable constraints in Sat(C), then a solution can be constructed as follows. For each variable X occurring in C, define a function θX by:

  (θX)(d) ≔ {k | k ∈ X(d) is in Sat(C)}

Then θ solves Sat(C). Conversely, if there is a trivially unsatisfiable constraint in Sat(C), then Sat(C) is unsolvable and, by the equivalence theorem, it follows that C has no solution either.

Theorem 29. C is unsatisfiable iff Sat(C) is trivially unsatisfiable.

Hence, solvability can be determined by saturation.

In practice, having established that a constraint set is solvable, we are only interested in the solutions for a certain subset of the refinement variables. For example if, as we have seen, the constraints C describe a set of types {θT | θ ⊨ C} of a module-level function, then we may consider two solutions θ1 and θ2 to be the same whenever they agree on FRV(T). In this case, we call the free refinement variables of T the interface variables.

Definition 30.
Let C be a saturated constraint set and let I be some set of refinement variables, called the interface variables. Then define the restriction of C to I, written C ↾ I, as the set {ϕ ? S1 ⊆ S2 ∈ C | FRV(ϕ) ∪ FRV(S1) ∪ FRV(S2) ⊆ I}.

The restriction of C to I is quite severe, since it simply discards any constraint not solely comprised of interface variables. However, a remarkably strong property of the rules in Figure 9 is that, whenever C is solvable, every solution of Sat(C) ↾ I may be extended to a solution of Sat(C) (and therefore of C), independently of the choice of I! Since every solution of Sat(C) trivially restricts to a solution of Sat(C) ↾ I (the latter has fewer constraints over fewer variables), it follows that the solutions of Sat(C) ↾ I are exactly the restrictions of the solutions of C.

Example 31.
Consider the following constraint set C by way of an illustration:

  ⋆ Cst ∈ X1(Lam)
    X1(Lam) ⊆ X2(Lam)
    FVr ∈ X2(Lam) ? X2(Lam) ⊆ X3(Lam)
  ⋆ X3(Lam) ⊆ {FVr, Cst}

This set is not saturated and, consequently, there is no guarantee that the restriction of this set to an interface results in a constraint system whose solutions can generally be extended to solutions of the original set C. For example, if we restrict this set to the interface I = {X1, X3}, the effect will be to retain the two starred constraints. But then

  θ(X)(d) ≔ {Cst, FVr, App}  if X = X1 and d = Lam
             ∅               otherwise

is a solution of C ↾ I that does not extend (extension here is in the sense of Theorem 32) to any solution of C. However, after saturation, Sat(C) consists of the following:

  ⋆ Cst ∈ X1(Lam)
    X1(Lam) ⊆ X2(Lam)
    Cst ∈ X2(Lam)
    FVr ∈ X2(Lam) ? X2(Lam) ⊆ X3(Lam)
    FVr ∈ X1(Lam) ? X2(Lam) ⊆ X3(Lam)
    FVr ∈ X2(Lam) ? Cst ∈ X3(Lam)
    FVr ∈ X2(Lam) ? X1(Lam) ⊆ X3(Lam)
  ⋆ FVr ∈ X1(Lam) ? X1(Lam) ⊆ X3(Lam)
  ⋆ FVr ∈ X1(Lam) ? Cst ∈ X3(Lam)
  ⋆ X3(Lam) ⊆ {FVr, Cst}
    FVr ∈ X2(Lam) ? X2(Lam) ⊆ {FVr, Cst}
    FVr ∈ X1(Lam) ? X2(Lam) ⊆ {FVr, Cst}
    FVr ∈ X2(Lam) ? X1(Lam) ⊆ {FVr, Cst}
  ⋆ FVr ∈ X1(Lam) ? X1(Lam) ⊆ {FVr, Cst}

In particular, the constraint FVr ∈ X1(Lam) ? X1(Lam) ⊆ X3(Lam) will be retained in the restriction Sat(C) ↾ I, which consists of the starred constraints. Consequently, the above assignment θ is ruled out. Indeed, one can easily verify that every solution of Sat(C) ↾ I extends to a solution of Sat(C) and hence of C.

Theorem 32 (Restriction/Extension).
Suppose C is a saturated constraint set and I is a subset of its variables. Let θ be a solution of C ↾ I. Then there is a solution θ′ of C satisfying, for all X ∈ I:

  θ′(X)(d) = θ(X)(d)

Although our inference procedure is compositional, i.e. it breaks modules down into top-level definitions, and terms down into sub-terms that can be analysed in isolation, this is no guarantee of its efficiency. As we have described it in Section 7, the number of constraints associated with a function definition depends on the size of the definition: constraints are generated at most syntax nodes and propagated to the root. In fact, as is well known for constrained type inference, the situation is worse than this, because a whole set of constraints is imported from the environment when inferring a type for a program variable x. The number of constraints associated with x will again depend on the size of the definition of x and on the number of constraints associated with any functions that x depends on, and so the number of constraints can become exponential in the number of function definitions.

Let us fix N to be the number of module-level function definitions, K the maximum number of constructors associated with any datatype and D the maximum number of datatypes associated with any slice (for Lam, this is 2). A simple analysis of the shape of constraints yields the following bound:
Lemma 3.
There are O(Kv^D · vKD + K) atomic constraints over v refinement variables.

Suppose Γ ⊢ e =⇒ T, C. As it stands, the number of variables v occurring in C will depend upon the size of e and the size of every definition that e depends on. (Although it is known that this blow-up can be avoided by a clever representation in the case of constraints that are only simple variable/variable inclusions [20].) We use restriction to break this dependency between the size of constraint sets and the size of the program. We compute C ↾ I for each constraint set C generated by an inference (i.e. at every step), with the interface I taken to be the free variables of the context
FRV(Γ) ∪ FRV(T). Consequently, the number of variables v occurring in C ↾ I depends only on the number of refinement variables that are free in Γ and T.

Since all the refinement variables of module-level function types are generalised by (IModD) (assuming, as is usual, that inference for modules occurs in a closed environment), it follows that the only free refinement variables in Γ are those introduced during the inference of e, as a result of inferring under abstractions and case expressions. Thus v becomes independent of the number of function definitions. Moreover, if we assume that function definitions are in β-normal form and that the maximum case-expression nesting depth within any given definition is fixed (i.e. does not grow with the size of the program), then it follows that the number of refinement variables free in the environment is bounded by the size of the underlying type of e. Clearly, the number of free refinement variables in T is also bounded by its underlying type.

Consequently, for a constraint set C arising from an inference Γ ⊢ e =⇒ T, C and then restricted to its context, the number of constraints given by Lemma 3 depends only on the size of the underlying types assigned to e and, in the case of datatypes, the size of their definitions (slices). If we take scaling our analysis to larger and larger programs to mean programs consisting of more and more functions, with bounded growth in the size of types and the size of individual function definitions, then we may reasonably consider all these parameters fixed. Consequently:
Under the assumption that the size of types and the size of individual function definitions are bounded, the complexity of type inference is O(N).

A more fine-grained analysis is given in the appendix.
10 IMPLEMENTATION
We implemented a prototype of our inference algorithm for Haskell as a GHC plugin; the user can run our type checker as another stage of compilation with an additional command-line flag. It is available from:

  https://github.com/bristolpl/intensional-datatys

In addition to running the type checker on individual modules, an interface binary file is generated, enabling other modules to use the constraint information in separate compilations.

Our plugin processes GHC's core language [40], which is significantly more powerful than the small language presented here. Specifically, it must account for higher-rank types (including existentials), casts and coercions, and type classes. We have not implemented a treatment of these features in our prototype, and so any occurrences are not analysed. Furthermore, we disallow empty refinements of single-constructor datatypes (e.g. records). This relatively small departure from the theory is a substantial improvement to the efficiency of the tool, due to the number of records and newtypes that are found in typical Haskell programs.

Since we do not analyse the dependencies of packages, datatypes that are defined outside the current package are treated as base types and are not refined. The resulting analysis provides a certificate of safety for a package modulo the safe use of its dependencies.

In addition to missing cases, the tool uses the results of internal analyses in GHC to identify pattern-matching cases that will throw an exception. For example, the following code will be considered potentially unsafe:

  nnf2dnf (Lit a)   = [[a]]
  nnf2dnf (Or p q)  = List.union (nnf2dnf p) (nnf2dnf q)
  nnf2dnf (And p q) = distrib (nnf2dnf p) (nnf2dnf q)
  nnf2dnf _         = error "Impossible case!"
We recorded benchmarks on a 2.20GHz Intel® Core™ i5-5200U with 4 cores and 8.00GB RAM. We used the following selection of projects from the Hackage database:

• aeson is a performant JSON serialisation library.
• The containers package provides a selection of classic functional data structures, such as sets and finite maps. The Data.Sequence module from this package contains machine-generated code that lacks the typical modularity and structure of hand-written code. For example, it contains an automatically generated set of 6 mutually recursive functions, each with a complex type and deeply nested matching. The corresponding interface is in excess of 80 refinement variables. This module could not be processed to completion in a small amount of time and so we have omitted it from the results. We will explore how best to process examples that violate our complexity assumptions in follow-up work.
• extra is a collection of common combinators for datatypes and control flow.
• fgl (Functional Graph Library) provides an inductive representation of graphs and associated operations.
• haskeline is a command-line interface library.
• parallel is Haskell's default library for parallel programming.
• sbv is an SMT-based verification tool for automatically proving properties of Haskell programs.
• The time library contains several representations of time, clocks and calendars.
• unordered-containers provides hashing-based containers, for either performant code or datatypes without a natural ordering.

For each module we recorded the average time elapsed in milliseconds across 10 runs and the number of top-level definitions (N). We note both the total number of refinement variables generated during inference (V) and the largest interface (I).
The contrast between these two figures gives some indication of how intractable the analysis could become without the restriction operator. Naturally, constant factors will vary considerably between modules (not in correspondence with their size) and so our results also include the number of constructors (K) that appear in the largest datatype, and the number of datatypes (D) in the largest slice. The benchmarks in Table 1 provide a summary of the results for each project, i.e. the total time taken, the total number of top-level definitions, the total number of refinement variables, the maximum interface size, the largest number of constructors associated with a datatype, and the largest slice. The full dataset can be found in the appendices.
11 CONCLUSION AND RELATED WORK
The goal of our system is to automatically and statically verify that a given program is free of pattern-match exceptions, and we have phrased it as a type inference procedure for a certain refinement type system with recursive datatype constraints. We have shown that it works well in practice, although a more extensive investigation is needed. Our primary motivation has been to ensure predictability by giving concrete guarantees on its expressive power and algorithmic complexity.
Name                 | N    | K  | V      | D  | I  | Time (ms)
aeson                | 728  | 13 | 20466  | 6  | 14 | 79.37
containers           | 1792 | 5  | 25237  | 2  | 23 | 118.26
extra                | 332  | 3  | 5438   | 3  | 7  | 61.53
fgl                  | 700  | 2  | 18403  | 2  | 12 | 94.32
haskeline            | 1384 | 15 | 29389  | 19 | 27 | 111.67
parallel             | 110  | 1  | 959    | 2  | 18 | 10.18
pretty               | 222  | 8  | 3675   | 4  | 16 | 23.86
sbv                  | 5076 | 44 | 171869 | 49 | 46 | 518.91
time                 | 484  | 7  | 9753   | 6  | 10 | 134.16
unordered-containers | 474  | 5  | 7761   | 3  | 24 | 30.56

Table 1. Benchmark Summaries

(The total time reported for each project is the sum of the times taken to analyse each module independently; it therefore does not include startup costs and the like. As our tool is just another phase in the GHC compiler, it would be unfair to record the full compilation time, which is predominantly due to GHC.)

Our work sits within a large body of literature on recursive types and subtyping. As a type system, ours is not directly comparable to others in the literature: on the one hand, the intensional refinement restriction is quite severe, but on the other we allow for flow sensitivity. One of the first works to consider subtyping in the setting of recursive types was that of Amadio and Cardelli [7]. They proposed an exponential-time procedure for subtype checking, but this was later improved to quadratic by Kozen, Palsberg, and Schwartzbach [29]. Neither of these works gave a treatment of the combination with polymorphism, which is the subject of e.g. Castagna and Xu [11], Dolan and Mycroft [12], Hoang and Mitchell [25], and Pottier [34]. However, to the best of our knowledge, all the associated type inference algorithms are exponential time in the size of the program. In particular, Hoang and Mitchell [25] show that a general formulation of typing with recursive subtyping constraints has a PSPACE-hard typability problem. However, we mention as a counterpoint that, when constraints are restricted to simple variable-variable inequalities, Gustavsson and Svenningsson [20] show that there is a cubic-time algorithm. Being based on unification, inference for polymorphic variants is efficient [19], but Castagna, Petrucciani, and Nguyen [10] point out instances where programmers find the results to be unpredictable. None of the above works allow for a flow-sensitive treatment of matching.

Our main inspiration has been the seminal body of literature on set constraints in program analysis, see particularly Aiken, Wimmers, and Lakshman [5], Aiken [1] and Heintze [24], and in particular the line of work on making the cubic-time fragments scale in practice [15, 16, 22, 39]. Through an impressive array of sophisticated optimisations, the fragment can be made to run efficiently on many programs. However, the fundamental worst-case complexity is not changed, and implementing and tuning heuristics requires a large engineering effort.
Moreover, this fragment does not accommodate flow sensitivity. Many of the analyses and/or type inference procedures discussed so far are compositional, i.e. parts of the program are analysed independently to yield summaries of their behaviour, and the summaries are later combined. However, it has been frequently observed that compositionality does not lead to scalability if the summaries are themselves large and complicated. In particular, it is not uncommon for "summaries" to grow with the square of the size of the program in the worst case. This has led to many works that attempt to simplify summaries, typically according to ingenious heuristics [6, 12, 15, 17, 35, 36, 42]. Since our primary motivation was predictability, we have designed our system so that heuristics are avoided: in particular, the size of summaries (i.e. constrained type schemes) depends only on the size of the underlying types and not on the size of the program. (Heuristic-based optimisations can be the enemy of predictability, since small changes in the program can lead to great changes in performance if a change causes the program to fall outside of the domain on which the heuristic is tuned.) It is plausible that many of these heuristic optimisations are nevertheless applicable in order to help improve the overall efficiency.
Refinement types originate with the works of Freeman and Pfenning [18] and Xi and Pfenning [47]. Their distinguishing feature is that they attempt to assign types to program expressions for which an underlying type is already available. Typically, as here, the refinement type is also required to respect the shape of the underlying type. One can use this restriction, as in loc. cit., to ensure some independence of the size of the type from the size of the program. However, as remarked in the final section, the constant factors are enormous, since there is unrestricted intersection and union of refinements of the same underlying type, which is represented explicitly. The work of Freeman and Pfenning [18] requires that the programmer declare the universe of refinement types up-front (whereas our universe is determined automatically as a completion of the underlying datatype environment). A disadvantage of this requirement is that it burdens the programmer with a kind of annotation that, in many simple cases, they would rather not have to clutter their program with. A great advantage is that, by defining a refinement datatype explicitly, the programmer can indicate formally in the code her intention that a certain invariant is (somehow) important within a certain part of the program. It seems like a very fruitful idea to allow the programmer this freedom also in our system, and we are actively working on such an extension as part of our future work. In particular, we would like to take advantage of several new advances in this line that relieve a lot of programmer burden (Dunfield [13, 14]). An incredibly fruitful recent evolution of refinement types are the Liquid Types of Rondon, Kawaguci, and Jhala [38] (see especially Vazou, Bakst, and Jhala [44] for a version with constrained type schemes) and similar systems (e.g. those of Terauchi [41], Unno and Kobayashi [43]).
Such technology is already accessible, to the benefit of the average programmer, through the Liquid Haskell system of Vazou, Seidel, Jhala, Vytiniotis, and Peyton-Jones [45]. Due to the rich expressive power of these systems, which typically include dependent products, efficient and fully automatic type inference is not typically a primary concern, and predictability can be ensured by liberal use of annotations.
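The contrast drawn above between declared and inferred refinement universes can be made concrete. In the style of Freeman and Pfenning, the programmer might encode the NNF invariant as its own datatype, making the consumer total by construction at the cost of declaring and maintaining the extra type. The following sketch is our own illustration, not code from any of the cited works:

```haskell
-- The invariant "no Imp, and negation only at literals" declared as a
-- datatype in its own right (literal polarity is our assumption).
data Nnf a
  = NLit Bool a
  | NAnd (Nnf a) (Nnf a)
  | NOr  (Nnf a) (Nnf a)

-- Total by construction: there are no Imp or Not cases to miss.
nnfToDnf :: Nnf a -> [[(Bool, a)]]
nnfToDnf (NLit b a) = [[(b, a)]]
nnfToDnf (NOr p q)  = nnfToDnf p ++ nnfToDnf q
nnfToDnf (NAnd p q) = [ c ++ d | c <- nnfToDnf p, d <- nnfToDnf q ]
```

In the system of this paper, by contrast, the refinement of the formula datatype in which Imp and Not are absent is discovered automatically as part of the completion of the underlying datatype environment, with no extra declaration from the programmer.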
The pattern-match safety problem was also addressed by Mitchell and Runciman [30], whose system was used to verify a number of small Haskell programs and libraries. Its expressive power and algorithmic complexity are, however, unclear. Safety problems are within the scope of higher-order model checking (Kobayashi [26], Kobayashi and Ong [27], Ong [33]), and a system for verifying pattern-match safety, built on higher-order model checking, was presented in [32]. Higher-order model checking approaches reduce verification problems to model checking problems on a certain infinite tree generated by a higher-order grammar. Although the higher-order model checking problem is linear-time in the size of the grammar, the constant factors are enormous because, formally, it is n-EXPTIME complete (a tower of exponentials of height n) with n the type-theoretic order of the functions in the grammar. Moreover, many of the transformations from program to grammar incur a large blow-up in size. Two promising evolutions of higher-order model checking are the approach of Kobayashi, Tsukada, and Watanabe [28], based on the higher-order fixpoint logic of Viswanathan and Viswanathan [46], and the approach of Burn, Ong, and Ramsay [9], based on higher-order constrained Horn clauses.

ACKNOWLEDGMENTS
We gratefully acknowledge the support of the Engineering and Physical Sciences Research Council (EP/T006579/1) and the National Centre for Cyber Security via the UK Research Institute in Verified Trustworthy Software Systems. We thank our colleague Matthew Pickering for a lot of good Haskell advice and for helping us safely navigate the interior of the Glasgow Haskell Compiler.
REFERENCES
[1] Alexander Aiken. 1999. Introduction to set constraint-based program analysis. Science of Computer Programming.
[2] In Proceedings of the Seventh Annual Symposium on Logic in Computer Science (LICS '92), Santa Cruz, California, USA, June 22-25, 1992. 329–340.
[3] Alexander Aiken and Edward L. Wimmers. 1993. Type Inclusion Constraints and Type Inference. In FPCA. 31–41.
[4] Alexander Aiken, Edward L. Wimmers, and T. K. Lakshman. 1994. Soft Typing with Conditional Types. In Conference Record of POPL'94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Portland, Oregon, USA, January 17-21, 1994. 163–173.
[5] Alexander Aiken, Edward L. Wimmers, and T. K. Lakshman. 1994. Soft typing with conditional types. In Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 163–173. https://doi.org/10.1145/174675.177847
[6] Alexander Aiken, Edward L. Wimmers, and Jens Palsberg. 1999. Optimal Representations of Polymorphic Types with Subtyping. Higher-Order and Symbolic Computation 12, 3 (1999), 237–282. https://doi.org/10.1023/A:1010056315933
[7] Roberto M. Amadio and Luca Cardelli. 1993. Subtyping recursive types. ACM Trans. Program. Lang. Syst. 15, 4 (1993), 575–631. https://doi.org/10.1145/155183.155231
[8] Leo Bachmair, Harald Ganzinger, and Uwe Waldmann. 1993. Set constraints are the monadic class. In Proceedings of the Eighth Annual IEEE Symposium on Logic in Computer Science. IEEE, 75–83.
[9] Toby Cathcart Burn, C.-H. Luke Ong, and Steven J. Ramsay. 2017. Higher-order constrained Horn clauses for verification. Proc. ACM Program. Lang. 2, POPL (2017), Article 11. https://doi.org/10.1145/3158099
[10] Giuseppe Castagna, Tommaso Petrucciani, and Kim Nguyen. 2016. Set-theoretic types for polymorphic variants. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming. Association for Computing Machinery, 378–391. https://doi.org/10.1145/2951913.2951928
[11] Giuseppe Castagna and Zhiwu Xu. 2011. Set-theoretic foundation of parametric polymorphism and subtyping. In Proceedings of the 16th ACM SIGPLAN international conference on Functional programming. Association for Computing Machinery, 94–106. https://doi.org/10.1145/2034773.2034788
[12] Stephen Dolan and Alan Mycroft. 2017. Polymorphism, subtyping, and type inference in MLsub. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages. Association for Computing Machinery, 60–72. https://doi.org/10.1145/3009837.3009882
[13] Joshua Dunfield. 2007. Refined typechecking with Stardust. In Proceedings of the 2007 workshop on Programming languages meets program verification. Association for Computing Machinery, 21–32. https://doi.org/10.1145/1292597.1292602
[14] Joshua Dunfield. 2017. Extensible Datasort Refinements. In European Symposium on Programming Languages and Systems, Hongseok Yang (Ed.). Springer Berlin Heidelberg, 476–503.
[15] Manuel Fähndrich and Alexander Aiken. 1996. Making Set-Constraint Based Program Analyses Scale. In First Workshop on Set Constraints at CP'96.
[16] Manuel Fähndrich, Jeffrey S. Foster, Zhendong Su, and Alexander Aiken. 1998. Partial online cycle elimination in inclusion constraint graphs. In Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation. Association for Computing Machinery, 85–96. https://doi.org/10.1145/277650.277667
[17] Cormac Flanagan and Matthias Felleisen. 1999. Componential set-based analysis. ACM Trans. Program. Lang. Syst. 21, 2 (1999), 370–416. https://doi.org/10.1145/316686.316703
[18] Tim Freeman and Frank Pfenning. 1991. Refinement types for ML. In Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation. Association for Computing Machinery, 268–277. https://doi.org/10.1145/113445.113468
[19] Jacques Garrigue. 2002. Simple Type Inference for Structural Polymorphism. In International Workshop on Foundations of Object-Oriented Languages (FOOL).
[20] Jörgen Gustavsson and Josef Svenningsson. 2001. Constraint Abstractions. In Symposium on Programs as Data Objects, Olivier Danvy and Andrzej Filinski (Eds.). Springer Berlin Heidelberg, 63–83.
[21] John Harrison. 2009. Handbook of Practical Logic and Automated Reasoning. Cambridge University Press.
[22] Nevin Heintze. 1994. Set-based analysis of ML programs. In Proceedings of the 1994 ACM conference on LISP and functional programming. Association for Computing Machinery, 306–317. https://doi.org/10.1145/182409.182495
[23] Nevin Heintze, Spiro Michaylov, and Peter Stuckey. 1992. CLP(R) and some electrical engineering problems. Journal of Automated Reasoning 9, 2 (1992), 231–260.
[24] Nevin Charles Heintze. 1992. Set based program analysis. Thesis.
[25] My Hoang and John C. Mitchell. 1995. Lower bounds on type inference with subtypes. In Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 176–185. https://doi.org/10.1145/199448.199481
[26] Naoki Kobayashi. 2013. Model Checking Higher-Order Programs. J. ACM 60, 3 (2013), Article 20. https://doi.org/10.1145/2487241.2487246
[27] N. Kobayashi and C. L. Ong. 2009. A Type System Equivalent to the Modal Mu-Calculus Model Checking of Higher-Order Recursion Schemes. In IEEE Symposium on Logic In Computer Science. 179–188. https://doi.org/10.1109/LICS.2009.29
[28] Naoki Kobayashi, Takeshi Tsukada, and Keiichi Watanabe. 2018. Higher-Order Program Verification via HFL Model Checking. In European Symposium on Programming Languages and Systems, Amal Ahmed (Ed.). Springer International Publishing, 711–738.
[29] Dexter Kozen, Jens Palsberg, and Michael I. Schwartzbach. 1995. Efficient recursive subtyping. Mathematical Structures in Computer Science 5, 1 (1995), 113–125. https://doi.org/10.1017/S0960129500000657
[30] Neil Mitchell and Colin Runciman. 2008. Not all patterns, but enough: an automatic verifier for partial but sufficient pattern matching. In Proceedings of the first ACM SIGPLAN symposium on Haskell. Association for Computing Machinery, 49–60. https://doi.org/10.1145/1411286.1411293
[31] Martin Odersky, Martin Sulzmann, and Martin Wehr. 1999. Type Inference with Constrained Types. TAPOS 5, 1 (1999), 35–55.
[32] C.-H. Luke Ong and Steven J. Ramsay. 2011. Verifying higher-order functional programs with pattern-matching algebraic data types. In Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 587–598. https://doi.org/10.1145/1926385.1926453
[33] C. L. Ong. 2006. On Model-Checking Trees Generated by Higher-Order Recursion Schemes. In IEEE Symposium on Logic in Computer Science. 81–90. https://doi.org/10.1109/LICS.2006.38
[34] François Pottier. 1998. Type inference in the presence of subtyping: from theory to practice. Thesis.
[35] François Pottier. 2001. Simplifying Subtyping Constraints: A Theory. Information and Computation.
[36] In Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 278–291. https://doi.org/10.1145/263699.263738
[37] Jakob Rehof and Torben Æ. Mogensen. 1999. Tractable constraints in finite semilattices. Science of Computer Programming 35, 2 (1999), 191–221.
[38] Patrick M. Rondon, Ming Kawaguci, and Ranjit Jhala. 2008. Liquid types. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Machinery, 159–169. https://doi.org/10.1145/1375581.1375602
[39] Zhendong Su, Manuel Fähndrich, and Alexander Aiken. 2000. Projection merging: reducing redundancies in inclusion constraint graphs. In Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 81–95. https://doi.org/10.1145/325694.325706
[40] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. 2007. System F with Type Equality Coercions. In Proceedings of the 2007 ACM SIGPLAN International Workshop on Types in Languages Design and Implementation (TLDI '07). Association for Computing Machinery, New York, NY, USA, 53–66. https://doi.org/10.1145/1190315.1190324
[41] Tachio Terauchi. 2010. Dependent types from counterexamples. In Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 119–130. https://doi.org/10.1145/1706299.1706315
[42] Valery Trifonov and Scott Smith. 1996. Subtyping constrained types. In Static Analysis Symposium, Radhia Cousot and David A. Schmidt (Eds.). Springer Berlin Heidelberg, 349–365.
[43] Hiroshi Unno and Naoki Kobayashi. 2009. Dependent type inference with interpolants. In Proceedings of the 11th ACM SIGPLAN conference on Principles and practice of declarative programming. Association for Computing Machinery, 277–288. https://doi.org/10.1145/1599410.1599445
[44] Niki Vazou, Alexander Bakst, and Ranjit Jhala. 2015. Bounded refinement types. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming. Association for Computing Machinery, 48–61. https://doi.org/10.1145/2784731.2784745
[45] Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton-Jones. 2014. Refinement types for Haskell. In Proceedings of the 19th ACM SIGPLAN international conference on Functional programming. Association for Computing Machinery, 269–282. https://doi.org/10.1145/2628136.2628161
[46] Mahesh Viswanathan and Ramesh Viswanathan. 2004. A Higher Order Modal Fixed Point Logic. In CONCUR 2004 - Concurrency Theory, Philippa Gardner and Nobuko Yoshida (Eds.). Springer Berlin Heidelberg, 512–528.
[47] Hongwei Xi and Frank Pfenning. 1999. Dependent types in practical programming. In Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. Association for Computing Machinery, 214–227. https://doi.org/10.1145/292540.292560
A ADDITIONAL MATERIAL FOR SECTION 2: LANGUAGE
We write Sch(R)(∀α. T1, ∀α. T2) just if Ty(R)(T1, T2). Given a function f : Dt_D → Dt_D, we define Ty(f) : Ty_D → Ty_D recursively as follows:

    Ty(f)(α) = α
    Ty(f)(b) = b
    Ty(f)(d) = f(d)
    Ty(f)(T1 → T2) = Ty(f)(T1) → Ty(f)(T2)

For brevity, for a function f : Dt_D → Dt_D, we will usually just write f(T) and f(S) for Ty(f)(T) and Ty(f)(S) respectively.

B ADDITIONAL MATERIAL FOR SECTION 4: SUBTYPING
Lemma (1).
Subtyping is a preorder.
Proof.
The proof of reflexivity is an easy coinduction. We prove that, if there is T2 such that T1 ⊑ T2 and T2 ⊑ T3, then T1 ⊑ T3, by coinduction on ⊑. This amounts to showing that the set {(T1, T3) | ∃T2. T1 ⊑ T2 ⊑ T3} is closed under the given rules, so in each case we will show that the existence of such an intermediate T2 is in contradiction with the given premises in the case.

(SShape) Suppose there is T2 such that T1 ⊑ T2 and T2 ⊑ T3, and r(T1) ≠ r(T3). Since ⊑ satisfies (SShape), it follows that r(T1) = r(T2) and r(T2) = r(T3), from which we obtain the desired contradiction.

(SMis) Suppose there is T2 such that d1 T1 ⊑ T2 ⊑ d3 T3. It follows from (SShape) that T2 is some datatype d2 T2. Furthermore, suppose dom(∆*(d1)) ⊄ dom(∆*(d3)). By (SMis), dom(∆*(d1)) ⊆ dom(∆*(d2)) and dom(∆*(d2)) ⊆ dom(∆*(d3)). Hence, by the transitivity of set inclusion, we immediately have a contradiction.

(SSim) Again, suppose there is T2 such that d1 T1 ⊑ T2 ⊑ d3 T3, and thus T2 is some datatype d2 T2. Suppose k ∈ dom(∆*(d1)); by (SMis) it is also defined for d2 and d3. Suppose U1i[T1/α], U2i[T2/α], U3i[T3/α] are the ith arguments, instantiated with the corresponding type arguments, of the constructor in each datatype respectively. Suppose U1i[T1/α] ⋢ U3i[T3/α], so there is no intermediate type between U1i[T1/α] and U3i[T3/α]. However, it follows from our original assumption and (SSim) that U1i[T1/α] ⊑ U2i[T2/α], and similarly that U2i[T2/α] ⊑ U3i[T3/α]. Thus we reach a contradiction.

(SArrL) Suppose there is T such that T1 → T2 ⊑ T ⊑ T1′ → T2′, and suppose there is no intermediate U such that T1′ ⊑ U ⊑ T1. It follows from (SShape) that T must be of shape T3 → T4. It follows from (SArrL) that, therefore, T1′ ⊑ T3 and T3 ⊑ T1, in contradiction with the absence of such a U.

(SArrR) Follows analogously to the above case. □
Lemma (2, Simulation).
Let R ⊆ Dt_D × Dt_D and suppose that, for all (d1 T1, d2 T2) ∈ R, and for all k such that ∆(d1)(k) is defined:
• ∆(d2)(k) is also defined.
• And, moreover, Ty(R)(U1i, U2i) for each i ∈ [1..Arity(k)], where U1 and U2 are the argument types of ∆(d1)(k) and ∆(d2)(k) instantiated at T1 and T2 respectively.
Then it follows that Ty(R) is included in the subtype relation. Proof.
The proof is by coinduction.

(SShape) By definition, if Ty(R)(T1, T2) then T1 and T2 have the same shape.

(SMis) Suppose dom(∆*(d1)) ⊄ dom(∆*(d2)) and suppose, for the purpose of obtaining a contradiction, that Ty(R)(d1 T1, d2 T2). Then there is some k such that ∆*(d1)(k) is defined, but ∆*(d2)(k) is not, thus contradicting the first bullet of the definition of R.

(SData) Suppose (U1i[T1/α], U2i[T2/α]) ∉ Ty(R) and the side conditions on the rule hold. Then suppose, for contradiction, that Ty(R)(d1 T1, d2 T2). By definition, it must be that R(d1 T1, d2 T2), thus contradicting the second bullet.

(SArrL) Suppose Ty(R)(T1 → T2, T1′ → T2′). Then, by definition of Ty(R), it can only be because Ty(R)(T1′, T1).

(SArrR) Analogously. □
C ADDITIONAL MATERIAL FOR SECTION 5: REFINEMENT TYPE ASSIGNMENT
Lemma 4 (Weakening).
Suppose ⊢ Γ′ ⊑ Γ and ⊢ T ⊑ T′. If Γ ⊢ e : T then Γ′ ⊢ e : T′. Proof.
We prove that, for all Γ′, Γ′ ⊑ Γ implies Γ′ ⊢ e : T, by induction on Γ ⊢ e : T. Then, if also T ⊑ T′, Γ′ ⊢ e : T′ follows by (TSub).

(TVar) Suppose T :: T′ and x : ∀α. U ∈ Γ, and let Γ′ be such that Γ′ ⊑ Γ. Then x : ∀α. V ∈ Γ′ with V ⊑ U. By (TVar), Γ′ ⊢ x[T] : V[T/α]. It follows by the definition of subtyping that V[T/α] ⊑ U[T/α], and therefore the desired result follows by (TSub).

(TSub) (TCst) (TCon) (TApp) In these cases, the conclusion follows from the hypotheses independently of the environment.

(TAbs) Suppose T :: T′ and x ∉ dom(Γ), and then suppose that Γ′ is such that Γ′ ⊑ Γ. Then, by the definition and reflexivity of subtyping, also Γ′ ∪ {x : T} ⊑ Γ ∪ {x : T}. It follows from the induction hypothesis, therefore, that Γ′ ∪ {x : T} ⊢ e : T′. We may assume that x ∉ dom(Γ′) by the variable convention. Therefore, the result follows by (TAbs).

(TCase) Suppose dom(∆(d)) ⊆ {k1, ..., km}. Suppose Γ′ ⊑ Γ. It follows immediately from the induction hypothesis that Γ′ ⊢ e : d. It follows by reflexivity of subtyping that, for each i, Γ′ ∪ x : ∆(d)(ki) ⊑ Γ ∪ x : ∆(d)(ki). Hence, the induction hypothesis gives, for each i, Γ′ ∪ x : ∆(d)(ki) ⊢ ei : T. The result follows immediately by (TCase). □

D ADDITIONAL MATERIAL FOR SECTION 7: TYPE INFERENCE
Theorem (23, Soundness and completeness of ⊑-inference). Let T1 and T2 be extended types and suppose ⊢ T1 ⊑ T2 =⇒ C. Then, for all assignments θ: ⊢ θT1 ⊑ θT2 iff θ |= C. Proof.
The proof is by induction on the inference judgement.

(ISBase) (ISTyVar) Obvious.

(ISArr) In the forward direction, suppose ⊢ θT1 → θT2 ⊑ θT1′ → θT2′. By inversion, necessarily ⊢ θT1′ ⊑ θT1 and ⊢ θT2 ⊑ θT2′. In this case, C = C1 ∪ C2. By the induction hypothesis, θ |= C1 and θ |= C2, so θ |= C as required. In the backward direction, suppose θ |= C1 ∪ C2. Then θ |= C1 and θ |= C2, and it follows from the induction hypothesis that ⊢ θT1′ ⊑ θT1 and ⊢ θT2 ⊑ θT2′. Hence (SArr) is satisfied, and so ⊢ θT1 → θT2 ⊑ θT1′ → θT2′.

(ISData) In the forward direction, suppose ⊢ inj_θ(X) d T1 ⊑ inj_θ(Y) d T2, and suppose ∆(d)(k) is defined. Then it follows from (SMis) that dom(∆*(inj_θ(X) d)) ⊆ dom(∆*(inj_θ(Y) d)). Note that, by definition, dom(∆*(inj_θ(Z) d)) = θ(Z)(d) for any refinement variable Z. Hence, θ |= X(d) ⊆ Y(d). To see that, also, θ |= k ∈ X(d) ? C′, assume θ |= k ∈ X(d). Therefore, ∆*(inj_θ(X) d)(k) and ∆*(inj_θ(Y) d)(k) are also defined. Let us fix ∆*(inj_θ(X) d)(k) = ∀α. A^X_1 → ··· → A^X_m → d α, and ∆*(inj_θ(Y) d)(k) = ∀α. A^Y_1 → ··· → A^Y_m → d α. Hence, by (SSim), ⊢ A^X_i[T1/α] ⊑ A^Y_i[T2/α] for each i ∈ [1..m]. Note that, by definition, ∆*(inj_Z d)(k) = inj_Z(∆(k)) for any refinement variable Z. It follows from the induction hypothesis, therefore, that θ |= C′, as required. In the backward direction, suppose (i) θ |= (k ∈ X(d) ? C′) and (ii) θ |= X(d) ⊆ Y(d). Then, by (ii), (SMis) is satisfied. To see that (SSim) is also satisfied, suppose ∆*(θ(X))(k) and ∆*(θ(Y))(k) are defined. Then, by (i), θ |= C′ and it follows from the induction hypothesis that ⊢ A^X_i[T1/α] ⊑ A^Y_i[T2/α]. Since, by definition, ∆*(inj_Z d)(k) = inj_Z(∆(k)) for any refinement variable Z, (SSim) is satisfied, and so it must be that ⊢ inj_θ(X) d T1 ⊑ inj_θ(Y) d T2 as required.
□

Additionally, we define the equivalence of two refinement variable assignments σ and θ modulo a set of refinement variables S, written σ ≡_S θ, by requiring that they are identical on the variables in S, i.e. ∀X ∈ S. σ(X) = θ(X).

Lemma 5.
Let Γ be a closed constrained type environment, Γ′ a type environment, e an expression, V ∈ ETy, and C a constraint set. Suppose Γ ⊢ e =⇒ V, C, that Γ′ ⊑ ⟦Γ⟧, and that there exists a refinement variable substitution θ such that θ |= C. Then there exists a type T ∈ Ty_D* such that:
(i) Γ′ ⊢ e : T
(ii) ⊢ T ⊑ θV
The proof is by induction on =⇒.

(ICst) In this case, e is of shape c and we may assume C(c) = V. Let T = C(c). By (TCst), we have that Γ′ ⊢ e : T. As V is an unrefined type, it is invariant under any substitution, i.e. θV = V, and hence (ii) is trivially satisfied.

(ICon) In this case, e is of shape k, and V = inj_X(T1) → ··· → inj_X(Tm) → inj_X(d) for fresh X, where ∆(d)(k) = T1 → ··· → Tm. The constraint set C is {k ∈ X(d)}. We know, therefore, that θ is a substitution mapping X to a constructor choice function f, such that k ∈ dom(∆*(inj_f(d))). Let Ti′ = ∆*(inj_f(d))(k)(i) and T = T1′ → ··· → Tm′ → inj_f(d). By the definition of ∆*, θV = T. Finally, by (TCon), we have that Γ′ ⊢ k : T as required.

(IVar) In this case, e is of shape x T. Let us assume that x : ∀α. ∀X. C′ ⊃ U ∈ Γ. Hence, for fresh Y and T′, V is U[Y/X][T′/α], and θ solves C′[Y/X]. We may assume x : ∀α. σU ∈ Γ′, for some σ. Thus, by (IVar), Γ′ ⊢ x T : (σU)[T/α]. Define a new model θ′ as follows:

    θ′(Z) = θ(Yi)   if Z = Xi
            σ(Xi)   if Z = Yi
            Ti      if Z = Ti
            σ(Z)    otherwise

The consistency of this definition follows from the freshness of T′ and V. Clearly this solves C, and θ′V = θ′U[Y/X][T′/α] = (σU)[T/α], trivially satisfying (ii).

(IAbs) In this case e = λx. e′, and V = V1 → V2 where V1 is fresh. Suppose θ |= C. As C is also the constraint set of the hypothesis, by induction there exists a type T2 such that Γ′ ∪ {x : T1} ⊢ e′ : T2, T2 ⊑ θV2, and θV1 = T1. By (TAbs) we may derive Γ′ ⊢ λx. e′ : T1 → T2. Let T = T1 → T2; then clearly we have ⊢ T ⊑ θV as required.

(IApp) In this case e = e1 e2 and C = C1 ∪ C2 ∪ C3. Hence, by the rules of inference, necessarily Γ ⊢ e1 =⇒ V2 → V, C1, Γ ⊢ e2 =⇒ V1, C2, and ⊢ V1 ⊑ V2 =⇒ C3. Suppose θ |= C. As this is also a solution to C3, by Theorem 23 we have that θV1 ⊑ θV2.
By the induction hypothesis, Γ′ ⊢ e1 : T2 → T and Γ′ ⊢ e2 : T1, with θV2 ⊑ T2, T ⊑ θV, and T1 ⊑ θV1. Using the transitivity of subtyping, we have that T1 ⊑ T2. Hence, by (TSub), Γ′ ⊢ e2 : T2, and so by (TApp), Γ′ ⊢ e1 e2 : T as required.

(ICase) In this case e = case e0 of {ki xi ↦ ei | i = 1..m}, and C = C0 ∪ ⋃_{i=1}^{m} (ki ∈ X(d) ? (Ci ∪ Ci′)) ∪ {X(d) ⊆ {k1, ..., km}}. Let us suppose that θ maps X to the constructor choice function f. It follows from inference that Γ ⊢ e0 =⇒ inj_X(d), C0, and, as θ |= C0, thus Γ′ ⊢ e0 : T0 such that ⊢ T0 ⊑ θ inj_f(d). For each i ≤ m, let us consider the case when ki ∈ dom(∆*(inj_f(d))). By induction, Γ′ ∪ {xi : ∆*(inj_f(d))(ki)} ⊢ ei : Ti for some Ti ⊑ θVi. As Ci′ must be satisfied by θ, we also have that θVi ⊑ θV, and so Γ′ ∪ {xi : ∆*(inj_f(d))(ki)} ⊢ ei : θV. As dom(∆*(inj_f(d))) ⊆ {k1, ..., km}, by (TCase) we have that Γ′ ⊢ e : θV as required. □
Let Γ be a constrained type environment, Γ′ a type environment, e an expression, T ∈ Ty_{D*}, V ∈ ETy, σ a refinement variable substitution with domain FRV(Γ), and C a constraint set. Suppose Γ′ ⊢ e : T and Γ ⊢ e ⇒ V, C and ⟦σΓ⟧ ⊑ Γ′. Then there is θ such that:
(i) σ ≡_{FRV(Γ)} θ,
(ii) ⊢ θV ⊑ T, and
(iii) θ ⊨ C.

Proof.
The proof is by induction on the derivation of Γ′ ⊢ e : T.

(TVar) In this case e is of shape x T̄. Assume U(T̄) = T̄, (x : ∀ᾱ. U) ∈ Γ′ and U[T̄/ᾱ] = T. Assume ⟦σΓ⟧ ⊑ Γ′. By definition, in Γ there is some x : ∀ᾱ. ∀X̄. C′ ⊃ U′ and there is some τ such that ⊢ τ(σU′) ⊑ U and τ ⊨ σC′. By definition, V is of shape U′[Ȳ/X̄][T̄′/ᾱ] for fresh Ȳ and fresh types T̄′. Moreover, C is of the form C′[Ȳ/X̄]. Since the T̄′ are fresh, there is a substitution θ′ such that θ′T̄′ = T̄. Define θ as follows:

  θ(Z) ≔ θ′(Z)  if Z ∈ FRV(T̄′)
  θ(Z) ≔ σ(Z)   if Z ∈ FRV(Γ)
  θ(Z) ≔ τ(Xᵢ)  if Z = Yᵢ ∈ Ȳ
  θ(Z) ≔ τ(Z)   otherwise

Note that the freshness of Ȳ and T̄′ ensures the exclusivity of the four cases, and requirement (i) of the theorem is satisfied. Observe that:

  θ(U′[Ȳ/X̄][T̄′/ᾱ]) = (θ(U′[Ȳ/X̄]))[θT̄′/ᾱ] = (θ(U′[Ȳ/X̄]))[T̄/ᾱ]

Next, since θ(Xᵢ) = θ(Yᵢ) for any Xᵢ ∈ X̄ and the codomain of τ is closed, it follows that:

  θ(U′[Ȳ/X̄]) = (θU′)[θȲ/X̄] = (θU′)[θX̄/X̄] = θU′

Finally, by the disjointness of the cases, the fact that the codomain of σ is necessarily closed, and FRV(U′) ⊆ X̄ ∪ FRV(Γ), we have θU′ = τ(σU′). Hence, overall, θ(U′[Ȳ/X̄][T̄′/ᾱ]) = τ(σU′)[T̄/ᾱ], and it follows by the definition of subtyping that ⊢ τ(σU′)[T̄/ᾱ] ⊑ U[T̄/ᾱ], so that requirement (ii) is satisfied. Finally, note that θ ⊨ C′[Ȳ/X̄] iff θ ⊨ C′, since θ(Yᵢ) = θ(Xᵢ). Then, by definition and the closedness of the codomain of σ, θ ⊨ C′ iff θ ⊨ σC′. Then, since FRV(C′) ⊆ X̄ ∪ FRV(Γ), it follows that θ ⊨ σC′ iff τ ⊨ σC′, which was assumed. Hence, θ ⊨ C′[Ȳ/X̄] and requirement (iii) is satisfied.

(TCst) In this case, e is of shape c and we may assume C(c) = T. Since V = C(c) and C = ∅, any assignment θ extending σ will satisfy the three requirements.
(TCon) In this case, e is a constructor k T̄ and T is of the form T₁ → · · · → Tₘ → inj_f d T̄, with T₁ → · · · → Tₘ → inj_f d T̄ = ∆*(inj_f d)(k)[T̄/ᾱ] and k ∈ dom(∆*(inj_f d)), for some fresh T̄. Hence, by definition, Tᵢ = inj_f(∆(d)(k)(i)) (*) and k ∈ f(d) (**). In this case, V is inj_X T′ with T′ = ∆(d)(k) → d and X fresh. Moreover, C is {k ∈ X(d)}. Therefore, we can define θ as follows:

  θ(Z) = f     if Z = X
  θ(Z) = σ(Z)  otherwise

By the freshness of X, this guarantees requirement (i). By (*) above, we have inj_f T′ ⊑ T, ensuring requirement (ii). Finally, requirement (iii) is satisfied by (**).

(TAbs) In this case, e is an abstraction λx : T₁. e′ and T is of shape T₁ → T₂. We may assume that U(T₁) = T₁ and x ∉ dom(Γ). Assume ⊢ ⟦σΓ⟧ ⊑ Γ′. From inference we have V necessarily of shape V₁ → V₂ with Fresh_{T₁}(V₁) and Γ ∪ {x : V₁} ⊢ e′ ⇒ V₂, C. By the freshness of V₁, it follows that there is some σ′ such that σ′V₁ = T₁ and σ′(Z) = σ(Z) on any Z ∉ FRV(V₁). Hence, ⟦σΓ⟧ ∪ {x : T₁} = ⟦σ′(Γ ∪ {x : V₁})⟧ and, by definition, ⊢ ⟦σΓ⟧ ∪ {x : T₁} ⊑ Γ′ ∪ {x : T₁}. Therefore, it follows from the induction hypothesis that there is an assignment θ satisfying (a) θ ≡_{FRV(Γ ∪ {x : V₁})} σ′, (b) ⊢ θV₂ ⊑ T₂ and (c) θ ⊨ C. Then this θ works also as a witness to the main result, since (a) implies θ ≡_{FRV(Γ)} σ, (a) and (b) together imply ⊢ θ(V₁ → V₂) ⊑ T₁ → T₂, and (c) is exactly requirement (iii).

(TApp) In this case, e is an application e₁ e₂. From inference we have, necessarily, Γ ⊢ e₁ ⇒ V₁ → V, C₁, Γ ⊢ e₂ ⇒ V₂, C₂ and ⊢ V₂ ⊑ V₁ ⇒ C₃. It follows immediately from the induction hypothesis that there are assignments θ₁ and θ₂ such that (a) θᵢ ≡_{FRV(Γ)} σ, (b1) ⊢ θ₁(V₁ → V) ⊑ T₂ → T, (b2) ⊢ θ₂V₂ ⊑ T₂, (c1) θ₁ ⊨ C₁ and (c2) θ₂ ⊨ C₂.
Define θ as follows:

  θ(Z) = θ₁(Z)  if Z ∈ FRV(V₁ → V)
  θ(Z) = θ₂(Z)  if Z ∈ FRV(V₂)
  θ(Z) = σ(Z)   otherwise

This is well defined, since one can easily verify by inspection that the inference system guarantees that the only overlap in refinement variables between sibling branches is in the free refinement variables of the environment, and θ₁ and θ₂ agree on these by (a). This construction therefore satisfies requirement (i). Furthermore, (b1) implies ⊢ θV ⊑ T. Finally, it follows from (c1) and (c2) that θ ⊨ C₁ ∪ C₂. By (b1) and (b2) we have ⊢ θV₂ ⊑ T₂ and ⊢ T₂ ⊑ θV₁ so, by transitivity, also ⊢ θV₂ ⊑ θV₁. Hence, it follows from the completeness of subtype inference (Theorem 23) that θ ⊨ C₃.

(TCase) In this case, e is of shape case e′ of {kᵢ x̄ᵢ ↦ eᵢ | 1 ≤ i ≤ m} and we may assume that {i₁, . . . , iₙ} ⊆ {1, . . . , m} and dom(∆*(d₀)) = {kᵢ₁, . . . , kᵢₙ} (*). We may also assume that d₀ = inj_f d for some choice f. From inference, we have necessarily Γ ⊢ e′ ⇒ inj_X d, C₀ and, for all i ∈ {1 . . . m}, ⊢ Vᵢ ⊑ V ⇒ Cᵢ′ (**) and Γ ∪ {x̄ᵢ : inj_X(∆(d)(kᵢ))} ⊢ eᵢ ⇒ Vᵢ, Cᵢ; where X, V and each Vᵢ are fresh. Since V is fresh, there is some substitution θ′ such that θ′V = T (***). It follows from the induction hypothesis that there is θ₀ such that (a0) θ₀ ≡_{FRV(Γ)} σ, (b0) ⊢ θ₀(inj_X d) ⊑ d₀ and (c0) θ₀ ⊨ C₀. Set σ′(X) = θ₀(X) and σ′(Z) = σ(Z) for all other Z. Then, for all j ∈ {1 . . . n}:

  ⟦σΓ⟧ ∪ {x̄ᵢⱼ : ∆*(inj_{θ₀(X)} d)(kᵢⱼ)} = ⟦σ′(Γ ∪ {x̄ᵢⱼ : inj_X(∆(d)(kᵢⱼ))})⟧

By (b0) and (SSim), it follows that ⊢ ⟦σΓ⟧ ∪ {x̄ᵢⱼ : ∆*(inj_{θ₀(X)} d)(kᵢⱼ)} ⊑ ⟦σΓ⟧ ∪ {x̄ᵢⱼ : ∆*(d₀)(kᵢⱼ)}. Hence it follows from the induction hypothesis that there are substitutions θᵢⱼ (for each j ∈ {1 . . . n}) such that (ai) θᵢⱼ ≡_{FRV(Γ ∪ {x̄ᵢⱼ : inj_X(∆(d)(kᵢⱼ))})} σ′, (bi) ⊢ θᵢⱼVᵢⱼ ⊑ T and (ci) θᵢⱼ ⊨ Cᵢⱼ.
Define θ as follows:

  θ(Z) = θ′(Z)   if Z ∈ FRV(V)
  θ(Z) = θ₀(Z)   if Z = X or Z ∈ FRV(C₀)
  θ(Z) = θᵢⱼ(Z)  if Z ∈ FRV(Vᵢⱼ) ∪ FRV(Cᵢⱼ)
  θ(Z) = σ(Z)    otherwise

The use of freshness guarantees well-definedness, since refinement variables introduced in different branches are distinct and so the only variables shared are those in Γ, on which all agree by (a0) and (ai). Hence, requirement (i) is satisfied. It follows from (***) that θV = T and, by (bi), ⊢ θVᵢⱼ ⊑ T. Hence, ⊢ θVᵢⱼ ⊑ θV and it follows from the completeness of subtype inference, Theorem 23, that θ ⊨ Cᵢⱼ′. By (*) and (SMis), dom(∆*(inj_{θ(X)} d)) ⊆ {kᵢ₁, . . . , kᵢₙ} and so θ(X)(d) ⊆ {kᵢ₁, . . . , kᵢₙ}. Consequently, by the foregoing and (ci):

  θ ⊨ {X(d) ⊆ {k₁, . . . , kₘ}} ∪ ⋃ᵢ₌₁ᵐ kᵢ ∈ X(d) ? (Cᵢ ∪ Cᵢ′)

Finally, requirement (iii) is satisfied by also taking into account (c0).

(TSub)
In this case we may assume Γ′ ⊢ e : T′ with ⊢ T′ ⊑ T. It follows from the induction hypothesis that there is θ such that (i) θ ≡_{FRV(Γ)} σ, (ii) ⊢ θV ⊑ T′ and (iii) θ ⊨ C. Since, by transitivity, ⊢ θV ⊑ T, the same θ acts as witness to the result. □

Theorem 24 (Soundness and completeness of expression inference).
Let Γ be a constrained type environment, e an expression, V an extended refinement type, C a set of constraints, and let Γ ⊢ e ⇒ V, C. Then, for all refinement types T:

  ⟦Γ⟧ ⊢ e : T iff ∃θ. θ ⊨ C ∧ ⊢ θV ⊑ T

Proof.
Follows immediately from Lemmas 5 and 6. □
Lemma 7.
Let Γ and Γ′ be closed constrained type environments, ∆ ⊑ ⟦Γ⟧, and m a module. If Γ ⊢ m ⇒ Γ′ and ⟦Γ′⟧ is not empty, then there exists ∆′ ⊑ ⟦Γ′⟧ such that ∆ ⊢ m : ∆′.

Proof.
The proof is by induction on the derivation of Γ ⊢ m ⇒ Γ′.

(IModE) In this case the module is empty, i.e. m = ϵ, and Γ′ = Γ. Let ∆′ = ∆, which is a subenvironment of ⟦Γ′⟧. By (TModE) we can derive ∆ ⊢ m : ∆′ as required.

(IModD) In this case the module has shape m′ · ⟨x : ∀ᾱ. T = Λᾱ. e⟩, and Γ′ = Γ′′ ∪ {x : ∀ᾱ. ∀X̄. Sat(C₁ ∪ C₂)↾X̄ ⊃ V}. Let us assume Γ′′ ∪ {x : V} ⊢ e ⇒ V′, C₁ and ⊢ V′ ⊑ V ⇒ C₂, for some fresh V with X̄ = FRV(V). The induction hypothesis provides a ∆′′ ⊑ ⟦Γ′′⟧ such that ∆ ⊢ m′ : ∆′′. As ⟦Γ′⟧ is not empty, there must exist a solution to Sat(C₁ ∪ C₂)↾X̄. By Lemma 28, we therefore have that both C₁ and C₂ are solvable by some refinement variable substitution θ. Additionally, there must be some type T′ ⊑ θV′ such that ∆′′ ∪ {x : T} ⊢ e : T′, where T = θV, by Lemma 5. As θV′ ⊑ θV, we have that T′ ⊑ T. Thus, by (TModD), we have that ∆ ⊢ m′ · ⟨x : ∀ᾱ. T = Λᾱ. e⟩ : ∆′′ ∪ {x : ∀ᾱ. T} as required. □

Lemma 8.
Let Γ and Γ′ be constrained type environments, ∆ and ∆′ type environments, and m a module. Suppose ⟦Γ⟧ ⊑ ∆, ∆ ⊢ m : ∆′ and Γ ⊢ m ⇒ Γ′. Then ⟦Γ′⟧ ⊑ ∆′.

Proof.
The proof is by induction on the derivation of ∆ ⊢ m : ∆′.

(TModE) If the module is empty, then ∆ = ∆′ and Γ = Γ′. Therefore ⟦Γ′⟧ ⊑ ∆′ follows trivially from the hypothesis.

(TModD) Suppose the module is of shape m′ · ⟨x = Λᾱ. e⟩, and therefore ∆′ = ∆′′ ∪ {x : ∀ᾱ. T} for some ∆′′ and T such that ∆ ⊢ m′ : ∆′′, ∆′′ ∪ {x : ∀ᾱ. T} ⊢ e : T′, and ⊢ T′ ⊑ T. As Γ ⊢ m′ · ⟨x = Λᾱ. e⟩ ⇒ Γ′, we know that Γ′ has shape Γ′′ ∪ {x : ∀ᾱ. ∀X̄. C₁ ∪ C₂ ⊃ V} such that Γ ⊢ m′ ⇒ Γ′′, Γ′′ ∪ {x : V} ⊢ e ⇒ V′, C₁, and ⊢ V′ ⊑ V ⇒ C₂. By the induction hypothesis, therefore, we have that ⟦Γ′′⟧ ⊑ ∆′′. As V is fresh, we can instantiate Lemma 6 with a substitution that maps it to T. Therefore, there is some θ that solves C₁ with θV′ ⊑ T′ and θV = T. Hence ∆′ is ⟦Γ′⟧ with x instantiated by θ and all other variables instantiated as in ∆′′. As θV′ ⊑ T′ ⊑ T = θV trivially holds, we also have that θ ⊨ C₂. Thus ⟦Γ′⟧ ⊑ ∆′ as required. □

Theorem 25 (Soundness and completeness of module inference).
Suppose Γ and Γ′ are closed constrained type environments, m a module and Γ ⊢ m ⇒ Γ′. Then, for all type environments ∆:

  ⟦Γ⟧ ⊢ m : ∆ iff ⟦Γ′⟧ ⊑ ∆

Proof.
Follows immediately from Lemmas 7 and 8. □
E ADDITIONAL MATERIAL FOR SECTION 8: SATURATION
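As a concrete picture of the resolution machinery treated in this section, the following is a minimal, illustrative Haskell model of guarded atomic constraints and of the transitivity resolution step alone. All names here (Var, Sym, the :? constraint shape, saturate) are our own illustrative assumptions, not the implementation's, and the weakening and satisfaction rules are omitted.

```haskell
import qualified Data.Set as Set

type Var = String
type Datatype = String
type Con = String

-- A guard: a conjunction of membership hypotheses (X, d) ↦ k.
type Guard = Set.Set (Var, Datatype, Con)

-- The two sides of an atomic body S1 ⊆ S2.
data Sym = Lit (Set.Set Con)   -- a literal set of constructors
         | Dom Var Datatype    -- dom (X (d))
  deriving (Eq, Ord, Show)

-- A guarded atomic constraint phi ? S1 ⊆ S2.
data Constraint = Guard :? (Sym, Sym)
  deriving (Eq, Ord, Show)

-- Transitivity: from phi ? S1 ⊆ S2 and psi ? S2 ⊆ S3,
-- derive (phi ∪ psi) ? S1 ⊆ S3.
trans :: Constraint -> Constraint -> [Constraint]
trans (phi :? (s1, s2)) (psi :? (s2', s3))
  | s2 == s2' = [(phi `Set.union` psi) :? (s1, s3)]
trans _ _ = []

-- Naive saturation: close the set under the rule to a fixed point.
saturate :: Set.Set Constraint -> Set.Set Constraint
saturate cs
  | next == cs = cs
  | otherwise  = saturate next
  where
    next = cs `Set.union` Set.fromList
             [ c | a <- Set.toList cs, b <- Set.toList cs, c <- trans a b ]
```

Because the derived guard is the union of the premises' guards, any solution satisfying the derived guard satisfies both premises, which is exactly the preservation argument made case by case in the proof below.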
Theorem 28 (Saturation equivalence).
For any assignment θ, θ ⊨ C iff θ ⊨ Sat(C).

Proof.
We shall show that the resolution rules preserve solutions (naturally they reflect solutions too, since they do not remove constraints). In each case we shall assume the derived guard holds, otherwise there is nothing to show.

• Suppose ϕ ? S₁ ⊆ S₂ and ψ ? S₂ ⊆ S₃ appear in C, and θ satisfies both ϕ and ψ. Then we must have θS₁ ⊆ θS₂ and θS₂ ⊆ θS₃, as θ solves C. From the transitivity of the subset relation, it follows that θS₁ ⊆ θS₃.

• Suppose ϕ ? dom(X(d)) ⊆ dom(Y(d)) and ψ ∪ {(Y, d) ↦ k} ? S₁ ⊆ S₂ appear in C, and that θ satisfies ϕ ∪ ψ ∪ {(X, d) ↦ k}. Therefore dom((θX)(d)) ⊆ dom((θY)(d)), as θ solves C. Additionally, k ∈ dom((θX)(d)), and so k ∈ dom((θY)(d)). Thus ψ ∪ {(Y, d) ↦ k} holds under θ, and θS₁ ⊆ θS₂.

• Suppose ϕ ? k ∈ dom(X(d)) and {(X, d) ↦ k} ∪ ψ ? S₁ ⊆ S₂ appear in C, and θ satisfies both ϕ and ψ. Then we have that k ∈ dom((θX)(d)). Clearly the guard of S₁ ⊆ S₂ must then also hold, and θS₁ ⊆ θS₂ as required. □

Definition 34.
We say that τ is a partial solution of a constraint set C if τ solves the restriction of C to the domain of τ, i.e. τ solves C↾dom(τ).

Additionally, we construct the extended solution θ_τ (or just θ when τ is implied) as follows. For each refinement variable X not in the domain of τ, define (θX)(d)(k) = ∆(d)(k) whenever there exists some ϕ ? S ⊆ dom(X(d)) ∈ C such that ϕ holds under τ and k ∈ τS.

Lemma 9.
Suppose τ is a partial solution of a saturated constraint set C. For any constraint ϕ ? S₁ ⊆ S₂ ∈ C such that ϕ is satisfied by θ_τ ∘ τ, there is a constraint with the same body, i.e. of the form ψ ? S₁ ⊆ S₂, such that τ satisfies ψ.

Proof.
Our proof is by induction on the cardinality of {(X, d) ↦ k ∈ ϕ | X ∉ dom(τ)}.

If the cardinality is zero, then τ must satisfy ϕ and so we are done.

We now consider the case when the cardinality is n + 1, and let us assume the induction hypothesis holds for n. Suppose (X, d) ↦ k ∈ ϕ and X ∉ dom(τ). As θ_τ ∘ τ satisfies ϕ, we have that k ∈ dom((θ_τX)(d)). And so, by construction, there is some ψ ? S ⊆ dom(X(d)) ∈ C with k ∈ τS and τ satisfying ψ. As C is saturated, S must either be the singleton {k} or of the form dom(Y(d)):

• In the first case we may apply the rule (Satisfaction), as C is saturated, to conclude ψ ∪ ϕ − {(X, d) ↦ k} ? S₁ ⊆ S₂ ∈ C.

• In the second case we may apply the rule (Weakening) to conclude ψ ∪ ϕ ∪ {(Y, d) ↦ k} − {(X, d) ↦ k} ? S₁ ⊆ S₂ ∈ C.

In either case we are now left with a guard which θ_τ ∘ τ satisfies and which has n hypotheses whose refinement variables are not in the domain of τ; hence we can apply the induction hypothesis. □

This lemma expresses the fact that the extended solution θ_τ does not make any arbitrary choices: whenever it satisfies the guard of a constraint, the constraint must already have been satisfied according to resolution.

Lemma 10.
Suppose C is a saturated constraint set which does not contain ∅ ? k ∈ ∅, and suppose τ is a partial solution of C. Then θ_τ ∘ τ solves C.

Proof.
We need only consider the constraints in C which reference some refinement variable not in the domain of τ.

Let ϕ ? S₁ ⊆ S₂ be such a constraint. If θ_τ ∘ τ does not satisfy ϕ then there is nothing to show. Otherwise, we may apply the above lemma to deduce that there is a constraint ψ ? S₁ ⊆ S₂ ∈ C with ψ satisfied by τ.

We now consider the possible forms of S₁ and S₂:

• Suppose both S₁ and S₂ are in the domain of τ. Then ψ ? S₁ ⊆ S₂ is in the restriction of C to the domain of τ and, as τ satisfies ψ, we have that τS₁ ⊆ τS₂ as required.

• Suppose S₁ is in the domain of τ, but S₂ = dom(X(d)) for some X ∉ dom(τ). If k ∈ τS₁, then by construction k ∈ dom((θX)(d)), and so the subset relation holds.

• Suppose S₁ = dom(X(d)) for some X ∉ dom(τ). Let k ∈ dom((θX)(d)); we shall show that k ∈ (θ ∘ τ)S₂. By construction, there is some ψ′ ? S ⊆ dom(X(d)) ∈ C with k ∈ τS, for some ψ′ which τ satisfies. As C is saturated, we can deduce that ψ′ ∪ ψ ? S ⊆ S₂ ∈ C. Note that ψ′ and ψ hold under τ. If S₂ is in the domain of τ, then τS ⊆ τS₂ by the definition of a partial solution, and so k ∈ τS₂ as required. If S₂ = dom(Y(d)) for some Y ∉ dom(τ), however, then by the definition of θ_τ, k ∈ dom((θ_τY)(d)). □

Theorem 29. C is unsatisfiable iff Sat(C) is trivially unsatisfiable.

Proof.
The consistency of Sat(C) follows from Lemma 10, with the partial solution τ taken to be empty; Theorem 28 then gives the consistency of C. In the backward direction, Theorem 28 gives the consistency of Sat(C). Since no substitution can solve k ∈ ∅, it follows that k ∈ ∅ ∉ Sat(C). □

F ADDITIONAL MATERIAL FOR SECTION 9: RESTRICTION AND COMPLEXITY
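The restriction operation studied in this section can be pictured, at a high level, as a filter on saturated constraint sets: a constraint survives only if every refinement variable it mentions belongs to the interface (the free refinement variables of the environment and result type). The sketch below is an illustrative abstraction, not the implementation: constraints are reduced to just the sets of variables they mention.

```haskell
import qualified Data.Set as Set

type Var = String

-- A constraint abstracted to the set of refinement variables it mentions
-- (the real constraints also carry guards and bodies).
type Constraint = Set.Set Var

-- Restriction to an interface: after saturation has made all consequences
-- explicit, drop every constraint mentioning a variable outside it.
restrict :: Set.Set Var -> [Constraint] -> [Constraint]
restrict interface = filter (`Set.isSubsetOf` interface)
```

Keeping the surviving variables bounded by the context is what drives the counting in Lemma 11 below.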
Let S be the size of the largest function definition. Let K be the maximum number of constructors associated with any datatype, let D be the maximum number of datatypes in any slice (i.e. for Lam this is 2), and let Q be the maximum size of any underlying type.

Lemma 3.
There are O(Kv²D · 2^(KvD+K)) atomic constraints over v refinement variables.

Proof.
Each non-trivially-unsatisfiable atomic constraint ϕ ? S₁ ⊆ S₂ can be understood as a choice of one of the 2^K subsets of constructors for each of the vD pairs of refinement variable and underlying datatype (for ϕ; we assume that each variable is associated with a particular slice), followed by a choice of one of the vD variable and datatype pairs for the head, which will appear in either the S₁ or the S₂ position depending on the next choice, and then either: another variable (the underlying datatype is determined by the previous choice) for the other position, one of the K constructors (for S₁), or one of the 2^K subsets of constructors (for S₂). Therefore, there are at most 2^(KvD) · vD · (v + K + 2^K) possible constraints over these refinement variables, and:

  2^(KvD) · vD · (v + K + 2^K) = v²D · 2^(KvD) + KvD · 2^(KvD) + vD · 2^(KvD+K) = O(Kv²D · 2^(KvD+K)) □

Suppose the maximum level of case expression nesting is M and let V = M + Q.

Lemma 11.
Suppose Γ ⊢ e ⇒ T, C with C restricted to the context. The number of constraints in C is O(KV²D · 2^(KVD+K)).

Proof.
By applying restriction after each inference, we are guaranteed that the refinement variables occurring in C are a subset of those occurring in Γ and T. The only free refinement variables in Γ are those that were introduced as a result of inferring under a lambda abstraction (i.e. introduced by the (IAbs) rule) or those introduced as a result of inferring under a case statement (i.e. introduced by the (ICase) rule). There are at most Q introduced by abstractions and a further M introduced by case matching (each case only introduces a single refinement variable X, independently of the complexity of the datatype of the scrutinee). The type T introduces at most another Q many. □

Lemma 12.
The complexity of type inference is O(NSK⁵V⁴D² · 2^(K(VD(K+1)+2))).

Proof.
For each subterm of each function definition, inference must generate new constraints and apply the saturation and restriction operations to the set of constraints obtained from combining the outputs of its recursive calls. Using a standard fixpoint computation, and the number of possible constraints over a given set of variables calculated in Lemma 3, saturation of a constraint set of size n involving v refinement variables can be achieved in time O(nKv²D · 2^(KvD+K)). Hence the time taken to process each subterm will be dominated by the time taken by saturation. The number of constraints that constitutes the input to saturation for a given subterm is a function of the inference rule applied, its premises and side conditions. The worst case is (ICase), which has at most K premises and a number of side conditions. The total number of constraints C before saturation and restriction, in this case, is the sum of the sizes of each Cᵢ, Cᵢ′, C₀ and a single constraint contributed by the side condition. Each of these subsets is already restricted to its context so, by Lemma 11, their sizes are at most O(KV²D · 2^(KVD+K)), giving an overall upper bound for C of O(K²V²D · 2^(KVD+K)). The total number of refinement variables occurring in this set is given by those free in Γ, T and each Tᵢ. The context Γ and the type T contribute at most V many and the Tᵢ together contribute at most a further KQ. Consequently, the number is bounded above by KV. Hence, saturation can be computed in time O(K⁵V⁴D² · 2^(K(VD(K+1)+2))). □
Under the assumption that the size of types and the size of individual function definitions is bounded, the complexity of type inference is O(N).

Proof.
Follows immediately from Lemma 12 by fixing all other parameters. □
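The counting in the proof of Lemma 3 can be sanity-checked by transcribing the formula directly; the function names below are ours, and the formulas are the ones derived in that proof.

```haskell
-- The exact count from the proof of Lemma 3: a guard (2^(KvD) choices),
-- a head variable-datatype pair (vD choices), and the other position
-- (another variable, one of K constructors, or one of 2^K subsets).
atomicConstraintCount :: Integer -> Integer -> Integer -> Integer
atomicConstraintCount k v d = 2 ^ (k * v * d) * v * d * (v + k + 2 ^ k)

-- The asymptotic bound it is absorbed into: K v^2 D 2^(KvD + K).
bigOBound :: Integer -> Integer -> Integer -> Integer
bigOBound k v d = k * v ^ 2 * d * 2 ^ (k * v * d + k)
```

Each of the three terms of the expansion is at most Kv²D · 2^(KvD+K), so the exact count never exceeds three times the stated bound.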
G ADDITIONAL MATERIAL FOR SECTION 10: IMPLEMENTATION
The following is a listing of benchmark results with packages split by module.
G.1 Package aeson-1.5.2.0
Name N K V D I Time (ms)Data.Aeson 15 6 198 2 3 3.01Data.Aeson.Encode 2 0 1 0 1 2.75Data.Aeson.Encoding 0 0 0 0 0 2.72Data.Aeson.Encoding.Builder 71 6 1534 1 6 3.34Data.Aeson.Encoding.Internal 67 6 723 2 7 2.65Data.Aeson.Internal 0 0 0 0 0 2.85Data.Aeson.Internal.Functions 3 0 10 0 0 3.33Data.Aeson.Internal.Time 0 0 0 0 0 3.26Data.Aeson.Parser 0 0 0 0 0 2.62Data.Aeson.Parser.Internal 63 6 2252 2 9 3.20Data.Aeson.Parser.Time 7 1 35 1 1 2.81Data.Aeson.Parser.Unescape 0 0 0 0 0 3.01Data.Aeson.Parser.UnescapePure 19 13 1109 2 5 4.09Data.Aeson.QQ.Simple 2 6 111 1 2 7.12Data.Aeson.TH 271 4 10251 3 11 2.88Data.Aeson.Text 13 6 326 1 4 3.99Data.Aeson.Types 1 1 7 1 2 2.81Data.Aeson.Types.Class 0 0 0 0 0 2.76Data.Aeson.Types.FromJSON 96 6 2054 6 14 2.93Data.Aeson.Types.Generic 1 0 1 0 1 3.77Data.Aeson.Types.Internal 46 6 564 2 8 3.64Data.Aeson.Types.ToJSON 27 6 484 3 10 2.88Data.Attoparsec.Time 16 1 745 1 2 2.94Data.Attoparsec.Time.Internal 8 1 61 1 1 3.98 :38 Eddie Jones and Steven Ramsay
G.2 Package containers-0.6.2.1
Name N K V D I Time (ms)Data.Containers.ListUtils 11 3 61 1 4 3.97Data.Graph 69 2 1411 2 8 2.87Data.IntMap 8 0 162 0 2 2.83Data.IntMap.Internal 438 3 5519 2 23 2.62Data.IntMap.Internal.Debug 0 0 0 0 0 2.43Data.IntMap.Internal.DeprecatedDebug 4 0 84 0 1 2.95Data.IntMap.Lazy 0 0 0 0 0 2.62Data.IntMap.Merge.Lazy 0 0 0 0 0 15.55Data.IntMap.Merge.Strict 15 3 152 2 4 2.71Data.IntMap.Strict 0 0 0 0 0 3.08Data.IntMap.Strict.Internal 146 3 1541 2 14 9.76Data.IntSet 0 0 0 0 0 2.55Data.IntSet.Internal 288 5 3690 2 9 2.89Data.Map 10 0 200 0 2 2.86Data.Map.Internal 379 4 5697 2 15 2.85Data.Map.Internal.Debug 20 2 538 1 3 3.33Data.Map.Internal.DeprecatedShowTree 4 0 86 0 1 2.46Data.Map.Lazy 0 0 0 0 0 2.57Data.Map.Merge.Lazy 0 0 0 0 0 2.72Data.Map.Merge.Strict 0 0 0 0 0 1.77Data.Map.Strict 0 0 0 0 0 3.07Data.Map.Strict.Internal 137 2 2013 2 12 2.81Data.Set 0 0 0 0 0 2.75Data.Set.Internal 211 2 3362 2 11 2.86Data.Tree 24 1 497 1 7 2.84Utils.Containers.Internal.BitQueue 13 1 153 2 4 4.02Utils.Containers.Internal.BitUtil 5 0 22 0 0 3.40Utils.Containers.Internal.Coercions 2 0 6 0 0 4.58Utils.Containers.Internal.PtrEquality 2 0 19 0 0 3.04Utils.Containers.Internal.State 2 1 5 1 1 3.69Utils.Containers.Internal.StrictMaybe 3 2 15 1 1 3.69Utils.Containers.Internal.StrictPair 1 0 4 0 1 3.40Utils.Containers.Internal.TypeError 0 0 0 0 0 2.74 ntensional Datatype Refinement 1:39
G.3 Package extra-1.7.3
Name N K V D I Time (ms)Control.Concurrent.Extra 20 3 552 3 7 2.41Control.Exception.Extra 19 0 348 0 0 2.85Control.Monad.Extra 57 0 342 0 0 2.93Data.Either.Extra 9 0 94 0 0 2.91Data.IORef.Extra 4 0 40 0 0 3.50Data.List.Extra 119 3 1963 1 6 3.47Data.List.NonEmpty.Extra 17 0 84 0 0 2.36Data.Tuple.Extra 16 0 58 0 0 4.71Data.Typeable.Extra 0 0 0 0 0 2.76Data.Version.Extra 4 0 122 0 0 2.63Extra 0 0 0 0 0 3.26Numeric.Extra 5 0 19 0 0 3.46Partial 0 0 0 0 0 3.23System.Directory.Extra 8 0 265 0 0 2.72System.Environment.Extra 0 0 0 0 0 3.55System.IO.Extra 35 0 923 0 0 2.47System.Info.Extra 2 0 2 0 0 2.70System.Process.Extra 5 0 264 0 0 4.35System.Time.Extra 12 1 362 1 3 2.29Text.Read.Extra 0 0 0 0 0 2.98 :40 Eddie Jones and Steven Ramsay
G.4 Package fgl-5.7.0.2
Name N K V D I Time (ms)Data.Graph.Inductive 1 0 20 0 0 2.95Data.Graph.Inductive.Basic 17 0 452 0 0 2.47Data.Graph.Inductive.Example 53 1 5236 1 1 2.81Data.Graph.Inductive.Graph 102 1 1920 1 1 4.26Data.Graph.Inductive.Internal.Heap 19 2 314 1 6 3.71Data.Graph.Inductive.Internal.Queue 5 1 44 1 4 3.40Data.Graph.Inductive.Internal.RootPath 6 1 123 1 2 3.22Data.Graph.Inductive.Internal.Thread 20 0 133 0 0 3.66Data.Graph.Inductive.Monad 23 0 349 0 0 3.31Data.Graph.Inductive.Monad.IOArray 7 1 272 1 1 2.47Data.Graph.Inductive.Monad.STArray 8 1 282 1 1 2.93Data.Graph.Inductive.NodeMap 60 1 678 1 4 3.81Data.Graph.Inductive.PatriciaTree 23 1 871 1 2 2.79Data.Graph.Inductive.Query 0 0 0 0 0 2.74Data.Graph.Inductive.Query.ArtPoint 16 1 442 1 5 3.40Data.Graph.Inductive.Query.BCC 20 0 500 0 0 3.18Data.Graph.Inductive.Query.BFS 29 1 625 2 9 3.10Data.Graph.Inductive.Query.DFS 44 0 467 0 0 3.14Data.Graph.Inductive.Query.Dominators 36 0 655 0 0 4.05Data.Graph.Inductive.Query.GVD 10 2 209 2 3 2.88Data.Graph.Inductive.Query.Indep 9 0 208 0 0 4.08Data.Graph.Inductive.Query.MST 13 2 219 2 10 3.76Data.Graph.Inductive.Query.MaxFlow 20 0 322 0 1 2.63Data.Graph.Inductive.Query.MaxFlow2 51 2 2500 2 12 2.89Data.Graph.Inductive.Query.Monad 58 1 700 1 6 2.87Data.Graph.Inductive.Query.SP 12 2 177 2 10 3.19Data.Graph.Inductive.Query.TransClos 15 0 215 0 0 3.62Data.Graph.Inductive.Tree 8 1 306 1 1 3.87Paths_fgl 15 0 164 0 0 3.13 ntensional Datatype Refinement 1:41
G.5 Package haskeline-0.8.0.1
Name N K V D I Time (ms)System.Console.Haskeline 136 15 2343 17 18 3.55System.Console.Haskeline.Backend 6 15 55 7 2 4.28System.Console.Haskeline.Backend.DumbTerm 48 15 484 7 12 3.65System.Console.Haskeline.Backend.Posix 51 15 1843 8 10 3.96System.Console.Haskeline.Backend.Posix.Encoder 8 2 98 2 2 4.28System.Console.Haskeline.Backend.Terminfo 148 15 2108 7 13 3.98System.Console.Haskeline.Backend.WCWidth 10 1 168 1 6 4.25System.Console.Haskeline.Command 28 15 327 8 7 4.09System.Console.Haskeline.Command.Completion 47 15 807 13 13 3.68System.Console.Haskeline.Command.History 52 15 1131 11 12 2.90System.Console.Haskeline.Command.KillRing 35 15 360 9 6 3.59System.Console.Haskeline.Command.Undo 17 15 138 8 3 3.91System.Console.Haskeline.Completion 40 1 948 1 6 3.37System.Console.Haskeline.Directory 0 0 0 0 0 3.95System.Console.Haskeline.Emacs 73 15 2328 9 21 3.55System.Console.Haskeline.History 16 1 406 1 3 6.40System.Console.Haskeline.IO 11 15 185 19 6 4.53System.Console.Haskeline.InputT 65 15 1999 17 18 3.35System.Console.Haskeline.Internal 16 15 417 17 16 4.86System.Console.Haskeline.Key 26 15 647 3 2 5.61System.Console.Haskeline.LineState 67 2 710 3 6 5.61System.Console.Haskeline.Monads 12 0 21 0 0 4.84System.Console.Haskeline.Prefs 26 15 769 8 5 4.35System.Console.Haskeline.Recover 2 0 98 0 0 3.98System.Console.Haskeline.RunCommand 33 15 582 8 27 3.81System.Console.Haskeline.Term 44 15 628 7 4 3.82System.Console.Haskeline.Vi 367 15 9789 12 27 3.50 :42 Eddie Jones and Steven Ramsay
G.6 Package parallel-3.2.2.0
Name N K V D I Time (ms)Control.Parallel 2 0 0 0 0 2.98Control.Parallel.Strategies 87 1 769 2 18 3.47Control.Seq 21 0 190 0 0 3.74
G.7 Package pretty-1.1.3.6
Name N K V D I Time (ms)Text.PrettyPrint 0 0 0 0 0 3.09Text.PrettyPrint.Annotated 0 0 0 0 0 3.91Text.PrettyPrint.Annotated.HughesPJ 154 8 3148 3 16 4.23Text.PrettyPrint.Annotated.HughesPJClass 5 8 32 3 3 4.29Text.PrettyPrint.HughesPJ 58 8 472 4 7 5.23Text.PrettyPrint.HughesPJClass 5 8 23 4 3 3.11 ntensional Datatype Refinement 1:43
G.8 Package sbv-8.7.5
Name N K V D I Time (ms)Data.SBV 0 0 0 0 0 3.54Data.SBV.Char 48 44 709 43 6 3.11Data.SBV.Client 20 44 1007 43 5 3.38Data.SBV.Client.BaseIO 118 0 418 0 4 3.27Data.SBV.Compilers.C 262 44 12934 48 11 3.29Data.SBV.Compilers.CodeGen 67 44 1919 49 8 3.07Data.SBV.Control 1 2 5 1 3 3.75Data.SBV.Control.BaseIO 50 0 142 0 5 2.87Data.SBV.Control.Query 224 44 7932 43 20 3.52Data.SBV.Control.Types 7 31 283 1 2 5.38Data.SBV.Control.Utils 311 44 12922 44 33 4.02Data.SBV.Core.AlgReals 49 2 1444 2 4 4.11Data.SBV.Core.Concrete 55 14 2443 7 16 3.90Data.SBV.Core.Data 47 44 261 43 6 3.46Data.SBV.Core.Floating 69 44 1553 43 15 3.52Data.SBV.Core.Kind 16 14 724 1 3 3.46Data.SBV.Core.Model 229 44 3848 43 14 3.07Data.SBV.Core.Operations 284 44 9524 43 37 5.50Data.SBV.Core.Sized 53 44 701 43 7 3.72Data.SBV.Core.Symbolic 235 44 6695 43 46 3.84Data.SBV.Dynamic 17 44 274 44 9 3.59Data.SBV.Either 48 44 864 43 13 3.10Data.SBV.Internals 3 0 1 0 0 3.10Data.SBV.List 80 44 1184 43 12 3.54Data.SBV.Maybe 30 44 531 43 13 3.33Data.SBV.Provers.ABC 1 31 57 9 3 3.74Data.SBV.Provers.Boolector 1 31 71 9 3 3.58Data.SBV.Provers.CVC4 6 31 134 9 3 3.45Data.SBV.Provers.MathSAT 3 31 101 9 3 3.31Data.SBV.Provers.Prover 93 44 1374 48 11 3.46Data.SBV.Provers.Yices 1 31 50 9 3 3.36Data.SBV.Provers.Z3 2 31 89 9 3 3.07Data.SBV.RegExp 24 44 278 43 8 5.47Data.SBV.SMT.SMT 118 44 5193 42 8 3.55Data.SBV.SMT.SMTLib 9 44 785 14 43 3.33Data.SBV.SMT.SMTLib2 285 44 16859 13 36 3.32Data.SBV.SMT.SMTLibNames 1 0 379 0 0 5.25Data.SBV.SMT.Utils 33 31 563 9 2 3.31Data.SBV.Set 76 44 1481 43 10 3.88Data.SBV.String 53 44 1322 43 12 3.22 :44 Eddie Jones and Steven Ramsay
Data.SBV.Tools.BMC 11 44 549 44 8 3.20Data.SBV.Tools.BoundedFix 3 44 30 43 5 3.11Data.SBV.Tools.BoundedList 59 44 578 43 12 19.58Data.SBV.Tools.CodeGen 0 0 0 0 0 3.36Data.SBV.Tools.GenTest 104 44 5552 43 10 3.03Data.SBV.Tools.Induction 12 44 393 44 14 3.57Data.SBV.Tools.Overflow 77 44 1491 43 12 3.14Data.SBV.Tools.Polynomial 68 44 1586 43 12 3.11Data.SBV.Tools.Range 39 44 951 45 13 3.29Data.SBV.Tools.STree 19 44 488 44 20 3.39Data.SBV.Tools.WeakestPreconditions 86 44 2829 45 33 3.68Data.SBV.Trans 0 0 0 0 0 3.47Data.SBV.Trans.Control 1 2 4 1 3 3.16Data.SBV.Tuple 16 44 367 43 7 3.75Data.SBV.Utils.ExtractIO 0 0 0 0 0 4.96Data.SBV.Utils.Lib 35 3 1070 1 2 14.82Data.SBV.Utils.Numeric 62 0 353 0 0 4.65Data.SBV.Utils.PrettyNum 83 14 2387 6 9 3.78Data.SBV.Utils.SExpr 115 6 7919 3 22 3.67Data.SBV.Utils.TDiff 10 0 277 0 0 4.80Documentation.SBV.Examples.BitPrecise.BitTricks 18 44 379 43 5 3.64Documentation.SBV.Examples.BitPrecise.BrokenSearch 7 44 327 44 8 2.56Documentation.SBV.Examples.BitPrecise.Legato 72 44 1347 49 24 3.30Documentation.SBV.Examples.BitPrecise.MergeSort 19 44 428 49 9 3.05Documentation.SBV.Examples.BitPrecise.MultMask 3 44 142 44 5 3.34Documentation.SBV.Examples.BitPrecise.PrefixSum 17 44 269 44 4 3.57Documentation.SBV.Examples.CodeGeneration.AddSub 3 44 96 49 5 3.15Documentation.SBV.Examples.CodeGeneration.CRC_USB5 8 44 184 49 4 3.35Documentation.SBV.Examples.CodeGeneration.Fibonacci 6 44 216 49 11 3.35Documentation.SBV.Examples.CodeGeneration.GCD 7 44 192 49 11 3.66Documentation.SBV.Examples.CodeGeneration.PopulationCount 7 44 191 49 8 3.24Documentation.SBV.Examples.CodeGeneration.Uninterpreted 10 44 212 49 5 3.13Documentation.SBV.Examples.Crypto.AES 216 44 5018 49 32 3.62Documentation.SBV.Examples.Crypto.RC4 22 44 793 44 18 3.45Documentation.SBV.Examples.Crypto.SHA 108 44 5830 49 18 3.69Documentation.SBV.Examples.Existentials.CRCPolynomial 10 44 360 44 12 3.07Documentation.SBV.Examples.Existentials.Diophantine 26 44 682 44 8 
5.39Documentation.SBV.Examples.Lists.BoundedMutex 19 44 1375 44 22 3.21Documentation.SBV.Examples.Lists.Fibonacci 4 44 225 44 5 3.36Documentation.SBV.Examples.Lists.Nested 1 44 444 44 4 3.63Documentation.SBV.Examples.Misc.Auxiliary 4 44 155 44 5 3.38Documentation.SBV.Examples.Misc.Enumerate 7 44 205 44 11 3.26Documentation.SBV.Examples.Misc.Floating 13 44 587 44 9 3.04 ntensional Datatype Refinement 1:45
Documentation.SBV.Examples.Misc.ModelExtract 3 44 141 44 4 3.77
Documentation.SBV.Examples.Misc.Newtypes 3 44 95 44 7 3.07
Documentation.SBV.Examples.Misc.NoDiv0 3 44 113 43 5 3.70
Documentation.SBV.Examples.Misc.Polynomials 9 44 209 43 5 3.26
Documentation.SBV.Examples.Misc.SetAlgebra 0 0 0 0 0 3.53
Documentation.SBV.Examples.Misc.SoftConstrain 1 44 178 44 4 3.26
Documentation.SBV.Examples.Misc.Tuple 16 44 337 44 5 4.12
Documentation.SBV.Examples.Optimization.Enumerate 12 44 201 44 5 3.20
Documentation.SBV.Examples.Optimization.ExtField 1 44 130 44 8 2.77
Documentation.SBV.Examples.Optimization.LinearOpt 2 44 224 44 5 3.60
Documentation.SBV.Examples.Optimization.Production 7 44 205 44 4 3.65
Documentation.SBV.Examples.Optimization.VM 8 44 568 44 8 3.24
Documentation.SBV.Examples.ProofTools.BMC 5 44 140 44 9 3.89
Documentation.SBV.Examples.ProofTools.Fibonacci 11 44 297 44 10 3.12
Documentation.SBV.Examples.ProofTools.Strengthen 11 44 456 44 17 2.97
Documentation.SBV.Examples.ProofTools.Sum 9 44 207 44 10 3.17
Documentation.SBV.Examples.Puzzles.Birthday 21 44 807 44 8 6.62
Documentation.SBV.Examples.Puzzles.Coins 14 44 513 44 10 3.01
Documentation.SBV.Examples.Puzzles.Counts 12 44 481 44 5 2.75
Documentation.SBV.Examples.Puzzles.DogCatMouse 6 44 207 44 6 3.31
Documentation.SBV.Examples.Puzzles.Euler185 11 44 609 44 4 3.53
Documentation.SBV.Examples.Puzzles.Fish 48 44 1029 44 7 3.43
Documentation.SBV.Examples.Puzzles.Garden 12 44 494 44 10 3.64
Documentation.SBV.Examples.Puzzles.HexPuzzle 21 44 1187 45 15 3.97
Documentation.SBV.Examples.Puzzles.LadyAndTigers 3 44 277 44 4 3.48
Documentation.SBV.Examples.Puzzles.MagicSquare 19 44 571 44 6 3.44
Documentation.SBV.Examples.Puzzles.NQueens 10 44 411 44 4 4.09
Documentation.SBV.Examples.Puzzles.SendMoreMoney 4 44 361 44 4 2.87
Documentation.SBV.Examples.Puzzles.Sudoku 42 44 3980 44 5 3.69
Documentation.SBV.Examples.Puzzles.U2Bridge 56 44 1611 44 14 3.60
Documentation.SBV.Examples.Queries.AllSat 7 44 455 44 6 3.27
Documentation.SBV.Examples.Queries.CaseSplit 2 44 420 48 4 3.54
Documentation.SBV.Examples.Queries.Concurrency 8 44 1485 44 9 3.36
Documentation.SBV.Examples.Queries.Enums 8 44 234 44 12 3.55
Documentation.SBV.Examples.Queries.FourFours 29 44 1156 46 21 3.48
Documentation.SBV.Examples.Queries.GuessNumber 10 44 386 44 5 2.87
Documentation.SBV.Examples.Queries.Interpolants 2 44 431 44 5 3.71
Documentation.SBV.Examples.Queries.UnsatCore 2 44 248 44 4 3.25
Documentation.SBV.Examples.Strings.RegexCrossword 16 44 784 44 10 3.56
Documentation.SBV.Examples.Strings.SQLInjection 9 44 727 45 13 4.18
Documentation.SBV.Examples.Transformers.SymbolicEval 29 44 852 46 8 6.51
Documentation.SBV.Examples.Uninterpreted.AUF 6 44 219 45 7 2.80
Documentation.SBV.Examples.Uninterpreted.Deduce 7 44 244 44 6 3.46
Documentation.SBV.Examples.Uninterpreted.Function 2 44 52 43 5 3.36
Documentation.SBV.Examples.Uninterpreted.Multiply 5 44 215 44 9 3.24
Documentation.SBV.Examples.Uninterpreted.Shannon 18 44 489 43 9 3.17
Documentation.SBV.Examples.Uninterpreted.Sort 3 44 129 44 4 3.20
Documentation.SBV.Examples.Uninterpreted.UISortAllSat 3 44 220 44 4 3.18
Documentation.SBV.Examples.WeakestPreconditions.Append 10 44 538 47 6 3.19
Documentation.SBV.Examples.WeakestPreconditions.Basics 8 44 204 47 13 3.73
Documentation.SBV.Examples.WeakestPreconditions.Fib 13 44 573 47 7 3.22
Documentation.SBV.Examples.WeakestPreconditions.GCD 17 44 604 47 7 2.71
Documentation.SBV.Examples.WeakestPreconditions.IntDiv 12 44 447 47 13 3.65
Documentation.SBV.Examples.WeakestPreconditions.IntSqrt 14 44 509 47 13 2.95
Documentation.SBV.Examples.WeakestPreconditions.Length 11 44 305 47 8 2.75
Documentation.SBV.Examples.WeakestPreconditions.Sum 9 44 369 47 13 3.59
G.9 Package time-1.10
Name N K V D I Time (ms)
Data.Format 33 3 711 1 2 3.85
Data.Time 0 0 0 0 0 2.48
Data.Time.Calendar 0 0 0 0 0 2.92
Data.Time.Calendar.CalendarDiffDays 7 1 25 1 2 3.74
Data.Time.Calendar.Days 3 0 12 0 2 3.85
Data.Time.Calendar.Easter 7 1 102 1 3 2.62
Data.Time.Calendar.Gregorian 35 1 283 1 4 3.55
Data.Time.Calendar.Julian 35 1 283 1 4 2.94
Data.Time.Calendar.JulianYearDay 11 1 109 1 1 3.24
Data.Time.Calendar.MonthDay 9 0 220 0 0 2.76
Data.Time.Calendar.OrdinalDate 26 1 371 1 2 2.62
Data.Time.Calendar.Private 19 2 118 1 1 3.70
Data.Time.Calendar.Week 1 7 14 1 2 2.58
Data.Time.Calendar.WeekDate 16 1 233 1 2 2.85
Data.Time.Clock 0 0 0 0 0 3.72
Data.Time.Clock.Internal.AbsoluteTime 4 1 27 2 3 2.93
Data.Time.Clock.Internal.CTimespec 8 1 283 1 4 4.02
Data.Time.Clock.Internal.CTimeval 2 1 90 1 2 3.46
Data.Time.Clock.Internal.DiffTime 3 0 10 0 1 3.74
Data.Time.Clock.Internal.NominalDiffTime 3 0 7 0 1 2.84
Data.Time.Clock.Internal.POSIXTime 1 0 1 0 1 2.29
Data.Time.Clock.Internal.SystemTime 8 1 74 1 3 2.63
Data.Time.Clock.Internal.UTCDiff 2 1 14 3 3 2.41
Data.Time.Clock.Internal.UTCTime 2 1 6 1 2 3.83
Data.Time.Clock.Internal.UniversalTime 1 0 3 0 1 3.74
Data.Time.Clock.POSIX 6 1 62 3 3 2.97
Data.Time.Clock.System 9 1 143 3 3 2.62
Data.Time.Clock.TAI 8 1 167 3 9 2.73
Data.Time.Format 0 0 0 0 0 2.85
Data.Time.Format.Format.Class 24 2 614 3 7 4.04
Data.Time.Format.Format.Instances 3 1 29 2 2 2.85
Data.Time.Format.ISO8601 66 3 2137 6 10 2.81
Data.Time.Format.Internal 0 0 0 0 0 2.90
Data.Time.Format.Locale 11 1 605 1 2 2.63
Data.Time.Format.Parse 14 1 285 2 2 2.92
Data.Time.Format.Parse.Class 25 3 1375 2 6 2.99
Data.Time.Format.Parse.Instances 17 1 506 2 2 2.81
Data.Time.LocalTime 0 0 0 0 0 3.44
Data.Time.LocalTime.Internal.CalendarDiffTime 5 1 29 2 3 2.97
Data.Time.LocalTime.Internal.LocalTime 14 1 122 3 4 4.71
Data.Time.LocalTime.Internal.TimeOfDay 24 1 294 1 3 2.83
Data.Time.LocalTime.Internal.TimeZone 16 2 343 3 3 2.79
Data.Time.LocalTime.Internal.ZonedTime 6 1 46 5 4 2.96
G.10 Package unordered-containers-0.2.11.0