Data Flow Refinement Type Inference
ZVONIMIR PAVLINOVIC,
New York University, USA and Google, USA
YUSEN SU,
New York University, USA and University of Waterloo, Canada
THOMAS WIES,
New York University, USA

Refinement types enable lightweight verification of functional programs. Algorithms for statically inferring refinement types typically work by reduction to solving systems of constrained Horn clauses extracted from typing derivations. An example is Liquid type inference, which solves the extracted constraints using predicate abstraction. However, the reduction to constraint solving in itself already signifies an abstraction of the program semantics that affects the precision of the overall static analysis. To better understand this issue, we study the type inference problem in its entirety through the lens of abstract interpretation. We propose a new refinement type system that is parametric with the choice of the abstract domain of type refinements as well as the degree to which it tracks context-sensitive control flow information. We then derive an accompanying parametric inference algorithm as an abstract interpretation of a novel data flow semantics of functional programs. We further show that the type system is sound and complete with respect to the constructed abstract semantics. Our theoretical development reveals the key abstraction steps inherent in refinement type inference algorithms. The trade-off between precision and efficiency of these abstraction steps is controlled by the parameters of the type system. Existing refinement type systems and their respective inference algorithms, such as Liquid types, are captured by concrete parameter instantiations. We have implemented our framework in a prototype tool and evaluated it for a range of new parameter instantiations (e.g., using octagons and polyhedra for expressing type refinements). The tool compares favorably against other existing tools. Our evaluation indicates that our approach can be used to systematically construct new refinement type inference algorithms that are both robust and precise.

CCS Concepts: • Theory of computation → Program analysis; Type theory.

Additional Key Words and Phrases: refinement type inference, abstract interpretation, Liquid types
Refinement types are at the heart of static type systems that can check a range of safety properties of functional programs [Champion et al. 2018a; Chugh et al. 2012; Dunfield and Pfenning 2003, 2004; Freeman and Pfenning 1991; Rondon et al. 2008; Vazou et al. 2014; Vekris et al. 2016; Xi and Pfenning 1999; Zhu and Jagannathan 2013]. Here, basic types are augmented with refinement predicates that express relational dependencies between inputs and outputs of functions. For example, the contract of an array read operator can be expressed using the refinement type

    get :: (𝑎 : 𝛼 array) → (𝑖 : {𝜈 : int | 0 ≤ 𝜈 < length 𝑎}) → 𝛼.

This type indicates that get is a function that takes an array 𝑎 over some element type 𝛼 and an index 𝑖 as input and returns a value of type 𝛼. The type {𝜈 : int | 0 ≤ 𝜈 < length 𝑎} of the parameter 𝑖 refines the base type int to indicate that 𝑖 must be an index within the bounds of the array 𝑎. This type can then be used to statically check the absence of erroneous array reads in a program. However, such a check will only succeed if the index expressions used in calls to get are also constrained by appropriate refinement types. Therefore, a number of type inference algorithms have been proposed that relieve programmers of the burden to provide such type annotations

∗This technical report is an extended version of [Pavlinovic et al. 2021].
manually. These algorithms deploy a variety of analysis techniques ranging from predicate abstraction [Rondon et al. 2008; Vazou et al. 2013, 2014] to interpolation [Unno and Kobayashi 2009; Zhu and Jagannathan 2013] and machine learning [Champion et al. 2018a; Zhu et al. 2015, 2016]. A common intermediate step is that they reduce the inference problem to solving a system of constrained Horn clauses that is induced by a typing derivation for the program to be analyzed. However, this reduction already represents an abstraction of the program's higher-order control flow and affects the precision of the overall static analysis.

To better understand the interplay between the various abstraction steps underlying refinement type inference algorithms, this paper forgoes the usual reduction to constraints and instead studies the inference problem in its entirety through the lens of abstract interpretation [Cousot and Cousot 1977, 1979]. We start by introducing a parametric data flow refinement type system. The type system generalizes from the specific choice of logical predicates by allowing for the use of arbitrary relational abstract domains as type refinements (including e.g. octagons [Miné 2007], polyhedra [Bagnara et al. 2008; Cousot and Halbwachs 1978; Singh et al. 2017], and automata-based domains [Arceri et al. 2019; Kim and Choe 2011]). Moreover, it is parametric in the degree to which it tracks context-sensitive control flow information. This is achieved through intersection function types, where the granularity at which such intersections are considered is determined by how stacks are abstracted at function call sites. Next, we propose a novel concrete data flow semantics of functional programs that captures the program properties abstracted by refinement type inference algorithms.
From this concrete semantics we then construct an abstract semantics through a series of Galois abstractions and show that the type system is sound and complete with respect to this abstract semantics. Finally, we combine the abstract semantics with an appropriate widening strategy to obtain an accompanying parametric refinement type inference algorithm that is sound by construction. The resulting analysis framework enables the exploration of the broader design space of refinement type inference algorithms. Existing algorithms such as Liquid type inference [Rondon et al. 2008] represent specific points in this design space.

To demonstrate the versatility of our framework, we have implemented it in a verification tool targeting a subset of OCaml. We have evaluated this tool for a range of new parameter instantiations and compared it against existing verification tools for functional programs. Our evaluation shows that the tool compares favorably against the state of the art. In particular, for the higher-order programs over integers and lists in our benchmark suite, the tool improves over existing tools both in terms of precision (more benchmarks solved) as well as robustness (no analysis timeouts).
To motivate our work, we provide an overview of common approaches to inferring refinement types and discuss their limitations.
Consider the following definition of the Fibonacci function in OCaml:

    let rec fib x = if x >= 2 then fib (x - 1) + fib (x - 2) else 1
The typical refinement type inference algorithm works as follows. First, the analysis performs a standard Hindley-Milner type inference to infer the basic shape of the refinement type for every subexpression of the program. For instance, the inferred type for the function fib is int → int. For every function type 𝜏₁ → 𝜏₂, where 𝜏₁ is a base type such as int, the analysis next introduces a fresh dependency variable 𝑥 which stands for the function parameter, yielding 𝑥 : 𝜏₁ → 𝜏₂. The scope of 𝑥 is the result type 𝜏₂, i.e., refinement predicates inferred for 𝜏₂ can express dependencies on the input value of type 𝜏₁ by referring to 𝑥. Further, every base type 𝜏 is replaced by a refinement type {𝜈 : 𝜏 | 𝜙(𝜈, 𝑥⃗)}, with a placeholder refinement predicate 𝜙(𝜈, 𝑥⃗) that expresses a relation between the members 𝜈 of 𝜏 and the other variables 𝑥⃗ in scope of the type. For example, the augmented type for function fib is

    𝑥 : {𝜈 : int | 𝜙₁(𝜈)} → {𝜈 : int | 𝜙₂(𝜈, 𝑥)}.

The algorithm then derives, either explicitly or implicitly, a system of Horn clauses modeling the subtyping constraints imposed on the refinement predicates by the program data flow. For example, the body of fib induces the following Horn clauses over the refinement predicates in fib's type:

    𝜙₁(𝑥) ∧ 𝑥 ≥ 2 ∧ 𝜈 = 𝑥 − 1 ⇒ 𝜙₁(𝜈)                                         (1)
    𝜙₁(𝑥) ∧ 𝑥 ≥ 2 ∧ 𝜈 = 𝑥 − 2 ⇒ 𝜙₁(𝜈)                                         (2)
    𝜙₁(𝑥) ∧ 𝑥 ≥ 2 ∧ 𝜙₂(𝜈₁, 𝑥 − 1) ∧ 𝜙₂(𝜈₂, 𝑥 − 2) ∧ 𝜈 = 𝜈₁ + 𝜈₂ ⇒ 𝜙₂(𝜈, 𝑥)      (3)
    𝜙₁(𝑥) ∧ 𝑥 < 2 ∧ 𝜈 = 1 ⇒ 𝜙₂(𝜈, 𝑥)                                          (4)
    0 ≤ 𝜈 ⇒ 𝜙₁(𝜈)                                                             (5)

Clauses (1) and (2) model the data flow from x to the two recursive calls in the then branch of the conditional. Clauses (3) and (4) capture the constraints on the result value returned by fib in the then and else branch. Clause (5) captures an assumption that we make about the external calls to fib, namely, that these calls always pass non-negative values. We note that the inference algorithm performs a whole program analysis.
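As an executable illustration of how such clauses can be solved by monomial predicate abstraction (the approach discussed next), here is a toy Houdini-style loop over clauses (1)-(5): each unknown starts as the conjunction of all candidate atoms, and atoms falsified by a counterexample are dropped until a fixpoint is reached. This is our own sketch, not the implementation of any particular tool; exhaustive testing over a small integer range stands in for an SMT validity check, and the candidate set Q is the one used as an example in the text.

```python
# Toy monomial predicate abstraction (Houdini-style) for clauses (1)-(5).
R = range(-4, 9)   # finite test range standing in for an SMT solver

Q = {"0<=v": lambda v, x: 0 <= v,
     "0>v":  lambda v, x: 0 > v,
     "v<1":  lambda v, x: v < 1,
     "v>=1": lambda v, x: v >= 1}

def holds(cand, v, x=0):
    # A candidate is the conjunction of its remaining atoms.
    return all(Q[a](v, x) for a in cand)

def prune(cand, v, x=0):
    # Drop every atom falsified by the counterexample (v, x).
    return {a for a in cand if Q[a](v, x)}

phi1, phi2 = set(Q), set(Q)   # start from the strongest conjunctions
changed = True
while changed:
    changed = False
    for x in R:
        # (1), (2): x flows to the two recursive calls
        for d in (1, 2):
            if holds(phi1, x) and x >= 2 and not holds(phi1, x - d):
                phi1, changed = prune(phi1, x - d), True
        # (4): the else branch returns 1
        if holds(phi1, x) and x < 2 and not holds(phi2, 1, x):
            phi2, changed = prune(phi2, 1, x), True
        # (3): the then branch returns the sum of the recursive results
        for v1 in R:
            for v2 in R:
                if (holds(phi1, x) and x >= 2 and holds(phi2, v1, x - 1)
                        and holds(phi2, v2, x - 2)
                        and not holds(phi2, v1 + v2, x)):
                    phi2, changed = prune(phi2, v1 + v2, x), True
    # (5): external calls pass non-negative inputs
    for v in R:
        if 0 <= v and not holds(phi1, v):
            phi1, changed = prune(phi1, v), True

print(sorted(phi1), sorted(phi2))   # ['0<=v'] ['0<=v', 'v>=1']
```

The loop converges to 𝜙₁ = 0 ≤ 𝜈 and 𝜙₂ = 0 ≤ 𝜈 ∧ 𝜈 ≥ 1, i.e., the conjunction equivalent to 1 ≤ 𝜈 for the result type.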
Hence, when one analyzes a program fragment or individual function as in this case, one has to specify explicitly any assumptions made about the context.

The analysis then solves the obtained Horn clauses to derive the refinement predicates 𝜙𝑖. For instance, Liquid type inference uses monomial predicate abstraction for this purpose. That is, the analysis assumes a given set of atomic predicates 𝑄 = {𝑝₁(𝜈, 𝑥⃗₁), . . . , 𝑝𝑛(𝜈, 𝑥⃗𝑛)}, which are either provided by the programmer or derived from the program using heuristics, and then infers an assignment for each 𝜙𝑖 to a conjunction over 𝑄 such that all Horn clauses are valid. This can be done effectively and efficiently using the Houdini algorithm [Flanagan and Leino 2001; Lahiri and Qadeer 2009]. For example, if we choose 𝑄 = {0 ≤ 𝜈, 0 > 𝜈, 𝜈 < 1, 𝜈 ≥ 1}, then the final type inferred for function fib will be:

    𝑥 : {𝜈 : int | 0 ≤ 𝜈} → {𝜈 : int | 1 ≤ 𝜈}.

The meaning of this type is tied to the current program, or in this case the assumptions made about the context of the program fragment being analyzed. In particular, note that the analysis does not infer a more general type that would leave the input parameter x of fib unconstrained.

Now, suppose that the goal of the analysis is to verify that function fib is increasing, which can be done by inferring a refinement predicate 𝜙₂(𝜈, 𝑥) for the return type of fib that implies 𝑥 ≤ 𝜈. Note that 𝑥 ≤ 𝜈 itself is not inductive for the system of Horn clauses derived above because clause (3) does not hold for 𝑥 = 2, 𝜈₁ = 1, and 𝜈₂ = 0. However, if we strengthen 𝜙₂(𝜈, 𝑥) to 𝑥 ≤ 𝜈 ∧ 1 ≤ 𝜈, then it is inductive.

One issue with using predicate abstraction for inferring type refinements is that the analysis needs to guess in advance which auxiliary predicates will be needed, here 1 ≤ 𝜈. Existing tools based on this approach, such as DSolve [Rondon et al. 2008] and
Liquid Haskell [Vazou et al. 2018a], use heuristics for this purpose. However, these heuristics tend to be brittle. In fact, both tools fail to verify that fib is increasing if the predicate 1 ≤ 𝜈 is not explicitly provided by the user. Other tools such as R_Type [Champion et al. 2018a] are based on more complex analyses that use counterexamples to inductiveness to automatically infer the necessary auxiliary predicates. However, these tools no longer guarantee that the analysis terminates. Instead, our approach enables the use of expressive numerical abstract domains such as polyhedra to infer sufficiently precise refinement types in practice, without giving up on termination of the analysis or requiring user annotations (see § 8).

If the goal is to improve precision, one may of course ask why it is necessary to develop a new refinement type inference analysis from scratch. Is it not sufficient to improve the deployed Horn clause solvers, e.g. by using better abstract domains? Unfortunately, the answer is "no" [Unno et al. 2013]. The derived Horn clause system already signifies an abstraction of the program's semantics and, in general, entails an inherent loss of precision for the overall analysis.

To motivate this issue, consider Program 1:

    let apply f x = f x
    and g y = 2 * y
    and h y = -2 * y
    let main z =
      let v = if 0 <= z then (apply𝑖 g)𝑗 z else (apply𝑘 h)ℓ z in
      assert (0 <= v)

    Program 1

You may ignore the program location labels 𝑖, 𝑗, 𝑘, ℓ for now. Suppose that the goal is to verify that the assert statement in the last line is safe.
The templates for the refinement types of the top-level functions are as follows:

    apply :: (𝑦 : {𝜈 : int | 𝜙₁(𝜈)} → {𝜈 : int | 𝜙₂(𝜈, 𝑦)}) → 𝑥 : {𝜈 : int | 𝜙₃(𝜈)} → {𝜈 : int | 𝜙₄(𝜈, 𝑥)}
    g :: 𝑦 : {𝜈 : int | 𝜙₅(𝜈)} → {𝜈 : int | 𝜙₆(𝜈, 𝑦)}
    h :: 𝑦 : {𝜈 : int | 𝜙₇(𝜈)} → {𝜈 : int | 𝜙₈(𝜈, 𝑦)}

Moreover, the key Horn clauses are:

    0 ≤ 𝑧 ∧ 𝜈 = 𝑧 ⇒ 𝜙₃(𝜈)        𝜙₅(𝑦) ∧ 𝜈 = 2𝑦 ⇒ 𝜙₆(𝜈, 𝑦)        𝜙₃(𝜈) ⇒ 𝜙₁(𝜈)
    0 ≤ 𝑧 ∧ 𝜙₁(𝜈) ⇒ 𝜙₅(𝜈)        0 ≤ 𝑧 ∧ 𝜙₆(𝜈, 𝑦) ⇒ 𝜙₂(𝜈, 𝑦)      𝜙₂(𝜈, 𝑥) ⇒ 𝜙₄(𝜈, 𝑥)
    0 > 𝑧 ∧ 𝜈 = 𝑧 ⇒ 𝜙₃(𝜈)        𝜙₇(𝑦) ∧ 𝜈 = −(2𝑦) ⇒ 𝜙₈(𝜈, 𝑦)
    0 > 𝑧 ∧ 𝜙₁(𝜈) ⇒ 𝜙₇(𝜈)        0 > 𝑧 ∧ 𝜙₈(𝜈, 𝑦) ⇒ 𝜙₂(𝜈, 𝑦)

Note that the least solution of these Horn clauses satisfies 𝜙₁(𝜈) = 𝜙₃(𝜈) = 𝜙₅(𝜈) = 𝜙₇(𝜈) = true and 𝜙₂(𝜈, 𝑥) = 𝜙₄(𝜈, 𝑥) = (𝜈 = 2𝑥 ∨ 𝜈 = −2𝑥). Specifically, 𝜙₄(𝜈, 𝑥) is too weak to conclude that the two calls to apply on line 3 always return non-negative integers. Hence, any analysis based on deriving a solution to this Horn clause system will fail to infer refinement predicates that are sufficiently strong to entail the safety of the assertion in main. The problem is that the generated Horn clauses do not distinguish between the two functions g and h that apply is called with. All existing refinement type inference tools that follow this approach of generating a context-insensitive Horn clause abstraction of the program therefore fail to verify Program 1.

To obtain a better understanding where existing refinement type inference algorithms lose precision, we take a fresh look at this problem through the lens of abstract interpretation.

We introduce a few notations and basic definitions that we use throughout the paper.
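The precision loss for Program 1 can be reproduced concretely. The following is our own toy model of the clauses: the unknowns are computed as sets of observed values over a small finite range of inputs z (the data flow here is acyclic, so a single pass in dependency order already yields the least solution).

```python
# Least solution of the context-insensitive clauses for Program 1 over a
# finite input range. Unary unknowns are sets of values; binary unknowns
# are sets of (result, argument) pairs.
Z = range(-3, 4)
phi3 = set(Z)                          # 0<=z ∧ ν=z ⇒ φ3(ν), and likewise for 0>z
phi1 = set(phi3)                       # apply's parameter x flows into f's input
phi5 = set(phi1)                       # g is bound to f in the then-branch
phi7 = set(phi1)                       # h is bound to f in the else-branch
phi6 = {(2 * y, y) for y in phi5}      # φ5(y) ∧ ν = 2y  ⇒ φ6(ν, y)
phi8 = {(-2 * y, y) for y in phi7}     # φ7(y) ∧ ν = -2y ⇒ φ8(ν, y)
phi2 = phi6 | phi8                     # g's and h's outputs merge into f's output
phi4 = set(phi2)                       # φ2(ν, x) ⇒ φ4(ν, x)

# The merged output relation admits ν = -2x even for x >= 0, so the
# assertion 0 <= v in main cannot be derived from this solution:
print(any(v < 0 for (v, x) in phi4 if x >= 0))   # → True
```

The culprit is visible in the line computing phi2: both branches' outputs flow into the single unknown for f's output, erasing which function was actually passed.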
Notation.
We often use meta-level let 𝑥 = 𝑡₁ in 𝑡₂ and conditional if 𝑡₀ then 𝑡₁ else 𝑡₂ constructs in mathematical definitions. We compress consecutive let bindings let 𝑥₁ = 𝑡₁ in . . . let 𝑥𝑛 = 𝑡𝑛 in 𝑡 as let 𝑥₁ = 𝑡₁; . . . ; 𝑥𝑛 = 𝑡𝑛 in 𝑡. We use capital lambda notation (Λ𝑥. . . .) for defining mathematical functions. For a function 𝑓 : 𝑋 → 𝑌, 𝑥 ∈ 𝑋, and 𝑦 ∈ 𝑌, we write 𝑓[𝑥 ↦→ 𝑦] to denote a function that maps 𝑥 to 𝑦 and otherwise agrees with 𝑓 on every element of 𝑋 \ {𝑥}. We use the notation 𝑓.𝑥 : 𝑦 instead of 𝑓[𝑥 ↦→ 𝑦] if 𝑓 is an environment mapping variables 𝑥 to their bindings 𝑦. For a set 𝑋 we denote its powerset by ℘(𝑋). For a relation 𝑅 ⊆ 𝑋 × 𝑌 over sets 𝑋, 𝑌 and a natural number 𝑛 > 0, we use Ṙ𝑛 to refer to the point-wise lifting of 𝑅 to a relation on 𝑛-tuples 𝑋𝑛 × 𝑌𝑛. That is, ⟨(𝑥₁, . . . , 𝑥𝑛), (𝑦₁, . . . , 𝑦𝑛)⟩ ∈ Ṙ𝑛 iff (𝑥𝑖, 𝑦𝑖) ∈ 𝑅 for all 1 ≤ 𝑖 ≤ 𝑛. Similarly, for any nonempty set 𝑍 we denote by Ṙ𝑍 the point-wise lifting of 𝑅 to a relation over (𝑍 → 𝑋) × (𝑍 → 𝑌). More precisely, if 𝑓₁ : 𝑍 → 𝑋 and 𝑓₂ : 𝑍 → 𝑌, then (𝑓₁, 𝑓₂) ∈ Ṙ𝑍 iff ∀𝑧 ∈ 𝑍. (𝑓₁(𝑧), 𝑓₂(𝑧)) ∈ 𝑅. Typically, we drop the subscripts from these lifted relations when they are clear from the context. For sets 𝑋, 𝑌 and a function 𝑑 : 𝑋 → ℘(𝑌), we use the notation Π𝑥 ∈ 𝑋. 𝑑(𝑥) to refer to the set {𝑓 : 𝑋 → 𝑌 | ∀𝑥 ∈ 𝑋. 𝑓(𝑥) ∈ 𝑑(𝑥)} of all dependent functions with respect to 𝑑. Similarly, for given sets 𝑋 and 𝑌 we use the notation Σ𝑥 ∈ 𝑋. 𝑑(𝑥) to refer to the set {⟨𝑥, 𝑦⟩ ∈ 𝑋 × 𝑌 | 𝑦 ∈ 𝑑(𝑥)} of all dependent pairs with respect to 𝑑. We use the operators 𝜋₁ and 𝜋₂ to project to the first, resp., second pair component.

Abstract interpretation.
A partially ordered set (poset) is a pair (𝐿, ⊑) consisting of a set 𝐿 and a binary relation ⊑ on 𝐿 that is reflexive, transitive, and antisymmetric. Let (𝐿₁, ⊑₁) and (𝐿₂, ⊑₂) be two posets. We say that two functions 𝛼 ∈ 𝐿₁ → 𝐿₂ and 𝛾 ∈ 𝐿₂ → 𝐿₁ form a Galois connection iff

    ∀𝑥 ∈ 𝐿₁, ∀𝑦 ∈ 𝐿₂. 𝛼(𝑥) ⊑₂ 𝑦 ⟺ 𝑥 ⊑₁ 𝛾(𝑦).

We call 𝐿₁ the concrete domain and 𝐿₂ the abstract domain of the Galois connection. Similarly, 𝛼 is called the abstraction function (or left adjoint) and 𝛾 the concretization function (or right adjoint). Intuitively, 𝛼(𝑥) is the most precise approximation of 𝑥 ∈ 𝐿₁ in 𝐿₂, while 𝛾(𝑦) is the least precise element of 𝐿₁ that can be approximated by 𝑦 ∈ 𝐿₂.

A complete lattice is a tuple ⟨𝐿, ⊑, ⊥, ⊤, ⊔, ⊓⟩ where (𝐿, ⊑) is a poset such that for any 𝑋 ⊆ 𝐿, the least upper bound ⊔𝑋 (join) and greatest lower bound ⊓𝑋 (meet) with respect to ⊑ exist. In particular, we have ⊥ = ⊓𝐿 and ⊤ = ⊔𝐿. We often identify a complete lattice with its carrier set 𝐿. Let (𝐿₁, ⊑₁, ⊥₁, ⊤₁, ⊔₁, ⊓₁) and (𝐿₂, ⊑₂, ⊥₂, ⊤₂, ⊔₂, ⊓₂) be two complete lattices and let (𝛼, 𝛾) be a Galois connection between 𝐿₁ and 𝐿₂. Each of these functions uniquely determines the other:

    𝛼(𝑥) = ⊓₂{𝑦 ∈ 𝐿₂ | 𝑥 ⊑₁ 𝛾(𝑦)}        𝛾(𝑦) = ⊔₁{𝑥 ∈ 𝐿₁ | 𝛼(𝑥) ⊑₂ 𝑦}

Also, 𝛼 is a complete join-morphism, ∀𝑆 ⊆ 𝐿₁. 𝛼(⊔₁𝑆) = ⊔₂{𝛼(𝑥) | 𝑥 ∈ 𝑆} and 𝛼(⊥₁) = ⊥₂, and 𝛾 is a complete meet-morphism, ∀𝑆 ⊆ 𝐿₂. 𝛾(⊓₂𝑆) = ⊓₁{𝛾(𝑥) | 𝑥 ∈ 𝑆} and 𝛾(⊤₂) = ⊤₁. A similar result holds in the other direction: if 𝛼 is a complete join-morphism and 𝛾 is defined as above, then (𝛼, 𝛾) is a Galois connection between 𝐿₁ and 𝐿₂. Likewise, if 𝛾 is a complete meet-morphism and 𝛼 is defined as above, then (𝛼, 𝛾) is a Galois connection [Cousot and Cousot 1979].

We now introduce our parametric data flow refinement type system. The purpose of this section is primarily to build intuition. The remainder of the paper will then formally construct the type system as an abstract interpretation of our new data flow semantics.
As a reward, we will also obtain a parametric algorithm for inferring data flow refinement types that is sound by construction.
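Before diving in, the Galois connection condition from the preliminaries can be checked exhaustively on a small example. The following is a toy sketch of our own (not a domain used later in the paper): finite sets of integers abstracted by intervals.

```python
from itertools import chain, combinations

# Concrete domain: subsets of a small universe, ordered by inclusion.
# Abstract domain: intervals (lo, hi) plus a bottom element, ordered by inclusion.
U = list(range(5))
BOT = None

def alpha(s):                 # most precise interval covering s
    return BOT if not s else (min(s), max(s))

def gamma(a):                 # all concrete values an interval represents
    return set() if a is BOT else set(range(a[0], a[1] + 1))

def leq(a, b):                # the abstract order on intervals
    return a is BOT or (b is not BOT and b[0] <= a[0] and a[1] <= b[1])

subsets = [set(c) for c in
           chain.from_iterable(combinations(U, k) for k in range(len(U) + 1))]
intervals = [BOT] + [(lo, hi) for lo in U for hi in U if lo <= hi]

# Check the adjunction:  alpha(s) ⊑ a  ⟺  s ⊆ gamma(a)  for all s and a.
ok = all(leq(alpha(s), a) == (s <= gamma(a)) for s in subsets for a in intervals)
print(ok)   # → True
```

Since alpha here is also a complete join-morphism, the result quoted above guarantees that gamma could equally have been recovered from alpha alone.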
Language.
Our formal presentation considers a simple untyped lambda calculus:

    𝑒 ∈ Exp ::= 𝑐 | 𝑥 | 𝑒 𝑒 | 𝜆𝑥. 𝑒

The language supports constants 𝑐 ∈ Cons (e.g. integers and Booleans), variables 𝑥 ∈ Var, lambda abstractions, and function applications. An expression 𝑒 is closed if all using occurrences of variables within 𝑒 are bound in lambda abstractions 𝜆𝑥. 𝑒. A program is a closed expression.

Let 𝑒 be an expression. Each subexpression of 𝑒 is uniquely annotated with a location drawn from the set Loc. We denote locations by ℓ, 𝑖, 𝑗 and use subscript notation to indicate the location identifying a (sub)expression as in (𝑒𝑖 𝑒𝑗)ℓ and (𝜆𝑥. 𝑒𝑖)ℓ. The location annotations are omitted whenever possible to avoid notational clutter. Variables are also locations, i.e. Var ⊆ Loc.

In our example programs we often use let constructs. Note that these can be expressed using lambda abstraction and function application as usual: let 𝑥 = 𝑒₁ in 𝑒₂ def= (𝜆𝑥. 𝑒₂) 𝑒₁.

Types.
Our data flow refinement type system takes two parameters: (1) a finite set of abstract stacks Ŝ, and (2) a (possibly infinite) complete lattice of basic refinement types ⟨Rᵗ, ⊑𝑏, ⊥𝑏, ⊤𝑏, ⊔𝑏, ⊓𝑏⟩. We will discuss the purpose of abstract stacks in a moment. A basic refinement type b ∈ Rᵗ comes equipped with an implicit scope 𝑋 ⊆ Var. Intuitively, b represents a relation between primitive constant values (e.g. integers) and other values bound to the variables in 𝑋. The partial order ⊑𝑏 is an abstraction of subset inclusion on such relations. We will make this intuition formally precise later. For 𝑋 ⊆ Var, we denote by Rᵗ_𝑋 the set of all basic refinement types with scope 𝑋.

Example 4.1.
Let 𝜙(𝑋) stand for a convex linear constraint over (integer) variables in 𝑋 ∪ {𝜈}. Then define the set of basic refinement types

    Rˡⁱᵃ_𝑋 ::= ⊥𝑏 | ⊤𝑏 | {𝜈 : int | 𝜙(𝑋)}

An example of a basic type in Rˡⁱᵃ with scope {𝑥} is {𝜈 : int | 𝑥 ≤ 𝜈 ∧ 1 ≤ 𝜈}. The order ⊑𝑏 on Rˡⁱᵃ is obtained by lifting the entailment ordering on linear constraints to Rˡⁱᵃ in the expected way. If we identify linear constraints up to equivalence, then ⊑𝑏 induces a complete lattice on Rˡⁱᵃ.

The basic refinement types are extended to data flow refinement types as follows:

    𝑡 ∈ Vᵗ_𝑋 ::= ⊥ᵗ | ⊤ᵗ | b | 𝑥:𝒕        b ∈ Rᵗ_𝑋
    𝑥:𝒕 ∈ Tᵗ_𝑋 def= Σ𝑥 ∈ Var \ 𝑋. Ŝ → Vᵗ_𝑋 × Vᵗ_(𝑋 ∪ {𝑥})

A data flow refinement type 𝑡 ∈ Vᵗ_𝑋 also has an implicit scope 𝑋. We denote by Vᵗ = ⋃_(𝑋 ⊆ Var) Vᵗ_𝑋 the set of all such types for all scopes. There are four kinds of types. First, the type ⊥ᵗ should be interpreted as unreachable or nontermination and the type ⊤ᵗ stands for a type error. We introduce these types explicitly so that we can later endow Vᵗ with a partial order to form a complete lattice. In addition to ⊥ᵗ and ⊤ᵗ, we have basic refinement types b and (dependent) function types 𝑥:𝒕. The latter are pairs consisting of a dependency variable 𝑥 ∈ Var \ 𝑋 and a type table 𝒕 that maps each abstract stack 𝑆̂ ∈ Ŝ to a pair of types 𝒕(𝑆̂) = ⟨𝑡ᵢ, 𝑡ₒ⟩. That is, 𝑥:𝒕 can be understood as capturing a separate dependent function type 𝑥 : 𝑡ᵢ → 𝑡ₒ per abstract stack 𝑆̂. Abstract stacks represent abstractions of concrete call stacks, enabling function types to case split on different calling contexts of the represented functions. In this sense, function types resemble intersection types with ad hoc polymorphism on contexts. Note that the scope of the output type 𝑡ₒ includes the dependency variable 𝑥, thus enabling the output type to capture input/output dependencies. Let 𝑡 = 𝑥:𝒕 be a function type and 𝑆̂ ∈ Ŝ such that 𝒕(𝑆̂) = ⟨𝑡ᵢ, 𝑡ₒ⟩ for some 𝑡ᵢ and 𝑡ₒ.
We denote 𝑡ᵢ by 𝒕(𝑆̂)in and 𝑡ₒ by 𝒕(𝑆̂)out. We say that 𝑡 has been called at 𝑆̂, denoted 𝑆̂ ∈ 𝑡, if 𝒕(𝑆̂)in is not ⊥ᵗ. We denote by 𝒕⊥ the empty type table that maps every abstract stack to the pair ⟨⊥ᵗ, ⊥ᵗ⟩ and write [𝑆̂ ⊳ 𝑡ᵢ → 𝑡ₒ] as a shorthand for the singleton type table 𝒕⊥[𝑆̂ ↦→ ⟨𝑡ᵢ, 𝑡ₒ⟩]. We extend this notation to tables obtained by explicit enumeration of table entries. Finally, we denote by 𝑡|𝑆̂ the function type 𝑥:[𝑆̂ ⊳ 𝑡ᵢ → 𝑡ₒ] obtained from 𝑡 by restricting it to the singleton table for 𝑆̂.

Example 4.2.
Consider again the set Rˡⁱᵃ from Example 4.1. In what follows, we write int for {𝜈 : int | true} and {𝜈 | 𝜙(𝑋)} for {𝜈 : int | 𝜙(𝑋)}. Let further Ŝ₀ = {𝜖} be a trivial set of abstract stacks. We then instantiate Vᵗ with Rˡⁱᵃ and Ŝ₀. Since tables only have a single entry, we use the more familiar notation 𝑥 : 𝑡ᵢ → 𝑡ₒ for function types, instead of 𝑥 : [𝜖 ⊳ 𝑡ᵢ → 𝑡ₒ]. We can then represent the type of fib in § 2.1 by the data flow refinement type

    𝑥 : {𝜈 | 0 ≤ 𝜈} → {𝜈 | 𝑥 ≤ 𝜈 ∧ 1 ≤ 𝜈}.

The most precise type that we can infer for function apply in Program 1 is

    𝑓 : (𝑦 : int → int) → 𝑥 : int → int.

Here, the variables 𝑓, 𝑥, and 𝑦 refer to the corresponding parameters of the function apply, respectively, the functions g and h that are passed to parameter 𝑓 of apply. Now, define the set of abstract stacks Ŝ₁ = Loc ∪ {𝜖}. Intuitively, we can use the elements of Ŝ₁ to distinguish table entries in function types based on the program locations where the represented functions are called. Instantiating our set of types Vᵗ with Rˡⁱᵃ and Ŝ₁, we can infer a more precise type for function apply in Program 1:

    𝑓 : [𝑖 ⊳ 𝑦 : [𝑗 ⊳ {𝜈 | 0 ≤ 𝜈} → {𝜈 | 𝜈 = 2𝑦}] → 𝑥 : [𝑗 ⊳ {𝜈 | 0 ≤ 𝜈} → {𝜈 | 𝜈 = 2𝑥}],
         𝑘 ⊳ 𝑦 : [ℓ ⊳ {𝜈 | 0 > 𝜈} → {𝜈 | 𝜈 = −2𝑦}] → 𝑥 : [ℓ ⊳ {𝜈 | 0 > 𝜈} → {𝜈 | 𝜈 = −2𝑥}]].

Note that the type provides sufficient information to distinguish between the calls to apply with functions g and h at call sites 𝑖 and 𝑘, respectively. This information is sufficient to guarantee the correctness of the assertion on line 4.

Typing environments and operations on types.
Before we can define the typing rules, we first need to introduce a few operations for constructing and manipulating types. We here only provide the intuition for these operations through examples. In § 7 we will then explain how to define these operations in terms of a few simple primitive operations provided by the domain Rᵗ. First, the (basic) refinement type abstracting a single constant 𝑐 ∈ Cons is denoted by [𝜈 = 𝑐]ᵗ. For instance, for our basic type domain Rˡⁱᵃ from Example 4.1 and an integer constant 𝑐 ∈ ℤ, we define [𝜈 = 𝑐]ᵗ = {𝜈 | 𝜈 = 𝑐}. Next, given a type 𝑡 over scope 𝑋 and a type 𝑡′ over scope 𝑋′ ⊆ 𝑋 \ {𝑥}, we denote by 𝑡[𝑥 ← 𝑡′] the type obtained from 𝑡 by strengthening the relation to 𝑥 with the information provided by type 𝑡′. Returning to Example 4.1, for two basic types b = {𝜈 | 𝜙(𝑋)} and b′ = {𝜈 | 𝜙′(𝑋′)}, we have b[𝑥 ← b′] = {𝜈 | 𝜙(𝑋) ∧ 𝜙′(𝑋′)[𝑥/𝜈]}. Finally, for a variable 𝑥 ∈ 𝑋 and 𝑡 ∈ Vᵗ_𝑋, we assume an operation 𝑡[𝜈 = 𝑥]ᵗ that strengthens 𝑡 by enforcing equality between the value bound to 𝑥 and the value represented by 𝑡. For instance, for a base type b = {𝜈 | 𝜙(𝑋)} from Example 4.1 we have b[𝜈 = 𝑥]ᵗ = {𝜈 | 𝜙(𝑋) ∧ 𝜈 = 𝑥}.

A typing environment for scope 𝑋 is a function Γᵗ ∈ (Π𝑥 ∈ 𝑋. Vᵗ_(𝑋 \ {𝑥})). We lift 𝑡[𝑥 ← 𝑡′] to an operation 𝑡[Γᵗ] that strengthens 𝑡 with the constraints on variables in 𝑋 imposed by the types bound to these variables in Γᵗ.

We additionally assume that abstract stacks Ŝ come equipped with an abstract concatenation operation ·̂ : Loc × Ŝ → Ŝ that prepends a call site location 𝑖 onto an abstract stack 𝑆̂, denoted 𝑖 ·̂ 𝑆̂. For instance, consider again the sets of abstract stacks Ŝ₀ and Ŝ₁ introduced in Example 4.2. We define ·̂ on 𝑆̂ ∈ Ŝ₀ as 𝑖 ·̂ 𝑆̂ = 𝜖 and on 𝑆̂ ∈ Ŝ₁ we define it as 𝑖 ·̂ 𝑆̂ = 𝑖. The general specification of ·̂ is given in § 6.2.

Typing rules.
Typing judgements take the form Γᵗ, 𝑆̂ ⊢ 𝑒 : 𝑡 and rely on the subtype relation 𝑡 <: 𝑡′ defined in Fig. 1. The rule s-bot states that ⊥ᵗ is a subtype of all other types except ⊤ᵗ. Since ⊤ᵗ denotes a type error, the rules must ensure that we do not have 𝑡 <: ⊤ᵗ for any type 𝑡. The rule s-base defines subtyping on basic refinement types, which simply defers to the partial order ⊑𝑏 on Rᵗ. Finally, the rule s-fun is reminiscent of the familiar contravariant subtyping rule for dependent function types, except that it quantifies over all entries in the type tables. In § 7.2, we will establish a formal correspondence between subtyping and the propagation of values along data flow paths.

    s-bot:  ⊥ᵗ <: 𝑡  if 𝑡 ≠ ⊤ᵗ
    s-base: b₁ <: b₂  if b₁ ⊑𝑏 b₂
    s-fun:  𝑥:𝒕₁ <: 𝑥:𝒕₂  if for all 𝑆̂ ∈ Ŝ, writing 𝒕₁(𝑆̂) = ⟨𝑡ᵢ₁, 𝑡ₒ₁⟩ and 𝒕₂(𝑆̂) = ⟨𝑡ᵢ₂, 𝑡ₒ₂⟩,
            we have 𝑡ᵢ₂ <: 𝑡ᵢ₁ and 𝑡ₒ₁[𝑥 ← 𝑡ᵢ₂] <: 𝑡ₒ₂[𝑥 ← 𝑡ᵢ₂]

    Fig. 1. Data flow refinement subtyping rules

The rules defining the typing relation Γᵗ, 𝑆̂ ⊢ 𝑒 : 𝑡 are shown in Fig. 2. We implicitly require that all free variables of 𝑒 are in the scope of Γᵗ.

    t-const: Γᵗ, 𝑆̂ ⊢ 𝑐 : 𝑡  if  [𝜈 = 𝑐]ᵗ[Γᵗ] <: 𝑡
    t-var:   Γᵗ, 𝑆̂ ⊢ 𝑥 : 𝑡  if  Γᵗ(𝑥)[𝜈 = 𝑥]ᵗ[Γᵗ] <: 𝑡[𝜈 = 𝑥]ᵗ[Γᵗ]
    t-app:   Γᵗ, 𝑆̂ ⊢ 𝑒𝑖 𝑒𝑗 : 𝑡  if  Γᵗ, 𝑆̂ ⊢ 𝑒𝑖 : 𝑡𝑖  and  Γᵗ, 𝑆̂ ⊢ 𝑒𝑗 : 𝑡𝑗  and  𝑡𝑖 <: 𝑥:[𝑖 ·̂ 𝑆̂ ⊳ 𝑡𝑗 → 𝑡]
    t-abs:   Γᵗ, 𝑆̂ ⊢ 𝜆𝑥. 𝑒𝑖 : 𝑡  if for all 𝑆̂′ ∈ 𝑡, writing Γᵗ𝑖 = Γᵗ.𝑥 : 𝑡𝑥,
             we have Γᵗ𝑖, 𝑆̂′ ⊢ 𝑒𝑖 : 𝑡𝑖  and  𝑥:[𝑆̂′ ⊳ 𝑡𝑥 → 𝑡𝑖] <: 𝑡|𝑆̂′

    Fig. 2. Data flow refinement typing rules

The rule t-const for typing constants requires that [𝜈 = 𝑐]ᵗ is a subtype of the type 𝑡, after strengthening it with all constraints on the variables in scope obtained from Γᵗ. That is, we push all environmental assumptions into the types. This way, subtyping can be defined without tracking explicit typing environments. We note that this formalization does not preclude an implementation of basic refinement types that tracks typing environments explicitly. The rule t-var for typing variable expressions 𝑥 is similar to the rule t-const. That is, we require that the type Γᵗ(𝑥) bound to 𝑥 is a subtype of 𝑡, modulo strengthening with the equality constraint 𝜈 = 𝑥 and all environmental constraints. Note that we here strengthen both sides of the subtype relation, which is necessary for completeness of the rules, due to the bidirectional nature of subtyping for function types.

The rule t-app for typing function applications 𝑒𝑖 𝑒𝑗 requires that the type 𝑡𝑖 of 𝑒𝑖 must be a subtype of the function type 𝑥:[𝑖 ·̂ 𝑆̂ ⊳ 𝑡𝑗 → 𝑡] where 𝑡𝑗 is the type of the argument expression 𝑒𝑗 and 𝑡 is the result type of the function application. Note that the rule extends the abstract stack 𝑆̂ with the call site location 𝑖 identifying 𝑒𝑖. The subtype relation then forces 𝑡𝑖 to have an appropriate entry for the abstract call stack 𝑖 ·̂ 𝑆̂. The rule t-abs for typing lambda abstractions is as usual, except that it universally quantifies over all abstract stacks 𝑆̂′ at which 𝑡 has been called. The side condition ∀𝑆̂′ ∈ 𝑡 implicitly constrains 𝑡 to be a function type.

Our goal is to construct our type system from a concrete semantics of functional programs, following the usual calculational approach taken in abstract interpretation. This imposes some constraints on our development. In particular, given that the typing rules are defined by structural recursion over the syntax of the evaluated expression, the same should be true for the concrete semantics that we take as the starting point of our construction. This requirement rules out standard operational semantics for higher-order programs (e.g. based on function closures) because they evaluate function bodies at call sites rather than definition sites, making these semantics nonstructural.
A more natural choice is a standard denotational semantics (e.g. one that interprets lambda terms by mathematical functions). However, the problem with denotational semantics is that it is inherently compositional; functions are given meaning irrespective of the context in which they appear. As we have discussed in § 2, the function types inferred by algorithms such as Liquid types only consider the inputs to functions that are observed in the program under consideration. Denotational semantics are therefore ill-suited for capturing the program properties abstracted by these type systems.

Hence, we introduce a new data flow refinement type semantics. Like standard denotational semantics, it is fully structural in the program syntax but without being compositional. That is, it captures the intuition behind refinement type inference algorithms that view a function value as a table that records all inputs the function will receive from this point onwards as it continues to flow through the program.
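This table intuition can be conveyed with a small wrapper. The following is our own toy illustration, not the formal semantics: call stacks are collapsed to bare call-site labels, and a later call at the same site simply overwrites the earlier entry instead of being joined.

```python
# View a function value as a table of observed calls: each call records
# its (input, output) pair under the call site at which it occurred.
def as_table(f):
    table = {}
    def wrapped(arg, site):
        out = f(arg)
        table[site] = (arg, out)   # the real semantics joins entries; we overwrite
        return out
    return wrapped, table

g, g_table = as_table(lambda y: 2 * y)
g(3, "j")                # g called at site j
g(-1, "l")               # g called at site l
print(g_table)           # {'j': (3, 6), 'l': (-1, -2)}
```

The table grows as the wrapped function flows to new call sites, which is exactly the non-compositional aspect the data flow semantics is designed to capture.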
We start with the semantic domains used for giving meaning to expressions 𝑒 ∈ Exp:

    𝑛 ∈ N def= N𝑒 ∪ N𝑥      N𝑒 def= Loc × E      N𝑥 def= Var × E × S
    𝑆 ∈ S def= Loc*          𝐸 ∈ E def= Var ⇀fin N𝑥      𝑀 ∈ M def= N → V
    𝑣 ∈ V ::= ⊥ | ⊤ | 𝑐 | 𝒗      𝒗 ∈ T def= S → V × V
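These mutually recursive domains admit a direct, lightweight encoding. The sketch below is our own illustration (all names are assumptions): nodes and environments are nested tuples, and environments are kept as sorted tuples of bindings so that nodes stay hashable and can serve as keys of an execution map.

```python
# Expression nodes ℓ⋄E, variable nodes x⋄E⋄S, environments, and stacks as tuples.
def enode(loc, env):      return ("enode", loc, env)
def vnode(x, env, stack): return ("vnode", x, env, stack)

EMPTY = ()                          # the empty environment ε (base of the recursion)
def bind(env, x, n):                # E.x : n, returned as a sorted tuple of bindings
    return tuple(sorted(dict(env, **{x: n}).items()))

n_x = vnode("x", EMPTY, ("i",))     # x bound at a call whose site stack is i
E1  = bind(EMPTY, "x", n_x)
n_e = enode("k", E1)                # evaluating the subexpression at k under E1

M = {n_x: 42, n_e: 42}              # a fragment of an execution map: nodes → values
print(M[n_e])                       # 42
```

Because environments map variables to variable nodes (rather than to values directly), two bindings of the same variable under different call site stacks yield distinct nodes, which is what lets the execution map keep their values apart.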
Nodes, stacks, and environments.
Every intermediate point of a program's execution is uniquely identified by an (execution) node 𝑛 ∈ N, a concept which we adapt from [Jagannathan and Weeks 1995]. We distinguish expression nodes N𝑒 and variable nodes N𝑥. An expression node ⟨ℓ, 𝐸⟩, denoted ℓ ⋄ 𝐸, captures the execution point where a subexpression 𝑒ℓ is evaluated in the environment 𝐸. An environment 𝐸 is a (finite) partial map binding variables to variable nodes. A variable node ⟨𝑥, 𝐸, 𝑆⟩, denoted 𝑥 ⋄ 𝐸 ⋄ 𝑆, is created at each execution point where an argument value is bound to a formal parameter 𝑥 of a function at a call site. Here, 𝐸 is the environment at the point where the function is defined and 𝑆 is the call site stack of the variable node. Note that we define nodes and environments using mutual recursion where the base cases are defined using the empty environment 𝜖. The call site stack captures the sequence of program locations of all pending function calls before this variable node was created. That is, intuitively, 𝑆 can be thought of as recording the return addresses of these pending calls. We write ℓ · 𝑆 to denote the stack obtained from 𝑆 by prepending ℓ. Call site stacks are used to uniquely identify each variable binding.

We explain the role of expression and variable nodes in more detail later. For any node 𝑛, we denote by loc(𝑛) the location of 𝑛 and by env(𝑛) its environment. If 𝑛 is a variable node, we denote its stack component by stack(𝑛). A pair ⟨𝑒, 𝐸⟩ is called well-formed if 𝐸(𝑥) is defined for every variable 𝑥 that occurs free in 𝑒.

Values and execution maps.
Similar to our data flow refinement types, there are four kinds of (data flow) values v ∈ V. First, every constant c is also a value. The value ⊥ stands for nontermination or unreachability of a node, and the value ⊤ models execution errors. Functions are represented by tables. A table 𝒗 maintains an input/output value pair 𝒗(S) = ⟨v_i, v_o⟩ for each call site stack S (call site stacks are referred to as contours in [Jagannathan and Weeks 1995]). We adopt notation for concrete function tables similar to that introduced for type tables in § 4. In particular, we denote by 𝒗_⊥ the table that maps every call site stack to the pair ⟨⊥, ⊥⟩. We say that a value v is safe, denoted safe(v), if ⊤ does not occur anywhere in v, i.e., v ≠ ⊤ and, if v ∈ T, then for all S, both components of v(S) are safe.

The data flow semantics computes execution maps M ∈ M, which map nodes to values. We write M_⊥ (resp. M_⊤) for the execution map that assigns ⊥ (resp. ⊤) to every node.

As a precursor to defining the data flow semantics, we define a computational order ⊑ on values. In this order, ⊥ is the smallest element, ⊤ is the largest element, and tables are ordered recursively on the pairs of values for each call site:

  v₁ ⊑ v₂ ⟺ v₁ = ⊥ ∨ v₂ = ⊤ ∨ (v₁, v₂ ∈ Cons ∧ v₁ = v₂) ∨ (v₁, v₂ ∈ T ∧ ∀S. v₁(S) ⊑̇ v₂(S))

Defining the error value as the largest element of the order is nonessential but simplifies the presentation. This definition ensures that least upper bounds (lubs) of arbitrary sets of values exist, denoted by the join operator ⊔. In fact, ⊑ is a partial order that induces a complete lattice on values, which can be lifted point-wise to execution maps.

[Figure: Program 2, let id x = x^k in let u = (id^q 1^f)^g in (id^a 2^b)^c, shown alongside its location-annotated form ((λid. (λu. (id^a 2^b)^c)^d (id^q 1^f)^g)^h (λx. x^k)^o)^p and its execution map (partially reconstructed), e.g., id, o ↦ [q ⊳ 1 → 1, a ⊳ 2 → 2]; q ↦ [q ⊳ 1 → 1]; a ↦ [a ⊳ 2 → 2]; u, f, g, x^q, k^q ↦ 1. Figure: fixpoint iterates of the data flow semantics for Program 2.]

Example 5.1.
Let us provide intuition for the execution maps through an example. To this end, consider Program 2. The middle shows the corresponding expression in our simple language, with each subexpression annotated with its unique location. E.g., the occurrence of x on line 1 is annotated with location k. The program's execution map is given on the right. We abbreviate call stacks occurring in table entries by the last location pushed onto the stack (e.g., writing just a instead of adh). This simplification preserves the uniqueness of call stacks for this specific program. We similarly denote a node just by its location if this already uniquely identifies the node. During the execution of Program 2, the lambda abstraction at location o is called twice whereas all other functions are called once. Due to the two calls to the function (id) at o, the variable x is bound twice and the subexpression at location k is also evaluated two times. This is reflected in the execution map by entries for two distinct variable nodes associated with x and two distinct expression nodes associated with k. We use superscript notation to indicate the environment associated with these nodes. For instance, k^q is the expression node that records the value obtained from the subexpression at location k when it is evaluated in the environment binding x to the variable node x^q. In turn, x^q records the binding of x for the function call at location q. The cumulative entry u, f, g, x^q, k^q ↦ 1 in the execution map indicates that the result computed at nodes u, f, g, and k^q is 1, respectively, that x is bound to 1 for the call at location q. Some of the nodes are mapped to function values. The function id is represented by the table [q ⊳ 1 → 1, a ⊳ 2 → 2] that stores input-output values for the two calls to id at q and a. The nodes corresponding to the two usages of id are also mapped to tables.
However, these tables only have a single entry each. Intuitively, id takes two separate data flow paths in the program, starting from its definition at o. For each node on these two paths, the associated table captures how id will be used at the nodes on any data flow path that continues from that point onward. The tables stored at nodes q and a thus only contain information about the input and output for the respective call site, whereas the table stored at id captures both call sites because the node for the definition of id occurs on both paths. Some additional examples involving higher-order functions and recursion can be found in § A. We define the data flow semantics of a higher-order program e formally as the least fixpoint of a concrete transformer, step, on execution maps.

Concrete Transformer.
The idea is that we start with the map M_⊥ and then use step to consecutively update the map with new values as more and more nodes are reached during the execution of e. The signature of step is as follows:

  step : Exp → E × S → M → V × M
It takes an expression e, an environment E, and a stack S, and returns a transformer step⟦e^ℓ⟧(E, S) : M → V × M on execution maps. Given an input map M, the transformer step⟦e^ℓ⟧(E, S) returns the new value v' computed at node ℓ⋄E together with the updated execution map M'. That is, we always have M'(ℓ⋄E) = v'. We could have defined the transformer so that it returns only the updated map M', but returning v' in addition yields a more concise definition: observe that M → V × M is the type of the computation of a state monad [Wadler 1990]. We exploit this observation and define step using monadic composition of primitive state transformers. This allows us to hide the stateful nature of the definition and make it easier to see the connection to the type system later.

The primitive state transformers and composition operations are defined in Fig. 3. For instance, the transformer !n reads the value at node n in the current execution map M and returns that value together with the unchanged map. Similarly, the transformer n := v updates the entry at node n in the current map M by taking the join of the current value at n with v, returning the obtained new value v' and the updated map. We compress a sequence of update operations n₁ := v₁, ..., n_m := v_m by using the shorter notation n₁, ..., n_m := v₁, ..., v_m to reduce clutter. We point out that the result of this sequenced update operation is the result of the last update n_m := v_m. The operation bind(F, G) defines the usual composition of stateful computations F and G in the state monad, where F ∈ M → α × M and G ∈ α → M → β × M for some α and β. Note that the composition is short-circuiting in the case where the intermediate value u produced by F is ⊤ (i.e., an error occurred). We use Haskell-style monad comprehension syntax for applications of bind at the meta-level. That is, we write do x ← F; G for bind(F, Λx. G). We similarly write do x ← F if P; G for bind(F, Λx. if P then G else return ⊥), and we shorten do x ← F; G to just do F; G in cases where x does not occur free in G. Moreover, we write do x₁ ← F₁; ...; x_m ← F_m; G for the comprehension sequence do x₁ ← F₁; (...; (do x_m ← F_m; G)...). We also freely mix the monad comprehension syntax with standard let binding syntax and omit the semicolons whenever this causes no confusion.

The definition of step⟦e⟧(E, S) is given in Fig. 4 using induction over the structure of e. As we discussed earlier, the structural definition of the transformer enables an easier formal connection to the data flow refinement typing rules. Note that in the definition we implicitly assume that ⟨e, E⟩ is well-formed. We discuss the cases of the definition one at a time, using Program 2 as our running example. Figure 3 shows the fixpoint iterates of step⟦e⟧(ϵ, ϵ) starting from the execution map M_⊥, where e is Program 2 and ϵ refers to both the empty environment and empty stack. For each iterate (i), we only show the entries in the execution map that change in that iteration. We will refer to this figure throughout our discussion below.

Constant e = c^ℓ. Here, we simply set the current node n to the join of its current value M(n) and the value c. For example, in Fig. 3, when execution reaches the subexpression at location f in iteration ( ), the corresponding entry for the node n identified by f is updated to M(n) ⊔ 1 = ⊥ ⊔ 1 = 1.

Variable e = x^ℓ. This case implements the data flow propagation between the variable node n_x binding x and the current expression node n where x is used. This is realized using the propagation function ⋉. Let v_x = Γ(x) = M(n_x) and v = M(n) be the current values stored at the two nodes in M. The function ⋉ takes these values as input and propagates information between them, returning two new values v'_x and v' which are then stored back into M.
The propagation function is defined in Fig. 5 and works as follows. If v_x is a constant or the error value and v is still ⊥, then we simply propagate v_x forward, replacing v and leaving v_x unchanged. The interesting cases are when we propagate information between tables. The idea is that inputs in a table v flow backward to v_x, whereas outputs for these inputs flow forward from v_x to v. For example, consider the evaluation of the occurrence of variable id at location a in step ( ) of Fig. 3. Here, the expression node n is identified by a and the variable node n_x by id. Moreover, we have v = [a ⊳ 2 → ⊥] from step ( ) and v_x = [q ⊳ 1 → 1] from step ( ). We then obtain

  v_x ⋉ v = [q ⊳ 1 → 1] ⋉ [a ⊳ 2 → ⊥] = ⟨[q ⊳ 1 → 1, a ⊳ 2 → ⊥], [a ⊳ 2 → ⊥]⟩

That is, the propagation causes the entry in the execution map for the node identified by id to be updated to [q ⊳ 1 → 1, a ⊳ 2 → ⊥]. In general, if v_x is a table but v is still ⊥, we initialize v to the empty table 𝒗_⊥ and leave v_x unchanged (because we have not yet accumulated any inputs in v). If both v_x and v are tables, v_x ⋉ v propagates inputs and outputs as described above by calling ⋉ recursively for every call site S ∈ v. Note how the recursive call for the propagation of the inputs inverts the direction of the propagation. This has the effect that information about argument values propagates from the call site to the definition site of the function being called, as is the case for the input value at call site a in our example above. Conversely, output values are propagated in the other direction, from function definition sites to call sites. For example, in step ( ) of Fig. 3, the occurrence of id at location a is evaluated again, but now we have v_x = [q ⊳ 1 → 1, a ⊳ 2 → 2] from step ( ) whereas v is as before.
In this case, the propagation yields

  v_x ⋉ v = [q ⊳ 1 → 1, a ⊳ 2 → 2] ⋉ [a ⊳ 2 → ⊥] = ⟨[q ⊳ 1 → 1, a ⊳ 2 → 2], [a ⊳ 2 → 2]⟩

That is, the information about the output value 2 for the input value 2 has finally arrived at the call site a. As we shall see, the data flow propagation between tables closely relates to contravariant subtyping of function types.

  !n ≝ ΛM. ⟨M(n), M⟩
  n := v ≝ ΛM. let v' = M(n) ⊔ v in if safe(v') then ⟨v', M[n ↦ v']⟩ else ⟨⊤, M_⊤⟩
  env(E) ≝ ΛM. ⟨M ∘ E, M⟩
  assert(P) ≝ ΛM. if P then ⟨⊥, M⟩ else ⟨⊤, M_⊤⟩
  for 𝒗 do F ≝ ΛM. ⊔̇_{S ∈ 𝒗} F(S)(M)
  return v ≝ ΛM. ⟨v, M⟩
  bind(F, G) ≝ ΛM. let ⟨u, M'⟩ = F(M) in if u = ⊤ then ⟨⊤, M_⊤⟩ else G(u)(M')

Fig. 3. Primitive transformers on the state monad for computations over execution maps

  step⟦c^ℓ⟧(E, S) ≝ do n = ℓ⋄E; v' ← n := c; return v'

  step⟦x^ℓ⟧(E, S) ≝ do n = ℓ⋄E; v ← !n; n_x = E(x); Γ ← env(E)
                       v' ← n_x, n := Γ(x) ⋉ v
                       return v'

  step⟦(e_i e_j)^ℓ⟧(E, S) ≝ do n = ℓ⋄E; n_i = i⋄E; n_j = j⋄E; v ← !n
                               v_i ← step⟦e_i⟧(E, S) if v_i ≠ ⊥
                               assert(v_i ∈ T)
                               v_j ← step⟦e_j⟧(E, S)
                               ⟨v'_i, [i·S ⊳ v'_j → v']⟩ = v_i ⋉ [i·S ⊳ v_j → v]
                               v'' ← n_i, n_j, n := v'_i, v'_j, v'
                               return v''

  step⟦(λx.e_i)^ℓ⟧(E, S) ≝ do n = ℓ⋄E; 𝒗 ← n := 𝒗_⊥
                              𝒗' ← for 𝒗 do body(x, e_i, E, 𝒗)
                              𝒗'' ← n := 𝒗'
                              return 𝒗''

  body(x, e_i, E, 𝒗)(S') ≝ do n_x = x⋄E⋄S'; E_i = E.x:n_x; n_i = i⋄E_i
                              v_x ← !n_x
                              v_i ← step⟦e_i⟧(E_i, S')
                              ⟨[S' : v'_x → v'_i], 𝒗'⟩ = [S' : v_x → v_i] ⋉ 𝒗|_{S'}
                              n_x, n_i := v'_x, v'_i
                              return 𝒗'

Fig. 4. Transformer for the concrete data flow semantics

  𝒗₁ ⋉ 𝒗₂ ≝ let 𝒗' = ΛS. if S ∉ 𝒗₂ then ⟨𝒗₁(S), 𝒗₂(S)⟩ else
                           let ⟨v₁ᵢ, v₁ₒ⟩ = 𝒗₁(S); ⟨v₂ᵢ, v₂ₒ⟩ = 𝒗₂(S)
                               ⟨v₂ᵢ', v₁ᵢ'⟩ = v₂ᵢ ⋉ v₁ᵢ; ⟨v₁ₒ', v₂ₒ'⟩ = v₁ₒ ⋉ v₂ₒ
                           in ⟨⟨v₁ᵢ', v₁ₒ'⟩, ⟨v₂ᵢ', v₂ₒ'⟩⟩
             in ⟨ΛS. π₁(𝒗'(S)), ΛS. π₂(𝒗'(S))⟩
  𝒗 ⋉ ⊥ ≝ ⟨𝒗, 𝒗_⊥⟩
  𝒗 ⋉ ⊤ ≝ ⟨⊤, ⊤⟩
  v₁ ⋉ v₂ ≝ ⟨v₁, v₁ ⊔ v₂⟩ (otherwise)

Fig. 5. Value propagation in the concrete data flow semantics

Function application e = (e_i e_j)^ℓ. We first evaluate e_i to obtain the updated map and extract the new value v_i stored at the corresponding expression node n_i. If v_i is not a table, then we must be attempting an unsafe call, in which case the monadic operations return the error map M_⊤. If v_i is a table, we continue evaluation of e_j, obtaining the new value v_j at the associated expression node n_j. We next need to propagate the information between this call site and v_i. To this end, we use the return value v for the node n of e computed thus far and create a singleton table [S' ⊳ v_j → v] where S' = i·S is the extended call stack that will be used for the evaluation of this function call. We then propagate between v_i and this table to obtain the new value v'_i and table 𝒗'. Note that this propagation boils down to (1) the propagation between v_j and the input of v_i at S' and (2) the propagation between the output of v_i at S' and the return value v. The updated table 𝒗' hence contains the updated input and output values v'_j and v' at S'. All of these values are stored back into the execution map. Intuitively, v'_i contains the information that the corresponding function received an input coming from the call site identified by S'.
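The core of the propagation operator ⋉ can be sketched in Python (again our own simplified model: constants are ints, ⊥/⊤ are the strings 'bot'/'top', tables are dicts from call site stacks to input/output pairs, and stacks are abbreviated by single locations as in the running example). Inputs recorded in the second table flow backward into the first; outputs flow forward:

```python
BOT, TOP = 'bot', 'top'

def join(a, b):
    # lub on our toy values: equal constants join to themselves,
    # distinct constants to TOP
    if a == BOT: return b
    if b == BOT: return a
    return a if a == b else TOP

def prop(v1, v2):
    # v1 |x| v2: returns the pair (v1', v2') of updated values
    if isinstance(v1, dict):
        if v2 == BOT:
            return v1, {}              # table |x| bot: initialize empty table
        if v2 == TOP:
            return TOP, TOP            # table |x| top: error on both sides
        if not isinstance(v2, dict):
            return v1, TOP             # table joined with a constant is TOP
        r1, r2 = dict(v1), dict(v2)
        for s, (i2, o2) in v2.items():
            i1, o1 = v1.get(s, (BOT, BOT))
            ni2, ni1 = prop(i2, i1)    # inputs: direction inverted (backward)
            no1, no2 = prop(o1, o2)    # outputs: forward
            r1[s] = (ni1, no1)
            r2[s] = (ni2, no2)
        return r1, r2
    return v1, join(v1, v2)            # otherwise: <v1, v1 join v2>
```

With this, `prop({'q': (1, 1)}, {'a': (2, 'bot')})` reproduces the backward propagation of the input 2 from call site a to the definition site, as in the example above.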
This information is ultimately propagated back to the function definition where the call is actually evaluated. As an example, consider the evaluation of the function application at location g in step ( ) of Fig. 3. That is, node n is identified by g and we have i = q, j = f, S = h, and v = ⊥. Moreover, we initially have M(n_i) = ⊥ and M(n_j) = ⊥. The recursive evaluation of e_i will propagate the information that id is a table to location q, i.e., the recursive call to step returns v_i = 𝒗_⊥. Since v_i is a table, we proceed with the recursive evaluation of f, after which we obtain M(n_f) = v_j = 1. Next we compute

  v_i ⋉ [i·S ⊳ v_j → v] = 𝒗_⊥ ⋉ [qh ⊳ 1 → ⊥] = ⟨[qh ⊳ 1 → ⊥], [qh ⊳ 1 → ⊥]⟩

That is, M(n_i) is updated to v'_i = [qh ⊳ 1 → ⊥]. Note that we still have v' = ⊥ at this point. The final value at location g is obtained later, when this subexpression is reevaluated in step ( ).

Lambda abstraction e = (λx.e_i)^ℓ. We first extract the table 𝒗 computed for the function thus far. Then, for every call stack S' for which an input has already been back-propagated to 𝒗, we analyze the body by evaluating body(x, e_i, E, 𝒗)(S'), as follows. First, we create a variable node n_x that will store the input value that was recorded for the call site stack S'. Note that if this value is a table, indicating that the function being evaluated was called with another function as input, then any inputs to the argument function that will be seen while evaluating the body e_i will be back-propagated to the table stored at n_x and then further to the node generating the call stack S'. By incorporating S' into variable nodes, we guarantee that there is a unique node for each call. We next evaluate the function body e_i with the input associated with stack S'. To this end, we extend the environment E to E_i by binding x to the variable node n_x.
We then propagate the information between the values stored at the node n_i, i.e., the result of evaluating the body e_i, the node n_x for the bound variable, and the table 𝒗. That is, we propagate information between (1) the input of 𝒗 at S' and the value at node n_x, and (2) the value assigned to the function body under the updated environment E_i and the output of 𝒗 at S'. Finally, the updated tables 𝒗' for all call site stacks S' are joined together and stored back at node n. As an example, consider the evaluation of the lambda abstraction at location o in step ( ) of Fig. 3. Here, n is identified by o and we initially have M(n) = 𝒗 = [qh ⊳ 1 → ⊥]. (Recall that in Fig. 3 we abbreviate the call stack qh by just q.) Thus, we evaluate a single call to body for e_i = x^k and S' = qh. In this call we initially have M(n_x) = M(x^q) = ⊥. Hence, the recursive evaluation of e_i does not yet have any effect and we still obtain v_i = ⊥ at this point. However, the final propagation step in body yields:

  [S' : v_x → v_i] ⋉ 𝒗|_{S'} = [qh : ⊥ → ⊥] ⋉ [qh : 1 → ⊥] = ⟨[qh : 1 → ⊥], [qh : 1 → ⊥]⟩

and we then update M(n_x) to v'_x = 1. In step ( ), when the lambda abstraction at o is once more evaluated, we now initially have M(n_x) = M(x^q) = 1 and the recursive evaluation of e_i = x^k will update the entry for k^q in the execution map to 1. Thus, we also obtain v_i = 1. The final propagation step in body now yields:

  [S' : v_x → v_i] ⋉ 𝒗|_{S'} = [qh : 1 → 1] ⋉ [qh : 1 → ⊥] = ⟨[qh : 1 → 1], [qh : 1 → 1]⟩

which will cause the execution map entry for n (identified by o) to be updated to [qh : 1 → 1]. Observe that the evaluation of a lambda abstraction for a new input value always takes at least two iterations of step. This can be optimized by performing the propagation in body both before and after the recursive evaluation of e_i.
However, we omit this optimization here for the sake of maintaining a closer resemblance to the typing rule for lambda abstractions.

Lemma 5.2.
The function ⋉ is monotone and increasing. Lemma 5.3.
For every e ∈ Exp, E ∈ E, and S ∈ S such that ⟨e, E⟩ is well-formed, step⟦e⟧(E, S) is monotone and increasing.

We define the semantics S⟦e⟧ of a program e as the least fixpoint of step over the complete lattice of execution maps:

  S⟦e⟧ ≝ lfp^⊑̇_{M_⊥} ΛM. let ⟨_, M'⟩ = step⟦e⟧(ϵ, ϵ)(M) in M'

Lemma 5.3 guarantees that S⟦e⟧ is well-defined. We note that the above semantics does not precisely model certain non-terminating programs where tables grow infinitely deep. The semantics of such programs is simply M_⊤. A more precise semantics can be defined using step-indexing [Appel and McAllester 2001]. However, we omit this for ease of presentation and note that our semantics is adequate for capturing refinement type systems and inference algorithms à la Liquid types, which do not support infinitely nested function types.
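Since step is monotone and increasing, the least fixpoint can be computed by naive Kleene iteration starting from M_⊥. A generic sketch (the transformer `toy` below is an illustrative stand-in, not the paper's step):

```python
def lfp(step_fn, bottom_map):
    # Kleene iteration: re-apply the monotone, increasing transformer on
    # execution maps until the map stabilizes
    M = bottom_map
    while True:
        M2 = step_fn(M)
        if M2 == M:
            return M
        M = M2

# Stand-in transformer: node 'f' evaluates to the constant 1 and node 'g'
# copies whatever 'f' currently holds, so stabilization takes two rounds.
toy = lambda M: {'f': 1, 'g': M.get('f', 'bot')}
```

`lfp(toy, {})` stabilizes at `{'f': 1, 'g': 1}`, illustrating how values computed in one iteration are consumed by the next.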
Properties and collecting semantics.
As we shall see, data flow refinement types abstract programs by properties P ∈ P, which are sets of execution maps: P ≝ ℘(M). Properties form the concrete lattice of our abstract interpretation and are ordered by subset inclusion. That is, the concrete semantics of our abstract interpretation is the collecting semantics C : Exp → P that maps programs to properties: C⟦e⟧ ≝ {S⟦e⟧}. An example of a property is safety: let P_safe be the property consisting of all execution maps that map all nodes to safe values. Then a program e is safe if C⟦e⟧ ⊆ P_safe.

We next present two abstract semantics that represent crucial abstraction steps in calculationally obtaining our data flow refinement type system from the concrete data flow semantics. Our formal exposition focuses mostly on the aspects of these semantics that are instrumental in understanding the loss of precision introduced by these intermediate abstractions. Other technical details can be found in the designated appendices included in the supplementary materials.
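Membership in the safety property P_safe amounts to a recursive occurrence check for the error value, sketched below in the same toy value encoding used earlier (tables as dicts of input/output pairs):

```python
TOP = 'top'

def is_safe(v):
    # safe(v): the error value occurs nowhere inside v
    if v == TOP:
        return False
    if isinstance(v, dict):   # a table: check every input/output pair
        return all(is_safe(i) and is_safe(o) for i, o in v.values())
    return True

def program_safe(M):
    # M is in P_safe iff every node of the execution map holds a safe value
    return all(is_safe(v) for v in M.values())
```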
As we shall see later, a critical abstraction step performed by the type system is to conflate the information in tables that is propagated back from different call sites to function definitions. That is, a pair of input/output values that has been collected from one call site will also be considered as a possible input/output pair at other call sites to which that function flows. To circumvent a catastrophic loss of precision caused by this abstraction step, we first introduce an intermediate semantics that explicitly captures the relational dependencies between input and output values of functions. We refer to this semantics as the relational data flow semantics.

Abstract domains.
The definitions of nodes, stacks, and environments in the relational semantics are the same as in the concrete semantics. Similar to data flow refinement types, the relational abstractions of values, constants, and tables are defined with respect to scopes X ⊆fin Var. The variables in scopes are used to track how a value computed at a specific node in the execution map relates to the other values bound in the current environment. We also use scopes to capture how function output values depend on the input values, similar to the way input-output relations are captured in dependent function types. The scope of a node n, denoted X_n, is the domain of n's environment: X_n ≝ dom(env(n)). The new semantic domains are defined as follows:

  u ∈ V^r_X ::= ⊥^r | ⊤^r | r | x:𝒖
  d ∈ D^r_X ≝ X ∪ {ν} → Cons ∪ {F}
  r ∈ R^r_X ≝ ℘(D^r_X)
  x:𝒖 ∈ T^r_X ≝ Σx ∈ (Var \ X). S → V^r_X × V^r_{X∪{x}}
  M^r ∈ M^r ≝ Πn ∈ N. V^r_{X_n}

Relational values u ∈ V^r_X model how concrete values, stored at some node, depend on the concrete values of nodes in the current scope X. The relational value ⊥^r again models nontermination or unreachability and imposes no constraints on the values in its scope. The relational value ⊤^r models every possible concrete value, including ⊤. Concrete constant values are abstracted by relations r, which are sets of dependency vectors d. A dependency vector tracks the dependency between a constant value, associated with the special symbol ν, and the values bound to the variables in scope X. Here, we assume that ν is never contained in X. We only track dependencies between constant values precisely: if a node in the scope stores a table, we abstract it by the symbol F, which stands for an arbitrary concrete table. We assume F to be different from all other constants in Cons.
We also require that for all d ∈ r, d(ν) ∈ Cons. Relational tables ⟨x, 𝒖⟩, denoted x:𝒖, are defined analogously to concrete tables, except that 𝒖 now maps call site stacks S to pairs of relational values ⟨u_i, u_o⟩. As for dependent function types, we add a dependency variable x to every relational table to track input-output dependencies. Note that we consider relational tables to be equal up to α-renaming of dependency variables. Relational execution maps M^r assign each node n a relational value with scope X_n. The relational semantics of a program is the relational execution map obtained as the least fixpoint of a Galois abstraction of the concrete transformer step. The formal definition is mostly straightforward, so we delegate it to § B.

Example 6.1.
The relational execution map obtained for Program 2 is as follows (we only show the entries for the nodes id, q, and a):

  id ↦ x : [q ⊳ {(ν:1)} → {(x:1, ν:1)}, a ⊳ {(ν:2)} → {(x:2, ν:2)}]
  q  ↦ x : [q ⊳ {(id:F, ν:1)} → {(id:F, x:1, ν:1)}]
  a  ↦ x : [a ⊳ {(id:F, u:1, ν:2)} → {(id:F, u:1, x:2, ν:2)}]

Each concrete value v in the concrete execution map shown to the right of Program 2 is abstracted by a relational value that relates v with the values bound to the variables that are in the scope of the node where v was observed. Consider the entry for node id. As expected, this entry is a table that has seen inputs at call site stacks identified with q and a. The actual input stored for call site stack q is now a relation consisting of the single row (ν:1), and similarly for call site stack a. As the node id has no other variables in its scope, these input relations simply represent the original concrete input values 1 and 2. We associate these original values with the symbol ν. The output relation for q consists of the single row (x:1, ν:1), stating that for the input value 1 (associated with x), the output value is also 1 (associated with ν). Observe how we use x to capture explicitly the dependencies between the input and output values. The entry in the relational execution map for the node identified by q is similar to the one for id, except that the relation also has an additional entry id:F. This is because id is in the scope of q. The value of id in the execution map is a table, which the relational values abstract by the symbolic value F. That is, the relational semantics only tracks relational dependencies between primitive values precisely, whereas function values are abstracted by F. The relational table stored at node a is similar, except that we now also have the variable u, which is bound to the value 1.
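The abstraction of a single environment/value observation into a dependency vector can be sketched as follows (our own encoding; the key 'nu' plays the role of the special symbol ν, and tables in scope are collapsed to the symbolic constant F):

```python
F = 'F'   # symbolic stand-in for an arbitrary concrete table

def abstract_dep(env_vals, v):
    # Build one dependency vector: constants in scope are kept precisely,
    # tables are abstracted by F; the observed value itself is keyed by 'nu'.
    d = {x: (F if isinstance(w, dict) else w) for x, w in env_vals.items()}
    d['nu'] = v
    return d

# At node q of Program 2, id is bound to a table and the observed constant
# is 1, yielding the row (id: F, nu: 1) from Example 6.1.
row = abstract_dep({'id': {'q': (1, 1)}}, 1)
```

A relation r is then simply a set of such vectors, one per observed execution.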
As in the concrete execution map, the relational tables for q and a contain fewer entries than the table stored at id.

Abstraction.
We formalize the meaning of relational execution maps in terms of a Galois connection between M^r and the complete lattice of sets of concrete execution maps ℘(M). The details of this construction and the resulting abstract transformer of the relational semantics can be found in § B. We here focus on the key idea of the abstraction by formalizing the intuitive meaning of relational values given above. Our formalization uses a family of concretization functions γ^r_X : V^r_X → ℘((X → V) × V), parameterized by scopes X, that map relational values to sets of pairs ⟨Γ, v⟩ where Γ maps the variables in scope X to values in V:

  γ^r_X(⊥^r) ≝ (X → V) × {⊥}
  γ^r_X(⊤^r) ≝ (X → V) × V
  γ^r_X(r) ≝ {⟨Γ, c⟩ | d ∈ r ∧ d(ν) = c ∧ ∀x ∈ X. Γ(x) ∈ γ^d(d(x))} ∪ γ^r_X(⊥^r)
  γ^r_X(x:𝒖) ≝ {⟨Γ, 𝒗⟩ | ∀S. 𝒗(S) = ⟨v_i, v_o⟩ ∧ 𝒖(S) = ⟨u_i, u_o⟩ ∧ ⟨Γ, v_i⟩ ∈ γ^r_X(u_i) ∧ ⟨Γ[x ↦ v_i], v_o⟩ ∈ γ^r_{X∪{x}}(u_o)} ∪ γ^r_X(⊥^r)

Here, the function γ^d, which we use to give meaning to dependency relations, is defined by γ^d(c) = {c} and γ^d(F) = T. The meaning of relational execution maps is then given by the function

  γ^r(M^r) ≝ {M | ∀n ∈ N. ⟨Γ, v⟩ ∈ γ^r_{X_n}(M^r(n)) where Γ = M ∘ env(n) and v = M(n)}

We now describe a key abstraction step in the construction of our data flow refinement type semantics. We formalize this step in terms of a collapsed (relational data flow) semantics, which collapses function tables to a bounded number of entries while controlling how much stack information is being lost, thereby allowing for different notions of call-site context sensitivity.
Abstract domains.
The collapsed semantics is parameterized by a finite set of abstract stacks Ŝ ∈ Ŝ, a stack abstraction function ρ : S → Ŝ, and an abstract stack concatenation operation ·̂ : Loc × Ŝ → Ŝ. Stack abstraction must be homomorphic with respect to concatenation: for all ℓ ∈ Loc and S ∈ S, ρ(ℓ·S) = ℓ ·̂ ρ(S). Abstract stacks induce sets of abstract nodes and abstract environments following the same structure as in the concrete semantics:

  n̂ ∈ N̂ ≝ N̂_e ∪ N̂_x    N̂_e ≝ Loc × Ê    N̂_x ≝ Var × Ê × Ŝ    Ê ∈ Ê ≝ Var ⇀fin N̂_x

We lift ρ from stacks to nodes and environments in the expected way. In particular, for variable nodes n_x = x⋄E⋄S, we recursively define ρ(n_x) ≝ x⋄(ρ∘E)⋄ρ(S). Analogous to concrete nodes, we define the scope of an abstract node as X_n̂ = dom(env(n̂)). The definition of values and execution maps remains largely unchanged. In particular, dependency vectors and relational values are inherited from the relational semantics. Only the definition of tables changes: they now range over abstract stacks rather than concrete stacks:

  û ∈ V̂^r_X ::= ⊥̂^r | ⊤̂^r | r | x:𝒖̂
  x:𝒖̂ ∈ T̂^r_X ≝ Σx ∈ (Var \ X). Ŝ → V̂^r_X × V̂^r_{X∪{x}}
  M̂^r ∈ M̂^r ≝ Πn̂ ∈ N̂. V̂^r_{X_n̂}
We use Program 2 again to provide intuition for the new semantics. To this end, we first define a family of sets of abstract stacks which we can use to instantiate the collapsed semantics. Let k ∈ ℕ and define Ŝ_k = Loc^{≤k}, where Loc^{≤k} is the set of all sequences of locations of length at most k. Moreover, for Ŝ ∈ Ŝ_k define ℓ ·̂ Ŝ = (ℓ·Ŝ)[1, k], where (ℓ₁...ℓ_n)[1, k] is ϵ if k = 0, ℓ₁...ℓ_k if 0 < k < n, and ℓ₁...ℓ_n otherwise. Note that this definition generalizes the definitions of Ŝ₀ and Ŝ₁ from Example 4.2. An abstract stack in Ŝ_k only maintains the return locations of the k most recent pending calls, thus yielding a k-context-sensitive analysis. In particular, instantiating our collapsed semantics with Ŝ₀ yields a context-insensitive analysis. Applying this analysis to Program 2, we obtain the following collapsed execution map, which abstracts the relational execution map for this program shown in Example 6.1:

  id ↦ x : {(ν:1), (ν:2)} → {(x:1, ν:1), (x:2, ν:2)}
  q  ↦ x : {(id:F, ν:1)} → {(id:F, x:1, ν:1)}
  a  ↦ x : {(id:F, u:1, ν:2)} → {(id:F, u:1, x:2, ν:2)}

Again, we only show some of the entries and, for economy of notation, we omit the abstract stack ϵ in the singleton tables. Since the semantics does not maintain any stack information, the collapsed tables no longer track where functions are being called in the program. For instance, the entry for id indicates that id is a function called at some concrete call sites with inputs 1 and 2. While the precise call site stack information of id is no longer maintained, the symbolic variable x still captures the relational dependency between the input and output values for all the calls to id. If we choose to maintain more information in Ŝ, the collapsed semantics is also more precise.
For instance, a 1-context-sensitive analysis is obtained by instantiating the collapsed semantics with Ŝ₁. Analyzing Program 2 using this instantiation of the collapsed semantics yields a collapsed execution map that is isomorphic to the relational execution map shown in Example 6.1 (i.e., the analysis does not lose precision in this case).

Abstraction.
Similar to the relational semantics, we formalize the meaning of the collapsed semantics in terms of a Galois connection between the complete lattices of relational execution maps M^r and collapsed execution maps M̂^r. Again, we only provide the definition of the right adjoint γ̂^r : M̂^r → M^r here. As for the relational semantics, γ̂^r is defined in terms of a family of concretization functions γ̂^r_X : V̂^r_X → V^r_X mapping collapsed values to relational values:

  γ̂^r(M̂^r) ≝ Λn ∈ N. (γ̂^r_{X_{ρ(n)}} ∘ M̂^r ∘ ρ)(n)
  γ̂^r_X(⊥̂^r) ≝ ⊥^r    γ̂^r_X(⊤̂^r) ≝ ⊤^r    γ̂^r_X(r) ≝ r
  γ̂^r_X(x:𝒖̂) ≝ x : ΛS ∈ S. let ⟨û_i, û_o⟩ = (𝒖̂ ∘ ρ)(S) in ⟨γ̂^r_X(û_i), γ̂^r_{X∪{x}}(û_o)⟩

More details on the collapsed semantics, including its abstract transformer, are provided in § C.
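The k-bounded stack abstraction from Example 6.2 is easy to state directly (a sketch; stacks are tuples of locations, most recent call first):

```python
def rho_k(stack, k):
    # stack abstraction for the k-context-sensitive instantiation:
    # keep only the k most recent pending call locations
    return stack[:k]

def concat_k(loc, stack, k):
    # abstract concatenation: prepend loc, then truncate to length k
    return ((loc,) + stack)[:k]
```

The homomorphism requirement ρ(ℓ·S) = ℓ ·̂ ρ(S) holds by construction; k = 0 yields the context-insensitive instantiation, while k = 1 keeps only the most recent call site.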
At last, we obtain our parametric data flow refinement type semantics from the collapsed relational semantics by abstracting dependency relations over concrete constants by abstract relations drawn from some relational abstract domain. In § 7.2, we will then show that our data flow refinement type system introduced in § 4 is sound and complete with respect to this abstract semantics. Finally, in § 7.3 we obtain a generic type inference algorithm from our abstract semantics by using widening to enforce finite convergence of the fixpoint iteration sequence.
Abstract domains.
The abstract domains of our data flow refinement type semantics build on the set of types Vᵗ_X defined in § 4. Recall that Vᵗ_X is parametric with a set of abstract stacks Ŝ and a complete lattice of basic refinement types ⟨Rᵗ, ⊑_b, ⊥_b, ⊤_b, ⊔_b, ⊓_b⟩, which can be viewed as a union of sets Rᵗ_X for each scope X. We require that each Rᵗ_X forms a complete sublattice of Rᵗ and that there exists a family of Galois connections ⟨αᵇ_X, γᵇ_X⟩ between Rᵗ_X and the complete lattice of dependency relations ⟨Rʳ_X, ⊆, ∅, Dʳ_X, ∪, ∩⟩. (In fact, we can relax this condition and only require a concretization function, thus supporting abstract refinement domains such as polyhedra [Cousot and Halbwachs 1978].) For instance, for the domain R_lia from Example 4.1, the concretization function γᵇ_X is naturally obtained from the satisfaction relation for linear integer constraints.

We lift the partial order ⊑_b on basic refinement types to a preorder ⊑ᵗ on types as follows:

  t₁ ⊑ᵗ t₂ ≝ t₁ = ⊥ᵗ ∨ t₂ = ⊤ᵗ ∨ (t₁, t₂ ∈ Rᵗ ∧ t₁ ⊑_b t₂) ∨ (t₁, t₂ ∈ Tᵗ ∧ ∀Ŝ. t₁(Ŝ) ⊑̇ᵗ t₂(Ŝ))

By implicitly taking the quotient of types modulo α-renaming of dependency variables in function types, we obtain a partial order that induces a complete lattice ⟨Vᵗ_X, ⊑ᵗ, ⊥ᵗ, ⊤ᵗ, ⊔ᵗ, ⊓ᵗ⟩. We lift this partial order point-wise to refinement type maps Mᵗ ≝ Π n̂ ∈ N̂. Vᵗ_{X_n̂} and obtain a complete lattice ⟨Mᵗ, ⊑̇ᵗ, Mᵗ_⊥, Mᵗ_⊤, ⊔̇ᵗ, ⊓̇ᵗ⟩.

Galois connection.
The meaning of refinement types is given by a function γᵗ_X: Vᵗ_X → V̂ʳ_X that extends γᵇ on basic refinement types. This function is then lifted to type maps as before:

  γᵗ_X(⊥ᵗ) ≝ ⊥̂ʳ      γᵗ_X(⊤ᵗ) ≝ ⊤̂ʳ
  γᵗ_X(x: t) ≝ x: ΛŜ. let ⟨tᵢ, tₒ⟩ = t(Ŝ) in ⟨γᵗ_X(tᵢ), γᵗ_{X∪{x}}(tₒ)⟩
  γᵗ(Mᵗ) ≝ Λn̂. (γᵗ_{X_n̂} ∘ Mᵗ)(n̂)

Abstract domain operations.
We briefly revisit the abstract domain operations on types introduced in § 4 and provide their formal specifications needed for the correctness of our data flow refinement type semantics. We define these operations in terms of three simpler operations on basic refinement types. First, for x, y ∈ X ∪ {ν} and b ∈ Rᵗ_X, let b[x = y] be an abstraction of the concrete operation that strengthens the dependency relations described by b with an equality constraint x = y. That is, we require γᵇ_X(b[x = y]) ⊇ {d ∈ γᵇ_X(b) | d(x) = d(y)}. Similarly, for c ∈ Cons ∪ {F} we assume that b[x = c] is an abstraction of the concrete operation that strengthens b with the equality x = c, i.e., we require γᵇ_X(b[x = c]) ⊇ {d ∈ γᵇ_X(b) | d(x) = c}. Lastly, we assume an abstract variable substitution operation, b[x/ν], which must be an abstraction of variable substitution on dependency relations: γᵇ_X(b[x/ν]) ⊇ {d[ν ↦ c, x ↦ d(ν)] | d ∈ γᵇ_X(b), c ∈ Cons}. We lift these operations to general refinement types t in the expected way. For instance, we define

  t[x = c] ≝ t[x = c]                                       if t ∈ Rᵗ
             z: ΛŜ. ⟨t(Ŝ)_in[x = c], t(Ŝ)_out[x = c]⟩       if t = z: t ∧ x ≠ ν
             t                                               otherwise

Note that in the second case of the definition, the fact that x is in the scope of t implies x ≠ z. We then define the function that yields the abstraction of a constant c ∈ Cons as [ν = c]ᵗ ≝ ⊤_b[ν = c]. The strengthening operation t[x ← t′] is defined recursively over the structure of types as follows:

  t[x ← t′] ≝ ⊥ᵗ                                             if t′ = ⊥ᵗ
              t[x = F]                                        else if t′ ∈ Tᵗ
              t ⊓_b t′[x/ν]                                   else if t ∈ Rᵗ
              z: ΛŜ. ⟨t(Ŝ)_in[x ← t′], t(Ŝ)_out[x ← t′]⟩     else if t = z: t
              t                                               otherwise

Finally, we lift t[x ← t′] to the operation t[Γᵗ] that strengthens t with respect to a type environment Γᵗ by defining t[Γᵗ] ≝ ⊓ᵗ_{x ∈ dom(Γᵗ)} t[x ← Γᵗ(x)].

Abstract propagation and transformer.
The propagation operation ⋉ᵗ on refinement types, shown in Fig. 6, is then obtained from ⋉ in Fig. 5 by replacing all operations on concrete values with their counterparts on types.

  x: t₁ ⋉ᵗ ⊥ᵗ ≝ ⟨x: t₁, x: t_⊥⟩
  x: t₁ ⋉ᵗ ⊤ᵗ ≝ ⟨⊤ᵗ, ⊤ᵗ⟩
  t₁ ⋉ᵗ t₂ ≝ ⟨t₁, t₁ ⊔ᵗ t₂⟩    (otherwise)
  x: t₁ ⋉ᵗ x: t₂ ≝
    let t = ΛŜ. let ⟨t₁ᵢ, t₁ₒ⟩ = t₁(Ŝ); ⟨t₂ᵢ, t₂ₒ⟩ = t₂(Ŝ)
                    ⟨t₂ᵢ′, t₁ᵢ′⟩ = t₂ᵢ ⋉ᵗ t₁ᵢ
                    ⟨t₁ₒ′, t₂ₒ′⟩ = t₁ₒ[x ← t₁ᵢ] ⋉ᵗ t₂ₒ[x ← t₂ᵢ]
                in ⟨⟨t₁ᵢ′, t₁ₒ ⊔ᵗ t₁ₒ′⟩, ⟨t₂ᵢ′, t₂ₒ ⊔ᵗ t₂ₒ′⟩⟩
    in ⟨x: ΛŜ. π₁(t(Ŝ)), x: ΛŜ. π₂(t(Ŝ))⟩

Fig. 6. Abstract value propagation in the data flow refinement type semantics

  stepᵗ⟦c^ℓ⟧(Ê, Ŝ) ≝ do n̂ = ℓ ⋄ Ê; Γᵗ ← env(Ê); t′ ← n̂ := [ν = c]ᵗ[Γᵗ]
                       return t′

  stepᵗ⟦x^ℓ⟧(Ê, Ŝ) ≝ do n̂ = ℓ ⋄ Ê; t ← !n̂; n̂ₓ = Ê(x); Γᵗ ← env(Ê)
                       t′ ← n̂ₓ, n̂ := Γᵗ(x)[ν = x]ᵗ[Γᵗ] ⋉ᵗ t[ν = x]ᵗ[Γᵗ]
                       return t′

  stepᵗ⟦(eᵢ eⱼ)^ℓ⟧(Ê, Ŝ) ≝ do n̂ = ℓ ⋄ Ê; n̂ᵢ = i ⋄ Ê; n̂ⱼ = j ⋄ Ê; t ← !n̂
                             tᵢ ← stepᵗ⟦eᵢ⟧(Ê, Ŝ); assert(tᵢ ∈ Tᵗ)
                             tⱼ ← stepᵗ⟦eⱼ⟧(Ê, Ŝ)
                             ⟨tᵢ′, x: [i ·̂ Ŝ ⊳ tⱼ′ → t′]⟩ = tᵢ ⋉ᵗ x: [i ·̂ Ŝ ⊳ tⱼ → t]
                             t′′ ← n̂ᵢ, n̂ⱼ, n̂ := tᵢ′, tⱼ′, t′
                             return t′′

  stepᵗ⟦(λx.eᵢ)^ℓ⟧(Ê, Ŝ) ≝ do n̂ = ℓ ⋄ Ê; t ← n̂ := x: t_⊥
                             t′ ← for t do bodyᵗ(x, eᵢ, Ê, t)
                             t′′ ← n̂ := t′
                             return t′′

  bodyᵗ(x, eᵢ, Ê, t)(Ŝ′) ≝ do n̂ₓ = x ⋄ Ê ⋄ Ŝ′; Êᵢ = Ê.x: n̂ₓ; n̂ᵢ = i ⋄ Êᵢ
                             tₓ ← !n̂ₓ; tᵢ ← stepᵗ⟦eᵢ⟧(Êᵢ, Ŝ′)
                             ⟨x: [Ŝ′: tₓ′ → tᵢ′], t′⟩ = x: [Ŝ′: tₓ → tᵢ] ⋉ᵗ t|_{Ŝ′}
                             n̂ₓ, n̂ᵢ := tₓ′, tᵢ′
                             return t′

Fig. 7. Abstract transformer for the data flow refinement type semantics
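To make the soundness conditions for the basic operations concrete, the right-hand sides of the three specifications can be computed directly in the concrete domain of dependency relations. Below is a sketch with relations modeled as sets of valuations; the names and the tiny universe of constants are illustrative, not part of the formal development.

```python
# Concrete counterparts of the three basic operations on dependency
# relations, which b[x = y], b[x = c], and b[x/nu] must overapproximate.
# A relation is a set of valuations; each valuation is a frozenset of
# (variable, value) pairs so that it is hashable.

CONS = (0, 1, 2)  # illustrative stand-in for the set of constants Cons

def strengthen_var(b, x, y):
    """{d in b | d(x) = d(y)} -- concrete version of b[x = y]."""
    return {d for d in b if dict(d)[x] == dict(d)[y]}

def strengthen_const(b, x, c):
    """{d in b | d(x) = c} -- concrete version of b[x = c]."""
    return {d for d in b if dict(d)[x] == c}

def substitute(b, x):
    """{d[nu -> c, x -> d(nu)] | d in b, c in Cons} -- version of b[x/nu]."""
    result = set()
    for d in b:
        m = dict(d)
        for c in CONS:
            result.add(frozenset({**m, "nu": c, x: m["nu"]}.items()))
    return result

b = {frozenset({"nu": 1, "x": 0}.items()),
     frozenset({"nu": 2, "x": 2}.items())}
assert strengthen_const(b, "nu", 1) == {frozenset({"nu": 1, "x": 0}.items())}
assert strengthen_var(b, "nu", "x") == {frozenset({"nu": 2, "x": 2}.items())}
# After substitution, x records the old nu while nu ranges over Cons:
assert {dict(d)["x"] for d in substitute(b, "x")} == {1, 2}
```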
In a similar fashion, we obtain the new abstract transformer stepᵗ for the refinement type semantics from the concrete transformer step. We again use a state monad to hide the manipulation of type execution maps. The corresponding operations are variants of those used in the concrete transformer, which are summarized in Fig. 7. The abstract transformer closely resembles the concrete one. The only major differences are in the cases for constant values and variables. Here, we strengthen the computed types with the relational information about the variables in scope obtained from the current environment Γᵗ = Mᵗ ∘ Ê.

Abstract semantics.
We identify abstract properties Pᵗ in the data flow refinement type semantics with type maps, Pᵗ ≝ Mᵗ, and define γ: Pᵗ → P, which maps abstract to concrete properties, by γ ≝ γʳ ∘ γ̂ʳ ∘ γᵗ.

Lemma 7.1.
The function γ is a complete meet-morphism between Pᵗ and P.

It follows that γ induces a Galois connection between concrete and abstract properties. The data flow refinement semantics Cᵗ⟦·⟧: Exp → Pᵗ is then defined as the least fixpoint of stepᵗ:

  Cᵗ⟦e⟧ ≝ lfp^{⊑̇ᵗ}_{Mᵗ_⊥} ΛMᵗ. let ⟨_, Mᵗ′⟩ = stepᵗ⟦e⟧(ε, ε)(Mᵗ) in Mᵗ′

Theorem 7.2.
The refinement type semantics is sound, i.e., C⟦e⟧ ⊆ γ(Cᵗ⟦e⟧).

The soundness proof follows from the calculational design of our abstraction and the properties of the involved Galois connections. We say that a type t is safe if it does not contain ⊤ᵗ, i.e., t ≠ ⊤ᵗ and if t = x: t then, for every abstract stack Ŝ, t(Ŝ)_in and t(Ŝ)_out are safe. Similarly, a type map Mᵗ is safe if all its entries are safe. The next lemma states that safe type maps yield safe properties. It follows immediately from the definitions of the concretizations.

Lemma 7.3.
For all safe type maps Mᵗ, γ(Mᵗ) ⊆ P_safe.

A direct corollary of this lemma and the soundness theorems for our abstract semantics is that any safe approximation of the refinement type semantics can be used to prove program safety.
Corollary 7.4.
For all programs e and safe type maps Mᵗ, if Cᵗ⟦e⟧ ⊑̇ᵗ Mᵗ, then e is safe.

It is worth pausing for a moment to appreciate the resemblance between the subtyping and typing rules introduced in § 4 on one hand, and the abstract propagation operator ⋉ᵗ and abstract transformer stepᵗ on the other hand. We now make this resemblance formally precise by showing that the type system exactly captures the safe fixpoints of the abstract transformer. This implies the soundness and completeness of our type system with respect to the abstract semantics. We start by formally relating the subtype relation and type propagation. The following lemma states that subtyping precisely captures the safe fixpoints of type propagation.

Lemma 7.5.
For all t₁, t₂ ∈ Vᵗ_X, t₁ <: t₂ iff ⟨t₁, t₂⟩ = t₁ ⋉ᵗ t₂ and t₁, t₂ are safe.

We use this fact to show that any derivation of a typing judgment Γᵗ, Ŝ ⊢ e: t represents a safe fixpoint of stepᵗ on e, and vice versa, for any safe fixpoint of stepᵗ on e, we can obtain a typing derivation. To state the soundness theorem we need one more definition: we say that a typing environment is valid if it does not map any variable to ⊥ᵗ or ⊤ᵗ.

Theorem 7.6 (Soundness).
Let e be an expression, Γᵗ a valid typing environment, Ŝ an abstract stack, and t ∈ Vᵗ. If Γᵗ, Ŝ ⊢ e: t, then there exist Mᵗ, Ê such that Mᵗ is safe, Γᵗ = Mᵗ ∘ Ê and ⟨t, Mᵗ⟩ = stepᵗ⟦e⟧(Ê, Ŝ)(Mᵗ).

Theorem 7.7 (Completeness).
Let e be an expression, Ê an environment, Ŝ an abstract stack, Mᵗ a type map, and t ∈ Vᵗ. If stepᵗ⟦e⟧(Ê, Ŝ)(Mᵗ) = ⟨t, Mᵗ⟩ and Mᵗ is safe, then Γᵗ, Ŝ ⊢ e: t where Γᵗ = Mᵗ ∘ Ê.

The idea for the generic type inference algorithm is to iteratively compute Cᵗ⟦e⟧, which captures the most precise typing for e as we have established above. Unfortunately, there is no guarantee that the fixpoint iterates of stepᵗ converge towards Cᵗ⟦e⟧ in finitely many steps. The reasons are two-fold. First, the domain Rᵗ may not satisfy the ascending chain condition (i.e., it may have infinite height). To solve this first issue, we simply assume that Rᵗ comes equipped with a family of widening operators ▽ᵗ_X: Rᵗ_X × Rᵗ_X → Rᵗ_X for its scoped sublattices. Recall that a widening operator for a complete lattice ⟨L, ⊑, ⊥, ⊤, ⊔, ⊓⟩ is a function ▽: L × L → L such that: (1) ▽ is an upper bound operator, i.e., for all x, y ∈ L, x ⊔ y ⊑ x ▽ y, and (2) for all infinite ascending chains x₀ ⊑ x₁ ⊑ … in L, the chain y₀ ⊑ y₁ ⊑ … eventually stabilizes, where y₀ ≝ x₀ and yᵢ ≝ yᵢ₋₁ ▽ xᵢ for all i > 0 [Cousot and Cousot 1977].

The second issue is that there is in general no bound on the depth of function tables recorded in the fixpoint iterates. This phenomenon can be observed, e.g., in the following program (we here assume that recursive functions are encoded using the Y combinator):

  let rec hungry x = hungry
  and loop f = loop (f ()) in
  loop hungry

To solve this second issue, we introduce a shape widening operator that enforces a bound on the depth of tables. The two widening operators will then be combined to yield a widening operator for Vᵗ_X. In order to define shape widening, we first define the shape of a type using the function sh: Vᵗ_X → Vᵗ_X:

  sh(⊥ᵗ) ≝ ⊥ᵗ      sh(⊤ᵗ) ≝ ⊤ᵗ      sh(rᵗ) ≝ ⊥ᵗ
  sh(x: t) ≝ x: ΛŜ. ⟨sh(t(Ŝ)_in), sh(t(Ŝ)_out)⟩

A shape widening operator is a function ▽ˢʰ_X: Vᵗ_X × Vᵗ_X → Vᵗ_X such that (1) ▽ˢʰ_X is an upper bound operator and (2) for every infinite ascending chain t₀ ⊑ᵗ_X t₁ ⊑ᵗ_X …, the chain sh(t₀′) ⊑ᵗ_X sh(t₁′) ⊑ᵗ_X … stabilizes, where t₀′ ≝ t₀ and tᵢ′ ≝ tᵢ₋₁′ ▽ˢʰ_X tᵢ for i > 0.

Example 7.8.
The occurs check performed when unifying type variables in type inference for Hindley-Milner-style type systems serves a similar purpose as shape widening. In fact, we can define a shape widening operator that mimics the occurs check. To this end, suppose that each dependency variable x is tagged with a finite set of pairs ⟨ℓ, Ŝ⟩. We denote this set by tag(x). Moreover, when computing joins over function types x: t₁ and y: t₂, first α-rename x, respectively, y by some fresh z such that tag(z) = tag(x) ∪ tag(y). We proceed similarly when applying ⋉ᵗ to function types. Finally, assume that each fresh dependency variable x generated by stepᵗ for the function type at a call expression eᵢ eⱼ has tag(x) = {⟨i, Ŝ⟩} where Ŝ is the abstract call stack at this point. Then to obtain t₁ ▽ˢʰ_X t₂, first define t = t₁ ⊔ᵗ_X t₂. If t contains two distinct bindings of dependency variables x and y such that tag(x) = tag(y), define t₁ ▽ˢʰ_X t₂ = ⊤ᵗ and otherwise t₁ ▽ˢʰ_X t₂ = t. Clearly this is a shape widening operator if we only consider the finitely many tag sets that can be constructed from the locations in the analyzed program.

In what follows, let ▽ˢʰ_X be a shape widening operator. First, we lift the ▽ᵗ_X on Rᵗ to an upper bound operator ▽ʳᵃ_X on Vᵗ_X:

  t ▽ʳᵃ_X t′ ≝ t ▽ᵗ_X t′                                                  if t, t′ ∈ Rᵗ
               x: ΛŜ. let ⟨tᵢ, tₒ⟩ = t(Ŝ); ⟨tᵢ′, tₒ′⟩ = t′(Ŝ)
                      in ⟨tᵢ ▽ʳᵃ_X tᵢ′, tₒ ▽ʳᵃ_{X∪{x}} tₒ′⟩               if t = x: t ∧ t′ = x: t′
               t ⊔ᵗ t′                                                     otherwise

We then define ▽ᵗ_X: Vᵗ_X × Vᵗ_X → Vᵗ_X as the composition of ▽ˢʰ_X and ▽ʳᵃ_X, that is, t ▽ᵗ_X t′ ≝ t ▽ʳᵃ_X (t ▽ˢʰ_X t′).

Lemma 7.9. ▽ᵗ_X is a widening operator.

We lift the widening operators ▽ᵗ_X pointwise to an upper bound operator ▽̇ᵗ: Mᵗ × Mᵗ → Mᵗ and define a widened data flow refinement semantics Cᵗ▽⟦·⟧: Exp → Pᵗ as the least fixpoint of the widened iterates of stepᵗ:

  Cᵗ▽⟦e⟧ ≝ lfp^{⊑̇ᵗ}_{Mᵗ_⊥} ΛMᵗ. let ⟨_, Mᵗ′⟩ = stepᵗ⟦e⟧(ε, ε)(Mᵗ) in (Mᵗ ▽̇ᵗ Mᵗ′)    (1)

The following theorem then follows directly from Lemma 7.9 and [Cousot and Cousot 1977].

Theorem 7.10.
The widened refinement type semantics is sound and terminating, i.e., for all programs e, Cᵗ▽⟦e⟧ converges in finitely many iterations. Moreover, Cᵗ⟦e⟧ ⊑̇ᵗ Cᵗ▽⟦e⟧.

Our parametric type inference algorithm thus computes Cᵗ▽⟦e⟧ iteratively according to (1). By Theorem 7.10 and Corollary 7.4, if the resulting type map is safe, then so is e.

8 IMPLEMENTATION AND EVALUATION

We have implemented a prototype of our parametric data flow refinement type inference analysis in a tool called
Drift. The tool is written in OCaml and builds on top of the Apron library [Jeannet and Miné 2009] to support various numerical abstract domains of type refinements. We have implemented two versions of the analysis: a context-insensitive version in which all entries in tables are collapsed to a single one (as in Liquid type inference) and a 1-context-sensitive analysis that distinguishes table entries based on the most recent call site locations. For the widening on basic refinement types, we consider two variants: plain widening and widening with thresholds [Blanchet et al. 2007]. Both variants use
Apron's widening operators for the individual refinement domains.

The tool takes programs written in a subset of OCaml as input. This subset supports higher-order recursive functions, operations on primitive types such as integers and Booleans, as well as lists and arrays. We note that the abstract and concrete transformers of our semantics can be easily extended to handle recursive function definitions directly. As of now, the tool does not yet support user-defined algebraic data types.
Drift automatically checks whether all array accesses are within bounds. In addition, the tool supports the verification of user-provided assertions. Type refinements for lists can express constraints on both the list's length and its elements.

To evaluate Drift, we conduct two experiments that aim to answer the following questions:

(1) What is the trade-off between efficiency and precision for different instantiations of our parametric analysis framework?
(2) How does our new analysis compare with other state-of-the-art automated verification tools for higher-order programs?
Benchmarks and Setup.
We collected a benchmark suite of OCaml programs by combining several sources of programs from prior work and augmenting it with our own new programs. Specifically, we included the programs used in the evaluation of the
DOrder [Zhu et al. 2016] and
R_Type [Champion et al. 2018a] tools, excluding only those programs that involve algebraic data types or certain OCaml standard library functions that our tool currently does not yet support. We generated a few additional variations of some of these programs by adding or modifying user-provided assertions, or by replacing concrete test inputs for the program's main function by unconstrained parameters. In general, the programs are small (up to 86 lines) but intricate. We partitioned the programs into five categories: first-order arithmetic programs (FO), higher-order arithmetic programs (HO), higher-order programs that were obtained by reducing program termination to safety checking (T), array programs (A), and list programs (L). All programs except two in the T category are safe. We separated these two erroneous programs out into a sixth category (E) which we augmented by additional unsafe programs obtained by modifying safe programs so that they contain implementation bugs or faulty specifications. The benchmark suite is available in the tool's Github repository. All our experiments were conducted on a desktop computer with an Intel(R) Core(TM) i7-4770 CPU and 16 GB memory running Linux.
Experiment 1: Comparing different configurations of
Drift. We consider the two versions of our tool (context-insensitive and 1-context-sensitive) and instantiate each with two different relational abstract domains implemented in
Apron: Octagons [Miné 2007] (Oct), and Convex Polyhedra and Linear Equalities [Cousot and Halbwachs 1978] (Polka). For the Polka domain we consider both its loose configuration, which only captures non-strict inequalities, as well as its strict configuration, which can also represent strict inequalities. We note that the analysis of function calls critically relies on the abstract domain's ability to handle equality constraints precisely. We therefore do not consider the interval domain as it cannot express such relational constraints. For each abstract domain, we further consider two different widening configurations: standard widening (w) and widening with thresholds (tw). For widening with thresholds [Blanchet et al. 2007], we use a simple heuristic that chooses the conditional expressions in the analyzed program as well as pair-wise inequalities between the variables in scope as threshold constraints.

https://github.com/nyu-acsys/drift/

Table 1. Summary of Experiment 1. For each benchmark category, we provide the number of programs within that category in parentheses. For each benchmark category and configuration, we list the number of programs successfully analyzed (succ) and the total accumulated running time across all benchmarks in seconds (total). In the E category, an analysis run is considered successful if it flags the type error/assertion violation present in the benchmark. The numbers given in parentheses in the total columns indicate the number of benchmarks that failed due to timeouts (if any). These benchmarks are excluded from the calculation of the cumulative running times. The timeout threshold was 300s per benchmark.

                         context-insensitive                                        1-context-sensitive
                 Oct            Polka strict       Polka loose        Oct               Polka strict          Polka loose
                 w      tw      w      tw          w      tw          w        tw       w        tw           w        tw
  FO (73)  succ  25     39      39     51          39     51          33       46       47       58           47       59
  loc: 11  total 10.40  46.94   17.37  46.42(1)    15.82  42.88(1)    67.38    129.80   92.37    138.00(1)    87.27    138.33
  HO (62)  succ  33     48(1)   42     55          42     55          42       51       48       60           48       60
  loc: 10  total 8.53   49.97   14.97  60.67       14.03  54.71       83.56    282.57   119.10   345.03       112.05   316.18
  T (80)   succ  72     72      72     72          72     72          79       79       78       79           78       79
  loc: 44  total 806.24 842.70  952.53 994.52      882.21 924.00      1297.13(1) 1398.25(1) 1467.09(1) 1566.11(1) 1397.71(1) 1497.28(1)
  A (13)   succ  6      6       8      8           8      8           8        8        11       11           11       11
  loc: 17  total 4.30   23.43   7.66   25.04       7.19   23.38       17.99    41.66    28.07    47.77        26.84    45.94
  L (20)   succ  8(2)   14      10     14          10     14          11       18       10       18           10       18
  loc: 16  total 1.62   12.02   4.00   12.52       3.45   10.97       5.73     27.01    11.46    29.01        10.32    26.43
  E (17)   succ  17     17      17     17          17     17          17       17       17       17           17       17
  loc: 21  total 8.04   14.89   12.72  19.44       12.01  18.28       17.90    28.76    24.96    34.24        23.75    32.71

Table 1 summarizes the results of the experiment. First, note that all configurations successfully flag all erroneous benchmarks (as one should expect from a sound analysis). Moreover, the context-sensitive version of the analysis is in general more precise than the context-insensitive one. The extra precision comes at the cost of an increase in the analysis time by a factor of 1.8 on average. The 1-context-sensitive version with Polka loose/tw performed best, solving 244 out of 265 benchmarks.
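For intuition, the difference between the two widening configurations compared above can be sketched on integer intervals. This is an illustrative model only, not Apron's actual operators: plain widening jumps straight to an infinite bound, while threshold widening first tries the constants harvested from the program.

```python
# Plain widening vs. widening with thresholds on integer intervals
# (illustrative sketch; Apron's operators are more refined).

def widen_plain(a, b):
    """Drop any bound that is still growing to +/- infinity."""
    (alo, ahi), (blo, bhi) = a, b
    return (alo if alo <= blo else float("-inf"),
            ahi if ahi >= bhi else float("inf"))

def widen_thresholds(a, b, thresholds):
    """Like plain widening, but jump to the next threshold first."""
    (alo, ahi), (blo, bhi) = a, b
    lo = alo if alo <= blo else max((t for t in thresholds if t <= blo),
                                    default=float("-inf"))
    hi = ahi if ahi >= bhi else min((t for t in thresholds if t >= bhi),
                                    default=float("inf"))
    return (lo, hi)

# A loop counter growing past [0,0]: plain widening loses the upper bound,
# while a threshold harvested from a conditional such as `x < 100` keeps it.
assert widen_plain((0, 0), (0, 1)) == (0, float("inf"))
assert widen_thresholds((0, 0), (0, 1), [100]) == (0, 100)
```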
There are two programs for which some of the configurations produced timeouts. However, each of these programs can be successfully verified by at least one configuration. As expected, using Octagons is more efficient than using loose polyhedra, which in turn is more efficient than strict polyhedra. We anticipate that the differences in running times for the different domains will be more pronounced on larger programs. In general, one can use different domains for different parts of the program, as is common practice in static analyzers such as
Astrée [Cousot et al. 2009].

We conducted a detailed analysis of the 20 benchmarks that
Drift could not solve using any of the configurations that we have considered. To verify the 16 failing benchmarks in the FO and HO categories, one needs to infer type refinements that involve either non-linear or non-convex constraints, neither of which is currently supported by the tool. This can be addressed, e.g., by further increasing context-sensitivity, by using more expressive domains such as interval polyhedra [Chen et al. 2009], and by incorporating techniques such as trace partitioning to reduce loss of precision due to joins [Mauborgne and Rival 2005]. The two failing benchmarks in the array category require the analysis to capture universally quantified invariants about the elements stored in arrays. However, our implementation currently only tracks array length constraints. It is relatively straightforward to extend the analysis in order to capture constraints on elements as has been proposed, e.g., in [Vazou et al. 2013].

We further conducted a more detailed analysis of the running times by profiling the execution of the tool. This analysis determined that most of the time is spent in the strengthening of output types with input types when propagating recursively between function types in ⋉ᵗ. This happens particularly often when analyzing programs that involve applications of curried functions, which are currently handled rather naively, causing a quadratic blowup that can be avoided with a more careful implementation. Notably, the programs in the T category involve deeply curried functions. This appears to be the primary reason why the tool is considerably slower on these programs. Moreover, the implementation of the fixpoint loop is still rather naive as it calls ⋉ᵗ even if the arguments have not changed since the previous fixpoint iteration. We believe that by avoiding redundant calls to ⋉ᵗ the running times can be notably improved.

Experiment 2: Comparing with other Tools.

Table 2. Summary of Experiment 2. In addition to the cumulative running time for each category, we provide the average (avg) and median (med) running time per benchmark (in s). Timeouts are reported as in Table 1. The timeout threshold was 300s per benchmark across all tools. In the succ column, we additionally provide in parentheses the number of benchmarks that were only solved by that tool (if any).

  Benchmark    Drift                        R_Type                       DSolve                      MoCHi
  cat.         succ   full    avg   med     succ  full      avg   med    succ  full   avg   med      succ  full  avg  med
  FO (73)      59(5)  138.33  1.89  0.26    44    5.78(15)  0.08  0.06   49    16.27  0.22  0.12
Overall, the results of Experiment 1 suggest that the 1-context-sensitive version of Drift instantiated with the loose Polka domain and threshold widening provides a good balance between precision and efficiency. In our second experiment, we compare this configuration with several other existing tools. We consider three other automated verification tools: DSolve, R_Type, and MoCHi. DSolve is the original implementation of the Liquid type inference algorithm proposed in [Rondon et al. 2008] (cf. § 2). R_Type [Champion et al. 2018a] improves upon DSolve by replacing the Houdini-based fixpoint algorithm of [Rondon et al. 2008] with the Horn clause solver HoIce [Champion et al. 2018b]. HoIce uses an ICE-style machine learning algorithm [Garg et al. 2014] that, unlike Houdini, can also infer disjunctive refinement predicates. We note that R_Type does not support arrays or lists and hence we omit it from these categories. Finally, MoCHi [Kobayashi et al. 2011] is a software model checker based on higher-order recursion schemes, which also uses HoIce as its default back-end Horn clause solver. We used the most recent version of each tool at the time when the experiments were conducted and we ran all tools in their default configurations. More precise information about the specific tool versions used can be found in the tools' Github repository.

We initially also considered DOrder [Zhu et al. 2016] in our comparison. This tool builds on the same basic algorithm as DSolve but also learns candidate predicates from concrete program executions via machine learning. However, the tool primarily targets programs that manipulate algebraic data types. Moreover, DOrder relies on user-provided test inputs for its predicate inference. As our benchmarks work with unconstrained input parameters and we explicitly exclude programs manipulating ADTs from our benchmark set, this puts DOrder decidedly at a disadvantage. To keep the comparison fair, we therefore excluded DOrder from the experiment.

Table 2 summarizes the results of our comparison.
Drift and MoCHi perform similarly overall and significantly better than the other two tools. In particular, we note that only Drift and MoCHi can verify the second program discussed in § 2. The evaluation also indicates complementary strengths of these tools. In terms of number of verified benchmarks, Drift performs best in the HO, A, and L categories with MoCHi scoring a close second place. For the FO and T categories the roles are reversed. In the FO category, MoCHi benefits from its ability to infer non-convex type refinements, which are needed to verify some of the benchmarks in this category. Nevertheless, there are five programs in this category that only Drift can verify. Unlike our current implementation, MoCHi does not appear to suffer from inefficient handling of deeply curried functions, which leads to significantly better cumulative running times in the T category. On the other hand, Drift is faster than MoCHi in the L category.

We note that there are two benchmarks in the FO category and four benchmarks in the A category for which MoCHi produces false alarms. However, this appears to be due to the use of certain language features that are unsupported by the tool (such as OCaml's mod operator).

None of the tools produced unsound results in the E category. The failing benchmarks for MoCHi are due to timeouts and the ones for DSolve are due to crashes. The considered timeout was 300 seconds per benchmark for all tools. Across all benchmarks, R_Type timed out on 20 and MoCHi on 19 programs. Drift timed out on only one benchmark in the T category. We attribute the good responsiveness of Drift to the use of infinite refinement domains with widening in favor of the counterexample-guided abstraction refinement approach used by R_Type and MoCHi, which does not guarantee termination of the analysis.

9 RELATED WORK
Refinement type inference.
Early work on refinement type systems supported dependent types with unions and intersections based on a modular bidirectional type checking algorithm [Dunfield and Pfenning 2003, 2004; Freeman and Pfenning 1991]. These algorithms require some type annotations. Instead, the focus of this paper is on fully automated refinement type inference algorithms that perform a whole program analysis. Many existing algorithms in this category can be obtained by instantiating our parametric framework. Specifically, Liquid type inference [Rondon et al. 2008; Vazou et al. 2015, 2013, 2014] performs a context-insensitive analysis over a monomial predicate abstraction domain of type refinements. Similarly, Zhu and Jagannathan [2013] propose a 1-context-sensitive analysis with predicate abstraction, augmented with an additional counterexample-guided refinement loop, an idea that has also inspired recent techniques for analyzing pointer programs [Toman et al. 2020]. Our work generalizes these algorithms to arbitrary abstract domains of type refinements (including domains of infinite height) and provides parametric and constructive soundness and completeness proofs for the obtained type systems. Orthogonal to our work are extensions of static inference algorithms with data-driven approaches for inferring refinement predicates [Champion et al. 2018a; Zhu et al. 2015, 2016]. Gradual Liquid type inference [Vazou et al. 2018b] addresses the issue of how to apply whole program analysis to infer modular specifications and improve error reporting. Our framework is in principle compatible with this approach. We note that Galois connections are used in [Garcia et al. 2016; Kazerounian et al. 2018; Lehmann and Tanter 2017; Vazou et al. 2018b; Vekris et al. 2016] to relate dynamic gradual refinement types and static refinement types. However, the resulting gradual type systems are not calculationally constructed as abstract interpretations of concrete program semantics.
Semantics of higher-order programs.
The techniques introduced in [Jagannathan and Weeks 1995] and [Plevyak and Chien 1995] use flow graphs to assign concrete meaning to higher-order programs. Nodes in these graphs closely relate to the nodes in our data flow semantics. Their semantics represents functions as expression nodes storing the location of the function expression. Hence, this semantics has to make non-local changes when analyzing function applications. Our data flow semantics treats functions as tables and the concrete transformer is defined structurally on program syntax, which is more suitable for deriving type inference analyses. The idea of modeling functions as tables that only track inputs observed during program execution was first explored in the minimal function graph semantics [Jones and Mycroft 1986; Jones and Rosendahl 1997]. However, that semantics does not explicitly model data flow in a program, and is hence not well suited for constructing refinement type inference algorithms. There is a large body of work on control and data flow analysis of higher-order programs and the concrete semantics they overapproximate [Cousot and Cousot 1994; Horn and Might 2011; Jones and Andersen 2007; Midtgaard 2012; Mossin 1998; Nielson and Nielson 1997]. However, these semantics either do not capture data flow properties or rely on non-structural concrete transformers. An interesting direction for future work is to reconcile our collapsed semantics with control flow analyses that enjoy pushdown precision such as [Gilray et al. 2016; Reps 1998; Vardoulakis and Shivers 2011a,b]. Temporal logic for higher-order programs [Okuyama et al. 2019] and higher-order recursion schemes [Kobayashi et al. 2011; Ong 2015] provide alternative bases for verifying functional programs.
Unlike our framework, these approaches use finite state abstractions and model checking.In particular, they rely on abstraction refinement to support infinite height data domains, givingup on guaranteed termination of the analysis. On the other hand, they are not restricted to provingsafety properties.
Types as abstract interpretations.
The formal connection between types and abstract interpretation is studied by Cousot [1997]. His paper shows how to construct standard polymorphic type systems via a sequence of abstractions starting from a concrete denotational call-by-value semantics. Monsuez [1993b] similarly shows how to use abstract interpretation to design polymorphic type systems for call-by-name semantics and model advanced type systems such as System F [Monsuez 1992, 1993a, 1995a,b]. Gori and Levi [2002, 2003] use the abstract interpretation view of types to design new type inference algorithms for ML-like languages by incorporating widening operators that infer more precise types for recursive functions. Jensen et al. [2009] use abstract interpretation techniques to develop a type system for JavaScript that is able to guarantee the absence of common errors such as invoking a non-function value or accessing an undefined property. Garcia et al. [2016] use Galois connections to relate gradual types with abstract static types to build a gradual type system from a static one. Harper [1992] introduces a framework for constructing dependent type systems from operational semantics based on the PER model of types. Although this work does not build on abstract interpretation, it relies on the idea that types overapproximate program behaviors and that they can be derived using a suitable notion of abstraction.
10 CONCLUSION
In this work, we systematically develop a parametric refinement type system as an abstract interpretation of a new concrete data flow semantics. This development unveils the design space of refinement type analyses that infer data flow invariants for functional programs. Our prototype implementation and experimental evaluation indicate that our framework can be used to implement new refinement type inference algorithms that are both robust and precise.
ACKNOWLEDGMENTS
This work is funded in part by the National Science Foundation under grants CCF-1350574 and CCF-1618059. We thank the anonymous reviewers for their feedback on an earlier draft of this paper.

REFERENCES
Andrew W. Appel and David A. McAllester. 2001. An indexed model of recursive types for foundational proof-carryingcode.
ACM Trans. Program. Lang. Syst.
23, 5 (2001), 657–683. https://doi.org/10.1145/504709.504712
Vincenzo Arceri, Martina Olliaro, Agostino Cortesi, and Isabella Mastroeni. 2019. Completeness of Abstract Domains for String Analysis of JavaScript Programs. In
Theoretical Aspects of Computing - ICTAC 2019 - 16th International Colloquium,Hammamet, Tunisia, October 31 - November 4, 2019, Proceedings (Lecture Notes in Computer Science, Vol. 11884) , Robert M.Hierons and Mohamed Mosbah (Eds.). Springer, 255–272. https://doi.org/10.1007/978-3-030-32505-3_15Roberto Bagnara, Patricia M. Hill, and Enea Zaffanella. 2008. The Parma Polyhedra Library: Toward a complete set ofnumerical abstractions for the analysis and verification of hardware and software systems.
Sci. Comput. Program.
CoRR abs/cs/0701193 (2007). arXiv:cs/0701193 http://arxiv.org/abs/cs/0701193
Adrien Champion, Tomoya Chiba, Naoki Kobayashi, and Ryosuke Sato. 2018a. ICE-Based Refinement Type Discovery for Higher-Order Functional Programs. In
Tools and Algorithms for the Construction and Analysis of Systems - 24thInternational Conference, TACAS 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software,ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 10805) ,Dirk Beyer and Marieke Huisman (Eds.). Springer, 365–384. https://doi.org/10.1007/978-3-319-89960-2_20Adrien Champion, Naoki Kobayashi, and Ryosuke Sato. 2018b. HoIce: An ICE-Based Non-linear Horn Clause Solver.In
Programming Languages and Systems - 16th Asian Symposium, APLAS 2018, Wellington, New Zealand, Decem-ber 2-6, 2018, Proceedings (Lecture Notes in Computer Science, Vol. 11275) , Sukyoung Ryu (Ed.). Springer, 146–156.https://doi.org/10.1007/978-3-030-02768-1_8Liqian Chen, Antoine Miné, Ji Wang, and Patrick Cousot. 2009. Interval Polyhedra: An Abstract Domain to Infer IntervalLinear Relationships. In
Static Analysis, 16th International Symposium, SAS 2009, Los Angeles, CA, USA, August 9-11,2009. Proceedings (Lecture Notes in Computer Science, Vol. 5673) , Jens Palsberg and Zhendong Su (Eds.). Springer, 309–325. https://doi.org/10.1007/978-3-642-03237-0_21Ravi Chugh, David Herman, and Ranjit Jhala. 2012. Dependent types for JavaScript. In
Proceedings of the 27th AnnualACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, partof SPLASH 2012, Tucson, AZ, USA, October 21-25, 2012 , Gary T. Leavens and Matthew B. Dwyer (Eds.). ACM, 587–606.https://doi.org/10.1145/2384616.2384659Patrick Cousot. 1997. Types as abstract interpretations. In
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium onPrinciples of programming languages . ACM, 316–331.Patrick Cousot and Radhia Cousot. 1977. Abstract interpretation: a unified lattice model for static analysis of programs byconstruction or approximation of fixpoints. In
Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles ofprogramming languages . ACM, 238–252.Patrick Cousot and Radhia Cousot. 1979. Systematic design of program analysis frameworks. In
Proceedings of the 6th ACMSIGACT-SIGPLAN symposium on Principles of programming languages . ACM, 269–282.Patrick Cousot and Radhia Cousot. 1994. Invited Talk: Higher Order Abstract Interpretation (and Application to Comport-ment Analysis Generalizing Strictness, Termination, Projection, and PER Analysis. In
Proceedings of the IEEE ComputerSociety 1994 International Conference on Computer Languages, May 16-19, 1994, Toulouse, France , Henri E. Bal (Ed.). IEEEComputer Society, 95–112. https://doi.org/10.1109/ICCL.1994.288389Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, and Xavier Rival. 2009. Why does Astréescale up?
Formal Methods Syst. Des.
35, 3 (2009), 229–264. https://doi.org/10.1007/s10703-009-0089-6Patrick Cousot and Nicolas Halbwachs. 1978. Automatic Discovery of Linear Restraints Among Variables of a Pro-gram. In
Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, Tucson,Arizona, USA, January 1978 , Alfred V. Aho, Stephen N. Zilles, and Thomas G. Szymanski (Eds.). ACM Press, 84–96.https://doi.org/10.1145/512760.512770Joshua Dunfield and Frank Pfenning. 2003. Type Assignment for Intersections and Unions in Call-by-Value Languages.In
Foundations of Software Science and Computational Structures, 6th International Conference, FOSSACS 2003 Heldas Part of the Joint European Conference on Theory and Practice of Software, ETAPS 2003, Warsaw, Poland, April7-11, 2003, Proceedings (Lecture Notes in Computer Science, Vol. 2620) , Andrew D. Gordon (Ed.). Springer, 250–266.https://doi.org/10.1007/3-540-36576-1_16Joshua Dunfield and Frank Pfenning. 2004. Tridirectional typechecking. In
Proceedings of the 31st ACM SIGPLAN-SIGACTSymposium on Principles of Programming Languages, POPL 2004, Venice, Italy, January 14-16, 2004 , Neil D. Jones andXavier Leroy (Eds.). ACM, 281–292. https://doi.org/10.1145/964001.964025Cormac Flanagan and K. Rustan M. Leino. 2001. Houdini, an Annotation Assistant for ESC/Java. In
FME 2001: Formal Methods for Increasing Software Productivity, International Symposium of Formal Methods Europe, Berlin, Germany, March 12-16, 2001, Proceedings, José Nuno Oliveira and Pamela Zave (Eds.). Springer, 500–517. https://doi.org/10.1007/3-540-45251-6_29
Timothy S. Freeman and Frank Pfenning. 1991. Refinement Types for ML. In Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation (PLDI), Toronto, Ontario, Canada, June 26-28, 1991, David S. Wise (Ed.). ACM, 268–277. https://doi.org/10.1145/113445.113468
Ronald Garcia, Alison M. Clark, and Éric Tanter. 2016. Abstracting gradual typing. In
Proceedings of the 43rd Annual ACMSIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 -22, 2016 , Rastislav Bodík and Rupak Majumdar (Eds.). ACM, 429–442. https://doi.org/10.1145/2837614.2837670Pranav Garg, Christof Löding, P. Madhusudan, and Daniel Neider. 2014. ICE: A Robust Framework for Learning Invariants.In
Computer Aided Verification - 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic,VSL 2014, Vienna, Austria, July 18-22, 2014. Proceedings (Lecture Notes in Computer Science, Vol. 8559) , Armin Biere andRoderick Bloem (Eds.). Springer, 69–87. https://doi.org/10.1007/978-3-319-08867-9_5Thomas Gilray, Steven Lyde, Michael D. Adams, Matthew Might, and David Van Horn. 2016. Pushdown control-flowanalysis for free. In
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016 , Rastislav Bodík and Rupak Majumdar (Eds.). ACM,691–704. https://doi.org/10.1145/2837614.2837631Roberta Gori and Giorgio Levi. 2002. An Experiment in Type Inference and Verification by Abstract Interpretation. In
Verification, Model Checking, and Abstract Interpretation, Third International Workshop, VMCAI 2002, Venice, Italy, January21-22, 2002, Revised Papers (Lecture Notes in Computer Science, Vol. 2294) , Agostino Cortesi (Ed.). Springer, 225–239.https://doi.org/10.1007/3-540-47813-2_16Roberta Gori and Giorgio Levi. 2003. Properties of a Type Abstract Interpreter. In
Verification, Model Checking, and AbstractInterpretation, 4th International Conference, VMCAI 2003, New York, NY, USA, January 9-11, 2002, Proceedings (LectureNotes in Computer Science, Vol. 2575) , Lenore D. Zuck, Paul C. Attie, Agostino Cortesi, and Supratik Mukhopadhyay(Eds.). Springer, 132–145. https://doi.org/10.1007/3-540-36384-X_13Robert Harper. 1992. Constructing Type Systems over an Operational Semantics.
J. Symb. Comput.
14, 1 (1992), 71–84.https://doi.org/10.1016/0747-7171(92)90026-ZDavid Van Horn and Matthew Might. 2011. Abstracting abstract machines: a systematic approach to higher-order programanalysis.
Commun. ACM
54, 9 (2011), 101–109. https://doi.org/10.1145/1995376.1995400Suresh Jagannathan and Stephen Weeks. 1995. A Unified Treatment of Flow Analysis in Higher-order Languages. In
Proceedings of the 22Nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (San Francisco,California, USA). ACM, 393–407. https://doi.org/10.1145/199448.199536Bertrand Jeannet and Antoine Miné. 2009. Apron: A Library of Numerical Abstract Domains for Static Analysis. In
Computer Aided Verification, 21st International Conference, CAV 2009, Grenoble, France, June 26 - July 2, 2009. Pro-ceedings (Lecture Notes in Computer Science, Vol. 5643) , Ahmed Bouajjani and Oded Maler (Eds.). Springer, 661–667.https://doi.org/10.1007/978-3-642-02658-4_52Simon Holm Jensen, Anders Møller, and Peter Thiemann. 2009. Type Analysis for JavaScript. In
Static Analysis, 16th Inter-national Symposium, SAS 2009, Los Angeles, CA, USA, August 9-11, 2009. Proceedings (Lecture Notes in Computer Science,Vol. 5673) , Jens Palsberg and Zhendong Su (Eds.). Springer, 238–255. https://doi.org/10.1007/978-3-642-03237-0_17Neil D. Jones and Nils Andersen. 2007. Flow Analysis of Lazy Higher-order Functional Programs.
Theor. Comput. Sci.
Neil D. Jones and Alan Mycroft. 1986. Data Flow Analysis of Applicative Programs Using Minimal Function Graphs. In Conference Record of the Thirteenth Annual ACM Symposium on Principles of Programming Languages, St. Petersburg Beach, Florida, USA, January 1986. ACM Press, 296–306. https://doi.org/10.1145/512644.512672
Neil D. Jones and Mads Rosendahl. 1997. Higher-Order Minimal Function Graphs.
J. Funct. Log. Program.
Milod Kazerounian, Niki Vazou, Austin Bracker, Jeffrey S. Foster, and Emina Torlak. 2018. Refinement Types for Ruby. In Verification, Model Checking, and Abstract Interpretation - 19th International Conference, VMCAI 2018, Los Angeles, CA, USA, January 7-9, 2018, Proceedings (Lecture Notes in Computer Science, Vol. 10747), Isil Dillig and Jens Palsberg (Eds.). Springer, 269–290. https://doi.org/10.1007/978-3-319-73721-8_13
Verification, Model Check-ing, and Abstract Interpretation - 12th International Conference, VMCAI 2011, Austin, TX, USA, January 23-25, 2011. Pro-ceedings (Lecture Notes in Computer Science, Vol. 6538) , Ranjit Jhala and David A. Schmidt (Eds.). Springer, 294–308.https://doi.org/10.1007/978-3-642-18275-4_21Naoki Kobayashi, Ryosuke Sato, and Hiroshi Unno. 2011. Predicate abstraction and CEGAR for higher-order modelchecking. In
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011, Mary W. Hall and David A. Padua (Eds.). ACM, 222–233. https://doi.org/10.1145/1993498.1993525
Shuvendu K. Lahiri and Shaz Qadeer. 2009. Complexity and Algorithms for Monomial and Clausal Predicate Abstrac-tion. In
Automated Deduction - CADE-22, 22nd International Conference on Automated Deduction, Montreal, Canada, Au-gust 2-7, 2009. Proceedings (Lecture Notes in Computer Science, Vol. 5663) , Renate A. Schmidt (Ed.). Springer, 214–229.https://doi.org/10.1007/978-3-642-02959-2_18Nico Lehmann and Éric Tanter. 2017. Gradual refinement types. In
Proceedings of the 44th ACM SIGPLAN Symposium onPrinciples of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017 , Giuseppe Castagna and Andrew D.Gordon (Eds.). ACM, 775–788. http://dl.acm.org/citation.cfm?id=3009856Laurent Mauborgne and Xavier Rival. 2005. Trace Partitioning in Abstract Interpretation Based Static Analyzers. In
Pro-gramming Languages and Systems, 14th European Symposium on Programming,ESOP 2005, Held as Part of the Joint Eu-ropean Conferences on Theory and Practice of Software, ETAPS 2005, Edinburgh, UK, April 4-8, 2005, Proceedings (LectureNotes in Computer Science, Vol. 3444) , Shmuel Sagiv (Ed.). Springer, 5–20. https://doi.org/10.1007/978-3-540-31987-0_2Jan Midtgaard. 2012. Control-flow analysis of functional programs.
ACM Comput. Surv.
44, 3 (2012), 10:1–10:33.https://doi.org/10.1145/2187671.2187672Antoine Miné. 2007. The Octagon Abstract Domain.
CoRR abs/cs/0703084 (2007). arXiv:cs/0703084http://arxiv.org/abs/cs/0703084Bruno Monsuez. 1992. Polymorphic typing by abstract interpretation. In
International Conference on Foundations of SoftwareTechnology and Theoretical Computer Science . Springer, 217–228.Bruno Monsuez. 1993a. Polymorphic types and widening operators. In
Static Analysis . Springer, 267–281.Bruno Monsuez. 1993b. Polymorphic typing for call-by-name semantics. In
Formal Methods in Programming and TheirApplications . Springer, 156–169.Bruno Monsuez. 1995a. System F and abstract interpretation. In
International Static Analysis Symposium . Springer, 279–295.Bruno Monsuez. 1995b. Using abstract interpretation to define a strictness type inference system. In
Proceedings of the 1995ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation . ACM, 122–133.Christian Mossin. 1998. Higher-Order Value Flow Graphs.
Nord. J. Comput.
5, 3 (1998), 214–234.Hanne Riis Nielson and Flemming Nielson. 1997. Infinitary Control Flow Analysis: a Collecting Semantics for ClosureAnalysis. In
Conference Record of POPL’97: The 24th ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages, Papers Presented at the Symposium, Paris, France, 15-17 January 1997 , Peter Lee, Fritz Henglein, and Neil D.Jones (Eds.). ACM Press, 332–345. https://doi.org/10.1145/263699.263745Yuya Okuyama, Takeshi Tsukada, and Naoki Kobayashi. 2019. A Temporal Logic for Higher-Order Func-tional Programs. In
Static Analysis - 26th International Symposium, SAS 2019, Porto, Portugal, October 8-11, 2019, Proceedings (Lecture Notes in Computer Science, Vol. 11822), Bor-Yuh Evan Chang (Ed.). Springer, 437–458. https://doi.org/10.1007/978-3-030-32304-2_21
Luke Ong. 2015. Higher-Order Model Checking: An Overview. In 30th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2015. IEEE Computer Society, 1–15. https://doi.org/10.1109/LICS.2015.9
Zvonimir Pavlinovic, Yusen Su, and Thomas Wies. 2021. Dataflow Refinement Type Inference.
PACMPL
5, POPL (Jan. 2021), 19:1–19:31.
John Plevyak and Andrew A. Chien. 1995. Iterative Flow Analysis.
Thomas W. Reps. 1998. Program analysis via graph reachability.
Inf. Softw. Technol.
40, 11-12 (1998), 701–726.https://doi.org/10.1016/S0950-5849(98)00093-7Patrick Maxim Rondon, Ming Kawaguchi, and Ranjit Jhala. 2008. Liquid types. In
Proceedings of the ACM SIGPLAN 2008Conference on Programming Language Design and Implementation, Tucson, AZ, USA, June 7-13, 2008 , Rajiv Gupta andSaman P. Amarasinghe (Eds.). ACM, 159–169. https://doi.org/10.1145/1375581.1375602Gagandeep Singh, Markus Püschel, and Martin T. Vechev. 2017. Fast polyhedra abstract domain. In
Proceedings of the44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017 ,Giuseppe Castagna and Andrew D. Gordon (Eds.). ACM, 46–59. http://dl.acm.org/citation.cfm?id=3009885John Toman, Ren Siqi, Kohei Suenaga, Atsushi Igarashi, and Naoki Kobayashi. 2020. ConSORT: Context- and Flow-SensitiveOwnership Refinement Types for Imperative Programs. In
Programming Languages and Systems - 29th European Sym-posium on Programming, ESOP 2020, Held as Part of the European Joint Conferences on Theory and Practice of Software,ETAPS 2020, Dublin, Ireland, April 25-30, 2020, Proceedings (Lecture Notes in Computer Science, Vol. 12075) , Peter Müller(Ed.). Springer, 684–714. https://doi.org/10.1007/978-3-030-44914-8_25Hiroshi Unno and Naoki Kobayashi. 2009. Dependent type inference with interpolants. In
Proceedings of the 11th Interna-tional ACM SIGPLAN Conference on Principles and Practice of Declarative Programming, September 7-9, 2009, Coimbra, Por-tugal , António Porto and Francisco Javier López-Fraguas (Eds.). ACM, 277–288. https://doi.org/10.1145/1599410.1599445Hiroshi Unno, Tachio Terauchi, and Naoki Kobayashi. 2013. Automating relatively complete verification of higher-order functional programs. In
The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13, Rome, Italy, January 23-25, 2013, Roberto Giacobazzi and Radhia Cousot (Eds.). ACM, 75–86. https://doi.org/10.1145/2429069.2429081
Dimitrios Vardoulakis and Olin Shivers. 2011a. CFA2: a Context-Free Approach to Control-Flow Analysis.
Log. MethodsComput. Sci.
7, 2 (2011). https://doi.org/10.2168/LMCS-7(2:3)2011Dimitrios Vardoulakis and Olin Shivers. 2011b. Pushdown flow analysis of first-class control. In
Proceeding of the 16th ACMSIGPLAN international conference on Functional Programming, ICFP 2011, Tokyo, Japan, September 19-21, 2011 , ManuelM. T. Chakravarty, Zhenjiang Hu, and Olivier Danvy (Eds.). ACM, 69–80. https://doi.org/10.1145/2034773.2034785Niki Vazou, Alexander Bakst, and Ranjit Jhala. 2015. Bounded refinement types. In
Proceedings of the 20th ACM SIGPLANInternational Conference on Functional Programming, ICFP 2015, Vancouver, BC, Canada, September 1-3, 2015 , KathleenFisher and John H. Reppy (Eds.). ACM, 48–61. https://doi.org/10.1145/2784731.2784745Niki Vazou, Joachim Breitner, Rose Kunkel, David Van Horn, and Graham Hutton. 2018a. Theorem proving for all:equational reasoning in liquid Haskell (functional pearl). In
Proceedings of the 11th ACM SIGPLAN International Sym-posium on Haskell, Haskell@ICFP 2018, St. Louis, MO, USA, September 27-17, 2018 , Nicolas Wu (Ed.). ACM, 132–144.https://doi.org/10.1145/3242744.3242756Niki Vazou, Patrick Maxim Rondon, and Ranjit Jhala. 2013. Abstract Refinement Types. In
Programming Lan-guages and Systems - 22nd European Symposium on Programming, ESOP 2013, Held as Part of the European JointConferences on Theory and Practice of Software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings (Lec-ture Notes in Computer Science, Vol. 7792) , Matthias Felleisen and Philippa Gardner (Eds.). Springer, 209–228.https://doi.org/10.1007/978-3-642-37036-6_13Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon L. Peyton Jones. 2014. Refinementtypes for Haskell. In
Proceedings of the 19th ACM SIGPLAN international conference on Functional programming,Gothenburg, Sweden, September 1-3, 2014 , Johan Jeuring and Manuel M. T. Chakravarty (Eds.). ACM, 269–282.https://doi.org/10.1145/2628136.2628161Niki Vazou, Éric Tanter, and David Van Horn. 2018b. Gradual liquid type inference.
Proc. ACM Program. Lang.
2, OOPSLA(2018), 132:1–132:25. https://doi.org/10.1145/3276502Panagiotis Vekris, Benjamin Cosman, and Ranjit Jhala. 2016. Refinement types for TypeScript. In
Proceedings of the 37thACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA,June 13-17, 2016 , Chandra Krintz and Emery Berger (Eds.). ACM, 310–325. https://doi.org/10.1145/2908080.2908110Philip Wadler. 1990. Comprehending Monads. In
Proceedings of the 1990 ACM Conference on LISP and Functional Program-ming, LFP 1990, Nice, France, 27-29 June 1990 . ACM, 61–78. https://doi.org/10.1145/91556.91592Hongwei Xi and Frank Pfenning. 1999. Dependent Types in Practical Programming. In
POPL ’99, Proceedings of the 26thACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Antonio, TX, USA, January 20-22, 1999 ,Andrew W. Appel and Alex Aiken (Eds.). ACM, 214–227. https://doi.org/10.1145/292540.292560He Zhu and Suresh Jagannathan. 2013. Compositional and Lightweight Dependent Type Inference for ML. In
Verification,Model Checking, and Abstract Interpretation, 14th International Conference, VMCAI 2013, Rome, Italy, January 20-22, 2013.Proceedings (Lecture Notes in Computer Science, Vol. 7737) , Roberto Giacobazzi, Josh Berdine, and Isabella Mastroeni(Eds.). Springer, 295–314. https://doi.org/10.1007/978-3-642-35873-9_19He Zhu, Aditya V. Nori, and Suresh Jagannathan. 2015. Learning refinement types. In
Proceedings of the 20th ACM SIGPLANInternational Conference on Functional Programming, ICFP 2015, Vancouver, BC, Canada, September 1-3, 2015 , KathleenFisher and John H. Reppy (Eds.). ACM, 400–411. https://doi.org/10.1145/2784731.2784766He Zhu, Gustavo Petri, and Suresh Jagannathan. 2016. Automatically learning shape specifications. In
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016, Chandra Krintz and Emery Berger (Eds.). ACM, 491–507. https://doi.org/10.1145/2908080.2908125
A ADDITIONAL EXAMPLES OF THE CONCRETE DATA FLOW SEMANTICS

A.1 Higher-order Example
Let us consider an example involving higher-order functions. The code is again given below.

    let dec y = y - 1 in
    let f x g = if x >= 0 then g^𝑜 x^𝑝 else x^𝑞 in
    f^𝑎 ((f^𝑏 1^𝑐)^𝑑 dec^𝑖)^𝑗 dec^𝑚

The concrete execution map for this program, restricted to the execution nodes of interest, is as follows:

    𝑞𝑎 ↦→ ⊥        𝑞𝑏 ↦→ ⊥        𝑐 ↦→ 1        𝑝𝑎 ↦→ 0        𝑝𝑏 ↦→ 1        𝑗 ↦→ 0
    𝑖 ↦→ [𝑜𝑑 ⊳ 1 → 0]
    𝑚 ↦→ [𝑜𝑗 ⊳ 0 → −1]
    dec ↦→ [𝑜𝑑 ⊳ 1 → 0, 𝑜𝑗 ⊳ 0 → −1]
    𝑎 ↦→ [𝑎 ⊳ 0 → [𝑗 ⊳ [𝑜𝑗 ⊳ 0 → −1] → −1]]
    𝑏 ↦→ [𝑐 ⊳ 1 → [𝑑 ⊳ [𝑜𝑑 ⊳ 1 → 0] → 0]]
    𝑓 ↦→ [𝑎 ⊳ 0 → [𝑗 ⊳ [𝑜𝑗 ⊳ 0 → −1] → −1], 𝑏 ⊳ 1 → [𝑑 ⊳ [𝑜𝑑 ⊳ 1 → 0] → 0]]

For this specific program, the execution will always reach the program expressions with locations a, b, c, d, i, j, and m with the same environment. For this reason, we again identify these expression nodes with their locations. Expressions with labels o, p, and q will be reached with two environments and stacks, each of which corresponds to executing the body of f with the value passed to it at the call sites with locations a and b. We use this call site information to distinguish the two evaluations of the body of f. For instance, the node associated with the x^𝑞 expression when performing the analysis for the call to f made at location 𝑎 is denoted by 𝑞𝑎 in the above table.

A.2 Recursion Example
We next exemplify how our concrete data flow semantics models recursion through an OCaml program that can be found to the left in Fig. 8. The program first defines a function f that returns 0 when given 1 as input, and otherwise recurses with a decrement of the input. This function is then called two times with inputs 2 and 0, respectively. The latter call never terminates. To avoid clutter, we only annotate program expressions of particular interest with their locations.

    let rec f n =
      if n = 1 then 0 else f^𝑎 (n - 1)
    in (f^𝑏 2) + (f^𝑐 0)

    𝑏 ↦→ [𝑏 ⊳ 2 → 0]
    𝑐 ↦→ [𝑐 ⊳ 0 → ⊥]
    𝑓 ↦→ [𝑏 ⊳ 2 → 0, 𝑐 ⊳ 0 → ⊥, 𝑎𝑏 ⊳ 1 → 0, 𝑎𝑐 ⊳ −1 → ⊥, 𝑎𝑎𝑐 ⊳ −2 → ⊥, …]

Fig. 8. A program exhibiting recursion and an excerpt of its execution map
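The terminating table entries for f can also be observed concretely by instrumenting the program. The sketch below is our own illustration, not part of the semantics: the reference `calls` records every input/output pair a call to f produces, which for `f 2` yields exactly the two terminating entries of f's table (modulo call-site stacks). The diverging call `f 0` would never record an output, matching the ⊥ outputs in the map.

```ocaml
(* Illustrative instrumentation (ours): record each (input, output) pair
   that f produces, mimicking the terminating entries of f's table. *)
let calls : (int * int) list ref = ref []

let rec f n =
  let r = if n = 1 then 0 else f (n - 1) in
  calls := (n, r) :: !calls;  (* entry n -> r observed for this call *)
  r

let () =
  ignore (f 2);
  (* The recursive call at a records (1, 0) first, then the outer call
     at b records (2, 0). *)
  assert (!calls = [(2, 0); (1, 0)]);
  print_endline "ok"
```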
Let us first discuss the node 𝑏, which corresponds to the call to f with integer 2. The meaning assigned to 𝑏 is a table indicating that the corresponding function is invoked at the call site stack associated with 𝑏 with integer 2 as input, returning 0 as the output. The computation of this output clearly requires a recursive call to f with 1 as the input. The existence and effect of this call can be observed by the entry 𝑎𝑏 ⊳ 1 → 0 in the table associated with f. Intuitively, this entry states that f was recursively called at the point identified by 𝑎 while executing the call at 𝑏.

Consider now the node 𝑐, which corresponds to the call to f with integer 0. The meaning assigned to 𝑐 is a table indicating that the corresponding function is invoked at the call site stack associated with 𝑐 with integer 0 but with no computed output ⊥. Clearly, there is no useful output computed since f does not terminate with 0 as input. The recursive calls to f observed after calling f with 0 and their effects can again be observed in the table computed for f. For instance, the table entry 𝑎𝑐 ⊳ −1 → ⊥ captures that the recursive call to f at the call site point 𝑎 after entering f from 𝑐 has seen −1 as the input, but no output has been computed. Similarly, the entry 𝑎𝑎𝑐 ⊳ −2 → ⊥ indicates that f was recursively called at 𝑎 with −2 as input after the previous recursive call, and again no output was observed. Due to the nonterminating execution, the table stored for f has infinitely many entries of the shape 𝑎^𝑘𝑐 ⊳ −𝑘 → ⊥.

B RELATIONAL DATA FLOW SEMANTICS

B.1 Notation
Before proceeding to the definitions of the ordering and abstract transformer for the relational semantics, we first formally introduce auxiliary notation and definitions. We use 𝑥 : 𝒖⊥ for the empty relational table that maps every call site to the pair ⟨⊥ʳ, ⊥ʳ⟩. We denote the fact that a relational table 𝒖 has been invoked at a call site stack 𝑆cs with dependency variable 𝑥 by 𝑆cs, 𝑥 ∈ 𝒖. By 𝑀ʳ⊥ (𝑀ʳ⊤) we denote the relational execution maps that assign ⊥ʳ (⊤ʳ) to every node. We denote by Vʳ the union of all Vʳ_𝑋 and use similar notation for the sets of all dependency relations and relational tables. For the members of these sets we sometimes omit the subscript indicating the scope when it is irrelevant or clear from the context to avoid clutter.

B.2 Domain Ordering
We define a partial order ⊑ʳ on Vʳ, which is analogous to the partial order ⊑ on concrete values, except that equality on constants is replaced by subset inclusion on dependency relations:

    𝑢₁ ⊑ʳ 𝑢₂ ⟺ 𝑢₁ = ⊥ʳ ∨ 𝑢₂ = ⊤ʳ ∨ (𝑢₁, 𝑢₂ ∈ Rʳ_𝑋 ∧ 𝑢₁ ⊆ 𝑢₂) ∨ (𝑢₁, 𝑢₂ ∈ Tʳ_𝑋 ∧ ∀𝑆cs. 𝑢₁(𝑆cs) ⊑̇ʳ 𝑢₂(𝑆cs))

We note that aside from ⊥ʳ and ⊤ʳ, only the relational values of identical scope 𝑋 are comparable. This ordering induces, for every 𝑋, a complete lattice (Vʳ_𝑋, ⊑ʳ, ⊥ʳ, ⊤ʳ, ⊔ʳ, ⊓ʳ). We assume the set of relational values is implicitly quotiented under dependency variable renaming, which can always be achieved. The definition of join over relational values is given below:

    ⊥ʳ ⊔ʳ 𝑢 ≝ 𝑢        𝑢 ⊔ʳ ⊥ʳ ≝ 𝑢        𝑟₁ ⊔ʳ 𝑟₂ ≝ 𝑟₁ ∪ 𝑟₂
    𝒖₁ ⊔ʳ 𝒖₂ ≝ 𝜆𝑆cs. 𝒖₁(𝑆cs) ⊔̇ʳ 𝒖₂(𝑆cs)        𝑢₁ ⊔ʳ 𝑢₂ ≝ ⊤ʳ (otherwise)

The meet ⊓ʳ is defined similarly.

Lemma B.1.
Let 𝑉 ∈ ℘(Vʳ_𝑋). Then ⊔ʳ𝑉 = lub(𝑉) and ⊓ʳ𝑉 = glb(𝑉) subject to ⊑ʳ.

We lift the lattices on relational values pointwise to a complete lattice on relational maps (Mʳ, ⊑̇ʳ, 𝑀ʳ⊥, 𝑀ʳ⊤, ⊔̇ʳ, ⊓̇ʳ).

Galois connections.
We can show that there exists a Galois connection between the complete lattice of relational values and the powerset of pairs of concrete maps and concrete values.
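To make the shape of such a Galois connection concrete, the toy example below (our own simplification, unrelated to the relational domain itself) connects finite sets of integers with a four-point sign lattice. The names and the lattice are ours; the point is only that `alpha` and `gamma` satisfy the defining adjunction α(S) ⊑ a ⟺ S ⊆ γ(a).

```ocaml
(* A miniature Galois connection, for illustration only. *)
type sign = Bot | Neg | Nonneg | Top

let leq a b = a = b || a = Bot || b = Top

let join a b = if leq a b then b else if leq b a then a else Top

(* gamma as a membership predicate, since gamma Top is infinite *)
let gamma_mem a n = match a with
  | Bot -> false
  | Neg -> n < 0
  | Nonneg -> n >= 0
  | Top -> true

(* alpha of a finite concrete set: join of the best abstractions *)
let alpha s =
  List.fold_left (fun acc n -> join acc (if n < 0 then Neg else Nonneg)) Bot s

let () =
  (* the defining adjunction: alpha s <= a  iff  s is included in gamma a *)
  let adj s a = leq (alpha s) a = List.for_all (gamma_mem a) s in
  assert (List.for_all
            (fun a -> adj [-2; -1] a && adj [0; 3] a && adj [-1; 1] a)
            [Bot; Neg; Nonneg; Top]);
  print_endline "ok"
```

As in the paper's construction, the left adjoint `alpha` is uniquely determined once `gamma` preserves meets.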
Lemma B.2.
For all scopes 𝑋, 𝛾ʳ is a complete meet-morphism between Vʳ_𝑋 and ℘((𝑋 → V) × V).

It follows then that there exists a Galois connection between ℘((𝑋 → V) × V) and Vʳ_𝑋 for every scope 𝑋 which has 𝛾ʳ as its right adjoint [Cousot and Cousot 1979]. Let 𝛼ʳ_𝑋 : ℘((𝑋 → V) × V) → Vʳ_𝑋 be the corresponding left adjoint, which is uniquely determined by 𝛾ʳ. From Lemma B.2 it easily follows that 𝛾̇ʳ is also a complete meet-morphism. We denote by 𝛼̇ʳ : ℘(M) → Mʳ the left adjoint of the induced Galois connection.

    𝑢_𝑋[𝑥 ← 𝑢′] ≝ 𝛼ʳ_𝑋(𝛾ʳ(𝑢_𝑋) ∩ {⟨Γ′[𝑥 ↦ 𝑣′], 𝑣⟩ | ⟨Γ′, 𝑣′⟩ ∈ 𝛾ʳ(𝑢′)})
    𝑢_𝑋[𝑥₁ = 𝑥₂] ≝ 𝛼ʳ_𝑋(𝛾ʳ(𝑢_𝑋) ∩ {⟨Γ, 𝑣⟩ | Γ(𝑥₁) = Γ(𝑥₂)})
    𝑢_𝑋[𝑥 = 𝑣] ≝ 𝛼ʳ_𝑋(𝛾ʳ(𝑢_𝑋) ∩ {⟨Γ, 𝑣⟩ | Γ(𝑥) = 𝑣})
    𝑢_𝑋[Γ] ≝ ⊓ʳ {𝑢[𝑛 ← Γ(𝑛)] | 𝑛 ∈ 𝑋}
    𝑢_𝑋[𝑋′] ≝ 𝛼ʳ_𝑋′(𝛾ʳ(𝑢_𝑋) ∩ {⟨Γ, 𝑣⟩ | ∀𝑛 ∈ 𝑋′ \ 𝑋. Γ(𝑛) ∉ {⊥, ⊤}})

Fig. 9. Operations on relational abstract domains
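The lattice structure of Sect. B.2 can be sketched executably. The fragment below is our own simplification: dependency relations are modelled as finite sets of integer pairs and tables as association lists from call-site stacks (strings here) to input/output pairs, with missing entries denoting ⟨⊥ʳ, ⊥ʳ⟩. It implements only the join, following the case split of the definition (union on relations, pointwise join on tables, ⊤ otherwise).

```ocaml
(* Sketch (ours) of the join on relational values from Sect. B.2. *)
type value =
  | VBot
  | VTop
  | VRel of (int * int) list                 (* a finite dependency relation *)
  | VTab of (string * (value * value)) list  (* call-site stack -> <in, out> *)

let rec join u1 u2 = match u1, u2 with
  | VBot, u | u, VBot -> u
  | VRel r1, VRel r2 ->
      (* join of dependency relations is set union *)
      VRel (List.sort_uniq compare (r1 @ r2))
  | VTab t1, VTab t2 ->
      (* pointwise join over call sites; absent entries are <Bot, Bot> *)
      let sites = List.sort_uniq compare (List.map fst t1 @ List.map fst t2) in
      let entry t s = try List.assoc s t with Not_found -> (VBot, VBot) in
      VTab (List.map (fun s ->
        let (i1, o1) = entry t1 s and (i2, o2) = entry t2 s in
        (s, (join i1 i2, join o1 o2))) sites)
  | _, _ -> VTop  (* incomparable kinds, and anything involving Top *)

let () =
  assert (join (VRel [(0, 1)]) (VRel [(1, 2)]) = VRel [(0, 1); (1, 2)]);
  assert (join (VRel [(0, 1)]) (VTab []) = VTop);
  print_endline "ok"
```

The meet would follow the dual case split; the real domain additionally quotients values under dependency variable renaming, which this sketch omits.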
Abstract domain operations.
Before we can define the abstract transformer, we need to introduce a few primitive operations for constructing and manipulating relational values and execution maps (Fig. 9). The operation 𝑢_𝑋[𝑥₁ = 𝑥₂] strengthens 𝑢_𝑋 by enforcing equality between 𝑥₁ and 𝑥₂ in the dependency vectors. Similarly, 𝑢_𝑋[𝑥 = 𝑣] strengthens 𝑢_𝑋 by requiring that the value described by 𝑥 is 𝑣. We use the operation 𝑢_𝑋[𝑥 ← 𝑢′] to strengthen 𝑢_𝑋 with 𝑢′ for 𝑥. Logically, this can be viewed as computing 𝑢_𝑋 ∧ 𝑢′[𝑥/𝜈]. The operation 𝑀ʳ[𝑛 ← 𝑢_𝑋] lifts 𝑢_𝑋[𝑛 ← 𝑢′] pointwise to relational execution maps. Note that here for every 𝑛′ we implicitly rescope 𝑀(𝑛′) so that its scope includes 𝑛. The operation 𝑢_𝑋[Γ], conversely, strengthens 𝑢_𝑋 with all the relations describing the values at nodes 𝑋 in Γ : 𝑋 → 𝑢. Intuitively, this operation will be used to push scope/environment assumptions to a relational value. Finally, the rescoping operation 𝑢_𝑋[𝑋′] changes the scope of 𝑢_𝑋 from 𝑋 to 𝑋′ while strengthening it with the information that all nodes in 𝑋′ \ 𝑋 are neither ⊥ nor ⊤. We use the operation ∃𝑛. 𝑢_𝑋 ≝ 𝑢[𝑋 \ {𝑛}] to simply project out, i.e., remove, a node 𝑛 from the scope of a given relational value 𝑢_𝑋. We (implicitly) use the rescoping operations to enable precise propagation between relational values that have incompatible scopes. The relational abstraction of a constant 𝑐 in scope 𝑋 is defined as [𝜈 = 𝑐]ʳ_𝑋 ≝ ⊤ʳ[𝜈 = 𝑐] = {𝑑 ∈ Dʳ_𝑋 | 𝑑(𝜈) = 𝑐}.

B.3 Abstract Transformer
The abstract transformer stepʳ of the relational semantics is shown in Fig. 10. It closely resembles the concrete transformer step, where relational execution maps 𝑀ʳ take the place of concrete execution maps 𝑀 and relational values 𝑢 take the place of concrete values 𝑣. The case for constant values 𝑐^ℓ effectively replaces the join on values in the concrete transformer by a join on relational values. The constant 𝑐 is abstracted by the relational value [𝜈 = 𝑐]ʳ[Γ], which establishes the relation between 𝑐 and the other values stored at the nodes in 𝑀 that are bound in the environment 𝐸. In general, the transformer pushes environment assumptions into relational values rather than leaving the assumptions implicit. We decided to treat environment assumptions this way so we can interpret relational values independently of the environment. This idea is implemented in the rules for variables and constants, as explained above, while it is done inductively for other constructs. In the case for variables 𝑥^ℓ, the abstract transformer replaces ⋉ in the concrete transformer by the abstract propagation operator ⋉ʳ defined in Fig. 11. Note that the definition of ⋉ʳ assumes that its arguments range over the same scope. Therefore, whenever we use ⋉ʳ in stepʳ, we implicitly use the rescoping operator to make the arguments compatible. For instance, in the call to ⋉ʳ for variables 𝑥^ℓ, the scope of 𝑀ʳ(𝐸(𝑥)) is 𝑋_{𝐸(𝑥)}, which does not include the variable node 𝐸(𝑥) itself. We therefore implicitly rescope 𝑀ʳ(𝐸(𝑥)) to 𝑀ʳ(𝐸(𝑥))[𝑋_𝑛] before strengthening it with the equality 𝜈 = 𝐸(𝑥). The other cases are fairly straightforward. Observe how in the case for conditionals (𝑒₀^ℓ₀ ? 𝑒₁^ℓ₁ : 𝑒₂^ℓ₂)^ℓ we analyze both branches and combine the resulting relational execution maps with a join.
In each branch, the map 𝑀 r is strengthened with the information about the truth value stored at 𝐸 ⋄ ℓ0, reflecting the path-sensitive nature of Liquid types.

step r J𝑐 ℓK(𝐸, 𝑆) def= do 𝑛 = ℓ ⋄ 𝐸; Γ r ← env(𝐸); 𝑢′ ← 𝑛 := [𝜈 = 𝑐] r[Γ r]; return 𝑢′

step r J𝑥 ℓK(𝐸, 𝑆) def= do 𝑛 = ℓ ⋄ 𝐸; 𝑢 ← !𝑛; 𝑛𝑥 = 𝐸(𝑥); Γ r ← env(𝐸)
    𝑢′ ← 𝑛𝑥, 𝑛 := Γ r(𝑥)[𝜈 = 𝑥] r[Γ r] ⋉ r 𝑢[𝜈 = 𝑥] r[Γ r]; return 𝑢′

step r J(𝑒𝑖 𝑒𝑗) ℓK(𝐸, 𝑆) def= do 𝑛 = ℓ ⋄ 𝐸; 𝑛𝑖 = 𝑖 ⋄ 𝐸; 𝑛𝑗 = 𝑗 ⋄ 𝐸; 𝑢 ← !𝑛
    𝑢𝑖 ← step r J𝑒𝑖K(𝐸, 𝑆); assert(𝑢𝑖 ∈ T r); 𝑢𝑗 ← step r J𝑒𝑗K(𝐸, 𝑆)
    ⟨𝑢′𝑖, 𝑥 : [𝑖 · 𝑆 ⊳ 𝑢′𝑗 → 𝑢′]⟩ = 𝑢𝑖 ⋉ r 𝑥 : [𝑖 · 𝑆 ⊳ 𝑢𝑗 → 𝑢]
    𝑢′′ ← 𝑛𝑖, 𝑛𝑗, 𝑛 := 𝑢′𝑖, 𝑢′𝑗, 𝑢′; return 𝑢′′

step r J(𝜆𝑥.𝑒𝑖) ℓK(𝐸, 𝑆) def= do 𝑛 = ℓ ⋄ 𝐸; 𝑢 ← 𝑛 := 𝑥 : 𝒖⊥
    𝑢′ ← for 𝑢 do body r(𝑥, 𝑒𝑖, 𝐸, 𝑢); 𝑢′′ ← 𝑛 := 𝑢′; return 𝑢′′

body r(𝑥, 𝑒𝑖, 𝐸, 𝑢)(𝑆′) def= do 𝑛𝑥 = 𝑥 ⋄ 𝐸 ⋄ 𝑆′; 𝐸𝑖 = 𝐸.𝑥 : 𝑛𝑥; 𝑛𝑖 = 𝑖 ⋄ 𝐸𝑖
    𝑢𝑥 ← !𝑛𝑥; 𝑢𝑖 ← step r J𝑒𝑖K(𝐸𝑖, 𝑆′)
    ⟨𝑥 : [𝑆′ : 𝑢′𝑥 → 𝑢′𝑖], 𝑢′⟩ = 𝑥 : [𝑆′ : 𝑢𝑥 → 𝑢𝑖] ⋉ r 𝑢|𝑆′
    𝑛𝑥, 𝑛𝑖 := 𝑢′𝑥, 𝑢′𝑖; return 𝑢′

Fig. 10. Abstract transformer for relational data flow semantics

𝑥 : 𝒖1 ⋉ r 𝑥 : 𝒖2 def= let 𝒖 = Λ𝑆cs. let ⟨𝑢1𝑖, 𝑢1𝑜⟩ = 𝒖1(𝑆cs); ⟨𝑢2𝑖, 𝑢2𝑜⟩ = 𝒖2(𝑆cs)
        ⟨𝑢′2𝑖, 𝑢′1𝑖⟩ = 𝑢2𝑖 ⋉ r 𝑢1𝑖
        ⟨𝑢′1𝑜, 𝑢′2𝑜⟩ = 𝑢1𝑜[𝑥 ← 𝑢2𝑖] ⋉ r 𝑢2𝑜[𝑥 ← 𝑢2𝑖]
    in (⟨𝑢′1𝑖, 𝑢1𝑜 ⊔ r 𝑢′1𝑜⟩, ⟨𝑢′2𝑖, 𝑢2𝑜 ⊔ r 𝑢′2𝑜⟩)
    in ⟨𝑥 : Λ𝑆cs. 𝜋1(𝒖(𝑆cs)), 𝑥 : Λ𝑆cs. 𝜋2(𝒖(𝑆cs))⟩

𝒖 ⋉ r ⊥ r def= ⟨𝒖, 𝒖⊥⟩        𝒖 ⋉ r ⊤ r def= ⟨⊤ r, ⊤ r⟩        𝑢1 ⋉ r 𝑢2 def= ⟨𝑢1, 𝑢2 ⊔ r 𝑢1⟩ (otherwise)

Fig. 11. Value propagation in relational data flow semantics

Lemma B.3.
The function ⋉ r is monotone and increasing. Lemma B.4.
For every 𝑒 ∈ Exp, 𝑆 ∈ S, and 𝐸 ∈ E such that ⟨𝑒, 𝐸⟩ is well-formed, step r J𝑒K(𝐸, 𝑆) is monotone and increasing.

B.4 Abstract Semantics
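Both this subsection and Appendix C.4 define the semantics as the least fixpoint of a monotone, increasing transformer over a complete lattice. Such fixpoints can be computed by Kleene iteration from the bottom element; a generic sketch in Python (names hypothetical; termination is assumed, e.g., via a finite abstract domain or widening):

```python
def lfp(step, bottom, leq):
    """Kleene iteration: bottom, step(bottom), step(step(bottom)), ...
    Since step is increasing, the chain ascends; it stabilizes exactly
    when step(cur) ⊑ cur, and the stable point is the least fixpoint."""
    cur = bottom
    while True:
        nxt = step(cur)
        if leq(nxt, cur):
            return cur
        cur = nxt

# toy instance: least set closed under 0 and bounded successor
reach = lfp(lambda s: s | {0} | {x + 1 for x in s if x < 3},
            set(), lambda a, b: a <= b)
```

Here the powerset of {0, ..., 3} ordered by inclusion plays the role of the lattice of execution maps, and the lambda plays the role of step r.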
The abstract domain of the relational abstraction is given by relational properties P r def= M r. The relational abstract semantics C r : Exp → P r is then defined as the least fixpoint of step r as follows:

C r J𝑒K def= lfp ⊑̇ r 𝑀 r⊥ Λ𝑀 r. let ⟨_, 𝑀 r1⟩ = step r J𝑒K(𝜖, 𝜖)(𝑀 r) in 𝑀 r1

Theorem B.5.
The relational abstract semantics is sound, i.e., for all programs 𝑒, C J𝑒K ⊆ 𝛾̇ r(C r J𝑒K).

C COLLAPSED RELATIONAL DATA FLOW SEMANTICS

C.1 Notation
The notations and definitions for the collapsed semantics are as for relational values, except that abstract nodes take the place of concrete nodes.
C.2 Domain Ordering
The ordering ⊑ˆr𝑋 on collapsed values Vˆr𝑋 resembles the ordering on relational values:

ˆ𝑢1 ⊑ˆr ˆ𝑢2 def⟺ ˆ𝑢1 = ⊥ˆr ∨ ˆ𝑢2 = ⊤ˆr ∨ (ˆ𝑢1, ˆ𝑢2 ∈ Rˆr𝑋 ∧ ˆ𝑢1 ⊆ ˆ𝑢2) ∨ (ˆ𝑢1, ˆ𝑢2 ∈ Tˆr𝑋 ∧ ∀ˆ𝑆cs. ˆ𝑢1(ˆ𝑆cs) ⊑̇ˆr ˆ𝑢2(ˆ𝑆cs))

We assume that when we compare two collapsed tables, we implicitly apply 𝛼-renaming so that they range over the same dependency variables. Again, this ordering induces, for every 𝑋, a complete lattice (Vˆr𝑋, ⊑ˆr, ⊥ˆr, ⊤ˆr, ⊔ˆr, ⊓ˆr). The join operator ⊔ˆr is defined as follows:

⊥ˆr ⊔ˆr ˆ𝑢 def= ˆ𝑢        ˆ𝑢 ⊔ˆr ⊥ˆr def= ˆ𝑢        ˆ𝑟1 ⊔ˆr ˆ𝑟2 def= ˆ𝑟1 ∪ ˆ𝑟2
(𝑥 : ˆ𝑢1𝑖 → ˆ𝑢1𝑜) ⊔ˆr (𝑥 : ˆ𝑢2𝑖 → ˆ𝑢2𝑜) def= 𝑥 : (ˆ𝑢1𝑖 ⊔ˆr ˆ𝑢2𝑖) → (ˆ𝑢1𝑜 ⊔ˆr ˆ𝑢2𝑜)
ˆ𝑢1 ⊔ˆr ˆ𝑢2 def= ⊤ˆr (otherwise)

The meet ⊓ˆr is defined similarly.

Lemma C.1.
Let 𝑉 ∈ ℘(Vˆr𝑋). Then ⊔ˆr 𝑉 = lub(𝑉) and ⊓ˆr 𝑉 = glb(𝑉) with respect to ⊑ˆr.

We implicitly identify Vˆr𝑋 with its quotient under the equivalence relation induced by ⊑ˆr. These lattices are lifted pointwise to a complete lattice on collapsed execution maps (Mˆr, ⊑̇ˆr, 𝑀ˆr⊥, 𝑀ˆr⊤, ⊔̇ˆr, ⊓̇ˆr). We first formally state that our complete lattice of collapsed values forms a Galois connection with the complete lattice of relational values.

Lemma C.2.
For every abstract scope 𝑋, 𝛾ˆr𝑋 is a complete meet-morphism between Vˆr𝑋 and V r𝑋.

It then follows that there exists a unique Galois connection ⟨𝛼ˆr, 𝛾ˆr⟩ between V r𝑋 and Vˆr𝑋. From Lemma C.2 it easily follows that 𝛾̇ˆr is also a complete meet-morphism. We denote by 𝛼̇ˆr : M r → Mˆr the left adjoint of the induced Galois connection.

Abstract domain operations.
In the new abstract transformer, we replace the strengthening operations on relational values by their most precise abstractions on collapsed values; these are lifted pointwise to execution maps. For instance, strengthening with an equality between dependency variables, ˆ𝑢𝑋[𝑥1 = 𝑥2], is defined as

ˆ𝑢𝑋[𝑥1 = 𝑥2] def= 𝛼ˆr𝑋(𝛾ˆr𝑋(ˆ𝑢)[𝑥1 = 𝑥2])

This definition in terms of the most precise abstraction should not obscure the simple nature of the operation. E.g., for collapsed dependency relations ˆ𝑟 we simply have

ˆ𝑟𝑋[ˆ𝑛1 = ˆ𝑛2] = { 𝑑ˆr ∈ ˆ𝑟𝑋 | ˆ𝑛1, ˆ𝑛2 ∈ 𝑋 ⇒ 𝑑ˆr(ˆ𝑛1) = 𝑑ˆr(ˆ𝑛2) }

as one might expect. The other strengthening operations on relational values are abstracted in a similar manner.

C.3 Abstract Transformer
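Before describing the collapsed transformer, the domain operations it relies on can be sketched concretely. The Python model below (all names hypothetical) represents collapsed values as `BOT`/`TOP` sentinels, frozensets of dependency vectors (relations), or `(x, u_in, u_out)` triples (tables, with the call-stack indexing elided for brevity), and implements the join of Appendix C.2 together with the equality strengthening just defined:

```python
BOT, TOP = "bot", "top"  # sentinels for the least and greatest collapsed value

def join(u1, u2):
    """Join on collapsed values: ⊥ is neutral and ⊤ absorbing; relations
    join by set union, tables join pointwise on input and output; mixing a
    relation with a table falls through to the 'otherwise' case, i.e. ⊤."""
    if u1 == BOT:
        return u2
    if u2 == BOT:
        return u1
    if u1 == TOP or u2 == TOP:
        return TOP
    if isinstance(u1, frozenset) and isinstance(u2, frozenset):
        return u1 | u2
    if isinstance(u1, tuple) and isinstance(u2, tuple) and u1[0] == u2[0]:
        (x, i1, o1), (_, i2, o2) = u1, u2
        return (x, join(i1, i2), join(o1, o2))
    return TOP

def strengthen_eq(r, scope, n1, n2):
    """r[n1 = n2] on a collapsed relation: keep vectors agreeing on n1 and
    n2; trivial whenever either abstract node lies outside the scope."""
    if n1 not in scope or n2 not in scope:
        return r
    return frozenset(d for d in r if dict(d)[n1] == dict(d)[n2])
```

Vectors inside a relation are modeled as frozensets of (node, constant) pairs so that relations themselves are hashable sets.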
The abstract transformer for the collapsed semantics, stepˆr : Exp → Ê × Ŝ → Mˆr → Vˆr𝑋 × Mˆr, resembles the relational abstract transformer step r. That is, the operations on relational values in step r are simply replaced by their abstractions on collapsed values in stepˆr. We omit the definitions of the transformer and data propagation as they follow straightforwardly from the relational ones. (Technically, ⊑ˆr is only a pre-order due to 𝛼-renaming of tables.)

Lemma C.3.
The function prop ˆ r is monotone and increasing. Lemma C.4.
For every 𝑒 ∈ Exp, ˆ𝑆 ∈ Ŝ, and ˆ𝐸 ∈ Ê such that ⟨𝑒, ˆ𝐸⟩ is well-formed, stepˆr J𝑒K(ˆ𝐸, ˆ𝑆) is monotone and increasing.

In the above, the well-formedness of a pair ⟨𝑒, ˆ𝐸⟩ of an expression 𝑒 and abstract environment ˆ𝐸 is defined in a similar way as for concrete environments.

C.4 Abstract Semantics
The abstract domain of the collapsed abstract semantics is given by collapsed properties Pˆr def= Mˆr. The collapsed semantics Cˆr J·K : Exp → Pˆr is then defined as

Cˆr J𝑒K def= lfp ⊑̇ˆr 𝑀ˆr⊥ Λ𝑀ˆr. let ⟨_, 𝑀ˆr1⟩ = stepˆr J𝑒K(𝜖, 𝜖)(𝑀ˆr) in 𝑀ˆr1.

Theorem C.5.
The collapsed abstract semantics is sound, i.e., for all programs 𝑒, C r J𝑒K ⊑̇ r 𝛾̇ˆr(Cˆr J𝑒K).

D PROOFS

D.1 Preliminaries
We first repeat some of the basic results regarding abstract interpretation that we use in our proofs.
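As a concrete (toy) instance of the adjunction laws recalled in Proposition D.1 below, consider the Galois connection between sets of small integers, ordered by inclusion, and a five-point sign lattice. The sketch (all names hypothetical) derives 𝛼 from 𝛾 exactly as in item (3) of the proposition, by taking the least sound candidate:

```python
# concrete lattice: ℘({-2,...,2}) ordered by ⊆; abstract: flat sign lattice
UNIV = {-2, -1, 0, 1, 2}
SIGNS = ["bot", "neg", "zero", "pos", "top"]

def gamma(s):
    """Concretization; intersections of sign meanings are sign meanings,
    so gamma is a (complete) meet-morphism on this finite lattice."""
    return {"bot": set(), "neg": {-2, -1}, "zero": {0},
            "pos": {1, 2}, "top": set(UNIV)}[s]

def alpha(S):
    """Best abstraction induced by gamma: the meet of all sound candidates
    {s | S ⊆ gamma(s)}; here that is the candidate with smallest image."""
    return min((s for s in SIGNS if S <= gamma(s)), key=lambda s: len(gamma(s)))
```

On this instance one can check directly that 𝛼 ∘ 𝛾 is reductive (in fact the identity, since 𝛾 is one-to-one) and 𝛾 ∘ 𝛼 is extensive.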
Proposition D.1 ([Cousot and Cousot 1979]).
The following statements are equivalent:
(1) ⟨𝛼, 𝛾⟩ is a Galois connection;
(2) 𝛼 and 𝛾 are monotone, 𝛼 ∘ 𝛾 is reductive: ∀𝑦 ∈ 𝐿2. 𝛼(𝛾(𝑦)) ⊑2 𝑦, and 𝛾 ∘ 𝛼 is extensive: ∀𝑥 ∈ 𝐿1. 𝑥 ⊑1 𝛾(𝛼(𝑥));
(3) 𝛾 is a complete meet-morphism and 𝛼 = Λ𝑥 ∈ 𝐿1. ⊓{ 𝑦 ∈ 𝐿2 | 𝑥 ⊑1 𝛾(𝑦) };
(4) 𝛼 is a complete join-morphism and 𝛾 = Λ𝑦 ∈ 𝐿2. ⊔{ 𝑥 ∈ 𝐿1 | 𝛼(𝑥) ⊑2 𝑦 }.

Proposition D.2 ([Cousot and Cousot 1979]). 𝛼 is onto iff 𝛾 is one-to-one iff 𝛼 ∘ 𝛾 = 𝜆𝑦. 𝑦.

We additionally use
Loc(𝑒) to denote the set of all locations of 𝑒. Next, given an expression 𝑒 with unique locations and a location ℓ ∈ Loc(𝑒), we use 𝑒(ℓ) in our proofs to designate the subexpression of 𝑒 with the location ℓ.

D.2 Concrete Semantics
Proof of Lemma 5.2.
By mutual structural induction on both input concrete values. □
Proof of Lemma 5.3.
By structural induction on 𝑒 and Lemma 5.2. □

D.3 Relational Semantics
Proof of Lemma B.2.
Let 𝑈 ∈ ℘(V r𝑋). We show 𝛾 r𝑋(⊓ r 𝑈) = ⋂𝑢∈𝑈 𝛾 r𝑋(𝑢). We carry out the proof by case analysis on the elements of 𝑈 and, when 𝑈 consists of tables only, by induction on their minimum depth. The depth of a relational value is defined in the expected way: the depth of bottom, relational, and top values is 0, whereas the depth of a table is defined by the maximum depth of any value stored in the table. We first consider the trivial (base) cases where 𝑈 is not a set of multiple relational tables.

• 𝑈 = ∅. Here, 𝛾 r𝑋(⊓ r ∅) = 𝛾 r𝑋(⊤ r) = (𝑋 → V) × V = ⋂ ∅.

• 𝑈 = {𝑢}. Trivial.

• 𝑟 ∈ 𝑈, 𝒖 ∈ 𝑈. We have ⊓ r{𝑟, 𝒖} = ⊥ r and, since ⊥ r is the bottom element, ⊓ r 𝑈 = ⊥ r. Similarly, 𝛾 r𝑋(𝑟) ∩ 𝛾 r𝑋(𝒖) = 𝛾 r𝑋(⊥ r) and, since ∀𝑢. 𝛾 r𝑋(⊥ r) ⊆ 𝛾 r𝑋(𝑢), it follows that ⋂𝑢∈𝑈 𝛾 r𝑋(𝑢) = 𝛾 r𝑋(⊥ r) = 𝛾 r𝑋(⊓ r 𝑈).

• 𝑈 ⊆ R r𝑋. Here,
𝛾 r𝑋(⊓ r 𝑈) = 𝛾 r𝑋(⋂𝑟∈𝑈 𝑟)
= { ⟨Γ, 𝑐⟩ | 𝑑 ∈ (⋂𝑟∈𝑈 𝑟) ∧ 𝑑(𝜈) = 𝑐 ∧ ∀𝑥 ∈ 𝑋. Γ(𝑥) ∈ 𝛾 d(𝑑(𝑥)) } ∪ 𝛾 r𝑋(⊥ r)   [by def. of 𝛾 r𝑋]
= ⋂𝑟∈𝑈 ({ ⟨Γ, 𝑐⟩ | 𝑑 ∈ 𝑟 ∧ 𝑑(𝜈) = 𝑐 ∧ ∀𝑥 ∈ 𝑋. Γ(𝑥) ∈ 𝛾 d(𝑑(𝑥)) } ∪ 𝛾 r𝑋(⊥ r))   [by uniqueness/non-overlapping of 𝛾 d]
= ⋂𝑟∈𝑈 𝛾 r𝑋(𝑟)   [by def. of 𝛾 r𝑋]

• ⊤ r ∈ 𝑈. Here, ⊓ r 𝑈 = ⊓ r(𝑈 \ {⊤ r}) and ⋂𝑢∈𝑈 𝛾 r𝑋(𝑢) = ⋂𝑢∈𝑈, 𝑢≠⊤ r 𝛾 r𝑋(𝑢), since 𝛾 r𝑋(⊤ r) = (𝑋 → V) × V. The set 𝑈 \ {⊤ r} either falls into one of the above cases or consists of multiple tables, which is the case we show next.

Let 𝑈 ⊆ T r𝑋 and |𝑈| > 1. Let 𝑑 be the minimum depth of any table in 𝑈.
⋂𝑥:𝒖∈𝑈 𝛾 r𝑋(𝑥 : 𝒖)
= { ⟨Γ, 𝒗⟩ | ∀𝑆cs. 𝒗(𝑆cs) = ⟨𝑣𝑖, 𝑣𝑜⟩ ∧ ⟨Γ𝑖, 𝑣𝑖⟩ ∈ ⋂𝒖∈𝑈 𝛾 r(𝜋1(𝒖(𝑆cs))) ∧ ⟨Γ𝑜, 𝑣𝑜⟩ ∈ ⋂𝒖∈𝑈 𝛾 r(𝜋2(𝒖(𝑆cs))) ∧ Γ𝑜 = Γ𝑖[𝑥 ↦→ 𝑣𝑖] } ∪ 𝛾 r𝑋(⊥ r)   [by def. of 𝛾 r𝑋 and 𝑥 not in the scope of 𝜋2(𝒖(𝑆cs))]
= { ⟨Γ, 𝒗⟩ | ∀𝑆cs. 𝒗(𝑆cs) = ⟨𝑣𝑖, 𝑣𝑜⟩ ∧ ⟨Γ𝑖, 𝑣𝑖⟩ ∈ 𝛾 r𝑋(⊓ r𝒖∈𝑈 𝜋1(𝒖(𝑆cs))) ∧ ⟨Γ𝑜, 𝑣𝑜⟩ ∈ 𝛾 r𝑋(⊓ r𝒖∈𝑈 𝜋2(𝒖(𝑆cs))) ∧ Γ𝑜 = Γ𝑖[𝑥 ↦→ 𝑣𝑖] } ∪ 𝛾 r𝑋(⊥ r)   [by i.h. on the min. depth of {𝜋1(𝒖(𝑆cs)) | 𝒖 ∈ 𝑈} and {𝜋2(𝒖(𝑆cs)) | 𝒖 ∈ 𝑈}]
= 𝛾 r𝑋(⊓ r𝑥:𝒖∈𝑈 𝑥 : 𝒖)   [by def. of 𝛾 r𝑋] □

Proof of Lemma B.3.
By mutual structural induction on both input relational values. □
Proof of Lemma B.4.
By structural induction on 𝑒 and Lemma B.3. □

D.4 Collapsed Relational Semantics
Proof of Lemma C.2.
Similar to the proof of Lemma B.2. □
Proof of Lemma C.3.
By mutual structural induction on both input types. □
Proof of Lemma C.4.
By structural induction on 𝑒 and Lemma C.3. □

D.5 Parametric Refinement Type Semantics
Proof of Lemma 7.1.
Similar to the proofs of Lemma B.2 and Lemma C.2. The argument is based on the definition of 𝛾 t and the fact that the type refinements form a Galois connection with the complete lattice of dependency relations. □

D.6 Soundness and Completeness of Typing Rules
Lemma D.3.
For all 𝑋 ⊆ Var, 𝑡, 𝑡1, 𝑡2, 𝑡′1, 𝑡′2 ∈ V t𝑋 and 𝑥, 𝑦 ∈ 𝑋, the following are true:
(1) strengthening is monotone: if 𝑡1 ⊑ t 𝑡2 then 𝑡1[𝑥 ← 𝑡] ⊑ t 𝑡2[𝑥 ← 𝑡];
(2) strengthening is reductive: 𝑡1[𝑥 ← 𝑡2] ⊑ t 𝑡1;
(3) strengthening is idempotent: 𝑡1[𝑥 ← 𝑡2][𝑥 ← 𝑡2] = 𝑡1[𝑥 ← 𝑡2];
(4) strengthening is commutative: 𝑡[𝑥 ← 𝑡1][𝑦 ← 𝑡2] = 𝑡[𝑦 ← 𝑡2][𝑥 ← 𝑡1].

Proof.
The proof goes straightforwardly by structural induction over types. In the base case, all properties follow directly from the properties of meets. □
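The four laws of Lemma D.3 can also be sanity-checked on the base case, where strengthening acts as a filter (a meet) over dependency vectors. A small Python sketch with hypothetical names:

```python
from itertools import product

NU = "nu"  # the special variable ν

def strengthen(t, x, s):
    """Base-case strengthening t[x <- s]: keep vectors d whose rebinding
    of ν to d(x) is admitted by some vector pattern in s."""
    def admitted(d):
        d2 = dict(d)
        d2[NU] = d[x]
        return any(all(d2[k] == v for k, v in e.items()) for e in s)
    return [d for d in t if admitted(d)]

# all vectors over scope {x, y} with values in {0, 1}
T = [{"x": a, "y": b, NU: c} for a, b, c in product((0, 1), repeat=3)]
S1, S2 = [{NU: 0}], [{NU: 1}]  # patterns: ν must be 0 / ν must be 1
```

Because filtering removes elements (reductive), removes nothing on a second pass (idempotent), and independent filters commute, the laws hold by construction here; monotonicity corresponds to filtering a sub-list.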
Lemma D.4.
For all 𝑋 ⊆ Var, 𝑡, 𝑡1, 𝑡2, 𝑡′1, 𝑡′2 ∈ V t𝑋 and 𝑥 ∈ 𝑋, if ⟨𝑡′1, 𝑡′2⟩ = 𝑡1[𝑥 ← 𝑡] ⋉ t 𝑡2[𝑥 ← 𝑡] and 𝑡′1 and 𝑡′2 are safe, then 𝑡′1 = 𝑡′1[𝑥 ← 𝑡] and 𝑡′2 = 𝑡′2[𝑥 ← 𝑡].

Proof.
The proof goes by simultaneous induction over the depths of 𝑡1 and 𝑡2. Assume ⟨𝑡′1, 𝑡′2⟩ = 𝑡1[𝑥 ← 𝑡] ⋉ t 𝑡2[𝑥 ← 𝑡] and that 𝑡′1 and 𝑡′2 are safe. We case split on the definition of ⋉ t.

Case 𝑡1[𝑥 ← 𝑡] = ⊥ t and 𝑡2[𝑥 ← 𝑡] ∉ R t. We have 𝑡′1 = 𝒕⊥ = 𝒕⊥[𝑥 ← 𝑡] by definition of ⋉ t and because strengthening is reductive. Moreover, we have 𝑡′2 = 𝑡2[𝑥 ← 𝑡] and by idempotency of strengthening we obtain 𝑡′2 = 𝑡′2[𝑥 ← 𝑡].

Case 𝑡2[𝑥 ← 𝑡] = ⊤ t and 𝑡1[𝑥 ← 𝑡] ∉ R t. By definition of ⋉ t we have 𝑡′1 = ⊤ t, contradicting the assumption that 𝑡′1 is safe.

Case 𝑡1[𝑥 ← 𝑡] ∉ R t and 𝑡2[𝑥 ← 𝑡] ∉ R t. We must have 𝑡1 = 𝑧 : 𝒕1, 𝑡2 = 𝑧 : 𝒕2, 𝑡′1 = 𝑧 : 𝒕′1, and 𝑡′2 = 𝑧 : 𝒕′2 for some 𝒕1, 𝒕2, 𝒕′1, 𝒕′2 and 𝑧. Let ˆ𝑆 ∈ Ŝ and define:

⟨𝑡1𝑖, 𝑡1𝑜⟩ = 𝒕1(ˆ𝑆)   ⟨𝑡2𝑖, 𝑡2𝑜⟩ = 𝒕2(ˆ𝑆)
⟨𝑡′2𝑖, 𝑡′1𝑖⟩ = 𝑡2𝑖[𝑥 ← 𝑡] ⋉ t 𝑡1𝑖[𝑥 ← 𝑡]   (𝑎)
⟨𝑡′1𝑜, 𝑡′2𝑜⟩ = 𝑡1𝑜[𝑥 ← 𝑡][𝑧 ← 𝑡2𝑖] ⋉ t 𝑡2𝑜[𝑥 ← 𝑡][𝑧 ← 𝑡2𝑖]   (𝑏)

Note that we have 𝒕′1(ˆ𝑆) = ⟨𝑡′1𝑖, 𝑡1𝑜[𝑥 ← 𝑡] ⊔ t 𝑡′1𝑜⟩ and likewise 𝒕′2(ˆ𝑆) = ⟨𝑡′2𝑖, 𝑡2𝑜[𝑥 ← 𝑡] ⊔ t 𝑡′2𝑜⟩. From the fact that 𝑡′1 and 𝑡′2 are safe, it follows that 𝑡′1𝑖, 𝑡′2𝑖, 𝑡′1𝑜, 𝑡′2𝑜 must also be safe. Applying the induction hypothesis to (𝑎), we can directly conclude that 𝑡′1𝑖 = 𝑡′1𝑖[𝑥 ← 𝑡] and 𝑡′2𝑖 = 𝑡′2𝑖[𝑥 ← 𝑡]. It remains to show that 𝑡1𝑜[𝑥 ← 𝑡] ⊔ t 𝑡′1𝑜 = (𝑡1𝑜[𝑥 ← 𝑡] ⊔ t 𝑡′1𝑜)[𝑥 ← 𝑡] and 𝑡2𝑜[𝑥 ← 𝑡] ⊔ t 𝑡′2𝑜 = (𝑡2𝑜[𝑥 ← 𝑡] ⊔ t 𝑡′2𝑜)[𝑥 ← 𝑡]. The right-to-left direction of these equalities follows from Lemma D.3(2). For the other direction, we first apply commutativity of strengthening (Lemma D.3(4)) to (𝑏) and then use the induction hypothesis to obtain (𝑐) 𝑡′1𝑜 = 𝑡′1𝑜[𝑥 ← 𝑡] and (𝑑) 𝑡′2𝑜 = 𝑡′2𝑜[𝑥 ← 𝑡]. The desired inclusions then follow from the derived equalities as well as Lemma D.3(1) and Lemma D.3(3).

Otherwise. We must have 𝑡1[𝑥 ← 𝑡] ∈ R t and by definition of ⋉ t we have 𝑡′1 = 𝑡1[𝑥 ← 𝑡] and 𝑡′2 = 𝑡2[𝑥 ← 𝑡] ⊔ t 𝑡1[𝑥 ← 𝑡]. Note that by idempotency of strengthening (Lemma D.3(3)) we directly have 𝑡′1 = 𝑡′1[𝑥 ← 𝑡]. To also show the desired equality for 𝑡′2, let us first assume that 𝑡2[𝑥 ← 𝑡] ∉ R t. Then we must either have 𝑡2[𝑥 ← 𝑡] = ⊥ t or, by definition of ⊑ t and ⊔ t, 𝑡′2 = ⊤ t. In the first case, we directly obtain 𝑡′2 = 𝑡′2[𝑥 ← 𝑡]. The second case contradicts the assumption that 𝑡′2 is safe. Hence, consider the remaining case that 𝑡2[𝑥 ← 𝑡] ∈ R t. Then we have:

𝑡′2 = 𝑡2[𝑥 ← 𝑡] ⊔ t 𝑡1[𝑥 ← 𝑡]
⇒ 𝑡′2 = 𝑡2[𝑥 ← 𝑡] ⊔ 𝑏 𝑡1[𝑥 ← 𝑡]
⇒ 𝑡′2[𝑥 ← 𝑡] = (𝑡2[𝑥 ← 𝑡] ⊔ 𝑏 𝑡1[𝑥 ← 𝑡])[𝑥 ← 𝑡]
⇒ 𝑡′2[𝑥 ← 𝑡] ⊒ 𝑏 𝑡2[𝑥 ← 𝑡][𝑥 ← 𝑡] ⊔ 𝑏 𝑡1[𝑥 ← 𝑡][𝑥 ← 𝑡]   (Lemma D.3(1))
⇒ 𝑡′2[𝑥 ← 𝑡] ⊒ 𝑏 𝑡2[𝑥 ← 𝑡] ⊔ 𝑏 𝑡1[𝑥 ← 𝑡]   (Lemma D.3(3))
⇒ 𝑡′2[𝑥 ← 𝑡] ⊒ 𝑏 𝑡′2

We obtain 𝑡′2[𝑥 ← 𝑡] ⊑ 𝑏 𝑡′2 directly from the fact that strengthening is reductive (Lemma D.3(2)). Thus, we conclude 𝑡′2 = 𝑡′2[𝑥 ← 𝑡]. □

Proof of Lemma 7.5.
Let 𝑡1, 𝑡2 ∈ V t𝑋. The proof goes by simultaneous induction over the depths of 𝑡1 and 𝑡2.

Case 𝑡1 = ⊥ t. For the left-to-right direction, assume 𝑡1 <: 𝑡2. Only rule s-bot applies, so we must have 𝑡2 ≠ ⊤ t. Hence, both 𝑡1 and 𝑡2 are safe. Moreover, by the definition of ⋉ t we have ⊥ t ⋉ t 𝑡2 = ⟨⊥ t, ⊥ t ⊔ t 𝑡2⟩ = ⟨⊥ t, 𝑡2⟩. For the other direction, assume that ⟨𝑡1, 𝑡2⟩ = 𝑡1 ⋉ t 𝑡2 and 𝑡1, 𝑡2 are safe. Then we have 𝑡2 ≠ ⊤ t. Hence, using rule s-bot we immediately conclude ⊥ t <: 𝑡2.

Case 𝑡1 ≠ ⊥ t and 𝑡1 ∈ R t. For the left-to-right direction, assume 𝑡1 <: 𝑡2. Since 𝑡1 ∈ R t, only rule s-base applies. Hence, we must have 𝑡2 ∈ R t and 𝑡1 ⊑ 𝑏 𝑡2. Since 𝑡1, 𝑡2 ∈ R t, both are safe. It further follows from the definition of ⋉ t that 𝑡1 ⋉ t 𝑡2 = ⟨𝑡1, 𝑡2 ⊔ t 𝑡1⟩. Moreover, 𝑡1 ⊑ 𝑏 𝑡2 implies 𝑡2 = 𝑡2 ⊔ 𝑏 𝑡1 = 𝑡2 ⊔ t 𝑡1. For the other direction, assume that ⟨𝑡1, 𝑡2⟩ = 𝑡1 ⋉ t 𝑡2 and 𝑡1, 𝑡2 are safe. In particular, it follows that 𝑡2 ≠ ⊤ t. From the definition of ⋉ t we can further conclude that we must have 𝑡2 = 𝑡2 ⊔ t 𝑡1. The facts 𝑡2 ≠ ⊤ t, 𝑡1 ≠ ⊥ t, and the definitions of ⊑ t and ⊔ t imply that 𝑡2 ∈ R t. Hence, we have 𝑡2 = 𝑡2 ⊔ 𝑏 𝑡1, which implies 𝑡1 ⊑ 𝑏 𝑡2.

Case 𝑡1 ≠ ⊥ t and 𝑡1 ∉ R t. For the left-to-right direction, assume again 𝑡1 <: 𝑡2. Only rule s-fun applies. Hence, we must have 𝑡1 = 𝑥 : 𝒕1 and 𝑡2 = 𝑥 : 𝒕2. Let ⟨𝑥 : 𝒕′1, 𝑥 : 𝒕′2⟩ = 𝑡1 ⋉ t 𝑡2. We show for all ˆ𝑆 ∈ Ŝ, 𝒕′1(ˆ𝑆) = 𝒕1(ˆ𝑆) and 𝒕′2(ˆ𝑆) = 𝒕2(ˆ𝑆). Thus, let ˆ𝑆 ∈ Ŝ and define

⟨𝑡1𝑖, 𝑡1𝑜⟩ = 𝒕1(ˆ𝑆)   ⟨𝑡2𝑖, 𝑡2𝑜⟩ = 𝒕2(ˆ𝑆)
⟨𝑡′2𝑖, 𝑡′1𝑖⟩ = 𝑡2𝑖 ⋉ t 𝑡1𝑖
⟨𝑡′1𝑜, 𝑡′2𝑜⟩ = 𝑡1𝑜[𝑥 ← 𝑡2𝑖] ⋉ t 𝑡2𝑜[𝑥 ← 𝑡2𝑖]

We know from the definition of ⋉ t that 𝒕′1(ˆ𝑆) = ⟨𝑡′1𝑖, 𝑡1𝑜 ⊔ t 𝑡′1𝑜⟩ and likewise 𝒕′2(ˆ𝑆) = ⟨𝑡′2𝑖, 𝑡2𝑜 ⊔ t 𝑡′2𝑜⟩. From the definition of <: we further know 𝑡2𝑖 <: 𝑡1𝑖 and 𝑡1𝑜 <: 𝑡2𝑜. From the induction hypothesis it then follows that

⟨𝑡2𝑖, 𝑡1𝑖⟩ = 𝑡2𝑖 ⋉ t 𝑡1𝑖   and   ⟨𝑡1𝑜, 𝑡2𝑜⟩ = 𝑡1𝑜 ⋉ t 𝑡2𝑜

Thus, we can directly conclude 𝑡′1𝑖 = 𝑡1𝑖 and 𝑡′2𝑖 = 𝑡2𝑖. Moreover, 𝑡1𝑖, 𝑡2𝑖, 𝑡1𝑜, and 𝑡2𝑜 must all be safe. Further note that we have 𝑡1𝑜[𝑥 ← 𝑡2𝑖] ⊑ t 𝑡1𝑜 and 𝑡2𝑜[𝑥 ← 𝑡2𝑖] ⊑ t 𝑡2𝑜 because strengthening is reductive (Lemma D.3(2)). By monotonicity of ⋉ t we can therefore conclude 𝑡′1𝑜 ⊑ t 𝑡1𝑜 and 𝑡′2𝑜 ⊑ t 𝑡2𝑜. This implies that 𝑡1𝑜 ⊔ t 𝑡′1𝑜 = 𝑡1𝑜 and 𝑡2𝑜 ⊔ t 𝑡′2𝑜 = 𝑡2𝑜 hold.

For the other direction, assume that ⟨𝑡1, 𝑡2⟩ = 𝑡1 ⋉ t 𝑡2 and that 𝑡1 and 𝑡2 are safe. It follows from 𝑡1 ≠ ⊥ t and 𝑡1 ∉ R t as well as the definition of ⋉ t that we must have 𝑡1 = 𝑥 : 𝒕1 and 𝑡2 = 𝑥 : 𝒕2. Let ˆ𝑆 ∈ Ŝ and define

⟨𝑡1𝑖, 𝑡1𝑜⟩ = 𝒕1(ˆ𝑆)   ⟨𝑡2𝑖, 𝑡2𝑜⟩ = 𝒕2(ˆ𝑆)
⟨𝑡′2𝑖, 𝑡′1𝑖⟩ = 𝑡2𝑖 ⋉ t 𝑡1𝑖   (𝑎)
⟨𝑡′1𝑜, 𝑡′2𝑜⟩ = 𝑡1𝑜[𝑥 ← 𝑡2𝑖] ⋉ t 𝑡2𝑜[𝑥 ← 𝑡2𝑖]   (𝑏)

From the assumption that 𝑡1 and 𝑡2 are safe we can further conclude that 𝑡1𝑖, 𝑡2𝑖, 𝑡1𝑜, and 𝑡2𝑜 are safe. Moreover, by the fact that strengthening is reductive, it follows that 𝑡1𝑜[𝑥 ← 𝑡2𝑖] and 𝑡2𝑜[𝑥 ← 𝑡2𝑖] must be safe, too. Also note that ⟨𝑡1, 𝑡2⟩ = 𝑡1 ⋉ t 𝑡2 additionally implies the following equalities:

𝑡1𝑖 = 𝑡′1𝑖 (𝑐)   𝑡2𝑖 = 𝑡′2𝑖 (𝑑)   𝑡1𝑜 = 𝑡1𝑜 ⊔ t 𝑡′1𝑜 (𝑒)   𝑡2𝑜 = 𝑡2𝑜 ⊔ t 𝑡′2𝑜 (𝑓)

Thus we obtain from (𝑎), (𝑐), (𝑑) that ⟨𝑡2𝑖, 𝑡1𝑖⟩ = 𝑡2𝑖 ⋉ t 𝑡1𝑖 and, by the induction hypothesis, can conclude 𝑡2𝑖 <: 𝑡1𝑖. From (𝑒) we can conclude 𝑡′1𝑜 ⊑ t 𝑡1𝑜 and by monotonicity of strengthening (Lemma D.3(1)) we obtain 𝑡′1𝑜[𝑥 ← 𝑡2𝑖] ⊑ t 𝑡1𝑜[𝑥 ← 𝑡2𝑖]. Conversely, we have:

𝑡1𝑜[𝑥 ← 𝑡2𝑖] ⊑ t 𝑡′1𝑜   (⋉ t increasing and (𝑏))
⇒ 𝑡1𝑜[𝑥 ← 𝑡2𝑖][𝑥 ← 𝑡2𝑖] ⊑ t 𝑡′1𝑜[𝑥 ← 𝑡2𝑖]   (strengthening monotone, Lemma D.3(1))
⇒ 𝑡1𝑜[𝑥 ← 𝑡2𝑖] ⊑ t 𝑡′1𝑜[𝑥 ← 𝑡2𝑖]   (strengthening idempotent, Lemma D.3(3))

Thus, we have 𝑡1𝑜[𝑥 ← 𝑡2𝑖] = 𝑡′1𝑜[𝑥 ← 𝑡2𝑖]. Using similar reasoning we can also conclude 𝑡2𝑜[𝑥 ← 𝑡2𝑖] = 𝑡′2𝑜[𝑥 ← 𝑡2𝑖]. From Lemma D.4 it also follows that 𝑡′1𝑜 = 𝑡′1𝑜[𝑥 ← 𝑡2𝑖] and 𝑡′2𝑜 = 𝑡′2𝑜[𝑥 ← 𝑡2𝑖]. Hence, it follows from (𝑏) that

⟨𝑡1𝑜[𝑥 ← 𝑡2𝑖], 𝑡2𝑜[𝑥 ← 𝑡2𝑖]⟩ = 𝑡1𝑜[𝑥 ← 𝑡2𝑖] ⋉ t 𝑡2𝑜[𝑥 ← 𝑡2𝑖]

Then, by the induction hypothesis we obtain 𝑡1𝑜[𝑥 ← 𝑡2𝑖] <: 𝑡2𝑜[𝑥 ← 𝑡2𝑖]. It follows that 𝑡1 <: 𝑡2. □

To prove the soundness of the typing rules, we first prove the following theorem, which is slightly stronger than Theorem 7.6.
Theorem D.5.
Let 𝑒 be an expression, Γ t a valid typing environment, ˆ𝑆 an abstract stack, and 𝑡 ∈ V t. If Γ t, ˆ𝑆 ⊢ 𝑒 : 𝑡, then for all 𝑀 t, ˆ𝐸 such that 𝑀 t is safe and Γ t = 𝑀 t ◦ ˆ𝐸, there exists 𝑀 t1 such that Γ t = 𝑀 t1 ◦ ˆ𝐸 and ⟨𝑀 t1, 𝑡⟩ = step t J𝑒K(ˆ𝐸, ˆ𝑆)(𝑀 t1). Moreover, 𝑀 t1 is safe.
Assume Γ t ⊢ 𝑒 ℓ : 𝑡 and let 𝑀 t, ˆ𝐸 be such that 𝑀 t is safe and Γ t = 𝑀 t ◦ ˆ𝐸. Let further ˆ𝑛 = ℓ ⋄ ˆ𝐸. The proof goes by structural induction over 𝑒 ℓ.

Case 𝑒 = 𝑐. Let 𝑀 t1 = 𝑀 t[ˆ𝑛 ↦→ 𝑡]; then Γ t = 𝑀 t1 ◦ ˆ𝐸. By definition of the typing relation we must have [𝜈 = 𝑐] t[Γ t] <: 𝑡. Lemma 7.5 thus implies ⟨[𝜈 = 𝑐] t[Γ t], 𝑡⟩ = [𝜈 = 𝑐] t[Γ t] ⋉ t 𝑡 and that 𝑡 is safe. Moreover, the fact that Γ t is valid and the definitions of [𝜈 = 𝑐] t and strengthening imply that [𝜈 = 𝑐] t[Γ t] ∈ R t 𝑋ˆ𝑛. By definition of ⋉ t we then must have 𝑡 = [𝜈 = 𝑐] t[Γ t] ⊔ t 𝑡. Hence, by definition of step t we directly obtain step t J𝑒 ℓK(ˆ𝐸, ˆ𝑆)(𝑀 t1) = ⟨𝑀 t1, 𝑡⟩. Since 𝑀 t and 𝑡 are safe, so is 𝑀 t1.

Case 𝑒 = 𝑥. Let 𝑀 t1 = 𝑀 t[ˆ𝑛 ↦→ 𝑡]; then Γ t = 𝑀 t1 ◦ ˆ𝐸. By definition of the typing relation we must have Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] <: 𝑡[𝜈 = 𝑥] t[Γ t]. Lemma 7.5 thus implies ⟨Γ t(𝑥)[𝜈 = 𝑥] t[Γ t], 𝑡[𝜈 = 𝑥] t[Γ t]⟩ = Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] ⋉ t 𝑡[𝜈 = 𝑥] t[Γ t] and that 𝑡 is safe. Let step t J𝑒K(ˆ𝐸, ˆ𝑆)(𝑀 t1) = ⟨𝑀 t2, 𝑡⟩. We know from the definition of step t that we have 𝑀 t2 = 𝑀 t1[ˆ𝑛 ↦→ 𝑡′, ˆ𝑛𝑥 ↦→ 𝑡′𝑥], where 𝑡′ = 𝑀 t1(ˆ𝑛) ⊔ t 𝑡[𝜈 = 𝑥] t[Γ t] and 𝑡′𝑥 = 𝑀 t1(ˆ𝑛𝑥) ⊔ t Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] for ˆ𝑛𝑥 = ˆ𝐸(𝑥). By definition of 𝑀 t1 we have 𝑀 t1(ˆ𝑛) = 𝑡 and, moreover, Γ t = 𝑀 t1 ◦ ˆ𝐸 implies 𝑀 t1(ˆ𝑛𝑥) = Γ t(𝑥). Since strengthening is reductive, we further know 𝑡[𝜈 = 𝑥] t[Γ t] ⊑ t 𝑡 and Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] ⊑ t Γ t(𝑥). Thus, we have 𝑡′ = 𝑡 ⊔ t 𝑡[𝜈 = 𝑥] t[Γ t] = 𝑡 and 𝑡′𝑥 = Γ t(𝑥) ⊔ t Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] = Γ t(𝑥). Hence, we can conclude 𝑀 t2 = 𝑀 t1.

Case 𝑒 = 𝑒𝑖 𝑒𝑗. By definition of the typing relation we must have Γ t, ˆ𝑆 ⊢ 𝑒𝑖 : 𝑡𝑖 and Γ t, ˆ𝑆 ⊢ 𝑒𝑗 : 𝑡𝑗 and 𝑡𝑖 <: 𝑥 : [𝑖 ·̂ ˆ𝑆 : 𝑡𝑗 → 𝑡] for some 𝑡𝑖, 𝑡𝑗.
By the induction hypothesis we conclude that there exist 𝑀 t𝑖 and 𝑀 t𝑗 such that ⟨𝑀 t𝑖, 𝑡𝑖⟩ = step t J𝑒𝑖K(ˆ𝐸, ˆ𝑆)(𝑀 t𝑖), ⟨𝑀 t𝑗, 𝑡𝑗⟩ = step t J𝑒𝑗K(ˆ𝐸, ˆ𝑆)(𝑀 t𝑗), and 𝑀 t𝑖 ◦ ˆ𝐸 = 𝑀 t𝑗 ◦ ˆ𝐸 = Γ t. Moreover, 𝑀 t𝑖 and 𝑀 t𝑗 are safe. Define 𝑀 t1 as follows:

𝑀 t1 = Λˆ𝑛′. 𝑀 t𝑖(ˆ𝑛′) if loc(ˆ𝑛′) ∈ 𝑒𝑖;  𝑀 t𝑗(ˆ𝑛′) if loc(ˆ𝑛′) ∈ 𝑒𝑗;  𝑡 if ˆ𝑛′ = ˆ𝑛;  𝑀 t(ˆ𝑛′) otherwise

Note that we have ⟨𝑀 t1, 𝑡𝑖⟩ = step t J𝑒𝑖K(ˆ𝐸, ˆ𝑆)(𝑀 t1), ⟨𝑀 t1, 𝑡𝑗⟩ = step t J𝑒𝑗K(ˆ𝐸, ˆ𝑆)(𝑀 t1), and 𝑀 t1 ◦ ˆ𝐸 = Γ t. Also, 𝑀 t1 is safe. By Lemma 7.5 we further know ⟨𝑡𝑖, 𝑥 : [𝑖 ·̂ ˆ𝑆 : 𝑡𝑗 → 𝑡]⟩ = 𝑡𝑖 ⋉ t 𝑥 : [𝑖 ·̂ ˆ𝑆 : 𝑡𝑗 → 𝑡]. Thus, we have step t J𝑒K(ˆ𝐸, ˆ𝑆)(𝑀 t1) = ⟨𝑀 t1, 𝑡⟩.

Case 𝑒 = 𝜆𝑥.𝑒𝑖. Let ˆ𝑆′ ∈ 𝑡. By definition of the typing relation we must have Γ t𝑖, ˆ𝑆′ ⊢ 𝑒𝑖 : 𝑡𝑖 and 𝑥 : [ˆ𝑆′ : 𝑡𝑥 → 𝑡𝑖] <: 𝑡(ˆ𝑆′), where Γ t𝑖 = Γ t.𝑥 : 𝑡𝑥, for some 𝑡𝑥, 𝑡𝑖. First, by Lemma 7.5 we know that ⟨𝑥 : [ˆ𝑆′ : 𝑡𝑥 → 𝑡𝑖], 𝑡(ˆ𝑆′)⟩ = 𝑥 : [ˆ𝑆′ : 𝑡𝑥 → 𝑡𝑖] ⋉ t 𝑡(ˆ𝑆′) and that 𝑡(ˆ𝑆′), 𝑡𝑥, 𝑡𝑖 are all safe. Define 𝑀 tˆ𝑆′ = 𝑀 t[ˆ𝑛ˆ𝑆′,𝑥 ↦→ 𝑡𝑥] and ˆ𝐸ˆ𝑆′,𝑖 = ˆ𝐸[𝑥 ↦→ ˆ𝑛ˆ𝑆′,𝑥], where ˆ𝑛ˆ𝑆′,𝑥 = 𝑥 ⋄ ˆ𝐸 ⋄ ˆ𝑆′. Then Γ t𝑖 = 𝑀 tˆ𝑆′ ◦ ˆ𝐸ˆ𝑆′,𝑖. Because ˆ𝑆′ ∈ 𝑡 and 𝑥 : [ˆ𝑆′ : 𝑡𝑥 → 𝑡𝑖] <: 𝑡(ˆ𝑆′), we know that 𝑡𝑥 ≠ ⊥ t. It follows that Γ t𝑖 is valid. Hence, applying the induction hypothesis, it follows that there exists 𝑀 tˆ𝑆′,1 such that step t J𝑒𝑖K(ˆ𝐸ˆ𝑆′,𝑖, ˆ𝑆′)(𝑀 tˆ𝑆′,1) = ⟨𝑀 tˆ𝑆′,1, 𝑡𝑖⟩. Altogether, we can therefore conclude body t(𝑥, 𝑒𝑖, ˆ𝑛, ˆ𝐸, 𝑡)(ˆ𝑆′)(𝑀 tˆ𝑆′,1) = ⟨𝑀 tˆ𝑆′,1, 𝑡(ˆ𝑆′)⟩. Now define

𝑀 t1 = Λˆ𝑛′. 𝑡 if ˆ𝑛′ = ˆ𝑛;  𝑀 tˆ𝑆′,1(ˆ𝑛′) if ˆ𝑛ˆ𝑆′,𝑥 ∈ ˆ𝑛′;  𝑀 t(ˆ𝑛′) otherwise

Note that we now have, for all ˆ𝑆′ ∈ Ŝ, body t(𝑥, 𝑒𝑖, ˆ𝑛, ˆ𝐸, 𝑡)(ˆ𝑆′)(𝑀 t1) = ⟨𝑀 t1, 𝑡(ˆ𝑆′)⟩.
Moreover, 𝑀 t1 is safe and satisfies Γ t = 𝑀 t1 ◦ ˆ𝐸. Finally, note that because 𝑡 ∈ T t, we have 𝑀 t1(ˆ𝑛) = 𝑡 = 𝑡 ⊔ t 𝑥 : 𝒕⊥ = ⊔ t { 𝑡(ˆ𝑆′) | ˆ𝑆′ ∈ 𝑡 }. Hence, it follows that step t J𝑒K(ˆ𝐸, ˆ𝑆)(𝑀 t1) = ⟨𝑀 t1, 𝑡⟩ holds. □

Proof of Theorem 7.6.
The theorem follows directly from Theorem D.5 by letting 𝐸(𝑥) = 𝑥 ⋄ 𝜖 ⋄ 𝜖 and

𝑀 t(𝑛) = 𝛾(𝑥) if 𝑛 = 𝐸(𝑥), and ⊥ t otherwise. □

Proof of Theorem 7.7.
Assume step t J𝑒 ℓK(ˆ𝐸, ˆ𝑆)(𝑀 t) = ⟨𝑀 t, 𝑡⟩ and 𝑀 t is safe. The proof goes by structural induction over 𝑒. Let ˆ𝑛 = ℓ ⋄ ˆ𝐸 and Γ t = 𝑀 t ◦ ˆ𝐸.

Case 𝑒 = 𝑐. By definition of step t, we have 𝑀 t(ˆ𝑛) = 𝑡 = 𝑡 ⊔ t [𝜈 = 𝑐] t[Γ t]. Moreover, since [𝜈 = 𝑐] t[Γ t] ∈ R t and since 𝑡 ≠ ⊤ t, we have by definition of ⋉ t that ⟨[𝜈 = 𝑐] t[Γ t], 𝑡⟩ = [𝜈 = 𝑐] t[Γ t] ⋉ t 𝑡 holds. It follows from Lemma 7.5 that [𝜈 = 𝑐] t[Γ t] <: 𝑡 must also hold. By the definition of the typing relation we then obtain Γ t, ˆ𝑆 ⊢ 𝑒 : 𝑡.

Case 𝑒 = 𝑥. Let ˆ𝑛𝑥 = ˆ𝐸(𝑥). By definition of step t, we have ⟨𝑡𝑥, 𝑡′⟩ = Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] ⋉ t 𝑡[𝜈 = 𝑥] t[Γ t], where 𝑡𝑥 ⊑ t Γ t(𝑥) and 𝑡′ ⊑ t 𝑡. Hence, by monotonicity of strengthening we obtain 𝑡𝑥[𝜈 = 𝑥] t[Γ t] ⊑ t Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] and 𝑡′[𝜈 = 𝑥] t[Γ t] ⊑ t 𝑡[𝜈 = 𝑥] t[Γ t]. Because ⋉ t is increasing, we further know Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] ⊑ t 𝑡𝑥 and 𝑡[𝜈 = 𝑥] t[Γ t] ⊑ t 𝑡′. Applying monotonicity and idempotence of strengthening again, we can conclude Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] ⊑ t 𝑡𝑥[𝜈 = 𝑥] t[Γ t] and 𝑡[𝜈 = 𝑥] t[Γ t] ⊑ t 𝑡′[𝜈 = 𝑥] t[Γ t]. Hence, we obtain in fact Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] = 𝑡𝑥[𝜈 = 𝑥] t[Γ t] and 𝑡[𝜈 = 𝑥] t[Γ t] = 𝑡′[𝜈 = 𝑥] t[Γ t]. By Lemma D.4 it further follows that 𝑡𝑥 = 𝑡𝑥[𝜈 = 𝑥] t[Γ t] and 𝑡′ = 𝑡′[𝜈 = 𝑥] t[Γ t]. Hence, we can apply Lemma 7.5 to conclude Γ t(𝑥)[𝜈 = 𝑥] t[Γ t] <: 𝑡[𝜈 = 𝑥] t[Γ t]. Finally, using the typing rule for variables, it follows that Γ t, ˆ𝑆 ⊢ 𝑒 : 𝑡 holds.

Case 𝑒 = 𝑒𝑖 𝑒𝑗. By the definition of step t and the fact that step t is increasing, we must have step t J𝑒𝑖K(ˆ𝐸, ˆ𝑆)(𝑀 t) = ⟨𝑀 t, 𝑡𝑖⟩ and step t J𝑒𝑗K(ˆ𝐸, ˆ𝑆)(𝑀 t) = ⟨𝑀 t, 𝑡𝑗⟩. Hence, by the induction hypothesis, we conclude Γ t, ˆ𝑆 ⊢ 𝑒𝑖 : 𝑡𝑖 and Γ t, ˆ𝑆 ⊢ 𝑒𝑗 : 𝑡𝑗.
Furthermore, we know that ⟨𝑡′𝑖, 𝑥 : [𝑖 ·̂ ˆ𝑆 ⊳ 𝑡′𝑗 → 𝑡′]⟩ = 𝑡𝑖 ⋉ t 𝑥 : [𝑖 ·̂ ˆ𝑆 ⊳ 𝑡𝑗 → 𝑡] for some 𝑡′𝑖, 𝑡′𝑗, 𝑡′ such that 𝑡′𝑖 ⊑ t 𝑡𝑖, 𝑡′𝑗 ⊑ t 𝑡𝑗, and 𝑡′ ⊑ t 𝑡. Since ⋉ t is increasing, we conclude

⟨𝑡𝑖, 𝑥 : [𝑖 ·̂ ˆ𝑆 ⊳ 𝑡𝑗 → 𝑡]⟩ = 𝑡𝑖 ⋉ t 𝑥 : [𝑖 ·̂ ˆ𝑆 ⊳ 𝑡𝑗 → 𝑡]

Moreover, since 𝑀 t is safe, so must be 𝑡𝑖, 𝑡𝑗, and 𝑡. Applying Lemma 7.5 we therefore obtain 𝑡𝑖 <: 𝑥 : [𝑖 ·̂ ˆ𝑆 ⊳ 𝑡𝑗 → 𝑡] and, hence, Γ t, ˆ𝑆 ⊢ 𝑒 : 𝑡 using the typing rule for function application.

Case 𝑒 = 𝜆𝑥. 𝑒𝑖. By the definition of step t, we must have 𝑀 t(ˆ𝑛) = 𝑡 = 𝑡 ⊔ t 𝒕⊥. Since 𝑡 ≠ ⊤ t, it follows that 𝑡 ∈ T t. Now let ˆ𝑆′ ∈ 𝑡. Since body t is increasing, it follows that we must have

body t(𝑥, 𝑒𝑖, ˆ𝑛, ˆ𝐸, 𝑡)(ˆ𝑆′)(𝑀 t) = ⟨𝑀 t, 𝑡(ˆ𝑆′)⟩

Using similar reasoning for the definition of body t, we conclude that we must have step t J𝑒𝑖K(ˆ𝐸𝑖, ˆ𝑆′)(𝑀 t) = ⟨𝑀 t, 𝑡𝑖⟩, where 𝑡𝑖 = 𝑀 t(ˆ𝑛𝑖), ˆ𝑛𝑖 = 𝑖 ⋄ ˆ𝐸𝑖, ˆ𝐸𝑖 = ˆ𝐸.𝑥 : ˆ𝑛𝑥, and ˆ𝑛𝑥 = 𝑥 ⋄ ˆ𝐸 ⋄ ˆ𝑆′. Let further 𝑡𝑥 = 𝑀 t(ˆ𝑛𝑥) and Γ t𝑖 = Γ t.𝑥 : 𝑡𝑥. It follows that Γ t𝑖 = 𝑀 t ◦ ˆ𝐸𝑖. Hence, the induction hypothesis entails that we must have Γ t𝑖, ˆ𝑆′ ⊢ 𝑒𝑖 : 𝑡𝑖. Finally, we know

⟨𝑥 : [ˆ𝑆′ : 𝑡′𝑥 → 𝑡′𝑖], 𝑡′⟩ = 𝑥 : [ˆ𝑆′ : 𝑡𝑥 → 𝑡𝑖] ⋉ t 𝑡(ˆ𝑆′)

for some 𝑡′𝑥, 𝑡′𝑖, 𝑡′ such that 𝑡′𝑥 ⊑ t 𝑡𝑥, 𝑡′𝑖 ⊑ t 𝑡𝑖, and 𝑡′ ⊑ t 𝑡(ˆ𝑆′). From the fact that ⋉ t is increasing it thus follows that 𝑡′𝑥 = 𝑡𝑥, 𝑡′𝑖 = 𝑡𝑖, and 𝑡′ = 𝑡(ˆ𝑆′). We can therefore conclude using Lemma 7.5 that 𝑥 : [ˆ𝑆′ : 𝑡𝑥 → 𝑡𝑖] <: 𝑡(ˆ𝑆′). Using the typing rule for lambda abstractions it follows that Γ t, ˆ𝑆 ⊢ 𝑒 : 𝑡. □