HMC: Verifying Functional Programs Using Abstract Interpreters

Abstract

We present Hindley-Milner-Cousots (HMC), an algorithm that allows any interprocedural analysis for first-order imperative programs to be used to verify safety properties of typed higher-order functional programs. HMC works as follows. First, it uses the type structure of the functional program to generate a set of logical refinement constraints whose satisfaction implies the safety of the source program. Next, it transforms the logical refinement constraints into a simple first-order imperative program that is safe iff the constraints are satisfiable. Thus, in one swoop, HMC makes tools for invariant generation (e.g., tools based on abstract domains, predicate abstraction, counterexample-guided refinement, and Craig interpolation) directly applicable to verifying safety properties of modern functional languages in a fully automatic manner. We have implemented HMC and describe preliminary experimental results using two imperative checkers, ARMC and InterProc, to verify OCaml programs. By composing type-based reasoning grounded in program syntax with state-based reasoning grounded in abstract interpretation, HMC opens the door to automatic verification of programs written in modern programming languages.


Refinement Type Inference via Abstract Interpretation

(arXiv, cs.PL, December)

Ranjit Jhala

UCSD

Rupak Majumdar

UCLA

Andrey Rybalchenko

TUM

Abstract

Refinement types are a promising approach for checking behavioral properties of programs written using advanced language features like higher-order functions, parametric polymorphism and recursive datatypes. The main limitation of refinement type systems to date is the requirement that the programmer provide the types of all functions, after which the type system can check the types and hence verify the program.

In this paper, we show how to automatically infer refinement types using existing abstract interpretation tools for imperative programs. In particular, we demonstrate that the problem of refinement type inference can be reduced to that of computing invariants of simple, first-order imperative programs without recursive datatypes. As a result, our reduction shows that any of the wide variety of abstract interpretation techniques developed for imperative programs, such as polyhedra, counterexample-guided predicate abstraction and refinement, or Craig interpolation, can be directly applied to verify behavioral properties of modern software in a fully automatic manner.

1. Introduction

Automatic verification of semantic properties of modern programming languages is an important step toward reliable software systems. For higher-order programming languages with inductive datatypes or polymorphic instantiation, the main verification tool has been type systems, which traditionally capture only coarse data-type properties (such as "ints are only added to ints"), and require the programmer to explicitly annotate program invariants if more precise invariants about program computations are required. For example, refinement type systems [33] associate data types with refinement predicates that capture richer properties of program computation. Using refinement types, one can state, for instance, that a program variable xs has the refinement type "non-zero integer," or that the integer division function has the refinement type int → {ν : int | ν ≠ 0} → int, which states that the second argument must be non-zero. Then, if a program with refinement types type-checks, one can assert that there is no division-by-zero error in the program. The idea of using refinement types to express precise program invariants is well known [3, 10, 12, 13, 27, 33]. However, in each of the above systems, the programmer must provide refinements for each program type, and the type system checks the provided type refinements for consistency. We believe that this burden of annotations has limited the widespread adoption of refinement type systems.

For imperative programming languages, algorithms based on abstract interpretation can be used to automatically infer many program invariants [2, 6, 16], thereby proving many semantic properties of practical interest. However, these tools do not precisely model modern programming features such as closures, higher-order functions or inductive datatypes, and so, in practice, they are too imprecise when applied to higher-order programs.

Figure 1. RTI algorithm. (Diagram: OCaml program with assertions → constraint generation → subtyping constraints → RTI translation → simple IMP program → abstract interpretation → Safe / Unsafe.)

In this paper, we present an algorithm to automatically verify properties of higher-order programs through refinement type inference (RTI) by combining refinement type systems for higher-order programs with invariant synthesis techniques for first-order programs. Our main technical contribution is a translation from type constraints derived from a refinement type system for higher-order programs to a first-order imperative program with assertions, such that the assertions hold in the first-order program iff there is a refinement type that makes the higher-order program type-check. Moreover, a suitable type refinement for the higher-order program can be constructed from the invariants of the first-order program. Thus, our algorithm replaces the manual annotation burden for refinement types with automatically constructed program invariants on the translated program, enabling fully automatic verification of programs written in modern languages.

The RTI algorithm (Figure 1) proceeds in three steps.

Step 1: Type-Constraint Generation.

First, it performs Hindley-Milner type inference [11] to construct ML types for the program, and uses these types to generate refinement templates, i.e., types in which refinement variables κ are used to represent the unknown refinement predicates. Then, the algorithm uses a standard syntax-directed procedure to generate subtyping constraints over the templates such that the program type checks (i.e., is safe) if the subtyping constraints are satisfiable [3, 19, 29, 33].

Step 2: Translation.

Second, it translates the set of type constraints to a first-order, imperative program over base values such that the type constraints are satisfiable if and only if the imperative program does not violate any assertions.

Step 3: Abstract Interpretation.

Finally, an abstract interpretation technique for first-order imperative programs is used to prove that the first-order program is safe. The proof of safety produced by this analysis automatically translates to solutions to the refinement type variables, thus generating refinement types for the original ML program.

The main contribution of this paper is the RTI translation algorithm. The advantage of the translation is that it allows one to apply any of the well-developed semantic imperative program analyses based on abstract interpretation (e.g., polyhedra [9] and octagons [6], counterexample-guided predicate abstraction refinement (CEGAR) [2, 16], Craig interpolation [16, 22], constraint-based invariant generation [4, 30], random interpretation [15], etc.) to the verification of modern software with polymorphism, inductive datatypes, and higher-order functions. Instead of painstakingly reworking each semantic analysis for imperative programs to the higher-order setting, possibly re-implementing it in the process, one can use our translation and apply any existing analysis as is. In fact, using the translation, our implementation directly uses a CEGAR- and interpolation-based safety verification tool to verify properties of OCaml programs.

In essence, our algorithm separates syntactic reasoning about function calls and inductive data types (handled well by typing constraints) from semantic reasoning about data invariants (handled well by abstract domains). The translation from refinement type constraints to imperative programs in Step 2 is the key enabler. The translation, and the proof that the satisfiability of type constraints and the safety of the translated program are equivalent, are based on the following observations.

The first observation is that refinement type variables κ define relations over the value being defined by the refinement type and the finitely many variables that are in scope at the point where the type is defined. In the imperative program, each finite-arity relation can be encoded by a relation-valued variable. Each refinement type constraint can be encoded as a straight-line block that reads tuples from and writes tuples to the relation variables, and the set of constraints can be encoded as a non-terminating while-loop that, in each iteration, non-deterministically executes one of the blocks. Thus, the problem of determining the existence of appropriate relations reduces to that of computing (overapproximations of) the set of tuples in each relation variable in the translated program (Theorem 1).

Our second observation is that if the translated program is in a special read-write-once form, where within each straight-line block a relation variable is read at most once and written at most once, then one can replace all relation-valued variables with variables whose values range over tuples (Theorem 2). Moreover, we prove that we can, without affecting satisfiability, preprocess the refinement typing constraints so that the translated program is a read-write-once program (Theorem 3). Together, the observations yield a simple and direct translation from refinement type inference to simple imperative programs.

We have instantiated our algorithm in a verification tool for OCaml programs.
Our implementation generates refinement type constraints using the algorithm of [29], and uses the ARMC [28] software model checker to verify the translated programs. This allows fully automatic verification of a set of OCaml benchmarks for which previous approaches either required manual annotations (either the refinement types [33] or their constituent predicates [29]), or an elaborate customization and adaptation of the counterexample-guided abstraction refinement paradigm [31]. Thus, we show, for the first time, how abstract interpretation can be lifted "as is" to practical refinement type inference for modern, higher-order languages.

While we have focused on the verification of functional programs, our approach is language independent, and requires only an appropriate refinement type system for the source language.

    let rec iteri i xs f =
      match xs with
      | [] -> ()
      | x :: xs' ->
          f i x;
          iteri (i + 1) xs' f

    let mask a xs =
      let g j y = a.(j) <- y && a.(j) in
      if Array.length a = List.length xs then
        iteri 0 xs g

Figure 2. ML Example
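The program in Figure 2 is ordinary OCaml and can be run directly. The harness below is a minimal sketch, not part of RTI: it assumes two hypothetical helpers, checked_get and checked_set, that turn the bounds preconditions of get and set into runtime assertions, and otherwise keeps iteri and mask as in the figure. Such testing only checks the executions that are actually run; RTI aims to prove the bounds property for all inputs statically.

```ocaml
(* Hypothetical helpers mirroring the refinement types of get and set:
   the index must satisfy 0 <= j < length a. OCaml's a.(j) already raises
   Invalid_argument when out of bounds; the assert just makes the
   precondition explicit. *)
let checked_get a j =
  assert (0 <= j && j < Array.length a);
  a.(j)

let checked_set a j v =
  assert (0 <= j && j < Array.length a);
  a.(j) <- v

(* Higher-order indexed iterator from Figure 2: calls [f i x] on each
   element, with the index counting up from [i]. *)
let rec iteri i xs f =
  match xs with
  | [] -> ()
  | x :: xs' ->
      f i x;
      iteri (i + 1) xs' f

(* Client from Figure 2: masks a.(j) with the jth list element. The length
   guard is what makes every access inside [g] in bounds. *)
let mask a xs =
  let g j y = checked_set a j (y && checked_get a j) in
  if Array.length a = List.length xs then iteri 0 xs g
```

For example, masking [| true; true; false |] with [false; true; true] leaves [| false; true; false |], while a list of a different length leaves the array untouched.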

2. Overview

We begin with an example that illustrates how our refinement type inference (RTI) algorithm combines type constraints and abstract interpretation to automatically verify safety properties of functional ML programs with higher-order functions and recursive structures. We show that the combination of syntactic type constraints and semantic abstract interpretation enables the automatic verification of properties that are currently beyond the scope of either technique in isolation.

An ML Example.

Figure 2(a) shows a simple ML program that updates an array a using the elements of the list xs. The program comprises two functions. The first is a higher-order list indexed-iterator, iteri, that takes as arguments a starting index i, a (polymorphic) list xs, and an iteration function f. The iterator goes over the elements of the list and invokes f on each element and the index corresponding to the element's position in the list. The second is a client, mask, of the iterator iteri that takes as input a boolean array a and a list of boolean values xs, and, if the lengths match, calls the indexed iterator with an iteration function g that masks the jth element of the array.

Suppose that we wish to statically verify the safety of the array reads and writes in function g; that is, to prove that whenever g is invoked, 0 ≤ j < len(a). As this example combines higher-order functions, recursion, data structures, and arithmetic constraints on array indices, it is difficult to analyze automatically using either existing type systems or abstract interpretation implementations in isolation. The former do not precisely handle arithmetic on indices, and the latter do not precisely handle higher-order functions and are often imprecise on data structures. We show how our RTI technique can automatically prove the correctness of this program.

Refinement Types.

To verify the program, we compute program invariants that are expressed as refinements of ML types with predicates over program values [3, 19, 29]. The predicates are additional constraints that must be satisfied by every value of the type. A base value, say of type int, can be described by the refinement type {ν : int | p}, where ν is a special value variable representing the value being defined, and p is a refinement predicate which constrains the range of ν to a subset of integers. For example, the type {ν : int | 0 ≤ ν < len(a)} denotes the set of integers that are at least 0 and less than the value of the expression len(a). The unrefined type int abbreviates {ν : int | true}, which does not constrain the set of integers. Base types can be combined to construct dependent function types, written x : T₁ → T₂, where T₁ is the type of the domain, T₂ the type of the range, and the name x of the formal parameter can appear in the refinement predicates in T₂. For example, the type x : {ν : int | ν ≥ 0} → {ν : int | ν = x + 1} is the type of a function which takes a non-negative integer parameter and returns an output which is one more than the input. In the following, we write τ for the type {ν : τ | true}. When ν and τ are clear from the context, we write {p} for {ν : τ | p}.

Safety Specification.

Refinement types can be used to specify safety properties by encoding pre-conditions into primitive operations of the language. For example, consider the array read a.(j) (resp. write a.(j) ← e) in g, which is an abbreviation for get a j (resp. set a j e). By giving get and set the refinement types

    get :: a : α array → {ν : int | 0 ≤ ν < len(a)} → α
    set :: a : α array → {ν : int | 0 ≤ ν < len(a)} → α → unit

we can specify that in any program the array accesses must be within bounds. More generally, arbitrary safety properties can be specified by giving assert the appropriate refinement type [29].

Safety Verification.

The ML type system is too imprecise to prove the safety of the array accesses in our example, as it infers that g has type j : int → y : bool → unit, i.e., that g can be called with any integer j. If the programmer manually provides the refinement types for all functions and polymorphic type instantiations, refinement-type checking [3, 12, 33] can be used to verify that the provided types are consistent and strong enough to prove safety. This is analogous to providing pre- and post-conditions and loop invariants for verifying imperative programs. For our example, the refinement type system could check the program if the programmer provided the types:

    iteri :: i : int → xs : {ν : α list | 0 ≤ len(ν)} → (j : {i ≤ ν < len(xs)} → α → unit) → unit
    g :: j : {0 ≤ ν < len(a)} → bool → unit

Here, we omitted refinement predicates that are equal to true, e.g., for i in the type of iteri.

Automatic Verification via RTI.

As even this simple example illustrates, the type annotation burden for verification is extremely high. Instead, we would like to verify the program without requiring the programmer to provide every refinement type. The RTI algorithm proceeds in three steps. First, we syntactically analyze the source program to generate subtyping constraints over refinement templates. Second, we translate the constraints into an equivalent simple imperative target program. Third, we semantically analyze the target program to determine whether it is safe, from which we conclude that the constraints are satisfiable and hence the source program is safe. Next, we illustrate these steps using Figure 2 as the source program.

In the first step, we generate a system of refinement type constraints for the source program [19, 29]. To do so, we (a) build templates that refine the ML types with refinement variables that stand for the unknown refinements, and (b) make a syntax-directed pass over the program to generate subtyping constraints that capture the flow of values. For the functions iteri and g from Figure 2, with the respective ML types

    i : int → xs : α list → (j : int → α → unit) → unit
    j : int → bool → unit

we would generate the respective templates

    i : int → xs : {0 ≤ len(ν)} → (j : {κ₁} → α → unit) → unit
    j : {κ₂} → bool → unit

Notice that these templates simply refine the ML types with refinement variables κ₁, κ₂ that stand for the unknown refinements. For clarity of exposition, we have used the refinement true for some types (e.g., α and bool); our system would instead infer these unknown refinements automatically. We model the length of lists (resp. arrays) with an uninterpreted function len from lists (resp. arrays) to integers, and (again, for brevity) add the refinement stating that xs has a non-negative length in the type of iteri.

After creating the templates, we make a syntax-directed pass over the program to generate constraints that capture relationships between refinement variables. There are two kinds of type constraints: well-formedness and subtyping.

Well-formedness Constraints capture scoping rules, and ensure that the refinement predicate for a type can only refer to variables that are in scope. Our example has two constraints:

    i : int; xs : α list ⊢ {ν : int | κ₁}    (w1)
    a : bool array; xs : α list ⊢ {ν : int | κ₂}    (w2)

The first constraint states that κ₁, which represents the unknown refinement for the first parameter of the function f passed to the higher-order iterator iteri, can only refer to the two program variables that are in scope at that point, namely i and xs.
Similarly, the second constraint states that κ₂, which refines the first argument of g, can only refer to a and xs, which are in scope where g is defined.

Subtyping Constraints reduce the flow of values within the program into subtyping relationships that must hold between the source and target of the flow. Each constraint is of the form G ⊢ T₁ <: T₂, where G is an environment comprising a sequence of type bindings, and T₁ and T₂ are refinement templates. The constraint intuitively states that under the environment G, the type T₁ must be a subtype of T₂. The subtyping constraints are generated syntactically from the code. First consider the function iteri. The call to f generates

    G ⊢ {ν = i} <: {κ₁}    (c1)

where the environment G comprises the bindings

    G ≐ i : {true}; xs : {0 ≤ len(ν)}; x : {true}; xs′ : {0 ≤ len(ν) = len(xs) − 1}

The constraint ensures that at the call site, the type of the actual is a subtype of the formal. The bindings in the environment are simply the refinement templates for the variables in scope at the point where the value flow occurs. The type system yields the information that the length of xs′ is one less than that of xs, as the former is the tail of the latter [18, 33]. Similarly, the recursive call to iteri generates

    G ⊢ {j : κ₁ → α → unit} <: {(j : κ₁ → α → unit)[i + 1/i][xs′/xs]}

which states that the type of the actual f is a subtype of the third formal parameter of iteri after applying the substitutions [i + 1/i] and [xs′/xs], which capture the passing in of the actuals i + 1 and xs′ for the first two parameters, respectively. By pushing the substitutions inside and applying the standard rules for function subtyping (which are contravariant in the argument), this constraint simplifies to

    G ⊢ {κ₁[i + 1/i][xs′/xs]} <: {κ₁}    (c2)

Next, consider the function mask. The array accesses inside g generate the "bounds-check" constraint

    G′; j : {κ₂}; y : {true} ⊢ {ν = j} <: {0 ≤ ν < len(a)}    (c3)

where G′ ≐ a : bool array; xs : {0 ≤ len(ν)} has bindings for the other variables in scope. Finally, the flow due to the third parameter for the call to iteri yields

    G′; len(a) = len(xs) ⊢ {j : κ₂ → τ} <: {(j : κ₁ → τ)[0/i]}

where, for brevity, we write τ for bool → unit and omit the trivial substitution [xs/xs] due to the second parameter. The last conjunct in the environment captures the guard of the if under whose auspices the call occurs. By pushing the substitutions inside and applying standard function subtyping, the above reduces to

    G′; len(a) = len(xs) ⊢ {κ₁[0/i]} <: {κ₂}    (c4)

For brevity we omit trivial constraints like · ⊢ int <: int. If the set of constraints constructed above is satisfiable, then there is a valid refinement typing of the program [29], and hence the program is safe.

Determining the satisfiability of the constraints requires semantic analysis of program computations. In the second step, our key technical contribution, we show a translation that reduces the constraint satisfiability problem to checking the safety of a simple, imperative program. Our translation is based on two observations.
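The "pushing the substitutions inside" step used to obtain (c2) and (c4) is mechanical once a refinement variable is replaced by a concrete predicate. The sketch below assumes a toy predicate syntax with hypothetical constructor names (the value variable ν is written "v") and applies pending substitutions [y/x] from left to right, each replacing x by y; it instantiates κ₁ with the candidate refinement i ≤ ν < len(xs) reported later in this section.

```ocaml
(* Toy expressions: variables, integer constants, addition, and the
   uninterpreted length function. *)
type expr =
  | Var of string
  | Int of int
  | Add of expr * expr
  | Len of expr

type pred =
  | Le of expr * expr
  | Lt of expr * expr
  | And of pred * pred

(* [subst_expr (y, x) e] computes e[y/x]: every occurrence of variable x
   is replaced by the expression y. *)
let rec subst_expr ((y, x) as s) e =
  match e with
  | Var v -> if v = x then y else e
  | Int _ -> e
  | Add (e1, e2) -> Add (subst_expr s e1, subst_expr s e2)
  | Len e1 -> Len (subst_expr s e1)

let rec subst_pred s p =
  match p with
  | Le (e1, e2) -> Le (subst_expr s e1, subst_expr s e2)
  | Lt (e1, e2) -> Lt (subst_expr s e1, subst_expr s e2)
  | And (p1, p2) -> And (subst_pred s p1, subst_pred s p2)

(* Pending substitutions [y1/x1]...[yn/xn], applied left to right. *)
let apply_pending subs p = List.fold_left (fun acc s -> subst_pred s acc) p subs

(* Candidate refinement for kappa1: i <= v < len(xs). *)
let k1 = And (Le (Var "i", Var "v"), Lt (Var "v", Len (Var "xs")))
```

For instance, apply_pending [ (Int 0, "i") ] k1 yields 0 ≤ v < len(xs), which under the guard len(a) = len(xs) in (c4) matches the bounds refinement 0 ≤ ν < len(a).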

Reﬁnements are Relations.

The first observation is that type refinements are defined through relations: the set of values denoted by a refinement type {ν : τ₀ | p}, where p refers to the program variables x₁, …, xₙ of the respective base types τ₁, …, τₙ, is equivalent to the set

$$\{\, t_0 \mid \exists (t_1, \ldots, t_n).\ (t_0, t_1, \ldots, t_n) \in R_p \wedge t_1 = x_1 \wedge \ldots \wedge t_n = x_n \,\}$$

where R_p is the (n + 1)-ary relation in τ₀ × τ₁ × … × τₙ defined by p. For example, the set of values denoted by {ν : int | ν ≤ i} is equivalent to the set

$$\{\, t_0 \mid \exists t_1.\ (t_0, t_1) \in R_{\le} \wedge t_1 = i \,\}$$

where R_≤ is the standard ≤-ordering relation over the integers. In other words, each refinement variable κ can be seen as the projection on the first coordinate of an (n + 1)-ary relation over the variables (ν, x₁, …, xₙ), where x₁, …, xₙ are the variables in the well-formedness constraint for κ (i.e., the variables in scope of κ). Thus, the problem of determining the satisfiability of the constraints is analogous to the problem of determining the existence of appropriate relations.

Relations are Records.

The second observation is that the problem of finding appropriate relations can be reduced to the problem of analyzing a simple imperative program with variables ranging over relations. In the imperative program, each refinement variable, standing for an n-ary relation, is translated into a record variable with n fields. Each subtyping constraint is translated into a block of reads from and writes to the corresponding records. The set of all tuples that can be written into a given record on some execution of the program defines the corresponding relation. The entire program is an infinite loop, which in each iteration non-deterministically chooses a block of reads and writes defined by a constraint.

The arity of a relation, and hence the number of fields of the corresponding record, is determined by the well-formedness constraints. For example, the constraint (w1) specifies that κ₁ corresponds to a ternary relation, that is, a set of triples where the 0th element (corresponding to ν) is an integer, the 1st element (corresponding to i) is an integer, and the 2nd element (corresponding to xs) is a list. We encode this in the imperative program via a record variable κ₁ with three fields κ₁.0, κ₁.1 and κ₁.2.

Figure 3 shows the imperative program translated from the constraints for our running example. We use the subtyping constraints to define the flow of tuples into records. For example, consider the constraint (c2), which is translated to the block marked /*c2*/. Each variable in the type environment is translated to a corresponding variable in the program. The block has a sequence of assignments that define the environment variables. For example, we know i has type int, so there is an assignment of an arbitrary integer to i. When there is a known refinement in the binding, the non-deterministic assignment is followed by an assume operation (a conditional) that establishes that the assigned value satisfies the given refinement.
For example, xs gets assigned an arbitrary value, but then the assume establishes the fact that the length of xs is non-negative. Similarly, xs′ gets assigned an arbitrary value that has non-negative length and whose length is one less than that of xs. The LHS of (c2) reads a tuple from κ₁ whose 1st and 2nd fields are assumed to equal i + 1 and xs′, respectively. Finally, ν is set to the 0th field of that tuple, and the triple (ν, i, xs) is written into the record κ₁, which is the RHS of (c2).

Next, consider the translated block for the bounds-check constraint (c3). Here, the translation is as before, but the RHS is a known refinement predicate (stipulating that the integer be within bounds). In this case, instead of writing into the record that defines the RHS, the translation contains an assertion over the corresponding variables that ensures that the refinement predicate holds.

Relational vs. Imperative Semantics.

There is a direct correspondence between the refinement relations and the record variables when the translated program is interpreted under a relational semantics, where (1) the records range over (initially empty) sets of tuples, (2) each write adds a new tuple to the record's set, and (3) each read non-deterministically selects some tuple from the record's set. Under these semantics, we can show that the constraints are satisfiable iff the imperative program is safe (i.e., no assert fails on any execution) (Theorem 1).

Unfortunately, these semantics preclude the direct application of mature invariant generation and safety verification techniques, e.g., those based on abstract interpretation or CEGAR-based software model checking, as those techniques do not deal well with set-valued variables. We would like to have an imperative semantics where each record contains a single value, the last tuple written to it. We show that there is a syntactic subclass of programs for which the two semantics coincide. That is, a program in the subclass is safe under the imperative semantics if and only if it is safe under the set-based semantics (Theorem 2). Furthermore, we show a technique that ensures that the translated program belongs to the subclass (Theorem 3).

The attractiveness of the translation is that the resulting programs fall in a particularly pleasant subclass of programs which do not have any advanced language features like higher-order functions, polymorphism, and recursive data structures, or variables over complex types such as sets, that are the bane of semantic analyses. Thus, the translation yields simple imperative programs to which a wide variety of semantic analyses directly apply.
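The read-write-once condition behind Theorem 2 is purely syntactic, so it can be checked mechanically. The sketch below uses a hypothetical statement type that keeps only what matters for the check: a block qualifies if each relation variable is read at most once and written at most once. Intuitively, two reads of the same record in one block could observe different tuples under the relational semantics but necessarily the same tuple under the imperative one, and a second write would overwrite the first under the imperative semantics.

```ocaml
(* Statements of a translated block; predicates and expressions are kept
   as opaque text because only record accesses matter here. *)
type stmt =
  | Havoc of string              (* x <- nondet () *)
  | Assign of string * string    (* x <- e *)
  | Assume of string             (* assume (p) *)
  | Assert of string             (* assert (p) *)
  | Read of string               (* (t0, ..., tn) <- kappa *)
  | Write of string              (* kappa <- (v, x1, ..., xn) *)

(* Number of statements in [block] satisfying [f]. *)
let count f block = List.length (List.filter f block)

(* True iff every relation variable mentioned in [block] is read at most
   once and written at most once. *)
let read_write_once block =
  let kappas =
    List.sort_uniq compare
      (List.filter_map (function Read k | Write k -> Some k | _ -> None) block)
  in
  List.for_all
    (fun k ->
      count (fun s -> s = Read k) block <= 1
      && count (fun s -> s = Write k) block <= 1)
    kappas

(* Block c2 of Figure 3: one read and one write of kappa1. *)
let c2_block =
  [ Havoc "i"; Havoc "xs"; Assume "0 <= len xs";
    Havoc "xs'"; Assume "0 <= len xs' = len xs - 1";
    Read "kappa1"; Assume "t1 = i + 1"; Assume "t2 = xs'";
    Assign ("v", "t0"); Write "kappa1" ]
```

Block c2 passes the check even though it both reads and writes κ₁, since the condition limits reads and writes separately; a block that read κ₁ twice would fail it.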

Together these results imply that we can run off-the-shelf abstract interpretation and invariant generation tools on the translated program, and use the result of the analysis to determine whether the original ML program is typable.

For the translated program shown in Figure 3, the CEGAR-based software model checker ARMC [28] finds that the assertion is never violated, and computes the invariants

    κ₁.1 ≤ κ₁.0 ∧ κ₁.0 < len(κ₁.2)
    0 ≤ κ₂.0 < len(κ₂.1)

which, when plugging in ν, i and xs for the 0th, 1st and 2nd fields of κ₁, and ν, a for the 0th and 1st fields of κ₂, respectively, yield the refinements

    κ₁ ≐ i ≤ ν < len(xs)
    κ₂ ≐ 0 ≤ ν < len(a)

which suffice to typecheck the original ML program. Indeed, these predicates for κ₁ and κ₂ are easily shown to satisfy the constraints (c1), (c2), (c3), and (c4).

    loop {
      /* c1 */
      i ← nondet(); xs ← nondet(); assume(0 ≤ len(xs));
      xs′ ← nondet(); assume(0 ≤ len(xs′) = len(xs) − 1);
      ν ← nondet(); assume(ν = i);
      κ₁ ← (ν, i, xs)
    [] /* c2 */
      i ← nondet(); xs ← nondet(); assume(0 ≤ len(xs));
      xs′ ← nondet(); assume(0 ≤ len(xs′) = len(xs) − 1);
      (t₀, t₁, t₂) ← κ₁; assume(t₁ = i + 1); assume(t₂ = xs′);
      ν ← t₀; κ₁ ← (ν, i, xs)
    [] /* c3 */
      a ← nondet(); xs ← nondet(); assume(0 ≤ len(xs));
      (t₀, t₁, t₂) ← κ₂; assume(t₁ = a); assume(t₂ = xs);
      j ← t₀; assert(0 ≤ j < len(a))
    [] /* c4 */
      a ← nondet(); xs ← nondet(); assume(0 ≤ len(xs)); assume(len(a) = len(xs));
      (t₀, t₁, t₂) ← κ₁; assume(t₁ = 0); assume(t₂ = xs);
      ν ← t₀; κ₂ ← (ν, a, xs)
    }

Figure 3. Translated Program
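The imperative semantics of the translated program in Figure 3 can also be explored by random simulation. The sketch below is a testing aid, not a verifier, and rests on stated assumptions: lists and arrays are represented only by their lengths (every assume either constrains a length or equates a variable with a stored field, which the simulation satisfies by copying that field), each nondet() is resolved by choosing a value that satisfies the assumes after it, and a read from a never-written record blocks the step. In block c3, a and xs are taken from the tuple read from κ₂, reflecting its trivial pending substitutions [a/a][xs/xs]. Observing no violation on sampled runs is evidence, not proof; a proof is what a tool such as ARMC provides.

```ocaml
(* Imperative semantics: each record holds only the last tuple written to
   it (None = never written). Lists and arrays appear as their lengths. *)
type state = {
  mutable kappa1 : (int * int * int) option;  (* (v, i, len xs) *)
  mutable kappa2 : (int * int * int) option;  (* (v, len a, len xs) *)
}

exception Assertion_violated of int

(* One loop iteration: pick one of the four blocks at random. A block whose
   read or assume cannot be satisfied is skipped (the step is not enabled). *)
let step st =
  match Random.int 4 with
  | 0 ->
      (* c1: v = i; xs is non-empty because it has a tail xs' *)
      let i = Random.int 7 - 3 in
      let len_xs = 1 + Random.int 5 in
      st.kappa1 <- Some (i, i, len_xs)
  | 1 -> (
      (* c2: read (t0, t1, t2) from kappa1 with t1 = i + 1 and t2 = xs' *)
      match st.kappa1 with
      | None -> ()
      | Some (t0, t1, t2) ->
          let i = t1 - 1 and len_xs = t2 + 1 in
          st.kappa1 <- Some (t0, i, len_xs))
  | 2 -> (
      (* c3: read kappa2, with a and xs taken from the tuple; check bounds *)
      match st.kappa2 with
      | None -> ()
      | Some (t0, len_a, _) ->
          let j = t0 in
          if not (0 <= j && j < len_a) then raise (Assertion_violated j))
  | _ -> (
      (* c4: read kappa1 under assume (t1 = 0); len a = len xs = t2 *)
      match st.kappa1 with
      | Some (t0, 0, len_xs) -> st.kappa2 <- Some (t0, len_xs, len_xs)
      | _ -> ())

(* Run [steps] iterations from empty records with a fixed seed. *)
let run ~seed ~steps =
  Random.init seed;
  let st = { kappa1 = None; kappa2 = None } in
  for _i = 1 to steps do
    step st
  done;
  st
```

A call such as run ~seed:42 ~steps:100_000 returns the final record contents; an Assertion_violated exception would signal a failing assert in block c3.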

3. Constraints

We start by formalizing constraints over types refined with predicates. To this end, we make precise the notions of refinement predicates (Section 3.1), refinement types (Section 3.2), and constraints over refinement types together with the notion of satisfaction (Section 3.3). A discussion of how such constraints can be generated in a syntax-guided manner from program source is outside the scope of this paper; we refer the reader to the large body of prior research that addresses this issue [3, 19, 29, 33].

Notation.

We use uppercase (Z) to denote sets, lowercase z to denote elements, and ⟨Z⟩ for a sequence of elements in Z.

Figure 4 shows the syntax of refinement predicates. In our discussion, we restrict the predicate language to the typed quantifier-free logic of linear integer arithmetic and uninterpreted functions. However, it is straightforward to extend the logic to include other domains equipped with effective decision procedures and abstract interpreters.
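Because Figure 4 is not reproduced in this text, the datatypes below give one possible OCaml rendering of the predicate language just described (quantifier-free linear integer arithmetic with uninterpreted functions), together with an evaluator under an interpretation σ. All names are hypothetical, and only integer-valued terms are covered (bool and the uninterpreted types uᵢ are omitted). The bounded_valid helper enumerates a finite range of variable values under one fixed interpretation of the function symbols, so it can refute validity but never establish it; actual tools rely on decision procedures instead.

```ocaml
(* Expressions: variables, constants, addition, multiplication by a
   constant, and uninterpreted function applications. *)
type expr =
  | Var of string
  | Const of int
  | Add of expr * expr
  | Scale of int * expr
  | App of string * expr list

(* Predicates: atomic comparisons and boolean combinations. *)
type pred =
  | Eq of expr * expr
  | Le of expr * expr
  | Lt of expr * expr
  | Not of pred
  | And of pred * pred
  | Or of pred * pred

(* An interpretation: values for variables, functions for symbols. *)
type interp = { var : string -> int; fn : string -> int list -> int }

let rec eval_expr s = function
  | Var x -> s.var x
  | Const n -> n
  | Add (e1, e2) -> eval_expr s e1 + eval_expr s e2
  | Scale (k, e) -> k * eval_expr s e
  | App (f, args) -> s.fn f (List.map (eval_expr s) args)

let rec eval s = function
  | Eq (a, b) -> eval_expr s a = eval_expr s b
  | Le (a, b) -> eval_expr s a <= eval_expr s b
  | Lt (a, b) -> eval_expr s a < eval_expr s b
  | Not p -> not (eval s p)
  | And (p, q) -> eval s p && eval s q
  | Or (p, q) -> eval s p || eval s q

(* Evaluate [p] under every assignment of [vars] to values in [lo, hi].
   A false result refutes validity; a true result does not prove it.
   Variables not listed in [vars] raise Not_found. *)
let bounded_valid ~vars ~lo ~hi ~fn p =
  let rec go env = function
    | [] -> eval { var = (fun x -> List.assoc x env); fn } p
    | x :: rest ->
        let rec loop v = v > hi || (go ((x, v) :: env) rest && loop (v + 1)) in
        loop lo
  in
  go [] vars
```

For example, the implication x < y ⇒ x ≤ y, encoded as Or (Not (Lt ...), Le ...), holds on every assignment in [−2, 2], whereas x ≤ y alone is refuted.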

Types and Environments.

Our logic is equipped with a fixed set of types denoted τ, comprising the basic types int for integer values, bool for boolean values, and uᵢ, a family of uninterpreted types that are used to encode complex source language types such as products, sums, polymorphic type variables, recursive types, etc. We assume there is a fixed set of uninterpreted functions. Each uninterpreted function f has a fixed type $\tau_f \doteq \langle \tau^i_f \rangle \to \tau^o_f$. An environment is a sequence of variable-type bindings.

Expressions and Predicates.

In our logic, expressions e comprise variables, linear arithmetic (i.e., addition and multiplication by constants), and applications of uninterpreted functions f. Note that, as is standard in semantic program analyses, complex operations like division or non-linear multiplication can be modelled using uninterpreted functions. Finally, predicates comprise atomic comparisons of expressions, or boolean combinations of sub-predicates. We write true (resp. false) as an abbreviation for a trivially true (resp. trivially false) atomic comparison.

Well-formedness.

We say that a predicate p is well-formed in an environment Γ if every variable appearing in p is bound in Γ and p is "type correct" in the environment Γ.

Validity.

For each type τ, we write U(τ) to denote the set of concrete values of τ. An interpretation σ is a map from variables x to concrete values, and from functions f to maps from $U(\langle \tau^i_f \rangle)$ to $U(\tau^o_f)$. We say that σ is valid under Γ if for each x : τ ∈ Γ, we have σ(x) ∈ U(τ). We say that a predicate p is valid in an environment Γ if σ(p) evaluates to true for every σ valid under Γ.

Figure 4 shows the syntax of refinement types and environments.
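Because Figure 4 is not reproduced in this text, the following OCaml datatypes sketch one possible rendering of the refinement types and environments described below, together with the Emb and Shape functions from the Embedding paragraph of this section. All names are hypothetical, the predicate syntax is deliberately tiny, the value variable ν is written "v", and Emb is only defined for types without refinement variables, matching its use in the definition of validity. Following the substitution convention of Section 2, where [y/x] replaces x by y, a binding x : T contributes Emb(T) with x substituted for ν.

```ocaml
(* Base types of the logic; Uninterp stands for the family u_i. *)
type base = Int | Bool | Uninterp of string

(* A deliberately tiny predicate syntax over variable names. *)
type pred =
  | True
  | Var_eq of string * string   (* x = y *)
  | Le of string * string       (* x <= y *)
  | And of pred * pred
  | Imp of pred * pred

(* A refinement is a known predicate, or a kappa with pending
   substitutions [y1/x1]...[yn/xn] stored as (y, x) pairs. *)
type refinement = Known of pred | Kappa of string * (string * string) list

(* {v : tau | r} *)
type rtype = { base : base; refn : refinement }

(* A refinement environment G: a sequence of bindings, head first. *)
type env = (string * rtype) list

(* Rename variable [from] to [into] throughout a predicate. *)
let rename ~from ~into p =
  let r x = if x = from then into else x in
  let rec go = function
    | True -> True
    | Var_eq (a, b) -> Var_eq (r a, r b)
    | Le (a, b) -> Le (r a, r b)
    | And (p, q) -> And (go p, go q)
    | Imp (p, q) -> Imp (go p, go q)
  in
  go p

exception Has_kappa

(* Emb({v : tau | p}) = p *)
let emb_type t = match t.refn with Known p -> p | Kappa _ -> raise Has_kappa

(* Emb(x : T; G) = Emb(T)[x/v] /\ Emb(G), and Emb(empty) = true *)
let rec emb_env (g : env) =
  match g with
  | [] -> True
  | (x, t) :: rest -> And (rename ~from:"v" ~into:x (emb_type t), emb_env rest)

(* Emb(G |- T1 <: T2) = Emb(G) => Emb(T1) => Emb(T2) *)
let emb_sub (g : env) t1 t2 = Imp (emb_env g, Imp (emb_type t1, emb_type t2))

(* Shape(x : T; G) = x : Shape(T); Shape(G) *)
let shape_env (g : env) = List.map (fun (x, t) -> (x, t.base)) g
```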

Refinements. A refinement r is either a predicate p drawn from our logic, or a refinement variable with pending substitutions κ[y₁/x₁]…[yₙ/xₙ]. Intuitively, the former represent known refinements (or invariants), while the latter represent the unknown invariants that hold of different program values. The notion of pending substitutions [1, 19] offers a flexible way of capturing the value flow that arises in the context of function parameter passing (in the functional setting), or assignment (in the imperative setting), even when the underlying invariants are unknown.

Refinement Types and Environments. A refinement type {ν : τ | r} is a triple consisting of a value variable ν denoting the value being described by the refinement type, a type τ describing the underlying type of the value, and a refinement r. A refinement environment G is a sequence of refinement type bindings. The value variables are special variables distinct from the program variables, and can occur inside the refinement predicates. Thus, intuitively, the refinement type describes the set of concrete values of the underlying type τ which additionally satisfy the refinement predicate. For example, the refinement type {ν : int | ν ≠ 0} describes the set of non-zero integers, and {ν : int | ν = x + y} describes the set of integers whose value equals the sum of the values of the (program) variables x and y.

Note that path-sensitive branch information can be captured by adding suitable bindings to the refinement environment. For example, the fact that some expression is only evaluated under the if-condition x > 0 can be captured in the environment via a refinement type binding x_b : {ν : bool | x > 0}.

Figure 4 shows the syntax of refinement constraints. Our refinement type system has two kinds of constraints.

Subtyping Constraints are of the form $G \vdash \{\nu : \tau \mid r_1\} <: \{\nu : \tau \mid r_2\}$. Intuitively, a subtyping constraint states that when the program variables satisfy the invariants described in $G$, the set of values described by the refinement $r_1$ must be subsumed by the set of values described by the refinement $r_2$.

Well-formedness Constraints are of the form $\Gamma \vdash \{\nu : \tau \mid r\}$. Intuitively, a well-formedness constraint states that the refinement $r$ must be a well-typed predicate in the environment $\Gamma$ extended with the binding $\nu : \tau$ for the value variable.

Embedding.

To formalize the notions of constraint validity and satisfaction, we embed subtyping constraints into our logic. We define the function $\mathit{Emb}(\cdot)$ that maps refinement types, environments and subtyping constraints to predicates in our logic:

$$\begin{aligned} \mathit{Emb}(\{\nu : \tau \mid p\}) &\doteq p \\ \mathit{Emb}(x : T; G) &\doteq \mathit{Emb}(T)[x/\nu] \wedge \mathit{Emb}(G) \\ \mathit{Emb}(\emptyset) &\doteq \mathit{true} \\ \mathit{Emb}(G \vdash T_1 <: T_2) &\doteq \mathit{Emb}(G) \Rightarrow \mathit{Emb}(T_1) \Rightarrow \mathit{Emb}(T_2) \end{aligned}$$

Similarly, we define the function $\mathit{Shape}(\cdot)$ that maps refinement types and environments to types and environments in our logic:

$$\begin{aligned} \mathit{Shape}(\{\nu : \tau \mid p\}) &\doteq \tau \\ \mathit{Shape}(x : T; G) &\doteq x : \mathit{Shape}(T); \mathit{Shape}(G) \\ \mathit{Shape}(\emptyset) &\doteq \emptyset \end{aligned}$$

Validity.

A subtyping constraint $G \vdash T_1 <: T_2$ that does not contain refinement variables is valid if the predicate $\mathit{Emb}(G \vdash T_1 <: T_2)$ is valid under the environment $\mathit{Shape}(G)$. A well-formedness constraint $\Gamma \vdash \{\nu : \tau \mid p\}$ that does not contain refinement variables is valid if the predicate $p$ is well-formed in the environment $\Gamma$ extended with the binding $\nu : \tau$.

Relational Interpretations.

We assume, without loss of generality, that each refinement variable $\kappa$ is associated with a unique well-formedness constraint $x_1 : \tau_1; \ldots; x_n : \tau_n \vdash \{\nu : \tau \mid \kappa\}$, called the well-formedness constraint for $\kappa$. In this case, we say $\kappa$ has arity $n + 1$. Furthermore, we assume that wherever a $\kappa$ of arity $n + 1$ appears in a subtyping constraint, it appears with a sequence of $n$ pending substitutions $[y_1/x_1]\ldots[y_n/x_n]$. This assumption is without loss of generality, as we can enforce it with trivial substitutions of the form $[x_i/x_i]$. A relational interpretation for $\kappa$ of arity $n + 1$ is an $(n+1)$-ary relation in $U(\tau) \times U(\tau_1) \times \ldots \times U(\tau_n)$. A relational model is a map from refinement variables $\kappa$ to relational interpretations.

Constraint Satisfaction.

A set of constraints $C$ is satisfiable if, for all interpretations of the uninterpreted functions $f$, there exists a relational model $S$ such that, when each occurrence of a refinement type $\{\nu : \tau \mid \kappa[y_1/x_1]\ldots[y_n/x_n]\}$ in $C$ is substituted with

$$\{\nu : \tau \mid \exists t_1, \ldots, t_n.\ S(\kappa)(\nu, t_1, \ldots, t_n) \wedge t_1 = y_1 \wedge \ldots \wedge t_n = y_n\},$$

every subtyping constraint after the substitution is valid. In this case, we say that $S$ is a solution for $C$.
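To make the embedding concrete, the following OCaml sketch implements $\mathit{Emb}$ for subtyping constraints whose refinements are all known predicates over integers, and approximates validity by evaluating the embedded predicate on every assignment drawn from a small bounded range. All of this is an illustrative assumption rather than part of RTI: the names (`emb_env`, `emb_sub`, `bounded_valid`), the string `"v"` standing for the value variable $\nu$, and the bounded check itself, which is only a stand-in for a real validity check such as an SMT query.

```ocaml
(* A fragment of the predicate language of Figure 4, over integers. *)
type expr =
  | Var of string
  | Int of int
  | Add of expr * expr
  | Mul of int * expr (* affine multiplication n * e *)

type pred =
  | True
  | Le of expr * expr
  | Eq of expr * expr
  | Not of pred
  | And of pred * pred
  | Imp of pred * pred

(* [subst y x p] replaces variable x by variable y, written p[y/x]. *)
let rec subst_e y x = function
  | Var z -> if z = x then Var y else Var z
  | Int n -> Int n
  | Add (a, b) -> Add (subst_e y x a, subst_e y x b)
  | Mul (n, a) -> Mul (n, subst_e y x a)

let rec subst y x = function
  | True -> True
  | Le (a, b) -> Le (subst_e y x a, subst_e y x b)
  | Eq (a, b) -> Eq (subst_e y x a, subst_e y x b)
  | Not p -> Not (subst y x p)
  | And (p, q) -> And (subst y x p, subst y x q)
  | Imp (p, q) -> Imp (subst y x p, subst y x q)

(* A known refinement type {v : int | p} is represented by p over "v".
   Emb(x : T; G) = Emb(T)[x/v] /\ Emb(G), and Emb(empty) = true. *)
let rec emb_env = function
  | [] -> True
  | (x, t) :: g -> And (subst x "v" t, emb_env g)

(* Emb(G |- T1 <: T2) = Emb(G) => Emb(T1) => Emb(T2) *)
let emb_sub g t1 t2 = Imp (emb_env g, Imp (t1, t2))

let rec eval_e env = function
  | Var z -> List.assoc z env
  | Int n -> n
  | Add (a, b) -> eval_e env a + eval_e env b
  | Mul (n, a) -> n * eval_e env a

let rec eval env = function
  | True -> true
  | Le (a, b) -> eval_e env a <= eval_e env b
  | Eq (a, b) -> eval_e env a = eval_e env b
  | Not p -> not (eval env p)
  | And (p, q) -> eval env p && eval env q
  | Imp (p, q) -> (not (eval env p)) || eval env q

(* Evaluate p under every assignment of [lo, hi] to [vars]. A false result
   is a real counterexample; a true result only covers the bounded range. *)
let bounded_valid vars lo hi p =
  let rec go env = function
    | [] -> eval env p
    | x :: rest ->
      let rec loop n = n > hi || (go ((x, n) :: env) rest && loop (n + 1)) in
      loop lo
  in
  go [] vars
```

For example, the constraint $x : \{\nu \mid 1 \le \nu\} \vdash \{\nu \mid \nu = x + 1\} <: \{\nu \mid 2 \le \nu\}$ passes the bounded check, while weakening the binding for $x$ to $0 \le \nu$ yields the counterexample $x = 0, \nu = 1$.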

4. Imperative Programs

RTI translates the satisfiability problem for refinement type constraints to the question of checking the safety of an imperative program in a simple imperative language IMP. In this section, we formalize the syntax of IMP programs and define the Relational semantics and the Imperative semantics.

Figure 5 shows the syntax of IMP programs. An instruction ($I$) is a sequence of assignments, assumptions and assertions. A program ($P$) is an infinite loop over a block whose body is a non-deterministic choice between a finite number of instructions $I_1, \ldots, I_n$. Next, we describe the different kinds of instructions. For ease of notation, we assume that there is only one base type $\tau$, and let $V$ denote the set of values of type $\tau$.

Variables. IMP programs have two kinds of variables: (1) base variables, denoted by $\nu$, $x$, $y$ and $t$ (and subscripted versions thereof), which range over values of type $\tau$; and (2) relation variables, denoted by $\kappa$, each of which has a fixed arity $n$ and ranges over tuples of values or sets of $n$-tuples of values, depending on the semantics.

Base Assignments. IMP programs have two kinds of assignments to base variables. Either (1) an expression over base variables (cf. Figure 4) is evaluated and assigned to the base variable, or (2) an arbitrary value of the appropriate base type is assigned to the base variable, i.e., the variable is "havoc-ed" with a non-deterministically chosen value.

$$\begin{array}{rcll}
\tau & ::= & \mathtt{int} \mid \mathtt{bool} \mid \mathtt{ui} & \text{types: integers, booleans, complex uninterpreted type} \\
\Gamma & ::= & x : \tau; \Gamma \mid \emptyset & \text{environments: binding, empty} \\
e & ::= & x \mid n \mid e + e \mid n \times e \mid f(\langle e \rangle) & \text{expressions: variable, integer, addition, affine multiplication, function application} \\
p & ::= & e \bowtie e \mid \neg p \mid p \wedge p \mid p \Rightarrow p & \text{predicates: comparison, negation, conjunction, implication} \\
r & ::= & p \mid \kappa[y_1/x_1]\ldots[y_n/x_n] & \text{refinements: predicate, refinement variable with substitutions} \\
T & ::= & \{\nu : \tau \mid r\} & \text{refinement types} \\
G & ::= & x : T; G \mid \emptyset & \text{refinement environments: binding, empty} \\
c & ::= & G \vdash T <: T & \text{subtype constraints} \\
w & ::= & \Gamma \vdash T & \text{WF constraints}
\end{array}$$

Figure 4. Predicates, Refinements and Constraints.

$$\begin{array}{rcll}
I & ::= & x \leftarrow e & \text{assign expression} \\
  & \mid & x \leftarrow \mathtt{nondet}() & \text{havoc} \\
  & \mid & (t_1, \ldots, t_n) \leftarrow \kappa & \text{get tuple} \\
  & \mid & \kappa \leftarrow (x_1, \ldots, x_n) & \text{set tuple} \\
  & \mid & \mathtt{assume}(p) & \text{assume} \\
  & \mid & \mathtt{assert}(p) & \text{assert} \\
  & \mid & I; I & \text{sequence} \\
P & ::= & \mathtt{loop}\ \{ I_1 \mathbin{[]} \ldots \mathbin{[]} I_n \} & \text{program}
\end{array}$$

Figure 5. Imperative Programs: Syntax.
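The grammar of Figure 5 maps directly onto an algebraic datatype. The sketch below is one possible OCaml encoding, assuming integer-valued base variables, a reduced predicate language, and strings for variable names (with `"v"` for $\nu$ and `"k"` for $\kappa$); the constructor names and the helpers `seq` and `string_of_program` are hypothetical.

```ocaml
(* Expressions and predicates over base variables (a fragment of Figure 4). *)
type expr = Var of string | Int of int | Add of expr * expr

type pred =
  | Eq of expr * expr
  | Le of expr * expr
  | Not of pred
  | And of pred * pred

(* Instructions of Figure 5. *)
type instr =
  | Assign of string * expr     (* x <- e *)
  | Havoc of string             (* x <- nondet () *)
  | Get of string list * string (* (t1, ..., tn) <- kappa *)
  | Set of string * string list (* kappa <- (x1, ..., xn) *)
  | Assume of pred
  | Assert of pred
  | Seq of instr * instr        (* I; I *)

(* P ::= loop { I1 [] ... [] In }: one block per non-deterministic choice. *)
type program = Loop of instr list

(* skip abbreviates assume (0 = 0). *)
let skip = Assume (Eq (Int 0, Int 0))

(* Build a left-nested sequence from a list of instructions. *)
let seq = function
  | [] -> skip
  | i :: rest -> List.fold_left (fun acc j -> Seq (acc, j)) i rest

let rec string_of_expr = function
  | Var x -> x
  | Int n -> string_of_int n
  | Add (a, b) -> string_of_expr a ^ " + " ^ string_of_expr b

let rec string_of_pred = function
  | Eq (a, b) -> string_of_expr a ^ " = " ^ string_of_expr b
  | Le (a, b) -> string_of_expr a ^ " <= " ^ string_of_expr b
  | Not p -> "not (" ^ string_of_pred p ^ ")"
  | And (p, q) -> string_of_pred p ^ " && " ^ string_of_pred q

let rec string_of_instr = function
  | Assign (x, e) -> x ^ " <- " ^ string_of_expr e
  | Havoc x -> x ^ " <- nondet ()"
  | Get (ts, k) -> "(" ^ String.concat ", " ts ^ ") <- " ^ k
  | Set (k, xs) -> k ^ " <- (" ^ String.concat ", " xs ^ ")"
  | Assume p -> "assume (" ^ string_of_pred p ^ ")"
  | Assert p -> "assert (" ^ string_of_pred p ^ ")"
  | Seq (a, b) -> string_of_instr a ^ "; " ^ string_of_instr b

let string_of_program (Loop blocks) =
  "loop { " ^ String.concat " [] " (List.map string_of_instr blocks) ^ " }"

let () =
  (* prints: loop { v <- nondet (); k <- (v) [] (t) <- k; x <- t } *)
  print_endline
    (string_of_program
       (Loop
          [ seq [ Havoc "v"; Set ("k", [ "v" ]) ];
            seq [ Get ([ "t" ], "k"); Assign ("x", Var "t") ] ]))
```

Keeping each loop body as a separate list element mirrors the `[]` choice operator, so an analysis can treat every block as one straight-line transition.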

Tuple Assignments.

The operations get tuple and set tuple respectively read a tuple from and write a tuple to a relation variable.

Assumes and Asserts. IMP programs have the standard assume and assert instructions using predicates over the base variables (cf. Figure 4). We write skip as an abbreviation for $\mathtt{assume}(0 = 0)$.

We define the Relational semantics as a state transition system. In this semantics, $\kappa$ variables range over sets of tuples over $V$.

Relational States.

A state $s^\sharp$ in the Relational semantics is either the special error state $\mathcal{E}$ or a map from program variables to values such that every base variable is mapped to a value in $V$, and every relation variable of arity $n$ is mapped to a (possibly empty) set of tuples in $V^n$. Let $\Sigma^\sharp$ be the set of all Relational-program states. For a state $s^\sharp$ which is not $\mathcal{E}$, variable $x$ and value $v$, we write $s^\sharp[x \mapsto v]$ for the map which maps $x$ to $v$ and every other key $x'$ to $s^\sharp(x')$. We lift maps $s^\sharp$ from base variables to values to maps from expressions (and predicates) to values in the natural way.

Initial State.

The initial state $s^\sharp_0$ of an IMP program in the Relational semantics is a map in which every base variable is mapped to a fixed value from $V$, and every relation variable is mapped to the empty set.

Transition Relation.

The transition relation is defined through a $\mathit{Post}^\sharp$ operator, shown in Figure 6, which maps a state $s^\sharp$ and an instruction $I$ to the set of states that the program can be in after executing the instruction from the state $s^\sharp$. We lift $\mathit{Post}^\sharp$ to a set of states $\hat{\Sigma}^\sharp \subseteq \Sigma^\sharp$ in the natural way:

$$\mathit{Post}^\sharp(\hat{\Sigma}^\sharp, I) \doteq \bigcup \{\mathit{Post}^\sharp(s^\sharp, I) \mid s^\sharp \in \hat{\Sigma}^\sharp\}$$

Notice that the program halts if a get instruction is executed with an empty relation variable, or an $\mathtt{assume}(p)$ is executed in a state that does not satisfy $p$.

Safety.

Let $P$ be the program $\mathtt{loop}\ \{I_1 \mathbin{[]} \ldots \mathbin{[]} I_n\}$. The set of Relational-reachable states of $P$, denoted $\mathit{Reach}^\sharp(P)$, is defined by induction as:

$$\begin{aligned} \mathit{Reach}^\sharp(P, 0) &\doteq \{s^\sharp_0\} \\ \mathit{Reach}^\sharp(P, m+1) &\doteq \bigcup \{\mathit{Post}^\sharp(\mathit{Reach}^\sharp(P, m), I_j) \mid 1 \le j \le n\} \\ \mathit{Reach}^\sharp(P) &\doteq \bigcup \{\mathit{Reach}^\sharp(P, m) \mid 0 \le m\} \end{aligned}$$

A program $P$ is Relational-safe if $\mathcal{E} \notin \mathit{Reach}^\sharp(P)$.

Next, we define the Imperative semantics as a state transition system. In this semantics, relation variables $\kappa$ range over tuples over $V$.

Imperative States.

In the Imperative semantics, each state $s$ is either the special error state $\mathcal{E}$ or a map from program variables to values such that every base variable is mapped to a value in $V$, and every relation variable of arity $n$ is mapped either to a tuple in $V^n$ or to the special undefined value $\bot$. Let $\Sigma$ denote the set of all Imperative-program states.

Initial State.

The initial state $s_0$ of an IMP program in the Imperative semantics is a map in which every base variable is mapped to a fixed value from $V$, and every relation variable is mapped to $\bot$.

Transition Relation.

The transition relation is defined using a $\mathit{Post}$ operator, which is identical to $\mathit{Post}^\sharp$ in the Relational semantics except for the tuple-get and tuple-set instructions. Figure 6 shows the operator $\mathit{Post}$ for get and set operations. Again, $\mathit{Post}$ is lifted to a set of states in the natural way. Notice that the program halts if a get instruction is executed with an undefined relation variable, or an $\mathtt{assume}(p)$ is executed in a state that does not satisfy $p$.

Safety.

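Let $P$ be the program $\mathtt{loop}\ \{I_1 \mathbin{[]} \ldots \mathbin{[]} I_n\}$. The set of Imperative-reachable states of $P$, denoted $\mathit{Reach}(P)$, is defined by induction as:

$$\begin{aligned} \mathit{Reach}(P, 0) &\doteq \{s_0\} \\ \mathit{Reach}(P, m+1) &\doteq \bigcup \{\mathit{Post}(\mathit{Reach}(P, m), I_j) \mid 1 \le j \le n\} \\ \mathit{Reach}(P) &\doteq \bigcup \{\mathit{Reach}(P, m) \mid 0 \le m\} \end{aligned}$$

A program $P$ is Imperative-safe if $\mathcal{E} \notin \mathit{Reach}(P)$.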
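The two semantics differ only in the tuple operations of Figure 6. The OCaml sketch below implements just those cases, assuming integer values and states stored as association lists, with a relation represented as a duplicate-free list of tuples standing in for a set; the record types and functions (`rget`, `rset`, `iget`, `iset`) are hypothetical names. It then replays two successive gets on the same relation variable, the situation in which the two semantics diverge.

```ocaml
(* Relational states: each relation variable holds a set of tuples. *)
type rstate = { rbase : (string * int) list; rrel : (string * int list list) list }

(* Imperative states: each relation variable holds one tuple or is undefined. *)
type istate = { ibase : (string * int) list; irel : (string * int list option) list }

let update k v l = (k, v) :: List.remove_assoc k l

(* Relational get: one successor per tuple in s(kappa), none if it is empty. *)
let rget (s : rstate) ts k : rstate list =
  let tuples = try List.assoc k s.rrel with Not_found -> [] in
  List.map
    (fun vs -> { s with rbase = List.fold_left2 (fun b t v -> update t v b) s.rbase ts vs })
    tuples

(* Relational set: add the current tuple to s(kappa). *)
let rset (s : rstate) k xs : rstate =
  let tup = List.map (fun x -> List.assoc x s.rbase) xs in
  let old = try List.assoc k s.rrel with Not_found -> [] in
  let nw = if List.mem tup old then old else tup :: old in
  { s with rrel = update k nw s.rrel }

(* Imperative get: the single stored tuple, or no successor if undefined. *)
let iget (s : istate) ts k : istate list =
  match (try List.assoc k s.irel with Not_found -> None) with
  | None -> []
  | Some vs -> [ { s with ibase = List.fold_left2 (fun b t v -> update t v b) s.ibase ts vs } ]

(* Imperative set: overwrite kappa with the current tuple. *)
let iset (s : istate) k xs : istate =
  let tup = List.map (fun x -> List.assoc x s.ibase) xs in
  { s with irel = update k (Some tup) s.irel }

(* Run (t) <- k; x <- t; (t) <- k and collect the final (x, t) pairs. *)
let two_gets_rel (s : rstate) =
  List.concat
    (List.map
       (fun s1 ->
         let s1 = { s1 with rbase = update "x" (List.assoc "t" s1.rbase) s1.rbase } in
         List.map (fun s2 -> (List.assoc "x" s2.rbase, List.assoc "t" s2.rbase)) (rget s1 [ "t" ] "k"))
       (rget s [ "t" ] "k"))

let two_gets_imp (s : istate) =
  List.concat
    (List.map
       (fun s1 ->
         let s1 = { s1 with ibase = update "x" (List.assoc "t" s1.ibase) s1.ibase } in
         List.map (fun s2 -> (List.assoc "x" s2.ibase, List.assoc "t" s2.ibase)) (iget s1 [ "t" ] "k"))
       (iget s [ "t" ] "k"))

let () =
  let rs = { rbase = [ ("t", 0) ]; rrel = [ ("k", [ [ 0 ]; [ 1 ] ]) ] } in
  let ist = { ibase = [ ("t", 0) ]; irel = [ ("k", Some [ 1 ]) ] } in
  List.iter (fun (x, y) -> Printf.printf "relational: x=%d y=%d\n" x y) (two_gets_rel rs);
  List.iter (fun (x, y) -> Printf.printf "imperative: x=%d y=%d\n" x y) (two_gets_imp ist)
```

Under the relational reading the two gets range independently over the stored set, so pairs with different components appear; under the imperative reading both gets observe the same stored tuple.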

5. From Type Constraints to IMP Programs

In this section we formalize the translation from type constraints into IMP programs and prove that the constraints are satisfiable if and only if the translated program is safe.

Refinement Type Translation

$$\begin{aligned}
[\![\{\nu : \tau \mid p\}]\!]_{get} &\doteq \nu \leftarrow \mathtt{nondet}();\ \mathtt{assume}(p) \\
[\![\{\nu : \tau \mid p\}]\!]_{set} &\doteq \mathtt{assert}(p) \\
[\![\{\nu : \tau \mid \kappa[y_1 \ldots y_n / x_1 \ldots x_n]\}]\!]_{get} &\doteq (t_0, \ldots, t_n) \leftarrow \kappa;\ \mathtt{assume}(y_1 = t_1); \ldots; \mathtt{assume}(y_n = t_n);\ \nu \leftarrow t_0 \\
[\![\{\nu : \tau \mid \kappa[y_1 \ldots y_n / x_1 \ldots x_n]\}]\!]_{set} &\doteq \kappa \leftarrow (\nu, y_1, \ldots, y_n)
\end{aligned}$$

Binding Translation

$$[\![x : T; G]\!] \doteq [\![T]\!]_{get};\ x \leftarrow \nu;\ [\![G]\!] \qquad [\![\emptyset]\!] \doteq \mathtt{skip}$$

Constraint Translation

$$[\![G \vdash T_1 <: T_2]\!] \doteq [\![G]\!];\ [\![T_1]\!]_{get};\ [\![T_2]\!]_{set}$$

Constraint Set Translation

$$[\![\{c_1, \ldots, c_n\}]\!] \doteq \mathtt{loop}\ \{[\![c_1]\!] \mathbin{[]} \ldots \mathbin{[]} [\![c_n]\!]\}$$

Figure 7. Translating Constraints to IMP Programs.

5.1 Translation

Figure 7 formalizes the translation from (a set of) refinement type constraints $C$ to an IMP program $[\![C]\!]$. We use the WF constraints to translate each relation variable $\kappa$ of arity $n+1$ into a corresponding tuple variable $\kappa$ of arity $n+1$.

The translation is syntax-driven. We translate each subtyping constraint $G \vdash T_1 <: T_2$ into a straight-line block of instructions with three parts: a sequence of instructions that establishes the environment bindings ($[\![G]\!]$), a sequence of instructions that "gets" the values corresponding to the LHS ($[\![T_1]\!]_{get}$), and a sequence of instructions that "sets" the (LHS) values into the appropriate RHS ($[\![T_2]\!]_{set}$). The translation for a set of constraints is an infinite loop that non-deterministically chooses among the blocks for each constraint.

Each environment binding gets translated as a "get". Bindings with unknown refinements are translated into tuple-get operations, followed by assume statements that establish the equalities corresponding to the pending substitutions. Bindings with known refinements are translated into non-deterministic assignments followed by an assume that enforces that the refinement holds on the non-deterministic value.

Each "set" operation to an unknown refinement is translated into a tuple-set instruction that writes the tuple corresponding to the pending substitutions into the translated tuple variable. Finally, each "set" operation corresponding to a known refinement is translated to an assert instruction; intuitively, in such constraints the RHS defines an upper bound on the set of values populating the type, and the assert serves to enforce the upper-bound requirement in the translated program.

The correctness of the procedure is stated by the following theorem.

Theorem 1. $C$ is satisfiable iff $[\![C]\!]$ is Relational-safe.

The proof of this theorem follows from the properties of the following function $\alpha$ that maps a set $\hat{\Sigma}^\sharp \subseteq \Sigma^\sharp$ of Relational-states to constraint solutions:

$$\alpha(\hat{\Sigma}^\sharp) \doteq \lambda\kappa.\ \bigcup \{s^\sharp(\kappa) \mid s^\sharp \in \hat{\Sigma}^\sharp\}$$

The function $\alpha$ enjoys the following property, which can be proven by induction on the construction of $\mathit{Reach}^\sharp$, that relates the satisfying solutions of the constraints to the Relational-reachable states of the translated program. Theorem 1 follows from the following observations. If $S$ satisfies $C$, then $\alpha(\mathit{Reach}^\sharp([\![C]\!]))(\kappa) \subseteq S(\kappa)$ for all $\kappa$. If $\mathcal{E} \notin \mathit{Reach}^\sharp([\![C]\!])$, then $\alpha(\mathit{Reach}^\sharp([\![C]\!]))$ satisfies $C$.

Common Operations

$$\begin{aligned}
\mathit{Post}^\sharp(\mathcal{E}, I) &\doteq \{\mathcal{E}\} \\
\mathit{Post}^\sharp(s^\sharp, I_1; I_2) &\doteq \mathit{Post}^\sharp(\mathit{Post}^\sharp(s^\sharp, I_1), I_2) \\
\mathit{Post}^\sharp(s^\sharp, x \leftarrow e) &\doteq \{s^\sharp[x \mapsto s^\sharp(e)]\} \\
\mathit{Post}^\sharp(s^\sharp, x \leftarrow \mathtt{nondet}()) &\doteq \{s^\sharp[x \mapsto c] \mid c \in V\} \\
\mathit{Post}^\sharp(s^\sharp, \mathtt{assume}(p)) &\doteq \begin{cases} \{s^\sharp\} & \text{if } s^\sharp(p) = \mathit{true} \\ \emptyset & \text{otherwise} \end{cases} \\
\mathit{Post}^\sharp(s^\sharp, \mathtt{assert}(p)) &\doteq \begin{cases} \{s^\sharp\} & \text{if } s^\sharp(p) = \mathit{true} \\ \{\mathcal{E}\} & \text{otherwise} \end{cases}
\end{aligned}$$

Tuple Operations: Relational Semantics

$$\begin{aligned}
\mathit{Post}^\sharp(s^\sharp, (t_1, \ldots, t_n) \leftarrow \kappa) &\doteq \{s^\sharp[t_1 \mapsto v_1]\ldots[t_n \mapsto v_n] \mid (v_1, \ldots, v_n) \in s^\sharp(\kappa)\} \\
\mathit{Post}^\sharp(s^\sharp, \kappa \leftarrow (x_1, \ldots, x_n)) &\doteq \{s^\sharp[\kappa \mapsto s^\sharp(\kappa) \cup \{(s^\sharp(x_1), \ldots, s^\sharp(x_n))\}]\}
\end{aligned}$$

Tuple Operations: Imperative Semantics

$$\begin{aligned}
\mathit{Post}(s, (t_1, \ldots, t_n) \leftarrow \kappa) &\doteq \begin{cases} \{s[t_1 \mapsto v_1]\ldots[t_n \mapsto v_n]\} & \text{if } s(\kappa) = (v_1, \ldots, v_n) \\ \emptyset & \text{if } s(\kappa) = \bot \end{cases} \\
\mathit{Post}(s, \kappa \leftarrow (x_1, \ldots, x_n)) &\doteq \{s[\kappa \mapsto (s(x_1), \ldots, s(x_n))]\}
\end{aligned}$$

Figure 6. Relational and Imperative Semantics: Other cases of $\mathit{Post}$ identical to $\mathit{Post}^\sharp$.

At this point, via Theorem 1, we have reduced checking satisfiability of type constraints to the problem of verifying assertions of IMP programs under the (non-standard) Relational semantics. Unfortunately, under these semantics, the program contains variables ($\kappa$) which range over sets of tuples. This makes it inconvenient to directly apply abstract-interpretation-based techniques for imperative programs, which typically assume the (standard) Imperative semantics; each technique would have to be painstakingly adapted to the non-standard semantics.

We would be home and dry if we could prove the equivalence of the Relational and Imperative semantics; that is, if we could show that an IMP program was Relational-safe if and only if it was Imperative-safe. Unfortunately, this is not true.

Example.

Consider the IMP program:

$$\mathtt{loop}\ \{\ \nu \leftarrow \mathtt{nondet}();\ \kappa \leftarrow (\nu)\ \mathbin{[]}\ (t) \leftarrow \kappa;\ \nu \leftarrow t;\ x \leftarrow \nu;\ (t) \leftarrow \kappa;\ \nu \leftarrow t;\ y \leftarrow \nu;\ \mathtt{assert}(x = y)\ \}$$

This program is not Relational-safe, as the set operation in the first instruction populates $\kappa$ with the set of all integers, and the get operations in the second instruction can assign different integer values to $x$ and $y$. However, the program is Imperative-safe, as whenever the second instruction executes, $\kappa$ will be undefined or contain some arbitrary integer that is assigned to both $x$ and $y$, which causes the assert to succeed.

This example pinpoints exactly why the two semantics differ. In the Relational semantics, in any given loop iteration, different gets on the same $\kappa$ can return different tuples, while in the Imperative semantics the gets are correlated and return the same tuple.

Read-Write-Once Programs.

An IMP instruction is a read-write-once instruction if any relation variable $\kappa$ is read from and written to at most once in the instruction. That is, read-write-once means at most one write and at most one read (and not at most one read or write). An IMP program is a read-write-once program if each instruction in its loop is a read-write-once instruction. We can show that for read-write-once IMP programs the Relational and Imperative semantics are equivalent.

Theorem 2. If $P$ is a read-write-once IMP program, then $P$ is Relational-safe iff $P$ is Imperative-safe.

To prove this theorem, we formalize the connection between the reachable states under the two different semantics, using the function $\mathit{Expand}$, which maps a Relational-state to a set of Imperative-states:

$$\mathit{Expand}(s^\sharp) \doteq \left\{ s \;\middle|\; \begin{array}{ll} s(x) = s^\sharp(x) & \text{for base variables } x \\ s(\kappa) = \langle v \rangle & \text{if } \langle v \rangle \in s^\sharp(\kappa) \\ s(\kappa) = \bot & \text{if } s^\sharp(\kappa) = \emptyset \\ s = \mathcal{E} & \text{if } s^\sharp = \mathcal{E} \end{array} \right\}$$

We lift the function to sets of Relational-states in the natural way:

$$\mathit{Expand}(\hat{\Sigma}^\sharp) \doteq \bigcup \{\mathit{Expand}(s^\sharp) \mid s^\sharp \in \hat{\Sigma}^\sharp\}$$

Next, we can show that read-write-once instructions enjoy the following property, by case splitting on the form of $I$.

Lemma 1 [Step]. If $I$ is a read-write-once instruction, then $\mathit{Expand}(\mathit{Post}^\sharp(s^\sharp, I)) = \mathit{Post}(\mathit{Expand}(s^\sharp), I)$.

We use this property to show that the reachable states under the different semantics are equivalent.

Lemma 2. If $P = \mathtt{loop}\ \{I_1 \mathbin{[]} \ldots \mathbin{[]} I_n\}$ is a read-write-once program, then $\mathit{Expand}(\mathit{Reach}^\sharp(P)) = \mathit{Reach}(P)$.

Proof. To prove that $\mathit{Reach}(P) \subseteq \mathit{Expand}(\mathit{Reach}^\sharp(P))$, we show $\forall m : \mathit{Reach}(P, m) \subseteq \mathit{Expand}(\mathit{Reach}^\sharp(P))$ by straightforward induction on $m$, noting that $s_0 \in \mathit{Expand}(s^\sharp_0)$, and $\mathit{Post}(\mathit{Expand}(s^\sharp), I) \subseteq \mathit{Expand}(\mathit{Post}^\sharp(s^\sharp, I))$ for any Relational-state $s^\sharp \in \Sigma^\sharp$, instruction $I$, and any program $P$ (not necessarily read-write-once).

To show inclusion in the other direction, we prove $\forall m : \mathit{Expand}(\mathit{Reach}^\sharp(P, m)) \subseteq \mathit{Reach}(P)$ by induction on $m$. For the base case, $\mathit{Expand}(\mathit{Reach}^\sharp(P, 0)) = \mathit{Reach}(P, 0) \subseteq \mathit{Reach}(P)$ by the definition of the initial states. By induction, assume that $\mathit{Expand}(\mathit{Reach}^\sharp(P, m)) \subseteq \mathit{Reach}(P)$. Let $s' \in \mathit{Expand}(\mathit{Reach}^\sharp(P, m+1))$. By Lemma 1, either $s'$ is already in $\mathit{Expand}(\mathit{Reach}^\sharp(P, m))$, in which case the inductive hypothesis applies and hence $s' \in \mathit{Reach}(P)$, or $s' \in \mathit{Post}(\mathit{Expand}(\mathit{Reach}^\sharp(P, m)), I_j)$ for some $j$. That is, there is an $s \in \mathit{Expand}(\mathit{Reach}^\sharp(P, m))$ such that $s' \in \mathit{Post}(s, I_j)$. From the induction hypothesis, $s \in \mathit{Reach}(P)$. As $\mathit{Reach}(P)$ is closed under $\mathit{Post}$, we conclude $s' \in \mathit{Reach}(P)$. $\square$

At this point, we have shown that the Imperative semantics of read-write-once programs are equivalent to the Relational semantics. All that remains is to show that the translation procedure of Figure 7 produces read-write-once programs. Unfortunately, this is not true.
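Because read-write-once is a purely syntactic condition, it can be checked by counting the gets and sets of each relation variable within an instruction. The OCaml sketch below assumes a simplified instruction type in which expressions and predicates are kept as opaque strings; all names are hypothetical. Applied to the IMP program of the earlier example in this section, it accepts the first instruction and rejects the second, which reads $\kappa$ twice.

```ocaml
(* IMP instructions with expressions and predicates kept as opaque text. *)
type instr =
  | Assign of string * string   (* x <- y *)
  | Havoc of string             (* x <- nondet () *)
  | Get of string list * string (* (t1, ..., tn) <- kappa *)
  | Set of string * string list (* kappa <- (x1, ..., xn) *)
  | Assume of string
  | Assert of string
  | Seq of instr * instr

(* Number of reads (gets) and writes (sets) of relation variable k in i. *)
let rec counts k = function
  | Get (_, k') -> if k' = k then (1, 0) else (0, 0)
  | Set (k', _) -> if k' = k then (0, 1) else (0, 0)
  | Seq (a, b) ->
    let (r1, w1) = counts k a and (r2, w2) = counts k b in
    (r1 + r2, w1 + w2)
  | Assign _ | Havoc _ | Assume _ | Assert _ -> (0, 0)

(* Relation variables mentioned anywhere in i. *)
let rec kappas = function
  | Get (_, k) | Set (k, _) -> [ k ]
  | Seq (a, b) -> kappas a @ kappas b
  | Assign _ | Havoc _ | Assume _ | Assert _ -> []

(* At most one read and at most one write of every relation variable. *)
let read_write_once_instr i =
  List.for_all (fun k -> let (r, w) = counts k i in r <= 1 && w <= 1) (kappas i)

let read_write_once_program blocks = List.for_all read_write_once_instr blocks

let seq = function
  | [] -> Assume "0 = 0"
  | i :: rest -> List.fold_left (fun acc j -> Seq (acc, j)) i rest

(* The two loop bodies of the example program, with "k" for kappa. *)
let example =
  [ seq [ Havoc "v"; Set ("k", [ "v" ]) ];
    seq
      [ Get ([ "t" ], "k"); Assign ("v", "t"); Assign ("x", "v");
        Get ([ "t" ], "k"); Assign ("v", "t"); Assign ("y", "v");
        Assert "x = y" ] ]
```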

Example.

Consider the following constraints:

$$\emptyset \vdash \{\kappa\}, \qquad \emptyset \vdash \{\mathit{true}\} <: \{\kappa\}, \qquad x : \kappa; y : \kappa \vdash \{\mathit{true}\} <: \{x = y\}$$

It is easy to check that on the above constraints, the translation procedure yields the IMP program from the previous example, which is not read-write-once. The reason the translated program is not a read-write-once program is that there can be constraints $G \vdash T_1 <: T_2$ in which $\kappa$ occurs in multiple places within $G$ and $T_1$.

To solve this problem, we can simply clone the $\kappa$ variables that occur multiple times inside a constraint, and use different clones at each occurrence! We formalize this as a procedure Clone that maps a finite set of constraints to another finite set. The procedure works as follows. For each $\kappa$ that is read up to $n$ times in some constraint, we make $n$ clones, $\kappa_1, \ldots, \kappa_n$, and

1. for the $i$-th occurrence of $\kappa$ within any constraint, we use the $i$-th clone $\kappa_i$ (instead of $\kappa$), and,
2. for each constraint where $\kappa$ appears on the right hand side, we make $n$ clones of the constraint, where in the $i$-th cloned constraint, we use $\kappa_i$ (instead of $\kappa$).

The first step ensures that each $\kappa$ is read-once in any constraint, and the second step ensures that the clones correspond to exactly the same set of tuples as the original variable $\kappa$. We can prove that Clone enjoys the following properties.

Theorem 3. Let $C$ be a finite set of constraints.
1. $[\![\mathit{Clone}(C)]\!]$ is a read-write-once program.
2. $\mathit{Clone}(C)$ is satisfiable iff $C$ is satisfiable.

It is easy to verify that $[\![\mathit{Clone}(C)]\!]$ is a read-write-once program. Furthermore, any satisfying solution for the original constraints can be mapped directly to a solution for the cloned constraints. To go in the other direction, we must map a solution that satisfies the cloned constraints to one that satisfies the original constraints. This is trivial if the solution for the cloned constraints maps each clone $\kappa_i$ to the same set of tuples. We show that if the cloned constraints have a satisfying solution, they have a solution that satisfies the above property. To this end, we prove the following lemma, which states that for any set of constraints, the satisfying solutions are closed under intersection.

Lemma 3. If $S_1$ and $S_2$ are solutions that satisfy $C$, then $S_1 \cap S_2 \doteq \lambda\kappa.\ S_1(\kappa) \cap S_2(\kappa)$ satisfies $C$.

Thus, if $S$ satisfies the cloned constraints, then by symmetry and Lemma 3 the solution that maps each cloned variable to $\bigcap_{i=1}^{n} S(\kappa_i)$ also satisfies the cloned constraints, and hence directly yields a solution to the original constraints.

Finally, as a corollary of Theorems 1, 2 and 3, we get our main result, which reduces the question of refinement type constraint satisfaction to that of safety verification.

Theorem 4. $C$ is satisfiable iff $[\![\mathit{Clone}(C)]\!]$ is Imperative-safe.

While we state Theorems 1 and 3 as preserving satisfiability, the proof shows how the solutions can be effectively mapped between $C$ and $[\![C]\!]$ (or $[\![\mathit{Clone}(C)]\!]$). In particular, while the intersection of two non-trivial solutions can be a trivial solution, it is guaranteed that in that case, the trivial solution satisfies $C$. Stated in terms of invariants, Lemma 3 captures the observation that there may be several non-comparable inductive invariants that prove a safety property, but in that case, the intersection of all the inductive invariants is also an inductive invariant.

Program | Time (sec) | Invariant | Refinement types
max | 0.091 | κ ≤ κ ∧ κ ≤ κ | κ_x ≐ true, κ_y ≐ true, κ ≐ x ≤ v ∧ y ≤ v
sum | 0.071 | 0 ≤ κ ∧ κ ≤ κ | κ_k ≐ true, κ ≐ 0 ≤ v ∧ k ≤ v
foldn | 0.060 | 0 ≤ κ_i ∧ 0 ≤ κ ∧ κ < κ | κ_i ≐ 0 ≤ v, κ ≐ 0 ≤ v ∧ v < n
arraymax | 0.135 | 0 ≤ κ ∧ 0 ≤ κ ∧ 0 ≤ κ ∧ κ_g < len(κ_g) | κ ≐ 0 ≤ v, κ ≐ 0 ≤ v, κ ≐ 0 ≤ v, κ_g ≐ v < len(a)
mask | 0.098 | κ < len(κ) ∧ κ ≤ κ ∧ 0 ≤ κ ∧ κ < len(κ) | κ ≐ v < len(xs) ∧ i ≤ v, κ ≐ 0 ≤ v ∧ v < len(a)
samples | 0.117 | 0 ≤ κ ∧ κ < len(κ) ∧ 0 ≤ κ ∧ κ < len(κ) ∧ 0 ≤ κ | κ ≐ 0 ≤ v ∧ v < len(b), κ ≐ 0 ≤ v ∧ v < len(a), κ ≐ 0 ≤ v

Table 1. Experimental evaluation using a predicate abstraction-based verification tool on examples from [29]. The third column presents the invariant for the translated program, and the last column the resulting refinement types.
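The Clone procedure is essentially a renaming pass. The OCaml sketch below follows the two steps described in Section 5 under simplifying assumptions: known refinements are opaque strings, pending substitutions and well-formedness constraints are omitted, and the clones of $\kappa$ are named by appending an index. The types and functions (`constr`, `rename_reads`, `clone`) are hypothetical. On the three-constraint example from Section 5 it produces three constraints in which each clone is read at most once.

```ocaml
(* Refinements: a known predicate (kept as text) or a refinement variable. *)
type refinement = Known of string | K of string

(* A subtyping constraint G |- T1 <: T2, keeping only the refinements. *)
type constr = { env : (string * refinement) list; lhs : refinement; rhs : refinement }

(* The reads of a constraint are the refinements in G and on the LHS. *)
let reads c = List.map snd c.env @ [ c.lhs ]

let occurrences k c = List.length (List.filter (fun r -> r = K k) (reads c))

(* n for kappa: its largest read count in any constraint (at least 1). *)
let clones_needed cs k = List.fold_left (fun m c -> max m (occurrences k c)) 1 cs

let clone_name k i = k ^ "_" ^ string_of_int i

(* Step 1: the i-th read of kappa inside a constraint uses clone kappa_i.
   List.map applies [rename] left to right, so reads are numbered in order. *)
let rename_reads c =
  let seen = Hashtbl.create 8 in
  let rename = function
    | Known p -> Known p
    | K k ->
      let i = 1 + (try Hashtbl.find seen k with Not_found -> 0) in
      Hashtbl.replace seen k i;
      K (clone_name k i)
  in
  let env = List.map (fun (x, r) -> (x, rename r)) c.env in
  let lhs = rename c.lhs in
  { c with env; lhs }

(* Step 2: a constraint whose RHS is kappa is copied once per clone. *)
let clone cs =
  List.concat
    (List.map
       (fun c ->
         match c.rhs with
         | Known _ -> [ c ]
         | K k ->
           List.init (clones_needed cs k) (fun i -> { c with rhs = K (clone_name k (i + 1)) }))
       (List.map rename_reads cs))

(* The subtyping constraints of the example, with "k" for kappa. *)
let example =
  [ { env = []; lhs = Known "true"; rhs = K "k" };
    { env = [ ("x", K "k"); ("y", K "k") ]; lhs = Known "true"; rhs = Known "x = y" } ]
```

Every constraint that writes $\kappa$ is duplicated for each clone, which is what lets all clones denote the same set of tuples, as the argument via Lemma 3 requires.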

6. Experiments

We have implemented a verification tool for OCaml programs based on RTI. We use the liquid types infrastructure implemented in DSOLVE [29] to generate refinement type constraints from OCaml programs. We use ARMC [28], a software model checker using predicate abstraction and interpolation-based refinement, as the verifier for the translated imperative program.

Table 1 shows the results of running our tool on a suite of small OCaml examples from [29]. For array-manipulating programs, the safety objective is to prove that array accesses are within bounds. For MAX, we prove that the output is larger than the input values. For SUM, we prove that the sum is larger than the largest summation term.

Table 2 presents the running time of our tool on the benchmark programs for the Depcegar verifier [31]. We observe that despite our black-box treatment of ARMC as a constraint solver, we obtain competitive running times compared to Depcegar on most of the examples (Depcegar uses a customized procedure for unfolding constraints and creating interpolation queries that yield refinement types).

Program | Time | Iterations | Predicates
boolflip.ml | 2.17s | 7 | 21
sum.ml | 0.24s | 5 | 14
sum-acm.ml | 0.11s | 1 | 3
sum-all.ml | 3.51s | 10 | 26
mult.ml | 4.67s | 10 | 25
mult-cps.ml | 780.24s | 11 | 27
mult-all.ml | 18.44s | 9 | 24
boolflip-e.ml | 0.65s | |
sum-e.ml | 0.01s | |
sum-acm-e.ml | 0.02s | |
sum-all-e.ml | 0.79s | |
mult-e.ml | 0.01s | |
mult-cps-e.ml | 7.69s | |
mult-all-e.ml | 144.93s | |

Table 2. Experimental evaluation of our tool on Depcegar benchmarks [31]. The third column presents the number of abstraction refinement iterations required by ARMC. The last column gives the number of predicates discovered by ARMC. For the programs with suffix "-e", which are incorrect, we omit the number of iterations and predicates and only show the time required by ARMC to find a counterexample.

Most of the predicates discovered by the interpolation-based abstraction refinement procedure implemented in ARMC fall into the fragment "two variables per inequality." The example MASK required a predicate that refers to three variables (see Table 1). While our initial experiments used a CEGAR-based tool, we expect optimized abstract interpreters for numerical domains to also work well for this class of properties.

7. Extensions and Related Work

The soundness of safety verification for higher-order programs for any domain follows from the soundness of constraint generation (e.g., Theorem 1 in [29]) and Theorem 4. Since the safety verification problem for higher-order programs is undecidable, the technique cannot be complete in general. Even in the finite-state case, in which each base type has a finite domain (e.g., booleans), completeness depends on the generation of type constraints. For example, in our examples and in our implementation, we have assumed a context-insensitive constraint generation from program syntax, i.e., we have not distinguished the types of the same function at different call points. This entails a loss of information, as the following example demonstrates. Consider

    let check f x y = assert (f x = y) in
    check (fun a -> a) false false;
    check (fun a -> not a) false true

where the built-in function assert has the type $\{\nu : \mathtt{bool} \mid \nu\} \rightarrow \mathtt{unit}$. The refinement template for check generated by our constraint generation process is

$$(x : \{\nu : \mathtt{bool} \mid \kappa_1\} \rightarrow \{\kappa_2\}) \rightarrow \{\kappa_3\} \rightarrow \{\kappa_4\} \rightarrow \mathtt{unit}$$

which is too weak to show that the program is safe. This is because the template "merges" the two call sites for check.

One way to get context sensitivity is through intersection types [12, 14, 20, 25]. For the above example, we can show type safety using the following refined type for check:

$$\begin{array}{ll} \bigwedge & (x : \mathtt{bool} \rightarrow \{\nu = x\}) \rightarrow \{\neg\nu\} \rightarrow \{\neg\nu\} \rightarrow \mathtt{unit} \\ & (x : \mathtt{bool} \rightarrow \{\nu = \neg x\}) \rightarrow \{\neg\nu\} \rightarrow \{\nu\} \rightarrow \mathtt{unit} \end{array}$$

It is important to note that Theorems 1 and 2 hold for any set of constraints. Thus, one way to get completeness in the finite-state case is to generate refinement templates using intersection types, perform the translation to IMP programs, and then use a complete invariant generation technique for finite-state systems. The key observation (made in [20]) that ensures a finite number of constraints is that there is at most a finite number of "contexts" in the finite-state case, and hence a finite number of terms in the intersection types.
The bad news is that the bound on the number of contexts is $\exp_n(k)$, where $n$ is the highest order of any function in the program, $k$ is the maximum arity of any function in the program, and $\exp_n(k)$ is a stack of $n$ exponentials, defined by $\exp_0(k) = k$ and $\exp_{n+1}(k) = 2^{\exp_n(k)}$. Fully context-sensitive constraints are used in [20] to show completeness in the finite case, at the price of $\exp_n(k)$ in every case, not just the worst case. In our exposition and our implementation, we have traded off precision for scalability: while we lose precision by generating context-insensitive constraints, we avoid the $\exp_n$ blow-up that comes with full context sensitivity. However, it has been shown through practical benchmarks that, since the types themselves capture relations between the inputs and outputs, context-insensitive constraint generation suffices to prove a variety of complex programs safe [3, 18, 29].

When considering completeness properties in special cases, we point out completeness with respect to the discovery of refinement predicates in octagon/difference-bound abstract domains [24], template-based invariant generation for linear arithmetic [7], and its extensions with uninterpreted function symbols [5], which carries over from the respective verification approaches.

Kobayashi [20, 21] gives an algorithm for model checking arbitrary $\mu$-calculus properties of finite-data programs with higher-order functions by a reduction to model checking for higher-order recursion schemes (HORS) [26]. For safety verification, RTI offers a promising alternative. First, the reduction to HORS critically depends on a finite-state abstraction of the data. In contrast, our reduction defers the data abstraction to the abstract interpreter working on the imperative program, thus enabling the direct application of abstract interpreters working over infinite domains.
Since abstract interpreters over infinite abstract domains are strictly more powerful than (infinite families of) finite ones [8], our approach can be strictly more powerful for infinite-state programs. Second, in the translation of an abstracted program to a HORS, this algorithm eliminates Boolean variables by enumerating all possible assignments to them, giving an exponential blow-up from the program to the HORS. In contrast, our technique preserves the Boolean state symbolically, enabling the use of efficient symbolic algorithms for verification. For example, consider the simple program

    let f b1 ... bn x =
      if (b1 || ... || bn) then lock x;
      if (b1 || ... || bn) then unlock x
    in f (*) ... (*) (newlock ())

where we wish to prove that lock and unlock alternate. Kobayashi's translation [20] gives an exponential-sized HORS, with a version of f for each assignment to b1, ..., bn. In contrast, our reduction preserves the source-level expressions, is linear, and is amenable to symbolic verification techniques (e.g., BDDs). Previous experience with software model checking [2, 16, 17] shows that the number of reachable states is often drastically smaller than $2^p$, where $p$ is the number of Booleans. Thus, the pre-processing step that enumerates Booleans may not lead to a scalable implementation.

Might [23] describes logic-flow analysis, a general safety verification algorithm for higher-order languages, which is the product of a k-CFA-like call-strings analysis and a form of SMT-based predicate abstraction (together with widening). In contrast, our work shows how higher-order languages can be analyzed directly via abstract analyses designed for first-order imperative languages.

Inference of refinement types using counterexample-guided techniques was recently identified as a promising direction [31, 32]. In contrast, our approach is not limited to CEGAR and makes a wide range of abstract interpretation techniques applicable for precise reasoning about program data.

Software Verification.

This work was motivated by the recent success in software model checking for first-order imperative programs [2, 6, 16, 22], and the desire to apply similar techniques to modern programming languages with higher-order functions. Our starting point was refinement types [14, 19], implemented in Dependent ML [33] to give strong static guarantees, and the work on liquid types [18, 29] that applied predicate abstraction to infer refinement types. By enabling the application of automatic invariant generation from software model checking, RTI reduces the need for programmer annotations in refinement type systems.

References

[1] M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Lévy. Explicit substitutions. Journal of Functional Programming, 1:375–416, 1991.
[2] T. Ball and S.K. Rajamani. The SLAM project: debugging system software via static analysis. In POPL, pages 1–3. ACM, 2002.
[3] J. Bengtson, K. Bhargavan, C. Fournet, A.D. Gordon, and S. Maffeis. Refinement types for secure implementations. In CSF, 2008.
[4] D. Beyer, T. Henzinger, R. Majumdar, and A. Rybalchenko. Invariant synthesis in combination theories. In VMCAI, 2007.
[5] D. Beyer, T.A. Henzinger, R. Majumdar, and A. Rybalchenko. Invariant synthesis for combined theories. In VMCAI. Springer, 2007.
[6] B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival. A static analyzer for large safety-critical software. In PLDI, pages 196–207, 2003.
[7] M. Colón, S. Sankaranarayanan, and H. Sipma. Linear invariant generation using non-linear constraint solving. In CAV. Springer, 2003.
[8] P. Cousot and R. Cousot. Comparing the Galois connection and widening/narrowing approaches to abstract interpretation. In PLILP, LNCS 631, pages 269–295. Springer-Verlag, 1992.
[9] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In POPL. ACM Press, 1978.
[10] S. Cui, K. Donnelly, and H. Xi. ATS: A language that combines programming with theorem proving. In FroCoS, 2005.
[11] L. Damas and R. Milner. Principal type-schemes for functional programs. In POPL, 1982.
[12] J. Dunfield. A Unified System of Type Refinements. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2007.
[13] C. Flanagan. Hybrid type checking. In POPL. ACM, 2006.
[14] T. Freeman and F. Pfenning. Refinement types for ML. In PLDI, 1991.
[15] S. Gulwani and G.C. Necula. Discovering affine equalities using random interpretation. In POPL, pages 74–84, 2003.
[16] T.A. Henzinger, R. Jhala, R. Majumdar, and K.L. McMillan. Abstractions from proofs. In POPL. ACM, 2004.
[17] H. Jain, F. Ivancic, A. Gupta, I. Shlyakhter, and C. Wang. Using statically computed invariants inside the predicate abstraction and refinement loop. In CAV, pages 137–151, 2006.
[18] M. Kawaguchi, P. Rondon, and R. Jhala. Type-based data structure verification. In PLDI, pages 304–315, 2009.
[19] K. Knowles and C. Flanagan. Type reconstruction for general refinement types. In ESOP, 2007.
[20] N. Kobayashi. Types and higher-order recursion schemes for verification of higher-order programs. In POPL, 2009.
[21] N. Kobayashi and C.-H.L. Ong. A type system equivalent to modal µ-calculus model checking of recursion schemes. In LICS, 2009.
[22] K.L. McMillan. Lazy abstraction with interpolants. In CAV, 2006.
[23] M. Might. Logic-flow analysis of higher-order programs. In POPL, pages 185–198, 2007.
[24] A. Miné. The octagon abstract domain. Higher-Order and Symbolic Computation, 19(1):31–100, 2006.
[25] M. Naik and J. Palsberg. A type system equivalent to a model checker. ACM Trans. Program. Lang. Syst., 30(5), 2008.
[26] C.-H.L. Ong. On model-checking trees generated by higher-order recursion schemes. In LICS, 2006.
[27] X. Ou, G. Tan, Y. Mandelbaum, and D. Walker. Dynamic typing with dependent types. In IFIP TCS, pages 437–450, 2004.
[28] A. Podelski and A. Rybalchenko. ARMC: The logical choice for software model checking with abstraction refinement. In PADL, 2007.
[29] P. Rondon, M. Kawaguchi, and R. Jhala. Liquid types. In PLDI, 2008.
[30] S. Sankaranarayanan, H.B. Sipma, and Z. Manna. Scalable analysis of linear systems using mathematical programming. In

VMCAI , 2005.[31] Tachio Terauchi. Dependent types from counterexamples. In

POPL .ACM, 2010.[32] Hiroshi Unno and Naoki Kobayashi. Dependent type inference withinterpolants. In

PPDP . ACM, 2009.[33] H. Xi and F. Pfenning. Dependent types in practical programming. In