HMC: Verifying Functional Programs Using Abstract Interpreters
Refinement Type Inference via Abstract Interpretation
Ranjit Jhala
UCSD
Rupak Majumdar
UCLA
Andrey Rybalchenko
Abstract
Refinement types are a promising approach for checking behavioral properties of programs written using advanced language features like higher-order functions, parametric polymorphism, and recursive datatypes. The main limitation of refinement type systems to date is the requirement that the programmer provide the types of all functions, after which the type system can check the types and hence verify the program.
In this paper, we show how to automatically infer refinement types, using existing abstract interpretation tools for imperative programs. In particular, we demonstrate that the problem of refinement type inference can be reduced to that of computing invariants of simple, first-order imperative programs without recursive datatypes. As a result, our reduction shows that any of the wide variety of abstract interpretation techniques developed for imperative programs, such as polyhedra, counterexample-guided predicate abstraction and refinement, or Craig interpolation, can be directly applied to verify behavioral properties of modern software in a fully automatic manner.
1. Introduction
Automatic verification of semantic properties of modern programming languages is an important step toward reliable software systems. For higher-order programming languages with inductive datatypes or polymorphic instantiation, the main verification tool has been type systems, which traditionally capture only coarse data-type properties (such as: ints are only added to ints), and require the programmer to explicitly annotate program invariants if more precise invariants about program computations are required. For example, refinement type systems [33] associate data types with refinement predicates that capture richer properties of program computation. Using refinement types, one can state, for instance, that a program variable xs has the refinement type "non-zero integer," or that the integer division function has the refinement type int → {ν : int | ν ≠ 0} → int, which states that the second argument must be non-zero. Then, if a program with refinement types type-checks, one can assert that there is no division-by-zero error in the program. The idea of using refinement types to express precise program invariants is well known [3, 10, 12, 13, 27, 33]. However, in each of the above systems, the programmer must provide refinements for each program type, and the type system checks the provided type refinements for consistency. We believe that this burden of annotations has limited the widespread adoption of refinement type systems.
For imperative programming languages, algorithms based on abstract interpretation can be used to automatically infer many program invariants [2, 6, 16], thereby proving many semantic properties of practical interest. However, these tools do not precisely model modern programming features such as closures, higher-order functions, or inductive datatypes, and so in practice they are too imprecise when applied to higher-order programs.
Figure 1. RTI algorithm. (Flow diagram: an OCaml program with assertions goes through constraint generation, producing subtyping constraints; the RTI translation turns these into a simple IMP program; abstract interpretation then reports Safe or Unsafe.)
In this paper, we present an algorithm to automatically verify properties of higher-order programs through refinement type inference (RTI) by combining refinement type systems for higher-order programs with invariant synthesis techniques for first-order programs. Our main technical contribution is a translation from type constraints derived from a refinement type system for higher-order programs to a first-order imperative program with assertions, such that the assertions hold in the first-order program iff there is a refinement typing that makes the higher-order program type-check. Moreover, a suitable type refinement for the higher-order program can be constructed from the invariants of the first-order program. Our algorithm therefore replaces the manual annotation burden for refinement types with automatically constructed program invariants on the translated program, enabling fully automatic verification of programs written in modern languages.
The RTI algorithm (Figure 1) proceeds in three steps.
Step 1: Type-Constraint Generation.
First, it performs Hindley-Milner type inference [11] to construct ML types for the program, and uses these types to generate refinement templates, i.e., types in which refinement variables κ are used to represent the unknown refinement predicates. Then, the algorithm uses a standard syntax-directed procedure to generate subtyping constraints over the templates such that the program type-checks (i.e., is safe) if the subtyping constraints are satisfiable [3, 19, 29, 33].
Step 2: Translation.
Second, it translates the set of type constraints to a first-order, imperative program over base values such that the type constraints are satisfiable if and only if the imperative program does not violate any assertions.
Step 3: Abstract Interpretation.
Finally, an abstract interpretation technique for first-order imperative programs is used to prove that the first-order program is safe. The proof of safety produced by this analysis automatically translates to solutions for the refinement type variables, thus generating refinement types for the original ML program.
The main contribution of this paper is the RTI translation algorithm. The advantage of the translation is that it allows one to apply any of the well-developed semantic imperative program analyses based on abstract interpretation (e.g., polyhedra [9] and octagons [6], counterexample-guided predicate abstraction refinement (CEGAR) [2, 16], Craig interpolation [16, 22], constraint-based invariant generation [4, 30], random interpretation [15], etc.) to the verification of modern software with polymorphism, inductive datatypes, and higher-order functions. Instead of painstakingly reworking each semantic analysis for imperative programs to the higher-order setting, possibly re-implementing them in the process, one can use our translation and apply any existing analysis as is. In fact, using the translation, our implementation directly uses a CEGAR and interpolation based safety verification tool to verify properties of OCaml programs.
In essence, our algorithm separates syntactic reasoning about function calls and inductive data types (handled well by typing constraints) from semantic reasoning about data invariants (handled well by abstract domains). The translation from refinement type constraints to imperative programs in Step 2 is the key enabler. The translation, and the proof that the satisfiability of type constraints and safety of the translated program are equivalent, are based on the following observations.
The first observation is that refinement type variables κ define relations over the value being defined by the refinement type and the finitely many variables that are in scope at the point where the type is defined. In the imperative program, each finite-arity relation can be encoded with a variable that encodes a relation. Each refinement type constraint can be encoded as a straight-line sequence that reads tuples from and writes tuples to the relation variables, and the set of constraints can be encoded as a non-terminating while-loop that, in each iteration, non-deterministically executes one of the blocks. Thus, the problem of determining the existence of appropriate relations reduces to that of computing (overapproximations of) the set of tuples in each relation variable in the translated program (Theorem 1).
Our second observation is that if the translated program is in a special read-write-once form, where within each straight-line block a relation variable is read and written at most once, then one can replace all relation-valued variables with variables whose values range over tuples (Theorem 2). Moreover, we prove that we can, without affecting satisfiability, preprocess the refinement typing constraints so that the translated program is a read-write-once program (Theorem 3). Together, the observations yield a simple and direct translation from refinement type inference to simple imperative programs.
We have instantiated our algorithm in a verification tool for OCaml programs. Our implementation generates refinement type constraints using the algorithm of [29], and uses the ARMC [28] software model checker to verify the translated programs. This allows fully automatic verification of a set of OCaml benchmarks for which previous approaches required either manual annotations (the refinement types [33] or their constituent predicates [29]), or an elaborate customization and adaptation of the counterexample-guided abstraction refinement paradigm [31]. Thus, we show, for the first time, how abstract interpretation can be lifted "as-is" to practical refinement type inference for modern, higher-order languages.
While we have focused on the verification of functional programs, our approach is language independent, and requires only an appropriate refinement type system for the source language.

    let rec iteri i xs f =
      match xs with
      | [] -> ()
      | x::xs' -> f i x;
                  iteri (i+1) xs' f

    let mask a xs =
      let g j y = a.(j) <- y && a.(j) in
      if Array.length a = List.length xs then
        iteri 0 xs g

Figure 2. ML Example
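To make the behaviour of the program in Figure 2 concrete, the following is a minimal runnable transcription. The two function definitions follow the figure; the driver at the bottom (the sample array and list) is a hypothetical addition for illustration and does not appear in the paper.

```ocaml
(* Figure 2, transcribed: an indexed list iterator and a client that
   masks a boolean array with a boolean list. *)
let rec iteri i xs f =
  match xs with
  | [] -> ()
  | x :: xs' ->
    f i x;
    iteri (i + 1) xs' f

let mask a xs =
  (* g overwrites a.(j) with the conjunction of y and the old a.(j). *)
  let g j y = a.(j) <- y && a.(j) in
  if Array.length a = List.length xs then iteri 0 xs g

(* Hypothetical driver (an assumption, not from the paper). *)
let () =
  let a = [| true; false; true |] in
  mask a [ true; true; false ];
  Array.iter (fun b -> print_string (string_of_bool b ^ " ")) a;
  print_newline ()
```

With these inputs the lengths match, so g is called with j = 0, 1, 2 and the array ends up as [| true; false; false |]. If the lengths differ, mask leaves the array untouched, which is why the length guard matters for the bounds property discussed next.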
2. Overview
We begin with an example that illustrates how our refinement type inference (RTI) algorithm combines type constraints and abstract interpretation to automatically verify safety properties of functional ML programs with higher-order functions and recursive structures. We show that the combination of syntactic type constraints and semantic abstract interpretation enables the automatic verification of properties that are currently beyond the scope of either technique in isolation.
An ML Example.
Figure 2 shows a simple ML program that updates an array a using the elements of the list xs. The program comprises two functions. The first is a higher-order indexed list iterator, iteri, that takes as arguments a starting index i, a (polymorphic) list xs, and an iteration function f. The iterator goes over the elements of the list and invokes f on each element and the index corresponding to the element's position in the list. The second is a client, mask, of the iterator iteri that takes as input a boolean array a and a list of boolean values xs, and, if the lengths match, calls the indexed iterator with an iteration function g that masks the j-th element of the array.
Suppose that we wish to statically verify the safety of the array reads and writes in function g; that is, to prove that whenever g is invoked, 0 ≤ j < len(a). As this example combines higher-order functions, recursion, data structures, and arithmetic constraints on array indices, it is difficult to analyze automatically using either existing type systems or abstract interpretation implementations in isolation. The former do not precisely handle arithmetic on indices, and the latter do not precisely handle higher-order functions and are often imprecise on data structures. We show how our RTI technique can automatically prove the correctness of this program.
Refinement Types.
To verify the program, we compute program invariants that are expressed as refinements of ML types with predicates over program values [3, 19, 29]. The predicates are additional constraints that must be satisfied by every value of the type. A base value, say of type int, can be described by the refinement type {ν : int | p}, where ν is a special value variable representing the value being described, and p is a refinement predicate which constrains the range of ν to a subset of the integers. For example, the type {ν : int | 0 ≤ ν < len(a)} denotes the set of integers c that are between 0 and the value of the expression len(a). Thus, the unrefined type int abbreviates {ν : int | true}, which does not constrain the set of integers. Base types can be combined to construct dependent function types, written x : T_1 → T_2, where T_1 is the type of the domain, T_2 the type of the range, and where the name x for the formal parameter can appear in the refinement predicates in T_2. For example, the type
    x : {ν : int | ν ≥ 0} → {ν : int | ν = x + 1}
is the type of a function which takes a non-negative integer parameter and returns an output which is one more than the input. In the following, we write τ for the type {ν : τ | true}. When ν and τ are clear from the context, we write {p} for {ν : τ | p}.
Safety Specification.
Refinement types can be used to specify safety properties by encoding pre-conditions into primitive operations of the language. For example, consider the array read a.(j) (resp. write a.(j) ← e) in g, which is an abbreviation for get a j (resp. set a j e). By giving get and set the refinement types
    a : α array → {ν : int | 0 ≤ ν < len(a)} → α,
    a : α array → {ν : int | 0 ≤ ν < len(a)} → α → unit,
we can specify that in any program the array accesses must be within bounds. More generally, arbitrary safety properties can be specified by giving assert the appropriate refinement type [29].
Safety Verification.
The ML type system is too imprecise to prove the safety of the array accesses in our example, as it infers that g has type j : int → y : bool → unit, i.e., that g can be called with any integer j. If the programmer manually provides the refinement types for all functions and polymorphic type instantiations, refinement-type checking [3, 12, 33] can be used to verify that the provided types are consistent and strong enough to prove safety. This is analogous to providing pre- and post-conditions and loop invariants for verifying imperative programs. For our example, the refinement type system could check the program if the programmer provided the types:
    iteri :: i : int → xs : {ν : α list | 0 ≤ len(ν)} → (j : {i ≤ ν < len(xs)} → α → unit) → unit
    g :: j : {0 ≤ ν < len(a)} → bool → unit
Here, we omitted refinement predicates that are equal to true, e.g., for i in the type of iteri.
Automatic Verification via RTI.
As even this simple example illustrates, the type annotation burden for verification is extremely high. Instead, we would like to verify the program without requiring the programmer to provide every refinement type. The RTI algorithm proceeds in three steps. First, we syntactically analyze the source program to generate subtyping constraints over refinement templates. Second, we translate the constraints into an equivalent simple imperative target program. Third, we semantically analyze the target program to determine whether it is safe, from which we conclude that the constraints are satisfiable and hence that the source program is safe. Next, we illustrate these steps using Figure 2 as the source program.
In the first step, we generate a system of refinement type constraints for the source program [19, 29]. To do so, we (a) build templates that refine the ML types with refinement variables that stand for the unknown refinements, and (b) make a syntax-directed pass over the program to generate subtyping constraints that capture the flow of values. For the functions iteri and g from Figure 2, with the respective ML types
    i : int → xs : α list → (j : int → α → unit) → unit
    j : int → bool → unit
we would generate the respective templates
    i : int → xs : {0 ≤ len(ν)} → (j : {κ_1} → α → unit) → unit
    j : {κ_2} → bool → unit
Notice that these templates simply refine the ML types with refinement variables κ_1, κ_2 that stand for the unknown refinements. For clarity of exposition, we have added the refinement true for some variables (e.g., for the type α and bool); our system would automatically infer the unknown refinements. We model the length of lists (resp. arrays) with an uninterpreted function len from lists (resp. arrays) to integers, and (again, for brevity) add the refinement stating that xs has a non-negative length in the type of iteri.
After creating the templates, we make a syntax-directed pass over the program to generate constraints that capture relationships between refinement variables. There are two kinds of type constraints: well-formedness and subtyping.
Well-formedness Constraints capture scoping rules, and ensure that the refinement predicate for a type can only refer to variables that are in scope. Our example has two constraints:
    i : int; xs : α list ⊢ {ν : int | κ_1}    (w1)
    a : bool array; xs : α list ⊢ {ν : int | κ_2}    (w2)
The first constraint states that κ_1, which represents the unknown refinement for the first parameter passed to the function argument f of the higher-order iterator iteri, can only refer to the two program variables that are in scope at that point, namely i and xs.
Similarly, the second constraint states that κ_2, which refines the first argument of g, can only refer to a and xs, which are in scope where g is defined.
Subtyping Constraints reduce the flow of values within the program into subtyping relationships that must hold between the source and target of the flow. Each constraint is of the form
    G ⊢ T_1 <: T_2
where G is an environment comprising a sequence of type bindings, and T_1 and T_2 are refinement templates. The constraint intuitively states that under the environment G, the type T_1 must be a subtype of T_2. The subtyping constraints are generated syntactically from the code. First consider the function iteri. The call to f generates
    G ⊢ {ν = i} <: {κ_1}    (c1)
where the environment G comprises the bindings
    G ≐ i : {true}; xs : {0 ≤ len(ν)}; x : {true}; xs' : {0 ≤ len(ν) = len(xs) − 1}
The constraint ensures that at the callsite, the type of the actual is a subtype of the formal. The bindings in the environment are simply the refinement templates for the variables in scope at the point the value flow occurs. The type system yields the information that the length of xs' is one less than that of xs, as the former is the tail of the latter [18, 33]. Similarly, the recursive call to iteri generates
    G ⊢ (j : {κ_1} → α → unit) <: (j : {κ_1} → α → unit)[i + 1/i][xs'/xs]
which states that the type of the actual f is a subtype of the third formal parameter of iteri after applying the substitutions [i + 1/i] and [xs'/xs] that capture the passing in of the actuals i + 1 and xs' for the first two parameters respectively. By pushing the substitutions inside and applying the standard (contravariant) rule for function subtyping, this constraint simplifies to
    G ⊢ {κ_1[i + 1/i][xs'/xs]} <: {κ_1}    (c2)
Next, consider the function mask. The array accesses inside g generate the "bounds-check" constraint
    G'; j : {κ_2}; y : {true} ⊢ {ν = j} <: {0 ≤ ν < len(a)}    (c3)
where G' ≐ a : bool array; xs : {0 ≤ len(ν)} has bindings for the other variables in scope. Finally, the flow due to the third parameter for the call to iteri yields
    G'; len(a) = len(xs) ⊢ (j : {κ_2} → τ) <: (j : {κ_1} → τ)[0/i]
where for brevity we write τ for bool → unit, and omit the trivial substitution [xs/xs] due to the second parameter. The last conjunct in the environment captures the guard of the if under which the call occurs. By pushing the substitutions inside and applying standard function subtyping, the above reduces to
    G'; len(a) = len(xs) ⊢ {κ_1[0/i]} <: {κ_2}    (c4)
For brevity we omit trivial constraints like · ⊢ int <: int. If the set of constraints constructed above is satisfiable, then there is a valid refinement typing of the program [29], and hence the program is safe.
Determining the satisfiability of the constraints requires semantic analysis of program computations. In the second step, our key technical contribution, we show a translation that reduces the constraint satisfiability problem to checking the safety of a simple, imperative program. Our translation is based on two observations.
Refinements are Relations.
The first observation is that type refinements are defined through relations: the set of values denoted by a refinement type {ν : τ | p}, where p refers to the program variables x_1, …, x_n of the respective base types τ_1, …, τ_n, is equivalent to the set
    {t_0 | ∃(t_1, …, t_n) s.t. (t_0, t_1, …, t_n) ∈ R_p ∧ t_1 = x_1 ∧ … ∧ t_n = x_n}
where R_p is an (n + 1)-ary relation in τ × τ_1 × … × τ_n defined by p. For example, the set of values denoted by {ν : int | ν ≤ i} is equivalent to the set
    {t_0 | ∃ t_1 s.t. (t_0, t_1) ∈ R_≤ ∧ t_1 = i},
where R_≤ is the standard ≤-ordering relation over the integers. In other words, each refinement variable κ can be seen as the projection on the first co-ordinate of an (n + 1)-ary relation over the variables (ν, x_1, …, x_n), where x_1, …, x_n are the variables in the well-formedness constraint for κ (i.e., the variables in scope of κ). Thus, the problem of determining the satisfiability of the constraints is analogous to the problem of determining the existence of appropriate relations.
Relations are Records.
The second observation is that the problem of finding appropriate relations can be reduced to the problem of analyzing a simple imperative program with variables ranging over relations. In the imperative program, each refinement variable, standing for an n-ary relation, is translated into a record variable with n fields. Each subtyping constraint can be translated into a block of reads from and writes to the corresponding records. The set of all tuples that can be written into a given record on some execution of the program defines the corresponding relation. The entire program is an infinite loop, which in each iteration non-deterministically chooses a block of reads and writes defined by a constraint.
The arity of a relation, and hence the number of fields of the corresponding record, is determined by the well-formedness constraints. For example, the constraint (w1) specifies that κ_1 corresponds to a ternary relation, that is, a set of triples where the 0th element (corresponding to ν) is an integer, the 1st element (corresponding to i) is an integer, and the 2nd element (corresponding to xs) is a list. We encode this in the imperative program via a record variable κ_1 with three fields κ_1.0, κ_1.1 and κ_1.2.
Figure 3 shows the imperative program translated from the constraints for our running example. We use the subtyping constraints to define the flow of tuples into records. For example, consider the constraint (c2), which is translated to the block marked /*c2*/. Each variable in the type environment is translated to a corresponding variable in the program. The block has a sequence of assignments that define the environment variables. For example, we know i has type int, so there is an assignment of an arbitrary integer to i. When there is a known refinement in the binding, the non-deterministic assignment is followed by an assume operation (a conditional) that establishes that the value assigned satisfies the given refinement. For example, xs gets assigned an arbitrary value, but then the assume establishes the fact that the length of xs is non-negative. Similarly, xs' gets assigned an arbitrary value that has non-negative length and whose length is 1 less than that of xs. The LHS of (c2) reads a tuple from κ_1 whose first and second fields are assumed to equal i + 1 and xs' respectively. Finally, the triple (ν, i, xs) is written into the record κ_1, which is the RHS of (c2).
Next, consider the translated block for the bounds-check constraint (c3). Here, the translation is as before, but the RHS is a known refinement predicate (one that stipulates that the integer be within bounds). In this case, instead of writing into the record that defines the RHS, the translation contains an assertion over the corresponding variables that ensures that the refinement predicate holds.
Relational vs. Imperative Semantics.
There is a direct correspondence between the refinement relations and the record variables when the translated program is interpreted under a relational semantics, where (1) the records range over (initially empty) sets of tuples, (2) each write adds a new tuple to the record's set, and (3) each read non-deterministically selects some tuple from the record's set. Under these semantics, we can show that the constraints are satisfiable iff the imperative program is safe (i.e., no assert fails on any execution) (Theorem 1).
Unfortunately, these semantics preclude the direct application of mature invariant generation and safety verification techniques, e.g., those based on abstract interpretation or CEGAR-based software model checking, as those techniques do not deal well with set-valued variables. We would like to have an imperative semantics where each record contains a single value, the last tuple written to it. We show that there is a syntactic subclass of programs for which the two semantics coincide. That is, a program in the subclass is safe under the imperative semantics if and only if it is safe under the set-based semantics (Theorem 2). Furthermore, we show a technique that ensures that the translated program belongs to the subclass (Theorem 3).
The attractiveness of the translation is that the resulting programs fall in a particularly pleasant subclass of programs which do not have any of the advanced language features, like higher-order functions, polymorphism, and recursive data structures, or variables over complex types such as sets, that are the bane of semantic analyses. Thus, the translation yields simple imperative programs to which a wide variety of semantic analyses directly apply.
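The difference between the two semantics of a record variable can be sketched in a few lines of OCaml. This is only an illustration under simplifying assumptions: tuples are lists of integers, the non-deterministic read is modelled by returning every candidate tuple, and all names (rel_write, imp_read, and so on) are hypothetical rather than taken from the paper or its implementation.

```ocaml
(* A tuple of base values; integers only, to keep the sketch small. *)
type tuple = int list

(* Relational semantics: a record variable denotes the set of all tuples
   written so far (represented here as a duplicate-free list). *)
let rel_write (store : tuple list) (t : tuple) : tuple list =
  if List.mem t store then store else t :: store

(* A relational read may yield any stored tuple, so we return all candidates. *)
let rel_read (store : tuple list) : tuple list = store

(* Imperative semantics: a record variable holds only the last tuple written. *)
let imp_write (_ : tuple option) (t : tuple) : tuple option = Some t

(* An imperative read has at most one candidate: the last write, if any. *)
let imp_read (cell : tuple option) : tuple list =
  match cell with
  | None -> []
  | Some t -> [ t ]

(* The same two writes performed under each semantics. *)
let writes = [ [ 0; 0 ]; [ 1; 1 ] ]
let rel_final = List.fold_left rel_write [] writes
let imp_final = List.fold_left imp_write None writes
```

After the two writes, a relational read can observe either tuple, whereas an imperative read observes only [1; 1]. In general the two semantics therefore differ; the claim of Theorem 2 is that for read-write-once programs this difference does not affect safety.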
Together these results imply that we can run off-the-shelf abstract interpretation and invariant generation tools on the translated program, and use the result of the analysis to determine whether the original ML program is typable.
For the translated program shown in Figure 3, the CEGAR-based software model checker ARMC [28] finds that the assertion is never violated, and computes the invariants:
    κ_1.1 ≤ κ_1.0 ∧ κ_1.0 < len(κ_1.2)
    0 ≤ κ_2.0 < len(κ_2.1)
which, when plugging in ν, i and xs for the 0th, 1st, and 2nd fields of κ_1, and ν, a for the 0th and 1st fields of κ_2 respectively, yields the refinements
    κ_1 ≐ i ≤ ν < len(xs)
    κ_2 ≐ 0 ≤ ν < len(a)
which suffice to typecheck the original ML program. Indeed, these predicates for κ_1 and κ_2 are easily shown to satisfy the constraints (c1), (c2), (c3), and (c4).

    loop {
         /* c1 */ i ← nondet(); xs ← nondet(); assume(0 ≤ len(xs));
                  xs' ← nondet(); assume(0 ≤ len(xs') = len(xs) − 1);
                  ν ← nondet(); assume(ν = i);
                  κ_1 ← (ν, i, xs)
      [] /* c2 */ i ← nondet(); xs ← nondet(); assume(0 ≤ len(xs));
                  xs' ← nondet(); assume(0 ≤ len(xs') = len(xs) − 1);
                  (t_0, t_1, t_2) ← κ_1; assume(t_1 = i + 1); assume(t_2 = xs');
                  ν ← t_0;
                  κ_1 ← (ν, i, xs)
      [] /* c3 */ a ← nondet(); xs ← nondet(); assume(0 ≤ len(xs));
                  (t_0, t_1, t_2) ← κ_2; j ← t_0;
                  assert(0 ≤ j < len(a))
      [] /* c4 */ a ← nondet(); xs ← nondet(); assume(0 ≤ len(xs));
                  assume(len(a) = len(xs));
                  (t_0, t_1, t_2) ← κ_1; assume(t_1 = 0); assume(t_2 = xs);
                  ν ← t_0;
                  κ_2 ← (ν, a, xs)
    }

Figure 3. Translated Program
3. Constraints
We start by formalizing constraints over types refined with predicates. To this end, we make precise the notions of refinement predicates (Section 3.1), refinement types (Section 3.2), and constraints over refinement types together with the notion of satisfaction (Section 3.3).
A discussion of how such constraints can be generated in a syntax-guided manner from program source is outside the scope of this paper; we refer the reader to the large body of prior research that addresses this issue [3, 19, 29, 33].
Notation.
We use uppercase (Z) to denote sets, lowercase (z) to denote elements, and ⟨Z⟩ for a sequence of elements in Z.
Figure 4 shows the syntax of refinement predicates. In our discussion, we restrict the predicate language to the typed quantifier-free logic of linear integer arithmetic and uninterpreted functions. However, it is straightforward to extend the logic to include other domains equipped with effective decision procedures and abstract interpreters.
Types and Environments.
Our logic is equipped with a fixed set of types denoted τ, comprising the basic types int for integer values, bool for boolean values, and u_i, a family of uninterpreted types that are used to encode complex source language types such as products, sums, polymorphic type variables, recursive types, etc. We assume there is a fixed set of uninterpreted functions. Each uninterpreted function f has a fixed type τ_f ≐ ⟨τ^i_f⟩ → τ^o_f. An environment Γ is a sequence of variable-type bindings.
Expressions and Predicates.
In our logic, expressions e comprise variables, linear arithmetic (i.e., addition and multiplication by constants), and applications of uninterpreted functions f. Note that, as is standard in semantic program analyses, complex operations like division or non-linear multiplication can be modelled using uninterpreted functions. Finally, predicates comprise atomic comparisons of expressions, or boolean combinations of sub-predicates. We write true (resp. false) as abbreviations for a fixed valid (resp. unsatisfiable) atomic comparison.
Well-formedness.
We say that a predicate p is well-formed in an environment Γ if every variable appearing in p is bound in Γ and p is "type correct" in the environment Γ.
Validity.
For each type τ, we write U(τ) to denote the set of concrete values of τ. An interpretation σ is a map from variables x to concrete values, and from functions f to maps from U(⟨τ^i_f⟩) to U(τ^o_f). We say that σ is valid under Γ if for each x : τ ∈ Γ, we have σ(x) ∈ U(τ). We say that a predicate p is valid in an environment Γ if σ(p) evaluates to true for every σ valid under Γ.
Figure 4 shows the syntax of refinement types and environments.
Refinements.
A refinement r is either a predicate p drawn from our logic, or a refinement variable with pending substitutions κ[y_1/x_1] … [y_n/x_n]. Intuitively, the former represent known refinements (or invariants), while the latter represent the unknown invariants that hold of different program values. The notion of pending substitutions [1, 19] offers a flexible way of capturing the value flow that arises in the context of function parameter passing (in the functional setting), or assignment (in the imperative setting), even when the underlying invariants are unknown.
Refinement Types and Environments.
A refinement type {ν : τ | r} is a triple consisting of a value variable ν denoting the value being described by the refinement type, a type τ describing the underlying type of the value, and a refinement r. A refinement environment G is a sequence of refinement type bindings.
The value variables are special variables distinct from the program variables, and can occur inside the refinement predicates. Thus, intuitively, the refinement type describes the set of concrete values of the underlying type τ which additionally satisfy the refinement predicate. For example, the refinement type {ν : int | ν ≠ 0} describes the set of non-zero integers, and {ν : int | ν = x + y} describes the set of integers whose value equals the sum of the values of the (program) variables x and y.
Note that path-sensitive branch information can be captured by adding suitable bindings to the refinement environment. For example, the fact that some expression is only evaluated under the if-condition that x > 0 can be captured in the environment via a refinement type binding x_b : {ν : bool | x > 0}.
Figure 4 shows the syntax of refinement constraints. Our refinement type system has two kinds of constraints.
Subtyping Constraints are of the form
    G ⊢ {ν : τ | r_1} <: {ν : τ | r_2}
Intuitively, a subtyping constraint states that when the program variables satisfy the invariants described in G, the set of values described by the refinement r_1 must be subsumed by the set of values described by the refinement r_2.
Well-formedness Constraints are of the form Γ ⊢ {ν : τ | r}. Intuitively, a well-formedness constraint states that the refinement r must be a well-typed predicate in the environment Γ extended with the binding ν : τ for the value variable.
Embedding.
To formalize the notions of constraint validity and sat-isfaction, we embed subtyping constraints into our logic. We definehe function
Emb ( · ) that maps refinement types, environments andsubtyping constraints to predicates in our logic. Emb ( { ν : τ | p } ) . = p Emb ( x : T ; G ) . = Emb ( T )[ ν/x ] ∧ Emb ( G ) Emb ( ∅ ) . = true Emb ( G ⊢ T < : T ) . = Emb ( G ) ⇒ Emb ( T ) ⇒ Emb ( T ) Similarly, we define the function
Shape ( · ) that maps refinementtypes and environments to types and environments in our logic. Shape ( { ν : τ | p } ) . = τ Shape ( x : T ; G ) . = x : Shape ( T ); Shape ( G ) Shape ( ∅ ) . = ∅ Validity.
A subtyping constraint G ⊢ T < : T that doesnot contain refinement variables is valid if the predicate Emb ( G ⊢ T < : T ) is valid under environment Shape ( G ) . Awell-formedness constraint Γ ⊢ { ν : τ | p } that does not containrefinement variables is valid if the predicate p is well-formed in theenvironment Γ . Relational Interpretations.
We assume, without loss of generality,that each refinement variable κ is associated with a unique well-formedness constraint x : τ ; . . . ; x n : τ n ⊢ { ν : τ | κ } calledthe well-formedness constraint for κ . In this case, we say κ has arity n + 1 . Furthermore, we assume that wherever a κ of arity n + 1 appears in a subtyping constraint, it appears with a sequenceof n pending substitutions [ y /x ] . . . [ y n /x n ] . This assumptionis without loss of generality, as we can enforce it with trivialsubstitutions of the form [ x i /x i ] . A relational interpretation for κ of arity n + 1 , is an ( n + 1) -ary relation in U ( τ ) × . . . × U ( τ n ) . A relational model is a map from refinement variables κ to relationalinterpretations. Constraint Satisfaction.
A set of constraints C is satisfiable iffor all interpretations for uninterpreted functions f , there exists arelational model S such that, when each occurrence of a refinementtype { ν : τ | κ [ y /x ] . . . [ y n /x n ] } in C is substituted with { ν : τ | ∃ t , . . . , t n .S ( κ )( ν, t , . . . , t n ) ∧ t = y ∧ . . . t n = y n ) } every subtyping constraint after the substitution is valid. In thiscase, we say that S is a solution for C .
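As a small worked instance of these definitions, consider the following constraint, which is an illustrative example chosen here and not one taken from the paper. Let $G = x : \{\nu : int \mid \nu > 0\}; \emptyset$, and take the constraint $G \vdash \{\nu : int \mid \nu = x + 1\} <: \{\nu : int \mid \kappa[x/x]\}$, where $\kappa$ is assumed to have the well-formedness constraint $x : int \vdash \{\nu : int \mid \kappa\}$ and hence arity 2. The candidate interpretation $S(\kappa)$ below is likewise an assumption made for the example.

```latex
\begin{align*}
Emb(G) &= (x > 0) \wedge true
  && \text{(refinement of $x$, with $\nu$ replaced by $x$)} \\
S(\kappa) &= \{ (\nu, t_1) \mid \nu > t_1 \}
  && \text{(candidate relational interpretation)} \\
\{\nu : int \mid \kappa[x/x]\} &\leadsto
  \{\nu : int \mid \exists t_1.\ S(\kappa)(\nu, t_1) \wedge t_1 = x\}
  = \{\nu : int \mid \nu > x\} \\
Emb(G \vdash T_1 <: T_2) &=
  ((x > 0) \wedge true) \Rightarrow (\nu = x + 1) \Rightarrow (\nu > x)
\end{align*}
```

The last formula is valid under $Shape(G) = x : int$, so this $S$ is a solution for the single constraint. Choosing $S(\kappa) = \emptyset$ instead would turn the right-hand refinement into $false$, and the embedded implication would fail, for instance at $x = 1$, $\nu = 2$.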
4. Imperative Programs
RTI translates the satisfiability problem for refinement type constraints to the question of checking the safety of an imperative program in a simple imperative language IMP. In this section, we formalize the syntax of IMP programs and define the Relational semantics and the Imperative semantics.

Figure 5 shows the syntax of IMP programs. An instruction ($I$) is a sequence of assignments, assumptions and assertions. A program ($P$) is an infinite loop over a block, whose body is a non-deterministic choice between a finite number of instructions $I_1, \ldots, I_n$. Next, we describe the different kinds of instructions. For ease of notation, we assume that there is only one base type $\tau$, and let $V$ denote the set of values of type $\tau$.

Variables. IMP programs have two kinds of variables: (1) base variables, denoted by $\nu$, $x$, $y$ and $t$ (and subscripted versions thereof), which range over values of type $\tau$; (2) relation variables, denoted by $\kappa$, each of which has a fixed arity $n$ and ranges over tuples of values or sets of $n$-tuples of values, depending on the semantics.

Base Assignments. IMP programs have two kinds of assignments to base variables. Either (1) an expression over base variables (cf. Figure 4) is evaluated and assigned to the base variable, or (2) an arbitrary value of the appropriate base type is assigned to the base variable, i.e., the variable is "havoc-ed" with a non-deterministically chosen value.

Figure 4. Predicates, Refinements and Constraints.
  Types: $\tau ::= int$ (base type of integers) $\mid bool$ (base type of booleans) $\mid ui$ (complex uninterpreted type)
  Environments: $\Gamma ::= x : \tau; \Gamma$ (binding) $\mid \emptyset$ (empty)
  Expressions: $e ::= x$ (variable) $\mid n$ (integer) $\mid e + e$ (addition) $\mid n \times e$ (affine multiplication) $\mid f(\langle e \rangle)$ (function application)
  Predicates: $p ::= e \bowtie e$ (comparison) $\mid \neg p$ (negation) $\mid p \wedge p$ (conjunction) $\mid p \Rightarrow p$ (implication)
  Refinements: $r ::= p$ (predicate) $\mid \kappa[y_1/x_1]\ldots[y_n/x_n]$ (refinement variable with substitutions)
  Refinement Types: $T ::= \{\nu : \tau \mid r\}$
  Refinement Environments: $G ::= x : T; G$ (binding) $\mid \emptyset$ (empty)
  Subtype Constraints: $c ::= G \vdash T <: T$
  WF Constraints: $w ::= \Gamma \vdash T$

Figure 5. Imperative Programs: Syntax.
  Instructions: $I ::= x \leftarrow e$ (assign expr) $\mid x \leftarrow nondet()$ (havoc) $\mid (t_1, \ldots, t_n) \leftarrow \kappa$ (get tuple) $\mid \kappa \leftarrow (x_1, \ldots, x_n)$ (set tuple) $\mid assume(p)$ (assume) $\mid assert(p)$ (assert) $\mid I; I$ (sequence)
  Program: $P ::= loop\ \{ I_1\ []\ \ldots\ []\ I_n \}$
Tuple Assignments. The operations get tuple and set tuple respectively read a tuple from and write a tuple to a relation variable.
Assumes and Asserts. IMP programs have the standard assume and assert instructions using predicates over the base variables (cf. Figure 4). We write skip as an abbreviation for $assume(0 = 0)$.

We define the Relational semantics as a state transition system. In this semantics, $\kappa$ variables range over sets of tuples over $V$.

Relational States. A state $s^\sharp$ in the Relational semantics is either the special error state $\mathcal{E}$ or a map from program variables to values such that every base variable is mapped to a value in $V$, and every relation variable of arity $n$ is mapped to a (possibly empty) set of tuples in $V^n$. Let $\Sigma^\sharp$ be the set of all Relational-program states. For a state $s^\sharp$ which is not $\mathcal{E}$, variable $x$ and value $v$, we write $s^\sharp[x \mapsto v]$ for the map which maps $x$ to $v$ and every other key $x'$ to $s^\sharp(x')$. We lift maps $s^\sharp$ from base variables to values to maps from expressions (and predicates) to values in the natural way.

Initial State. The initial state $s^\sharp_0$ of an IMP program in the Relational semantics is a map in which every base variable is mapped to a fixed value from $V$, and every relation variable is mapped to the empty set.

Transition Relation. The transition relation is defined through a $Post^\sharp$ operator, shown in Figure 6, which maps a state $s^\sharp$ and an instruction $I$ to the set of states that the program can be in after executing the instruction from the state $s^\sharp$. We lift $Post^\sharp$ to a set of states $\hat\Sigma^\sharp \subseteq \Sigma^\sharp$ in the natural way:
$$Post^\sharp(\hat\Sigma^\sharp, I) \doteq \bigcup \{ Post^\sharp(s^\sharp, I) \mid s^\sharp \in \hat\Sigma^\sharp \}$$
Notice that the program halts if a get instruction is executed with an empty relation variable, or an $assume(p)$ is executed in a state that does not satisfy $p$.

Safety. Let $P$ be the program $loop\ \{ I_1\ []\ \ldots\ []\ I_n \}$. The set of Relational-reachable states of $P$, denoted $Reach^\sharp(P)$, is defined by induction as:
$$\begin{aligned}
Reach^\sharp(P, 0) &\doteq \{ s^\sharp_0 \} \\
Reach^\sharp(P, m + 1) &\doteq \bigcup \{ Post^\sharp(Reach^\sharp(P, m), I_j) \mid 1 \leq j \leq n \} \\
Reach^\sharp(P) &\doteq \bigcup \{ Reach^\sharp(P, m) \mid 0 \leq m \}
\end{aligned}$$
A program $P$ is Relational-safe if $\mathcal{E} \notin Reach^\sharp(P)$.

Next, we define the Imperative semantics as a state transition system. In this semantics, $\kappa$ variables range over tuples over $V$.

Imperative States.
In the Imperative semantics, each state $s$ is either the special error state $\mathcal{E}$ or a map from program variables to values such that every base variable is mapped to a value in $V$, and every relation variable of arity $n$ is mapped either to a tuple in $V^n$ or to the special undefined value $\bot$. Let $\Sigma$ denote the set of all Imperative-program states.

Initial State. The initial state $s_0$ of an IMP program in the Imperative semantics is a map in which every base variable is mapped to a fixed value from $V$, and every relation variable is mapped to $\bot$.

Transition Relation. The transition relation is defined using a $Post$ operator, which is identical to $Post^\sharp$ in the Relational semantics except for the tuple-get and tuple-set instructions. Figure 6 shows the operator $Post$ for get and set operations. Again, $Post$ is lifted to a set of states in the natural way. Notice that the program halts if a get instruction is executed with an undefined relation variable, or an $assume(p)$ is executed in a state that does not satisfy $p$.

Safety. Let $P$ be the program $loop\ \{ I_1\ []\ \ldots\ []\ I_n \}$. The set of Imperative-reachable states of $P$, denoted $Reach(P)$, is defined by induction as:
$$\begin{aligned}
Reach(P, 0) &\doteq \{ s_0 \} \\
Reach(P, m + 1) &\doteq \bigcup \{ Post(Reach(P, m), I_j) \mid 1 \leq j \leq n \} \\
Reach(P) &\doteq \bigcup \{ Reach(P, m) \mid 0 \leq m \}
\end{aligned}$$
A program $P$ is Imperative-safe if $\mathcal{E} \notin Reach(P)$.
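To make the difference between the two transition relations concrete, the following OCaml sketch implements a single `post` function that is parameterized by how a relation variable is read and written, and instantiates it once for the Relational semantics (sets of tuples) and once for the Imperative semantics (one tuple or undefined). It is a minimal sketch under stated assumptions: the finite list `values` stands in for $V$ so that `nondet()` can be enumerated, expressions and predicates are represented as OCaml functions over a variable lookup, and every name in the block (`instr`, `post`, `run`, `rel_get`, and so on) is illustrative rather than part of the paper's implementation.

```ocaml
(* Assumed finite stand-in for the value set V. *)
let values = [ 0; 1; 2 ]

type instr =
  | Assign of string * ((string -> int) -> int) (* x <- e *)
  | Havoc of string                             (* x <- nondet() *)
  | Get of string list * string                 (* (t1,...,tn) <- k *)
  | Set of string * string list                 (* k <- (x1,...,xn) *)
  | Assume of ((string -> int) -> bool)
  | Assert of ((string -> int) -> bool)

(* 'r is the denotation of a relation variable: a set of tuples in the
   Relational semantics, an optional tuple in the Imperative semantics. *)
type 'r state = { base : (string * int) list; rel : (string * 'r) list }
type 'r outcome = Err | St of 'r state

let lookup s x = List.assoc x s.base
let update s x v = { s with base = (x, v) :: List.remove_assoc x s.base }

(* Successors of a non-error state s under one instruction.
   get r     : the tuples a read of r may return
   set r tup : the value of the relation variable after writing tup *)
let post ~get ~set s = function
  | Assign (x, e) -> [ St (update s x (e (lookup s))) ]
  | Havoc x -> List.map (fun v -> St (update s x v)) values
  | Assume p -> if p (lookup s) then [ St s ] else []
  | Assert p -> if p (lookup s) then [ St s ] else [ Err ]
  | Get (ts, k) ->
      List.map
        (fun tup -> St (List.fold_left2 update s ts tup))
        (get (List.assoc k s.rel))
  | Set (k, xs) ->
      let r = set (List.assoc k s.rel) (List.map (lookup s) xs) in
      [ St { s with rel = (k, r) :: List.remove_assoc k s.rel } ]

(* Sequencing: the error state is absorbing. *)
let run ~get ~set s instrs =
  List.fold_left
    (fun outs i ->
      List.concat_map
        (function Err -> [ Err ] | St s -> post ~get ~set s i)
        outs)
    [ St s ] instrs

(* Relational semantics: a read returns any stored tuple, a write adds one. *)
let rel_get (r : int list list) = r
let rel_set r tup = if List.mem tup r then r else tup :: r

(* Imperative semantics: a read returns the stored tuple, a write overwrites. *)
let imp_get = function None -> [] | Some tup -> [ tup ]
let imp_set _ tup = Some tup

let init r = { base = [ ("x", 0); ("t", 0) ]; rel = [ ("k", r) ] }

(* Write 1, then write 2, then read once. *)
let prog =
  [ Assign ("x", fun _ -> 1); Set ("k", [ "x" ]);
    Assign ("x", fun _ -> 2); Set ("k", [ "x" ]);
    Get ([ "t" ], "k") ]

let ts outs =
  List.sort compare
    (List.filter_map
       (function St s -> Some (lookup s "t") | Err -> None)
       outs)

let rel_out = run ~get:rel_get ~set:rel_set (init []) prog
let imp_out = run ~get:imp_get ~set:imp_set (init None) prog
```

In this sketch `ts rel_out` contains both written values, because the relational read may return either stored tuple, while `ts imp_out` contains only the most recently written one. Reading from an empty or undefined relation variable yields no successor state at all, which corresponds to the program halting rather than failing.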
5. From Type Constraints to IMP Programs
In this section we formalize the translation from type constraints into IMP programs and prove that the constraints are satisfiable if and only if the translated program is safe.

5.1 Translation

Figure 7 formalizes the translation from (a set of) refinement type constraints $C$ to an IMP program $[\![C]\!]$. We use the WF constraints to translate each relation variable $\kappa$ of arity $n + 1$ into a corresponding tuple variable $\kappa$ of arity $n + 1$.

The translation is syntax-driven. We translate each subtyping constraint $G \vdash T_1 <: T_2$ into a straight-line block of instructions with three parts: a sequence of instructions that establishes the environment bindings ($[\![G]\!]$), a sequence of instructions that "gets" the values corresponding to the LHS ($[\![T_1]\!]_{get}$), and a sequence of instructions that "sets" the (LHS) values into the appropriate RHS ($[\![T_2]\!]_{set}$). The translation for a set of constraints is an infinite loop that non-deterministically chooses among the blocks for each constraint.

Each environment binding gets translated as a "get". Bindings with unknown refinements are translated into tuple-get operations, followed by assume statements that establish the equalities corresponding to the pending substitutions. Bindings with known refinements are translated into non-deterministic assignments followed by an assume that enforces that the refinement holds on the non-deterministic value.

Each "set" operation to an unknown refinement is translated into a tuple-set instruction that writes the tuple corresponding to the pending substitutions into the translated tuple variable. Finally, each "set" operation corresponding to a known refinement is translated to an assert instruction; intuitively, in such constraints the RHS defines an upper bound on the set of values populating the type, and the assert serves to enforce the upper bound requirement in the translated program.

Figure 7. Translating Constraints to IMP Programs.
  Refinement Type Translation
  $$\begin{aligned}
  [\![\{\nu : \tau \mid p\}]\!]_{get} &\doteq \nu \leftarrow nondet();\ assume(p) \\
  [\![\{\nu : \tau \mid p\}]\!]_{set} &\doteq assert(p) \\
  [\![\{\nu : \tau \mid \kappa[y_1 \ldots y_n / x_1 \ldots x_n]\}]\!]_{get} &\doteq (t_0, \ldots, t_n) \leftarrow \kappa;\ assume(y_1 = t_1);\ \ldots;\ assume(y_n = t_n);\ \nu \leftarrow t_0 \\
  [\![\{\nu : \tau \mid \kappa[y_1 \ldots y_n / x_1 \ldots x_n]\}]\!]_{set} &\doteq \kappa \leftarrow (\nu, y_1, \ldots, y_n)
  \end{aligned}$$
  Binding Translation
  $$\begin{aligned}
  [\![x : T; G]\!] &\doteq [\![T]\!]_{get};\ x \leftarrow \nu;\ [\![G]\!] \\
  [\![\emptyset]\!] &\doteq skip
  \end{aligned}$$
  Constraint Translation
  $$[\![G \vdash T_1 <: T_2]\!] \doteq [\![G]\!];\ [\![T_1]\!]_{get};\ [\![T_2]\!]_{set}$$
  Constraint Set Translation
  $$[\![\{c_1, \ldots, c_n\}]\!] \doteq loop\ \{ [\![c_1]\!]\ []\ \ldots\ []\ [\![c_n]\!] \}$$

The correctness of the procedure is stated by the following theorem.

Theorem 1. $C$ is satisfiable iff $[\![C]\!]$ is Relational-safe.

The proof of this theorem follows from the properties of the following function $\alpha$ that maps a set $\hat\Sigma^\sharp \subseteq \Sigma^\sharp$ of Relational-states to constraint solutions:
$$\alpha(\hat\Sigma^\sharp) \doteq \lambda\kappa.\ \bigcup \{ s^\sharp(\kappa) \mid s^\sharp \in \hat\Sigma^\sharp \}$$
The function $\alpha$ enjoys the following property, which can be proven by induction on the construction of $Reach^\sharp$, and which relates the satisfying solutions of the constraints to the Relational-reachable states of the translated program. Theorem 1 follows from the following observations. If $S$ satisfies $C$ then $\alpha(Reach^\sharp([\![C]\!]))(\kappa) \subseteq S(\kappa)$ for all $\kappa$. If $\mathcal{E} \notin Reach^\sharp([\![C]\!])$ then $\alpha(Reach^\sharp([\![C]\!]))$ satisfies $C$.

Figure 6. Relational and Imperative Semantics: Other cases of $Post$ identical to $Post^\sharp$.
  Common Operations
  $$\begin{aligned}
  Post^\sharp(\mathcal{E}, I) &\doteq \{\mathcal{E}\} \\
  Post^\sharp(s^\sharp, I_1; I_2) &\doteq Post^\sharp(Post^\sharp(s^\sharp, I_1), I_2) \\
  Post^\sharp(s^\sharp, x \leftarrow e) &\doteq \{ s^\sharp[x \mapsto s^\sharp(e)] \} \\
  Post^\sharp(s^\sharp, x \leftarrow nondet()) &\doteq \{ s^\sharp[x \mapsto c] \mid c \in V \} \\
  Post^\sharp(s^\sharp, assume(p)) &\doteq \{ s^\sharp \} \text{ if } s^\sharp(p) = true,\ \emptyset \text{ otherwise} \\
  Post^\sharp(s^\sharp, assert(p)) &\doteq \{ s^\sharp \} \text{ if } s^\sharp(p) = true,\ \{\mathcal{E}\} \text{ otherwise}
  \end{aligned}$$
  Tuple Operations: Relational Semantics
  $$\begin{aligned}
  Post^\sharp(s^\sharp, (t_1, \ldots, t_n) \leftarrow \kappa) &\doteq \{ s^\sharp[t_1 \mapsto v_1] \ldots [t_n \mapsto v_n] \mid (v_1, \ldots, v_n) \in s^\sharp(\kappa) \} \\
  Post^\sharp(s^\sharp, \kappa \leftarrow (x_1, \ldots, x_n)) &\doteq \{ s^\sharp[\kappa \mapsto s^\sharp(\kappa) \cup \{ (s^\sharp(x_1), \ldots, s^\sharp(x_n)) \}] \}
  \end{aligned}$$
  Tuple Operations: Imperative Semantics
  $$\begin{aligned}
  Post(s, (t_1, \ldots, t_n) \leftarrow \kappa) &\doteq \{ s[t_1 \mapsto v_1] \ldots [t_n \mapsto v_n] \} \text{ if } s(\kappa) = (v_1, \ldots, v_n),\ \emptyset \text{ if } s(\kappa) = \bot \\
  Post(s, \kappa \leftarrow (x_1, \ldots, x_n)) &\doteq \{ s[\kappa \mapsto (s(x_1), \ldots, s(x_n))] \}
  \end{aligned}$$

At this point, via Theorem 1, we have reduced checking satisfiability of type constraints to the problem of verifying assertions of IMP programs under the (non-standard) Relational semantics. Unfortunately, under these semantics, the program contains variables ($\kappa$) which range over sets of tuples. This makes it inconvenient to directly apply abstract-interpretation based techniques for imperative programs, which typically assume the (standard) Imperative semantics; each technique has to be painstakingly adapted to the non-standard semantics.

We would be home and dry if we could prove the equivalence of the Relational and Imperative semantics; that is, if we could show that an IMP program was Relational-safe if and only if it was Imperative-safe. Unfortunately, this is not true.

Example.
Consider the IMP program:
$$\begin{aligned}
loop\ \{\ & \nu \leftarrow nondet();\ \kappa \leftarrow (\nu) \\
[]\ & (t) \leftarrow \kappa;\ \nu \leftarrow t;\ x \leftarrow \nu;\ (t) \leftarrow \kappa;\ \nu \leftarrow t;\ y \leftarrow \nu;\ assert(x = y)\ \}
\end{aligned}$$
This program is not Relational-safe, as the set-operation in the first instruction populates $\kappa$ with the set of all integers, and the get-operations in the second instruction can assign different integer values to $x$ and $y$. However, the program is Imperative-safe, as whenever the second instruction executes, $\kappa$ will either be undefined or contain some arbitrary integer that is assigned to both $x$ and $y$, which causes the assert to succeed.

This example pinpoints exactly why the two semantics differ. In the Relational semantics, in any given loop iteration, different gets on the same $\kappa$ can return different tuples, while in the Imperative semantics the gets are correlated and return the same tuple.

Read-Write-Once Programs.
An IMP instruction is a read-write-once instruction if any relation variable $\kappa$ is read from and written to at most once in the instruction. That is, read-write-once means at most one write and at most one read (and not at most one read or write). An IMP program is a read-write-once program if each instruction in its loop is a read-write-once instruction. We can show that for read-write-once IMP programs the Relational and Imperative semantics are equivalent.

Theorem 2. If $P$ is a read-write-once IMP program then $P$ is Relational-safe iff $P$ is Imperative-safe.

To prove this theorem, we formalize the connection between the reachable states under the two different semantics, using the function $Expand$, which maps a Relational-state to a set of Imperative states:
$$Expand(s^\sharp) \doteq \left\{ s \ \middle|\ \begin{array}{ll}
s(x) = s^\sharp(x) & \text{for base variables } x \\
s(\kappa) = \langle v \rangle & \text{if } \langle v \rangle \in s^\sharp(\kappa) \\
s(\kappa) = \bot & \text{if } s^\sharp(\kappa) = \emptyset \\
s = \mathcal{E} & \text{if } s^\sharp = \mathcal{E}
\end{array} \right\}$$
We lift the function to sets of Relational states in the natural way:
$$Expand(\hat\Sigma^\sharp) \doteq \bigcup \{ Expand(s^\sharp) \mid s^\sharp \in \hat\Sigma^\sharp \}$$
Next, we can show that read-write-once instructions enjoy the following property, by case splitting on the form of $I$.

Lemma 1 [Step]. If $I$ is a read-write-once instruction then $Expand(Post^\sharp(s^\sharp, I)) = Post(Expand(s^\sharp), I)$.

We use this property to show that the reachable states under the different semantics are equivalent.

Lemma 2. If $P = loop\ \{ I_1\ []\ \ldots\ []\ I_n \}$ is a read-write-once program, then $Expand(Reach^\sharp(P)) = Reach(P)$.

Proof. To prove that $Reach(P) \subseteq Expand(Reach^\sharp(P))$, we show
$$\forall m : Reach(P, m) \subseteq Expand(Reach^\sharp(P))$$
by straightforward induction on $m$, noting that $s_0 \in Expand(s^\sharp_0)$, and $Post(Expand(s^\sharp), I) \subseteq Expand(Post^\sharp(s^\sharp, I))$ for any Relational-state $s^\sharp \in \Sigma^\sharp$, instruction $I$, and any program $P$ (not necessarily read-write-once).

To show inclusion in the other direction, we prove
$$\forall m : Expand(Reach^\sharp(P, m)) \subseteq Reach(P)$$
by induction on $m$. For the base case, $Expand(Reach^\sharp(P, 0)) = Reach(P, 0) \subseteq Reach(P)$ by the definition of the initial states. By induction, assume that $Expand(Reach^\sharp(P, m)) \subseteq Reach(P)$. Let $s' \in Expand(Reach^\sharp(P, m + 1))$. By Lemma 1, either $s'$ is already in $Expand(Reach^\sharp(P, m))$, in which case the inductive hypothesis applies and hence $s' \in Reach(P)$, or $s' \in Post(Expand(Reach^\sharp(P, m)), I_j)$ for some $j$. That is, there is an $s \in Expand(Reach^\sharp(P, m))$ such that $s' \in Post(s, I_j)$. From the induction hypothesis, $s \in Reach(P)$. As $Reach(P)$ is closed under $Post$, we conclude $s' \in Reach(P)$. $\Box$

At this point, we have shown that the Imperative semantics of read-write-once programs is equivalent to the Relational semantics. All that remains is to show that the translation procedure of Figure 7 produces read-write-once programs. Unfortunately, this is not true.
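The gap between the two semantics on the two-instruction example program given earlier (one instruction that writes a havoc-ed value into $\kappa$, one that reads $\kappa$ twice and asserts $x = y$) can be checked by a bounded exploration. The OCaml sketch below is specialized to that one program and rests on several simplifying assumptions: the value set is cut down to the finite list `values`, only the content of $\kappa$ is tracked (the base variables are overwritten before each use, so they do not influence later iterations), and the names `step_rel`, `step_imp` and `unsafe` are invented for this illustration.

```ocaml
(* Assumed finite stand-in for the value set V. *)
let values = [ 0; 1 ]

(* Generic worklist search over the reachable contents of kappa.
   step k returns the contents reachable from k in one loop iteration,
   together with a flag telling whether the assert can fail from k. *)
let rec unsafe step seen = function
  | [] -> false
  | k :: rest when List.mem k seen -> unsafe step seen rest
  | k :: rest ->
      let next, fails = step k in
      fails || unsafe step (k :: seen) (next @ rest)

(* Relational semantics: kappa is a set of 1-tuples, kept as a sorted list.
   Instruction 1 adds any value; instruction 2 performs two independent
   reads, so the assert fails if two distinct elements can be drawn. *)
let step_rel k =
  let writes = List.map (fun v -> List.sort_uniq compare (v :: k)) values in
  let fails = List.exists (fun x -> List.exists (fun y -> x <> y) k) k in
  (writes, fails)

(* Imperative semantics: kappa is one value or undefined (None).
   Instruction 1 overwrites; in instruction 2 both reads return the same
   stored value, and a read of an undefined kappa halts without error. *)
let step_imp k =
  let writes = List.map (fun v -> Some v) values in
  let fails =
    (match k with
     | None -> false
     | Some t -> let x = t and y = t in x <> y)
  in
  (writes, fails)

let relational_safe = not (unsafe step_rel [] [ [] ])
let imperative_safe = not (unsafe step_imp [] [ None ])
```

Under these assumptions `relational_safe` evaluates to `false` and `imperative_safe` to `true`, matching the informal argument for the example. The second instruction reads $\kappa$ twice, so it is not a read-write-once instruction, which is exactly the situation Theorem 2 excludes.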
Example. Consider the following constraints:
$$\emptyset \vdash \{\kappa\}, \qquad \emptyset \vdash \{true\} <: \{\kappa\}, \qquad x : \kappa; y : \kappa \vdash \{true\} <: \{x = y\}$$
It is easy to check that on the above constraints, the translation procedure yields the IMP program from the previous example, which is not read-write-once.

The reason the translated program is not a read-write-once program is that there can be constraints $G \vdash T_1 <: T_2$ in which $\kappa$ occurs in multiple places within $G$ and $T_1$.

To solve this problem, we can simply clone the $\kappa$ variables that occur multiple times inside a constraint, and use different clones at each occurrence. We formalize this as a procedure $Clone$ that maps a finite set of constraints to another finite set. The procedure works as follows. For each $\kappa$ that is read up to $n$ times in some constraint, we make $n$ clones, $\kappa_1, \ldots, \kappa_n$, and
1. for the $i$-th occurrence of $\kappa$ within any constraint, we use the $i$-th clone $\kappa_i$ (instead of $\kappa$), and,
2. for each constraint where $\kappa$ appears on the right hand side, we make $n$ clones of the constraint, where in the $i$-th cloned constraint we use $\kappa_i$ (instead of $\kappa$).
The first step ensures that each $\kappa$ is read once in any constraint, and the second step ensures that the clones correspond to exactly the same set of tuples as the original variable $\kappa$. We can prove that $Clone$ enjoys the following properties.

Theorem 3. Let $C$ be a finite set of constraints.
1. $[\![Clone(C)]\!]$ is a read-write-once program.
2. $Clone(C)$ is satisfiable iff $C$ is satisfiable.

It is easy to verify that $[\![Clone(C)]\!]$ is a read-write-once program. Furthermore, any satisfying solution for the original constraints can be mapped directly to a solution for the cloned constraints. To go in the other direction, we must map a solution that satisfies the cloned constraints to one that satisfies the original constraints. This is trivial if the solution for the cloned constraints maps each clone $\kappa_i$ to the same set of tuples. We show that if the cloned constraints have a satisfying solution, they have a solution that satisfies the above property. To this end, we prove the following lemma, which states that for any set of constraints, the satisfying solutions are closed under intersection.

Lemma 3. If $S_1$ and $S_2$ are solutions that satisfy $C$ then $S_1 \cap S_2 \doteq \lambda\kappa.\ S_1(\kappa) \cap S_2(\kappa)$ satisfies $C$.

Thus if $S$ satisfies the cloned constraints then, by symmetry and Lemma 3, the solution that maps each cloned variable to $\bigcap_{i=1}^{n} S(\kappa_i)$ also satisfies the cloned constraints, and hence directly yields a solution to the original constraints.

Finally, as a corollary of Theorems 1, 2 and 3 we get our main result, which reduces the question of refinement type constraint satisfaction to that of safety verification.

Theorem 4. $C$ is satisfiable iff $[\![Clone(C)]\!]$ is Imperative-safe.

While we state Theorems 1 and 3 as preserving satisfiability, the proof shows how the solutions can be effectively mapped between $C$ and $[\![C]\!]$ (or $[\![Clone(C)]\!]$). In particular, while the intersection of two non-trivial solutions can be a trivial solution, it is guaranteed that in that case the trivial solution satisfies $C$. Stated in terms of invariants, Lemma 3 expresses the observation that there may be several non-comparable inductive invariants that prove a safety property, but in that case the intersection of all the inductive invariants is also an inductive invariant.

Table 1. Experimental evaluation using a predicate abstraction-based verification tool on examples from [29]. The third column presents the invariant for the translated program, and the resulting refinement types.

Program | Time (sec) | Invariant; Refinement Types
max | 0.091 | κ . ≤ κ . ∧ κ . ≤ κ . ; κ x ≐ true, κ y ≐ true, κ ≐ x ≤ v ∧ y ≤ v
sum | 0.071 | ≤ κ . ∧ κ . ≤ κ . ; κ k ≐ true, κ ≐ 0 ≤ v ∧ k ≤ v
foldn | 0.060 | ≤ κ i . ∧ ≤ κ . ∧ κ . < κ . ; κ i ≐ 0 ≤ v, κ ≐ 0 ≤ v ∧ v < n
arraymax | 0.135 | ≤ κ . ∧ ≤ κ . ∧ ≤ κ . ∧ κ g . < len ( κ g . ; κ . = ≤ v, κ ≐ 0 ≤ v, κ ≐ 0 ≤ v, κ g ≐ v < len(a)
mask | 0.098 | κ . < len ( κ . ∧ κ . ≤ κ . ∧ ≤ κ . ∧ κ . < len ( κ . ; κ v < len(xs) ∧ i ≤ v, κ ≐ 0 ≤ v ∧ v < len(a)
samples | 0.117 | ≤ κ . ∧ κ . < len ( κ . ∧ ≤ κ . ∧ κ . < len ( κ . ∧ ≤ κ . ; κ ≐ 0 ≤ v ∧ v < len(b), κ ≐ 0 ≤ v ∧ v < len(a), κ ≐ 0 ≤ v
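The translation of Figure 7 and the cloning step of this section can be sketched together in OCaml. This is a hedged illustration, not the authors' implementation: predicates are kept as plain strings, a refinement variable carries only the list of substituted variables $y_1, \ldots, y_n$, instructions are rendered as text, `clone` handles one refinement variable at a time, and all identifiers (`refinement`, `constr`, `translate`, `reads`, `clone`, the `k_i` naming scheme) are assumptions made for the example.

```ocaml
type refinement =
  | Pred of string                 (* known predicate p, kept as text *)
  | KVar of string * string list   (* kappa with substituted y1..yn *)

type constr = {
  env : (string * refinement) list;  (* bindings x : T of G *)
  lhs : refinement;
  rhs : refinement;
}

(* [[T]]_get and [[T]]_set, rendered as lists of instruction strings. *)
let get = function
  | Pred p -> [ "nu <- nondet()"; "assume(" ^ p ^ ")" ]
  | KVar (k, ys) ->
      let ts = List.mapi (fun i _ -> "t" ^ string_of_int (i + 1)) ys in
      [ "(" ^ String.concat ", " ("t0" :: ts) ^ ") <- " ^ k ]
      @ List.map2 (fun y t -> "assume(" ^ y ^ " = " ^ t ^ ")") ys ts
      @ [ "nu <- t0" ]

let set = function
  | Pred p -> [ "assert(" ^ p ^ ")" ]
  | KVar (k, ys) -> [ k ^ " <- (" ^ String.concat ", " ("nu" :: ys) ^ ")" ]

let binding (x, t) = get t @ [ x ^ " <- nu" ]

(* [[G |- T1 <: T2]] = [[G]]; [[T1]]_get; [[T2]]_set *)
let translate c = List.concat_map binding c.env @ get c.lhs @ set c.rhs

(* Number of reads and writes of k in the block generated for c. *)
let reads k c =
  List.length
    (List.filter
       (function KVar (k', _) -> k' = k | Pred _ -> false)
       (c.lhs :: List.map snd c.env))

let writes k c = match c.rhs with KVar (k', _) when k' = k -> 1 | _ -> 0
let read_write_once k c = reads k c <= 1 && writes k c <= 1

(* Clone for one variable k: the i-th read of k in a constraint becomes
   k_i, and every constraint that writes k is duplicated once per clone. *)
let clone k cs =
  let n = List.fold_left (fun m c -> max m (reads k c)) 1 cs in
  let name i = k ^ "_" ^ string_of_int i in
  let rename_reads c =
    let i = ref 0 in
    let rn = function
      | KVar (k', ys) when k' = k ->
          incr i;
          KVar (name !i, ys)
      | r -> r
    in
    let env = List.map (fun (x, t) -> (x, rn t)) c.env in
    let lhs = rn c.lhs in
    { c with env; lhs }
  in
  List.concat_map
    (fun c ->
      let c = rename_reads c in
      match c.rhs with
      | KVar (k', ys) when k' = k ->
          List.init n (fun j -> { c with rhs = KVar (name (j + 1), ys) })
      | _ -> [ c ])
    cs

(* The two subtyping constraints of the example in this section. *)
let c1 = { env = []; lhs = Pred "true"; rhs = KVar ("k", []) }
let c2 =
  { env = [ ("x", KVar ("k", [])); ("y", KVar ("k", [])) ];
    lhs = Pred "true";
    rhs = Pred "x = y" }

let cloned = clone "k" [ c1; c2 ]
```

On the example, `translate c2` reads `k` twice, so `read_write_once "k" c2` is `false`. After `clone`, the two reads refer to `k_1` and `k_2`, and the writing constraint `c1` appears once for each clone, giving three constraints in total. The sketch follows Figure 7 literally, so a trivially true left-hand side still produces `nu <- nondet(); assume(true)`, a step the example program in the text leaves out.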
6. Experiments
We have implemented a verification tool for OCaml programs based on RTI. We use the liquid types infrastructure implemented in DSolve [29] to generate refinement type constraints from OCaml programs. We use ARMC [28], a software model checker using predicate abstraction and interpolation-based refinement, as the verifier for the translated imperative program.

Table 1 shows the results of running our tool on a suite of small OCaml examples from [29]. For array manipulating programs, the safety objective is to prove array accesses are within bounds. For max we prove that the output is larger than input values. For sum we prove that the sum is larger than the largest summation term.

Table 2 presents the running time of our tool on the benchmark programs for the Depcegar verifier [31]. We observe that despite our black-box treatment of ARMC as a constraint solver, we obtain competitive running times compared to Depcegar on most of the examples (Depcegar uses a customized procedure for unfolding constraints and creating interpolation queries that yield refinement types).

Most of the predicates discovered by the interpolation-based abstraction refinement procedure implemented in ARMC fall into the fragment "two variables per inequality." The example mask required a predicate that refers to three variables (see the refinement for mask in Table 1). While our initial experiments used a CEGAR-based tool, we expect optimized abstract interpreters for numerical domains to also work well for this class of properties.

Table 2. Experimental evaluation of our tool on Depcegar benchmarks [31]. The third column presents the number of abstraction refinement iterations required by ARMC. The last column gives the number of predicates discovered by ARMC. For the programs with suffix "-e", which are incorrect, we omit the number of iterations and predicates and only show the time required by ARMC to find a counterexample.

Program | Time | Iterations | Predicates
boolflip.ml | 2.17s | 7 | 21
sum.ml | 0.24s | 5 | 14
sum-acm.ml | 0.11s | 1 | 3
sum-all.ml | 3.51s | 10 | 26
mult.ml | 4.67s | 10 | 25
mult-cps.ml | 780.24s | 11 | 27
mult-all.ml | 18.44s | 9 | 24
boolflip-e.ml | 0.65s | |
sum-e.ml | 0.01s | |
sum-acm-e.ml | 0.02s | |
sum-all-e.ml | 0.79s | |
mult-e.ml | 0.01s | |
mult-cps-e.ml | 7.69s | |
mult-all-e.ml | 144.93s | |
7. Extensions and Related Work
The soundness of safety verification for higher-order programs for any domain follows from the soundness of constraint generation (e.g., Theorem 1 in [29]) and Theorem 4. Since the safety verification problem for higher-order programs is undecidable, the technique cannot be complete in general. Even in the finite-state case, in which each base type has a finite domain (e.g., booleans), completeness depends on the generation of type constraints. For example, in our examples and in our implementation, we have assumed a context-insensitive constraint generation from program syntax, i.e., we have not distinguished the types of the same function at different call points. This entails a loss of information, as the following example demonstrates. Consider

    let check f x y = assert (f x = y) in
    check (fun a -> a) false false;
    check (fun a -> not a) false true

where the builtin function assert has the type $\{\nu : bool \mid \nu\} \rightarrow unit$. The refinement template for check generated by our constraint generation process is
$$(x : \{\nu : bool \mid \kappa_1\} \rightarrow \{\kappa_2\}) \rightarrow \{\kappa_3\} \rightarrow \{\kappa_4\} \rightarrow unit$$
which is too weak to show that the program is safe. This is because the template "merges" the two call sites for check.

One way to get context sensitivity is through intersection types [12, 14, 20, 25]. For the above example, we can show type safety using the following refined type for check:
$$\bigwedge \left\{ \begin{array}{l}
(x : bool \rightarrow \{\nu = x\}) \rightarrow \{\neg\nu\} \rightarrow \{\neg\nu\} \rightarrow unit \\
(x : bool \rightarrow \{\nu = \neg x\}) \rightarrow \{\neg\nu\} \rightarrow \{\nu\} \rightarrow unit
\end{array} \right.$$
It is important to note that Theorems 1 and 2 hold for any set of constraints. Thus, one way to get completeness in the finite-state case is to generate refinement templates using intersection types, perform the translation to IMP programs, and then use a complete invariant generation technique for finite-state systems. The key observation (made in [20]) that ensures a finite number of constraints is that there is at most a finite number of "contexts" in the finite-state case, and hence a finite number of terms in the intersection types.
The bad news is that the bound on the number of contexts is $exp_n(k)$, where $n$ is the highest order of any function in the program, $k$ is the maximum arity of any function in the program, and $exp_n(k)$ is a stack of $n$ exponentials, defined by $exp_0(k) = k$ and $exp_{n+1}(k) = 2^{exp_n(k)}$.

Fully context-sensitive constraints are used in [20] to show completeness in the finite case, at the price of $exp_n(k)$ in every case, not just the worst case. In our exposition and our implementation, we have traded off precision for scalability: while we lose precision by generating context-insensitive constraints, we avoid the $exp_n$ blow-up that comes with full context sensitivity. However, it has been shown through practical benchmarks that since the types themselves capture relations between the inputs and outputs, the context-insensitive constraint generation suffices to prove a variety of complex programs safe [3, 18, 29].

When considering completeness properties in special cases, we point out completeness with respect to the discovery of refinement predicates in octagons/difference bounds abstract domains [24], template-based invariant generation for linear arithmetic [7], and extensions with uninterpreted function symbols [5], which carries over from the respective verification approaches.

Kobayashi [20, 21] gives an algorithm for model checking arbitrary $\mu$-calculus properties of finite-data programs with higher-order functions by a reduction to model checking for higher-order recursion schemes (HORS) [26]. For safety verification, RTI shows a promising alternative.

First, the reduction to HORS critically depends on a finite-state abstraction of the data. In contrast, our reduction defers the data abstraction to the abstract interpreter working on the imperative program, thus enabling the direct application of abstract interpreters working over infinite domains.
Since abstract interpreters over infinite abstract domains are strictly more powerful than (infinite families of) finite ones [8], our approach can be strictly more powerful for infinite-state programs.

Second, in the translation of an abstracted program to a HORS, this algorithm eliminates Boolean variables by enumerating all possible assignments to them, giving an exponential blow-up from the program to the HORS. In contrast, our technique preserves the Boolean state symbolically, enabling the use of efficient symbolic algorithms for verification. For example, consider the simple example:

    let f b1 ... bn x =
      if (b1 || ... || bn) then lock x;
      if (b1 || ... || bn) then unlock x
    in f (*) ... (*) (newlock ())

where we wish to prove that lock and unlock alternate. Kobayashi's translation [20] gives an exponential-sized HORS, with a version of f for each assignment to b1,...,bn. In contrast, our reduction preserves the source-level expressions and is linear, and amenable to symbolic verification techniques (e.g., BDDs). Previous experience with software model checking [2, 16, 17] shows that the number of reachable states is often drastically smaller than $2^p$, where $p$ is the number of Booleans. Thus, the pre-processing step that enumerates Booleans may not lead to a scalable implementation.

Might [23] describes logic-flow analysis, a general safety verification algorithm for higher-order languages, which is the product of a $k$-CFA-like call-strings analysis and a form of SMT-based predicate abstraction (together with widening). In contrast, our work shows how higher-order languages can be analyzed directly via abstract analyses designed for first-order imperative languages.

Inference of refinement types using counterexample-guided techniques was recently identified as a promising direction [31, 32]. In contrast, our approach is not limited to CEGAR and facilitates the applicability of a wide range of abstract interpretation techniques for precise reasoning about program data.
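Returning to the bound $exp_n(k)$ on the number of contexts quoted earlier in this section, a few lines of OCaml show how quickly the stack of exponentials outgrows machine integers. The function name `tower` and the choice to return `None` once the value no longer fits in a native `int` are assumptions of this sketch; $k$ is assumed to be non-negative.

```ocaml
(* exp_0(k) = k and exp_{n+1}(k) = 2^(exp_n(k)).
   Returns None as soon as the result would overflow a native int. *)
let rec tower n k =
  if n = 0 then Some k
  else
    match tower (n - 1) k with
    | Some e when e < Sys.int_size - 1 -> Some (1 lsl e)
    | _ -> None
```

For example, `tower 3 2` is `Some 65536`, while `tower 4 2` would be $2^{65536}$ and is reported as `None`.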
Software Verification. This work was motivated by the recent success in software model checking for first-order imperative programs [2, 6, 16, 22], and the desire to apply similar techniques to modern programming languages with higher-order functions. Our starting point was refinement types [14, 19], implemented in dependent ML [33] to give strong static guarantees, and the work on liquid types [18, 29] that applied predicate abstraction to infer refinement types. By enabling the application of automatic invariant generation from software model checking, RTI reduces the need for programmer annotations in refinement type systems.
References

[1] M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Lévy. Explicit substitutions. Journal of Functional Programming, 1:375–416, 1991.
[2] T. Ball and S.K. Rajamani. The SLAM project: debugging system software via static analysis. In POPL, pages 1–3. ACM, 2002.
[3] J. Bengtson, K. Bhargavan, C. Fournet, A.D. Gordon, and S. Maffeis. Refinement types for secure implementations. In CSF, 2008.
[4] D. Beyer, T. Henzinger, R. Majumdar, and A. Rybalchenko. Invariant synthesis in combination theories. In VMCAI, 2007.
[5] Dirk Beyer, Thomas A. Henzinger, Rupak Majumdar, and Andrey Rybalchenko. Invariant synthesis for combined theories. In VMCAI. Springer, 2007.
[6] B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival. A static analyzer for large safety-critical software. In PLDI, pages 196–207, 2003.
[7] Michael Colón, Sriram Sankaranarayanan, and Henny Sipma. Linear invariant generation using non-linear constraint solving. In CAV. Springer, 2003.
[8] P. Cousot and R. Cousot. Comparing the Galois connection and widening/narrowing approaches to abstract interpretation. In PLILP, LNCS 631, pages 269–295. Springer-Verlag, 1992.
[9] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In POPL. ACM Press, 1978.
[10] S. Cui, K. Donnelly, and H. Xi. Ats: A language that combines programming with theorem proving. In FroCos, 2005.
[11] L. Damas and R. Milner. Principal type-schemes for functional programs. In POPL, 1982.
[12] J. Dunfield. A Unified System of Type Refinements. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2007.
[13] C. Flanagan. Hybrid type checking. In POPL. ACM, 2006.
[14] T. Freeman and F. Pfenning. Refinement types for ML. In PLDI, 1991.
[15] S. Gulwani and G.C. Necula. Discovering affine equalities using random interpretation. In POPL, pages 74–84, 2003.
[16] T.A. Henzinger, R. Jhala, R. Majumdar, and K.L. McMillan. Abstractions from proofs. In POPL 04. ACM, 2004.
[17] H. Jain, F. Ivancic, A. Gupta, I. Shlyakhter, and C. Wang. Using statically computed invariants inside the predicate abstraction and refinement loop. In CAV, pages 137–151, 2006.
[18] M. Kawaguchi, P. Rondon, and R. Jhala. Type-based data structure verification. In PLDI, pages 304–315, 2009.
[19] K. Knowles and C. Flanagan. Type reconstruction for general refinement types. In ESOP, 2007.
[20] N. Kobayashi. Types and higher-order recursion schemes for verification of higher-order programs. In POPL, 2009.
[21] N. Kobayashi and C.-H.L. Ong. A type system equivalent to modal µ-calculus model checking of recursion schemes. In LICS, 2009.
[22] K. L. McMillan. Lazy abstraction with interpolants. In CAV. 2006.
[23] Matthew Might. Logic-flow analysis of higher-order programs. In POPL, pages 185–198, 2007.
[24] A. Miné. The octagon abstract domain. Higher-Order and Symbolic Computation, 19(1):31–100, 2006.
[25] M. Naik and J. Palsberg. A type system equivalent to a model checker. ACM Trans. Program. Lang. Syst., 30(5), 2008.
[26] C.-H.L. Ong. On model-checking trees generated by higher-order recursion schemes. In LICS, 2006.
[27] X. Ou, G. Tan, Y. Mandelbaum, and D. Walker. Dynamic typing with dependent types. In IFIP TCS, pages 437–450, 2004.
[28] A. Podelski and A. Rybalchenko. ARMC: The logical choice for software model checking with abstraction refinement. In PADL, 2007.
[29] P. Rondon, M. Kawaguchi, and R. Jhala. Liquid types. In PLDI, 2008.
[30] S. Sankaranarayanan, H.B. Sipma, and Z. Manna. Scalable analysis of linear systems using mathematical programming. In VMCAI, 2005.
[31] Tachio Terauchi. Dependent types from counterexamples. In POPL. ACM, 2010.
[32] Hiroshi Unno and Naoki Kobayashi. Dependent type inference with interpolants. In PPDP. ACM, 2009.
[33] H. Xi and F. Pfenning. Dependent types in practical programming. In