Soundness and Completeness of the NRB Verification Logic
Peter T. Breuer and Simon J. Pickin
Department of Computer Science, University of Birmingham, UK [email protected]
Facultad de Informática, Universidad Complutense de Madrid [email protected]
Abstract.
This short paper gives a model for and a proof of completeness of the NRB verification logic for deterministic imperative programs, the logic having been used in the past as the basis for automated semantic checks of large, fast-changing, open source C code archives, such as that of the Linux kernel source. The model is a coloured state transitions model that approximates from above the set of transitions possible for a program. Correspondingly, the logic catches all traces that may trigger a particular defect at a given point in the program, but may also flag false positives.
NRB program logic was first introduced in 2004 [5] as the theory supporting an automated semantic analysis suite [4] targeting the C code of the Linux kernel. The analyses performed with this kind of program logic and automatic tools are typically much more approximate than those provided by more interactive or heavyweight techniques such as theorem-proving and model-checking [10], respectively, but the NRB combination has proved capable of rapidly scanning millions of lines of C code and detecting deadlocks scattered at one per million lines of code [9]. A rough synopsis of the characteristics of the logic or an approach using the logic is that it is precise in terms of accurately following the often complex flow of control and sequence of events in an imperative language, but not very accurate at following data values. That is fine for a target language like C [1, 13], where static analysis cannot reasonably hope to follow all data values accurately because of the profligate use of indirection through pointers in a typical program (a pointer may access any part of memory, in principle, hence writing through a pointer might ‘magically’ change any value) and the NRB logic was designed to work around that problem by focussing instead on information derived from sequences of events.

NRB is a logic with modal operators. The modalities do not denote a full range of actions as in Dynamic Logic [12], but rather only the very particular action of the final exit from a code fragment being via a return, break, or goto. The logic is also configurable in detail to support the code abstractions that are of interest in different analyses; detecting the freeing of a record in memory while it may still be referenced requires an abstraction that counts the possible reference holders, for example, not the value currently in the second field from the right.
The technique became known as ‘symbolic approximation’ [6, 7] because of the foundation in symbolic logic and because the analysis is guaranteed to be on the alarmist side (‘approximate from above’); the analysis does not miss bugs in code, but does report false positives. In spite of a few years’ pedigree behind it now, a foundational semantics for the logic has only just been published [8] (as an Appendix to the main text), and this article aims to provide a yet simpler semantics for the logic and also a completeness result, with the aim of consolidating the technique’s bona fides.

Interestingly, the formal guarantee (‘never miss, over-report’) provided by NRB and the symbolic approximation technique is said not to be desirable in the commercial context by the very practical authors of the Coverity analysis tool [11, 3], which also has been used for static analysis of the Linux kernel and many very large C code projects. Allegedly, in the commercial arena, understandability of reports is crucial, not the guarantee that no bugs will be missed. The Coverity authors say that commercial clients tend to dismiss any reports that they do not understand, turning a deaf ear to explanations. However, the reports produced by our tools have always been filtered before presentation, so only the alarms that cannot be dismissed as false positives are seen.

The layout of this paper is as follows. In Section 2 a model of programs as sets of ‘coloured’ transitions between states is introduced, and the constructs of a generic imperative language are expressed in those terms. It is shown that the constructs obey certain algebraic laws, which soundly implement the established deduction rules of NRB logic.
Section 3 shows that the logic is complete for deterministic programs, in that anything that is true in the model introduced in Section 2 can be proved using the formal rules of the NRB logic.

Since the model contains at least as many state transitions as occur in reality, ‘soundness’ of the NRB logic means that it may construct false alarms for when a particular condition may be breached at some particular point in a program, but that it may not miss any real alarms. ‘Completeness’ means that the logic flags no more false alarms than are already to be predicted from the model, so if the model says that there ought to be no alarms at all (which means that there really are no alarms), then the logic can prove that. Thus, reasoning symbolically is not in principle an approximation here; it is not necessary to laboriously construct and examine the complete graph of modelled state transitions in order to be able to give a program a ‘clean bill of health’ with reference to some potential defect, because the logic can always do the job as well.

This section sets out a semantic model for the full NRBG(E) logic (‘NRB’ for short) shown in Table 1. The ‘NRBG’ part stands for ‘normal, return, break, goto’, and the ‘E’ part treats exceptions (catch/throw in Java, setjmp/longjmp in C), aiming at a complete treatment of classical imperative languages. This semantics simplifies a trace model presented in the Appendix to [8], replacing the traces used there with state transitions here.

A natural model of a program is as a relation of type $\mathbb{P}(S \times S)$, expressing possible changes in a state of type $S$ as a set of pairs of initial and final states. We shall add a colour to this picture. The ‘colour’ shows if the program has run normally through to the end (colour ‘$\mathbf{N}$’) or has terminated early via a return (colour ‘$\mathbf{R}$’), break (colour ‘$\mathbf{B}$’), goto (colour ‘$\mathbf{G}_l$’ for some label $l$) or an exception (colour ‘$\mathbf{E}_k$’ for some exception kind $k$).
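The aim is to document precisely the control flow in the program. In this picture, a deterministic program may be modelled as a set of ‘coloured’ transitions of type $\mathbb{P}(S \times \star \times S)$ where the colours $\star$ are a disjoint union

$$\star = \{\mathbf{N}\} \sqcup \{\mathbf{R}\} \sqcup \{\mathbf{B}\} \sqcup \{\mathbf{G}_l \mid l \in L\} \sqcup \{\mathbf{E}_k \mid k \in K\}$$

and $L$ is the set of possible goto labels and $K$ the set of possible exception kinds.

Table 1: NRB deduction rules for triples of assertions and programs. Unless explicitly noted, assumptions $\mathbf{G}_l p_l$ at left are passed down unaltered from top to bottom of each rule. We let $\mathcal{E}_1$ stand for any of $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_l$, $\mathbf{E}_k$; $\mathcal{E}_2$ any of $\mathbf{R}$, $\mathbf{G}_l$, $\mathbf{E}_k$; $\mathcal{E}_3$ any of $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_{l'}$ for $l' \neq l$, $\mathbf{E}_k$; $\mathcal{E}_4$ any of $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_l$, $\mathbf{E}_{k'}$ for $k' \neq k$; $[h]$ the body of the subroutine named $h$.

$$
\begin{array}{c}
\dfrac{\triangleright\ \{p\}\ P\ \{\mathbf{N}q \lor \mathcal{E}_1 x\} \qquad \triangleright\ \{q\}\ Q\ \{\mathbf{N}r \lor \mathcal{E}_1 x\}}{\triangleright\ \{p\}\ P ; Q\ \{\mathbf{N}r \lor \mathcal{E}_1 x\}}\ [\text{seq}]
\\[3ex]
\dfrac{\triangleright\ \{p\}\ P\ \{\mathbf{B}q \lor \mathbf{N}p \lor \mathcal{E}_2 x\}}{\triangleright\ \{p\}\ \mathbf{do}\ P\ \{\mathbf{N}q \lor \mathcal{E}_2 x\}}\ [\text{do}]
\\[3ex]
\dfrac{}{\triangleright\ \{p\}\ \mathbf{skip}\ \{\mathbf{N}p\}}\ [\text{skp}]
\qquad
\dfrac{}{\triangleright\ \{p\}\ \mathbf{return}\ \{\mathbf{R}p\}}\ [\text{ret}]
\qquad
\dfrac{}{\triangleright\ \{p\}\ \mathbf{break}\ \{\mathbf{B}p\}}\ [\text{brk}]
\\[3ex]
\dfrac{}{\mathbf{G}_l p_l\ \triangleright\ \{p\}\ \mathbf{goto}\ l\ \{\mathbf{G}_l p\}}\ [\text{go}]\ \ [p \to p_l]
\qquad
\dfrac{}{\triangleright\ \{p\}\ \mathbf{throw}\ k\ \{\mathbf{E}_k p\}}\ [\text{throw}]
\\[3ex]
\dfrac{}{\triangleright\ \{q[e/x]\}\ x = e\ \{\mathbf{N}q\}}\ [\text{let}]
\qquad
\dfrac{\triangleright\ \{q \land p\}\ P\ \{r\}}{\triangleright\ \{p\}\ q \to P\ \{r\}}\ [\text{grd}]
\\[3ex]
\dfrac{\triangleright\ \{p\}\ P\ \{q\} \qquad \triangleright\ \{p\}\ Q\ \{q\}}{\triangleright\ \{p\}\ P \mathbin{\square} Q\ \{q\}}\ [\text{dsj}]
\\[3ex]
\dfrac{\mathbf{G}_l p_l\ \triangleright\ \{p\}\ P\ \{q\}}{\mathbf{G}_l p_l\ \triangleright\ \{p\}\ P : l\ \{q\}}\ [\text{frm}]\ \ [\mathbf{N}p_l \to q]
\\[3ex]
\dfrac{\mathbf{G}_l p_l\ \triangleright\ \{p\}\ P\ \{\mathbf{G}_l p_l \lor \mathbf{N}q \lor \mathcal{E}_3 x\}}{\triangleright\ \{p\}\ \mathbf{label}\ l.P\ \{\mathbf{N}q \lor \mathcal{E}_3 x\}}\ [\text{lbl}]
\\[3ex]
\dfrac{\triangleright\ \{p\}\ [h]\ \{\mathbf{R}r \lor \mathbf{E}_k x_k\}}{\mathbf{G}_l p_l\ \triangleright\ \{p\}\ \mathbf{call}\ h\ \{\mathbf{N}r \lor \mathbf{E}_k x_k\}}\ [\text{sub}]
\\[3ex]
\dfrac{\triangleright\ \{p\}\ P\ \{\mathbf{N}r \lor \mathbf{E}_k q \lor \mathcal{E}_4 x\} \qquad \triangleright\ \{q\}\ Q\ \{\mathbf{N}r \lor \mathbf{E}_k x_k \lor \mathcal{E}_4 x\}}{\triangleright\ \{p\}\ \mathbf{try}\ P\ \mathbf{catch}(k)\ Q\ \{\mathbf{N}r \lor \mathbf{E}_k x_k \lor \mathcal{E}_4 x\}}\ [\text{try}]
\\[3ex]
\dfrac{\triangleright\ \{p_i\}\ P\ \{q\}}{\triangleright\ \{\bigvee p_i\}\ P\ \{q\}}
\qquad
\dfrac{\triangleright\ \{p\}\ P\ \{q_i\}}{\triangleright\ \{p\}\ P\ \{\bigwedge q_i\}}
\qquad
\dfrac{\mathbf{G}_l p_{l i}\ \triangleright\ \{p\}\ P\ \{q\}}{\bigvee \mathbf{G}_l p_{l i}\ \triangleright\ \{p\}\ P\ \{q\}}
\\[3ex]
\dfrac{\mathbf{G}_l p_l\ \triangleright\ \{p\}\ P\ \{q\}}{\mathbf{G}_l p'_l\ \triangleright\ \{p'\}\ P\ \{q'\}}\ \ [p' \to p,\ q \to q',\ p'_l \to p_l \mid \mathbf{G}_l q' \to \mathbf{G}_l p'_l]
\end{array}
$$

The programs we consider are in fact deterministic, but we will use the general setting.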
Where the relation is not defined on some initial state $s$, we understand that the initial state $s$ leads to the program getting hung up in an infinite loop, instead of terminating. Relations representing deterministic programs thus have a set of images for any given initial state that is either of size zero (‘hangs’) or one (‘terminates’). Only paths through the program that do not ‘hang’ in an infinite loop are of interest to us, and what the NRB logic will say about a program at some point will be true only supposing control reaches that point, which it may never do.

Programs are put together in sequence with the second program accepting as inputs only the states that the first program ends ‘normally’ with. Otherwise the state with which the first program exited abnormally is the final outcome. That is,

$$\llbracket P ; Q \rrbracket = \{ s_0 \overset{\iota}{\mapsto} s_1 \in \llbracket P \rrbracket \mid \iota \neq \mathbf{N} \} \cup \{ s_0 \overset{\iota}{\mapsto} s_2 \mid s_1 \overset{\iota}{\mapsto} s_2 \in \llbracket Q \rrbracket,\ s_0 \overset{\mathbf{N}}{\mapsto} s_1 \in \llbracket P \rrbracket \}$$

Table 2: Models of simple statements.

- A skip statement is modelled as $\llbracket \mathbf{skip} \rrbracket_g = \{ s \overset{\mathbf{N}}{\mapsto} s \mid s \in S \}$. It makes the transition from a state to the same state again, and ends ‘normally’.
- A return statement has the model $\llbracket \mathbf{return} \rrbracket_g = \{ s \overset{\mathbf{R}}{\mapsto} s \mid s \in S \}$. It exits at once ‘via a return flow’ after a single, trivial transition.
- The model of skip; return is $\llbracket \mathbf{skip} ; \mathbf{return} \rrbracket_g = \{ s \overset{\mathbf{R}}{\mapsto} s \mid s \in S \}$, which is the same as that of return. It is made up of the compound of two trivial state transitions, $s \overset{\mathbf{N}}{\mapsto} s$ from skip and $s \overset{\mathbf{R}}{\mapsto} s$ from return, the latter ending in a ‘return flow’.
- The return; skip compound is modelled as $\llbracket \mathbf{return} ; \mathbf{skip} \rrbracket_g = \{ s \overset{\mathbf{R}}{\mapsto} s \mid s \in S \}$. It is made up of just the $s \overset{\mathbf{R}}{\mapsto} s$ transitions from return. There is no transition that can be formed as the composition of a transition from return followed by a transition from skip, because none of the first end ‘normally’.

The statement of the sequence model given above is not complete, however, because abnormal exits with a goto from $P$ may still re-enter in $Q$ if the goto label is in $Q$, and proceed.
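The sequence model as stated so far (without goto re-entry), together with the skip and return models of Table 2, can be illustrated by a minimal Python sketch. The three-element state space and the names `S`, `skip`, `ret` and `seq` are assumptions made for illustration only; they are not part of the paper, whose state space is arbitrary.

```python
# Hypothetical finite state space; the paper's S is arbitrary.
S = range(3)

# A coloured transition is a triple (initial state, colour, final state).

def skip():
    # skip: every state steps to itself and ends 'normally' (colour N)
    return {(s, "N", s) for s in S}

def ret():
    # return: every state steps to itself and ends in a 'return flow' (colour R)
    return {(s, "R", s) for s in S}

def seq(P, Q):
    # Abnormal exits of P are final outcomes of P;Q.
    abnormal = {(s0, c, s1) for (s0, c, s1) in P if c != "N"}
    # Normal exits of P are fed to Q, which supplies the final colour.
    chained = {(s0, c, s2)
               for (s0, n, s1) in P if n == "N"
               for (t1, c, s2) in Q if t1 == s1}
    return abnormal | chained

# The two compound rows of Table 2:
assert seq(skip(), ret()) == ret()   # skip; return has the model of return
assert seq(ret(), skip()) == ret()   # return; skip has the model of return
```

Representing a program as a plain set of triples keeps the sketch close to the type $\mathbb{P}(S \times \star \times S)$; a ‘hanging’ state is simply one with no outgoing triple.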
We postpone consideration of this eventuality by predicating the model with the sets of states $g_l$ hypothesised as being fed in at the label $l$ in the code. The models of $P$ and $Q$ with these sets as assumptions produce outputs that take account of these putative extra inputs at label $l$:

$$\llbracket P ; Q \rrbracket_g = \{ s_0 \overset{\iota}{\mapsto} s_1 \in \llbracket P \rrbracket_g \mid \iota \neq \mathbf{N} \} \cup \{ s_0 \overset{\iota}{\mapsto} s_2 \mid s_1 \overset{\iota}{\mapsto} s_2 \in \llbracket Q \rrbracket_g,\ s_0 \overset{\mathbf{N}}{\mapsto} s_1 \in \llbracket P \rrbracket_g \}$$

Later, we will tie things up by ensuring that the sets of states bound to early exits via a goto $l$ in $P$ are exactly the sets $g_l$ hypothesised here as entries at label $l$ in $Q$ (and vice versa). The type of the interpretation expressed by the fancy square brackets is

$$\llbracket - \rrbracket_{-} : \mathscr{C} \to (L \rightharpoonup \mathbb{P}S) \to \mathbb{P}(S \times \star \times S)$$

where $g$, the second argument/suffix, has the partial function type $L \rightharpoonup \mathbb{P}S$ and the first argument/bracket interior has type $\mathscr{C}$, denoting a simple language of imperative statements whose grammar is set out in Table 3. The models of some of its very basic statements as members of $\mathbb{P}(S \times \star \times S)$ are shown in Table 2 and we will discuss them and the interpretations of other language constructs below.

Table 3: Grammar of the abstract imperative language $\mathscr{C}$, where integer variables $x \in X$, term expressions $e \in \mathscr{E}$, boolean expressions $b \in \mathscr{B}$, labels $l \in L$, exceptions $k \in K$, statements $c \in \mathscr{C}$, integer constants $n \in \mathbb{Z}$, infix binary relations $r \in \mathscr{R}$, subroutine names $h \in H$. Note that labels (the targets of gotos) are declared with ‘label’ and a label cannot be the first thing in a code sequence; it must follow some statement. Instead of if, $\mathscr{C}$ has guarded statements, and explicit nondeterminism, which, however, is only to be used here in the deterministic construct $b \to P \mathbin{\square} \lnot b \to Q$ for code fragments $P$, $Q$.

$$
\begin{array}{rcl}
\mathscr{C} & ::= & \mathbf{skip} \mid \mathbf{return} \mid \mathbf{break} \mid \mathbf{goto}\ l \mid c ; c \mid x = e \mid b \to c \mid c \mathbin{\square} c \mid \mathbf{do}\ c \mid c : l \mid \mathbf{label}\ l.c \mid \mathbf{call}\ h \mid \mathbf{try}\ c\ \mathbf{catch}(k)\ c \mid \mathbf{throw}\ k \\
\mathscr{E} & ::= & n \mid x \mid n * e \mid e + e \mid b\ ?\ e : e \\
\mathscr{B} & ::= & \top \mid \bot \mid e\ r\ e \mid b \lor b \mid b \land b \mid \lnot b \mid \exists x.b \\
\mathscr{R} & ::= & {<} \mid {>} \mid {\le} \mid {\ge} \mid {=} \mid {\neq}
\end{array}
$$

A real imperative programming language such as C can be mapped onto $\mathscr{C}$, in principle exactly, but in practice rather approximately with respect to data values, as will be indicated below. A conventional if ($b$) $P$ else $Q$ statement in C is written as the nondeterministic choice between two guarded statements $b \to P \mathbin{\square} \lnot b \to Q$ in the abstract language $\mathscr{C}$; the conventional while ($b$) $P$ loop in C is expressed as $\mathbf{do}\ \{ \lnot b \to \mathbf{break} \mathbin{\square} b \to P \}$, using the forever-loop of $\mathscr{C}$, etc. A sequence $P ; l : Q$ in C with a label $l$ in the middle should strictly be expressed as $P : l ; Q$ in $\mathscr{C}$, but we regard $P ; l : Q$ as syntactic sugar for that, so it is still permissible to write $P ; l : Q$ in $\mathscr{C}$. As a very special syntactic sweetener, we permit $l : Q$ too, even when there is no preceding statement $P$, regarding it as an abbreviation for $\mathbf{skip} : l ; Q$.

Curly brackets may be used to group code statements for clarity in $\mathscr{C}$, and parentheses may be used to group expressions. The variables are globals and are not formally declared. The terms of $\mathscr{C}$ are piecewise linear integer forms in integer variables, so the boolean expressions are piecewise comparisons between linear forms.

Example 1.
A valid integer term is ‘$5x + 4y + 3$’, and a boolean expression is ‘$5x + 4y + 3 < z - {} \land y \le x$’. In consequence another valid integer term, taking the value of the first on the range defined by the second, and 0 otherwise, is ‘$(5x + 4y + 3 < z - {} \land y \le x)\ ?\ 5x + 4y + 3 : 0$’.

The limited set of terms in $\mathscr{C}$ makes it practically impossible to map standard imperative language assignments as simple as ‘$x = x * y$’ or ‘$x = x \mid y$’ (the bitwise or) succinctly. In principle, those could be expressed exactly point by point using conditional expressions (with at most disjuncts), but it is usual to model all those cases by means of an abstraction away from the values taken to attributes that can be represented more elegantly using piecewise linear terms. The abstraction may be to how many times the variable has been read since last written, for example, which maps ‘$x = x * y$’ to ‘$x = x + 1;\ y = y + 1;\ x = 0$’.

Formally, terms have a conventional evaluation as integers and booleans that is shown (for completeness!) in Table 4. The reader may note the notation $s\,x$ for the evaluation of the variable named $x$ in state $s$, giving its integer value as result. We say that state $s$ satisfies boolean term $b \in \mathscr{B}$, written $s \models b$, whenever $\llbracket b \rrbracket s$ holds.

Table 4: The conventional evaluation of integer and boolean terms of $\mathscr{C}$, for variables $x \in X$, integer constants $\kappa \in \mathbb{Z}$, using $s\,x$ for the (integer) value of the variable named $x$ in a state $s$. The form $b[n/x]$ means ‘expression $b$ with integer $n$ substituted for all unbound occurrences of $x$’.

$$
\begin{array}{rcl}
\llbracket - \rrbracket & : & \mathscr{E} \to S \to \mathbb{Z} \\
\llbracket x \rrbracket s & = & s\,x \\
\llbracket \kappa \rrbracket s & = & \kappa \\
\llbracket \kappa * e \rrbracket s & = & \kappa * \llbracket e \rrbracket s \\
\llbracket e_1 + e_2 \rrbracket s & = & \llbracket e_1 \rrbracket s + \llbracket e_2 \rrbracket s \\
\llbracket b\ ?\ e_1 : e_2 \rrbracket s & = & \text{if } \llbracket b \rrbracket s \text{ then } \llbracket e_1 \rrbracket s \text{ else } \llbracket e_2 \rrbracket s \\[2ex]
\llbracket - \rrbracket & : & \mathscr{B} \to S \to \mathrm{bool} \\
\llbracket \top \rrbracket s & = & \top \\
\llbracket \bot \rrbracket s & = & \bot \\
\llbracket e_1 < e_2 \rrbracket s & = & \llbracket e_1 \rrbracket s < \llbracket e_2 \rrbracket s \\
\llbracket b_1 \lor b_2 \rrbracket s & = & \llbracket b_1 \rrbracket s \lor \llbracket b_2 \rrbracket s \\
\llbracket b_1 \land b_2 \rrbracket s & = & \llbracket b_1 \rrbracket s \land \llbracket b_2 \rrbracket s \\
\llbracket \lnot b \rrbracket s & = & \lnot (\llbracket b \rrbracket s) \\
\llbracket \exists x.b \rrbracket s & = & \exists n \in \mathbb{Z}.\ \llbracket b[n/x] \rrbracket s
\end{array}
$$

The label construct of $\mathscr{C}$ declares a label $l \in L$ that may subsequently be used as the target in gotos. The component $P$ of the construct is the body of code in which the label is in scope. A label may not be mentioned except in the scope of its declaration. The same label may not be declared again in the scope of the first declaration. The semantics of labels and gotos will be further explained below.

The only way of exiting the $\mathscr{C}$ do loop construct normally is via break in the body $P$ of the loop. An abnormal exit other than break from the body $P$ terminates the whole loop abnormally. Terminating the body $P$ normally evokes one more turn round the loop. So conventional while and for loops need to be mapped to a do loop with a guarded break statement inside, at the head of the body. The precise models for this and every construct of $\mathscr{C}$ as a set of coloured transitions are enumerated in Table 5.

Among the list of models in Table 5, that of label declarations in particular requires explanation because labels are more explicitly controlled in $\mathscr{C}$ than in standard imperative languages. Declaring a label $l$ makes it invisible from the outside of the block (while enabling it to be used inside), working just the same way as a local variable declaration does in a standard imperative programming language. A declaration removes from the model of a labelled statement the dependence on the hypothetical set $g_l$ of the states attained at goto $l$ statements. All the instances of goto $l$ statements are inside the block with the declaration at its head, so we can take a look to see what totality of states really do accrue at goto $l$ statements; they are recognisable in the model because they are the outcomes of the transitions that are marked with $\mathbf{G}_l$. Equating the set of such states with the hypothesis $g_l$ gives the (least) fixpoint $g^*_l$ required in the label $l$ model.

The hypothetical sets $g_l$ of states that obtain at goto $l$ statements are used at the point where the label $l$ appears within the scope of the declaration.
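We say that any of the states in $g_l$ may be an outcome of passing through the label $l$, because it may have been brought in by a goto $l$ statement. That is an overestimate; in reality, if the state just before the label is $s_1$, then at most those states $s_2$ in $g_l$ that are reachable at a goto $l$ from an initial program state $s_0$ that also leads to $s_1$ (either $s_1$ first or $s_2$ first) may obtain after the label $l$, and that may be considerably fewer $s_2$ than we calculate in $g^*_l$. Here is a visualisation of such a situation; the curly arrows denote a trace:

$$
\begin{array}{ccl}
 & \rightsquigarrow & \{s_1\}\ \ l : \{s_1, s_2\} \\
\{s_0\} & & \\
 & \rightsquigarrow & \{s_2\}\ \ \mathbf{goto}\ l
\end{array}
$$

If the initial precondition on the code admits more than one initial state $s_0$ then the model may admit more states $s_2$ after the label $l$ than occur in reality when $s_1$ precedes $l$, because the model does not take into account the dependence of $s_2$ on $s_1$ through $s_0$. It is enough for the model that $s_1$ proceeds from some $s_0$ and $s_2$ proceeds from some (possibly different) $s_0$ satisfying the same initial condition. In mitigation, gotos are sparsely distributed in real codes and we have not found the effect detrimental.

Table 5: Model of programs of language $\mathscr{C}$, given as hypothesis the sets of states $g_l$ for $l \in L$ observable at goto $l$ statements. A recursive reference means ‘the least set satisfying the condition’. For $h \in H$, the subroutine named $h$ has code $[h]$. The state $s$ altered by the assignment of $n$ to variable $x$ is written $s[x \mapsto n]$.

$$
\begin{array}{rcl}
\llbracket - \rrbracket_g & : & \mathscr{C} \to \mathbb{P}(S \times \star \times S) \\
\llbracket \mathbf{skip} \rrbracket_g & = & \{ s_0 \overset{\mathbf{N}}{\mapsto} s_0 \mid s_0 \in S \} \\
\llbracket \mathbf{return} \rrbracket_g & = & \{ s_0 \overset{\mathbf{R}}{\mapsto} s_0 \mid s_0 \in S \} \\
\llbracket \mathbf{break} \rrbracket_g & = & \{ s_0 \overset{\mathbf{B}}{\mapsto} s_0 \mid s_0 \in S \} \\
\llbracket \mathbf{goto}\ l \rrbracket_g & = & \{ s_0 \overset{\mathbf{G}_l}{\mapsto} s_0 \mid s_0 \in S \} \\
\llbracket \mathbf{throw}\ k \rrbracket_g & = & \{ s_0 \overset{\mathbf{E}_k}{\mapsto} s_0 \mid s_0 \in S \} \\
\llbracket P ; Q \rrbracket_g & = & \{ s_0 \overset{\iota}{\mapsto} s_1 \in \llbracket P \rrbracket_g \mid \iota \neq \mathbf{N} \} \\
 & \cup & \{ s_0 \overset{\iota}{\mapsto} s_2 \mid s_1 \overset{\iota}{\mapsto} s_2 \in \llbracket Q \rrbracket_g,\ s_0 \overset{\mathbf{N}}{\mapsto} s_1 \in \llbracket P \rrbracket_g \} \\
\llbracket x = e \rrbracket_g & = & \{ s_0 \overset{\mathbf{N}}{\mapsto} s_0[x \mapsto \llbracket e \rrbracket s_0] \mid s_0 \in S \} \\
\llbracket p \to P \rrbracket_g & = & \{ s_0 \overset{\iota}{\mapsto} s_1 \in \llbracket P \rrbracket_g \mid \llbracket p \rrbracket s_0 \} \\
\llbracket P \mathbin{\square} Q \rrbracket_g & = & \llbracket P \rrbracket_g \cup \llbracket Q \rrbracket_g \\
\llbracket \mathbf{do}\ P \rrbracket_g & = & \{ s_0 \overset{\mathbf{N}}{\mapsto} s_1 \mid s_0 \overset{\mathbf{B}}{\mapsto} s_1 \in \llbracket P \rrbracket_g \} \\
 & \cup & \{ s_0 \overset{\iota}{\mapsto} s_1 \in \llbracket P \rrbracket_g \mid \iota \neq \mathbf{N}, \mathbf{B} \} \\
 & \cup & \{ s_0 \overset{\iota}{\mapsto} s_2 \mid s_1 \overset{\iota}{\mapsto} s_2 \in \llbracket \mathbf{do}\ P \rrbracket_g,\ s_0 \overset{\mathbf{N}}{\mapsto} s_1 \in \llbracket P \rrbracket_g \} \\
\llbracket P : l \rrbracket_g & = & \llbracket P \rrbracket_g \cup \{ s_0 \overset{\mathbf{N}}{\mapsto} s_1 \mid s_0 \in S,\ s_1 \in g_l \} \\
\llbracket \mathbf{label}\ l.P \rrbracket_g & = & \llbracket P \rrbracket_{g \cup \{ l \mapsto g^*_l \}} - g^*_l \quad \text{where } g^*_l = \{ s_1 \mid s_0 \overset{\mathbf{G}_l}{\mapsto} s_1 \in \llbracket P \rrbracket_{g \cup \{ l \mapsto g^*_l \}} \} \\
\llbracket \mathbf{call}\ h \rrbracket_g & = & \{ s_0 \overset{\mathbf{N}}{\mapsto} s_1 \mid s_0 \overset{\mathbf{R}}{\mapsto} s_1 \in \llbracket [h] \rrbracket_{\{\}} \} \cup \{ s_0 \overset{\mathbf{E}_k}{\mapsto} s_1 \in \llbracket [h] \rrbracket_{\{\}} \mid k \in K \} \\
\llbracket \mathbf{try}\ P\ \mathbf{catch}(k)\ Q \rrbracket_g & = & \{ s_0 \overset{\iota}{\mapsto} s_1 \in \llbracket P \rrbracket_g \mid \iota \neq \mathbf{E}_k \} \\
 & \cup & \{ s_0 \overset{\iota}{\mapsto} s_2 \mid s_1 \overset{\iota}{\mapsto} s_2 \in \llbracket Q \rrbracket_g,\ s_0 \overset{\mathbf{E}_k}{\mapsto} s_1 \in \llbracket P \rrbracket_g \}
\end{array}
$$

Example 2.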
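Consider the code $R$ and suppose the input is restricted to a unique state $s$:

$$\mathbf{label}\ A, B.\ \overbrace{\underbrace{\mathbf{skip} ;\ \mathbf{goto}\ A ;\ B : \mathbf{return} ;\ A}_{Q} : \mathbf{goto}\ B}^{P}$$

with labels $A$, $B$ in scope in body $P$, and the marked fragment $Q$. The single transitions made in the code $P$ and the corresponding statement sequences are:

$$
\begin{array}{ll}
s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{G}_A}{\mapsto} s & \mathbf{skip} ;\ \mathbf{goto}\ A \\
s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{G}_B}{\mapsto} s & \mathbf{skip} ;\ \mathbf{goto}\ A ;\ A : \mathbf{goto}\ B \\
s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{R}}{\mapsto} s & \mathbf{skip} ;\ \mathbf{goto}\ A ;\ A : \mathbf{goto}\ B ;\ B : \mathbf{return}
\end{array}
$$

with observed states $g_A = \{s\}$, $g_B = \{s\}$ at the labels $A$ and $B$ respectively.

The goto $B$ statement is not in the fragment $Q$ so there is no way of knowing about the set of states at goto $B$ while examining $Q$. Without that input, the traces of $Q$ are

$$
\begin{array}{ll}
s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{G}_A}{\mapsto} s & \mathbf{skip} ;\ \mathbf{goto}\ A \\
s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{N}}{\mapsto} s & \mathbf{skip} ;\ \mathbf{goto}\ A ;\ A :
\end{array}
$$

There are no possible entries at $B$ originating from within $Q$ itself. That is, the model $\llbracket Q \rrbracket_g$ of $Q$ as a set of transitions assuming $g_B = \{\}$, meaning there are no entries from outside, is $\llbracket Q \rrbracket_g = \{ s \overset{\mathbf{N}}{\mapsto} s,\ s \overset{\mathbf{G}_A}{\mapsto} s \}$.

When we hypothesise $g_B = \{s\}$ for $Q$, then $Q$ has more traces:

$$
\begin{array}{ll}
s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{N}}{\mapsto} s \overset{\mathbf{R}}{\mapsto} s & \mathbf{skip} ;\ \mathbf{goto}\ A ;\ A : \mathbf{goto}\ B ;\ B : \mathbf{return}
\end{array}
$$

corresponding to these entries at $B$ from the rest of the code proceeding to the return in $Q$, and $\llbracket Q \rrbracket_g = \{ s \overset{\mathbf{N}}{\mapsto} s,\ s \overset{\mathbf{G}_A}{\mapsto} s,\ s \overset{\mathbf{R}}{\mapsto} s \}$. In the context of the whole code $P$, that is the model for $Q$ as a set of initial to final state transitions.

Table 6: Extending the language $\mathscr{B}$ of propositions to modal operators $\mathbf{N}$, $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_l$, $\mathbf{E}_k$ for $l \in L$, $k \in K$. An evaluation on transitions is given for $b \in \mathscr{B}$, $b^* \in \mathscr{B}^*$.

$$
\begin{array}{rcl}
\mathscr{B}^* & ::= & b \mid \mathbf{N} b^* \mid \mathbf{R} b^* \mid \mathbf{B} b^* \mid \mathbf{G}_l b^* \mid \mathbf{E}_k b^* \mid b^* \lor b^* \mid b^* \land b^* \mid \lnot b^* \\[1ex]
\llbracket b \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) & = & \llbracket b \rrbracket s_1 \\
\llbracket \mathbf{N} b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) & = & (\iota = \mathbf{N}) \land \llbracket b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) \\
\llbracket \mathbf{R} b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) & = & (\iota = \mathbf{R}) \land \llbracket b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) \\
\llbracket \mathbf{B} b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) & = & (\iota = \mathbf{B}) \land \llbracket b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) \\
\llbracket \mathbf{G}_l b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) & = & (\iota = \mathbf{G}_l) \land \llbracket b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) \\
\llbracket \mathbf{E}_k b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1) & = & (\iota = \mathbf{E}_k) \land \llbracket b^* \rrbracket (s_0 \overset{\iota}{\mapsto} s_1)
\end{array}
$$

Example 3.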
Staying with the code of Example 2, the set $\{ s \overset{\mathbf{G}_A}{\mapsto} s,\ s \overset{\mathbf{G}_B}{\mapsto} s,\ s \overset{\mathbf{R}}{\mapsto} s \}$ is the model $\llbracket P \rrbracket_g$ of $P$ starting at state $s$ with assumptions $g_A$, $g_B$ of Example 2, and the sets $g_A$, $g_B$ are observed at the labels $A$, $B$ in the code under these assumptions. Thus $\{ A \mapsto g_A,\ B \mapsto g_B \}$ is the fixpoint $g^*$ of the label declaration rule in Table 5.

That rule says to next remove transitions ending at goto $A$s and $B$s from visibility in the model of the declaration block, because they can go nowhere else, leaving only $\llbracket R \rrbracket_{\{\}} = \{ s \overset{\mathbf{R}}{\mapsto} s \}$ as the set-of-transitions model of the whole block of code, which corresponds to the sequence skip; goto $A$; $A$: goto $B$; $B$: return.

We extend the propositional language to $\mathscr{B}^*$ which includes the modal operators $\mathbf{N}$, $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_l$, $\mathbf{E}_k$ for $l \in L$, $k \in K$, as shown in Table 6, which defines a model of $\mathscr{B}^*$ on transitions. The predicate $\mathbf{N}p$ informally should be read as picking out from the set of all coloured state transitions ‘those normal-coloured transitions that produce a state satisfying $p$’, and similarly for the other operators. The modal operators satisfy the algebraic laws given in Table 7.

Table 7: Laws of the modal operators $\mathbf{N}$, $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_l$, $\mathbf{E}_k$ with $M, M_1, M_2 \in \{ \mathbf{N}, \mathbf{R}, \mathbf{B}, \mathbf{G}_l, \mathbf{E}_k \mid l \in L, k \in K \}$ and $M_1 \neq M_2$.

$$
\begin{array}{rcll}
M(\bot) & = & \bot & \text{(flatness)} \\
M(b_1 \lor b_2) & = & M(b_1) \lor M(b_2) & \text{(disjunctivity)} \\
M(b_1 \land b_2) & = & M(b_1) \land M(b_2) & \text{(conjunctivity)} \\
M(M b) & = & M b & \text{(idempotence)} \\
M_2(M_1 b) & = & M_1(b) \land M_2(b) = \bot & \text{(orthogonality)}
\end{array}
$$

Additionally, however, for non-modal $p \in \mathscr{B}$,

$$p = \mathbf{N}p \lor \mathbf{R}p \lor \mathbf{B}p \lor \bigvee_{l \in L} \mathbf{G}_l p \lor \bigvee_{k \in K} \mathbf{E}_k p \qquad (1)$$

because each transition must be some colour, and those are all the colours. The decomposition works in the general case too:

Proposition 1.
Every $p \in \mathscr{B}^*$ can be (uniquely) expressed as

$$p = \mathbf{N}p_{\mathbf{N}} \lor \mathbf{R}p_{\mathbf{R}} \lor \mathbf{B}p_{\mathbf{B}} \lor \bigvee_{l \in L} \mathbf{G}_l p_{\mathbf{G}_l} \lor \bigvee_{k \in K} \mathbf{E}_k p_{\mathbf{E}_k}$$

for some $p_{\mathbf{N}}$, $p_{\mathbf{R}}$, etc. that are free of modal operators.

Proof. Equation (1) gives the result for $p \in \mathscr{B}$. The rest is by structural induction on $p$, using Table 7 and boolean algebra. Uniqueness follows because $\mathbf{N}p_{\mathbf{N}} = \mathbf{N}p'_{\mathbf{N}}$, for example, applying $\mathbf{N}$ to two possible decompositions, and applying the orthogonality and idempotence laws; apply the definition of $\mathbf{N}$ in the model in Table 6 to deduce $p_{\mathbf{N}} = p'_{\mathbf{N}}$ for non-modal predicates $p_{\mathbf{N}}$, $p'_{\mathbf{N}}$. Similarly for $\mathbf{B}$, $\mathbf{R}$, $\mathbf{G}_l$, $\mathbf{E}_k$.

So modal formulae $p \in \mathscr{B}^*$ may be viewed as tuples $(p_{\mathbf{N}}, p_{\mathbf{R}}, p_{\mathbf{B}}, p_{\mathbf{G}_l}, p_{\mathbf{E}_k})$ of non-modal formulae from $\mathscr{B}$ for labels $l \in L$, exception kinds $k \in K$. That means that $\mathbf{N}p \lor \mathbf{R}q$, for example, is simply a convenient notation for writing down two assertions at once: one that asserts $p$ of the final states of the transitions that end ‘normally’, and one that asserts $q$ of the final states of the transitions that end in a ‘return flow’. The meaning of $\mathbf{N}p \lor \mathbf{R}q$ is the union of the set of the normal transitions whose final states satisfy $p$ with the set of the transitions that end in a ‘return flow’ and whose final states satisfy $q$. We can now give meaning to a notation that looks like (and is intended to signify) a Hoare triple with an explicit context of certain ‘goto assumptions’:

Definition 1.
Let $g_l = \llbracket p_l \rrbracket$ be the set of states satisfying $p_l \in \mathscr{B}$, labels $l \in L$. Then ‘$\mathbf{G}_l p_l \triangleright \{p\}\ P\ \{q\}$’, for non-modal $p, p_l \in \mathscr{B}$, $P \in \mathscr{C}$ and $q \in \mathscr{B}^*$, means:

$$\llbracket \mathbf{G}_l p_l \triangleright \{p\}\ P\ \{q\} \rrbracket = \llbracket \{p\}\ P\ \{q\} \rrbracket_g = \forall\, s_0 \overset{\iota}{\mapsto} s_1 \in \llbracket P \rrbracket_g.\ \llbracket p \rrbracket s_0 \Rightarrow \llbracket q \rrbracket (s_0 \overset{\iota}{\mapsto} s_1)$$

That is read as ‘the triple $\{p\}\ P\ \{q\}$ holds under assumptions $p_l$ at goto $l$ when every transition of $P$ that starts at a state satisfying $p$ also satisfies $q$’. The explicit Gentzen-style assumptions $p_l$ are free of modal operators. What is meant by the notation is that those states that may be attainable as the program traces pass through goto statements are assumed to be restricted to those that satisfy $p_l$.

The $\mathbf{G}_l p_l$ assumptions may be separated by commas, as $\mathbf{G}_{l_1} p_{l_1}, \mathbf{G}_{l_2} p_{l_2}, \dots$, with $l_1 \neq l_2$, etc. Or they may be written as a disjunction $\mathbf{G}_{l_1} p_{l_1} \lor \mathbf{G}_{l_2} p_{l_2} \lor \dots$ because the information in this modal formula is only the mapping $l_1 \mapsto p_{l_1}$, $l_2 \mapsto p_{l_2}$, etc. If the same $l$ appears twice among the disjuncts $\mathbf{G}_l p_l$, then we understand that the union of the two $p_l$ is intended.

Now we can prove the validity of laws about triples drawn from what Definition 1 says. The first laws are strengthening and weakening results on pre- and postconditions:

Proposition 2.
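The following algebraic relations hold:

$$
\begin{array}{rclr}
\llbracket \{\bot\}\ P\ \{q\} \rrbracket_g & \Leftrightarrow & \top & (2) \\
\llbracket \{p\}\ P\ \{\top\} \rrbracket_g & \Leftrightarrow & \top & (3) \\
\llbracket \{p_1 \lor p_2\}\ P\ \{q\} \rrbracket_g & \Leftrightarrow & \llbracket \{p_1\}\ P\ \{q\} \rrbracket_g \land \llbracket \{p_2\}\ P\ \{q\} \rrbracket_g & (4) \\
\llbracket \{p\}\ P\ \{q_1 \land q_2\} \rrbracket_g & \Leftrightarrow & \llbracket \{p\}\ P\ \{q_1\} \rrbracket_g \land \llbracket \{p\}\ P\ \{q_2\} \rrbracket_g & (5) \\
(p_1 \to p_2) \land \llbracket \{p_2\}\ P\ \{q\} \rrbracket_g & \Rightarrow & \llbracket \{p_1\}\ P\ \{q\} \rrbracket_g & (6) \\
(q_1 \to q_2) \land \llbracket \{p\}\ P\ \{q_1\} \rrbracket_g & \Rightarrow & \llbracket \{p\}\ P\ \{q_2\} \rrbracket_g & (7) \\
\llbracket \{p\}\ P\ \{q\} \rrbracket_{g'} & \Rightarrow & \llbracket \{p\}\ P\ \{q\} \rrbracket_g & (8)
\end{array}
$$

for $p, p_1, p_2 \in \mathscr{B}$, $q, q_1, q_2 \in \mathscr{B}^*$, $P \in \mathscr{C}$, and $g_l \subseteq g'_l \in \mathbb{P}S$.

Proof. (2-5) follow on applying Definition 1. (6-7) follow from (4-5) on considering the cases $p_1 \lor p_2 = p_2$ and $q_1 \land q_2 = q_1$. The reason for (8) is that $g'_l$ is a bigger set than $g_l$, so $\llbracket P \rrbracket_{g'}$ is a bigger set of transitions than $\llbracket P \rrbracket_g$ and thus the universal quantifier in Definition 1 produces a smaller (less true) truth value.

Theorem 1 (Soundness).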
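The following algebraic inequalities hold, for $\mathcal{E}_1$ any of $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_l$, $\mathbf{E}_k$; $\mathcal{E}_2$ any of $\mathbf{R}$, $\mathbf{G}_l$, $\mathbf{E}_k$; $\mathcal{E}_3$ any of $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_{l'}$ for $l' \neq l$, $\mathbf{E}_k$; $\mathcal{E}_4$ any of $\mathbf{R}$, $\mathbf{B}$, $\mathbf{G}_l$, $\mathbf{E}_{k'}$ for $k' \neq k$; $[h]$ the code of the subroutine called $h$:

$$
\begin{array}{rclr}
\llbracket \{p\}\ P\ \{\mathbf{N}q \lor \mathcal{E}_1 x\} \rrbracket_g \land \llbracket \{q\}\ Q\ \{\mathbf{N}r \lor \mathcal{E}_1 x\} \rrbracket_g & \Rightarrow & \llbracket \{p\}\ P ; Q\ \{\mathbf{N}r \lor \mathcal{E}_1 x\} \rrbracket_g & (9) \\
\llbracket \{p\}\ P\ \{\mathbf{B}q \lor \mathbf{N}p \lor \mathcal{E}_2 x\} \rrbracket_g & \Rightarrow & \llbracket \{p\}\ \mathbf{do}\ P\ \{\mathbf{N}q \lor \mathcal{E}_2 x\} \rrbracket_g & (10) \\
\top & \Rightarrow & \llbracket \{p\}\ \mathbf{skip}\ \{\mathbf{N}p\} \rrbracket_g & (11) \\
\top & \Rightarrow & \llbracket \{p\}\ \mathbf{return}\ \{\mathbf{R}p\} \rrbracket_g & (12) \\
\top & \Rightarrow & \llbracket \{p\}\ \mathbf{break}\ \{\mathbf{B}p\} \rrbracket_g & (13) \\
\top & \Rightarrow & \llbracket \{p\}\ \mathbf{goto}\ l\ \{\mathbf{G}_l p\} \rrbracket_g & (14) \\
\top & \Rightarrow & \llbracket \{p\}\ \mathbf{throw}\ k\ \{\mathbf{E}_k p\} \rrbracket_g & (15) \\
\llbracket \{b \land p\}\ P\ \{q\} \rrbracket_g & \Rightarrow & \llbracket \{p\}\ b \to P\ \{q\} \rrbracket_g & (16) \\
\llbracket \{p\}\ P\ \{q\} \rrbracket_g \land \llbracket \{p\}\ Q\ \{q\} \rrbracket_g & \Rightarrow & \llbracket \{p\}\ P \mathbin{\square} Q\ \{q\} \rrbracket_g & (17) \\
\top & \Rightarrow & \llbracket \{q[e/x]\}\ x = e\ \{\mathbf{N}q\} \rrbracket_g & (18) \\
\llbracket \{p\}\ P\ \{q\} \rrbracket_g \land g_l \subseteq \{ s_1 \mid s_0 \overset{\mathbf{N}}{\mapsto} s_1 \in \llbracket q \rrbracket \} & \Rightarrow & \llbracket \{p\}\ P : l\ \{q\} \rrbracket_g & (19) \\
\llbracket \{p\}\ P\ \{\mathbf{G}_l p_l \lor \mathbf{N}q \lor \mathcal{E}_3 x\} \rrbracket_{g \cup \{ l \mapsto p_l \}} & \Rightarrow & \llbracket \{p\}\ \mathbf{label}\ l.P\ \{\mathbf{N}q \lor \mathcal{E}_3 x\} \rrbracket_g & (20) \\
\llbracket \{p\}\ [h]\ \{\mathbf{R}r \lor \mathbf{E}_k x_k\} \rrbracket_{\{\}} & \Rightarrow & \llbracket \{p\}\ \mathbf{call}\ h\ \{\mathbf{N}r \lor \mathbf{E}_k x_k\} \rrbracket_g & (21) \\
\llbracket \{p\}\ P\ \{\mathbf{N}r \lor \mathbf{E}_k q \lor \mathcal{E}_4 x\} \rrbracket_g \land \llbracket \{q\}\ Q\ \{\mathbf{N}r \lor \mathbf{E}_k x_k \lor \mathcal{E}_4 x\} \rrbracket_g & \Rightarrow & \llbracket \{p\}\ \mathbf{try}\ P\ \mathbf{catch}(k)\ Q\ \{\mathbf{N}r \lor \mathbf{E}_k x_k \lor \mathcal{E}_4 x\} \rrbracket_g & (22)
\end{array}
$$

Proof.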
By evaluation, given Definition 1 and the semantics from Table 5.

The reason why the theorem is titled ‘Soundness’ is that its inequalities can be read as the NRB logic deduction rules set out in Table 1, via Definition 1. The fixpoint requirement of the model at the label construct is expressed in the ‘arrival from a goto at a label’ law (19), where it is stated that if the hypothesised states $g_l$ at a goto $l$ statement are covered by the states $q$ immediately after code block $P$ and preceding label $l$, then $q$ holds after the label $l$ too. However, there is no need for any such predication when the $g_l$ are exactly the fixpoint of the map

$$g_l \mapsto \{ s_1 \mid s_0 \overset{\mathbf{G}_l}{\mapsto} s_1 \in \llbracket P \rrbracket_g \}$$

because that is what the fixpoint condition says. Thus, while the model in Table 5 satisfies equations (9-22), it satisfies more than they require; some of the hypotheses in the equations could be dropped and the model would still satisfy them. But the NRB logic rules in Table 1 are validated by the model and thus are sound.

In proving completeness of the NRB logic, at least for deterministic programs, we will be guided by the proof of partial completeness for Hoare’s logic in K. R. Apt’s survey paper [2]. We will need, for every (possibly modal) postcondition $q \in \mathscr{B}^*$ and every construct $R$ of $\mathscr{C}$, a non-modal formula $p \in \mathscr{B}$ that is weakest in $\mathscr{B}$ such that if $p$ holds of a state $s$, and $s \overset{\iota}{\mapsto} s'$ is in the model of $R$ given in Table 5, then $q$ holds of $s \overset{\iota}{\mapsto} s'$. This $p$ is written $\mathrm{wp}(R, q)$, the ‘weakest precondition on $R$ for $q$’. We construct it via structural induction on $\mathscr{C}$ at the same time as we deduce completeness, so there is an element of chicken versus egg about the proof, and we will not labour that point.

We will also suppose that we can prove any tautology of $\mathscr{B}$ and $\mathscr{B}^*$, so ‘completeness of NRB’ will be relative to that lower-level completeness.

Notice that there is always a set $p \in \mathbb{P}S$ satisfying the ‘weakest precondition’ characterisation above. It is $\{ s \in S \mid s \overset{\iota}{\mapsto} s' \in \llbracket R \rrbracket_g \Rightarrow s \overset{\iota}{\mapsto} s' \in \llbracket q \rrbracket \}$, and it is called the weakest semantic precondition on $R$ for $q$. So we sometimes refer to $\mathrm{wp}(R, q)$ as the ‘weakest syntactic precondition’ on $R$ for $q$, when we wish to emphasise the distinction. The question is whether or not there is a formula in $\mathscr{B}$ that exactly expresses this set. If there is, then the system is said to be expressive, and that formula is the weakest (syntactic) precondition on $R$ for $q$, $\mathrm{wp}(R, q)$.
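The weakest semantic precondition can be computed directly when the state space is finite. The following Python sketch does so for a small example; the five-state space, the program `R`, the postcondition `post` and the function name `semantic_wp` are all assumptions chosen for illustration and do not come from the paper.

```python
# Hypothetical finite state space {0,...,4}; the paper's S is arbitrary.
STATES = range(5)

# A coloured transition is (initial state, colour, final state).
# Hypothetical deterministic program: steps normally from 0, 1 and 2,
# exits via a return flow from 3, and hangs (no transition) from 4.
R = {(0, "N", 1), (1, "N", 2), (2, "N", 3), (3, "R", 3)}

def semantic_wp(prog, post, states=STATES):
    """States s such that every transition of prog from s satisfies post."""
    return {s for s in states
            if all(post(t) for t in prog if t[0] == s)}

def post(t):
    # Modal postcondition N(x >= 2): colour is N and the final state is >= 2.
    _, colour, final = t
    return colour == "N" and final >= 2

wp = semantic_wp(R, post)
print(sorted(wp))  # prints [1, 2, 4]
```

State 4 belongs to the result vacuously, because the program has no transition from it; this reflects the partial-correctness reading in which only terminating paths are of interest.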
Notice also that a weakest (syntactic) precondition $\mathrm{wp}(R, q)$ must encompass the semantic weakest precondition; that is because if there were a state $s$ in the latter and not in the former, then we could form the disjunction $\mathrm{wp}(R, q) \lor (x_1 = s\,x_1 \land \dots \land x_n = s\,x_n)$ where the $x_i$ are the variables of $s$, and this would also be a precondition on $R$ for $q$, hence $x_1 = s\,x_1 \land \dots \land x_n = s\,x_n \to \mathrm{wp}(R, q)$ must be true, as the latter is supposedly the weakest precondition, and so $s$ satisfies $\mathrm{wp}(R, q)$ in contradiction to the assumption that $s$ is not in $\mathrm{wp}(R, q)$. For orientation, then, the reader should note that ‘there is a weakest (syntactic) precondition in $\mathscr{B}$’ means there is a unique strongest formula in $\mathscr{B}$ covering the weakest semantic precondition.

We will lay out the proof of completeness inline here, in order to avoid excessively overbearing formality, and at the end we will draw the formal conclusion.

A completeness proof is always a proof by cases on each construct of interest. It has the form ‘suppose that foo is true, then we can prove it like this’, where foo runs through all the constructs we are interested in. We start with assertions about the sequence construction $P ; Q$. We will look at this in particular detail, noting where and how the weakest precondition formula plays a role, and skip that detail for most other cases. Thus we start with foo equal to $\mathbf{G}_l g_l \triangleright \{p\}\ P ; Q\ \{q\}$ for some assumptions $g_l \in \mathscr{B}$, but we do not need to take the assumptions $g_l$ into account in this case.

Case $P ; Q$. Consider a sequence of two statements $P ; Q$ for which $\{p\}\ P ; Q\ \{q\}$ holds in the model set out by Definition 1 and Table 5. That is, suppose that initially the state $s$ satisfies predicate $p$ and that there is a progression from $s$ to some final state $s'$ through $P ; Q$. Then $s \overset{\iota}{\mapsto} s'$ is in $\llbracket P ; Q \rrbracket_g$ and $s \overset{\iota}{\mapsto} s'$ satisfies $q$. We will consider two subcases, the first where $P$ terminates normally from $s$, and the second where $P$ terminates abnormally from $s$.
A third possibility, that $P$ does not terminate at all, is ruled out because a final state $s'$ is reached.

Consider the first subcase, which means that we think of $s$ as confined to $\mathrm{wp}(P, \mathbf{N}\top)$. According to Table 5, that means that $P$ started in state $s_0 = s$ and finished normally in some state $s_1$ and $Q$ ran on from state $s_1$ to finish normally in state $s_2 = s'$. Let $r$ stand for the weakest precondition $\mathrm{wp}(Q, \mathbf{N}q)$ that guarantees a normal termination of $Q$ with $q$ holding. By definition of weakest precondition, $\{r\}\ Q\ \{\mathbf{N}q\}$ is true and $s_1$ satisfies $r$ (if not, then $r \lor (x_1 = s_1 x_1 \land x_2 = s_1 x_2 \land \dots)$ would be a weaker precondition for $\mathbf{N}q$ than $r$, which is impossible). The latter is true whatever $s_0$ satisfying $p$ and $\mathrm{wp}(P, \mathbf{N}\top)$ we started with, so by definition of weakest precondition, $p \land \mathrm{wp}(P, \mathbf{N}\top) \to \mathrm{wp}(P, \mathbf{N}r)$ must be true, which is to say that $\{p \land \mathrm{wp}(P, \mathbf{N}\top)\}\ P\ \{\mathbf{N}r\}$ is true.

By induction, it is the case that there are deductions $\vdash \{p \land \mathrm{wp}(P, \mathbf{N}\top)\}\ P\ \{\mathbf{N}r\}$ and $\vdash \{r\}\ Q\ \{\mathbf{N}q\}$ in the NRB system. But the following rule

$$\frac{\{p \land \mathrm{wp}(P, \mathbf{N}\top)\}\ P\ \{\mathbf{N}r\} \qquad \{r\}\ Q\ \{\mathbf{N}q\}}{\{p \land \mathrm{wp}(P, \mathbf{N}\top)\}\ P;Q\ \{\mathbf{N}q\}}$$

is a derived rule of NRB logic. It is a specialised form of the general NRB rule of sequence. Putting these deductions together, we have a deduction of the truth of the assertion $\{p \land \mathrm{wp}(P, \mathbf{N}\top)\}\ P;Q\ \{\mathbf{N}q\}$. By weakening on the conclusion, since $\mathbf{N}q \to q$ is (always) true, we have a deduction of $\{p \land \mathrm{wp}(P, \mathbf{N}\top)\}\ P;Q\ \{q\}$.

Now consider the second subcase, when the final state $s_1$ reached from $s_0 = s$ through $P$ obtains via an abnormal flow out of $P$. This means that we think of $s$ as confined to $\mathrm{wp}(P, \neg\mathbf{N}\top)$. Now the transition $s_0 \overset{\iota}{\mapsto} s_1$ in $[\![P]\!]_g$ satisfies $q$, and $s$ is arbitrary in $p \land \mathrm{wp}(P, \neg\mathbf{N}\top)$, so $\{p \land \mathrm{wp}(P, \neg\mathbf{N}\top)\}\ P\ \{q\}$. However, ‘not ending normally’ (and getting to a termination, which is the case here) means ‘ending abnormally’, i.e., $\mathbf{R}\top \lor \mathbf{B}\top \lor \dots$ through all of the available colours, as per Proposition 1, and we may write the assertion out as $\{p \land \mathrm{wp}(P, \mathbf{R}\top \lor \mathbf{B}\top \lor \dots)\}\ P\ \{q\}$. Considering the cases separately, one has $\{p \land \mathrm{wp}(P, \mathbf{R}\top)\}\ P\ \{\mathbf{R}q\}$ (since $\mathbf{R}q$ is the component of $q$ that expects an $\mathbf{R}$-coloured transition), and $\{p \land \mathrm{wp}(P, \mathbf{B}\top)\}\ P\ \{\mathbf{B}q\}$, and so on, all holding. By induction, there are deductions $\vdash \{p \land \mathrm{wp}(P, \mathbf{R}\top)\}\ P\ \{\mathbf{R}q\}$, $\vdash \{p \land \mathrm{wp}(P, \mathbf{B}\top)\}\ P\ \{\mathbf{B}q\}$, etc. But the following rule

$$\frac{\{p \land \mathrm{wp}(P, \mathcal{E}\top)\}\ P\ \{\mathcal{E}q\}}{\{p \land \mathrm{wp}(P, \mathcal{E}\top)\}\ P;Q\ \{\mathcal{E}q\}}$$

is a derived rule of NRB logic for each ‘abnormal’ colouring $\mathcal{E}$, and hence we have a deduction $\vdash \{p \land \mathrm{wp}(P, \mathcal{E}\top)\}\ P;Q\ \{\mathcal{E}q\}$ for each of the ‘abnormal’ colours $\mathcal{E}$. By weakening on the conclusion, since $\mathcal{E}q \to q$ for each of the colours $\mathcal{E}$, we have a deduction $\vdash \{p \land \mathrm{wp}(P, \mathcal{E}\top)\}\ P;Q\ \{q\}$ for each of the colours $\mathcal{E}$.

By the rule on disjunctive hypotheses (fourth from last in Table 1) we now have a deduction $\vdash \{p \land (\mathrm{wp}(P, \mathbf{N}\top) \lor \mathrm{wp}(P, \mathbf{R}\top) \lor \dots)\}\ P;Q\ \{q\}$. But the weakest precondition is monotonic, so $\mathrm{wp}(P, \mathbf{N}\top) \lor \mathrm{wp}(P, \mathbf{R}\top) \lor \dots$ is covered by $\mathrm{wp}(P, \mathbf{N}\top \lor \mathbf{R}\top \lor \dots)$, which is $\mathrm{wp}(P, \top)$ by Proposition 1. But for a deterministic program $P$, the outcome from a single starting state $s$ can only be uniquely a normal termination, or uniquely a return termination, etc., and $\mathrm{wp}(P, \mathbf{N}\top) \lor \mathrm{wp}(P, \mathbf{R}\top) \lor \dots = \mathrm{wp}(P, \mathbf{N}\top \lor \mathbf{R}\top \lor \dots) = \mathrm{wp}(P, \top)$ exactly. The latter is just $\top$, so we have a proof $\vdash \{p\}\ P;Q\ \{q\}$.

As to what the weakest precondition $\mathrm{wp}(P;Q, q)$ is, it is $\mathrm{wp}(P, \mathbf{N}\,\mathrm{wp}(Q, q)) \lor \mathrm{wp}(P, \mathbf{R}q) \lor \mathrm{wp}(P, \mathbf{B}q) \lor \dots$, the disjunction being over all the possible colours. That concludes the consideration of the case $P;Q$.
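The formula for $\mathrm{wp}(P;Q, q)$ can be checked on a small finite model. The Python sketch below is illustrative only and rests on assumptions of ours: transitions are hypothetical triples `(s, colour, s')`, only the colours N, R and B are present, and the model of $P;Q$ is taken to be ‘a normal exit of $P$ followed by any exit of $Q$, together with the abnormal exits of $P$’, which is how the case analysis above reads Table 5. The two toy programs are deterministic.

```python
from typing import Callable, Set, Tuple

Transition = Tuple[int, str, int]   # hypothetical encoding of s --colour--> s'
Post = Callable[[Transition], bool]


def semantic_wp(states: Set[int], model: Set[Transition], post: Post) -> Set[int]:
    """States from which every transition in the model satisfies post."""
    return {s for s in states if all(post(t) for t in model if t[0] == s)}


def seq(model_p: Set[Transition], model_q: Set[Transition]) -> Set[Transition]:
    """Assumed model of P;Q (see the lead-in for the assumption made)."""
    through_q = {(s0, c, s2)
                 for (s0, cp, s1) in model_p if cp == "N"
                 for (t1, c, s2) in model_q if t1 == s1}
    abnormal = {t for t in model_p if t[1] != "N"}
    return through_q | abnormal


def wp_seq(states, model_p, model_q, post, abnormal=("R", "B")):
    """wp(P, N wp(Q, q)) joined with wp(P, E q) for each abnormal colour E."""
    r = semantic_wp(states, model_q, post)                         # wp(Q, q)
    result = semantic_wp(states, model_p,
                         lambda t: t[1] == "N" and t[2] in r)      # wp(P, N r)
    for e in abnormal:
        result |= semantic_wp(states, model_p,
                              lambda t, e=e: t[1] == e and post(t))  # wp(P, E q)
    return result


STATES = {0, 1, 2, 3}
MODEL_P = {(0, "N", 1), (1, "N", 2), (2, "R", 2), (3, "B", 0)}
MODEL_Q = {(0, "N", 0), (1, "N", 3), (2, "R", 1), (3, "N", 3)}


def q(t: Transition) -> bool:
    """Postcondition: a normal exit into state 3, or any return exit."""
    return (t[1] == "N" and t[2] == 3) or t[1] == "R"


VIA_FORMULA = wp_seq(STATES, MODEL_P, MODEL_Q, q)
DIRECT = semantic_wp(STATES, seq(MODEL_P, MODEL_Q), q)
print(sorted(VIA_FORMULA), sorted(DIRECT))   # [0, 1, 2] [0, 1, 2]
```

On this example the disjunction of per-colour preconditions agrees with the precondition computed directly from the composed model; the agreement relies on each state of `MODEL_P` having a single outcome.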
The existence of a formula expressing a weakest precondition is what really drives the proof above along, and in lieu of pursuing the proof through all the other construct cases, we note the important weakest precondition formulae below:

– The weakest precondition for assignment is $\mathrm{wp}(x = e, \mathbf{N}q) = q[e/x]$ for $q$ without modal components. In general $\mathrm{wp}(x = e, q) = \mathbf{N}q[e/x]$.
– The weakest precondition for a return statement is $\mathrm{wp}(\mathbf{return}, q) = \mathbf{R}q$.
– The weakest precondition for a break statement is $\mathrm{wp}(\mathbf{break}, q) = \mathbf{B}q$. Etc.
– The weakest precondition $\mathrm{wp}(\mathbf{do}\ P, \mathbf{N}q)$ for a do loop that ends ‘normally’ is $\mathrm{wp}(P, \mathbf{B}q) \lor \mathrm{wp}(P, \mathbf{N}\,\mathrm{wp}(P, \mathbf{B}q)) \lor \mathrm{wp}(P, \mathbf{N}\,\mathrm{wp}(P, \mathbf{N}\,\mathrm{wp}(P, \mathbf{B}q))) \lor \dots$. That is, we might break from $P$ with $q$, or run through $P$ normally to the precondition for breaking from $P$ with $q$ next, etc. Write $\mathrm{wp}(P, \mathbf{B}q)$ as $p$ and write $\mathrm{wp}(P, \mathbf{N}r) \land \neg p$ as $\psi(r)$. Then $\mathrm{wp}(\mathbf{do}\ P, \mathbf{N}q)$ can be written $p \lor \psi(p) \lor \psi(p \lor \psi(p)) \lor \dots$, which is the strongest solution to $\pi = \psi(\pi)$ no stronger than $p$. This is the weakest precondition for $p$ after $\mathbf{while}\ (\neg p)\ P$ in classical Hoare logic. It is an existentially quantified statement, stating that an initial state $s$ gives rise to exactly some $n$ passes through $P$ before the condition $p$ becomes true for the first time. It can classically be expressed as a formula of first-order logic and it is the weakest precondition for $\mathbf{N}q$ after $\mathbf{do}\ P$ here.
  The preconditions for $\mathcal{E}q$ for each ‘abnormal’ coloured ending $\mathcal{E}$ of the loop $\mathbf{do}\ P$ are similarly expressible in $\mathcal{B}$, and the precondition for $q$ is the disjunction of each of the preconditions for $\mathbf{N}q$, $\mathbf{R}q$, $\mathbf{B}q$, etc.
– The weakest precondition for a guarded statement $\mathrm{wp}(p \to P, q)$ is $p \to \mathrm{wp}(P, q)$, as in Hoare logic; and the weakest precondition for a disjunction $\mathrm{wp}(P \mathbin{\square} Q, q)$ is $\mathrm{wp}(P, q) \land \mathrm{wp}(Q, q)$, as in Hoare logic. However, we only use the deterministic combination $p \to P \mathbin{\square} \neg p \to Q$ for which the weakest precondition is $(p \to \mathrm{wp}(P, q)) \land (\neg p \to \mathrm{wp}(Q, q))$, i.e. $(p \land \mathrm{wp}(P, q)) \lor (\neg p \land \mathrm{wp}(Q, q))$.

To deal with labels properly, we have to extend some of these notions and notations to take account of the assumptions $\mathbf{G}_l g_l$ that an assertion $\mathbf{G}_l g_l \triangleright \{p\}\ P\ \{q\}$ is made against. The weakest precondition $p$ on $P$ for $q$ is then $p = \mathrm{wp}_g(P, q)$, with the $g_l$ as extra parameters. The weakest precondition for a label use $\mathrm{wp}_g(P : l, q)$ is then $\mathrm{wp}_g(P, q)$, provided that $g_l \to q$, since the states $g_l$ attained by goto $l$ statements throughout the code are available after the label, as well as those obtained through $P$. The weakest precondition in the general situation where it is not necessarily the case that $g_l \to q$ holds is $\mathrm{wp}_g(P, q \land (g_l \to q))$, which is $\mathrm{wp}_g(P, q)$.

Now we can continue the completeness proof through the statements of the form $P : l$ (a labelled statement) and $\mathbf{label}\ l.P$ (a label declaration).

Case labelled statement. If $[\![\{p\}\ P : l\ \{q\}]\!]_g$ holds, then every state $s_0 = s$ satisfying $p$ leads through $P$ with $s_0 \overset{\iota}{\mapsto} s_1$ satisfying $q$, and also $q$ must contain all the transitions $s_0 \overset{\mathbf{N}}{\mapsto} s_1$ where $s_1$ satisfies $g_l$. Thus $s_0$ satisfies $\mathrm{wp}_g(P, q)$ and $\mathbf{N}g_l \to q$ holds. Since $s_0$ is arbitrary in $p$, $p \to \mathrm{wp}_g(P, q)$ holds and by induction, $\vdash \mathbf{G}_l g_l \triangleright \{p\}\ P\ \{q\}$. Then, by the ‘frm’ rule of NRB (Table 1), we may deduce $\vdash \mathbf{G}_l g_l \triangleright \{p\}\ P : l\ \{q\}$.

Case label declaration. The weakest precondition for a declaration $\mathrm{wp}_g(\mathbf{label}\ l.P, q)$ is simply $p = \mathrm{wp}_{g'}(P, q)$, where the assumptions after the declaration are $g' = g \cup \{l \mapsto g_l\}$ and $g_l$ is such that $\mathbf{G}_l g_l \triangleright \{p\}\ P\ \{q\}$. In other words, $p$ and $g_l$ are simultaneously chosen to make the assertion hold, $p$ maximal and $g_l$ the least fixpoint describing the states at goto $l$ statements in the code $P$, given that the initial state satisfies $p$ and assumptions $\mathbf{G}_l g_l$ hold.
The $g_l$ are the statements that after exactly some $n \in \mathbb{N}$ more traversals through $P$ via goto $l$, the trace from state $s$ will avoid another goto $l$ for the first time and exit $P$ normally or via an abnormal exit that is not a goto $l$.

If it is the case that $[\![\{p\}\ \mathbf{label}\ l.P\ \{q\}]\!]_g$ holds then every state $s_0 = s$ satisfying $p$ leads through $\mathbf{label}\ l.P$ with $s_0 \overset{\iota}{\mapsto} s_1$ satisfying $q$. That means that $s_0 \overset{\iota}{\mapsto} s_1$ leads through $P$, but it is not all that do; there are extra transitions with $\iota = \mathbf{G}_l$ that are not considered. The ‘missing’ transitions are precisely the $\mathbf{G}_l g_l$ where $g_l$ is the appropriate least fixpoint for $g_l = \{ s_1 \mid s_0 \overset{\mathbf{G}_l}{\mapsto} s_1 \in [\![P]\!]_{g \cup \{l \mapsto g_l\}} \}$, which is a predicate expressing the idea that $s_1$ at a goto $l$ initiates some exactly $n$ traversals back through $P$ again before exiting $P$ for a first time other than via a goto $l$. The predicate $q$ cannot mention $\mathbf{G}_l$ since the label $l$ is out of scope for it, but it may permit some, all or no $\mathbf{G}_l$-coloured transitions. The predicate $q \lor \mathbf{G}_l g_l$, on the other hand, permits all the $\mathbf{G}_l$-coloured transitions that exit $P$. Thus adding $\mathbf{G}_l g_l$ to the assumptions means that $s_0$ traverses $P$ via $s_0 \overset{\iota}{\mapsto} s_1$ satisfying $q \lor \mathbf{G}_l g_l$ even though more transitions are admitted. Since $s_0 = s$ is arbitrary in $p$, $p \to \mathrm{wp}_{g \cup \{l \mapsto g_l\}}(P, q \lor \mathbf{G}_l g_l)$ and by induction $\vdash \mathbf{G}_l g_l \triangleright \{p\}\ P\ \{q \lor \mathbf{G}_l g_l\}$, and then one may deduce $\vdash \{p\}\ \mathbf{label}\ l.P\ \{q\}$ by the ‘lbl’ rule.

That concludes the text that would appear in a proof, but which we have abridged and presented as a discussion here. We have covered the typical case ($P;Q$) and the unusual cases ($P : l$, $\mathbf{label}\ l.P$). The proof-theoretic content of the discussion is:

Theorem 2 (Completeness).
The system of NRB logic in Table 1 is complete for deterministic programs, relative to the completeness of first-order logic.
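The role of determinism in the argument for the sequence case can be seen on a one-state model. The Python sketch below is an illustration under our own assumptions (transitions as hypothetical triples `(s, colour, s')`, only the colours N and R present): it compares the disjunction $\mathrm{wp}(P, \mathbf{N}\top) \lor \mathrm{wp}(P, \mathbf{R}\top)$ with $\mathrm{wp}(P, \top)$ for a deterministic and a non-deterministic toy program.

```python
from typing import Callable, Set, Tuple

Transition = Tuple[int, str, int]   # hypothetical encoding of s --colour--> s'


def semantic_wp(states: Set[int], model: Set[Transition],
                post: Callable[[Transition], bool]) -> Set[int]:
    """States from which every transition in the model satisfies post."""
    return {s for s in states if all(post(t) for t in model if t[0] == s)}


def coloured(colour: str) -> Callable[[Transition], bool]:
    """The modal formula E(true): the exit has the given colour."""
    return lambda t: t[1] == colour


def split(states: Set[int], model: Set[Transition]) -> Set[int]:
    """wp(P, N true) joined with wp(P, R true)."""
    return (semantic_wp(states, model, coloured("N"))
            | semantic_wp(states, model, coloured("R")))


def whole(states: Set[int], model: Set[Transition]) -> Set[int]:
    """wp(P, true)."""
    return semantic_wp(states, model, lambda t: True)


STATES = {0}
DETERMINISTIC = {(0, "N", 0)}
NONDETERMINISTIC = {(0, "N", 0), (0, "R", 0)}   # two possible outcomes from state 0

print(sorted(split(STATES, DETERMINISTIC)), sorted(whole(STATES, DETERMINISTIC)))        # [0] [0]
print(sorted(split(STATES, NONDETERMINISTIC)), sorted(whole(STATES, NONDETERMINISTIC)))  # [] [0]
```

For the non-deterministic program the disjunction of per-colour preconditions is strictly stronger than $\mathrm{wp}(P, \top)$, so the step that recombines the per-colour deductions into one for precondition $p$ no longer goes through as stated.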
We do not know if the result holds for non-deterministic programs too, but it seems probable. A different proof technique would be needed (likely showing that attempting to construct a proof backwards either succeeds or yields a counter-model).

Along with that we note
Theorem 3 (Expressiveness).
The weakest precondition $\mathrm{wp}(P, q)$ for $q \in \mathcal{B}^*$, $P \in \mathcal{C}$ in the interpretation set out in Definition 1 and Table 5 is expressible in $\mathcal{B}$.

The observation above is that there is a formula in $\mathcal{B}$ that expresses the semantic weakest precondition exactly.

We have proven the NRB logic sound with respect to a simple transition-based model of programs, and shown that it is complete for deterministic programs.
References
1. American National Standards Institute. American national standard for information systems – programming language C, ANSI X3.159-1989, 1989.
2. Krzysztof R. Apt. Ten years of Hoare’s logic: A survey: Part I. ACM Trans. Program. Lang. Syst., 3(4):431–483, October 1981.
3. Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and Dawson Engler. A few billion lines of code later: using static analysis to find bugs in the real world. Commun. ACM, 53(2):66–75, February 2010.
4. Peter Breuer and Simon Pickin. Checking for deadlock, double-free and other abuses in the linux kernel source code. In Proc. Computational Science – ICCS 2006, number 3994 in LNCS, pages 765–772. Springer, May 2006.
5. Peter T Breuer and Marisol Garcia Valls. Static deadlock detection in the linux kernel. In Proc. Reliable Software Technologies/Ada-Europe 2004, number 3063 in LNCS, pages 52–64. Springer Berlin/Heidelberg, June 2004.
6. Peter T Breuer and Simon Pickin. Symbolic approximation: an approach to verification in the large. Innovations in Systems and Software Engineering, 2(3):147–163, 2006.
7. Peter T Breuer and Simon Pickin. Verification in the large via symbolic approximation. In Proc. 2nd International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, 2006 (ISoLA 2006), pages 408–415. IEEE, 2006.
8. Peter T Breuer and Simon Pickin. Open source verification in an anonymous volunteer network. Science of Computer Programming, 2013. To appear.
9. Peter T Breuer, Simon Pickin, and Maria Larrondo Petrie. Detecting deadlock, double-free and other abuses in a million lines of linux kernel source. In Proc. 30th Annual Software Engineering Workshop 2006 (SEW’06), pages 223–233. IEEE/NASA, 2006.
10. E. Clarke, E. Emerson, and A. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems (TOPLAS), 8(2):244–253, 1986.
11. D. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Proc. 4th Symposium on Operating System Design and Implementation (OSDI 2000), pages 1–16, October 2000.
12. David Harel, Jerzy Tiuryn, and Dexter Kozen. Dynamic Logic. MIT Press, Cambridge, MA, USA, 2000.
13. International Standards Organisation. ISO/IEC 9899-1999, programming languages - C, 1999.