Certificate size reduction in Abstraction-Carrying Code
Elvira Albert, Puri Arenas, Germán Puebla, Manuel Hermenegildo
Under consideration for publication in Theory and Practice of Logic Programming
ELVIRA ALBERT, PURI ARENAS
School of Computer Science, Complutense University of Madrid, E28040-Profesor José García Santesmases, s/n, Madrid, Spain (e-mail: {elvira,puri}@sip.ucm.es)
GERMÁN PUEBLA, MANUEL HERMENEGILDO
School of Computer Science, Technical University of Madrid, E28660-Boadilla del Monte, Madrid, Spain (e-mail: {german,herme}@fi.upm.es)
Madrid Institute for Advanced Studies in Software Development Technology (IMDEA Software), Madrid, Spain (e-mail: [email protected])
submitted 19 September 2007; revised 16 October 2009; 27 May 2010; accepted 6 October 2010
Abstract
Abstraction-Carrying Code (ACC) has recently been proposed as a framework for mobile code safety in which the code supplier provides a program together with an abstraction (or abstract model of the program) whose validity entails compliance with a predefined safety policy. The abstraction thus plays the role of safety certificate and its generation is carried out automatically by a fixpoint analyzer. The advantage of providing a (fixpoint) abstraction to the code consumer is that its validity is checked in a single pass (i.e., one iteration) of an abstract interpretation-based checker. A main challenge to make ACC useful in practice is to reduce the size of certificates as much as possible while at the same time not increasing checking time. The intuitive idea is to only include in the certificate information that the checker is unable to reproduce without iterating. We introduce the notion of reduced certificate which characterizes the subset of the abstraction which a checker needs in order to validate (and re-construct) the full certificate in a single pass. Based on this notion, we instrument a generic analysis algorithm with the necessary extensions in order to identify the information relevant to the checker. Interestingly, the fact that the reduced certificate omits (parts of) the abstraction has implications in the design of the checker. We provide the sufficient conditions which allow us to ensure that 1) if the checker succeeds in validating the certificate, then the certificate is valid for the program (correctness) and 2) the checker will succeed for any reduced certificate which is valid (completeness). Our approach has been implemented and benchmarked within the CiaoPP system. The experimental results show that our proposal is able to greatly reduce the size of certificates in practice.
To appear in Theory and Practice of Logic Programming (TPLP).
KEYWORDS: Proof-Carrying Code. Abstraction-Carrying Code. Static Analysis. Reduced Certificates.
∗ A preliminary version of this work appeared in the Proceedings of ICLP'06 (Albert et al. 2006).
Introduction
Proof-Carrying Code (PCC) (Necula 1997) is a general framework for mobile code safety which proposes to associate safety information in the form of a certificate to programs. The certificate (or proof) is created at compile time by the certifier on the code supplier side, and it is packaged along with the code. The consumer which receives or downloads the (untrusted) code+certificate package can then run a checker which, by an efficient inspection of the code and the certificate, can verify the validity of the certificate and thus compliance with the safety policy. The key benefit of this "certificate-based" approach to mobile code safety is that the task of the consumer is reduced from the level of proving to the level of checking, a procedure that should be much simpler, more efficient, and more automatic than generating the original certificate.
Abstraction-Carrying Code (ACC) (Albert et al. 2005; Albert et al. 2008) has been recently proposed as an enabling technology for PCC in which an abstraction (or abstract model of the program) plays the role of certificate. An important feature of ACC is that not only the checking, but also the generation of the abstraction, is carried out automatically by a fixpoint analyzer. In this article we will consider analyzers which construct a program analysis graph which is interpreted as an abstraction of the (possibly infinite) set of states explored by the concrete execution. To capture the different graph traversal strategies used in different fixpoint algorithms, we use the generic description of (Hermenegildo et al. 2000), which generalizes the algorithms used in state-of-the-art analysis engines.
Essentially, the certification/analysis carried out by the supplier is an iterative process which repeatedly traverses the analysis graph until a fixpoint is reached. The analysis information inferred for each call which appears during the (multiple) graph traversals is stored in the answer table (Hermenegildo et al. 2000). After each iteration (or graph traversal), if the answer computed for a certain call is different from the one previously stored in the answer table, both answers are combined (by computing their lub) and the result is used 1) to update the table, and 2) to launch the recomputation of those calls whose answer depends on the answer currently computed. In the original ACC framework, the final full answer table constitutes the certificate. A main idea is that, since this certificate contains the fixpoint, a single pass over the analysis graph is sufficient to validate such certificate on the consumer side.
One of the main challenges for the practical uptake of ACC (and related methods) is to produce certificates which are reasonably small. This is important since the certificate is transmitted together with the untrusted code and, hence, reducing its size will presumably contribute to a smaller transmission time (very relevant, for instance, under limited bandwidth and/or expensive network connectivity conditions). Also, this reduces the storage cost for the certificate. Nevertheless, a main concern when reducing the size of the certificate is that checking time is not increased (among other reasons because pervasive and embedded systems also typically suffer from limited computing and power resources). In principle, the consumer could use an analyzer for the purpose of generating the whole fixpoint from scratch, which is still feasible as analysis is automatic. However, this would defeat one of the main purposes of ACC, which is to reduce checking time. The objective of this work is to characterize the smallest subset of the abstraction which must be sent within a certificate (and which still guarantees a single pass checking process) and to design an ACC scheme which generates and validates such reduced certificates. The main contributions of this article are:
1. The notion of reduced certificate which characterizes the subset of the abstraction which, for a given analysis graph traversal strategy, the checker needs in order to validate (and re-construct) the full certificate in a single pass.
2. An instrumentation of the generic abstract interpretation-based analysis algorithm of (Hermenegildo et al. 2000) with the necessary extensions in order to identify relevant information to the checker.
3. A checker for reduced certificates which is correct, i.e., if the checker succeeds in validating the certificate, then the certificate is valid for the program.
4. Sufficient conditions for ensuring completeness of the checking process. Concretely, if the checker uses the same strategy as the analyzer then our proposed checker will succeed for any reduced certificate which is valid.
5. An experimental evaluation of the effect of our approach on the CiaoPP system (Hermenegildo et al. 2005), the abstract interpretation-based preprocessor of the Ciao multi-paradigm (Constraint) Logic Programming system. The experimental results show that the certificate can be greatly reduced (by a factor of 3.35) with no increase in checking time.
Both the ACC framework and our work here are applied at the source level. In contrast, in existing PCC frameworks, the code supplier typically packages the certificate with the object code rather than with the source code (both are untrusted). Nevertheless, our choice of making our presentation at the source level is without loss of generality because both the original ideas in the ACC approach and those in our current proposal can also be applied directly to bytecode. Indeed, a good number of abstract interpretation-based analyses have been proposed in the literature for bytecode and machine code, most of which compute a fixpoint during analysis which can be reduced using the general principle of our proposal. For instance, in recent work, the concrete CLP verifier used in the original ACC implementation has itself been shown to be applicable without modification also to Java bytecode via a transformational approach, based on partial evaluation (Albert et al. 2007) or via direct transformation (Méndez-Lojo et al. 2007a) using standard tools such as Soot (Vallee-Rai et al. 1999). Furthermore, in (Méndez-Lojo et al. 2007b; Méndez-Lojo et al. 2007a) a fixpoint-based analysis framework has been developed specifically for Java bytecode which is essentially equivalent to that used in the ACC proposal and to the one that we will apply in this work on the producer side to perform the analysis and verification. This supports the direct applicability of our approach to bytecode-based program representations and, in general, to other languages and paradigms.
The rest of the article is organized as follows. The following section presents a general view of ACC. Section 3 gives a brief overview of our method by means of a simple example. Section 4 recalls the certification process performed by the code supplier and illustrates it with a running example. Section 5 characterizes the notion of reduced certificate and instruments a generic certifier for its generation. Section 6 presents a generic checker for reduced certificates together with correctness and completeness results. Finally, Section 7 discusses some experimental results and related work.
We assume the reader is familiar with abstract interpretation (see (Cousot and Cousot 1977)) and (Constraint) Logic Programming (C)LP (see, e.g., (Marriot and Stuckey 1998) and (Lloyd 1987)).
An abstract interpretation-based certifier is a function $\mathit{certifier} : \mathit{Prog} \times \mathit{ADom} \times \mathit{APol} \mapsto \mathit{ACert}$ which, for a given program $P \in \mathit{Prog}$, an abstract domain $\langle D_\alpha, \sqsubseteq \rangle \in \mathit{ADom}$ and a safety policy $I_\alpha \in \mathit{APol}$, generates a certificate $\mathit{Cert}_\alpha \in \mathit{ACert}$, by using an abstract interpreter for $D_\alpha$, which entails that $P$ satisfies $I_\alpha$. In the following, we denote that $I_\alpha$ and $\mathit{Cert}_\alpha$ are specifications given as abstract semantic values of $D_\alpha$ by using the same $\alpha$. The essential idea in the certification process carried out in ACC is that a fixpoint static analyzer is used to automatically infer an abstract model (or simply abstraction) about the mobile code which can then be used to prove that the code is safe w.r.t. the given policy in a straightforward way. The basics for defining the abstract interpretation-based certifiers in ACC are summarized in the following four points and equations.
Approximation. We consider a description (or abstract) domain $\langle D_\alpha, \sqsubseteq \rangle \in \mathit{ADom}$ and its corresponding concrete domain $\langle 2^D, \subseteq \rangle$, both with a complete lattice structure. Description (or abstract) values and sets of concrete values are related by an abstraction function $\alpha : 2^D \to D_\alpha$, and a concretization function $\gamma : D_\alpha \to 2^D$. The pair $\langle \alpha, \gamma \rangle$ forms a Galois connection. The concrete and abstract domains must be related in such a way that the following condition holds (Cousot and Cousot 1977):
$$\forall x \in 2^D,\ \forall y \in D_\alpha :\ (\alpha(x) \sqsubseteq y) \iff (x \subseteq \gamma(y))$$
In general $\sqsubseteq$ is induced by $\subseteq$ and $\alpha$. Similarly, the operations of least upper bound ($\sqcup$) and greatest lower bound ($\sqcap$) mimic those of $2^D$ in a precise sense.
Abstraction generation. We consider the class of fixpoint semantics in which a (monotonic) semantic operator, $S_P$, is associated to each program $P$. The meaning of the program, $[\![P]\!]$, is defined as the least fixed point of the $S_P$ operator, i.e., $[\![P]\!] = \mathrm{lfp}(S_P)$. If $S_P$ is continuous, the least fixed point is the limit of an iterative process involving at most $\omega$ applications of $S_P$ starting from the bottom element of the lattice. Using abstract interpretation, we can use an operator $S^\alpha_P$ which works in the abstract domain and which is the abstract counterpart of $S_P$. This operator induces the abstract meaning of the program, which we refer to as $[\![P]\!]_\alpha$. Now, again, starting from the bottom element of the lattice we can obtain the least fixpoint of $S^\alpha_P$, denoted $\mathrm{lfp}(S^\alpha_P)$, and we define $[\![P]\!]_\alpha = \mathrm{lfp}(S^\alpha_P)$. Correctness of analysis (Cousot and Cousot 1977) ensures that $[\![P]\!]_\alpha$ safely approximates $[\![P]\!]$, i.e., $[\![P]\!] \in \gamma([\![P]\!]_\alpha)$. In actual analyzers, it is often the case that the analysis computes a post-fixpoint of $S^\alpha_P$, which we refer to as $\mathit{Cert}_\alpha$, instead of the least fixpoint. The reason for this is that computing the least fixpoint may require too large (even infinite) a number of iterations. An analyzer is a function $\mathit{analyzer} : \mathit{Prog} \times \mathit{ADom} \mapsto \mathit{ACert}$ such that:
$$\mathit{analyzer}(P, D_\alpha) = \mathit{Cert}_\alpha \ \wedge\ S^\alpha_P(\mathit{Cert}_\alpha) = \mathit{Cert}_\alpha \qquad (1)$$
Since $[\![P]\!]_\alpha \sqsubseteq \mathit{Cert}_\alpha$, $\mathit{Cert}_\alpha$ is a safe approximation of $[\![P]\!]$.
Verification Condition. Let $\mathit{Cert}_\alpha$ be a safe approximation of $[\![P]\!]$. If an abstract safety specification $I_\alpha$ can be proved w.r.t. $\mathit{Cert}_\alpha$, then $P$ satisfies the safety policy and $\mathit{Cert}_\alpha$ is a valid certificate:
$$\mathit{Cert}_\alpha \text{ is a valid certificate for } P \text{ w.r.t. } I_\alpha \text{ if } \mathit{Cert}_\alpha \sqsubseteq I_\alpha \qquad (2)$$
Certification. Together, Equations (1) and (2) define a certifier which provides program fixpoints, $\mathit{Cert}_\alpha$, as certificates which entail a given safety policy, i.e., by taking $\mathit{Cert}_\alpha = \mathit{analyzer}(P, D_\alpha)$.
The second main idea in ACC is that a simple, easy-to-trust abstract interpretation-based checker verifies the validity of the abstraction on the mobile code. The checker is defined as a specialized abstract interpreter whose key characteristic is that it does not need to iterate in order to reach a fixpoint (in contrast to standard analyzers). The basics for defining the abstract interpretation-based checkers in ACC are summarized in the following two points and equations.
Checking. If a certificate $\mathit{Cert}_\alpha$ is a fixpoint of $S^\alpha_P$, then $S^\alpha_P(\mathit{Cert}_\alpha) = \mathit{Cert}_\alpha$. Thus, a checker is a function $\mathit{checker} : \mathit{Prog} \times \mathit{ADom} \times \mathit{ACert} \mapsto \mathit{bool}$ which, for a program $P \in \mathit{Prog}$, an abstract domain $D_\alpha \in \mathit{ADom}$ and a certificate $\mathit{Cert}_\alpha \in \mathit{ACert}$, checks whether $\mathit{Cert}_\alpha$ is a fixpoint of $S^\alpha_P$ or not:
$$\mathit{checker}(P, D_\alpha, \mathit{Cert}_\alpha) \text{ returns true iff } (S^\alpha_P(\mathit{Cert}_\alpha) = \mathit{Cert}_\alpha) \qquad (3)$$
Verification Condition Regeneration. To retain the safety guarantees, the consumer must regenerate a trustworthy verification condition (Equation (2)) and use the incoming certificate to test for adherence to the safety policy:
$$P \text{ is trusted iff } \mathit{Cert}_\alpha \sqsubseteq I_\alpha \qquad (4)$$
Therefore, the general idea in ACC is that, while analysis (Equation (1)) is an iterative process, which may traverse (parts of) the abstraction more than once until the fixpoint is reached, checking (Equation (3)) is guaranteed to be done in a single pass over the abstraction. This characterization of checking ensures that the task performed by the consumers is indeed strictly more efficient than the certification carried out by the producers, as shown in (Albert et al. 2005).
An Informal Account of our Method
In this section we provide an informal account of the idea of reduced certificatewithin the ACC framework by means of a very simple example.
Example 3.1
Consider the following program, the simple abstract domain $\bot \sqsubseteq \mathit{int} \sqsubseteq \mathit{real} \sqsubseteq \mathit{term}$ that we will use in all our examples, and the initial calling pattern $S_\alpha = \{q(X){:}\langle \mathit{term} \rangle\}$, which indicates that $q$ can be called with any term as argument:
    q(X) :- p(X).
    p(X) :- X = 1.0.
    p(X) :- X = 1.
A (top-down) analyzer for logic programs would start the analysis of $q(\mathit{term})$ which in turn requires the analysis of $p(\mathit{term})$ and, as a result, the following fixpoint can be inferred: $\mathit{Cert}_\alpha = \{q(X){:}\langle \mathit{term} \rangle \mapsto \langle \mathit{real} \rangle,\ p(X){:}\langle \mathit{term} \rangle \mapsto \langle \mathit{real} \rangle\}$. This gives us a safe approximation of the result of executing $q(X)$. In particular, it says that we obtain a real number as a result of executing $q(X)$. Observe that the fixpoint is sound but possibly inaccurate since, when only the second rule defining $p(X)$ is executed, we would obtain an integer number.
Given a safety policy, the next step in any approach to PCC is to verify that $\mathit{Cert}_\alpha$ entails such policy. For instance, if the safety policy specifies that $I_\alpha = \{q(X){:}\langle \mathit{term} \rangle \mapsto \langle \mathit{term} \rangle\}$, then clearly $\mathit{Cert}_\alpha \sqsubseteq I_\alpha$ holds and, hence, $\mathit{Cert}_\alpha$ can be used as a certificate. Similarly, a safety policy $I'_\alpha = \{q(X){:}\langle \mathit{term} \rangle \mapsto \langle \mathit{real} \rangle\}$ is entailed by the certificate, while $I''_\alpha = \{q(X){:}\langle \mathit{term} \rangle \mapsto \langle \mathit{int} \rangle\}$ is not.
The next important idea in ACC is that, given a valid certificate $\mathit{Cert}_\alpha$, a single pass of a static analyzer over it must not change the result and, hence, in this way $\mathit{Cert}_\alpha$ can be validated. Observe that when analyzing the second rule of $p(X)$ the inferred information $X/\mathit{int}$ is lubbed with $X/\mathit{real}$, which we have in the certificate, and, hence, the fixpoint does not change. Therefore, the checker can be implemented as a non-iterating single-pass analyzer over the certificate. If applying the checker to $\mathit{Cert}_\alpha$ yields a result that is different from $\mathit{Cert}_\alpha$, an error is issued.
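The validation just described can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions, not the checker of the ACC framework: the table is keyed by predicate name only (every call pattern in the example is ⟨term⟩), the rule bodies are hand-written as functions returning their abstract answer, and the names `one_pass`, `check` and `entails` are hypothetical.

```python
# Toy chain domain of the example: bottom ⊑ int ⊑ real ⊑ term.
ORDER = ["bottom", "int", "real", "term"]

def leq(a, b):
    """Abstract ordering on the chain."""
    return ORDER.index(a) <= ORDER.index(b)

def lub(a, b):
    """Least upper bound; on a chain this is the larger element."""
    return a if leq(b, a) else b

# Assumption: each rule body is modelled as a function from the current
# table to the abstract answer of that rule.
RULES = {
    "q": [lambda t: t["p"]],                   # q(X) :- p(X).
    "p": [lambda t: "real", lambda t: "int"],  # p(X) :- X = 1.0.  p(X) :- X = 1.
}

def one_pass(table):
    """One traversal: lub the answer of every rule into the stored answer."""
    new = dict(table)
    for pred, bodies in RULES.items():
        for body in bodies:
            new[pred] = lub(new[pred], body(table))
    return new

def check(cert):
    """Accept the certificate only if a single pass leaves it unchanged."""
    return one_pass(cert) == cert

def entails(cert, policy):
    """Cert ⊑ I, compared pointwise on the predicates named by the policy."""
    return all(leq(cert[pred], bound) for pred, bound in policy.items())

cert = {"q": "real", "p": "real"}
print(check(cert))                   # single-pass validation
print(entails(cert, {"q": "term"}))  # policy I
print(entails(cert, {"q": "real"}))  # policy I'
print(entails(cert, {"q": "int"}))   # policy I''
```

Because each rule answer is lubbed into the stored one, a table that is too small (for instance one claiming `int` for both predicates) is changed by the pass and therefore rejected.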
Once the checker has verified that $\mathit{Cert}_\alpha$ is a fixpoint (and thus it safely approximates the program semantics), the only thing left is to verify that $\mathit{Cert}_\alpha$ entails $I_\alpha$, thus ensuring that the validated certificate enforces the safety policy, exactly as the certifier does.
We now turn to the key idea of reduced certificates in ACC: the observation that any information in the certificate that the checker is able to reconstruct by itself in a single pass does not need to be included in the certificate. For example, if generation of the certificate does not require iteration, then no information needs to be included in the certificate, since by performing the same steps as the generator the checker will not iterate. If the generator does need to iterate, then the challenge is to find the minimal amount of information that needs to be included in $\mathit{Cert}_\alpha$ to avoid such iteration in the checker.
Whether a generator requires iteration depends on the strategy used when computing the fixpoint as well as on the domain and the program itself (presence of loops and recursions, multivariance, etc.). In fact, much work has been done in order to devise optimized strategies that reduce iterations during analysis as much as possible. As mentioned before, (Hermenegildo et al. 2000), which will be our starting point, presents a parametric algorithm that allows capturing a large class of such strategies. An important observation is that whether the checker can avoid iteration is controlled by the same factors as in the generator, modified only by the effects of the information included in the (reduced) certificate, which we would like to be minimal.
As an (oversimplified) example in order to explain this idea, let us consider two possible fixpoint strategies, each one used equally in both the analyzer (generator) and the checker:
(1) a strategy which first analyzes the first rule for $p(X)$ and then the second one, and
(2) a strategy which analyzes the rules in the opposite order to (1).
Assume also that the analyzer has the simple iteration rule that as soon as an answer changes during analysis then analysis is restarted at the top (these strategies are really too simple and no practical analyzer would really iterate on this example, but they are useful for illustration here; the general issue of strategies will become clear later in the paper).
In (1), the answer $X/\mathit{real}$ is inferred after the checking of the first rule. Then, the second rule is analyzed, which leads to the answer $X/\mathit{int}$ that is lubbed with the previous one, yielding $X/\mathit{real}$. Hence, in a single pass over the program the fixpoint is reached. Therefore, with this strategy $X/\mathit{real}$ can be reconstructed by the checker without iterating and should not be included in the certificate.
However, with strategy (2) we first obtain the answer $X/\mathit{int}$. Then, after the analysis of the first rule, $X/\mathit{real}$ is inferred. When lubbing it with the previous value, $X/\mathit{int}$, the answer obtained is $X/\mathit{real}$. Since the answer has changed, the analyzer starts a new iteration in which it reanalyzes the second rule with the new answer $X/\mathit{real}$. Since now nothing changes in this iteration, the fixpoint is reached.
The key idea is that, if strategy (2) is used, then more than one iteration is needed to reach the fixpoint. Hence the certificate cannot be empty and instead it has to include (some of) the analysis information. The conclusion is that the notion of reduced certificate is strongly related to the strategy used during analysis and checking. ✷
The remainder of the article will formalize and discuss in detail each of the above steps and issues.
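The two strategies of the example can be replayed with a small Python sketch. It is only an illustration under stated assumptions: the function `analyze` and its `seed` parameter are hypothetical, only the answer for `p` is tracked, and (as in the prose above) the first answer that replaces ⊥ is not counted as a change that triggers a restart.

```python
ORDER = ["bottom", "int", "real", "term"]

def lub(a, b):
    """Least upper bound on the chain bottom ⊑ int ⊑ real ⊑ term."""
    return a if ORDER.index(a) >= ORDER.index(b) else b

# Abstract answers of the two rules for p/1, in program order.
P_RULES = ["real", "int"]   # p(X) :- X = 1.0.    p(X) :- X = 1.

def analyze(rule_order, seed="bottom"):
    """Return (answer for p, number of passes) under 'restart on change'.

    rule_order lists rule indices in the order the strategy visits them;
    seed is the answer taken from a certificate, or bottom if there is none.
    """
    answer, passes = seed, 0
    changed = True
    while changed:
        passes += 1
        changed = False
        for r in rule_order:
            new = lub(answer, P_RULES[r])
            if new != answer and answer != "bottom":
                # A previously stored answer was updated: restart at the top.
                answer, changed = new, True
                break
            answer = new
    return answer, passes

print(analyze([0, 1]))               # strategy (1)
print(analyze([1, 0]))               # strategy (2)
print(analyze([1, 0], seed="real"))  # strategy (2), answer supplied by a certificate
```

Under strategy (1) the sketch stops after one pass with nothing supplied, whereas under strategy (2) it needs a second pass unless the answer for `p` is provided up front, which is the information a reduced certificate would have to carry in that case.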
This section recalls ACC and the notion of full certificate in the context of (C)LP (Albert et al. 2005). This programming paradigm offers a good number of advantages for ACC, an important one being the maturity and sophistication of the analysis tools available for it. It is also a non-trivial case in many ways, including the fact that logic variables and incomplete data structures essentially represent, respectively, pointers and structures containing pointers (see also the arguments and pointers to literature in Section 1, which provide evidence that our approach is applicable essentially directly to programs in other programming paradigms, including their bytecode representations).
Very briefly, terms are constructed from variables $x \in \mathcal{V}$, functors (e.g., $f$) and predicates (e.g., $p$). We denote by $\{x_1/t_1, \ldots, x_n/t_n\}$ the substitution $\sigma$, where $x_i \neq x_j$ if $i \neq j$, and $t_i$ are terms. A renaming is a substitution $\rho$ for which there exists the inverse $\rho^{-1}$ such that $\rho\rho^{-1} \equiv \rho^{-1}\rho \equiv \mathit{id}$. A constraint is a conjunction of expressions built from predefined predicates (such as inequalities over the reals) whose arguments are constructed using predefined functions (such as real addition). An atom has the form $p(t_1, \ldots, t_n)$ where $p$ is a predicate symbol and $t_i$ are terms. A literal is either an atom or a constraint. A rule is of the form $H \mathrel{\text{:-}} D$ where $H$, the head, is an atom and $D$, the body, is a possibly empty finite sequence of literals. A constraint logic program $P \in \mathit{Prog}$, or program, is a finite set of rules. Program rules are assumed to be normalized: only distinct variables are allowed to occur as arguments to atoms. Furthermore, we require that each rule defining a predicate $p$ has an identical sequence of variables $x_{p_1}, \ldots, x_{p_n}$ in the head atom, i.e., $p(x_{p_1}, \ldots, x_{p_n})$. We call this the base form of $p$. This is not restrictive since programs can always be normalized.
Algorithm 1 has been presented in (Hermenegildo et al. 2000) as a generic description of a fixpoint algorithm which generalizes those used in state-of-the-art analysis engines, such as the one in CiaoPP (Hermenegildo et al. 2005), PLAI (Muthukumar and Hermenegildo 1992; de la Banda et al. 1996), GAIA (Le Charlier and Van Hentenryck 1994), and the CLP($\mathcal{R}$) analyzer (Kelly et al. 1998). It has the description domain $D_\alpha$ (and functions on this domain) as parameters. Different domains give analyzers which provide different kinds of information and degrees of accuracy. In order to analyze a program, traditional (goal dependent) abstract interpreters for (C)LP programs receive as input, in addition to the program $P$ and the abstract domain $D_\alpha$, a set $S_\alpha \subseteq \mathit{AAtom}$ of Abstract Atoms (or call patterns). Such call patterns are pairs of the form $A : \mathit{CP}$ where $A$ is a procedure descriptor and $\mathit{CP}$ is an abstract substitution (i.e., a condition of the run-time bindings) of $A$ expressed as $\mathit{CP} \in D_\alpha$. For brevity, we sometimes omit the subscript $\alpha$ in the algorithms. The analyzer of Algorithm 1, Analyze_f, constructs an and-or graph (Bruynooghe 1991) (or analysis graph) for $S_\alpha$ which is an abstraction of the (possibly infinite) set of (possibly infinite) execution paths (and-or trees) explored by the concrete execution of the initial calls described by $S_\alpha$ in $P$. Let $S^\alpha_P$ be the abstract semantics of the program for the call patterns $S_\alpha$ defined in (Bruynooghe 1991). Following the notation in Section 2, the analysis graph, denoted as $[\![P]\!]_\alpha$, corresponds to (or safely approximates) $\mathrm{lfp}(S^\alpha_P)$.
The program analysis graph is implicitly represented in the algorithm by means of two global data structures, the answer table $\mathit{AT}$ and the dependency arc table $\mathit{DAT}$, both initially empty as shown at the beginning of Algorithm 1. Given the information in these, it is straightforward to construct the graph and the associated program-point annotations.
Algorithm 1 Generic Analyzer for Abstraction-Carrying Code
Initialization of global data structures: $\mathit{DAT} = \mathit{AT} = \emptyset$
 1: function Analyze_f($S_\alpha \subseteq \mathit{AAtom}$, $\Omega \in \mathit{QHS}$)
 2:   for $A : \mathit{CP} \in S_\alpha$ do
 3:     add_event(newcall($A : \mathit{CP}$), $\Omega$);
 4:   while $E$ = next_event($\Omega$) do
 5:     if $E$ = newcall($A : \mathit{CP}$) then new_call_pattern($A : \mathit{CP}$, $\Omega$);
 6:     else if $E$ = updated($A : \mathit{CP}$) then add_dependent_rules($A : \mathit{CP}$, $\Omega$);
 7:     else if $E$ = arc($R$) then process_arc($R$, $\Omega$);
 8:   return $\mathit{AT}$;
 9: procedure new_call_pattern($A : \mathit{CP} \in \mathit{AAtom}$, $\Omega \in \mathit{QHS}$)
10:   for all rule $A_k \mathrel{\text{:-}} B_{k,1}, \ldots, B_{k,n_k}$ do
11:     $\mathit{CP}_0$ := Aextend($\mathit{CP}$, vars($\ldots, B_{k,i}, \ldots$));
12:     $\mathit{CP}_1$ := Arestrict($\mathit{CP}_0$, vars($B_{k,1}$));
13:     add_event(arc($A_k : \mathit{CP} \Rightarrow [\mathit{CP}_0]\ B_{k,1} : \mathit{CP}_1$), $\Omega$);
14:   add_answer_table($A : \mathit{CP} \mapsto \bot$);
15: procedure process_arc($H_k : \mathit{CP}_0 \Rightarrow [\mathit{CP}_1]\ B_{k,i} : \mathit{CP}_2 \in \mathit{Dep}$, $\Omega \in \mathit{QHS}$)
16:   if $B_{k,i}$ is not a constraint then
17:     add $H_k : \mathit{CP}_0 \Rightarrow [\mathit{CP}_1]\ B_{k,i} : \mathit{CP}_2$ to $\mathit{DAT}$;
18:   $W$ := vars($H_k, B_{k,1}, \ldots, B_{k,n_k}$);
19:   $\mathit{CP}_3$ := get_answer($B_{k,i} : \mathit{CP}_2$, $\mathit{CP}_1$, $W$, $\Omega$);
20:   if $\mathit{CP}_3 \neq \bot$ and $i \neq n_k$ then
21:     $\mathit{CP}_4$ := Arestrict($\mathit{CP}_3$, vars($B_{k,i+1}$));
22:     add_event(arc($H_k : \mathit{CP}_0 \Rightarrow [\mathit{CP}_3]\ B_{k,i+1} : \mathit{CP}_4$), $\Omega$);
23:   else if $\mathit{CP}_3 \neq \bot$ and $i = n_k$ then
24:     $\mathit{AP}_1$ := Arestrict($\mathit{CP}_3$, vars($H_k$)); insert_answer_info($H : \mathit{CP}_0 \mapsto \mathit{AP}_1$, $\Omega$);
25: function get_answer($L : \mathit{CP}_2 \in \mathit{AAtom}$, $\mathit{CP}_1 \in D_\alpha$, $W \subseteq \mathcal{V}$, $\Omega \in \mathit{QHS}$)
26:   if $L$ is a constraint then return Aadd($L$, $\mathit{CP}_1$);
27:   else $\mathit{AP}_0$ := lookup_answer($L : \mathit{CP}_2$, $\Omega$); $\mathit{AP}_1$ := Aextend($\mathit{AP}_0$, $W$);
28:     return Aconj($\mathit{CP}_1$, $\mathit{AP}_1$);
29: function lookup_answer($A : \mathit{CP} \in \mathit{AAtom}$, $\Omega \in \mathit{QHS}$)
30:   if there exists a renaming $\sigma$ s.t. $\sigma(A : \mathit{CP}) \mapsto \mathit{AP}$ in $\mathit{AT}$ then
31:     return $\sigma^{-1}(\mathit{AP})$;
32:   else add_event(newcall($\sigma(A : \mathit{CP})$), $\Omega$) where $\sigma$ is a renaming s.t. $\sigma(A)$ is in base form;
33:     return $\bot$;
34: procedure insert_answer_info($H : \mathit{CP} \mapsto \mathit{AP} \in \mathit{Entry}$, $\Omega \in \mathit{QHS}$)
35:   $\mathit{AP}_0$ := lookup_answer($H : \mathit{CP}$); $\mathit{AP}_1$ := Alub($\mathit{AP}$, $\mathit{AP}_0$);
36:   if $\mathit{AP}_0 \neq \mathit{AP}_1$ then
37:     add_answer_table($H : \mathit{CP} \mapsto \mathit{AP}_1$);
38:     add_event(updated($H : \mathit{CP}$), $\Omega$);
39: procedure add_dependent_rules($A : \mathit{CP} \in \mathit{AAtom}$, $\Omega \in \mathit{QHS}$)
40:   for all arc of the form $H_k : \mathit{CP}_0 \Rightarrow [\mathit{CP}_1]\ B_{k,i} : \mathit{CP}_2$ in graph where there exists a renaming $\sigma$ s.t. $A : \mathit{CP} = (B_{k,i} : \mathit{CP}_2)\sigma$ do
41:     add_event(arc($H_k : \mathit{CP}_0 \Rightarrow [\mathit{CP}_1]\ B_{k,i} : \mathit{CP}_2$), $\Omega$);
Definition 4.1 (answer and dependency arc table)
Let $P \in \mathit{Prog}$ be a program and $D_\alpha$ an abstract domain.
• An Answer Table ($\mathit{AT} \subseteq \mathit{Entry}$) for $P$ and $D_\alpha$ is a set of entries of the form $A : \mathit{CP} \mapsto \mathit{AP} \in \mathit{Entry}$ where $A : \mathit{CP} \in \mathit{AAtom}$, $A$ is always in base form and $\mathit{CP}$ and $\mathit{AP}$ are abstract substitutions in $D_\alpha$.
• A Dependency Arc Table ($\mathit{DAT} \subseteq \mathit{Dep}$) for $P$ and $D_\alpha$ is a set of dependencies of the form $A_k : \mathit{CP}_0 \Rightarrow [\mathit{CP}_1]\ B_{k,i} : \mathit{CP}_2 \in \mathit{Dep}$, where $A_k \mathrel{\text{:-}} B_{k,1}, \ldots, B_{k,n}$ is a program rule in $P$ and $\mathit{CP}_0$, $\mathit{CP}_1$, $\mathit{CP}_2$ are abstract substitutions in $D_\alpha$.
Informally, an entry $A : \mathit{CP} \mapsto \mathit{AP}$ in $\mathit{AT}$ should be interpreted as "the answer pattern for calls to $A$ satisfying precondition (or call pattern) $\mathit{CP}$ meets postcondition (or answer pattern) $\mathit{AP}$." Dependencies are used for efficiency. As we will explain later, Algorithm 1 finishes when there are no more events to be processed (function Analyze_f). This happens when the answer table $\mathit{AT}$ reaches a fixpoint. Any entry $A : \mathit{CP} \mapsto \mathit{AP}$ in $\mathit{AT}$ is generated by analyzing all rules associated to $A$ (procedure new_call_pattern). Thus, if we have a rule of the form $A_k \mathrel{\text{:-}} B_{k,1}, \ldots, B_{k,n}$, we know that the answer for $A$ depends on the answers for all literals in the body of the rule. We annotate this fact in $\mathit{DAT}$ by means of the dependencies $A_k : \mathit{CP} \Rightarrow [\mathit{CP}_{k,i-1}]\ B_{k,i} : \mathit{CP}_{k,i}$, $i \in \{1, \ldots, n\}$, which mean that the answer for $A_k : \mathit{CP}$ depends on the answer for $B_{k,i} : \mathit{CP}_{k,i}$, also stored in $\mathit{AT}$. Then if, during the analysis, the answer for $B_{k,i} : \mathit{CP}_{k,i}$ changes, the arc $A_k : \mathit{CP} \Rightarrow [\mathit{CP}_{k,i-1}]\ B_{k,i} : \mathit{CP}_{k,i}$ must be reprocessed in order to compute the "possibly" new answer for $A_k : \mathit{CP}$.
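The two tables can be pictured with a small Python sketch. It rests on several assumptions that are not part of the algorithm as published: abstract substitutions are plain strings from the toy domain of Section 3, renaming is ignored, and the class and field names (`Arc`, `Tables`, `dependents`, and so on) are hypothetical. Each table is keyed so that it holds at most one answer per call pattern and at most one arc per head call pattern and body position.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    """Dependency H_k:CP0 => [CP1] B_{k,i}:CP2 (field names are illustrative)."""
    head: str        # H_k, the head predicate
    head_cp: str     # CP0, call pattern of the head
    rule: int        # k, index of the rule
    pos: int         # i, position of the body literal
    before: str      # CP1, annotation just before B_{k,i}
    literal: str     # B_{k,i}
    literal_cp: str  # CP2, call pattern of the literal

class Tables:
    def __init__(self):
        self.at = {}   # (A, CP) -> AP
        self.dat = {}  # (head, head_cp, rule, pos) -> Arc

    def add_answer(self, atom, cp, ap):
        """Store A:CP |-> AP, replacing any previous entry for A:CP."""
        self.at[(atom, cp)] = ap

    def add_arc(self, arc):
        """Store an arc, replacing any arc with the same head, rule and position."""
        self.dat[(arc.head, arc.head_cp, arc.rule, arc.pos)] = arc

    def dependents(self, atom, cp):
        """Arcs that must be reprocessed when the answer for atom:cp changes."""
        return [a for a in self.dat.values()
                if (a.literal, a.literal_cp) == (atom, cp)]

# The rule q(X) :- p(X) of Example 3.1, analyzed for q(X):<term>.
t = Tables()
t.add_answer("p", "term", "bottom")
t.add_arc(Arc("q", "term", 1, 1, "term", "p", "term"))
t.add_answer("p", "term", "real")            # the answer for p is updated
print([(a.head, a.rule, a.pos) for a in t.dependents("p", "term")])
```

Looking arcs up by the body literal they mention is what lets an update to `p` point directly at the rule and body position of `q` that has to be revisited.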
This is to say that the rule for A k has to beprocessed again starting from atom B k,i . Thus, as we will see later, dependencyarcs are used for forcing recomputation until a fixpoint is reached. The remainingpart CP k,i − is the program annotation just before B k,i is reached and containsinformation about all variables in rule k . CP k,i − is not really necessary, but isincluded for efficiency.Intuitively, the analysis algorithm is a graph traversal algorithm which placesentries in the answer table AT and dependency arc table DAT as new nodes andarcs in the program analysis graph are encountered. To capture the different graphtraversal strategies used in different fixpoint algorithms, a prioritized event queue is used. We use Ω ∈ QHS to refer to a
Queue Handling Strategy which a particularinstance of the generic algorithm may use. Events are of three forms: • newcall ( A : CP ) which indicates that a new call pattern for literal A withabstract substitution CP has been encountered. • arc ( H k : ⇒ [ ] B k,i : ) which indicates that the rule with H k as headneeds to be (re)computed from the position k, i . • updated ( A : CP ) which indicates that the answer to call pattern A withabstract substitution CP has been changed in AT .The algorithm is defined in terms of five abstract operations on the domain D α : • Arestrict ( CP , V ) performs the abstract restriction of an abstract substitution CP to the set of variables in the set V .10 Aextend ( CP , V ) extends the abstract substitution CP to the variables in theset V . • Aadd ( C, CP ) performs the abstract operation of conjoining the actual con-straint C with the abstract substitution CP . • Aconj ( CP , CP ) performs the abstract conjunction of two abstract substitu-tions. • Alub ( CP , CP ) performs the abstract disjunction of two abstract substitu-tions.Apart from the parametric domain-dependent functions, the algorithm has severalother undefined functions. The functions add event and next event respectively pushan event to the priority queue and pop the event of highest priority, according toΩ. When an arc H k : CP ⇒ [ CP ′′ ] B k,i : CP ′ is added to DAT , it replaces anyother arc of the form H k : CP ⇒ [ ] B k,i : (modulo renaming) in the tableand the priority queue. Similarly when an entry H k : CP AP is added to the AT ( add answer table ), it replaces any entry of the form H k : CP (modulorenaming). Note that the underscore ( ) matches any description, and that there isat most one matching entry in DAT or AT at any time.More details on the algorithm can be found in (Hermenegildo et al. 
2000; Puebla and Hermenegildo 1996).Let us briefly explain its main procedures: • The algorithm centers around the processing of events on the priority queue,which repeatedly removes the highest priority event (Line 4) and calls theappropriate event-handling function (L5-7). • The function new call pattern initiates processing of all the rules for thedefinition of the internal literal A , by adding arc events for each of the firstliterals of these rules (L13). Initially, the answer for the call pattern is set to ⊥ (L14). • The procedure process arc performs the core of the analysis. It performs asingle step of the left-to-right traversal of a rule body. — If the literal B k,i is not a constraint (L16), the arc is added to DAT (L17). — Atoms are processed by function get answer : – Constraints are simply added to the current description (L26). – In the case of literals, the function lookup answer first looks upan answer for the given call pattern in AT (L30) and if it is notfound, it places a newcall event (L32). When it finds one, then thisanswer is extended to the variables in the rule the literal occursin (L27) and conjoined with the current abstract substitution (L28).The resulting answer (L19) is either used to generate a new arc eventto process the next literal in the rule, if B k,i is not the last one (L20);otherwise, the new answer is computed by insert answer info . • The part of the algorithm that is more relevant to the generation of reducedcertificates is within insert answer info . The new answer for the rule is combined with the current answer in the table (L35). If the fixpoint for such11all has not been reached, then the corresponding entry in AT is updatedwith the combined answer (L37) and an updated event is added to the queue(L38). • The purpose of an updated event is that the function add dependent rules (re)processes those calls which depend on the call pattern A : CP whoseanswer has been updated (L40). 
This effect is achieved by adding the arc events for each of its dependencies (L41). The fact that dependency arcs contain information at the level of body literals, identified by a pair k,i, allows reprocessing only those rules for the predicate which depend on the updated pattern. Furthermore, those rules are reprocessed precisely from the body atom whose answer has been updated. If, instead, dependencies were kept at the level of rules, rules would always need to be reprocessed from the leftmost atom. Furthermore, if dependencies were kept at the level of predicates, all rules for a predicate would have to be reprocessed from the leftmost atom as soon as an answer pattern it depended on were updated.

In the following section, we illustrate the algorithm by means of an example.

Our running example is the program rectoy taken from (Rose 1998). We will use it to illustrate our algorithms and show that our approach improves on state-of-the-art techniques for reducing the size of certificates. Our approach can deal with the very wide class of properties for which abstract interpretation has been proved useful (for example, in the context of LP this includes variable sharing, determinacy, non-failure, termination, term size, etc.). For brevity and concreteness, in all our examples abstract substitutions simply assign an abstract value in the simple domain introduced in Section 3 to each variable in a set V over which each such substitution ranges. We use term as the most general type (i.e., term corresponds to all possible terms). For brevity, variables whose regular type is term are often not shown in abstract substitutions. Also, when it is clear from the context, an abstract substitution for an atom p(x_1, ..., x_n) is shown as a tuple ⟨t_1, ..., t_n⟩, such that each value t_i indicates the type of x_i. The most general substitution ⊤ assigns term to all variables in V. The least general substitution ⊥ assigns the empty set of values to each variable.

Example 4.2
Consider the Ciao version of procedure rectoy (Rose 1998) and the call pattern rectoy(N,M) : ⟨int, term⟩, which indicates that external calls to rectoy are performed with an integer value, int, in the first argument N:

    rectoy(N,M) :- N = 0, M = 0.
    rectoy(N,M) :- N1 is N-1, rectoy(N1,R), M is N1+R.

We now briefly describe four main steps carried out in the analysis using some Ω ∈ QHS:

A. The initial event newcall(rectoy(N,M) : ⟨int, term⟩) introduces the arcs A_{1,1} and A_{2,1} in the queue, one for each rule, in the order above:
   A_{1,1} ≡ arc(rectoy(N,M) : ⟨int, term⟩ ⇒ [{N/int}] N = 0 : {N/int})
   A_{2,1} ≡ arc(rectoy(N,M) : ⟨int, term⟩ ⇒ [{N/int}] N1 is N−1 : {N/int})
   The initial answer E_1 ≡ rectoy(N,M) : ⟨int, term⟩ ↦ ⊥ is inserted in AT.

B. Assume that Ω assigns higher priority to A_{1,1}. The procedure get_answer simply adds the constraint N = 0 to the abstract substitution {N/int}. Upon return, as it is not the last body atom, the following arc event is generated:
   A_{1,2} ≡ arc(rectoy(N,M) : ⟨int, term⟩ ⇒ [{N/int}] M = 0 : {M/term})
   Arc A_{1,2} is handled exactly as A_{1,1} and get_answer simply adds the constraint M = 0, returning {N/int, M/int}. As it is the last atom in the body (L23), procedure insert_answer_info computes Alub between ⊥ and the above answer and overwrites E_1 with:
   E_1′ ≡ rectoy(N,M) : ⟨int, term⟩ ↦ ⟨int, int⟩
   Therefore, the event U ≡ updated(rectoy(N,M) : ⟨int, term⟩) is introduced in the queue. Note that no dependency has been originated during the processing of this rule (as both body atoms are constraints).

C. Now, Ω can choose between the processing of U or A_{2,1}. Let us assume that A_{2,1} has higher priority.
For its processing, we have to assume that the predefined functions "−", "+" and "is" are dealt with by the algorithm as standard constraints by just using the following information provided by the system:
   E_2 ≡ C is A + B : ⟨int, int, term⟩ ↦ ⟨int, int, int⟩
   E_3 ≡ C is A − B : ⟨int, int, term⟩ ↦ ⟨int, int, int⟩
   where the three values in the abstract substitutions correspond to variables A, B, and C, in this order. In particular, after analyzing the subtraction with the initial call pattern, we infer that N1 is of type int and no dependency is asserted. Next, the arc:
   A_{2,2} ≡ arc(rectoy(N,M) : ⟨int, term⟩ ⇒ [{N/int, N1/int}] rectoy(N1,R) : ⟨int, term⟩)
   is introduced in the queue and the corresponding dependency is stored in DAT. The call to get_answer returns the current answer E_1′. Then, we use this answer as call pattern to process the last addition by creating a new arc A_{2,3}:
   A_{2,3} ≡ arc(rectoy(N,M) : ⟨int, term⟩ ⇒ [{N/int, N1/int, R/int}] M is N1 + R : {N1/int, R/int})
   Clearly, the processing of A_{2,3} does not change the final answer E_1′. Hence, no more updates are introduced in the queue.

D. Finally, we have to process the event U introduced in step B, to which Ω has assigned lowest priority. The procedure add_dependent_rules finds the dependency corresponding to arc A_{2,2} and inserts it in the queue. This relaunches an arc identical to A_{2,2}. This in turn launches an arc identical to A_{2,3}. However, the reprocessing does not change the fixpoint result E_1′ and the analysis terminates, computing as answer table the entry E_1′ and as unique dependency arc A_{2,2}.

[Fig. 1. Analysis Graph for our Running Example]

Figure 1 shows the analysis graph for the analysis above. The graph has two sorts of nodes. Those which correspond to atoms are called "OR-nodes." An OR-node of the form CP A AP is interpreted as: the answer for the call pattern A : CP is AP. For instance, the OR-node {N1/int} rectoy(N1,R) {N1/int, R/int} indicates that, when the atom rectoy(N1,R) is called with the abstract substitution ⟨int, term⟩, the answer computed is ⟨int, int⟩. As mentioned before, variables whose type is term will often not be shown in what follows. Those nodes which correspond to rules are called "AND-nodes." In Figure 1, they appear within a dotted box and contain the head of the corresponding clause. Each AND-node has as children as many OR-nodes as there are atoms in the body. If a child OR-node is already in the tree, it is not expanded any further and the currently available answer is used. For instance, the analysis graph in the figure at hand contains two occurrences of the abstract atom rectoy(N,M) : ⟨int, term⟩ (modulo renaming), but only one of them (the root) has been expanded. This is depicted by a dashed arrow from the non-expanded occurrence to the expanded one.

The answer table AT contains entries for the different OR-nodes which appear in the graph. In our example AT contains E_1′ associated to the (root) OR-node discussed above. Dependencies in DAT indicate direct relations among OR-nodes. An OR-node A_F : CP_F depends on another OR-node A_T : CP_T iff the OR-node A_T : CP_T appears in the body of some clause for A_F : CP_F.
For instance, the dependency A_{2,2} indicates that the OR-node rectoy(N1,R) : ⟨int, term⟩ is used in the OR-node rectoy(N,M) : ⟨int, term⟩. Thus, if the answer pattern for rectoy(N1,R) : ⟨int, term⟩ is ever updated, then we must reprocess the OR-node rectoy(N,M) : ⟨int, term⟩. ✷

The following definition corresponds to the essential idea in the ACC framework (Equations (1) and (2)) of using a static analyzer to generate the certificates. The analyzer corresponds to Algorithm 1 and the certificate is the full answer table.
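Before moving on, the outcome of Example 4.2 can be reproduced with a few lines of code. The following is only a rough sketch and not Algorithm 1: the two rules of rectoy are hard-coded as transfer functions and iterated round-robin instead of being driven by an event queue. The names leq, lub, is_result and analyze_rectoy, as well as the abstraction chosen for non-integer arithmetic, are assumptions introduced here.

```python
# Simple type lattice of the running example: bot ⊑ int ⊑ real ⊑ term.
ORDER = ["bot", "int", "real", "term"]
BOT = ("bot", "bot")  # least answer for the two arguments of rectoy


def leq(a, b):
    """a ⊑ b in the chain bot ⊑ int ⊑ real ⊑ term."""
    return ORDER.index(a) <= ORDER.index(b)


def lub(x, y):
    """Componentwise least upper bound (the role played by Alub)."""
    return tuple(a if leq(b, a) else b for a, b in zip(x, y))


def is_result(a, b):
    """Assumed abstraction of `C is A op B`: int when both operands are int,
    otherwise the safe answer term."""
    return "int" if a == b == "int" else "term"


def analyze_rectoy():
    """Iterate both rules for rectoy(N,M) : <int, term> until the answer is stable."""
    answer, passes = BOT, 0
    while True:
        passes += 1
        # Rule 1: N = 0, M = 0 gives <int, int>.
        rule1 = ("int", "int")
        # Rule 2: N1 is N-1, rectoy(N1,R), M is N1+R.
        n1 = is_result("int", "int")
        if answer == BOT:
            rule2 = BOT  # no answer available yet for the recursive call
        else:
            rule2 = ("int", is_result(n1, answer[1]))
        new = lub(answer, lub(rule1, rule2))
        if new == answer:
            return answer, passes
        answer = new


print(analyze_rectoy())  # (('int', 'int'), 2)
```

The second pass changes nothing, in the same way that the reprocessing in step D leaves the answer ⟨int, int⟩ untouched.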
Definition 4.3 (full certificate)
We define function Certifier_f : Prog × ADom × AAtom × APol × QHS → ACert, which takes P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol, Ω ∈ QHS and returns as full certificate, FCert ∈ ACert, the answer table computed by Analyze_f(S_α, Ω) for P in D_α iff FCert ⊑ I_α.

If the inclusion does not hold, we do not have a certificate. This can happen either because the program does not satisfy the policy or because the analyzer is not precise enough. In the latter case, a solution is to try analyzing with a more precise (and generally more expensive) abstract domain. In the former case (the program does not satisfy the policy), this can be due to two possible reasons. A first one is that we have formalized a policy which is unnecessarily restrictive, in which case the solution is to weaken it. The other possible reason is that the program actually violates the policy, either inadvertently or on purpose. In such a case there is of course no way a certificate can be found for such program and policy.

Example 4.4
Consider the safety policy expressed by the following specification I_α : rectoy(N,M) : ⟨int, term⟩ ↦ ⟨int, real⟩. The certifier in Definition 4.3 returns as valid certificate the single entry E_1′. Clearly E_1′ ⊑ I_α since ⊥ ⊑ int ⊑ real ⊑ term. ✷

As already mentioned in Section 1, in the ACC framework, since this certificate contains the fixpoint, a single pass over the analysis graph is sufficient to validate such certificate on the consumer side. The key observation in order to reduce the size of certificates within the ACC framework is that certain entries in a certificate may be irrelevant, in the sense that the checker is able to reproduce them by itself in a single pass. The notion of relevance is directly related to the idea of recomputation in the program analysis graph. Intuitively, given an entry in the answer table A : CP ↦ AP, its fixpoint may have been computed in several iterations from ⊥, AP_1, AP_2, ... until AP. For each change in the answer, an event updated(A : CP) is generated during the analysis. The above entry is relevant in a certificate (under some strategy) when its updates launch the recomputation of other arcs in the graph which depend on A : CP (i.e., there is a dependency from it in the table). Thus, unless A : CP ↦ AP is included in the (reduced) certificate, a single-pass checker which uses the same strategy as the code producer will not be able to validate the certificate. Section 5.1 identifies redundant updates which should not be considered. In Section 5.2, we characterize formally the notion of reduced certificate containing only relevant answers. Then, in Section 5.3, we instrument an analysis algorithm to identify relevant answers and define a certifier based on the instrumented analyzer which generates reduced certificates.

According to the above intuition, we are interested in determining when an entry in the answer table has been "updated" during the analysis and such changes affect other entries.
There is a special kind of updated events which can be directly considered irrelevant: those updates which launch a redundant computation (like the U event generated in step B of Example 4.2). We write DAT|_{A:CP} to denote the set of arcs of the form H : CP_0 ⇒ [CP_1] B : CP_2 ∈ Dep in the current dependency arc table which depend on A : CP, i.e., such that A : CP = (B : CP_2)σ for some renaming σ.

Definition 5.1 (redundant update)
Let P ∈ Prog, S_α ⊆ AAtom and Ω ∈ QHS. We say that an event updated(A : CP) which appears in the prioritized event queue during the analysis of P for S_α is redundant w.r.t. Ω if, when it is generated, DAT|_{A:CP} = ∅.

It should be noted that redundant updates can only be generated by updated events for call patterns which belong to S_α, i.e., to the initial set of call patterns. Otherwise, DAT|_{A:CP} cannot be empty. Let us explain the intuition behind this. An event updated(A : CP), with A : CP ∉ S_α, is only generated because a rule for A has been completely analyzed. Hence, a corresponding call to insert_answer_info for A : CP (L24 in Algorithm 1) has been done. If such a rule has been completely analyzed, then all its arcs were introduced in the prioritized event queue. Observe that the first time an arc is introduced in the queue, it is because a call to procedure new_call_pattern for A : CP occurred, i.e., a newcall(A : CP) event was analyzed. Consider the first event newcall for A : CP. If A : CP ∉ S_α, then this event originates from the analysis of some other arc of the form H : CP_0 ⇒ [CP_1] A : CP for which A : CP has no entry in the answer table. Thus, the dependency H : CP_0 ⇒ [CP_1] A : CP was added to DAT. Since dependencies are never removed from DAT, any later updated event for A : CP occurs under the condition DAT|_{A:CP} ≠ ∅. Although it is possible to fix the strategy and define an analysis algorithm which does not introduce redundant updates, we prefer to follow the generic one as much as possible.

Example 5.2
In our running example U is redundant for Ω at the moment it is generated. However, since the event has been given low priority its processing is delayed until the end and, in the meantime, a dependency from it has been added. This causes the unnecessary redundant recomputation of the second arc A_{2,2} for rectoy. ✷

Note that redundant updates are indeed events which, if processed immediately, correspond to "nops".

The following proposition ensures the correctness of using a queue handling strategy which assigns the highest priority to redundant updates. This result can be found in (Hermenegildo et al. 2000), where it is stated that Analyze_f is correct independently of the order in which events in the prioritized event queue are processed.
Proposition 5.3
Let Ω ∈ QHS. Let Ω′ ∈ QHS be a strategy which assigns the highest priority to any updated event which is redundant. Then, ∀ P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, Analyze_f(S_α, Ω) = Analyze_f(S_α, Ω′).

As mentioned above, the notion of reduced certificate is directly related to the idea of recomputation in the program analysis graph. Now, we are interested in finding those entries A : CP ∈ Entry in the answer table whose analysis has launched the reprocessing of some arcs and hence recomputation has occurred. Certainly, the reprocessing of an arc may only be caused by a non-redundant updated event for A : CP, which inserted (via add_dependent_rules) all arcs in DAT|_{A:CP} into the prioritized event queue. However, some updated events are not dangerous. For instance, if the processing of an arc H : CP_0 ⇒ [CP_1] A : CP has been stopped because of the lack of answer for A : CP (L20 and L23 in Algorithm 1), this arc must be considered as "suspended", since its continuation has not been introduced in the queue. In particular, we do not take into account updated events for A : CP which are generated when DAT|_{A:CP} only contains suspended arcs. Note that this case still corresponds to the first traversal of any arc and should not be considered as a reprocessing. The following definition introduces the notion of suspended arc, i.e., of an arc suspended during analysis.

Definition 5.4 (suspended arc)
Let P ∈ Prog, S_α ⊆ AAtom and Ω ∈ QHS. We say that an arc H : CP_0 ⇒ [CP_1] B : CP_2 in the dependency arc table is suspended w.r.t. Ω during the analysis of P for S_α iff, when it is generated, the answer table does not contain any entry for B : CP_2 or contains an entry of the form B : CP_2 ↦ ⊥.

For the rest of the updated events, their relevance depends strongly on the strategy used to handle the prioritized event queue. For instance, assume that the prioritized event queue contains an event arc(H : CP_0 ⇒ [CP_1] A : CP), coming from a suspended arc in DAT. If all updated events for A : CP are processed before this arc (i.e., the fixpoint of A : CP is available before processing the arc), then these updated events do not launch any recomputation. Let us define now the notion of recomputation.

Definition 5.5 (multi-traversed arc)
Let P ∈ Prog, S_α ⊆ AAtom and Ω ∈ QHS. We say that an arc H : CP_0 ⇒ [CP_1] A : CP in the dependency arc table has been multi-traversed w.r.t. Ω after the analysis of P for S_α iff it has been introduced in the dependency arc table at least twice as a non-suspended arc w.r.t. Ω.

Example 5.6
Assume that we use a strategy Ω″ ∈ QHS such that step C in Example 4.2 is performed before B (i.e., the second rule is analyzed before the first one). Then, when the answer for rectoy(N1,R) : ⟨int, term⟩ is looked up, procedure get_answer returns ⊥ and thus the processing of arc A_{2,2} is suspended at this point, in the sense that its continuation A_{2,3} is not inserted in the queue (see L20 in Algorithm 1). Indeed, we can proceed with the remaining arc A_{1,1}, which is processed exactly as in step B. In this case, the updated event U is not redundant for Ω″, as there is a suspended dependency introduced by the former processing of arc A_{2,2} in the table. Therefore, the processing of U introduces the suspended arc A_{2,2} again in the queue, and again A_{2,2} is introduced in the dependency arc table, but now as not suspended. The important point is that the fact that U inserts A_{2,2} must not be considered as a reprocessing, since A_{2,2} had been suspended and its continuation (A_{2,3} in this case) had not been handled by the algorithm yet. Hence, finally A_{2,2} has not been multi-traversed. ✷

We define now the notion of relevant entry, which will be crucial for defining reduced certificates. The key observation is that those answer patterns whose computation has generated multi-traversed arcs should be available in the certificate.
Definition 5.7 (relevant entry)
Let P ∈ Prog, S_α ⊆ AAtom and Ω ∈ QHS. We say that the entry A : CP ↦ AP in the answer table is relevant w.r.t. Ω after the analysis of P for S_α iff there exists a multi-traversed arc _ ⇒ [_] A : CP w.r.t. Ω in the dependency arc table.

The notion of reduced certificate allows us to remove irrelevant entries from the answer table and produce a smaller certificate which can still be validated in one pass.

Definition 5.8 (reduced certificate)
Let P ∈ Prog, S_α ⊆ AAtom and Ω ∈ QHS. Let FCert = Analyze_f(S_α, Ω) for P and S_α. We define the reduced certificate, RCert, as the set of relevant entries in FCert w.r.t. Ω.
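Definitions 5.5 to 5.8 can be read as a filter over the answer table. The sketch below assumes that the analysis has already counted, for each dependency arc, how many times it entered the dependency arc table as a non-suspended arc. The dictionary layout and the names relevant_patterns and reduced_certificate are hypothetical and only serve the illustration.

```python
def relevant_patterns(arc_counts):
    """Call patterns reached by at least one multi-traversed arc (Definition 5.5).

    arc_counts maps (head_pattern, rule, position, body_pattern) to the number of
    times the arc was added to the dependency arc table as non-suspended.
    """
    return {arc[3] for arc, count in arc_counts.items() if count >= 2}


def reduced_certificate(fcert, arc_counts):
    """Keep only the relevant entries of the full certificate (Definition 5.8)."""
    relevant = relevant_patterns(arc_counts)
    return {pattern: answer for pattern, answer in fcert.items() if pattern in relevant}


# Running example: a single entry and a single dependency arc (rule 2, position 2).
RECTOY = ("rectoy", ("int", "term"))
FCERT = {RECTOY: ("int", "int")}
ARC = (RECTOY, 2, 2, RECTOY)

print(reduced_certificate(FCERT, {ARC: 1}))  # {}
print(reduced_certificate(FCERT, {ARC: 2}))  # {('rectoy', ('int', 'term')): ('int', 'int')}
```

A count of 2 corresponds to the situation of steps C and D in Example 4.2, where the arc for the recursive call is processed twice with an answer available.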
Example 5.9
From now on, in our running example, we assume the strategy Ω′ ∈ QHS which assigns the highest priority to redundant updates (see Proposition 5.3). For this strategy, the entry E_1′ ≡ rectoy(N,M) : ⟨int, term⟩ ↦ ⟨int, int⟩ in Example 4.2 is not relevant since no arc has been multi-traversed. Therefore, the reduced certificate for our running example is empty. In the following section, we show that our checker is able to reconstruct the fixpoint in a single pass from the empty certificate. It should be noted that, using Ω as in Example 4.2, the answer is obtained by performing two analysis iterations over the arc associated to the second rule of rectoy(N,M) (steps C and D) due to the fact that U has been delayed and becomes relevant for Ω. Thus, this arc has been multi-traversed. ✷

Consider now the Java version of the procedure rectoy, borrowed from (Rose 1998):

    int rectoy (int n) {
      int m; int r;
      m = 0;
      if (n > 0) {
        n = n-1;
        r = this.rectoy(n);
        m = n + 4;
      };
      return m; // Program point 30
    }

For this program, lightweight bytecode verification (LBV) (Rose 1998) sends, together with the program, the reduced non-empty certificate cert = ({(ε, rectoy · int · int · ⊥)}, ε), which states that at program point 30 the stack does not contain information (first occurrence of ε), and variables n, m and r have type int, int and ⊥. This information needs to be sent because rectoy, implemented in Java, contains an if-branch (equivalent to the branching for selecting one of our two clauses for rectoy). In LBV, cert has to inform the checker that it is possible for variable r at point 30 to be undefined if the if condition does not hold. However, in our method this is not necessary because the checker is able to reproduce this information itself. Therefore, the above example shows that our approach improves on state-of-the-art PCC techniques by reducing the certificate even further while still keeping the checking process one-pass.
In this section, we instrument the analyzer of Algorithm 1 with the extensions necessary for producing reduced certificates, as defined in Definition 5.8. Together with the answer table returned by Algorithm 1, this new algorithm also returns the set RED (initially empty) of call patterns which will finally form the reduced certificate RCert. The resulting analyzer Analyze_r is presented in Algorithm 2. Except for procedures process_arc and insert_answer_info, it uses the same procedures as Algorithm 1, adapting them to the new syntax of arcs. Now, arcs will be annotated with an integer value u which counts the number of times that the arc has been traversed. New arcs are introduced with the annotation 0:

   add_event(arc(A_k(0) : CP ⇒ [CP_0] B_{k,1} : CP_1), Ω)

[Footnote: The second occurrence of ε in the LBV certificate cert indicates that there are no backward jumps.]

Let us see the differences between Algorithm 2 and Algorithm 1:

1. We detect all multi-traversed arcs.
When a call to process_arc is generated, this procedure checks whether the arc is suspended (L13) before introducing the corresponding arc in the dependency arc table. If the arc is suspended, then its u value is not modified, since, as explained before, it cannot be considered as a reprocessing. Otherwise, the u-value is incremented by one. Furthermore, if B_{k,i} is not a constraint and u is greater than 1, then B_{k,i} : CP_2 is added to the RED set, since this means that the arc has been multi-traversed. Note that the RED set will contain in the end those call patterns whose analysis launches the recomputation of some arc.
   Another important issue is how to handle the continuation of the arc which is being currently processed. If the arc is suspended, then no continuation is introduced in the queue (checked by L4 and L9). Otherwise (L4), before introducing the continuation in the queue, we check whether the dependency arc table already contains such a continuation (L6). In that case, we add the arc with the same u annotation as the existing one (L7). Otherwise, we introduce the continuation as an arc initialized with 0 (L8).

2. We ignore redundant updates. Only non-redundant updates are processed by procedure insert_answer_info (L23). Each time an updated event is about to be generated, we check whether DAT|_{H:CP} is different from ∅ (L23). Only then is an updated event for H : CP generated (L24).

Example 5.10
Consider the four steps performed in the analysis of our running example. Step A is identical. In step B the insert_answer_info procedure detects a redundant updated event (L23). No updated event is generated. Step C remains identical and the arc A_{2,2} (the only one able to contribute to the RED set) is annotated with 1, and step D does not occur. As expected, upon return, the RED set remains empty. ✷

This section shows the correctness of the certification process carried out to generate reduced certificates, based on the correctness of the certification with full certificates of (Albert et al. 2008). First, note that, except for the control of relevant entries, Analyze_f(S_α, Ω) and Analyze_r(S_α, Ω) have the same behavior and thus compute the same answer table.

Algorithm 2 Analyze_r: Analyzer instrumented for Certificate Reduction
 1: procedure process_arc(H_k(u) : CP_0 ⇒ [CP_1] B_{k,i} : CP_2 ∈ Dep, Ω ∈ QHS)
 2:   W := vars(H_k, B_{k,1}, ..., B_{k,n_k});
 3:   CP_3 := get_answer(B_{k,i} : CP_2, CP_1, W, Ω);
 4:   if CP_3 ≠ ⊥ and i ≠ n_k then
 5:     CP_4 := Arestrict(CP_3, vars(B_{k,i+1}));
 6:     if there exists the arc H_k(w) : _ ⇒ [_] B_{k,i+1} : _ in the dependency arc table then
 7:       add_event(arc(H_k(w) : CP_0 ⇒ [CP_3] B_{k,i+1} : CP_4), Ω);
 8:     else add_event(arc(H_k(0) : CP_0 ⇒ [CP_3] B_{k,i+1} : CP_4), Ω);
 9:   else if CP_3 ≠ ⊥ and i = n_k then
10:     AP_1 := Arestrict(CP_3, vars(H_k));
11:     insert_answer_info(H : CP_0 ↦ AP_1, Ω);
12:   if B_{k,i} is not a constraint then
13:     if CP_3 = ⊥ then
14:       add H_k(u) : CP_0 ⇒ [CP_1] B_{k,i} : CP_2 to dependency arc table;
15:     else % non-suspended arc
16:       add H_k(u+1) : CP_0 ⇒ [CP_1] B_{k,i} : CP_2 to dependency arc table;
17:       if u+1 > 1 then add B_{k,i} : CP_2 to RED;
18: procedure insert_answer_info(H : CP ↦ AP ∈ Entry, Ω ∈ QHS)
19:   AP_0 := lookup_answer(H : CP, Ω);
20:   AP_1 := Alub(AP, AP_0);
21:   if AP_0 ≠ AP_1 then % update required
22:     add_answer_table(H : CP ↦ AP_1);
23:     if DAT|_{H:CP} ≠ ∅ then % non-redundant update
24:       add_event(updated(H : CP));

Proposition 5.11
Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, Ω, Ω′ ∈ QHS. Let AT be the answer table computed by Analyze_r(S_α, Ω′). Then, Analyze_f(S_α, Ω) = AT.

Proof
First note that, except for the u-annotations, the procedures process_arc in Algorithms 1 and 2 are similar. In fact, there is a one-to-one correspondence between the definitions of both procedures. Concretely, we have the following mapping:

   Analyze_f    Analyze_r
   L16-L17      L12-L17
   L18          L2
   L19          L3
   L20-L22      L4-L8
   L23-L24      L9-L11

The only difference between Algorithms 1 and 2 lies in insert_answer_info. For the case of Algorithm 2, redundant updates are never introduced in the prioritized event queue (L23). Then, let us choose a new strategy Ω″, identical to Ω′ except when dealing with redundant updates. For redundant updates, let us assume that Ω″ processes them immediately after they are introduced in the event queue. Such processing does not generate any effect since the dependency arc table does not contain arcs to be launched for these updates. Hence it holds that Analyze_r(S_α, Ω′) generates the same answer table AT as Analyze_f(S_α, Ω″). From Proposition 5.3 it holds that Analyze_f(S_α, Ω) = Analyze_f(S_α, Ω″) and the claim follows.

The following definition presents the certifier for reduced certificates.

Definition 5.12
We define the function Certifier_r : Prog × ADom × AAtom × APol × QHS → ACert, which takes P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol, Ω ∈ QHS. It returns as certificate RCert = {A : CP ↦ AP ∈ FCert | A : CP ∈ RED}, where ⟨FCert, RED⟩ = Analyze_r(S_α, Ω), iff FCert ⊑ I_α.

Finally, we can establish the correctness of Certifier_r, which amounts to saying that RCert contains all relevant entries in FCert.

Theorem 5.13
Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol and Ω ∈ QHS. Let FCert = Analyze_f(S_α, Ω) and RCert = Certifier_r(P, D_α, S_α, I_α, Ω). Then, an entry A : CP ↦ AP ∈ FCert is relevant w.r.t. Ω iff A : CP ↦ AP ∈ RCert.

Proof
According to Definition 5.12, RCert = {A : CP ↦ AP ∈ FCert | A : CP ∈ RED}, where ⟨FCert, RED⟩ = Analyze_r(S_α, Ω). Hence, it is enough to prove that an entry A : CP ↦ AP ∈ FCert is relevant w.r.t. Ω iff A : CP ∈ RED.

(⇐) Assume that A : CP ∈ RED. Then, from L16 and L17 it holds that there exists an arc H(u) : CP′ ⇒ [_] A : CP in the dependency arc table such that u > 1. But the u-value of an arc can only be increased in procedure process_arc (L16) after checking that CP_3 is different from ⊥ (L15). But CP_3 is computed by means of get_answer (L3), which calls lookup_answer (L27). This last function only returns a value different from ⊥ if A : CP has an entry in the answer table (L30 and L31). Since u > 1, u has been incremented at least twice and, as argued before, in both cases the answer table contained an entry for A : CP, i.e., by Definition 5.5, the arc H : CP′ ⇒ [_] A : CP is multi-traversed w.r.t. Ω. Hence, by Definition 5.7, A : CP ↦ AP is a relevant entry.

(⇒) Assume now that the entry A : CP ↦ AP is relevant w.r.t. Ω. Then, by Definition 5.7, there exists an arc H : CP′ ⇒ [_] A : CP in the dependency arc table which has been multi-traversed. By Definition 5.5, this arc has been introduced in the dependency arc table at least twice as a non-suspended arc. But arcs are introduced in DAT via procedure process_arc and each time the arc is non-suspended (L15) its u-value is increased by 1 (L16). Hence the u-value for H : CP′ ⇒ [_] A : CP is at least 2. Now, L17 ensures that A : CP ∈ RED.
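The bookkeeping that the proof relies on (L12-L17 of Algorithm 2) can be sketched in isolation. The function below is a simplification under stated assumptions: arcs are identified by an opaque key, the current u value is read from the table rather than carried by the arc event, and the answer returned by get_answer is passed in directly. The names record_arc, dat and red are introduced here for illustration only.

```python
BOT = None  # stands for the abstract answer ⊥ in this sketch


def record_arc(dat, red, arc_key, body_pattern, answer, is_constraint=False):
    """Update the u annotation of an arc in the spirit of L12-L17 of Algorithm 2.

    dat maps arc keys to their current u value; red is the RED set.
    Returns the u value stored for the arc (0 for constraints, which are not stored).
    """
    if is_constraint:
        return 0                   # constraints create no dependency
    u = dat.get(arc_key, 0)
    if answer is BOT:
        dat[arc_key] = u           # suspended arc: u is left unchanged
    else:
        dat[arc_key] = u + 1       # non-suspended traversal
        if u + 1 > 1:
            red.add(body_pattern)  # multi-traversed: its entry is relevant
    return dat[arc_key]


dat, red = {}, set()
P = ("rectoy", ("int", "term"))

# As in Example 5.6: a suspended traversal followed by a non-suspended one.
record_arc(dat, red, "arc-2-2", P, BOT)
record_arc(dat, red, "arc-2-2", P, ("int", "int"))
print(dat["arc-2-2"], red)  # 1 set()

# A further non-suspended traversal makes the arc multi-traversed.
record_arc(dat, red, "arc-2-2", P, ("int", "int"))
print(dat["arc-2-2"], red)  # 2 {('rectoy', ('int', 'term'))}
```

The first two calls leave RED empty because the suspended traversal does not count, which is exactly the distinction that Definition 5.5 draws.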
Checking Reduced Certificates
In the ACC framework for full certificates (Albert et al. 2005) a concrete checking algorithm is used with a specific graph traversal strategy which we will refer to as Ω_C. This checker has been shown to be very efficient (i.e., this particular Ω_C is a good choice) but here we would like to consider a more generic design for the checker in which it is parametric on Ω_C in addition to being parametric on the abstract domain. This lack of parametricity on Ω_C was not an issue in the original formulation of ACC in (Albert et al. 2005) since there full certificates were used. Note that even if the certifier uses a strategy Ω_A which is different from Ω_C, all valid full certificates are guaranteed to be validated in one pass by that specific checker, independently of Ω_C. This result allowed using a particular strategy in the checker without loss of generality. However, the same result does not hold any more in the case of reduced certificates. In particular, completeness of checking is not guaranteed if Ω_A ≠ Ω_C. This occurs because, though the answer table is identical for all strategies, the subset of redundant entries depends on the particular strategy used. The problem is that, if there is an entry A : CP ↦ AP in FCert such that it is relevant w.r.t. Ω_C but it is not w.r.t. Ω_A, then a single-pass checker will fail to validate the RCert generated using Ω_A. In this section, we design a generic checker which is not tied to a particular graph traversal strategy. In practice, upon agreeing on the appropriate parameters, the consumer uses the particular instance of the generic checker resulting from the application of such parameters. In a particular application of our framework, we expect that the graph traversal strategy is agreed a priori between consumer and producer.
Alternatively, if necessary (e.g., when the consumer does not implement this strategy), the strategy can be sent along with the certificate in the transmitted package.

It should be noted that the design of generic checkers is also relevant in light of current trends in verified analyzers (e.g., (Klein and Nipkow 2003; Cachera et al. 2004)), which could be transferred directly to the checking end. In particular, since the design of the checking process is generic, it becomes feasible in ACC to use automatic program transformation techniques (Jones et al. 1993) to specialize a certified (specific) analysis algorithm in order to obtain a certified checker with the same strategy while preserving correctness and completeness.

[Footnote: Note that both the analysis and checking algorithms are always parametric on the abstract domain. This genericity allows proving a wide variety of properties by using the large set of available abstract domains, this being one of the fundamental advantages of ACC.]

The following definition presents a generic checker for validating reduced certificates. In addition to the genericity issue discussed above, an important difference with the checker for full certificates (Albert et al. 2005) is that there are certain entries which are not available in the certificate and that we want to reconstruct and output in checking. The reason for this is that the safety policy has to be tested w.r.t. the full answer table (Equation (2)). Therefore, the checker must reconstruct, from RCert, the answer table returned by Analyze_f, FCert, in order to test for adherence to the safety policy (Equation (4)). Note that reconstructing the answer table does not add any additional cost compared to the checker in (Albert et al. 2005), since the full answer table also has to be created in (Albert et al. 2005).
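The final test for adherence to the safety policy amounts to comparing the reconstructed answer table with I_α entry by entry. A minimal sketch for the simple type domain of the running example follows. The dictionary encoding and the function names are assumptions, and treating a call pattern that is missing from the table as a failure is a choice made here rather than something the text prescribes.

```python
ORDER = ["bot", "int", "real", "term"]  # bot ⊑ int ⊑ real ⊑ term


def leq(answer, bound):
    """Componentwise ⊑ on tuples of types."""
    return all(ORDER.index(a) <= ORDER.index(b) for a, b in zip(answer, bound))


def satisfies_policy(fcert, policy):
    """True iff every policy entry is matched by an answer that is ⊑ its bound."""
    return all(
        pattern in fcert and leq(fcert[pattern], bound)
        for pattern, bound in policy.items()
    )


RECTOY = ("rectoy", ("int", "term"))
fcert = {RECTOY: ("int", "int")}    # answer table of Example 4.2
policy = {RECTOY: ("int", "real")}  # specification of Example 4.4

print(satisfies_policy(fcert, policy))                      # True
print(satisfies_policy({RECTOY: ("int", "term")}, policy))  # False
```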
Algorithm 3 Generic Checker for Reduced Certificates Checking_r

 1: procedure insert_answer_info(H:CP ↦ AP ∈ Entry, Ω ∈ QHS)
 2:   AP_0 := lookup_answer(H:CP, Ω);
 3:   AP_1 := Alub(AP, AP_0);
 4:   (IsIn, AP′) = look_fixpoint(H:CP, RCert);
 5:   if IsIn and Alub(AP, AP′) ≠ AP′ then return error; % error of type a)
 6:   if AP_0 ≠ AP_1 then % update required
 7:     if IsIn and AP_0 = ⊥ then AP_1 = AP′
 8:     add_answer_table(H:CP ↦ AP_1);
 9:     if DAT|_{H:CP} ≠ ∅ then add_event(updated(H:CP), Ω);
10: function look_fixpoint(A:CP ∈ AAtom, RCert ∈ ACert)
11:   if ∃ a renaming σ such that σ(A:CP ↦ AP) ∈ RCert then
12:     return (true, σ⁻¹(AP));
13:   else return (false, ⊥);

Definition 6.1 (checker for reduced certificates)
Function Checking_r is defined as function Analyze_r with the following modifications:
1. It receives RCert as an additional input parameter.
2. It does not use the set RED and it replaces L17 of Algorithm 2 with:
   17: if u + 1 > 1 return error
3. If it fails to produce an answer table, then it issues an error.
4. Function insert_answer_info is replaced by the new one in Algorithm 3.

Function Checker_r takes P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol, Ω ∈ QHS, RCert ∈ ACert and returns:
1. error if Checking_r(S_α, Ω, RCert) for P in D_α returns error.
2. Otherwise it returns FCert = Checking_r(S_α, Ω, RCert) for P and D_α iff FCert ⊑ I_α.

Let us briefly explain the differences between Algorithms 2 and 3. First, the checker has to detect (and issue) two sources of errors:

a) The answer in the certificate and the one obtained by the checker differ (L5). This is the traditional error in ACC and means that the certificate and program at hand do not correspond to each other. The call to function look_fixpoint(H:CP, RCert) in L4 returns a tuple (IsIn, AP′) such that: if H:CP is in RCert, then IsIn is equal to true and AP′ is the fixpoint stored in RCert. Otherwise, IsIn is equal to false and AP′ is ⊥.

b) Recomputation is required. This should not occur during checking, i.e., no arcs must be multi-traversed by the checker (L17). This second type of error corresponds to situations in which some non-redundant update is needed in order to obtain an answer (it cannot be obtained in one pass). This is detected in L17, after checking that the arc is not suspended (L9) and that it has been traversed before, i.e., its u value is greater than 1. Note that we flag this as an error because the checker would have to iterate and the description we have provided does not include support for it. In general, however, it is also possible to use a checker that is capable of iterating. In that case, of course, the certificates transmitted can be even smaller than the reduced ones, at the cost of increased checking time (as well as some additional complexity in the checking code). This allows supporting different tradeoffs between certificate size, checking time, and checker code complexity.

The second difference is that the A:CP ↦ AP′ entries stored in RCert have to be added to the answer table after finding the first partial answer for A:CP (different from ⊥), in order to detect errors of type a) above. In particular, L7 and L8 add the fixpoint AP′ stored in RCert to the answer table.
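The behaviour of insert_answer_info and look_fixpoint described above can be sketched in a few lines. This is only a minimal sketch under strong assumptions: abstract answers are elements of a toy chain lattice, call patterns are plain dictionary keys (so renamings are ignored), lookup_answer is reduced to a dictionary lookup, and the event queue is a list. The names CertificateError and dependents are hypothetical and do not come from Algorithm 3.

```python
# Toy chain lattice: bottom < int < real < term (assumed for illustration only).
RANK = {"bottom": 0, "int": 1, "real": 2, "term": 3}


class CertificateError(Exception):
    """Error of type a): certificate and program do not correspond."""


def alub(a, b):
    """Least upper bound on the chain."""
    return a if RANK[a] >= RANK[b] else b


def look_fixpoint(key, rcert):
    """Return (IsIn, AP'); renaming of call patterns is ignored here."""
    if key in rcert:
        return True, rcert[key]
    return False, "bottom"


def insert_answer_info(key, ap, answer_table, rcert, events, dependents):
    ap0 = answer_table.get(key, "bottom")          # simplified lookup_answer
    ap1 = alub(ap, ap0)
    is_in, ap_fix = look_fixpoint(key, rcert)
    if is_in and alub(ap, ap_fix) != ap_fix:       # L5: error of type a)
        raise CertificateError(key)
    if ap0 != ap1:                                 # L6: update required
        if is_in and ap0 == "bottom":              # L7: first partial answer
            ap1 = ap_fix                           # install the stored fixpoint
        answer_table[key] = ap1                    # L8
        if dependents.get(key):                    # L9: arcs depend on this entry
            events.append(("updated", key))


if __name__ == "__main__":
    table, events = {}, []
    insert_answer_info("p", "int", table, {"p": "real"}, events, {"p": ["q"]})
    print(table, events)
```

In this sketch the first partial answer for an entry found in the certificate immediately installs the stored fixpoint, so later answers for the same entry cause no further updates.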
Example 6.2
All steps given for the analysis of Example 5.10 are identical in Checker_r except for the detection of possible errors. Errors of type a) are not possible since RCert is empty. An error of type b) can only be generated because of the u value of arc A , . However, note that in step C this arc is introduced in the queue with u = 0. After processing the arc, the arc goes to the dependency arc table with u = 1. But since no updated events are generated, this arc is no longer processed. Hence, the program is validated in a single pass over the graph. ✷

In this section we prove the correctness of the checking process, which amounts to saying that if Checker_r does not issue an error when validating a certificate, then the reconstructed answer table is a fixpoint verifying the given input safety policy. As a preliminary step, we prove the following proposition, in which we also ensure that the validation of the certificate is done in one pass.
Proposition 6.3
Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol and Ω ∈ QHS. Let FCert = Certifier_f(P, D_α, S_α, I_α, Ω) and RCert = Certifier_r(P, D_α, S_α, I_α, Ω). Then Checking_r(S_α, Ω, RCert) does not issue an error and it returns FCert. Furthermore, the validation of FCert does not generate multi-traversed arcs.
Proof
Let us consider first the call:

(∗) Checking_r(S_α, Ω, RCert)

For this call, let us prove that (1) it does not issue an error; and (2) it returns FCert as result.

(1) Checking_r(S_α, Ω, RCert) does not issue an error.
Errors of type (a) (L5 of Algorithm 3) are not possible since, from Definition 5.12, RCert ⊆ FCert, where FCert is the answer table computed by Analyze_f(S_α, Ω). The correctness of Algorithm Analyze_f(S_α, Ω) (see (Hermenegildo et al. 2000)) rules out this kind of error.
Errors of type (b) can only occur in L17 of procedure process_arc (Algorithm 3), for some arc H:CP ⇒ [CP] B_{k,i}:CP. Since we follow the same strategy Ω in Checking_r and Analyze_r, then Analyze_r(S_α, Ω) introduces B_{k,i}:CP in RED (L17 of Algorithm 2), and thus Definition 5.12 ensures that B_{k,i}:CP ↦ AP ∈ RCert. But this is a contradiction since, for all entries in RCert, the first time that the arc is processed without answer in AT for B_{k,i}:CP, Algorithm 3 (L7 and L8) introduces B_{k,i}:CP ↦ AP in AT together with the corresponding event updated(B_{k,i}:CP). So when Ω selects this event, the new event arc(H:CP ⇒ [CP] B_{k,i}:CP) is again introduced in the prioritized event queue. When this arc is selected by Ω, the arc goes again to DAT. But since B_{k,i}:CP ↦ AP ∈ AT, no more events of the form updated(B_{k,i}:CP) may occur (the condition in L6 of insert_answer_info(B_{k,i}:CP) in Algorithm 3 never holds). Hence, no more calls to process_arc for arc(H:CP ⇒ [CP] B_{k,i}:CP) occur. Then the u-value for this arc will be at most 1 and no error will be generated.

(2) The call (∗) returns FCert.
The only differences between the call (∗) and the call Analyze_r(S_α, Ω) lie in procedure insert_answer_info and L17 of procedure process_arc. Since (1) ensures that no error is issued by (∗), the error returns in L5 and L17 of Algorithm 3 are never executed. Then, it is trivial that (∗) computes an answer table AT as result. Furthermore, since (∗) and Analyze_r(S_α, Ω) use the same strategy, the only difference is in the prioritized event queue, since for (∗) no relevant updates will appear in the queue. Instead, the real fixpoints in RCert ⊆ FCert are introduced in AT in L7 and L8 of insert_answer_info. Except for this fact, Algorithms 2 and 3 behave identically and thus (∗) computes FCert as result.

Finally, proving that the validation of RCert does not generate multi-traversed arcs is trivial since, by definition, multi-traversed arcs correspond to arcs in DAT with a u-value greater than 1. Since the call (∗) does not issue an error, the error return in L17 of Algorithm 3 is never executed, i.e., no arc is multi-traversed.

Corollary 6.4
Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol and Ω ∈ QHS. Let FCert = Certifier_f(P, D_α, S_α, I_α, Ω) and RCert_Ω = Certifier_r(P, D_α, S_α, I_α, Ω). If Checking_r(S_α, Ω, RCert), RCert ∈ ACert, does not issue an error, then it returns FCert and RCert_Ω ⊆ RCert. Furthermore, the validation of FCert does not generate multi-traversed arcs.
Proof
Let us prove, by contradiction, that RCert_Ω ⊆ RCert. If we assume that RCert_Ω ⊄ RCert, then there exists an entry A:CP ↦ AP ∈ RCert_Ω such that A:CP ↦ AP ∉ RCert. By definition of RCert_Ω, A:CP ∈ RED. Hence, L16 of Algorithm 2 ensures that there exists an arc H:CP(u) ⇒ [CP] A:CP in DAT with u > 1. Thus, Checking_r(S_α, Ω, RCert_Ω) would issue an error, which is a contradiction by Proposition 6.3.
Now observe that from Proposition 6.3 it holds that Checking_r(S_α, Ω, RCert_Ω) returns FCert and the validation of RCert_Ω does not generate multi-traversed arcs. But since RCert_Ω ⊆ RCert, it trivially holds that Checking_r(S_α, Ω, RCert) also returns FCert exactly in the same way that Checking_r(S_α, Ω, RCert_Ω) does, i.e., without generating multi-traversed arcs.

Theorem 6.5 (correctness)
Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol, Ω ∈ QHS and RCert ∈ ACert. Then, if Checker_r(P, D_α, S_α, I_α, Ω, RCert) does not issue an error and returns a certificate FCert ∈ ACert, then
• FCert is a fixpoint of P;
• FCert ⊑ I_α.

Proof
If Checker_r(P, D_α, S_α, I_α, Ω, RCert) does not issue an error then, from Definition 6.1, it holds that FCert = Checking_r(S_α, Ω, RCert) does not issue an error and FCert ⊑ I_α. From Corollary 6.4 it follows that FCert = Certifier_f(P, D_α, S_α, I_α, Ω). Hence, as Definition 4.3 establishes, FCert is the answer table computed by Analyze_f(S_α, Ω′). Finally, by the results in (Hermenegildo et al. 2000), FCert is a fixpoint for P.

The following theorem (completeness) provides sufficient conditions under which a checker is guaranteed to validate reduced certificates which are actually valid. In other words, if a certificate is valid and such conditions hold, then the checker is guaranteed to validate the certificate. Note that this is not always the case when the strategy used to generate the certificate and the one used to check it are different.

Theorem 6.6 (completeness)
Let P ∈ Prog, D_α ∈ ADom, S_α ⊆ AAtom, I_α ∈ APol and Ω_A ∈ QHS. Let FCert = Certifier_f(P, D_α, S_α, I_α, Ω_A) and RCert_{Ω_A} = Certifier_r(P, D_α, S_α, I_α, Ω_A). Let Ω_C ∈ QHS be such that RCert_{Ω_C} = Certifier_r(P, D_α, S_α, I_α, Ω_C) and RCert_{Ω_A} ⊇ RCert_{Ω_C}. Then, Checker_r(P, D_α, S_α, Ω_C, RCert_{Ω_A}) returns FCert and does not issue an error.

Proof
We prove it by contradiction. The only cases in which Checker_r(P, D_α, S_α, Ω_C, RCert_{Ω_A}) issues an error are the following:
• The partial answer AP computed for some calling pattern A:CP (provided in RCert_{Ω_A}) leads to Alub(AP, AP′) ≠ AP′ (L5), where AP′ is the answer for A:CP, i.e., A:CP ↦ AP′ ∈ RCert_{Ω_A}. But RCert_{Ω_A} ⊆ FCert, i.e., FCert would contain an incorrect answer for A:CP, which contradicts the assumption that FCert is a valid certificate for P.
• There exists some arc H:CP ⇒ [CP] B:CP which has been traversed more than once, i.e., its u-value is greater than 1 (L17 in Algorithm 3). Since RCert_{Ω_C} ⊆ RCert_{Ω_A}, i.e., RCert_{Ω_C} contains possibly fewer entries than RCert_{Ω_A}, then the call (∗) in the proof of Proposition 6.3 also fails because of such a multi-traversed arc. But this is a contradiction with (1) in the proof of Proposition 6.3.
Consequently, Checker_r(P, D_α, S_α, Ω_C, RCert_{Ω_A}) returns an answer table AT. Finally, by Theorem 6.5, we know that since no error is issued, then Checker_r returns FCert.

Obviously, if Ω_C = Ω_A then the checker is guaranteed to be complete. Additionally, a checker using a different strategy Ω_C is also guaranteed to be complete as long as the certificate reduced w.r.t. Ω_C is equal to or smaller than the certificate reduced w.r.t. Ω_A. Furthermore, if the certificate used is full, the checker is complete for any strategy. Note that if RCert_{Ω_A} ⊉ RCert_{Ω_C}, Checker_r with the strategy Ω_C may fail to validate RCert_{Ω_A}, which is indeed valid for the program under Ω_A.

Example 6.7
Consider the program of Example 3.1, the same abstract domain D_α as in our running example, and the call pattern S_α = {q(X):⟨term⟩}. The full certificate computed by Certifier_f is FCert = {q(X):⟨term⟩ ↦ ⟨real⟩, p(X):⟨term⟩ ↦ ⟨real⟩}. Let us consider two different queue handling strategies Ω_A ≠ Ω_C. Under both strategies, we start the analysis by introducing q(X):⟨term⟩ ↦ ⊥ in the answer table and processing the single rule for q. The arc q(X):⟨term⟩ ⇒ [{X/term}] p(X):⟨term⟩ is introduced in the queue and processed afterward. As a result, this arc goes to DAT and the event newcall(p(X):⟨term⟩) is generated. The processing of this last event adds p(X):⟨term⟩ ↦ ⊥ to the answer table. Now, using Ω_A, the analyzer processes both rules for p(X) in textual order. None of the arcs introduced in DAT can issue an error. After traversing the first rule, the answer p(X):⟨term⟩ ↦ ⟨real⟩ is inferred and the non-redundant event updated(p(X):⟨term⟩) is generated. The analysis of the second rule produces ⟨int⟩ as answer and does not update the entry, since Alub({X/real}, {X/int}) returns {X/real}. We process the non-redundant update for p by calling function add_dependent_rules. The arc for q stored in the dependency arc table with u = 0 is launched. When processing this arc, the arc is again introduced in DAT, now with u = 1, and the answer q(X):⟨term⟩ ↦ ⟨real⟩ replaces the old one in the answer table. Since RED is empty, RCert_{Ω_A} is empty.

Assume now that Ω_C assigns a higher priority to the second rule of p. In this case, the answer for p(X):⟨term⟩ changes from ⊥ to {X/int}, producing a non-redundant update. Suppose now that the updated event is processed, which launches the arc for q stored in DAT. If we process such an arc, then it will be introduced again in DAT, but now with u = 1. The answer q(X):⟨term⟩ ↦ {X/int} is inserted in the answer table. When the first arc for p is processed, the computed answer is {X/real}. Now, a new non-redundant updated event is needed. The processing of this update event launches again the arc for q stored in DAT, whose analysis introduces it in DAT with u = 2.

Hence RCert_{Ω_A} is empty but RCert_{Ω_C} contains the single entry p(X):⟨term⟩ ↦ ⟨real⟩. Thus, Checker_r(P, D_α, S_α, Ω_C, RCert_{Ω_A}) will issue an error (L17) when trying to validate the program if provided with the empty certificate RCert_{Ω_A}. On the contrary, by Theorem 6.6, Checker_r(P, D_α, S_α, Ω_A, RCert_{Ω_C}) returns FCert and does not issue an error. This justifies the results intuitively shown in Section 3. ✷

As we have illustrated throughout the paper, the reduction in the size of the certificates is directly related to the number of updates (or iterations) performed during analysis. Clearly, depending on the "quality" of the graph traversal strategy used, different instances of the generic analyzer will generate reduced certificates of different sizes. Significant and successful efforts have been made during recent years towards improving the efficiency of analysis. The most optimized analyzers actually aim at reducing the number of updates necessary to reach the final fixpoint (Puebla and Hermenegildo 1996). Interestingly, our framework greatly benefits from all these advances, since the more efficient the analysis, the smaller the corresponding reduced certificates. We have implemented a generator and a checker of reduced certificates as an extension of the efficient, highly optimized, state-of-the-art analysis system available in CiaoPP. Both the analysis and the checker use the optimized depth-first new-calling QHS of (Puebla and Hermenegildo 1996).

In our experiments we study two crucial points for the practicality of our proposal: the size of reduced vs. full certificates (Table 7.1) and the relative efficiency of checking reduced vs. full certificates (Table 7.2). As mentioned before, the algorithms are parametric w.r.t. the abstract domain. In all our experiments we use the same implementation of the domain-dependent functions of the sharing+freeness (Muthukumar and Hermenegildo 1991) abstract domain. We have selected this domain because it is highly optimized and also because the information it infers is very useful for reasoning about instantiation errors, which is a crucial aspect for the safety of logic programs. Furthermore, as mentioned previously, sharing domains have also been shown to be useful for checking properties of imperative programs, including for example information flow characteristics of Java bytecode (Secci and Spoto 2005; Genaim and Spoto 2005). On the other hand, we have used ⊤ as call patterns in order to get all possible modes of use of predicate calls.

The whole system is written in Ciao (Bueno et al. 2009) and the experiments have been run using version 1.13r5499 with compilation to bytecode on a Pentium 4 (Xeon) at 2 GHz and with 4 GB of RAM, running GNU Linux Fedora Core-2 2.6.9. A relatively wide range of programs has been used as benchmarks. They are the same ones used in (Hermenegildo et al. 2000; Albert et al. 2005), where they are described in more detail.

Program     Source   ByteC    BC/S   FCert   RCert       F/R     R/S
aiakl         1555    3817   2.455    3090    1616     1.912   1.039
bid           4945   10376   2.098    5939     883     6.726   0.179
browse        2589    8492   3.280    1661     941     1.765   0.363
deriv          957    4221   4.411     288     288     1.000   0.301
grammar       1598    3182   1.991    1259      40    31.475   0.025
hanoiapp      1172    2264   1.932    2325     880     2.642   0.751
occur         1367    6919   5.061    1098     666     1.649   0.487
progeom       1619    3570   2.205    2148      40    53.700   0.025
qsortapp       664    1176   1.771    2355     650     3.623   0.979
query         2090    8818   4.219     531      40    13.275   0.019
rdtok        13704   15423   1.125    6533    2659     2.457   0.194
rectoy         154     140   0.909     167      40     4.175   0.260
serialize      987    3801   3.851    1779    1129     1.576   1.144
zebra         2284    5396   2.363    4058      40   101.450   0.018
Overall                       2.17                      3.35    0.28

Table 1. Size of Reduced and Full Certificates
Table 7.1 shows our experimental results regarding certificate size reduction, coded in compact (fastread) format, for the different benchmarks. It compares the size of each reduced certificate to that of the full certificate and to the corresponding source code for the same program.

The column Source shows the size of the source code and ByteC that of its corresponding bytecode. To make this comparison fair, in column BC/S we subtract 4180 bytes from the size of the bytecode for each program: the size of the bytecode for an empty program in this version of Ciao (minimal top-level drivers and exception handlers for any executable). The size of the certificates is shown in the following columns. The columns FCert and RCert contain the size of the full and reduced certificates, respectively, for each benchmark, and they are compared in the next column (F/R). Our results show that the reduction in size is quite significant in most cases. It ranges from 101.45 in zebra (RCert is indeed empty; the minimum size of an empty certificate is 40 bytes, whereas FCert is 4058) to 1 for deriv (both certificates have the same size).

The last column (R/S) compares the size of the reduced certificate to the source code (i.e., the size of the final package to be submitted to the consumer). The results show the size of the reduced certificate to be very reasonable. It ranges from 0.018 times the size of the source code (for zebra) to 1.144 (in the case of serialize). Overall, it is 0.28 times the size of the source code. We consider this satisfactory since in general (C)LP programs are quite compact (up to 10 times more compact than equivalent imperative programs).

Program     C_F   C_R   C_F/C_R
aiakl        85    86     0.986
bid          46    48     0.959
browse       20    20     0.990
deriv        28    27     1.038
grammar      14    14     1.014
hanoiapp     31    30     1.033
occur        18    20     0.911
progeom      17    16     1.012
qsortapp     24    19     1.290
query        13    14     0.917
rdtok        59    56     1.061
rectoy        8     9     0.909
serialize    27    30     0.875
zebra       125   129     0.969
Overall                    0.99

Table 2. Comparison of Checking Times
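The Overall row of Table 1 can be rechecked from the per-program sizes. The sketch below assumes that each Overall value is the ratio of the corresponding column totals; this reading is an assumption made here (it happens to reproduce the printed figures), not a statement of how the row was actually computed. The names ROWS, column_total and overall are illustrative.

```python
# (program, Source, ByteC, FCert, RCert): sizes in bytes, copied from Table 1.
ROWS = [
    ("aiakl", 1555, 3817, 3090, 1616),
    ("bid", 4945, 10376, 5939, 883),
    ("browse", 2589, 8492, 1661, 941),
    ("deriv", 957, 4221, 288, 288),
    ("grammar", 1598, 3182, 1259, 40),
    ("hanoiapp", 1172, 2264, 2325, 880),
    ("occur", 1367, 6919, 1098, 666),
    ("progeom", 1619, 3570, 2148, 40),
    ("qsortapp", 664, 1176, 2355, 650),
    ("query", 2090, 8818, 531, 40),
    ("rdtok", 13704, 15423, 6533, 2659),
    ("rectoy", 154, 140, 167, 40),
    ("serialize", 987, 3801, 1779, 1129),
    ("zebra", 2284, 5396, 4058, 40),
]


def column_total(index):
    """Sum one numeric column of ROWS (index 1..4)."""
    return sum(row[index] for row in ROWS)


def overall():
    """Ratios of column totals, rounded to two decimals (assumed definition)."""
    source, bytec, fcert, rcert = (column_total(i) for i in range(1, 5))
    return {
        "BC/S": round(bytec / source, 2),
        "F/R": round(fcert / rcert, 2),
        "R/S": round(rcert / source, 2),
    }


if __name__ == "__main__":
    print(overall())
```

Under this assumption, large programs such as rdtok dominate the summary, which is why the overall F/R ratio stays far below the per-program maximum obtained for zebra.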
Table 7.2 presents our experimental results regarding checking time. Execution times are given in milliseconds and measure runtime. They are computed as the arithmetic mean of five runs. For each benchmark, columns C_F and C_R are the times for executing Checker_f and Checker_r, respectively. Column C_F/C_R compares both checking times. These times show that the efficiency of Checker_r is very similar to that of Checker_f in most cases.

The last row (Overall) summarizes the results for the different benchmarks using a weighted mean which places more importance on those benchmarks with relatively larger certificates and checking times. We use as weight for each program its actual checking time. We believe that this weighted mean is more informative than the arithmetic mean since, for example, doubling the speed at which a large and complex program is checked is more relevant than achieving this for small, simple programs. As mentioned before, the efficiency of the checker for reduced certificates is very similar to that of Checker_f (the overall C_F/C_R ratio is 0.99).
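As a worked check of the Overall entry in Table 2, suppose the weight of each program i is its checking time with reduced certificates, C_{R,i}. The text does not say which of the two checking times is used as weight, so this choice is an assumption. With it, the weighted mean of the ratios reduces to a ratio of totals, here evaluated on the rounded millisecond values printed in the table:

```latex
\[
\text{Overall}
  = \frac{\sum_i w_i \, \dfrac{C_{F,i}}{C_{R,i}}}{\sum_i w_i}
  = \frac{\sum_i C_{F,i}}{\sum_i C_{R,i}}
  \quad (\text{with } w_i = C_{R,i}),
  \qquad
  \frac{515}{518} \approx 0.99 .
\]
```

The result agrees with the printed value of 0.99, although the individual ratios in the table were presumably computed from unrounded times.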
A detailed comparison of the technique of ACC with related methods can be found in (Albert et al. 2008). In this section, we focus only on work related to certificate size reduction in PCC. The common idea for compressing a certificate in the PCC scheme is to store only the analysis information which the checker is not able to reproduce by itself (Leroy 2003). In the field of abstract interpretation, this is known as fixpoint compression and it is being used in different contexts and tools. For instance, in the Astrée analyzer (Cousot et al. 2005), designed to detect runtime errors in programs written in C, only one abstract element per loop head is kept for memory usage purposes. Our solution is an improvement in the sense that some of these elements may not need to be included in the certificate (i.e., if they are not relevant). In other words, some loops do not require iteration to reach the fixpoint and our technique detects this.

With our same purpose of reducing the size of certificates, Necula and Lee (Necula and Lee 1998) designed a variant of the Edinburgh Logical Framework LF (Harper et al. 1993), called LF_i, in which certificates (or proofs) discard part of the information that is redundant or that can be easily synthesized. LF_i inherits from LF the possibility of encoding several logics in a natural way while avoiding the high degree of redundancy typical of the LF representation of proofs. On the producer side, the original certificate is an LF proof to which a representation algorithm is applied. On the consumer side, LF_i proofs are validated by using a one-pass LF type checker which is able to reconstruct on the fly the missing parts of the proof. Experimental results for a concrete implementation reveal an important reduction in the size of certificates (w.r.t. LF representation proofs) and in the checking time. Although this work attacks the same problem as ours, the underlying techniques used are clearly different. Furthermore, our certificates may be considered minimal, whereas in (Necula and Lee 1998) redundant information is still left in the certificates in order to guarantee a more efficient behaviour of the type checker.

A further step is taken in Oracle-based PCC (Necula and Rahul 2001). This is a variation of the PCC idea that allows the size of proofs accompanying the code to adapt to the complexity of the property being checked, such that when PCC is used to verify relatively simple properties such as type safety, the essential information contained in a proof is significantly smaller than the entire proof. The proof as an oracle is implemented as a stream of bits aimed at resolving the non-deterministic interpretation choices. Although the underlying representations and techniques are different from ours, we share with this work the purpose of reducing the size of certificates by providing the checker with the minimal information it requires to perform a proof, and the genericity which allows both techniques to deal with different kinds of properties beyond types.

The general idea of certificate size reduction has also been deployed in lightweight bytecode verification (LBV) (Rose 1998; Rose 2003). LBV is a practical PCC approach to Java Bytecode Verification (Leroy 2003) applied to the KVM (an embedded variant of the JVM). The idea is that the type-based bytecode verification is split in two phases, where the producer first computes the certificate by means of a type-based dataflow analyzer and then the consumer simply checks that the types provided in the code certificate are valid. As in our case, the second phase can be done in a single, linear pass over the bytecode. However, LBV is limited to types while ACC generalizes it to arbitrary domains. Also, ACC deals with multivariance, with the associated accuracy gains (while LBV is monovariant). Regarding the reduction of certificate size, our work characterizes precisely the minimal information that can be sent for a generic algorithm not tied to any particular graph traversal strategy. While the original notion of certificate in (Rose 1998) includes the complete entry solution with respect to each basic block, (Rose 2003) reduces certificates by sending information only for "backward" jumps. As we have seen through our running example, (Rose 2003) sends information for all such backward jumps, while our proposal carries the reduction further because it includes only the analysis information of those calls in the analysis graph whose answers have been updated, including both branching and non-branching instructions. We believe that our notion of reduced certificate could also be used within Rose's framework.

As a final remark, the main ideas in ACC shown in Equations 2 and 4 in Section 2 have been the basis for building a PCC architecture based on certified abstract interpretation in (Besson et al. 2006). Therefore, this proposal is built on the basics of ACC for certificate generation and checking, but relies on a certified checker specified in Coq (Barras et al. 1997) in order to reduce the trusted computing base. In contrast to our framework, this work is restricted to safety properties which hold for all states and, for now, it has only been implemented for a particular abstract domain.

Acknowledgments
The authors would like to gratefully thank the anonymous referees for useful comments on a preliminary version of this article. This work was funded in part by the Information & Communication Technologies program of the European Commission, Future and Emerging Technologies (FET), under the ICT-231620 HATS project, by the Spanish Ministry of Science and Innovation (MICINN) under the TIN-2008-05624 DOVES project, the TIN2008-04473-E (Acción Especial) project, the HI2008-0153 (Acción Integrada) project, the UCM-BSCH-GR58/08-910502 Research Group and by the Madrid Regional Government under the S2009TIC-1465 PROMETIDOS project.

References
Albert, E., Gómez-Zamalloa, M., Hubert, L., and Puebla, G. Ninth International Symposium on Practical Aspects of Declarative Languages (PADL 2007). Number 4354 in LNCS. Springer-Verlag, 124–139.
Albert, E., Puebla, G., and Hermenegildo, M. Number 3452 in LNAI. Springer-Verlag, 380–397.
Albert, E., Puebla, G., and Hermenegildo, M. New Generation Computing 26,
Barras, B., Boutin, S., Cornes, C., Courant, J., Filliatre, J., Gimenez, E., Herbelin, H., Huet, G., Munoz, C., Murthy, C., Parent, C., Paulin-Mohring, C., Saibi, A., and Werner, B.
Besson, F., Jensen, T., and Pichardie, D. Proc. of first International Workshop on Emerging Applications of Abstract Interpretation (EAAI 2006). ENTCS.
Bruynooghe, M. Journal of Logic Programming 10, 91–124.
Bueno, F., Cabeza, D., Carro, M., Hermenegildo, M., López-García, P., and Puebla, G. (Eds.).
Cachera, D., Jensen, T., Pichardie, D., and Rusu, V. The European Symposium on Programming (ESOP 2004). Number 2986 in LNCS. Springer-Verlag, 385–400.
Cousot, P. and Cousot, R. ACM Symposium on Principles of Programming Languages (POPL'77). ACM Press, 238–252.
Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., and Rival, X. The European Symposium on Programming (ESOP 2005). Number 3444 in LNCS. Springer-Verlag, 21–30.
de la Banda, M. G., Hermenegildo, M., Bruynooghe, M., Dumortier, V., Janssens, G., and Simoens, W. ACM Transactions on Programming Languages and Systems 18,
Genaim, S. and Spoto, F. Sixth International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI 2005). Number 3385 in LNCS. Springer-Verlag, 346–362.
Harper, R., Honsell, F., and Plotkin, G. Journal of the Association for Computing Machinery 40, 1, 143–184.
Hermenegildo, M., Puebla, G., Bueno, F., and López-García, P. Science of Computer Programming 58,
Hermenegildo, M., Puebla, G., Marriott, K., and Stuckey, P. ACM Transactions on Programming Languages and Systems 22,
Jones, N., Gomard, C., and Sestoft, P. Partial Evaluation and Automatic Program Generation. Prentice Hall, New York.
Kelly, A., Marriott, K., Søndergaard, H., and Stuckey, P. Software: Practice and Experience 28,
Klein, G. and Nipkow, T. Theoretical Computer Science 3(298), 583–626.
Le Charlier, B. and Van Hentenryck, P. ACM Transactions on Programming Languages and Systems 16, 1, 35–101.
Leroy, X. Journal of Automated Reasoning 30,
Lloyd, J. Foundations of Logic Programming. Springer, second, extended edition.
Marriot, K. and Stuckey, P. Programming with Constraints: An Introduction. The MIT Press.
Méndez-Lojo, M., Navas, J., and Hermenegildo, M. Number 4915 in LNCS. Springer-Verlag, 154–168.
Méndez-Lojo, M., Navas, J., and Hermenegildo, M. ETAPS Workshop on Bytecode Semantics, Verification, Analysis and Transformation (BYTECODE 2007). Electronic Notes in Theoretical Computer Science. Elsevier - North Holland.
Muthukumar, K. and Hermenegildo, M. International Conference on Logic Programming (ICLP 1991). MIT Press, 49–63.
Muthukumar, K. and Hermenegildo, M. Journal of Logic Programming 13,
Necula, G. ACM Symposium on Principles of programming languages (POPL 1997). ACM Press, 106–119.
Necula, G. and Lee, P. IEEE Symposium on Logic in Computer Science (LICS 1998). IEEE Computer Society, 93–104.
Necula, G. and Rahul, S. Principles of Programming Languages (POPL 2001). ACM Press, 142–154.
Puebla, G. and Hermenegildo, M. International Static Analysis Symposium (SAS 1996). Number 1145 in LNCS. Springer-Verlag, 270–284.
Rose, E. Journal of Automated Reasoning 31, 303–334.
Rose, E. Rose, K. OOPSLA Workshop on Formal Underpinnings of Java.
Secci, S. and Spoto, F. Static Analysis Symposium (SAS 2005). Number 3672 in LNCS. 320–335.
Vallee-Rai, R., Hendren, L., Sundaresan, V., Lam, P., Gagnon, E., and Co, P. Proc. of Conference of the Centre for Advanced Studies on Collaborative Research (CASCON). 125–135.