A Decision Tree Lifted Domain for Analyzing Program Families with Numerical Features (Extended Version)
Aleksandar S. Dimovski, Sven Apel, and Axel Legay
Mother Teresa University, 12 Udarna Brigada 2a, 1000 Skopje, MKD
Saarland University, Campus E1.1, 66123 Saarbrücken, Germany
Université catholique de Louvain, 1348 Ottignies-Louvain-la-Neuve, Belgium
Abstract.
Lifted (family-based) static analysis by abstract interpretation is capable of analyzing all variants of a program family simultaneously, in a single run, without generating any of the variants explicitly. The elements of the underlying lifted analysis domain are tuples, which maintain one property per variant. Still, explicit property enumeration in tuples, one by one for all variants, immediately yields combinatorial explosion. This is particularly apparent in the case of program families that, apart from Boolean features, also contain numerical features with big domains, thus admitting astronomical configuration spaces. The key to an efficient lifted analysis is a proper handling of the variability-specific constructs of the language (e.g., feature-based runtime tests and #if directives). In this work, we introduce a new symbolic representation of the lifted abstract domain that can efficiently analyze program families with numerical features. It makes sharing between property elements corresponding to different variants explicitly possible. The elements of the new lifted domain are constraint-based decision trees, where decision nodes are labeled with linear constraints defined over numerical features and leaf nodes belong to an existing single-program analysis domain. To illustrate the potential of this representation, we have implemented an experimental lifted static analyzer, called SPLNum²Analyzer, for inferring invariants of C programs. It uses existing numerical domains (e.g., intervals, octagons, polyhedra) from the APRON library as parameters. An empirical evaluation on benchmarks from SV-COMP and BusyBox yields promising preliminary results, indicating that our decision tree-based approach is effective and outperforms the tuple-based approach, which is used as a baseline lifted analysis based on abstract interpretation.
1 Introduction

Many software systems today are configurable [7]: they use features (or configurable options) to control the presence and absence of software functionality. Different family members, called variants, are derived by switching features on and off, while the reuse of common code is maximized, leading to productivity gains, shorter time to market, greater market coverage, etc. Program families (e.g., Software Product Lines) are commonly seen in the development of commercial embedded software, such as cars, phones, avionics, medicine, robotics, etc. Configurable options (features) are used either to support different application scenarios for embedded components, to provide portability across different hardware platforms and configurations, or to produce variations of products for different market segments or different customers. We consider here program families implemented using directives from the C preprocessor CPP [20]. They use #if directives to specify under which conditions parts of code should be included in or excluded from a variant. Classical program families use only Boolean features, which have two values: on and off. However, Boolean features are insufficient for real-world program families, as there exist features that have a range of numbers as possible values. These features are called numerical features [18,25]. For instance, the Linux kernel, BusyBox, the Apache web server, and the Java Garbage Collector represent some real-world program families with numerical features. Analyzing such program families is very challenging, since from only a few features a huge number of variants can be derived. This paper concerns the verification of program families with Boolean and numerical features using abstract interpretation-based static analysis.
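As a rough illustration of this blow-up, the following Python sketch (not part of any analyzer; the feature names and domains are made up for illustration only) enumerates the configuration space induced by a handful of features:

```python
from itertools import product

# Hypothetical feature model: two Boolean features and two numerical
# features; the names and domains below are illustrative only.
features = {
    "A": [0, 1],            # Boolean feature, encoded as {0, 1}
    "B": [0, 1],            # Boolean feature
    "SIZE": range(1, 101),  # numerical feature with domain [1, 100]
    "LEVEL": range(0, 10),  # numerical feature with domain [0, 9]
}

# Each combination of feature values is one configuration, i.e., one variant.
configs = list(product(*features.values()))
print(len(configs))  # 2 * 2 * 100 * 10 = 4000 variants from just 4 features
```

Two Boolean features alone yield 4 variants, but the two numerical features inflate the space a thousandfold, which is exactly why enumerating one analysis property per variant does not scale.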
Abstract interpretation [8,24] is a general theory for approximating the semantics of programs. It provides sound (all confirmative answers are correct) and efficient (with a good trade-off between precision and cost) static analyses of run-time properties of real programs. It has been used as the foundation for various successful industrial-scale static analyzers, such as
ASTRÉE [11]. Still, static analysis of program families is harder than static analysis of single programs, because the number of possible variants can be very large (often huge) in practice. The simplest brute-force approach, which uses a preprocessor to generate all variants of a family and then applies an existing off-the-shelf single-program analyzer to each individual variant, one by one, is very inefficient [4,27]. Therefore, we use so-called lifted (family-based) static analyses [4,22,27], which analyze all variants of the family simultaneously without generating any of the variants explicitly. They take as input the common code base, which encodes all variants of a program family, and produce precise analysis results corresponding to all variants. They use a lifted analysis domain, which represents an n-fold product of an existing single-program analysis domain used for expressing program properties (where n is the number of valid configurations). That is, the lifted analysis domain maintains one property element per valid variant in tuples. The problem is that this explicit property enumeration in tuples becomes computationally intractable for larger program families, because the number of variants (i.e., configurations) grows exponentially with the number of features. This problem has been successfully addressed for program families that contain only Boolean features [1,2,3,14] by using sharing through binary decision diagrams (BDDs). However, a fundamental limitation of existing lifted analysis techniques is that they do not deal with numerical features. To overcome this limitation, in this work we present a new, refined lifted abstract domain for effectively analyzing program families with numerical features by means of abstract interpretation. The elements of the lifted abstract domain are constraint-based decision trees, where the decision nodes are labelled with linear constraints over numerical features, whereas the leaf nodes belong to a single-program analysis domain.
The decision trees recursively partition the space of configurations (i.e., the space of possible combinations of feature values), whereas the program properties at the leaves provide analysis information corresponding to each partition, i.e., to the variants (configurations) that satisfy the constraints along the path to the given leaf node. The partitioning is dynamic, which means that partitions are split by feature-based tests (at #if directives), and joined when merging the corresponding control flows again. In terms of decision trees, this means that new decision nodes are added by feature-based tests and removed when merging control flows. In fact, the partitioning of the set of configurations is semantics-based, which means that the linear constraints over numerical features that occur in decision nodes are automatically inferred by the analysis and do not necessarily occur syntactically in the code base.

The lifted abstract domain is parametric in the choice of the numerical property domain that underlies the linear constraints over numerical features labelling decision nodes, and in the choice of the single-program analysis domain for leaf nodes. In fact, in our implementation, we also use numerical property domains for leaf nodes, which encode linear constraints over program variables. We use here the well-known numerical domains, such as intervals [8], octagons [23], and polyhedra [13], from the APRON library [19] to obtain a concrete decision tree-based implementation of the lifted abstract domain. This way, we have implemented a forward reachability analysis of C program families with numerical (and Boolean) features for the automatic inference of invariants. Our tool, called SPLNum²Analyzer, computes a set of possible invariants, which represent linear constraints over program variables. We can use the implemented lifted static analyzer to check invariance properties of C program families, such as assertions, buffer overflows, null pointer references, division by zero, etc. [10].

In summary, we make several contributions in this work:
– First, we propose a new, parameterized lifted analysis domain based on decision trees for analyzing program families with numerical features.
– Then, we implement a prototype lifted static analyzer, SPLNum²Analyzer, that performs a forward analysis of #if-enriched C programs, where numerical property domains from the APRON library are used as parameters in the lifted analysis domain.
– Finally, we evaluate our approach for the automatic inference of invariants by comparing the performance of lifted analyzers based on tuples and on decision trees.

(Num² in the name of the tool refers to its ability both to handle Numerical features and to perform Numerical client analyses of SPLs, i.e., program families.)

2 Motivating Example
To illustrate the potential of a decision tree-based lifted domain, we consider a motivating example using the code base of the following program family SIMPLE:

① int x := 10, y := 0;
② while (x != 0) {
③   x := x-1;
④   #if (SIZE ≤ 3) y := y+1; #else y := y-1; #endif
⑤   #if (!B) y := 0; #endif
⑥ }
⑦ assert (y > 0);

The set F of features is {B, SIZE}, where B is a Boolean feature and SIZE is a numerical feature whose domain is [1,4] = {1, 2, 3, 4}. Thus, the set of valid configurations is K = {B ∧ (SIZE = 1), B ∧ (SIZE = 2), B ∧ (SIZE = 3), B ∧ (SIZE = 4), ¬B ∧ (SIZE = 1), ¬B ∧ (SIZE = 2), ¬B ∧ (SIZE = 3), ¬B ∧ (SIZE = 4)}. The code of SIMPLE contains two #if directives, which change the value assigned to y depending on how the features from F are set at compile-time. For each configuration from K, a different variant (single program) can be generated by appropriately resolving the #if-s. For example, the variant corresponding to configuration B ∧ (SIZE = 1) will have B and SIZE set to true and 1, so that the statements y := y+1 and skip will be included at program locations ④ and ⑤, respectively, in this variant. The variant for configuration ¬B ∧ (SIZE = 4) will have the features B and SIZE set to false and 4, so the assignments y := y-1 and y := 0 will be included at locations ④ and ⑤, respectively. There are |K| = 8 variants that can be derived from the family SIMPLE. Assume that we want to perform a lifted polyhedra analysis of SIMPLE using the
Polyhedra numerical domain [13]. The standard lifted analysis domain used in the literature [4,22] is defined as the Cartesian product of |K| copies of the basic analysis domain (e.g., polyhedra). Hence, elements of the lifted domain are tuples containing one component for each valid configuration from K, where each component represents a polyhedral linear constraint over the program variables (x and y in this case). The lifted analysis result at location ⑦ of SIMPLE is the 8-sized tuple shown in Fig. 1. Note that the first component of the tuple in Fig. 1 corresponds to configuration B ∧ (SIZE = 1), the second to B ∧ (SIZE = 2), the third to B ∧ (SIZE = 3), and so on. We can see in Fig. 1 that the polyhedra analysis discovers very precise results for the variable y: (y = 10) for configurations B ∧ (SIZE = 1), B ∧ (SIZE = 2), and B ∧ (SIZE = 3); (y = −10) for configuration B ∧ (SIZE = 4); and (y = 0) for all other configurations. This is due to the fact that the polyhedra domain is fully relational and is able to track all relations between the program variables x and y. Using this result at location ⑦, we can successfully conclude that the assertion is valid for configurations B ∧ (SIZE = 1), B ∧ (SIZE = 2), and B ∧ (SIZE = 3), whereas it fails for all other configurations.

Fig. 1: Tuple-based analysis result at location ⑦ of SIMPLE: ([y=10, x=0], [y=10, x=0], [y=10, x=0], [y=−10, x=0], [y=0, x=0], [y=0, x=0], [y=0, x=0], [y=0, x=0]), with one component per configuration, ordered as listed above.

Fig. 2: Decision tree-based analysis result at location ⑦ of SIMPLE (solid edges = true, dashed edges = false): the root node tests B; its true (solid) branch leads to a node testing (SIZE ≤ 3), whose true branch is the leaf [y=10 ∧ x=0] and whose false branch is the leaf [y=−10 ∧ x=0]; the root's false (dashed) branch leads directly to the leaf [y=0 ∧ x=0].

If we perform a lifted polyhedra analysis based on the decision tree domain proposed in this work, then the corresponding decision tree inferred at the final program location ⑦ of SIMPLE is depicted in Fig. 2. Notice that the inner nodes of the decision tree in Fig. 2 are labeled with Interval linear constraints over the features (SIZE and B), while the leaves are labeled with Polyhedra linear constraints over the program variables x and y. Hence, we use two different numerical abstract domains in our decision trees: the Interval domain [8] for expressing properties in decision nodes, and the Polyhedra domain [13] for expressing properties in leaf nodes. The edges of decision trees are labeled with the truth value of the decision on the parent node; we use solid edges for true (i.e., the constraint in the parent node is satisfied) and dashed edges for false (i.e., the negation of the constraint in the parent node is satisfied). As decision nodes partition the space of valid configurations K, we implicitly assume the correctness of linear constraints that take into account the domains of numerical features. For example, the node with constraint (SIZE ≤ 3) is satisfied when (SIZE ≤ 3) ∧ (1 ≤ SIZE ≤ 4), and its negation is satisfied when (SIZE > 3) ∧ (1 ≤ SIZE ≤ 4), where (1 ≤ SIZE ≤ 4) represents the domain [1,4] of SIZE. We can see that decision trees offer more possibilities for sharing and interaction between analysis properties corresponding to different configurations: they provide a symbolic and compact representation of lifted analysis elements. For example, Fig. 2 presents polyhedral properties of the two program variables x and y, which are partitioned with respect to the features B and SIZE. When (B ∧ (SIZE ≤ 3)) is true, the property is (y = 10, x = 0), whereas when (B ∧ ¬(SIZE ≤ 3)) is true, the property is (y = −10, x = 0). When ¬B is true, the property is independent of the value of SIZE, hence a node with a constraint over
SIZE is not needed. Therefore, all such cases are identical, and so they share the same leaf node (y = 0, x = 0). In effect, the decision tree-based representation uses only three leaves, whereas the tuple-based representation uses eight properties. This ability for sharing is the key motivation behind the decision tree-based representation.

Let F = {A₁, . . . , Aₖ} be a finite and totally ordered set of numerical features available in a program family. For each feature A ∈ F, dom(A) ⊆ Z denotes the set of possible values that can be assigned to A. Note that any Boolean feature can be represented as a numerical feature B ∈ F with dom(B) = {0, 1}, such that 0 means that feature B is disabled while 1 means that B is enabled. A valid combination of feature values represents a configuration k, which specifies one variant of a program family. It is given as a valuation function k : F → Z, i.e., a mapping that assigns a value from dom(A) to each feature A, so that k(A) ∈ dom(A) for any A ∈ F. We assume that only a subset K of all possible configurations is valid. An alternative representation of configurations is based upon propositional formulae. Each configuration k ∈ K can be represented by the formula: (A₁ = k(A₁)) ∧ . . . ∧ (Aₖ = k(Aₖ)). We often abbreviate (B = 1) with B and (B = 0) with ¬B, for a Boolean feature B ∈ F. The set of valid configurations K can also be represented as a formula: ∨_{k∈K} k.

We define feature expressions, denoted FeatExp(F), as the set of propositional logic formulas over constraints of F generated by the grammar:

θ ::= true | e_F ⋈ e_F | ¬θ | θ ∧ θ | θ ∨ θ,    e_F ::= n | A | e_F ⊕ e_F

where A ∈ F, n ∈ Z, ⊕ ∈ {+, −, ∗}, and ⋈ ∈ {=, <}. We will use θ ∈ FeatExp(F) to write presence conditions. When a configuration k ∈ K satisfies a feature expression θ ∈ FeatExp(F), we write k |= θ, where |= is the standard satisfaction relation of logic.
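The satisfaction relation k |= θ can be prototyped directly by evaluating a feature expression under a valuation. The sketch below is illustrative only (θ is modelled as a Python predicate rather than an AST, and all names are ours); it computes [[θ]] for the SIMPLE family:

```python
from itertools import product

# Valid configurations K of SIMPLE: valuations of B (Boolean, {0, 1})
# and SIZE (numerical, with domain [1, 4]).
K = [{"B": b, "SIZE": s} for b, s in product((1, 0), (1, 2, 3, 4))]

def sat(theta, K):
    """[[theta]]: the set of configurations in K satisfying theta."""
    return [k for k in K if theta(k)]

theta = lambda k: k["SIZE"] <= 3  # the feature expression (SIZE <= 3)

models = sat(theta, K)
print(len(models))  # 6: every configuration with SIZE in {1, 2, 3}
```

For instance, B ∧ (SIZE = 2) is among the returned valuations, while B ∧ (SIZE = 4) is not, matching the set [[SIZE ≤ 3]] given above.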
We write [[θ]] to denote the set of configurations from K that satisfy θ, that is, k ∈ [[θ]] iff k |= θ. For example, for the SIMPLE program family we have F = {B, SIZE}, where dom(SIZE) = [1,4], and K = {B ∧ (SIZE = 1), B ∧ (SIZE = 2), B ∧ (SIZE = 3), B ∧ (SIZE = 4), ¬B ∧ (SIZE = 1), ¬B ∧ (SIZE = 2), ¬B ∧ (SIZE = 3), ¬B ∧ (SIZE = 4)}. For the feature expression (SIZE ≤ 3), we have [[SIZE ≤ 3]] = {B ∧ (SIZE = 1), B ∧ (SIZE = 2), B ∧ (SIZE = 3), ¬B ∧ (SIZE = 1), ¬B ∧ (SIZE = 2), ¬B ∧ (SIZE = 3)}. Hence, B ∧ (SIZE = 2) |= (SIZE ≤ 3) and B ∧ (SIZE = 4) ⊭ (SIZE ≤ 3), since B ∧ (SIZE = 2) ∈ K, B ∧ (SIZE = 4) ∈ K, and (SIZE ≤ 3) ∈ FeatExp(F).

We consider a simple sequential non-deterministic programming language, which will be used to exemplify our work. The program variables Var are statically allocated, and the only data type is the set Z of mathematical integers. To encode multiple variants, a new compile-time conditional statement is included. The new statement "#if (θ) s #endif" contains a feature expression θ ∈ FeatExp(F) as a presence condition, such that only if θ is satisfied by a configuration k ∈ K will the statement s be included in the variant corresponding to k. The syntax is:

s ::= skip | x := e | s; s | if (e) then s else s | while (e) do s | #if (θ) s #endif
e ::= n | [n, n′] | x | e ⊕ e

where n ranges over integers, [n, n′] over integer intervals, x over the program variables Var, and ⊕ over binary arithmetic operators. Integer intervals [n, n′] denote a random choice of an integer in the interval. The set of all statements s is denoted by Stm; the set of all expressions e is denoted by Exp.

A program family is evaluated in two stages. First, the C preprocessor
CPP takes a program family s and a configuration k ∈ K as inputs, and produces a variant (without #if-s) corresponding to k as the output. Second, the obtained variant is evaluated using the standard single-program semantics.

Fig. 3: Different variants of the program family SIMPLE from Section 2:
(a) P_{B∧(SIZE=1)}(SIMPLE):  int x := 10, y := 0; while (x != 0) { x := x-1; y := y+1; skip; }
(b) P_{B∧(SIZE=4)}(SIMPLE):  int x := 10, y := 0; while (x != 0) { x := x-1; y := y-1; skip; }
(c) P_{¬B∧(SIZE=1)}(SIMPLE): int x := 10, y := 0; while (x != 0) { x := x-1; y := y+1; y := 0; }
(d) P_{¬B∧(SIZE=4)}(SIMPLE): int x := 10, y := 0; while (x != 0) { x := x-1; y := y-1; y := 0; }

The first stage is specified by the projection function P_k, which is an identity for all basic statements and recursively pre-processes all sub-statements of compound statements. Hence, P_k(skip) = skip and P_k(s; s′) = P_k(s); P_k(s′). The interesting case is "#if (θ) s #endif", where the statement s is included in the variant if k |= θ; otherwise, s is removed:

P_k(#if (θ) s #endif) = P_k(s),  if k |= θ
P_k(#if (θ) s #endif) = skip,   if k ⊭ θ

For example, the variants P_{B∧(SIZE=1)}(SIMPLE), P_{B∧(SIZE=4)}(SIMPLE), P_{¬B∧(SIZE=1)}(SIMPLE), and P_{¬B∧(SIZE=4)}(SIMPLE), shown in Fig. 3a, Fig. 3b, Fig. 3c, and Fig. 3d, respectively, are derived from the
SIMPLE family defined in Section 2.
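A minimal sketch of the projection P_k over a toy statement AST (the tuple encoding and all names are ours, not the paper's implementation): #if nodes are kept or replaced by skip depending on whether k satisfies the presence condition, while compound statements are pre-processed recursively:

```python
def project(stmt, k):
    """P_k: identity on basic statements, recursive on compound ones,
    and resolution of '#if (theta) s' via the satisfaction check theta(k)."""
    kind = stmt[0]
    if kind == "seq":                       # ("seq", [s1, s2, ...])
        return ("seq", [project(s, k) for s in stmt[1]])
    if kind == "while":                     # ("while", e, body)
        return ("while", stmt[1], project(stmt[2], k))
    if kind == "ifdef":                     # ("ifdef", theta, body)
        return project(stmt[2], k) if stmt[1](k) else ("skip",)
    return stmt                             # skip, assignments, ...

# The loop body of SIMPLE, with its two #if-s guarded by (SIZE <= 3) and (!B):
family = ("seq", [
    ("assign", "x", "x-1"),
    ("ifdef", lambda k: k["SIZE"] <= 3, ("assign", "y", "y+1")),
    ("ifdef", lambda k: not k["B"], ("assign", "y", "0")),
])

print(project(family, {"B": 1, "SIZE": 4}))
# ('seq', [('assign', 'x', 'x-1'), ('skip',), ('skip',)])
```

Under B ∧ (SIZE = 4), both presence conditions fail, so both guarded statements collapse to skip, mirroring Fig. 3b.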
Lifted analyses are designed by lifting existing single-program analyses to work on program families, rather than on individual programs; they directly analyze program families. Lifted analyses as defined by Midtgaard et al. [22] rely on a lifted domain that is the |K|-fold product of an existing single-program analysis domain A defined over the program variables Var. We assume that the domain A is equipped with sound operators for concretization γ_A, ordering ⊑_A, join ⊔_A, meet ⊓_A, bottom ⊥_A, top ⊤_A, widening ∇_A, and narrowing △_A, as well as sound transfer functions for tests FILTER_A and forward assignments ASSIGN_A. More specifically, FILTER_A(a : A, e : Exp) returns an abstract element from A obtained by restricting a to satisfy the test e, whereas ASSIGN_A(a : A, x := e : Stm) returns an updated version of a obtained by abstractly evaluating x := e in it.

Lifted Domain.
The lifted analysis domain is defined as ⟨A^K, ⊑̇, ⊔̇, ⊓̇, ⊥̇, ⊤̇⟩, where A^K is shorthand for the |K|-fold product ∏_{k∈K} A, that is, there is one separate copy of A for each configuration of K. For example, consider the tuple in Fig. 1.

Lifted Abstract Operations. Given a tuple (lifted domain element) a ∈ A^K, the projection π_k selects the k-th component of a. All abstract lifted operations are defined by lifting the abstract operations of the domain A configuration-wise:

γ̇(a) = ∏_{k∈K} γ_A(π_k(a)),   a₁ ⊑̇ a₂ ≡ π_k(a₁) ⊑_A π_k(a₂) for all k ∈ K
a₁ ⊔̇ a₂ = ∏_{k∈K} (π_k(a₁) ⊔_A π_k(a₂)),   a₁ ⊓̇ a₂ = ∏_{k∈K} (π_k(a₁) ⊓_A π_k(a₂))
⊤̇ = ∏_{k∈K} ⊤_A = (⊤_A, . . . , ⊤_A),   ⊥̇ = ∏_{k∈K} ⊥_A = (⊥_A, . . . , ⊥_A)
a₁ ∇̇ a₂ = ∏_{k∈K} (π_k(a₁) ∇_A π_k(a₂)),   a₁ △̇ a₂ = ∏_{k∈K} (π_k(a₁) △_A π_k(a₂))

(Note that, since k ∈ K is a valuation function, either k |= θ or k ⊭ θ holds for any θ.)

Lifted Transfer Functions.
We now define the lifted transfer functions for tests, forward assignments (ASSIGN), and #if-s (IFDEF). There are two types of tests: expression-based tests, denoted FILTER, which occur in while-s and if-s, and feature-based tests, denoted FEAT-FILTER, which occur in #if-s. Each lifted transfer function takes as input a tuple from A^K representing the invariant before evaluating the statement (resp., expression) to handle, and returns a tuple representing the invariant after evaluating the given statement (resp., expression):

FILTER(a : A^K, e : Exp) = ∏_{k∈K} FILTER_A(π_k(a), e)
FEAT-FILTER(a : A^K, θ : FeatExp(F)) = ∏_{k∈K} { π_k(a) if k |= θ;  ⊥_A if k ⊭ θ }
ASSIGN(a : A^K, x := e : Stm) = ∏_{k∈K} ASSIGN_A(π_k(a), x := e)
IFDEF(a : A^K, #if (θ) s : Stm) = [[s]](FEAT-FILTER(a, θ)) ⊔̇ FEAT-FILTER(a, ¬θ)

where [[s]](a) is the lifted transfer function for the statement s. FILTER and ASSIGN are defined by applying FILTER_A and ASSIGN_A independently on each component of the input tuple a. FEAT-FILTER keeps those components k of the input tuple a that satisfy θ, and replaces the other components with ⊥_A. IFDEF captures the effect of analyzing the statement s in the components k of a that satisfy θ, and acts as an identity on the other components k that do not satisfy θ.

Lifted Analysis.
The lifted abstract operators and transfer functions of the lifted analysis domain A^K are combined to analyze program families. Initially, we build a tuple a_in in which all components are set to ⊤_A for the first program location, and tuples in which all components are set to ⊥_A for all other locations. The analysis properties are propagated forward from the first program location towards the final location, taking assignments, #if-s, and tests into account, with join and widening around while loops. We apply delayed widening [9], which means that we start extrapolating by widening only after a fixed number of iterations of analyzing the loop. We improve the precision of the solution obtained by delayed widening by further applying a narrowing operator [9]. The soundness of the lifted analysis based on A^K follows immediately from the soundness of all abstract operators and transfer functions of A (proved in [22]).

Numerical Lifted Analysis.
The single-program analysis domain A can be instantiated with some of the well-known numerical property domains [24]. The Interval domain [8], denoted ⟨I, ⊑_I⟩, is a non-relational numerical property domain that identifies the range of possible values of every variable as an interval. Its property elements are: {[l, h] | l ∈ Z ∪ {−∞}, h ∈ Z ∪ {+∞}, l ≤ h}. The Octagon domain [23], denoted ⟨O, ⊑_O⟩, is a weakly-relational numerical property domain, whose property elements are conjunctions of linear constraints of the form ±xᵢ ± xⱼ ≥ β between variables xᵢ and xⱼ, where β ∈ Z. The Polyhedra domain [13], denoted ⟨P, ⊑_P⟩, is a fully relational numerical property domain. It expresses conjunctions of linear constraints of the form α₁x₁ + . . . + αₙxₙ + β ≥ 0, where x₁, . . . , xₙ are variables and αᵢ, β ∈ Z.

We now introduce a new decision tree lifted domain. Its elements are disjunctions of leaf nodes that belong to an existing single-program domain A defined over the program variables Var. The leaf nodes are separated by linear constraints over numerical features, organized in the decision nodes. Hence, we encapsulate the set of configurations K into the decision nodes of a decision tree, where each top-down path represents one or several configurations that satisfy the constraints encountered along the given path. We store in each leaf node the property generated from the variants representing the corresponding configurations.

Abstract domain for decision nodes.
We define the family of abstract domains for linear constraints C_D, which are parameterized by any of the numerical property domains D (intervals I, octagons O, polyhedra P). We use C_I = {±Aᵢ ≥ β | Aᵢ ∈ F, β ∈ Z} to denote the set of interval constraints, C_O = {±Aᵢ ± Aⱼ ≥ β | Aᵢ, Aⱼ ∈ F, β ∈ Z} to denote the set of octagonal constraints, and C_P = {α₁A₁ + . . . + αₖAₖ + β ≥ 0 | A₁, . . . , Aₖ ∈ F, α₁, . . . , αₖ, β ∈ Z, gcd(|α₁|, . . . , |αₖ|, |β|) = 1} to denote the set of polyhedral constraints. We have C_I ⊆ C_O ⊆ C_P.

The set C_D of linear constraints over the features F is constructed by the underlying numerical property domain ⟨D, ⊑_D⟩ using the Galois connection ⟨P(C_D), ⊑_D⟩ ⇄ ⟨D, ⊑_D⟩ via (α^{C_D}, γ^{C_D}), where P(C_D) is the power set of C_D. The abstraction function α^{C_D} : P(C_D) → D maps a set of interval (resp., octagonal, polyhedral) constraints to an interval (resp., an octagon, a polyhedron) that represents the conjunction of those constraints; the concretization function γ^{C_D} : D → P(C_D) maps an interval (resp., an octagon, a polyhedron) representing a conjunction of constraints to a set of interval (resp., octagonal, polyhedral) constraints. We have γ^{C_D}(⊤_D) = ∅ and γ^{C_D}(⊥_D) = {⊥_{C_D}}, where ⊥_{C_D} is an unsatisfiable constraint.

The domain of decision nodes is C_D. We assume that F = {A₁, . . . , Aₖ} is a finite and totally ordered set of features, such that the ordering is A₁ > A₂ > . . . > Aₖ. We impose a total order <_{C_D} on C_D, namely the lexicographic order on the coefficients α₁, . . . , αₖ and the constant αₖ₊₁ of the linear constraints:

(α₁·A₁ + . . . + αₖ·Aₖ + αₖ₊₁ ≥ 0) <_{C_D} (α′₁·A₁ + . . . + α′ₖ·Aₖ + α′ₖ₊₁ ≥ 0)
⟺ ∃j > 0. ∀i < j. (αᵢ = α′ᵢ) ∧ (αⱼ < α′ⱼ)

The negation of linear constraints is formed as: ¬(α₁A₁ + . . . + αₖAₖ + β ≥ 0) = −α₁A₁ − . . . − αₖAₖ − β − 1 ≥ 0. For example, the negation of A₁ − 3 ≥ 0 is −A₁ + 2 ≥ 0. A linear constraint c and its negation ¬c cannot both appear as nodes in a decision tree; for example, we only keep the largest constraint with respect to <_{C_D} between c and ¬c. For this reason, we define the equivalence relation ≡_{C_D} as c ≡_{C_D} ¬c, and we write ⟨C_D, <_{C_D}⟩ to denote ⟨C_D/≡, <_{C_D}⟩, such that the elements of C_D are the constraints obtained by quotienting by the equivalence ≡_{C_D}.

Abstract domain for constraint-based decision trees. A constraint-based decision tree t ∈ T(C_D, A) over the set C_D of linear constraints defined over F and the leaf abstract domain A defined over Var is either a leaf node ⟪a⟫ with a ∈ A, or [[c : tl, tr]], where c ∈ C_D (denoted by t.c) is the smallest constraint with respect to <_{C_D} appearing in the tree t, tl (denoted by t.l) is the left subtree of t, representing its true branch, and tr (denoted by t.r) is the right subtree of t, representing its false branch. A path along a decision tree establishes a set of configurations (those that satisfy the encountered constraints), and the leaf nodes represent the analysis properties for the corresponding configurations.

Example 1.
The following two constraint-based decision trees t₁ and t₂ have decision nodes labelled with Interval linear constraints over the numerical feature SIZE with domain {1, 2, 3, 4}, whereas their leaf nodes are Interval properties:

t₁ = [[SIZE ≥ 4 : ⟪[y ≥ 10]⟫, ⟪[y = 0]⟫]],   t₂ = [[SIZE ≥ 2 : ⟪[y ≥ 0]⟫, ⟪[y ≤ 0]⟫]]   ⊓⊔

Abstract Operations.
The concretization function γ_T of a decision tree t ∈ T(C_D, A) returns γ_A(a) for every k ∈ K, where k satisfies the set C ∈ P(C_D) of constraints accumulated along the top-down path to the leaf node a ∈ A. More formally, γ_T(t) = γ_T[K](t). The function γ_T[C] accumulates into a set C ∈ P(C_D) the constraints along the paths up to a leaf node; C is initially equal to the set of implicit constraints over F given by K = ∨_{k∈K} k, taking into account the domains of the features:

γ_T[C](⟪a⟫) = ∏_{k |= C} γ_A(a),   γ_T[C]([[c : tl, tr]]) = γ_T[C ∪ {c}](tl) × γ_T[C ∪ {¬c}](tr)

Note that k |= C is equivalent to α^{C_D}({k}) ⊑_D α^{C_D}(C). Therefore, we can check k |= C using the abstract operation ⊑_D of the numerical domain D.

The other binary operations of T(C_D, A) are based on Algorithm 1 for tree unification, which finds a common refinement (labelling) of two trees t₁ and t₂ by calling the function UNIFICATION(t₁, t₂, K). It possibly adds new constraints as decision nodes (Lines 5–7, Lines 11–13), or removes constraints that are redundant (Lines 3, 4, 9, 10, 15, 16). The function UNIFICATION accumulates into the set C ∈ P(C_D) (initialized to K, which represents the implicit constraints satisfied by both t₁ and t₂) the constraints encountered along the paths of the decision trees. This set C is used by the function isRedundant(c, C), which checks whether the linear constraint c ∈ C_D is redundant with respect to C by testing α^{C_D}(C) ⊑_D α^{C_D}({c}). Note that tree unification does not lose any information.

Algorithm 1: UNIFICATION(t₁, t₂, C)
1  if isLeaf(t₁) ∧ isLeaf(t₂) then return (t₁, t₂);
2  if isLeaf(t₁) ∨ (isNode(t₁) ∧ isNode(t₂) ∧ t₂.c <_{C_D} t₁.c) then
3      if isRedundant(t₂.c, C) then return UNIFICATION(t₁, t₂.l, C);
4      if isRedundant(¬t₂.c, C) then return UNIFICATION(t₁, t₂.r, C);
5      (l₁, l₂) = UNIFICATION(t₁, t₂.l, C ∪ {t₂.c});
6      (r₁, r₂) = UNIFICATION(t₁, t₂.r, C ∪ {¬t₂.c});
7      return ([[t₂.c : l₁, r₁]], [[t₂.c : l₂, r₂]]);
8  if isLeaf(t₂) ∨ (isNode(t₁) ∧ isNode(t₂) ∧ t₁.c <_{C_D} t₂.c) then
9      if isRedundant(t₁.c, C) then return UNIFICATION(t₁.l, t₂, C);
10     if isRedundant(¬t₁.c, C) then return UNIFICATION(t₁.r, t₂, C);
11     (l₁, l₂) = UNIFICATION(t₁.l, t₂, C ∪ {t₁.c});
12     (r₁, r₂) = UNIFICATION(t₁.r, t₂, C ∪ {¬t₁.c});
13     return ([[t₁.c : l₁, r₁]], [[t₁.c : l₂, r₂]]);
14 else
15     if isRedundant(t₁.c, C) then return UNIFICATION(t₁.l, t₂.l, C);
16     if isRedundant(¬t₁.c, C) then return UNIFICATION(t₁.r, t₂.r, C);
17     (l₁, l₂) = UNIFICATION(t₁.l, t₂.l, C ∪ {t₁.c});
18     (r₁, r₂) = UNIFICATION(t₁.r, t₂.r, C ∪ {¬t₁.c});
19     return ([[t₁.c : l₁, r₁]], [[t₁.c : l₂, r₂]]);

Example 2.
Consider the constraint-based decision trees t1 and t2 from Example 1. After tree unification, UNIFICATION(t1, t2, K), the resulting decision trees are:

t1 = [[SIZE ≥ … : ⟨[y ≥ …]⟩, [[SIZE ≥ … : ⟨[y = 0]⟩, ⟨[y = 0]⟩]]]],
t2 = [[SIZE ≥ … : ⟨[y ≥ …]⟩, [[SIZE ≥ … : ⟨[y ≥ …]⟩, ⟨[y ≤ …]⟩]]]]

Note that UNIFICATION adds a decision node for SIZE ≥ … to t1, whereas it adds a decision node for SIZE ≥ … to t2 and removes the redundant constraint SIZE ≥ … . ⊓⊔

All binary operations are performed leaf-wise on the unified decision trees. Given two unified decision trees t1 and t2, their ordering and join are defined as:

⟨a1⟩ ⊑_T ⟨a2⟩ = a1 ⊑_A a2,    [[c : tl1, tr1]] ⊑_T [[c : tl2, tr2]] = (tl1 ⊑_T tl2) ∧ (tr1 ⊑_T tr2)
⟨a1⟩ ⊔_T ⟨a2⟩ = ⟨a1 ⊔_A a2⟩,   [[c : tl1, tr1]] ⊔_T [[c : tl2, tr2]] = [[c : tl1 ⊔_T tl2, tr1 ⊔_T tr2]]

Similarly, we compute the meet, widening, and narrowing of t1 and t2. The top element is a tree with a single ⊤_A leaf, ⊤_T = ⟨⊤_A⟩, while the bottom is ⊥_T = ⟨⊥_A⟩.

Example 3.
Consider the unified trees t1 and t2 from Example 2. We have that t1 ⊑_T t2 holds, and t1 ⊔_T t2 = [[SIZE ≥ … : ⟨[y ≥ …]⟩, [[SIZE ≥ … : ⟨[y ≥ …]⟩, ⟨[y ≤ …]⟩]]]]. ⊓⊔

Transfer functions.
The transfer functions for forward assignments (ASSIGN_T) and expression-based tests (FILTER_T) modify only the leaf nodes of a constraint-based decision tree. In contrast, the transfer functions for variability-specific constructs, such as feature-based tests (FEAT-FILTER_T) and #if-s (IFDEF_T), add, modify, or delete decision nodes of a decision tree. This is because the analysis information about program variables is located in the leaf nodes, while the information about feature variables is located in the decision nodes.

Algorithm 2: ASSIGN_T(t, x:=e)
1: if isLeaf(t) then return ⟨ASSIGN_A(t, x:=e)⟩;
2: return [[t.c : ASSIGN_T(t.l, x:=e), ASSIGN_T(t.r, x:=e)]];

The transfer function ASSIGN_T for handling an assignment x:=e in an input tree t is described by Algorithm 2. Note that x ∈ Var, and e ∈ Exp may contain only program variables. We apply ASSIGN_A to each leaf node a of t, which substitutes expression e for variable x in a. Similarly, the transfer function FILTER_T for handling expression-based tests e ∈ Exp is implemented by applying FILTER_A leaf-wise.

The transfer function FEAT-FILTER_T for feature-based tests θ is described by Algorithm 3. It reasons by induction on the structure of θ (we assume negation is applied only to atomic propositions). When θ is an atomic constraint over numerical features (Lines 2, 3), we use FILTER_D to approximate θ, thus producing a set of constraints J, which are then added to the tree t, possibly discarding all paths of t that do not satisfy θ. This is done by calling the function RESTRICT(t, K, J), which adds the linear constraints from J to t in ascending order with respect to <_CD, as shown in Algorithm 4. Note that θ may not be representable exactly in C_D (e.g., in the case of non-linear constraints over F), so FILTER_D may produce a set of constraints approximating it. When θ is a conjunction (resp., disjunction) of two feature expressions (Lines 4, 5) (resp., Lines 6, 7), the resulting decision trees are merged by the meet ⊓_T (resp., join ⊔_T). The function RESTRICT(t, C, J), described in Algorithm 4, takes as input a decision tree t, a set C of linear constraints accumulated along the paths up to a node, and a set J of linear constraints in canonical form that need to be added to t. For each constraint j ∈ J, there exists a Boolean b_j that indicates whether the tree should be constrained with respect to j or with respect to ¬j.
When J is not empty, the linear constraints from J are added to t in ascending order with respect to <_CD. At each iteration, the smallest linear constraint j is extracted from J (Line 9) and handled appropriately, based on whether j is smaller (Lines 11–15), or greater or equal (Lines 17–21), with respect to the constraint at the node of t we currently consider.

Finally, the transfer function IFDEF_T is defined as:

IFDEF_T(t, #if (θ) s) = [[s]]_T(FEAT-FILTER_T(t, θ)) ⊔_T FEAT-FILTER_T(t, ¬θ)

where [[s]]_T(t) denotes the transfer function in T(C_D, A) for statement s.

After applying transfer functions, the obtained decision trees may contain some redundancy that can be exploited to further compress them. The function COMPRESS_T(t, C), described by Algorithm 5, is applied to a decision tree t in order to compress (reduce) its representation. We use five different optimizations.

Algorithm 3: FEAT-FILTER_T(t, θ)
1: switch θ do
2:   case (e_FZ ⋈ e_FZ) || ¬(e_FZ ⋈ e_FZ) do
3:     J = FILTER_D(⊤_D, θ); return RESTRICT(t, K, J);
4:   case θ1 ∧ θ2 do
5:     return FEAT-FILTER_T(t, θ1) ⊓_T FEAT-FILTER_T(t, θ2);
6:   case θ1 ∨ θ2 do
7:     return FEAT-FILTER_T(t, θ1) ⊔_T FEAT-FILTER_T(t, θ2);

First, if the constraints on the path to some leaf are unsatisfiable, we eliminate that leaf node (Lines 9, 10). Second, if a decision node has two identical subtrees, then we keep only one subtree and also eliminate the decision node (Lines 11–13). Third, if a decision node has a left leaf and a right subtree, such that its left leaf is the same as the left leaf of its right subtree and the constraint in the decision node is less than or equal to the constraint in the root of its right subtree, then we can eliminate the decision node and its left leaf (Lines 14, 15). A similar rule applies when a decision node has a left subtree and a right leaf (Lines 16, 17).
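The subtree-merging optimization can be sketched in OCaml as follows. This is an illustrative sketch, not the tool's Algorithm 5, which additionally prunes leaves whose path constraints are unsatisfiable; that check requires the numerical domain and is omitted here.

```ocaml
type constr = Geq of string * int
type 'a tree = Leaf of 'a | Node of constr * 'a tree * 'a tree

(* If, after compressing the subtrees, both children of a decision node are
   equal, the test at the node cannot distinguish any variants: drop it. *)
let rec compress = function
  | Leaf a -> Leaf a
  | Node (c, l, r) ->
      let l' = compress l and r' = compress r in
      if l' = r' then l' else Node (c, l', r')
```

Applying the rule bottom-up lets a cascade of redundant nodes collapse into a single leaf in one pass.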
Lifted analysis.
The abstract operations and transfer functions of T(C_D, A) can be used to define the lifted analysis of program families. The tree t_in at the initial location has only one leaf node, ⊤_A, and decision nodes that define the set K. Note that if K ≡ true, then t_in = ⊤_T. In this way, we collect the possible invariants in the form of decision trees at all program locations.

We establish the correctness of the lifted analysis based on T(C_D, A) by showing that it produces results identical to those of the tuple-based domain A^K. Let [[s]]_T and [[s]] denote the transfer functions of statement s in T(C_D, A) and A^K, respectively.

Theorem 1 (App. A). γ_T([[s]]_T(t_in)) = γ([[s]](a_in)).

Example 4. Let us consider the code base of a program family P given in Fig. 4. It contains only one numerical feature, SIZE, with domain N. The decision tree inferred at the final location is depicted in Fig. 5. It uses the Interval domain for both decision and leaf nodes. Note that the constraint (SIZE < 3) does not explicitly appear in the code base, but we obtain it in the decision-tree representation. This shows that the partitioning of the configuration space K induced by decision trees is semantic rather than syntactic.

Example 5.
Let us consider the code base of a program family P′ given in Fig. 6. It contains one numerical feature A with domain [1, 4] and a non-linear feature expression A∗A < 9. At the program location after the feature-based test, FEAT-FILTER_T(⟨x = 0⟩, A∗A < 9) = ⟨x = 0⟩, whereas FEAT-FILTER_T(⟨x = 0⟩, ¬(A∗A < 9)) = [[A ≥ … : ⟨x = 0⟩, ⟨⊥_I⟩]]. In effect, we obtain an over-approximating result at the final program location, as shown in Fig. 7.

Algorithm 4: RESTRICT(t, C, J)
1:  if isEmpty(J) then
2:    if isLeaf(t) then return t;
3:    if isRedundant(t.c, C) then return RESTRICT(t.l, C, J);
4:    if isRedundant(¬t.c, C) then return RESTRICT(t.r, C, J);
5:    l = RESTRICT(t.l, C ∪ {t.c}, J);
6:    r = RESTRICT(t.r, C ∪ {¬t.c}, J);
7:    return [[t.c : l, r]];
8:  else
9:    j = min_{<CD}(J);
10:   if isLeaf(t) ∨ (isNode(t) ∧ j <_CD t.c) then
11:     if isRedundant(j, C) then return RESTRICT(t, C, J\{j});
12:     if isRedundant(¬j, C) then return ⟨⊥_A⟩;
13:     if j =_CD t.c then (if b_j then t = t.l else t = t.r);
14:     if b_j then return [[j : RESTRICT(t, C ∪ {j}, J\{j}), ⟨⊥_A⟩]];
15:     else return [[j : ⟨⊥_A⟩, RESTRICT(t, C ∪ {¬j}, J\{j})]];
16:   else
17:     if isRedundant(t.c, C) then return RESTRICT(t.l, C, J);
18:     if isRedundant(¬t.c, C) then return RESTRICT(t.r, C, J);
19:     l = RESTRICT(t.l, C ∪ {t.c}, J);
20:     r = RESTRICT(t.r, C ∪ {¬t.c}, J);
21:     return [[t.c : l, r]];

int x := 0;
#if (SIZE ≤ …) x := x+1;
#if (SIZE==3 || SIZE==4) x := x-1;

Fig. 4: Code base for program family P.

Fig. 5: Decision tree at the final location of P, with a decision node for SIZE < 3 and leaves including [x = 1] and [x = −1].

The precise result at the final program location, which can be obtained in case we have numerical domains that can handle non-linear constraints, is given in Fig. 8. We observe that when ¬(A ≤ …) holds we obtain −… ≤ x ≤ … (instead of x = −1), due to the over-approximation of the non-linear feature expression in the numerical domains we use. ⊓⊔
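The precision loss on non-linear feature expressions can be reproduced by brute force over the finite feature domain. The bound A ≥ 2 below is an illustrative stand-in for a linear over-approximation of ¬(A∗A < 9), not the exact constraint computed by the analysis:

```ocaml
(* Feature A ranges over [1,4]. The test A*A < 9 holds exactly for A in
   {1,2}; a linear over-approximation of its negation (here A >= 2, an
   illustrative choice) also admits A = 2, so the two branches overlap on
   that configuration and their join loses precision there. *)
let dom = [1; 2; 3; 4]
let then_branch = List.filter (fun a -> a * a < 9) dom   (* exact *)
let else_approx = List.filter (fun a -> a >= 2) dom      (* over-approx *)
let overlap = List.filter (fun a -> List.mem a else_approx) then_branch
```

The non-empty overlap is exactly what forces the joined interval for x in the over-approximating tree, instead of a single precise value per branch.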
Implementation
We have developed a prototype lifted static analyzer, called SPLNumAnalyzer, that uses the lifted abstract domains of tuples A^K and decision trees T(C_D, A). The abstract domains A for encoding the properties of tuple components and leaf nodes, as well as the abstract domain D for encoding linear constraints over numerical features, are based on the intervals, octagons, and polyhedra domains. Their abstract operations and transfer functions are provided by the APRON library [19].

Algorithm 5: COMPRESS_T(t, C)
1:  switch t do
2:    case ⟨n⟩ do
3:      return ⟨n⟩;
4:    case [[t.c : l, r]] do
5:      l′ = COMPRESS_T(t.l, C ∪ {t.c});
6:      r′ = COMPRESS_T(t.r, C ∪ {¬t.c});
7:      switch l′, r′ do
8:        case ⟨n′_l⟩, ⟨n′_r⟩ do
9:          if UNSAT(C ∪ {t.c}) then return ⟨n′_r⟩;
10:         if UNSAT(C ∪ {¬t.c}) then return ⟨n′_l⟩;
11:         if n′_l = n′_r then return ⟨n′_l⟩;
12:       case [[c1 : l1, r1]], [[c2 : l2, r2]] when c1 = c2 ∧ l1 = l2 ∧ r1 = r2 do
13:         return [[c1 : l1, r1]];
14:       case ⟨n′_l⟩, [[c2 : l2, r2]] when ⟨n′_l⟩ = l2 ∧ t.c ≤_CD c2 do
15:         return [[c2 : l2, r2]];
16:       case [[c1 : l1, r1]], ⟨n′_r⟩ when ⟨n′_r⟩ = r1 ∧ c1 ≤_CD t.c do
17:         return [[c1 : l1, r1]];
18:       case default do
19:         return [[t.c : l′, r′]];

Our proof-of-concept implementation is written in OCaml and consists of around 6K lines of code. The current front-end of the tool accepts programs written in a (subset of) C with #if directives, but without struct and union types. It currently provides only limited support for arrays, pointers, and recursion. The only basic data type is mathematical integers. SPLNumAnalyzer automatically infers numerical invariants at all program locations, corresponding to all variants in the given family.
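The leaf-wise style of ASSIGN_T and FILTER_T described above amounts to mapping the single-program transfer function over the leaves of a tree. A minimal OCaml sketch of this structure (ours, not the tool's code, which delegates the leaf operation to APRON):

```ocaml
type constr = Geq of string * int
type 'a tree = Leaf of 'a | Node of constr * 'a tree * 'a tree

(* Leaf-wise lifting of a single-program transfer function f: decision
   nodes over features are left untouched, only leaf properties change. *)
let rec map_leaves f = function
  | Leaf a -> Leaf (f a)
  | Node (c, l, r) -> Node (c, map_leaves f l, map_leaves f r)

(* Example leaf operation on intervals (lo, hi): the assignment x := x+1. *)
let incr_itv (lo, hi) = (lo + 1, hi + 1)
```

With this structure, adding a new single-program domain only requires supplying the leaf-level operations; the tree machinery is reused unchanged.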
Experimental setup and Benchmarks
All experiments are executed on a 64-bit Intel® Core™ i7-8700 CPU @ 3.20GHz × 12 machine with 8 GB memory, running Ubuntu 18.04.5 LTS, and we use a timeout value of 300 sec. All times are reported as the average over five independent executions. The implementation, benchmarks, and all results obtained from our experiments are available from: http://bit.ly/2SRElgK. In our experiments, we use three instances of our lifted analysis via decision trees, A_T(I), A_T(O), and A_T(P), which use the intervals, octagons, and polyhedra domains, respectively, for the properties in leaf nodes and in decision nodes. We also use three instances of our lifted analysis based on tuples: A_Π(I), A_Π(O), and A_Π(P).

SPLNumAnalyzer was evaluated on a dozen C numerical programs collected from several different folders (categories) of the 8th International Competition on Software Verification (SV-COMP 2019, https://sv-comp.sosy-lab.org/2019/) as well as from the real-world BusyBox project (https://busybox.net).

Fig. 6: Code base for P′: x is initialized to 0 and incremented under the feature-based test #if (A∗A < 9).

Fig. 7: Over-approximating decision tree at the final location of P′, with a decision node A ≤ … and leaves [x = 1] and [−… ≤ x ≤ …].

Fig. 8: Precise decision tree at the final location of P′, with a decision node A ≤ … and leaves [x = 1] and [x = −1].

The folders from SV-COMP we use are: loops, loop-invgen (invgen for short), loop-lit (lit for short), and termination-crafted (crafted for short). In the case of SV-COMP, we first selected some numerical programs with integers, and then manually added variability (features and #if directives) to each of them. In the case of BusyBox, we first selected some programs with numerical features, and then simplified those programs so that our tool can handle them. For example, any reference to a pointer or a library function is replaced with [−∞, +∞]. Table 1 presents the characteristics of the selected benchmarks. We list: the file name (Benchmark), the folder where it is located (folder), the number of features (|F|), the number of configurations (|K|), and the number of lines of code (LOC).

Performance Results
Table 1 shows the results of analyzing our benchmark files using the different versions of our lifted static analyses, based on decision trees and on tuples. For each version of the decision tree-based lifted analysis, there are two columns. In the first column, Time, we report the running time in seconds needed to analyze the given benchmark using the corresponding version of the lifted analysis based on decision trees. In the second column, Impr., we report the speed-up factor of each version of the lifted analysis based on decision trees relative to the corresponding baseline lifted analysis based on tuples (A_T(I) vs. A_Π(I), A_T(O) vs. A_Π(O), and A_T(P) vs. A_Π(P)). The performance results confirm that sharing is indeed effective, especially so for large values of |K|. On our benchmarks, it translates into speed-ups (i.e., A_T(−) vs. A_Π(−)) that range from 1.1 to 4.6 times when |K| < …, and grow further when |K| > … . A_T(I) is the fastest version, and A_T(P) is the slowest but the most precise.

Computational tractability
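The gap between the two representations can be made concrete with a small calculation (our own illustrative sketch): a tuple-based domain stores one property per configuration, i.e. k^n of them for n numerical features with k values each, whereas the decision-tree result for the test_kn family considered below needs only n + 1 distinct leaves.

```ocaml
(* Properties stored per program location, for n features with k values:
   the tuple-based lifted domain keeps one component per configuration
   (k^n), while the decision-tree result for test_kn has n+1 leaves. *)
let tuple_components k n =
  let rec pow b e = if e = 0 then 1 else b * pow b (e - 1) in
  pow k n

let tree_leaves n = n + 1
```

Already for k = 5 and n = 10 the tuple has close to ten million components, while the tree keeps eleven leaves, which matches the infeasibility pattern reported below.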
The tuple-based lifted analysis A_Π(−) may become very slow, or even infeasible, for very large configuration spaces |K|. We have tested the limits of A_Π(P) and A_T(−). We took a method, test_kn(), which contains n numerical features A_1, ..., A_n, such that each numerical feature A_i has domain dom(A_i) = [0, k−1] = {0, ..., k−1}. The body of test_kn() consists of n sequentially composed #if-s of the form #if (A_i = 0) i := i+1 #endif. For example, test_32(), with two features A_1 and A_2 whose domain is [0, 2], is:

int i := 0;
#if (A_1 = 0) i := i+1;
#if (A_2 = 0) i := i+1;

Table 1: Characteristics and performance results for the selected benchmarks: half_2.c, heapsort.c, seq.c (invgen); eq1.c, eq2.c, sum01*.c (loops); hhk2008.c, gsv2008.c, gcnr2008.c (lit); Toulouse*.c, Mysore.c (crafted); copyfd.c, real_path.c (BusyBox); with columns folder, |F|, |K|, LOC, and Time/Impr. for each of A_T(I), A_T(O), and A_T(P).

Fig. 9: A_Π(P) results at the final location of test_32(): a 9-tuple with one interval property per configuration of (A_1, A_2), namely [i = 2] for A_1 = 0 ∧ A_2 = 0, [i = 1] when exactly one of A_1, A_2 is 0, and [i = 0] otherwise.

Fig. 10: A_T(P) results at the final location of test_32(): a decision tree with decision nodes A_1 = 0 and A_2 = 0 and the three leaves [i = 2], [i = 1], and [i = 0].

Subject to the chosen configuration, the variable i at the final location can have a value ranging from 2, when A_1 and A_2 are both assigned 0, down to 0, when A_1 ≥ 1 ∧ A_2 ≥ 1. The analysis results at the final location of test_32() obtained using A_Π(P) and A_T(P) are shown in Fig. 9 and Fig. 10, respectively. A_Π(P) uses tuples with 9 interval properties (components), while A_T(P) uses 3 interval properties (leaves).

We have generated methods test_kn() by gradually increasing the variability. In general, the size of the tuples used by A_Π(P) is k^n, whereas the number of leaf nodes in the decision trees used by A_T(P) at the final program location is n+1. The performance results of analyzing test_kn, for different values of n and k, using A_Π(P) and A_T(P) are shown in Table 2. In the columns Impr., we report the speed-up of A_T(P) with respect to A_Π(P). We observe that A_T(P) yields decision trees that provide a quite compact and symbolic representation of the lifted analysis results. Since configurations with equivalent analysis results are concisely encoded using linear constraints in the decision nodes, the performance of A_T(P) does not depend on k, but only on n.

Table 2: The performance results of analyzing test_kn, for k ∈ {3, 5, 7} and increasing n, with columns A_Π(P), A_T(P), and Impr. per value of k. For n = 10 and k = 3, A_Π(P) takes 278.7 s and A_T(P) 5.591 s, a 49.8× speed-up; entries where A_Π(P) exceeds the timeout are reported as infeasible (∞×).

On the other hand, the performance of A_Π(P) depends heavily on k. Thus, within a timeout limit of 300 seconds, the analysis A_Π(P) fails to terminate for test_…, test_…, and test_… . In summary, we can conclude that decision trees A_T(P) can not only greatly speed up lifted analyses, but also turn previously infeasible analyses into feasible ones.

Related Work

Decision-tree abstract domains have recently been used in the abstract interpretation community [17,12,6,26]. Decision trees have been applied for the disjunctive refinement of the Interval domain [17]: each element of that domain is a propositional formula over interval linear constraints. Segmented decision tree abstract domains have also been defined [12,6] to enable path-dependent static analysis. Their elements contain decision nodes that are determined either by the values of program variables [12] or by the branch (if) conditions [6], whereas the leaf nodes are numerical properties. Urban and Miné [26] use decision tree-based abstract domains to prove program termination. There, decision nodes are labelled with linear constraints that split the memory space, and leaf nodes contain affine ranking functions for proving program termination.

Recently, two main styles of static analysis have been a topic of considerable research in the SPL community: dataflow analysis from the monotone framework developed by Kildall [21], which is algorithmically defined on syntactic CFGs, and abstract interpretation-based static analysis developed by Cousot and Cousot [8], which is more general and semantically defined. Brabrand et al. [4] lift a dataflow analysis from the monotone framework, resulting in a tuple-based lifted dataflow analysis that works on the level of families.
Another efficient implementation of the lifted dataflow analysis from the monotone framework is based on variational data structures [27] (e.g., variational CFGs and variational data-flow facts). Midtgaard et al. [22] have proposed a formal methodology for the systematic derivation of tuple-based lifted static analyses in the abstract interpretation framework. A more efficient lifted static analysis by abstract interpretation, obtained by improving the representation via BDD domains, is given in [14]. Another approach to speeding up lifted analyses is to use so-called variability abstractions [15], which are used to derive abstract lifted analyses. They tame the combinatorial explosion of the number of configurations and reduce it to something more tractable by manipulating the configuration space. However, the above lifted analyses are applied to program families with only Boolean features. In contrast, here we consider C families with both Boolean and numerical features, which represent the majority of industrial embedded code.

Conclusion

In this work, we employ decision trees and widely-known numerical abstract domains for the automatic inference of invariants in all locations of C program families that contain numerical features. In the future, we would like to extend the lifted abstract domain to also support non-linear constraints [16] and more complex heap-manipulating families [5]. An interesting direction for future work would be to explore the possibility of applying variability abstractions [15] as yet another way to speed up lifted analyses.
References
1. Sven Apel, Hendrik Speidel, Philipp Wendler, Alexander von Rhein, and Dirk Beyer. Detection of feature interactions using feature-aware verification. In 26th IEEE/ACM International Conference on Automated Software Engineering, ASE 2011, pages 372–375, 2011.
2. Sven Apel, Alexander von Rhein, Philipp Wendler, Armin Größlinger, and Dirk Beyer. Strategies for product-line verification: case studies and experiments. In 35th International Conference on Software Engineering, ICSE 2013, pages 482–491, 2013.
3. Eric Bodden, Társis Tolêdo, Márcio Ribeiro, Claus Brabrand, Paulo Borba, and Mira Mezini. SPLLIFT: statically analyzing software product lines in minutes instead of years. In ACM SIGPLAN Conference on PLDI '13, pages 355–364, 2013.
4. Claus Brabrand, Márcio Ribeiro, Társis Tolêdo, Johnni Winther, and Paulo Borba. Intraprocedural dataflow analysis for software product lines. T. Aspect-Oriented Software Development, 10:73–108, 2013.
5. Bor-Yuh Evan Chang and Xavier Rival. Modular construction of shape-numeric analyzers. In Semantics, Abstract Interpretation, and Reasoning about Programs: Essays Dedicated to David A. Schmidt on the Occasion of his Sixtieth Birthday, volume 129 of EPTCS, pages 161–185, 2013.
6. Junjie Chen and Patrick Cousot. A binary decision tree abstract domain functor. In Static Analysis - 22nd International Symposium, SAS 2015, Proceedings, volume 9291 of LNCS, pages 36–53. Springer, 2015.
7. Paul Clements and Linda Northrop. Software Product Lines: Practices and Patterns. Addison-Wesley, 2001.
8. Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, pages 238–252. ACM, 1977.
9. Patrick Cousot and Radhia Cousot. Comparing the Galois connection and widening/narrowing approaches to abstract interpretation. In Programming Language Implementation and Logic Programming, 4th International Symposium, PLILP'92, Proceedings, volume 631 of LNCS, pages 269–295. Springer, 1992.
10. Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. The ASTRÉE analyzer. In Programming Languages and Systems, 14th European Symposium on Programming, ESOP 2005, Proceedings, volume 3444 of LNCS, pages 21–30. Springer, 2005.
11. Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, and Xavier Rival. Why does Astrée scale up? Formal Methods in System Design, 35(3):229–264, 2009.
12. Patrick Cousot, Radhia Cousot, and Laurent Mauborgne. A scalable segmented decision tree abstract domain. In Time for Verification, Essays in Memory of Amir Pnueli, volume 6200 of LNCS, pages 72–95. Springer, 2010.
13. Patrick Cousot and Nicolas Halbwachs. Automatic discovery of linear restraints among variables of a program. In Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages (POPL'78), pages 84–96. ACM Press, 1978.
14. Aleksandar S. Dimovski. Lifted static analysis using a binary decision diagram abstract domain. In Proceedings of the 18th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2019, pages 102–114. ACM, 2019.
15. Aleksandar S. Dimovski, Claus Brabrand, and Andrzej Wasowski. Variability abstractions: Trading precision for speed in family-based analyses. In 29th European Conference on Object-Oriented Programming, ECOOP 2015, volume 37 of LIPIcs, pages 247–270. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2015.
16. Philippe Granger. Static analysis of arithmetical congruences. International Journal of Computer Mathematics, 30(3-4):165–190, 1989.
17. Arie Gurfinkel and Sagar Chaki. Boxes: A symbolic abstract domain of boxes. In Static Analysis - 17th International Symposium, SAS 2010, Proceedings, volume 6337 of LNCS, pages 287–303. Springer, 2010.
18. Christopher Henard, Mike Papadakis, Mark Harman, and Yves Le Traon. Combining multi-objective search and constraint solving for configuring large software product lines. In 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, pages 517–528. IEEE Computer Society, 2015.
19. Bertrand Jeannet and Antoine Miné. Apron: A library of numerical abstract domains for static analysis. In Computer Aided Verification, 21st International Conference, CAV 2009, Proceedings, volume 5643 of LNCS, pages 661–667. Springer, 2009.
20. Christian Kästner. Virtual Separation of Concerns: Toward Preprocessors 2.0. PhD thesis, University of Magdeburg, Germany, May 2010.
21. Gary A. Kildall. A unified approach to global program optimization. In Conference Record of the ACM Symposium on Principles of Programming Languages (POPL'73), pages 194–206, 1973.
22. Jan Midtgaard, Aleksandar S. Dimovski, Claus Brabrand, and Andrzej Wasowski. Systematic derivation of correct variability-aware program analyses. Sci. Comput. Program., 105:145–170, 2015.
23. Antoine Miné. The octagon abstract domain. Higher-Order and Symbolic Computation, 19(1):31–100, 2006.
24. Antoine Miné. Tutorial on static inference of numeric invariants by abstract interpretation. Foundations and Trends in Programming Languages, 4(3-4):120–372, 2017.
25. Daniel-Jesus Munoz, Jeho Oh, Mónica Pinto, Lidia Fuentes, and Don S. Batory. Uniform random sampling product configurations of feature models that have numerical features. In Proceedings of the 23rd International Systems and Software Product Line Conference, SPLC 2019, Volume A, pages 39:1–39:13. ACM, 2019.
26. Caterina Urban and Antoine Miné. A decision tree abstract domain for proving conditional termination. In Static Analysis - 21st International Symposium, SAS 2014, Proceedings, volume 8723 of LNCS, pages 302–318. Springer, 2014.
27. Alexander von Rhein, Jörg Liebig, Andreas Janker, Christian Kästner, and Sven Apel. Variability-aware static analysis at scale: An empirical study. ACM Trans. Softw. Eng. Methodol., 27(4):18:1–18:33, 2018.

Appendix
Proof (of Theorem 1).
The proof is by induction on the structure of s. Assume γ_T(t) = γ(a) (*). We consider the two most interesting cases.

Case x:=e. ASSIGN(a, x:=e) applies ASSIGN_A(a_k, x:=e) to each component a_k of a. On the other hand, ASSIGN_T(t, x:=e) applies ASSIGN_A(a, x:=e) to each leaf a in t. The proof follows by the assumption (*).

Case #if (θ) s. The transfer functions for #if-s are identical in both lifted domains. We only need to show that FEAT-FILTER(a, θ) and FEAT-FILTER_T(t, θ) are identical. This can be shown by induction on θ. Assume that θ is an atomic constraint. FEAT-FILTER(a, θ) keeps only those components k of a such that k ⊨ θ. On the other hand, FEAT-FILTER_T(t, θ) first produces all linear constraints in C_D that approximate θ, and then adds them to the tree t. Thus, it keeps only those leaf nodes that satisfy the newly generated constraints from θ. ⊓⊔