Lifting Datalog-Based Analyses to Software Product Lines
Ramy Shahin, [email protected], University of Toronto, Canada
Marsha Chechik, [email protected], University of Toronto, Canada
Rick Salay, [email protected], University of Toronto, Canada
ABSTRACT
Applying program analyses to Software Product Lines (SPLs) has been a fundamental research problem at the intersection of Product Line Engineering and software analysis. Different attempts have been made to "lift" particular product-level analyses to run on the entire product line. In this paper, we tackle the class of Datalog-based analyses (e.g., pointer and taint analyses), study the theoretical aspects of lifting Datalog inference, and implement a lifted inference algorithm inside the Soufflé Datalog engine. We evaluate our implementation on a set of benchmark product lines. We show significant savings in processing time and fact database size (billions of times faster on one of the benchmarks) compared to brute-force analysis of each product individually.
CCS CONCEPTS
• Software and its engineering → Automated static analysis; Software design techniques.

KEYWORDS
Software Product Lines, Datalog, Program Analysis, Pointer Analysis, Lifting, Doop, Soufflé
ACM Reference Format:
Ramy Shahin, Marsha Chechik, and Rick Salay. 2019. Lifting Datalog-Based Analyses to Software Product Lines. In
Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '19), August 26–30, 2019, Tallinn, Estonia.
ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3338906.3338928
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

ESEC/FSE '19, August 26–30, 2019, Tallinn, Estonia
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-5572-8/19/08...$15.00
https://doi.org/10.1145/3338906.3338928

1 INTRODUCTION

Software Product Lines (SPLs) are families of related products, usually developed together from a common set of artifacts. Each product configuration is a combination of features. As a result, the number of potential products is combinatorial in the number of features. This high level of configurability is usually desired. However, analysis tools (syntax analyzers, type checkers, model checkers, static analysis tools, etc.) typically work on a single product, not the whole SPL. Applying an analysis to each product separately is usually infeasible for non-trivial SPLs because of the exponential number of products [20].

Since all products of an SPL share a common set of artifacts, analyzing each product individually (usually referred to as brute-force analysis) would involve a lot of redundancy. How to leverage this commonality and analyze the whole product line at once, bringing the total analysis time down, is a fundamental research problem at the intersection of Product Line Engineering and software analysis. Different attempts have been made to lift individual analyses to run on product lines [4, 9, 11, 16, 18, 22, 24]. Those attempts show significant time savings when the SPL is analyzed as a whole compared to brute-force analysis. The downside though is the amount of effort required to correctly lift each of those analyses.

In this paper, we tackle the class of Datalog-based program analyses. Datalog is a declarative query language that adds logical inference to relational queries. Some program analyses (in particular, pointer and taint analyses) can be fully specified as sets of Datalog inference rules. Those rules are applied by an inference engine to facts extracted from a software product. The results are more facts, inferred by the engine based on the rules.
The advantage of Datalog-based analyses is that they are declarative, concise, and can be efficiently executed by highly optimized Datalog engines [15, 19]. Instead of lifting individual Datalog-based analyses, we lift a Datalog engine. This way, any analysis running on the lifted engine is lifted for free. Our approach is not specific to a particular engine though, and can be implemented in others.
Contributions
In this paper we make the following contributions: (1) We present înfer, a Datalog inference algorithm lifted to facts extracted from Software Product Lines. (2) We state the correctness criteria of lifted Datalog inference and show that înfer is correct. (3) We implement our lifted algorithm as part of a Datalog engine. We also extend the Doop pointer analysis framework [6] to extract facts from SPLs. (4) We evaluate our implementation on a sample of pointer and taint analyses applied to a suite of Java benchmarks. We show significant savings in processing time and fact database sizes compared to brute-force analysis of one product at a time. For one of the benchmarks, our lifted implementation is billions of times faster than brute-force analysis (with savings in database size of the same order of magnitude).
The rest of the paper starts with a background on SPLs and Datalog (Sec. 2). We provide a theoretical treatment of Datalog inference and how the inference algorithm is lifted, together with correctness criteria and a correctness proof, in Sec. 3. In Sec. 4, we describe the implementation of our algorithm in the Soufflé engine. The evaluation process and results are discussed in Sec. 5. We compare our approach to related work in Sec. 6 and conclude (Sec. 7).
2 BACKGROUND

In this section, we summarize the basic concepts of Software Product Lines, Horn Clauses, Datalog and Datalog-based analyses.

A Software Product Line (SPL) is a family of related software products developed together. Different variants of an SPL have different features, i.e., externally visible attributes such as a piece of functionality, support for a particular peripheral device, or a performance optimization.

Definition 1 (SPL). An SPL L is a tuple (F, Φ, D, ϕ) where: (1) F is the set of features s.t. an individual product can be derived from L via a feature configuration ρ ⊆ F. (2) Φ ∈ Prop(F) is a propositional formula over F defining the valid set of feature configurations. Φ is called a Feature Model (FM). The set of valid configurations defined by Φ is called Conf(L). (3) D is a set of program elements, called the domain model. The whole set of program elements is sometimes referred to as the 150% representation. (4) ϕ : D → Prop(F) is a total function mapping each program element to a proposition (feature expression) defined over the set of features F. ϕ(e) is called the Presence Condition (PC) of element e, i.e., the set of product configurations in which e is present.

Example.
Consider the annotative Java product line with feature set F = {FA, FB}, shown in Listing 1. Features are annotated using the C Pre-Processor (CPP) conditional compilation directives. By defining or not defining macros corresponding to features, different products can be generated from this product line. One example is the product in Listing 2, with FA not defined and FB defined. Here a single code-base (domain model D) is maintained, where different pieces of code are annotated with feature expressions. For example, tokens on line 10 are annotated with ¬FA. That is, ¬FA is the PC of these tokens. Similarly, tokens on line 13 have the PC FB. This SPL allows all four feature combinations, so its feature model Φ is True.

1  class Parent { public Object f; }
2  class ClassA extends Parent { }
3  class ClassB extends Parent { }
4
5  ClassA o1 = new ClassA();
6  ClassB o2 = new ClassB();
7  #ifdef FA
8  Parent o3 = o1;
9  #else
10 Parent o3 = o2;
11 #endif
12 #ifdef FB
13 o2.f = o1;
14 #else
15 o2.f = o2;
16 #endif
17 Object r = o3.f;

Listing 1: A product line with features FA and FB.

1 class Parent { public Object f; }
2 class ClassA extends Parent { }
3 class ClassB extends Parent { }
4
5 ClassA o1 = new ClassA();
6 ClassB o2 = new ClassB();
7 Parent o3 = o2;
8 o2.f = o1;
9 Object r = o3.f;

Listing 2: A product with a configuration (¬FA ∧ FB).

A Horn Clause (HC) is a disjunction of unique propositional literals with at most one positive literal. For example, (¬a ∨ ¬b ∨ c ∨ ¬d) is an HC which can also be written as a reverse implication (c ← (a ∧ b ∧ d)), where c is called the head and (a ∧ b ∧ d) is called the body of the clause. The language of HCs is a fragment of Propositional Logic that can be checked for satisfiability in linear time, as opposed to general propositional satisfiability, which is NP-complete [14].

Datalog is a declarative database query language that extends relational algebra with logical inference [13]. Datalog inference rules are HCs in First Order Logic, where atoms are predicate expressions, not just propositional literals. A fact is a ground rule with only a head and no body. Syntactically, the ':-' symbol is usually used instead of backward implication, and atoms in the body are separated by commas instead of the conjunction symbol.

Fig. 1a defines the grammar of Datalog clauses as follows: (1) building blocks are finite sets of constants, variables and predicate symbols; (2) a term is a constant or a variable symbol; (3) a predicate expression is an n-ary predicate applied to arguments; (4) a fact is a ground predicate expression, i.e., all of its arguments are constants; (5) a rule is a Horn Clause of predicate expressions; and (6) a Datalog clause is either a fact or a rule.

A Datalog program is a finite set of rules, usually referred to as the Intensional Database (IDB), which operates on a finite set of facts called the Extensional Database (EDB). The inference algorithm (explained next) repeatedly applies the rules to the facts, inferring new facts and adding them to the EDB, until a fixed point is reached (i.e., no more new facts can be inferred).
For each rule R, the algorithm checks whether the EDB has facts fulfilling the premises of R, with a consistent assignment of variables to constants (Fig. 1b). If it does, the head of that rule is inferred as a new fact F. If F does not already exist in the EDB, it is added to it. Newly inferred facts may trigger some of the rules again; this process continues until a fixed point is reached, i.e., no new facts are inferred. This algorithm (called the forward chaining algorithm [7]) is guaranteed to terminate because it does not create any new constants, and runs in polynomial time w.r.t. the number of input clauses [7].

Data: IDB, EDB
Result: EDB + inferred clauses
repeat
    fixpoint = True;
    foreach (C :- s1, ..., sn) ∈ IDB do
        foreach (γ, f1, ..., fn), fi ∈ EDB, [γ]si = fi do
            if [γ]C ∉ EDB then
                fixpoint = False;
                EDB = EDB ∪ {[γ]C}
            end
        end
    end
until fixpoint;
return EDB;

Algorithm 1: Inference algorithm infer (forward chaining).
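As a concrete illustration (ours, not part of the paper), the forward-chaining loop of Algorithm 1 can be sketched in Python. The `path`/`edge` transitive-closure rules are an illustrative IDB; variables are marked with a leading `?`:

```python
def match(pattern, fact, subst):
    # unify one body atom with a ground fact under the current substitution;
    # variables are strings starting with '?'
    if pattern[0] != fact[0] or len(pattern[1]) != len(fact[1]):
        return None
    s = dict(subst)
    for p, c in zip(pattern[1], fact[1]):
        if p.startswith("?"):
            if s.setdefault(p, c) != c:
                return None
        elif p != c:
            return None
    return s

def solve(body, facts, subst=None):
    # enumerate consistent variable assignments for the whole rule body
    subst = subst or {}
    if not body:
        yield subst
        return
    for f in facts:
        s = match(body[0], f, subst)
        if s is not None:
            yield from solve(body[1:], facts, s)

def infer(idb, edb):
    # Algorithm 1: apply every rule until no new fact is derived (fixpoint)
    facts = set(edb)
    fixpoint = False
    while not fixpoint:
        fixpoint = True
        for head, body in idb:
            for s in list(solve(body, facts)):
                fact = (head[0], tuple(s.get(a, a) for a in head[1]))
                if fact not in facts:
                    facts.add(fact)
                    fixpoint = False
    return facts

# path(x,y) :- edge(x,y).   path(x,z) :- edge(x,y), path(y,z).
IDB = [(("path", ("?x", "?y")), [("edge", ("?x", "?y"))]),
       (("path", ("?x", "?z")), [("edge", ("?x", "?y")), ("path", ("?y", "?z"))])]
EDB = [("edge", ("a", "b")), ("edge", ("b", "c"))]
print(sorted(infer(IDB, EDB)))  # additionally derives path(a,b), path(b,c), path(a,c)
```

Termination follows the same argument as in the text: no new constants are ever created, so the set of derivable ground facts is finite.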
Some program analyses [2, 6, 10, 12] can be written in Datalog as sets of clauses. Facts relevant to the analysis are extracted from the program to be analyzed, and then fed into a Datalog engine together with the analysis clauses. Fact extraction is usually analysis-specific because different analyses work on different aspects of the program. One example of Datalog-based analyses is pointer analysis.

Pointer analysis [25] determines which objects might be pointed to by a particular program expression. This whole-program analysis is over-approximating in the sense that it returns a set of objects that might be pointed to by each pointer, possibly with false positives. Fig. 2a shows a set of Datalog rules for a simple pointer analysis [21]. Each predicate defines a relation between different artifacts. For example, VarPointsTo(v,h) states that pointer v might point to heap object h. The first three rules specify the conditions for this predicate to hold: either a new object is allocated and a pointer is initialized; a pointer that already points to an object is assigned to another pointer; or an object field points to a heap object, and that field is assigned to another pointer. The fourth rule states that assigning a value to an object field results in that field pointing to the same object as the right-hand side of the assignment.

Fig. 2b shows the facts corresponding to the program in Listing 2. The first two are object allocation facts; the third is an assignment fact; and the fourth and fifth are store and load facts, respectively. Fig. 2c shows the results of running the Datalog inference algorithm on those rules and facts. The example in Fig. 2a is called a context-insensitive pointer analysis because it does not distinguish between different objects, call sites and types in a class hierarchy. More precise context-sensitive pointer analyses take different kinds of context into consideration. For example, a call-site-sensitive analysis considers method call sites. A type-sensitive analysis (similarly, an object-sensitive one) includes object allocation sites (types of objects allocated) as part of the context.

3 LIFTING DATALOG INFERENCE

In this section, we present our approach to lifting Datalog abstract syntax and the Datalog inference algorithm. We also formally state the correctness criteria for lifted Datalog inference, and outline a correctness proof of our lifted algorithm.
When analyzing a single software product, an initial set of facts is extracted from product artifacts, and analysis rules are applied to those facts, eventually adding newly inferred facts to the initial set. In the case of SPLs, a fact might be valid only in a subset of products, and not necessarily the entire product space. We have to associate a representation of that subset with each of the extracted facts. Similar to SPL annotation techniques, a Presence Condition (PC) is a succinct representation that can be used to annotate facts.

Facts annotated with PCs are called lifted facts, and are stored in a lifted Extensional Database, ÊDB. Given a feature expression ρ, we define ÊDB|ρ to be the set of facts from ÊDB which only exist in the product set defined by ρ:

    ÊDB|ρ = { f | (f, pc) ∈ ÊDB ∧ sat(pc ∧ ρ) }

When the Datalog inference algorithm is applied to annotated facts, we have to take the PCs attached to facts into account. Whenever the inference algorithm generates a new fact, we need to associate a PC with it. If f_new is generated from premises f1, f2, ..., fn, with PCs pc1, ..., pcn, then pc_new attached to f_new should be the conjunction of the input PCs, i.e., pc1 ∧ ... ∧ pcn. Intuitively, pc_new represents the set of products in which f_new exists, which is the intersection of the sets of products in which the premises exist.

To avoid having too many generated facts that are practically vacuous, we check pc_new for satisfiability. If it is not satisfiable, then its corresponding fact exists in the empty set of products, i.e., it is non-existent. Those facts can be safely removed from ÊDB, potentially improving the performance of inference.
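To make the restriction operator concrete, here is a small Python sketch (ours, not the paper's implementation). Presence conditions are represented semantically as the set of configurations that satisfy them, so sat becomes a non-emptiness check and conjunction becomes set intersection:

```python
from itertools import combinations

FEATURES = ["FA", "FB"]
# feature model is True: every subset of features is a valid configuration
CONFIGS = [frozenset(c) for r in range(len(FEATURES) + 1)
           for c in combinations(FEATURES, r)]

def lit(f, neg=False):
    # PC of a (possibly negated) feature literal: the configs satisfying it
    return frozenset(c for c in CONFIGS if (f in c) != neg)

TRUE = frozenset(CONFIGS)

def sat(pc):
    return len(pc) > 0

def conj(pc1, pc2):
    return pc1 & pc2

# a lifted EDB: fact -> presence condition (a fragment of Fig. 5)
lifted_edb = {
    "Assign(o3,o1)": lit("FA"),
    "Assign(o3,o2)": lit("FA", neg=True),
    "Load(r,o3,f)": TRUE,
}

def restrict(edb, rho):
    # EDB|rho = { f | (f, pc) in EDB  and  sat(pc /\ rho) }
    return {f for f, pc in edb.items() if sat(conj(pc, rho))}

# products where FA holds see Assign(o3,o1) but not Assign(o3,o2)
print(restrict(lifted_edb, lit("FA")))
```

The set-of-configurations encoding is only viable for small feature sets; the actual implementation described in Sec. 4 uses BDDs for exactly this reason.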
Algorithm 2 takes a set of Datalog rules (IDB) and a set of annotated facts (ÊDB) as input, and returns all inferred clauses, annotated with their corresponding presence conditions.

S ::= { finite set of constant symbols }
V ::= { finite set of variables }
P ::= { finite set of predicate symbols }
T ::= S | V
L ::= P(T1, ..., Tn)
F ::= P(S1, ..., Sm)
R ::= L :- L1, ..., Lk
D ::= F | R

(a) Datalog grammar.

γ : V → S
[γ]C = C[v/γ(v)], for each free variable v in C

(b) Variable assignment function and substitution for clause C.

PC ::= f | ¬PC | PC ∧ PC | PC ∨ PC
D̂ ::= (F, PC) | R

(c) Grammar for lifted Datalog clauses. Syntactic category f is the set of feature names.

Figure 1: (a) Grammar of Datalog clauses, (b) variable assignment function and substitution, and (c) lifted Datalog clauses.
VarPointsTo(v1, h1) :- New(v1, h1).
VarPointsTo(v1, h2) :- Assign(v1, v2), VarPointsTo(v2, h2).
VarPointsTo(v1, h2) :- Load(v1, v2, f), VarPointsTo(v2, h1), HeapPointsTo(h1, f, h2).
HeapPointsTo(h1, f, h2) :- Store(v1, f, v2), VarPointsTo(v1, h1), VarPointsTo(v2, h2).

(a) Pointer analysis rules.

New("o1", "A").          // line 5
New("o2", "B").          // line 6
Assign("o3", "o2").      // line 7
Store("o2", "f", "o1").  // line 8
Load("r", "o3", "f").    // line 9

(b) Facts extracted from Listing 2.

VarPointsTo("o1", "A").
VarPointsTo("o2", "B").
VarPointsTo("o3", "B").
HeapPointsTo("B", "f", "A").
VarPointsTo("r", "A").

(c) Results of applying the rules to the extracted facts.
Figure 2: (a) Context-insensitive pointer analysis rules (simplistic), (b) input facts, and (c) output facts for the program in Listing 2.

(MP) Given a rule C :- s1, ..., sn and an assignment γ : V → S with [γ]si = fi and fi ∈ EDB for all 1 ≤ i ≤ n, infer [γ]C.

(M̂P) Given a rule C :- s1, ..., sn and an assignment γ : V → S with [γ]si = fi and (fi, pci) ∈ ÊDB for all 1 ≤ i ≤ n, infer ([γ]C, pc1 ∧ ... ∧ pcn).

Figure 3: Modus ponens for (a) Datalog clauses and (b) lifted Datalog clause inference.

The structure of this algorithm is similar to that of Algorithm 1, with the exception of conjoining the presence conditions of the facts used in inference, and assigning the conjunction as the presence condition of the result. There are four cases for (c, pc_c) to consider: (1) if sat(pc_c) is False (pc_c is not satisfiable), then this result is ignored because it does not exist in any valid product; (2) if (c, pc_c) ∈ ÊDB, then this result is also ignored because it already exists for the same set of products; (3) if (c, pc_d) ∈ ÊDB, where pc_d ≠ pc_c, then (c, pc_d) is replaced with (c, pc_d ∨ pc_c) in ÊDB. This means we are expanding the already existing set of products in which c exists to also include the set denoted by pc_c; (4) if c does not exist at all in ÊDB, we add (c, pc_c) to it. For example, when the lifted inference algorithm is applied to the rules in Fig. 2a and the annotated facts in Fig. 5, the result is the following:

VarPointsTo("o1", "A") @True.
VarPointsTo("o2", "B") @True.
VarPointsTo("o3", "A") @FA.
VarPointsTo("o3", "B") @!FA.
HeapPointsTo("B", "f", "A") @FB.
HeapPointsTo("B", "f", "B") @!FB.
VarPointsTo("r", "A") @!FA ∧ FB.
VarPointsTo("r", "B") @!FA ∧ !FB.

When applying the lifted inference algorithm înfer to a set of rules IDB and a set of annotated facts ÊDB, we expect the result to be exactly the union of the results of applying infer to the facts from each product individually. Moreover, each clause in the result of înfer has to be properly annotated (i.e., its presence condition has to represent exactly the set of products having this clause in their un-lifted analysis results).

Data: IDB, ÊDB
Result: ÊDB + annotated inferred clauses
repeat
    fixpoint = True;
    foreach (C :- s1, ..., sn) ∈ IDB do
        foreach (γ, (f1, pc1), ..., (fn, pcn)), (fi, pci) ∈ ÊDB, [γ]si = fi do
            pc_c = pc1 ∧ ... ∧ pcn;
            if sat(pc_c) then
                if ([γ]C, pc_c) ∉ ÊDB then
                    fixpoint = False;
                    if ∃pc_d, ([γ]C, pc_d) ∈ ÊDB then
                        pc_c = pc_c ∨ pc_d;
                        ÊDB = ÊDB − {([γ]C, pc_d)}
                    end
                    ÊDB = ÊDB ∪ {([γ]C, pc_c)}
                end
            end
        end
    end
until fixpoint;
return ÊDB;

Algorithm 2: Lifted inference algorithm înfer.

Theorem 1.
Given an SPL L = (F, Φ, D, ϕ), a set of rules IDB, and a set of lifted facts ÊDB annotated with feature expressions over F:

    ∀(ρ ∈ Conf(L)), înfer(ÊDB)|ρ = infer(ÊDB|ρ)

Proof.

• C ∈ înfer(ÊDB)|ρ ⟹ C ∈ infer(ÊDB|ρ). By structural induction over the derivation tree of C:
Base Case: (C, pc) ∈ ÊDB, where sat(pc ∧ ρ). Then C ∈ ÊDB|ρ (by definition of the restriction operator). Since inputs are already included in the output of infer, C ∈ infer(ÊDB|ρ).
Induction Hypothesis: Given a rule R = C :- s1, ..., sn, and a variable assignment γ, ∀(1 ≤ i ≤ n): [γ]si ∈ înfer(ÊDB)|ρ ⟹ [γ]si ∈ infer(ÊDB|ρ).
Induction Step: C is derived by M̂P (Fig. 3) from rule R. Since all the premises of C are in infer(ÊDB|ρ) (induction hypothesis), then so is C (MP).

• C ∈ infer(ÊDB|ρ) ⟹ C ∈ înfer(ÊDB)|ρ. By structural induction over the derivation tree of C:
Base Case: Assume (C, pc) ∈ ÊDB, for some pc, where sat(pc ∧ ρ). Then (C, pc) ∈ înfer(ÊDB) (input included in the output of înfer). Since pc ∧ ρ is satisfiable, C ∈ înfer(ÊDB)|ρ (definition of restriction).
Induction Hypothesis: Given a rule R = C :- s1, ..., sn, and a variable assignment γ, ∀(1 ≤ i ≤ n): [γ]si ∈ infer(ÊDB|ρ) ⟹ [γ]si ∈ înfer(ÊDB)|ρ.
Induction Step: C is derived by MP (Fig. 3) from rule R. Since all the premises of C are in înfer(ÊDB)|ρ (induction hypothesis), then so is C (M̂P). □

4 IMPLEMENTATION

In this section, we explain how we lift the Doop pointer and taint analysis framework, together with its underlying Soufflé Datalog engine.

Figure 4: The Doop architecture.
To illustrate and evaluate the Datalog lifting approach outlined in Sec. 3, we modified the Doop [6] Datalog-based pointer analysis framework, together with its underlying Soufflé [15] Datalog engine. Fig. 4 outlines the Doop architecture. Doop is an extensible family of pointer and taint analyses implemented as Datalog rules. In addition, it includes a fact extractor from Java bytecode. Doop users select a particular analysis among the available analyses through a command-line argument. The rules corresponding to the chosen analysis (the IDB), together with the extracted facts (the EDB), are then passed to Soufflé.

Since Doop extracts syntactic facts, we need to identify the PCs of each of the syntactic tokens contributing to a fact, and associate the conjunction of those PCs as the fact PC. We had to do this for each type of fact extracted by Doop. The fact PC is simply added to a fact as a trailing PC field, prefixed with '@'. Facts with no PC field are assumed to belong to all products (an implicit PC of True).

Our Doop modifications were only in the fact extractor. None of the Doop Datalog rules were changed. Our fact extraction modifications were scattered because extractors for different kinds of facts are implemented separately in Doop. However, all those changes were systematic and non-invasive. In total, we modified only about 100 lines of code in the Doop fact extractor.

Available online at https://bitbucket.org/rshahin/doop
Available online at https://github.com/ramyshahin/souffle

New("o1", "A") @True.           // line 5
New("o2", "B") @True.           // line 6
Assign("o3", "o1") @FA.         // line 8
Assign("o3", "o2") @!FA.        // line 10
Store("o2", "f", "o1") @FB.     // line 13
Store("o2", "f", "o2") @!FB.    // line 15
Load("r", "o3", "f") @True.     // line 17

Figure 5: Annotated facts extracted from Listing 1.

Figure 6: Soufflé architecture.
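The trailing '@' PC field described above can be handled by a small parsing helper. This is an illustrative sketch; the helper name and the exact whitespace handling are our assumptions, not Doop's actual code:

```python
def split_pc(fact_line):
    # split an extracted fact into (fact, pc); a fact without a trailing
    # '@' field implicitly belongs to all products (PC = "True")
    fact_line = fact_line.rstrip().rstrip(".")
    if "@" in fact_line:
        fact, pc = fact_line.rsplit("@", 1)
        return fact.strip(), pc.strip()
    return fact_line.strip(), "True"

print(split_pc('Assign("o3", "o1") @FA.'))  # ('Assign("o3", "o1")', 'FA')
print(split_pc('Load("r", "o3", "f").'))    # ('Load("r", "o3", "f")', 'True')
```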
As seen in Fig. 6, a Soufflé program is first parsed and translated into a Relational Algebra Machine (RAM) program. RAM is a language with relational algebra constructs, in addition to a fixed-point looping operator. Based on a command-line argument, Soufflé then either interprets the RAM program on the fly, or synthesizes C++ code that is semantically equivalent to the RAM program. Since C++ programs are compiled (typically by optimizing compilers) into native machine code, native executables are at least an order of magnitude faster than interpreted analyses [15]. In this paper, we only cover the Soufflé interpreter.

At the syntax level, we extend the Soufflé language with fact annotations. Those are propositional formulas prefixed with '@'. The Soufflé parser is extended with a syntactic category for propositional formulas. AST nodes for facts are extended with a PC field, with a default value of True. Propositional variables are added to a symbol table separate from that holding Soufflé identifiers.

As part of compiling Soufflé programs into RAM, we turn syntactic presence conditions into Binary Decision Diagrams (BDDs). We use CUDD [26] as a BDD engine, and on top of it maintain a map from textual presence conditions to their corresponding canonical BDDs. As stated in înfer, when facts are resolved with a rule, the conjunction of their PCs becomes the conclusion's PC.

Soufflé implements several indexing and query optimization techniques to improve inference time. To keep our changes independent of those optimizations, we add the presence condition as a field opaque to the query engine. We only manipulate this field as a PC when performing clause resolution, which takes place at a higher level than the details of indexing and query processing. This way we avoid touching relatively complex optimization code, while preserving the semantics of our lifted inference algorithm.

Some relational features of Soufflé were not lifted. For example, aggregation functions (sum, average, max, min, etc.) still return singleton values. None of those functions is used by Doop on lifted facts, so this does not affect the correctness of our results. We still plan to address this general limitation in the future.
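Putting the pieces together, the lifted resolution step — conjoin premise PCs, sat-check the result, and disjoin with an existing PC for the same fact — can be sketched end-to-end in Python (our sketch, not the Soufflé implementation) on the rules of Fig. 2a and the facts of Fig. 5. PCs are again represented as sets of satisfying configurations rather than BDDs, which turns ∧, ∨, and sat into set intersection, union, and non-emptiness:

```python
from itertools import combinations

FEATURES = ["FA", "FB"]
CONFIGS = [frozenset(c) for r in range(len(FEATURES) + 1)
           for c in combinations(FEATURES, r)]

def lit(f, neg=False):
    # semantic PC of a feature literal: the set of configs satisfying it
    return frozenset(c for c in CONFIGS if (f in c) != neg)

TRUE = frozenset(CONFIGS)
FA, NOT_FA = lit("FA"), lit("FA", neg=True)
FB, NOT_FB = lit("FB"), lit("FB", neg=True)

# lifted EDB of Fig. 5: (predicate, args) -> presence condition
EDB = {
    ("New", ("o1", "A")): TRUE,       ("New", ("o2", "B")): TRUE,
    ("Assign", ("o3", "o1")): FA,     ("Assign", ("o3", "o2")): NOT_FA,
    ("Store", ("o2", "f", "o1")): FB, ("Store", ("o2", "f", "o2")): NOT_FB,
    ("Load", ("r", "o3", "f")): TRUE,
}

# pointer-analysis rules of Fig. 2a; variables start with '?'
IDB = [
    (("VarPointsTo", ("?v", "?h")), [("New", ("?v", "?h"))]),
    (("VarPointsTo", ("?v1", "?h")),
     [("Assign", ("?v1", "?v2")), ("VarPointsTo", ("?v2", "?h"))]),
    (("VarPointsTo", ("?v1", "?h2")),
     [("Load", ("?v1", "?v2", "?f")), ("VarPointsTo", ("?v2", "?h1")),
      ("HeapPointsTo", ("?h1", "?f", "?h2"))]),
    (("HeapPointsTo", ("?h1", "?f", "?h2")),
     [("Store", ("?v1", "?f", "?v2")), ("VarPointsTo", ("?v1", "?h1")),
      ("VarPointsTo", ("?v2", "?h2"))]),
]

def match(pattern, fact, subst):
    if pattern[0] != fact[0] or len(pattern[1]) != len(fact[1]):
        return None
    s = dict(subst)
    for p, c in zip(pattern[1], fact[1]):
        if p.startswith("?"):
            if s.setdefault(p, c) != c:
                return None
        elif p != c:
            return None
    return s

def solve(body, edb, subst=None, pc=TRUE):
    # enumerate assignments, conjoining (intersecting) premise PCs and
    # pruning unsatisfiable (empty) conjunctions early
    subst = {} if subst is None else subst
    if not body:
        yield subst, pc
        return
    for fact, fpc in edb.items():
        s = match(body[0], fact, subst)
        if s is not None and (pc & fpc):
            yield from solve(body[1:], edb, s, pc & fpc)

def lifted_infer(idb, edb):
    edb = dict(edb)
    fixpoint = False
    while not fixpoint:
        fixpoint = True
        for head, body in idb:
            for s, pc in list(solve(body, edb)):
                fact = (head[0], tuple(s.get(a, a) for a in head[1]))
                merged = edb.get(fact, frozenset()) | pc  # disjoin PCs
                if merged != edb.get(fact):
                    edb[fact] = merged
                    fixpoint = False
    return edb

result = lifted_infer(IDB, EDB)
print(result[("VarPointsTo", ("o3", "A"))] == FA)            # True
print(result[("VarPointsTo", ("r", "A"))] == (NOT_FA & FB))  # True
```

Because PCs only ever grow by union and the ground-fact space is finite, the loop reaches a fixpoint, mirroring the termination argument for Algorithm 2.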
5 EVALUATION

Table 1: Java product lines used for evaluation.

Benchmark  | Size (KLOC) | Features | Valid Configurations
BerkeleyDB | 70          | 42       | 8,759,844,864
GPL        | 1.4         | 21       | 4,176
Lampiro    | 45          | 18       | 2,048
MM08       | 5.7         | 27       | 784
Prevayler  | 7.9         | 5        | 32

We evaluate the performance of our lifted version of Doop (together with lifted Soufflé) on five Java benchmark product lines (previously used in the evaluation of other lifted analyses [4, 5]). For each of the benchmarks, Table 1 lists its size (in thousands of lines of code), number of features, and number of valid configurations according to its feature model. For example, BerkeleyDB is about 70,000 lines of code, comprises 42 features, and has about 8.76 billion valid product configurations. We evaluate three Doop analyses: context-insensitive pointer analysis (insens), one-type heap-sensitive pointer analysis (1Type+Heap), and one-call-site heap-sensitive taint analysis (Taint-1Call+Heap). For taint analysis, we use the default sources, sinks, transform and sanitization functions curated in Doop for the JDK and Android [12]. All experiments were performed on a quad-core Intel Core i7-6700 processor running at 3.4GHz, with 16GB RAM and hyper-threading enabled, running 64-bit Ubuntu Linux (kernel version 4.15).

Pointer and taint analyses work on the whole program, including library dependencies. Since general-purpose libraries usually do not have any variability, the comparison between lifted and single-product analyses is independent of them. Moreover, time spent analyzing library code, and space taken by their facts, might skew the overall results. We restrict our experiments to application code and direct dependencies only, using the Doop command-line argument "--Xfacts-subset APP_N_DEPS".

Doop extracts its facts from Java byte-code. However, SPL annotation techniques work at the source-code level. Feature selection usually takes place at compile-time, which means an SPL code-base is compiled into a single product. To get around this limitation, we had to choose benchmarks that only have disciplined annotations [17], in the sense that adding or removing an annotation preserves the syntactic correctness of the 150% representation. This is not a limitation of our lifted inference algorithm though.

The benchmarks we chose are annotated using CIDE [17], which uses different highlighting colors as presence conditions. We had to extract this color information from CIDE, together with the mapping from colors to locations of tokens (line and column number) in source files.
Our fact extractor uses byte-code symbol information to locate tokens, and assigns their presence conditions based on CIDE colors.

The primary goal of our experiments is to compare the performance of lifted analyses applied to the SPLs to that of running the corresponding product-level analyses on each of the valid configurations individually. Since the number of valid product configurations for some benchmarks is relatively large, it is neither practical nor particularly useful to enumerate all of the valid products and analyze them. Instead, for each SPL, we run the product-level analysis on two code-base subsets: the base code common across all variants, and the 150% representation (the whole SPL code-base, implementing all feature behaviors). Although those two extremes are not necessarily valid products, they are the lower bound and the upper bound in terms of code size, and averaging over them gives an "average" valid product approximation. The expected brute-force performance is the average valid product performance (P-Avg) multiplied by the number of valid configurations.

We split our evaluation into two parts, fact extraction and inference, and evaluate performance in terms of both processing time and space (size of the fact database in kilobytes (KB)). Our primary research questions are:
RQ1: How do the fact extraction time (and size of the extracted fact database) of lifted analyses compare to brute-force fact extraction?
RQ2: How do the Soufflé inference time, and the size of the inferred database, of lifted analyses compare to brute-force analysis?
Table 2 summarizes the "average" performance of product-level factextraction (
P-Avg ) and that of the lifted fact extraction for the entireproduct line (
SPL). For each of the three analyses, we compare fact extraction time (in milliseconds) and the size of the extracted database (in KB). For example, for context-insensitive analysis, the average fact extraction time for a single product of Prevayler is 1,416ms, and the average size of the extracted fact database is 3,230KB. On the other hand, extracting facts from the whole Prevayler SPL at once takes 1,554ms, and the extracted fact database is 4,407KB. The difference between P-Avg Time and SPL Time is very small for all three analyses and five benchmarks, which is expected since extraction is syntactic and thus its time is proportional to code-base size, not to the number of features. The size of the extracted database is noticeably bigger for lifted extraction (DB SPL columns) because lifted facts are augmented with presence conditions.

Figure 7: Context-insensitive fact extraction speedup and DB savings factors: SPL vs. average product.

To evaluate the savings attributed to lifted fact extraction compared to brute-force extraction in terms of time and space, we compute the speedup and space-saving factors (P-Avg * |Conf(L)| / SPL). Fig. 7 shows a log-scale bar graph of lifted fact extraction speedup and space savings for context-insensitive analysis. The other two analyses exhibit a similar trend and are omitted here. The figure shows that the time and space savings are proportional to the number of valid configurations of the product line. For example, Lampiro has 2048 valid configurations, and its lifted fact extraction is 2020 times faster than brute-force, with a database 2045 times smaller than the total space of the brute-force databases. On the other hand, Prevayler has only 32 valid configurations, with an insens lifting speedup factor of 29 and a space-saving factor of 23. Since different analyses typically require different facts, the size of the fact database also varies from one analysis to another. The experimental results do not show a direct correlation between an analysis and the size of its fact database. For example, in Lampiro, the Taint-1Call+Heap databases are significantly bigger than those of . BerkeleyDB, on the other hand, exhibits the opposite trend.
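The speedup and space-saving computation above can be checked directly. The sketch below (Python, purely illustrative) recomputes the Prevayler insens fact-extraction factors from the numbers quoted in the text; the function name is our own.

```python
def saving_factor(per_product_avg, num_valid_configs, spl_total):
    """Brute-force cost is approximated as the per-product average times
    the number of valid configurations; the factor is that total divided
    by the cost of one lifted run (P-Avg * |Conf(L)| / SPL)."""
    return per_product_avg * num_valid_configs / spl_total

# Prevayler (insens, fact extraction): 32 valid configurations.
time_speedup = saving_factor(1416, 32, 1554)   # time: ~29x faster
space_saving = saving_factor(3230, 32, 4407)   # space: ~23x smaller
```

Both results round to the factors reported in the text (29 and 23), which is a useful sanity check on the quoted table values.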
Table 3 summarizes the performance of lifted analyses on the entire product line (SPL) and that of product-level analyses on an average product (P-Avg). For example, running on an average MM08 product, inference is estimated to take 4,596ms, resulting in a database of 8,788KB. Running the same analysis on the whole MM08 product line, though, takes 8,788ms, resulting in a 13,021KB database. Fig. 8 is a log-scale bar graph of the speedup factor and the DB space-saving factor for insens. Speedup and space-saving trends are again proportional to the number of valid configurations. For example, for BerkeleyDB, lifted insens is about 7.4 billion times faster than brute-force, with a DB 5.6 billion times smaller. All three analyses show similar speedup and disk-space-saving trends.
Table 2: Fact extraction time (in ms) and DB size (in KB): Average Product (P-Avg) vs. SPL for all three analyses.

             |        insens           |       1Type+Heap        |    Taint-1Call+Heap
             |  Time(ms)  |   DB(KB)   |  Time(ms)  |   DB(KB)   |  Time(ms)  |   DB(KB)
             | P-Avg  SPL | P-Avg  SPL | P-Avg  SPL | P-Avg  SPL | P-Avg  SPL | P-Avg  SPL
BerkeleyDB   |            |            |            |            |            |
GPL          |  816   814 |  175   409 |  782   876 |  245   593 |  789   802 |  188   462
Lampiro      |            |            |            |            |            |
MM08         |            |            |            |            |            |
Prevayler    | 1,416 1,554| 3,230 4,407|            |            |            |
Table 3: Inference time (in ms) and inferred DB size (in KB): Average Product (P-Avg) vs. SPL for the three analyses.

             |        insens           |       1Type+Heap        |    Taint-1Call+Heap
             |  Time(ms)  |   DB(KB)   |  Time(ms)  |   DB(KB)   |  Time(ms)  |   DB(KB)
             | P-Avg  SPL | P-Avg  SPL | P-Avg  SPL | P-Avg  SPL | P-Avg  SPL | P-Avg  SPL
BerkeleyDB   |            |            |            |            |            |
GPL          |            |            |            |            |            |
Lampiro      |            |            |            |            |            |
MM08         |            |            |            |            |            |
Prevayler    |            |            |            |            |            |
Figure 8: Context-insensitive inference speedup and DB savings factors: SPL vs. average product.
Recall that the theoretical bottleneck of the lifted inference algorithm (Algorithm 2) is the satisfiability checks performed when conjoining two PCs. Since propositional satisfiability is NP-complete, we wanted to evaluate whether it is a bottleneck in practice. While SAT checks are not required to maintain correctness of the lifted inference algorithm, we perform them in order to avoid generating spurious facts that do not exist in any product. An UNSAT presence condition denotes an empty set of products, but what about PCs denoting sets of invalid product configurations? The Feature Model (FM) of a product line specifies which product configurations are valid and which are not. If a fact belongs only to a set of configurations excluded by the FM, then this fact can be removed. Removing spurious facts saves DB space but, more importantly, keeps the set of facts searched by the inference algorithm as small as possible, improving overall performance. We study the impact of SAT checking and of using the FM below.
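The spurious-fact filtering described above can be illustrated with a minimal sketch of the lifted-join step: joining two facts conjoins their PCs, and tuples whose PC is UNSAT are dropped. The flat literal-set PC representation and the relation schema below are our own simplifications, not the paper's data structures (the implementation uses BDDs).

```python
# A PC is modelled as a frozenset of (feature, polarity) literals,
# i.e., a conjunction of feature literals -- a simplification of the
# BDD representation used in the actual implementation.

def conjoin(pc1, pc2):
    """Conjoin two PCs; return None when the result is UNSAT, i.e.,
    some feature is required both present and absent."""
    pc = pc1 | pc2
    features = {f for f, _ in pc}
    return None if len(features) < len(pc) else pc

def lifted_join(rel1, rel2):
    """Join lifted relations on rel1's value == rel2's key (hypothetical
    schema), discarding spurious tuples with UNSAT PCs."""
    out = []
    for (a, b, pc1) in rel1:
        for (b2, c, pc2) in rel2:
            if b == b2:
                pc = conjoin(pc1, pc2)
                if pc is not None:  # SAT check prunes spurious facts
                    out.append((a, c, pc))
    return out
```

Skipping the `pc is not None` test would not make the result wrong for any single product, mirroring the observation that SAT checks are optional for correctness; it would only let facts with empty product sets accumulate.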
Figure 9: Inference time: SPL vs. SPL with SAT checking disabled for all three analyses.

RQ2.1: How much does SAT checking contribute to the processing time of the lifted Datalog engine?

Table 4 summarizes the performance of our lifted analyses and of the same analyses with SAT checking disabled (noSAT). Fig. 9 and Fig. 10 show the noSAT-associated speedup and database size savings, respectively. Recall that we represent PCs using BDDs. SAT checking over BDDs is a constant-time operation [14]. Since conjoining and disjoining BDDs can take exponential time, we disable all BDD operations, keeping only the textual representation of PCs. A speedup factor below 1.0 means that disabling SAT checks slows down inference. This is what we observed for most of the benchmarks. We believe that the slowdown is due to the textual representation of PCs, which resulted in a much bigger PC table with slower lookup times. We also do not see any DB savings because non-canonically represented PCs tend to be longer than BDD-based ones, resulting, on average, in more characters (and bytes) per PC. We note that the number of features is relatively
Table 4: SAT vs. noSAT. Time in milliseconds, inferred DB in KB.

             |        insens           |       1Type+Heap        |    Taint-1Call+Heap
             |  Time(ms)  |   DB(KB)   |  Time(ms)  |   DB(KB)   |  Time(ms)  |   DB(KB)
             | SPL  noSAT | SPL  noSAT | SPL  noSAT | SPL  noSAT | SPL  noSAT | SPL  noSAT
BerkeleyDB   |            |            |            |            |            |
GPL          |            |            |            |            |            |
Lampiro      |            |            |            |            |            |
MM08         |            |            |            |            |            |
Prevayler    |            |            |            |            |            |
low in all of our benchmarks. BDD-based SAT solving is known to perform well on such a small number of propositional variables. With product lines of hundreds or thousands of features, it is possible that noSAT might result in performance improvements.

Figure 10: Inferred database size (KB): SPL vs. SPL with SAT checking disabled for all three analyses.

Figure 11: Inference time: SPL vs. SPL with FM for all three analyses.
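The PC-table blow-up under noSAT can be illustrated with a toy example: logically equal PCs stored as raw text hash to distinct strings and thus occupy separate table entries, while a canonical form (the role BDDs play in our implementation) collapses them. The representation below is purely illustrative.

```python
def canonical(literals):
    """A stand-in for canonicalization: sort the conjunct's literals.
    (BDDs provide canonicity for arbitrary propositional formulas.)"""
    return tuple(sorted(literals))

# The same logical PC, written in two syntactic orders:
same_pc_twice = [["FeatureA", "FeatureB"], ["FeatureB", "FeatureA"]]

# Textual storage keeps both spellings as distinct entries...
textual_table = {" && ".join(ls) for ls in same_pc_twice}
# ...while canonical storage dedupes them to one.
canonical_table = {canonical(ls) for ls in same_pc_twice}
```

With many facts sharing a few logical PCs, this duplication compounds: the textual table grows with the number of syntactic spellings encountered, which is consistent with the larger PC tables and slower lookups observed under noSAT.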
RQ2.2: What is the effect of taking the feature model (FM) of an SPL into consideration when running Datalog variability-aware analyses, in terms of inference time and DB size?

Figure 12: Inferred database size (KB): SPL vs. SPL with FM for all three analyses.

Table 5 compares the performance of our lifted analyses against the same analyses using the feature model (SAT+FM). SAT+FM entails conjoining the feature model with each PC before performing the satisfiability check. If the PC encodes a set of products excluded by the FM, the conjunction is unsatisfiable. Fig. 11 and Fig. 12 show the SAT+FM-associated speedup and space savings, respectively. For most of the experiments, using the FM results in slowdowns and larger DBs. FM usage reduces the number of inferred facts, as observed in Table 6, but the reduction is relatively small. On the other hand, PCs now conjoined with the FM are more complex, taking longer to construct (hence the performance penalty) and more bytes to store (hence the bigger DBs).
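The SAT+FM check can be sketched as follows: a fact survives only if its PC conjoined with the FM is satisfiable. The FM below (mutual exclusion of two features) and the enumeration-based SAT check are illustrative stand-ins for the BDD-based implementation.

```python
from itertools import product

# Illustrative feature set and FM: features A and B are mutually
# exclusive. A real FM is an arbitrary propositional constraint.
FEATURES = ["A", "B"]

def fm(config):
    return not (config["A"] and config["B"])

def pc_holds(pc, config):
    # PC modelled here as a set of positive feature literals.
    return all(config[f] for f in pc)

def sat_with_fm(pc):
    """Is there an FM-valid configuration in which pc holds?
    Enumerates all configurations; real engines use BDDs or SAT."""
    for values in product([False, True], repeat=len(FEATURES)):
        config = dict(zip(FEATURES, values))
        if fm(config) and pc_holds(pc, config):
            return True
    return False
```

A fact guarded by {A} survives the check, while one guarded by {A, B} encodes only FM-excluded configurations and is removed; this is the small reduction in inferred facts reported in Table 6, paid for by the cost of conjoining the FM into every PC.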
For internal threats, we note that all of our benchmarks are CIDE product lines. While our lifting approach and implementation are not specific to CIDE, CIDE limitations make the benchmarks biased towards specific annotation patterns. For example, only well-behaved annotations are allowed. Furthermore, since feature expressions do not support feature negation, all input PCs are satisfiable, as are conjunctions over those PCs. We experimented with disabling satisfiability checks to see how much they affect performance (although they always return true for this set of benchmarks). As noted previously, the overhead of those checks is marginal. Another internal threat is that we approximate average product performance using only two samples (the maximum and the minimum). These averages are not expected to be completely accurate, but they are used to give a brute-force estimate. Our experiments show performance improvements of several orders of magnitude, so we believe
Table 5: SPL vs. SPL+FM. Time in milliseconds, inferred DB in KB.

             |        insens             |       1Type+Heap          |    Taint-1Call+Heap
             |  Time(ms)   |   DB(KB)    |  Time(ms)   |   DB(KB)    |  Time(ms)   |   DB(KB)
             | SPL  SPL+FM | SPL  SPL+FM | SPL  SPL+FM | SPL  SPL+FM | SPL  SPL+FM | SPL  SPL+FM
BerkeleyDB   |             |             |             |             |             |
GPL          |             |             |             |             |             |
Lampiro      |             |             |             |             |             |
MM08         |             |             |             |             |             |
Prevayler    |             |             |             |             |             |
Table 6: The number of inferred facts with and without the Feature Model (FM).

             |    insens    |  1Type+Heap  | Taint-1Call+Heap
             | SPL  SPL+FM  | SPL  SPL+FM  | SPL  SPL+FM
BerkeleyDB   |              |              |
GPL          |              |              |
Lampiro      |              |              |
MM08         |              |              |
Prevayler

that our approximation (compared to more elaborate configuration sampling techniques) can be tolerated.
Finally, all of the analyses we used come from the Doop framework. Again, nothing in our lifted inference engine is Doop-specific, but extraction of annotated features is part of Doop. Other frameworks can extract fact annotations in a similar fashion.
RELATED WORK

Different kinds of software analyses have been re-implemented to support product lines [27]. For example, the TypeChef project [16, 18] implements variability-aware parsers [18] and type checkers [16] for Java and C. The SuperC project [11] is another C-language variability-aware parser. The Henshin [1] graph transformation engine was lifted to support product lines of graphs [24]. These lifted analyses were written from scratch, without reusing any components from their respective product-level analyses. Our approach, on the other hand, lifts an entire class of product-level analyses written as Datalog rules by lifting their inference engine (and extracting presence conditions together with facts).

SPLLift [4] extends IFDS [23] data flow analyses to product lines. Model checkers based on Featured Transition Systems [8] check temporal properties of transition systems whose transitions can be labeled with presence conditions. Both of these SPL analyses run almost the same single-product analyses on a lifted data representation. At a high level, our approach is similar in the sense that the logic of the original analysis is preserved and only the data is augmented with presence conditions. Still, our approach is unique because we do not touch any of the Datalog rules comprising the analysis logic itself.

Syntactic transformation techniques have been suggested for lifting abstract interpretation analyses to SPLs [22]. This line of work outlines a systematic approach to lifting abstract interpretation analyses, together with correctness proofs. Yet the approach is not automated, which means lifted analyses still need to be written from scratch, albeit guided by systematic guidelines.

Datalog engines have been used as backends by several program analysis frameworks. In addition to Doop, examples of analysis frameworks based on logic programming include XSB [10], bddbddb [28], and Paddle [19]. DIMPLE [2] is another declarative pointer analysis framework, whose rules are written in Prolog. To the best of our knowledge, all of those program analysis frameworks have targeted single products. Our primary contribution is lifting this class of analyses to SPLs in a generic way, without making any analysis-specific assumptions. In addition, our approach can be systematically implemented in any Datalog engine used by any of those frameworks.
CONCLUSION

In this paper, we presented an algorithm for lifting Datalog-based software analyses to SPLs. We implemented this algorithm in the Soufflé Datalog engine and evaluated the performance of three program analyses from the Doop framework on a suite of SPL benchmarks. Comparing our lifted implementation to brute-force analysis of each product individually, we showed significant savings in terms of processing time and database size.

Our Soufflé implementation lifts only the interpreter, not the code generator (compiler). Aggregation functions (e.g., sum, count) are not currently lifted either. We plan to address these implementation-level limitations in future work. We also plan to evaluate lifted Soufflé on analysis frameworks other than Doop. Another track for future work is lifting Datalog rules, not just facts; this would allow us to apply a product line of analyses to an SPL all at once. Our work can also be extended to lift Horn-clause-based analysis and verification tools [3] to support SPLs.
ACKNOWLEDGMENTS
We thank Azadeh Farzan for discussions related to this work, and the anonymous reviewers for their feedback on an earlier version of this paper. This work was supported by General Motors and NSERC.
REFERENCES

[1] Thorsten Arendt, Enrico Biermann, Stefan Jurack, Christian Krause, and Gabriele Taentzer. 2010. Henshin: Advanced Concepts and Tools for In-place EMF Model Transformations. In Proceedings of the 13th International Conference on Model Driven Engineering Languages and Systems: Part I (MODELS '10). Springer-Verlag, Berlin, Heidelberg, 121–135. http://dl.acm.org/citation.cfm?id=1926458.1926471
[2] William C. Benton and Charles N. Fischer. 2007. Interactive, Scalable, Declarative Program Analysis: From Prototype to Implementation. In Proceedings of the 9th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming (PPDP '07). ACM, New York, NY, USA, 13–24. https://doi.org/10.1145/1273920.1273923
[3] Nikolaj Bjørner, Arie Gurfinkel, Ken McMillan, and Andrey Rybalchenko. 2015. Horn Clause Solvers for Program Verification. Springer International Publishing, Cham, 24–51. https://doi.org/10.1007/978-3-319-23534-9_2
[4] Eric Bodden, Társis Tolêdo, Márcio Ribeiro, Claus Brabrand, Paulo Borba, and Mira Mezini. 2013. SPLLIFT: Statically Analyzing Software Product Lines in Minutes Instead of Years. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '13). ACM, New York, NY, USA, 355–364. https://doi.org/10.1145/2491956.2491976
[5] Claus Brabrand, Márcio Ribeiro, Társis Tolêdo, and Paulo Borba. 2012. Intraprocedural Dataflow Analysis for Software Product Lines. In Proceedings of the 11th Annual International Conference on Aspect-oriented Software Development (AOSD '12). ACM, New York, NY, USA, 13–24. https://doi.org/10.1145/2162049.2162052
[6] Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly Declarative Specification of Sophisticated Points-to Analyses. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '09). ACM, New York, NY, USA, 243–262. https://doi.org/10.1145/1640089.1640108
[7] S. Ceri, G. Gottlob, and L. Tanca. 1989. What You Always Wanted to Know About Datalog (and Never Dared to Ask). IEEE Transactions on Knowledge and Data Engineering 1, 1 (March 1989), 146–166. https://doi.org/10.1109/69.43410
[8] Andreas Classen, Maxime Cordy, Pierre-Yves Schobbens, Patrick Heymans, Axel Legay, and Jean-Francois Raskin. 2013. Featured Transition Systems: Foundations for Verifying Variability-Intensive Systems and Their Application to LTL Model Checking. IEEE Trans. Softw. Eng. 39, 8 (Aug. 2013), 1069–1089. https://doi.org/10.1109/TSE.2012.86
[9] Andreas Classen, Patrick Heymans, Pierre-Yves Schobbens, Axel Legay, and Jean-François Raskin. 2010. Model Checking Lots of Systems: Efficient Verification of Temporal Properties in Software Product Lines. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10). ACM, New York, NY, USA, 335–344. https://doi.org/10.1145/1806799.1806850
[10] Steven Dawson, C. R. Ramakrishnan, and David S. Warren. 1996. Practical Program Analysis Using General Purpose Logic Programming Systems — a Case Study. In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation (PLDI '96). ACM, New York, NY, USA, 117–126. https://doi.org/10.1145/231379.231399
[11] Paul Gazzillo and Robert Grimm. 2012. SuperC: Parsing All of C by Taming the Preprocessor. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 323–334. https://doi.org/10.1145/2254064.2254103
[12] Neville Grech and Yannis Smaragdakis. 2017. P/Taint: Unified Points-to and Taint Analysis. Proc. ACM Program. Lang. 1, OOPSLA, Article 102 (Oct. 2017), 28 pages. https://doi.org/10.1145/3133926
[13] Sergio Greco and Cristian Molinaro. 2016. Datalog and Logic Databases. Morgan & Claypool.
[14] Michael Huth and Mark Ryan. 2004. Logic in Computer Science (2nd ed.). Cambridge University Press.
[15] Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On Synthesis of Program Analyzers. In Computer Aided Verification, Swarat Chaudhuri and Azadeh Farzan (Eds.). Springer International Publishing, Cham, 422–430.
[16] Christian Kästner, Sven Apel, Thomas Thüm, and Gunter Saake. 2012. Type Checking Annotation-based Product Lines. ACM Trans. Softw. Eng. Methodol.
[17] In Objects, Components, Models and Patterns, Manuel Oriol and Bertrand Meyer (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 175–194.
[18] Christian Kästner, Paolo G. Giarrusso, Tillmann Rendel, Sebastian Erdweg, Klaus Ostermann, and Thorsten Berger. 2011. Variability-aware Parsing in the Presence of Lexical Macros and Conditional Compilation. In Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '11). ACM, New York, NY, USA, 805–824. https://doi.org/10.1145/2048066.2048128
[19] Ondřej Lhoták and Laurie Hendren. 2008. Evaluating the Benefits of Context-sensitive Points-to Analysis Using a BDD-based Implementation. ACM Trans. Softw. Eng. Methodol. 18, 1, Article 3 (Oct. 2008), 53 pages. https://doi.org/10.1145/1391984.1391987
[20] Jörg Liebig, Alexander von Rhein, Christian Kästner, Sven Apel, Jens Dörre, and Christian Lengauer. 2013. Scalable Analysis of Variable Software. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013). ACM, New York, NY, USA, 81–91. https://doi.org/10.1145/2491411.2491437
[21] Magnus Madsen, Ming-Ho Yee, and Ondřej Lhoták. 2016. From Datalog to Flix: A Declarative Language for Fixed Points on Lattices. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '16). ACM, New York, NY, USA, 194–208. https://doi.org/10.1145/2908080.2908096
[22] Jan Midtgaard, Aleksandar S. Dimovski, Claus Brabrand, and Andrzej Wąsowski. 2015. Systematic Derivation of Correct Variability-aware Program Analyses. Sci. Comput. Program.
[23] In Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '95). ACM, New York, NY, USA, 49–61. https://doi.org/10.1145/199448.199462
[24] Rick Salay, Michalis Famelis, Julia Rubin, Alessio Di Sandro, and Marsha Chechik. 2014. Lifting Model Transformations to Product Lines. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 117–128. https://doi.org/10.1145/2568225.2568267
[25] Yannis Smaragdakis and George Balatsouras. 2015. Pointer Analysis. Foundations and Trends in Programming Languages 2, 1 (2015), 1–69. https://doi.org/10.1561/2500000014
[26] Fabio Somenzi. 1998. CUDD: CU Decision Diagram Package Release 2.2.0. (06 1998).
[27] Thomas Thüm, Sven Apel, Christian Kästner, Ina Schaefer, and Gunter Saake. 2014. A Classification and Survey of Analysis Strategies for Software Product Lines. ACM Comput. Surv. 47, 1, Article 6 (June 2014), 45 pages. https://doi.org/10.1145/2580950
[28] John Whaley, Dzintars Avots, Michael Carbin, and Monica S. Lam. 2005. Using Datalog with Binary Decision Diagrams for Program Analysis. In