Compositional Verification of Heap-Manipulating Programs through Property-Guided Learning
arXiv preprint [cs.PL]
Long H. Pham⋆, Jun Sun, and Quang Loc Le

Singapore University of Technology and Design, Singapore
Singapore Management University, Singapore
School of Computing & Digital Technologies, Teesside University, UK
Abstract.
Analyzing and verifying heap-manipulating programs automatically is challenging. A key to fighting the complexity is to develop compositional methods. For instance, many existing verifiers for heap-manipulating programs require a user-provided specification for each function in the program in order to decompose the verification problem. This requirement, however, often hinders users from applying such tools. To overcome the issue, we propose to automatically learn heap-related program invariants in a property-guided way for each function call. The invariants are learned based on the memory graphs observed during test execution and improved through memory graph mutation. We implemented a prototype of our approach and integrated it with two existing program verifiers. The experimental results show that our approach effectively enhances existing verifiers in automatically verifying complex heap-manipulating programs with multiple function calls.
1 Introduction

Analyzing and verifying heap-manipulating programs (hereafter heap programs) automatically is challenging [45]. Given the complexity, the key is to develop compositional methods which allow us to decompose a complex problem into smaller, manageable ones. One successful example is the Infer static analyzer [1], which applies techniques like bi-abduction for local reasoning [36] to infer a specification for each function in a program to be analyzed.

While Infer generates function specifications for identifying certain classes of program errors, we aim to develop compositional methods for the more challenging task of verifying heap programs with data structures. In recent years, multiple tools have been developed to verify heap programs in a compositional way, including Dafny [31], GRASShopper [43,44] and HIP [10]. These tools are, however, far from being applicable to real-world complex programs. One reason is that substantial user effort is needed. In particular, besides providing a specification to verify against, users must provide auxiliary specifications to decompose the verification problem. For instance, Dafny, GRASShopper and HIP all require users to provide a specification for each function used in the program. Writing these function specifications is highly non-trivial. It is thus desirable to develop approaches for verifying heap programs in a compositional way which require minimum user effort.

⋆ Corresponding author. Email: [email protected]

In this work, we propose to automatically generate function specifications for compositional verification of heap programs. Our approach differs from existing approaches like Infer in three ways. Firstly, because our goal is to verify the correctness of heap programs with data structures, our approach generates more expressive function specifications than those generated by Infer. Secondly, we learn a specification of each function call (rather than each function) in a property-guided way.
For instance, assume that we have the following verification problem (expressed in the form of a Hoare triple):

{pre} func(); func(); {post}

where pre is a precondition, post is a postcondition and func(); func() are two consecutive calls of the same function. We automatically generate a program invariant inv after the first function call and before the second function call. As a result, we generate the specification {pre} func() {inv} for the first function call and the specification {inv} func() {post} for the second function call. The (smaller) problems of verifying these two Hoare triples thus replace the problem of verifying the original Hoare triple.

Thirdly, our invariant generation method is based on a novel technique, namely a combination of classification and memory-graph mutation. We start with generating multiple random test cases (based on existing methods [37]). We then instrument the program and execute the test cases to obtain the values of multiple features which are related to the memory graphs before and after each function call in the program. The obtained feature vectors are labeled according to the testing results (i.e., whether the postcondition is satisfied or not). Then we apply a classification algorithm [8] to find an invariant that separates the feature vectors with different labels. The invariant is an arbitrary boolean formula over the features, which is then used to decompose the verification problem.

There are two technical challenges which we must solve in order to make the above approach work. First, what features of the memory graphs shall we use? In this work, we adopt an expressive specification language for heap programs which combines separation logic, user-defined inductive predicates and arithmetic [10,23,27,45]. We then define a set of features based on the specification language. In addition, our approach allows users to define their own features.
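The decomposition of {pre} func(); func(); {post} described above can be sketched as follows. This is an illustrative Python sketch, not SLearner's implementation; the HoareTriple class and the decompose helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class HoareTriple:
    pre: str    # precondition (assertion-language formula, as text)
    prog: str   # program fragment, here a single function call
    post: str   # postcondition

def decompose(pre, calls, post, invariants):
    """Split {pre} c1; ...; cn {post} into one triple per call, using
    a learned invariant as the cut point between consecutive calls."""
    assert len(invariants) == len(calls) - 1
    cuts = [pre] + invariants + [post]
    return [HoareTriple(cuts[i], calls[i], cuts[i + 1])
            for i in range(len(calls))]

# The problem above: one learned invariant inv between the two calls
# replaces one hard proof obligation with two smaller ones,
# i.e. {pre} func() {inv} and {inv} func() {post}.
triples = decompose("pre", ["func()", "func()"], "post", ["inv"])
```

Each resulting triple involves a single call and can be dispatched to a verifier independently.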
Secondly, how do we solve the problem of the lack of labeled samples? That is, the test cases which we learn from may be limited. To overcome the problem, we mutate the memory graphs according to the learned invariant to validate whether the learned invariant is correct. We refine the invariant based on the validation result (if necessary) and repeat the process until the invariant is validated.

We implement our idea in a prototype, called SLearner, which takes a program to be verified as input, generates multiple invariants and outputs a set of decomposed verification tasks. We integrate SLearner with two existing state-of-the-art verifiers for heap programs, i.e., GRASShopper and HIP. Experiments are then performed on 110 programs manipulating 10 challenging data structures. The experimental results show that, enhanced with our approach, both GRASShopper and HIP are able to successfully verify programs with multiple function calls without user-provided function specifications.

The novelty of our work is in learning heap-related specifications in a property-guided way and applying graph mutation to improve the learning process. The rest of the paper is organized as follows. Section 2 presents an illustrative example. Section 3 presents the details of our approach. Section 4 evaluates our approach. Section 5 reviews related work. Finally, Section 6 concludes.

1  public void main(int m, int n) {
2    // precondition: m ≤ n
3    Node x = createSLL(m);
4    Node y = createSLL(n);
5    getSum(x, y);
6    // postcondition: sll(x, _) ∗ sll(y, _)
7  }
8  private Node createSLL(int n) {
9    if (n <= 0) return null;
10   else {
11     Node x = new Node(n, null);
12     x.next = createSLL(n - 1);
13     return x;
14   }
15 }
16 private int getSum(Node x, Node y) {
17   int sum = 0;
18   if (x != null) {
19     sum += x.data + y.data;
20     sum += getSum(x.next, y.next);
21   }
22   return sum;
23 }

Fig. 1: An illustrative example
2 Illustrative Example

In this section, we illustrate our approach with an example. The program is shown as function main in Fig. 1, where function createSLL(n) returns a singly-linked list with length n and function getSum(x, y) returns the sum of the data in two disjoint singly-linked list objects (pointed to by the two pointers x and y). Note that both functions are recursively defined. The precondition and postcondition are shown at lines 2 and 6 respectively. They are specified in an assertion language based on separation logic (refer to details in Section 3). The precondition is self-explanatory. The postcondition sll(x, _) ∗ sll(y, _) intuitively means that x and y are two disjoint singly-linked list objects, i.e., sll(x, n) is an inductive predicate denoting that x is a singly-linked list object with n nodes, and ∗ is the separating conjunction specifying disjointness in separation logic. Besides the postcondition, we assume that memory safety is always implicitly asserted and thus must be verified. For instance, we aim to verify that x.data at line 19 would not result in null-pointer de-referencing.

Our experiments show that state-of-the-art verifiers like GRASShopper and HIP cannot verify this program. Only after specifications for both functions createSLL and getSum are provided manually can the program be verified. On one hand, providing a specification for every function called by the given program is highly nontrivial. On the other hand, part of the function specification may be irrelevant to verifying the given program. For an extreme example, if we change the postcondition of the program shown in Fig. 1 to true, a complete specification for singly-linked lists would not be necessary to verify the program.

Our approach is to automatically learn a just-enough invariant before and after each function call so that we can verify the program in a compositional way.
For this example, we learn two invariants: inv1 right after the first function call at line 3 and inv2 right after the second function call at line 4. Next, we verify the program by verifying the following three Hoare triples: {m ≤ n} createSLL(m) {inv1[res/x]}; {inv1} createSLL(n) {inv2[res/y]}; and {inv2} getSum(x, y) {sll(x, _) ∗ sll(y, _)}, where res is a special variable for the return value of a function and inv1[res/x] is the substitution of every occurrence of variable x in inv1 by variable res. As the program in each Hoare triple involves only one function, existing verifiers like GRASShopper and HIP can automatically verify the Hoare triples.

To learn inv1 and inv2, we instrument the program to collect a set of features at the learning points and collect their values during test executions. For instance, Table 1 shows a few of the features and their values for the above program after line 4, for learning inv2.

Table 1: Collected feature vectors and labels

           is_sll(x)  is_sll(y)  is_sll(x) ∧ is_sll(y) ∧ sep(x, y)  len_sll(x) ≤ len_sll(y)  label
m=1, n=0   true       true       true                               false                    negative
m=0, n=1   true       true       true                               true                     positive

The first row shows the features, and the second and third rows show the values of the features given the two test cases {m=1, n=0} and {m=0, n=1} respectively. The features are designed based on our assertion language. In particular, feature len_sll(x) is a numeric value denoting the length of a singly-linked list x, which is extracted based on the user-defined predicate sll; feature is_sll(x) denotes whether x points to a singly-linked list; and feature sep(x, y) denotes whether x and y are disjoint in the heap.
We label each feature vector with either negative or positive, where negative means that a memory error is generated, the postcondition is violated, or the test case likely runs into an infinite loop (i.e., it does not stop after a certain number of time units); and positive means otherwise.

Next, we apply a classification algorithm [8] to generate a predicate which separates the positive and negative feature vectors. The predicate takes the form of an arbitrary boolean formula over the features. Given the feature vectors in Table 1, the generated predicate is: len_sll(x) ≤ len_sll(y). Although this predicate is an invariant after line 4, it is not strong enough to verify the postcondition. This is in general a problem due to having a limited number of test cases. To solve the problem, we systematically mutate the memory graphs obtained during the test executions to obtain more labeled feature vectors, with the aim of improving the predicate (see details in Section 3.5). In our example, with the additional feature vectors, the classification algorithm generates the following predicate for inv2:

(is_sll(x) ∧ is_sll(y) ∧ sep(x, y) ∧ x = null) ∨ (is_sll(x) ∧ is_sll(y) ∧ sep(x, y) ∧ len_sll(x) ≤ len_sll(y))

We similarly obtain x = null ∨ (is_sll(x) ∧ len_sll(x) ≤ n) for inv1 after line 3. Afterwards, inv1 and inv2 are translated into formulas in our assertion language. Note that the translation is straightforward since the features are designed based on the assertion language. The last step is to discharge the three verification problems. This is done using state-of-the-art verifiers for heap programs.
For instance, HIP solves the three verification problems automatically, which verifies the program.

For efficiency, in the verification step we perform the following two simplifications. First, for dead code detection, we invoke a separation logic solver (e.g., the one presented in [27,29]) to check the satisfiability of each inferred invariant. Secondly, we identify and eliminate the frame of a Hoare triple before sending it to the verifiers. For example, for the Hoare triple {inv1} createSLL(n) {inv2[res/y]}, we find that x is not accessed by the code, so the occurrences of the singly-linked list x in both the precondition and postcondition of the triple are eliminated before sending it to the verifiers.

3 Our Approach

3.1 Problem Definition

Our input is a Hoare triple {pre} prog {post}, where pre is a precondition, post is a postcondition and prog is a heap program which may invoke other functions. One example is the function main shown in Fig. 1. The precondition and postcondition are in an expressive specification language previously developed in [10,23,27,45]. The language supports separation logic, inductive predicates and Presburger arithmetic [19], and is shown to be expressive enough to capture many properties of heap programs.

The syntax of the language is presented in Fig. 2.

Φ ::= ∆ | Φ ∨ Φ
∆ ::= ∃v̄ · (κ ∧ π)
κ ::= emp | r ↦ c(t̄) | P(t̄) | κ ∗ κ
π ::= true | false | φ | π ∧ π
φ ::= true | i | v = null | φ ∧ φ
i ::= a = a | a ≤ a
a ::= k | v | k × a | a + a | −a

Fig. 2: Syntax, where c is a data type; k is an integer value; t_i, v, r are variables; and t̄ is a sequence of variables

In general, a predicate Φ in this language is a disjunction of multiple symbolic heaps. A symbolic heap ∆ is an existentially quantified conjunction of a heap formula κ (i.e., a predicate constraining the memory structure) and a pure formula π (i.e., a predicate constraining numeric variables).
A heap formula κ is an empty heap predicate emp, a points-to predicate r ↦ c(t̄) (where r is its root variable), a user-defined predicate P(t̄), or a spatial conjunction of two heap formulas κ ∗ κ. User-defined predicates are defined in the same language. A pure formula π can be true, an (in)equality on variables, a Presburger arithmetic formula, the negation of a formula, or their conjunction. We refer the readers to [19] for details on Presburger arithmetic. We note that v1 ≠ v2 (resp. v ≠ null) is used to denote ¬(v1 = v2) (resp. ¬(v = null)), and we may use _ to indicate "don't care" values.

For instance, the following predicate sll(x, n) defines a singly-linked list (with a root pointer x and size n), which is used in the illustrative example:

sll(x, n) ≡ (emp ∧ x = null ∧ n = 0) ∨ (∃q, n1 · x ↦ Node(_, q) ∗ sll(q, n1) ∧ n = n1 + 1)

Our problem is to automatically verify the Hoare triple. Different from existing approaches, we aim to do so in a compositional way without user-provided function specifications.

3.2 Test Generation and Code Instrumentation
Given {pre} prog {post}, we first automatically generate a test suite S using existing test case generation methods like [37]. Note that we do not require the test cases to satisfy the precondition, because negative feature vectors from invalid test cases will be filtered out by our learning process. Based on the testing results, we divide S into two disjoint sets. One set includes passed test cases that terminate normally without any memory error or violation of the postcondition, denoted as S+. The other set contains the remaining ones, denoted as S−. Note that we heuristically consider that a test case does not terminate after waiting for a threshold number of time units. Afterwards, we identify all function calls in prog and add learning points before and after each call. At each learning point l, we identify a set of relevant variables, denoted as V_l. We apply static program slicing to remove the variables which are visible at l but irrelevant to the postcondition or memory safety. In the example shown in Fig. 1, the sets of relevant variables at learning points 1 and 2 are {x, n} and {x, y} respectively. For each learning point, we instrument the program to extract a vector of features from each test.

3.3 Feature Extraction

Central to our approach is the answer to the question: what features shall we extract? In this work, we view a program state as a memory graph and systematically extract two groups of features based on the memory graph. One group contains generic features of the memory graph and the other contains features which are specific to the verification task. Formally, a memory graph G is a tuple (M, init, E, Ty, L) such that

– M is a set of heap nodes including a special node null;
– init ∈ M is a special initial node;
– E is a set of labeled and directed edges such that (s, n, s′) ∈ E means that we can access heap node s′ via a pointer named n from s. An edge starting from init is always labeled with one of the variables in the program;
– Ty is a total labeling function which labels each heap node in M with a type;
– and L is a labeling function which labels each heap node of primitive type with a value.

[Fig. 3: A memory graph. The node init has edges labeled x and y; x leads to null, while y leads to a Node whose data field holds a value and whose next field leads to null.]

Given a test case and a learning point, we represent the program state at the learning point during the test execution in the form of a memory graph (M, init, E, Ty, L). Fig. 3 shows the memory graph for our example at learning point 2 with test input m = 0 and n = 1. Note that any rooted path of a memory graph represents a variable, e.g., the path with the sequence of labels ⟨y, next⟩ in the above memory graph is the variable y.next at learning point 2. For complicated programs, the memory graph might contain many paths and thus many variables from which we can extract features. We therefore set a bound on the number of de-references to limit the number of variables. For example, if we set the bound to 2, we focus on variables {x, x.data, x.next, y, y.data, y.next} at learning point 2 and similarly on variables {x, x.data, x.next, n} at learning point 1. With the bound set to 1, we focus only on {x, y} at learning point 2 and {x, n} at learning point 1.

Table 2: The 26 features extracted at learning point 2 for the program in Fig. 1

1. x = null
2. y = null
3. x ↦ Node() (a.k.a. x ≠ null)
4. y ↦ Node() (a.k.a. y ≠ null)
5. x = y
6. x ≠ y
7. is_sll(x)
8. is_sll(y)
9. x ↦ Node() ∧ y ↦ Node() ∧ sep(x, y)
10. x ↦ Node() ∧ is_sll(y) ∧ sep(x, y)
11. is_sll(x) ∧ y ↦ Node() ∧ sep(x, y)
12. is_sll(x) ∧ is_sll(y) ∧ sep(x, y)
13. len_sll(x) > 0
14. len_sll(y) > 0
15. len_sll(x) < 0
16. len_sll(y) < 0
17. len_sll(x) = 0
18. len_sll(y) = 0
19. len_sll(x) + len_sll(y) > 0
20. len_sll(x) − len_sll(y) > 0
21. −len_sll(x) + len_sll(y) > 0
22. −len_sll(x) − len_sll(y) > 0
23. len_sll(x) + len_sll(y) = 0
24. len_sll(x) − len_sll(y) = 0
25. −len_sll(x) + len_sll(y) = 0
26. −len_sll(x) − len_sll(y) = 0

We extract two groups of boolean-typed features based on the memory graph.
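The memory-graph view and the bounded enumeration of rooted paths can be sketched as follows. This is an illustrative Python sketch, not SLearner's code; the Node class mirrors the one in Fig. 1, and rooted_paths is a hypothetical helper.

```python
# Sketch: enumerate the variables denoted by rooted paths of a memory
# graph, following at most (bound - 1) field de-references per root.

class Node:
    def __init__(self, data, next=None):
        self.data, self.next = data, next

def rooted_paths(roots, bound):
    """roots maps variable names to heap objects, i.e. the edges
    leaving the special node init."""
    paths = []
    def walk(name, obj, derefs):
        paths.append(name)
        if derefs + 1 >= bound or not isinstance(obj, Node):
            return   # null and primitive nodes have no outgoing edges
        walk(name + ".data", obj.data, derefs + 1)
        walk(name + ".next", obj.next, derefs + 1)
    for var, obj in roots.items():
        walk(var, obj, 0)
    return paths

# Fig. 3's graph: learning point 2 under test input m = 0, n = 1.
roots = {"x": None, "y": Node(1)}
print(rooted_paths(roots, 1))   # ['x', 'y']
```

With a larger bound the enumeration also yields y.data and y.next, matching the bounded variable sets discussed above.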
The first group contains generic heap-related features, which include the following.

– For each reference-type variable x, we extract two features which represent whether it is null or not, i.e., whether its corresponding path leads to the special node null.
– For each pair of reference-type variables, we extract two features which represent whether the two variables are aliases or not, i.e., whether their corresponding paths lead to the same non-null node.
– For each pair of reference-type variables, we extract a feature which represents whether the two variables are separated in memory. Assume that variables x and y lead to nodes n_x and n_y; x and y are separated, denoted as sep(x, y), if and only if all nodes reachable from n_x (including n_x, but excluding null) are not reachable from n_y, and vice versa.
– For each pair of numeric variables, we extract boolean features in difference logic and the octagon abstract domain [34], e.g., ±x ± y > c, ±x ± y = c, ±x > c or x = c, where c is a constant. We apply a heuristic to collect constants in conditional expressions of the given program as candidate values for c. The value 0 is chosen by default.

While generic heap-related features are often useful, some programs can only be proven with features which are specific to the verification problem. Thus, we extract a second group of features based on the user-defined predicates used to assert the correctness of the given program, which include the following.

– For every permutation of n variables, we extract a feature which represents whether the variables satisfy the predicate. For instance, given the user-defined predicate sll, which has one reference-typed parameter, we extract for each reference variable x a feature which represents whether x satisfies the predicate.
– For each pair of sequences of variables X and Y which satisfy some user-defined predicates, we extract a feature which represents whether the variables are separated in memory, i.e., all nodes reachable from any variable in X (except null) are not reachable from any variable in Y, and vice versa. This feature is inspired by the separating conjunction operator ∗ in our assertion language. For instance, given x and y which both satisfy sll, this feature's value is true if and only if all objects in the singly-linked list x and the singly-linked list y are disjoint in memory. Note that this feature subsumes the feature sep(x, y) explained above.
– For each numeric parameter of a user-defined predicate, we use a variable to represent its value for each sequence of variables which satisfies the predicate. For instance, as sll has a numeric parameter, if variable x satisfies sll, we use a fresh variable (denoted as len_sll for readability) to represent the value of the numeric parameter. Boolean features over these numeric variables, together with the existing numeric variables, are then extracted in the chosen abstract domains.

In general, user-defined predicates can be complicated. Existing heap program verifiers like GRASShopper and HIP maintain a library of commonly used predicates. We adopt the predicates in their library and define the corresponding functions to extract the above-mentioned features in the form of an extensible library for our approach. Note that this is a one-time effort. For instance, Table 2 shows the list of 26 features which we extract at learning point 2 for the program shown in Fig. 1.

3.4 Invariant Learning

In the following, we present our approach to learning an invariant based on the extracted feature vectors. Recall that we systematically instrument the program at every learning point and then extract a value for every feature discussed above.
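The predicate-based features above can be evaluated on a concrete heap as follows. This is an illustrative Python sketch over singly-linked Node objects, not the paper's extensible feature library; the helper names is_sll, len_sll and sep mirror the feature names in the text.

```python
# Sketch: evaluate is_sll, len_sll and sep on a concrete memory graph.

class Node:
    def __init__(self, data, next=None):
        self.data, self.next = data, next

def cells(x):
    """Heap nodes reachable along next-pointers from x (excluding
    null), or None if a cycle is found (then x is not an sll)."""
    seen = []
    while x is not None:
        if any(x is s for s in seen):
            return None
        seen.append(x)
        x = x.next
    return seen

def is_sll(x):
    return cells(x) is not None      # acyclic chain ending in null

def len_sll(x):
    return len(cells(x))             # the numeric parameter n of sll(x, n)

def sep(x, y):
    """No heap node reachable from x is reachable from y."""
    return all(a is not b for a in cells(x) for b in cells(y))

# Test input m = 0, n = 1: x is null and y is a one-node list, so the
# feature vector row is all-true, i.e. a positive sample as in Table 1.
x, y = None, Node(1)
row = [is_sll(x), is_sll(y),
       is_sll(x) and is_sll(y) and sep(x, y),
       len_sll(x) <= len_sll(y)]
print(row)   # [True, True, True, True]
```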
In our implementation, each feature is extracted using a function which returns a boolean value. Each test case is then executed so that we collect a vector of boolean values (a.k.a. a feature vector) which represents an abstraction of the memory graph according to the chosen features. If the test case finishes successfully, the feature vector is labeled positive; otherwise, it is labeled negative. The labeled feature vectors can be organized into a matrix M whose rows are feature vectors and whose columns are the values of one feature across all test cases. To ensure all feature vectors have the same dimension, if a feature does not apply (e.g., a variable is not accessible in the test case), we set the corresponding feature value to a special default value. For instance, Table 3 shows the matrix where the features are sequenced in the same order as in Table 2.

The first step in our learning process is normalising the matrix M. If there are two rows with the same feature values and the same label, one of them is redundant and is removed. Next, we apply the algorithm in [8] to learn a boolean combination of features that separates positive and negative vectors. Informally, the algorithm considers each feature vector as a point in space, and every positive point is connected to every negative point by a line. A feature 'cuts' a line if the corresponding positive point and negative point have different values for the feature. The goal is to find a list of features that cut all the lines, i.e., separate all positive and negative points. The features are chosen using a greedy algorithm: at each step, the feature which cuts the largest number of uncut lines is selected. After all lines are cut, the selected features partition the space into multiple regions, each of which contains either positive points only or negative points only.
Each region can be characterised by a conjunction of the features, and the disjunction of all the formulas characterising the positive regions is a boolean formula which separates all the positive and negative feature vectors.

[Table 3: Matrix of feature vectors, obtained from test cases and, later, from mutation.]
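The greedy cut-and-combine procedure described above (formalised as Algorithms 1 and 2 below) can be sketched in Python as follows. This is an illustrative sketch under assumptions: rows encode feature vectors as 0/1 entries, and the subset-removal simplification of Algorithm 2 (its line 9) is omitted.

```python
from itertools import combinations

def choose(matrix, labels):
    """Greedily pick features k that cut all pairs (i, j) with row i
    positive, row j negative, M[i][k] = 1 and M[j][k] = 0."""
    pos = [i for i, l in enumerate(labels) if l == "positive"]
    neg = [j for j, l in enumerate(labels) if l == "negative"]
    pairs = {(i, j) for i in pos for j in neg}
    chosen = []
    while pairs:
        cuts = lambda k: {(i, j) for (i, j) in pairs
                          if matrix[i][k] == 1 and matrix[j][k] == 0}
        best = max(range(len(matrix[0])), key=lambda k: len(cuts(k)))
        if not cuts(best):
            raise ValueError("features insufficient; a new feature is needed")
        pairs -= cuts(best)
        chosen.append(best)
    return chosen

def combine(matrix, labels, chosen):
    """Regions (conjunctions of chosen features) that cover only
    positive rows; their disjunction separates all rows."""
    pos = [i for i, l in enumerate(labels) if l == "positive"]
    neg = [j for j, l in enumerate(labels) if l == "negative"]
    regions, covered = [], set()
    for size in range(1, len(chosen) + 1):     # favor simple hypotheses
        for c in combinations(chosen, size):
            if all(any(matrix[n][k] == 0 for k in c) for n in neg):
                cp = {p for p in pos if all(matrix[p][k] == 1 for k in c)}
                if cp - covered:
                    regions.append(c)
                    covered |= cp
        if covered == set(pos):
            return regions
    return regions

# Toy data: feature 0 is "x = null", feature 1 is "y != null".
M = [[1, 0], [0, 1], [0, 0]]
labels = ["positive", "positive", "negative"]
ks = choose(M, labels)
print(ks, combine(M, labels, ks))   # [0, 1] [(0,), (1,)]
```

On the toy data the learned formula is the disjunction of the two single-feature regions, i.e. feature 0 or feature 1 holds.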
Algorithm 1: Choose the list of features, choose(M)

1: K := {}; L := {(i, j) | row i is positive and row j is negative};
2: while L is not empty do
3:   find k such that {(i, j) ∈ L | M_ik = 1 ∧ M_jk = 0} is the largest;
4:   if the number of pairs (i, j) that k can classify is 0 then
5:     stop and ask for user input for a new feature;
6:   else
7:     remove from L every (i, j) such that M_ik = 1 and M_jk = 0;
8:     add k to K;
9: return K;

The details are shown in Algorithms 1 and 2. In Algorithm 1, the input is a normalised matrix M and the output is the list of features K which can classify all positive and negative rows in M. K is initialised as an empty list, and a list L is initialised to contain all pairs of rows (i, j) such that i is the index of a positive row and j is that of a negative row (line 1). During each iteration, the feature k that 'cuts' the largest number of pairs in L is identified (line 3). Note that we do not consider the case M_ik = 0 ∧ M_jk = 1 because it would create negations of features, which may not be easily transformed into separation logic. We then remove from L the pairs that are classified correctly by k (line 7) and add the new feature k to K (line 8). The loop stops when L is empty (line 2) or when the best feature at the current iteration cannot classify any more pairs (line 4). In the former case, we return the list of features K (line 9). In the latter case, the features are not sufficient to distinguish all positive and negative rows.
We thus stop and may ask users to provide a new feature (line 5).

Algorithm 2: Combine the features, combine(M, K)

1: R := {}; PP := {p | row p is positive}; NP := {n | row n is negative};
2: mark all p ∈ PP as uncovered;
3: for i = 1 to |K| do
4:   create all combinations C with i elements from the list of features K;
5:   for each combination c ∈ C do
6:     if ∀n ∈ NP · ∃k ∈ c : M_nk = 0 then
7:       CP := {p | p ∈ PP and ∀k ∈ c : M_pk = 1};
8:       if CP contains at least one uncovered index then
9:         remove from R the combinations whose covered indexes are proper subsets of CP;
10:        add c to R; mark all p ∈ CP as covered;
11:   if all p ∈ PP are covered then
12:     return R;

Algorithm 2 then shows how a boolean formula that classifies all positive and negative rows in M is constructed from the chosen features. The input is a normalised matrix M and a list of features K chosen using Algorithm 1, and the output is a boolean combination of these features. Initially, the list of regions R is empty; PP and NP are the sets of indexes of positive and negative rows respectively (line 1). Recall that each row can be seen as a point in space. All points in
PP are marked as uncovered (line 2). Favoring simple hypotheses (a heuristic often applied in machine learning), we try combinations of from 1 feature up to |K| features, where |K| is the number of features in K (line 3). At line 4, all combinations of i features are created. For each combination (line 5), we check whether the corresponding region contains no negative points (line 6). If that is the case, we find the set of positive points that are covered by the region (line 7). If this region contains at least one uncovered point (line 8), we add the combination to R and mark the positive points in the region as covered (line 10). Line 9 simplifies the result by removing previously chosen regions whose covered points are a proper subset of those of the new region. When all positive points are covered, we return the set of combinations R (lines 11 and 12). Each combination is a conjunction of features, and the set of combinations is the disjunction of these conjunctions.

For our example, at learning point 2, after removing redundant rows, we have a matrix with 4 rows and 26 columns, i.e., the bolded rows in Table 3. Rows 1, 2 and 3 are positive, whereas row 4 is negative. To separate these rows, the two columns 1 and 4 are chosen. From them, we can form two regions: the first with only column 1 and the second with only column 4. These two columns represent the features x = null (column 1) and y ≠ null (column 4). As a result, we learn the predicate x = null ∨ y ≠ null. Note that this predicate is incorrect; it is to be improved later.

It can be shown that Algorithms 1 and 2 always terminate. The worst-case complexity of Algorithm 1 is O(Row ∗ Col), where Row and
Col are the numbers of rows and columns in the input matrix respectively. For Algorithm 2, the worst-case complexity is O(2^|K| ∗ (Row ∗ Col + Row)). While the worst-case complexity is high, these algorithms are often reasonably efficient in practice (as we show in our empirical study). The main reason is that the number of chosen features |K| (which dominates the overall complexity) is often small (1.05 on average in our experiments).

3.5 Improving Predicates through Mutation

Recall that we only need a correct predicate, i.e., one which is an invariant at the learning point and sufficient to prove the postcondition. A fundamental limitation of using classification techniques is that the learned predicate is likely incorrect if the feature vectors (i.e., test cases) are insufficient. One way to solve this problem is to use a program verifier to check whether the predicate is correct. If it is not correct, the verifier would generate a counterexample and the learning process could continue with a new feature vector obtained from the counterexample. This approach is not ideal for two reasons. One is that verifying heap programs is often costly and thus we would like to avoid it as long as possible. The other is that it is highly nontrivial to construct counterexamples when verifying heap programs [5].

Because of that, in this work, to improve the learned predicate, we apply an idea similar in spirit to [11] to automatically mutate the memory graphs obtained from the test cases and generate more program states. For each learned predicate Φ, we systematically apply a set of mutation operators based on Φ. For each variable x in Φ, if it is of a reference type, the following mutation operators are applied.

1. Point x to a freshly constructed object of the right type.
2. Point x to a heap node of the right type in the memory graph (including null).
3.
Swap x with another reference-type variable.

If x is of a primitive type, we follow the idea in [42] and mutate it by setting it to a constant, increasing/decreasing its value by a pre-defined offset, or swapping it with another primitive variable. The number of mutants we generate depends on the currently learned predicate.

These mutation operators are designed to create states which potentially invalidate the learned predicate. For instance, if the current predicate is is_sll(x) ∧ is_sll(y) ∧ sep(x, y), where x and y are two reference variables, applying the mutation operators allows us to obtain memory graphs which invalidate is_sll(x), is_sll(y) and/or sep(x, y). The expectation is that such a mutated program state would lead to a violation of the postcondition and thus be labeled negative. If our expectation is met, the predicate is now more likely to be correct; otherwise, the predicate is incorrect and is refined with the new feature vector.

In the extreme cases when all feature vectors are labeled positive or all are labeled negative, the learned predicate is true or false respectively. We then apply all mutation operators to all variables at the learning point. In our implementation, the mutation is done automatically by instrumenting statements which mutate the relevant variables at the learning point. We then run the test suite on the mutated program, and collect new feature vectors and new test results. These new feature vectors are added into the matrix to learn new predicates.

The mutation at a learning point in the middle of the program may result in program states which are not reachable. As a result, the final learned predicate, which is expected to be an invariant, may be weaker than the actual one (if the mutated program state is labeled positive). However, a weaker invariant may still serve our goal of verifying the program.
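The three reference-type mutation operators can be sketched as follows. This is an illustrative Python sketch, not the paper's implementation; the roots map and helper names are assumptions, and primitive-type mutation ([42]) is omitted.

```python
# Sketch: generate mutated program states for a reference variable
# occurring in the learned predicate.

class Node:
    def __init__(self, data, next=None):
        self.data, self.next = data, next

def reachable(obj):
    """Heap nodes reachable from obj, excluding null and primitives."""
    out, frontier = [], [obj]
    while frontier:
        o = frontier.pop()
        if isinstance(o, Node) and all(o is not s for s in out):
            out.append(o)
            frontier += [o.data, o.next]
    return out

def mutants(roots, var):
    """roots maps variable names to heap objects at the learning point."""
    states = []
    m = dict(roots)                  # 1. point var at a fresh object
    m[var] = Node(0)
    states.append(m)
    nodes = [None] + [n for r in roots.values() for n in reachable(r)]
    for n in nodes:                  # 2. retarget var, including to null
        m = dict(roots)
        m[var] = n
        states.append(m)
    for other in roots:              # 3. swap var with another variable
        if other != var:
            m = dict(roots)
            m[var], m[other] = m[other], m[var]
            states.append(m)
    return states

roots = {"x": Node(1), "y": None}
ms = mutants(roots, "x")   # fresh object, null, existing node, swap with y
```

Running the test suite from such mutated states yields the additional labeled feature vectors used to refine the predicate.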
To give an example, in the extreme case, if the postcondition is true (and there is no risk of memory error), it is sufficient to learn the invariant true. We repeat this process of mutation and learning until the learned invariant converges.

For our example, at learning point 2, after obtaining the first predicate x = null ∨ y = null, we apply mutation and obtain more feature vectors. The new feature vectors are shown in Table 3, where N is a special value denoting that the feature is not applicable. Next, applying Algorithm 1, the chosen features this time are x = null, is_sll(x) ∧ is_sll(y) ∧ sep(x, y), len_sll(x) < len_sll(y), and len_sll(x) = len_sll(y) (columns 1, 12, 21 and 24). From these 4 columns, we form 3 regions, which are transformed into the invariant inv we show in Section 2. Similarly, with the help of state mutation, we improve the learned invariant at l from x = null ∨ n > 0 to x = null ∨ (is_sll(x) ∧ len_sll(x) ≤ n).

The process of mutation and learning always terminates. As we only have a finite set of variables and features, the set of feature vectors is finite and thus the process of mutation converges eventually. Furthermore, matrix normalisation guarantees that we do not have redundant rows in the matrix; hence, the matrix is finite and the learning process always terminates.

Lastly, we show how we use the learned invariants to verify heap programs in a compositional way. Firstly, we transform each loop in the program into a fresh tail-recursive function. The loop is then replaced with a call to the corresponding function. Note that in the case of nested loops, we create multiple functions in which the function corresponding to the outer loop calls the function corresponding to the inner loop. This is a standard strategy adopted from existing program verifiers for heap programs [10].
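The alternation of mutation and learning can be viewed as a fixed-point loop. The sketch below is a toy stand-in under stated assumptions: boolean feature vectors, a trivial conjunctive learner in place of Algorithm 1, and a `mutate` function that enumerates candidate states; none of these names come from SLearner.

```python
def learn(samples):
    """Toy stand-in for Algorithm 1: the conjunction of all features that
    hold in every positively labeled feature vector."""
    pos = [v for v, label in samples if label]
    if not pos:
        return None  # no positive sample: predicate is false
    return frozenset(i for i in range(len(pos[0])) if all(v[i] for v in pos))

def mutate_and_learn(samples, mutate, run_test):
    """Repeat: mutate states to try to invalidate the current predicate,
    label the mutants by running the test suite, and relearn. Terminates
    because the set of distinct feature vectors is finite."""
    predicate = learn(samples)
    while True:
        seen = {v for v, _ in samples}
        new = [(v, run_test(v)) for v in mutate(predicate) if v not in seen]
        if not new:
            return predicate  # no new feature vectors: converged
        samples.extend(new)
        predicate = learn(samples)

# Two features; suppose the true invariant is feature 0 alone.
run_test = lambda v: v[0]  # a mutant passes the tests iff feature 0 holds
mutate = lambda pred: [(a, b) for a in (True, False) for b in (True, False)]
```

Starting from the single positive sample ((True, True), True), the loop first learns the over-fitted conjunction of both features and, after mutation supplies counterexamples, weakens it to feature 0 alone.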
We then treat loops in the same way as (recursive) function calls.

Secondly, we identify the learning points, i.e., before and after each function call statement, and learn invariants at these points. Note that we do not learn before/after recursive function calls. This is because program verifiers for heap programs like GRASShopper and HIP support inductive reasoning, and thus one specification for each recursive function is sufficient. Assume that the invariant learned before function call C_i is I_i and the one learned after C_i is I_{i+1}.

Thirdly, for each function call C_i, we generate a proof obligation in the form of a Hoare triple {I_i} C_i {I_{i+1}}, to prove that calling function C_i with I_i being satisfied results in a state satisfying I_{i+1}. Each proof obligation is submitted to a program verifier. Once the proof obligation is discharged, we replace the function call C_i with its now-established specification, i.e., the two statements assert I_i; assume I_{i+1}. That is, we instrument the learned invariants into the program such that the invariant learned before/after C_i becomes an assert/assume-statement respectively.

Finally, we use an existing program verifier to verify the transformed program. Note that the program now does not contain any function call (other than possibly a recursive call of itself). It is straightforward to see that the program satisfies the postcondition and is memory-safe under the precondition if all proof obligations are discharged and the transformed program is verified. If any part is not proved and a counterexample is constructed by the verifier, we use the counterexample to learn new invariants and then try to prove new Hoare triples.

Table 4: Results on GRASShopper (Gh). Columns: Data Structure, Functions, Gh, Gh+SLearner, Gh+SLearner-Mutation.
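To illustrate the transformation (in Python rather than the verifiers' input languages): a list-traversal loop is turned into a fresh tail-recursive function, and a verified call C_i is replaced by assert I_i; assume I_{i+1}. The invariants and the `assume` helper below are simplified, hypothetical stand-ins.

```python
class Node:
    """A minimal singly-linked-list node."""
    def __init__(self, nxt=None):
        self.next = nxt

def assume(cond):
    # at run time we can only check an assumption; in the verifier,
    # `assume I` instead adds I as a fact for the code that follows
    if not cond:
        raise AssertionError("assumption violated")

# Original loop:  while x is not None: x = x.next
# ... replaced by a call to a fresh tail-recursive function:
def loop_fn(x):
    if x is None:
        return x
    return loop_fn(x.next)

# A call C_i whose Hoare triple {I_i} C_i {I_{i+1}} has been discharged
# is replaced by its specification (here with a made-up invariant):
def transformed(x, n):
    assert x is None or n > 0     # assert I_i (call C_i removed)
    assume(x is None or n > 0)    # assume I_{i+1}
    return x
```

With this rewriting, the remaining verification task contains no function calls other than, possibly, the recursive self-call introduced for each loop.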
Our approach has been implemented as a prototype, called SLearner, with 3070 lines of Java code. In the following, we evaluate SLearner to answer multiple research questions (RQ). All experiments are conducted on a laptop with one 2.20GHz CPU and 16 GB RAM. To reduce the effect of randomness, we run each experiment 20 times with 10 random test cases each time.
RQ1: Can our approach enhance state-of-the-art verifiers for heap programs?
We integrate SLearner into two state-of-the-art verifiers for heap programs: GRASShopper and HIP. Although GRASShopper and HIP target the same class of programs, their approaches differ in multiple ways, e.g., they provide different libraries of user-defined predicates and they have different verification strategies. They thus allow us to check whether SLearner is general enough to support different program verifiers. We remark that alternative program verifiers like CPAChecker [6] and SeaHorn [20] target different classes of programs or program properties and hence are not applicable. The only other tool capable of verifying heap programs with heap-related specifications is jStar [12], which is, however, no longer maintained.

We conduct two sets of experiments based on these two verifiers. Our first experiment is with GRASShopper. Although GRASShopper supports inductive predicates for describing data structures, unlike HIP, it does not support reasoning about separation logic directly. The inductive predicates in GRASShopper are defined based on first-order logic with some built-in predefined predicates. Due to GRASShopper's limitation, we conduct an experiment based on a set of benchmark programs in its distribution. All programs and experimental results are available at [2] and the tool is available at [3].

The GRASShopper distribution contains many functions for different types of data structures. We focus on the non-trivial recursive functions with a precondition and postcondition. To check how GRASShopper performs with and without SLearner, we generate a set of composite programs which randomly invoke one or more of these functions. The function call sequence is formed such that the postcondition of a previous function is identical (via syntactic checking) to the precondition of the subsequent function. The precondition of the composite program is composed from the preconditions of the invoked functions, and the postcondition of the last function in the call sequence is the postcondition of the composite program. In total, we generate 65 composite programs containing 1, 2 and 3 function calls.

Table 4 shows the results, where the first four columns show the type of data structure, the involved functions, the number of function calls and the number of programs in the category. The next column shows the result of GRASShopper without the help of SLearner, i.e., the program is verified using GRASShopper without the specification of each invoked function in the program. We measure the number of verified programs (column [...] null, or traveling through the data structure). Lastly, a function supported by HIP for this data structure is called, which may modify the data structure. The postcondition of the program is the postcondition of the last function. The precondition is manually written and checked to guarantee that the program terminates and satisfies the postcondition without any memory error.

Table 5: Results on HIP. Columns: Data Structure, Program, Result, HIP, HIP+SLearner, HIP+SLearner-Mutation.

Table 5 shows the results, where column Program shows the last function called in the program. Column
HIP+SLearner shows the results using HIP enhanced with SLearner. Note that we may not be able to learn the same invariants every time due to randomness in generating the initial set of test cases. Thus, we add a column Succ to show how many times, out of 20, we are able to learn the invariant and verify the program. No additional user-defined predicates besides those defined in HIP are used in our experiments. Column HIP shows that without SLearner, none of these programs is verified. With SLearner, HIP successfully verifies 27 programs. In all but 1 case (highlighted in bold) we are able to learn the same invariant consistently.
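The construction of composite programs described above (chaining functions whose post- and preconditions match syntactically) can be sketched as follows. The function names and specification strings are invented for illustration; the matching is the syntactic check described in the text, and taking only the first precondition is a simplification.

```python
from itertools import product

# hypothetical per-function specifications: name -> (precondition, postcondition)
SPECS = {
    'insert':  ('sll(x)', 'sll(x)'),
    'reverse': ('sll(x)', 'sll(x)'),
    'to_dll':  ('sll(x)', 'dll(x)'),
}

def composite_programs(specs, max_calls=3):
    """Enumerate call sequences of length 1..max_calls in which each
    function's postcondition syntactically equals the precondition of the
    next one. The composite keeps the first precondition and the last
    postcondition."""
    programs = []
    for k in range(1, max_calls + 1):
        for seq in product(specs, repeat=k):
            if all(specs[a][1] == specs[b][0] for a, b in zip(seq, seq[1:])):
                programs.append((seq, specs[seq[0]][0], specs[seq[-1]][1]))
    return programs
```

For the three hypothetical functions above and sequences of up to two calls, this yields nine composite programs; `to_dll` can only appear last because no precondition matches its postcondition.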
RQ2: Which features are useful in verifying heap programs?
We learn invariants based on two groups of features, i.e., general heap-related features and those specific to user-defined predicates. The question is whether these two groups of features are useful and whether there are other features which we could learn from.

In total, SLearner learned 104 invariants (74 with GRASShopper and 30 with HIP) to help solve the verification tasks. Among them, 93 invariants (66 with GRASShopper and 27 with HIP) contain only features extracted based on the user-defined predicates (e.g., ds(x) or ds(x) ∗ ds(y) with ds being a user-defined predicate). The remaining 11 invariants additionally contain generic features (e.g., x = null or x ≠ null). None of the invariants is constituted of general heap-related features only. The results show that the user-defined predicates are important and that invariants specific to a verification problem are needed for proving the program. Generic heap-related features are also necessary sometimes (in 11% of the cases).

A total of 24 programs (6 with GRASShopper and 18 with HIP) are not verified. There are two main reasons why they cannot be proved even with the help of SLearner. Firstly, some programs can only be verified with complex function specifications which require features that are not supported in SLearner. For example, to prove the remaining 6 programs in the experiment with GRASShopper, we need a feature characterizing the paths in the tree, which cannot be derived from user-defined predicates. This is similarly the case for the experiments with HIP. One remedy is to extend our implementation with additional features through automatic lemma learning [28]. Secondly, there are programs that have a hierarchy of function calls, e.g., function calls within recursive functions. Some of the function calls occur under strict conditions which are never satisfied by the test cases, and thus we are unable to learn the specification of those function calls. This is a fundamental limitation of dynamic analysis approaches, which could be overcome with a comprehensive test suite from a systematic test case generation approach [39,40,41].

RQ3: Is memory graph mutation helpful?
We compare the performance of the enhanced GRASShopper and HIP with and without memory graph mutation. The results are shown in the last columns of Tables 4 and 5. It can be observed that without memory graph mutation, the number of programs verified by GRASShopper is reduced from 59 to 23, and the number of programs verified by HIP is reduced from 27 to 17. This clearly shows that memory graph mutation helps to improve the correctness of the learned invariants. Furthermore, we observe that without memory graph mutation, it is more likely that different invariants are learned in different runs of the same experiments (refer to column Succ). This is expected, as without memory graph mutation we cannot discard invariants which are the result of limited test cases.
RQ4: What is the overhead of invariant generation?
We measure the time taken to learn the invariants. Columns LTime in Tables 4 and 5 show the results. In general, the learning time depends on the number of learning points, the complexity of the program and the initial test suite. Overall, the time required for learning is reasonable, ranging from seconds to minutes. In the most time-consuming case, we spent 92 seconds to learn two invariants for the program "doubly-linked list append". For most of the cases, the learning time is about 20 seconds.
RQ5: Does our invariant generation approach complement existing ones?
The most notable invariant generation tool for heap programs is Infer [1]. However, Infer is not designed to support verification tasks. Instead, it generates generic specifications to capture the footprints of the pointers used in the functions based on bi-abduction. We apply Infer to generate specifications (e.g., pre/postconditions) for every function in the experiments above and notice that they are too weak for program verification.
Threats to Validity
Firstly, the set of programs used in our experiments is limited compared to real-world data-structure libraries. This is because state-of-the-art verifiers for heap programs are still limited to relatively simple heap programs due to the great difficulty in verifying heap programs. As our experiments show, SLearner successfully enhances the capability of state-of-the-art heap program verifiers so that programs with multiple functions can be automatically verified. Secondly, SLearner only works when we have the right features in the learning process. We expect that applying lemma synthesis could help us obtain more features and overcome this limitation.
The work closest to ours is an approach for invariant inference using dynamic analysis with separation logic abstraction [30]. Similar to our work, it generates invariants based on user-defined predicates (i.e., features in our work). In contrast to ours, it makes use of positive features only and does not support mutation. Also close to our work are proposals for automatic program verification using black-box techniques adopted from the machine learning community. In particular, the method presented in [47] is based on user-supplied templates. It is designed to learn specifications for heap programs which ensure no memory errors. The approach in [32] proposes to learn features from graph-structured inputs based on neural networks. The authors showed an application on verifying memory safety using the learning results. In contrast to [32], our goal is to learn invariants to compositionally verify the program against a given specification as well as to ensure no memory errors. In [25], the authors presented a method to learn shared module codes and reuse them during an analysis. The work in [16] builds polynomial-time active learning algorithms for automaton models of array and list structures. Our proposal also relies on a learning algorithm and actively improves the learned invariants. In [35], the authors proposed a learning method targeting lists only. This method learns the sequence of actions (remove or insert) from a program and infers the data structures manipulated by the program. However, it is hard to extend the method to support arbitrary heap-based programs. Similarly to ours, [7] guesses invariants from concrete program states and checks them with a theorem prover. However, their work only focuses on list-based programs. The ICE method proposed in [17,18] supports inductive properties in loop invariant learning. Besides using positive and negative points, ICE proposes additional implication points to encode the inductive checking for learning invariants.
It is our future work to integrate the idea of ICE learning with our graph-based learning. The work in [38] presents an approach for precondition inference. Its main contribution is feature learning for functional programs. It would be interesting to apply the feature learning techniques in our future work.

Our work is also related to automatic static analyzers for shape analysis problems, e.g., TVLA [46] and separation logic [9,10,13,22,26], and to the verification problem of programs that require both heap and data reasoning, e.g., PDR [24], interpolation [4] and template-based invariant generation [33]. To infer shape-based specifications, the tools [9,13,26] are based on the bi-abduction technique, while we use machine learning to obtain a generalized invariant from a set of concrete executions. In our implementation, we use GRASShopper and HIP as external verification engines. As our approach is independent of the program verifiers, we plan to build a general framework so that different verifiers can be used. Lastly, this work is related to previous works on invariant generation, e.g., Daikon [14] or Houdini [15]. However, those works do not focus on learning invariants related to data structures like this one.
We have presented a novel learning approach for the automated and compositional verification of heap-manipulating programs. The essence of our approach is an algorithm to infer invariants based on a set of memory graphs representing the program states obtained from concrete execution traces. We further enhance the precision of the learned invariants with memory graph mutation. We have implemented a prototype tool and evaluated it on a set of programs which manipulate complex data structures. The experimental results show that our tool enhances the capability of existing program verifiers to verify nontrivial heap-based programs. In the future, we plan to apply our tool to more verifiers and more test subjects, as well as to compare our tool with other tools, e.g., Predator [13], Forester [21,22], S2 [26], and SLING [30].
References
1. Facebook Infer. https://fbinfer.com
2. https://figshare.com/s/ba1c12ad90c138fbb240
3. https://github.com/sunjun-group/Ziyuan
4. Albarghouthi, A., Berdine, J., Cook, B., Kincaid, Z.: Spatial Interpolants. In: Vitek, J. (ed.) ESOP 2015, pp. 634–660 (2015). https://doi.org/10.1007/978-3-662-46669-8_26
5. Berdine, J., Cox, A., Ishtiaq, S., Wintersteiger, C.M.: Diagnosing Abstraction Failure for Separation Logic-Based Analyses. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012, pp. 155–173 (2012). https://doi.org/10.1007/978-3-642-31424-7_16
6. Beyer, D., Keremoglu, M.E.: CPAchecker: A Tool for Configurable Software Verification. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011, pp. 184–190 (2011). https://doi.org/10.1007/978-3-642-22110-1_16
7. Brockschmidt, M., Chen, Y., Kohli, P., Krishna, S., Tarlow, D.: Learning Shape Analysis. In: Ranzato, F. (ed.) SAS 2017, pp. 66–87 (2017). https://doi.org/10.1007/978-3-319-66706-5_4
8. Bshouty, N.H., Goldman, S.A., Mathias, H.D., Suri, S., Tamaki, H.: Noise-Tolerant Distribution-Free Learning of General Geometric Concepts. J. ACM (5), 863–890 (1998). https://doi.org/10.1145/290179.290184
9. Calcagno, C., Distefano, D., O'Hearn, P.W., Yang, H.: Compositional Shape Analysis by Means of Bi-Abduction. J. ACM (6), 26:1–26:66 (2011). https://doi.org/10.1145/2049697.2049700
10. Chin, W., David, C., Nguyen, H.H., Qin, S.: Automated Verification of Shape, Size and Bag Properties via User-Defined Predicates in Separation Logic. Sci. Comput. Program. (9), 1006–1036 (2012). https://doi.org/10.1016/j.scico.2010.07.004
11. Cleve, H., Zeller, A.: Locating Causes of Program Failures. In: Roman, G., Griswold, W.G., Nuseibeh, B. (eds.) ICSE 2005, pp. 342–351 (2005). https://doi.org/10.1145/1062455.1062522
12. Distefano, D., Parkinson, M.J.: jStar: Towards Practical Verification for Java. In: Harris, G.E. (ed.) OOPSLA 2008, pp. 213–226 (2008). https://doi.org/10.1145/1449764.1449782
13. Dudka, K., Peringer, P., Vojnar, T.: Predator: A Practical Tool for Checking Manipulation of Dynamic Data Structures Using Separation Logic. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011, pp. 372–378 (2011). https://doi.org/10.1007/978-3-642-22110-1_29
14. Ernst, M.D., Perkins, J.H., Guo, P.J., McCamant, S., Pacheco, C., Tschantz, M.S., Xiao, C.: The Daikon System for Dynamic Detection of Likely Invariants. Sci. Comput. Program. (1-3), 35–45 (2007). https://doi.org/10.1016/j.scico.2007.01.015
15. Flanagan, C., Leino, K.R.M.: Houdini, an Annotation Assistant for ESC/Java. In: Oliveira, J.N., Zave, P. (eds.) FME 2001, pp. 500–517 (2001). https://doi.org/10.1007/3-540-45251-6_29
16. Garg, P., Löding, C., Madhusudan, P., Neider, D.: Learning Universally Quantified Invariants of Linear Data Structures. In: Sharygina, N., Veith, H. (eds.) CAV 2013, pp. 813–829 (2013). https://doi.org/10.1007/978-3-642-39799-8_57
17. Garg, P., Löding, C., Madhusudan, P., Neider, D.: ICE: A Robust Framework for Learning Invariants. In: Biere, A., Bloem, R. (eds.) CAV 2014, pp. 69–87 (2014). https://doi.org/10.1007/978-3-319-08867-9_5
18. Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning Invariants using Decision Trees and Implication Counterexamples. In: Bodík, R., Majumdar, R. (eds.) POPL 2016, pp. 499–512 (2016). https://doi.org/10.1145/2837614.2837664
19. Ginsburg, S., Spanier, E.: Semigroups, Presburger Formulas, and Languages. Pacific Journal of Mathematics (2), 285–296 (1966)
20. Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn Verification Framework. In: Kroening, D., Pasareanu, C.S. (eds.) CAV 2015, pp. 343–361 (2015). https://doi.org/10.1007/978-3-319-21690-4_20
21. Holík, L., Hruška, M., Lengál, O., Rogalewicz, A., Šimáček, J., Vojnar, T.: Forester: From Heap Shapes to Automata Predicates (Competition Contribution). In: Legay, A., Margaria, T. (eds.) TACAS 2017, pp. 365–369 (2017). https://doi.org/10.1007/978-3-662-54580-5_24
22. Holík, L., Lengál, O., Rogalewicz, A., Šimáček, J., Vojnar, T.: Fully Automated Shape Analysis Based on Forest Automata. In: Sharygina, N., Veith, H. (eds.) CAV 2013, pp. 740–755 (2013). https://doi.org/10.1007/978-3-642-39799-8_52
23. Ishtiaq, S.S., O'Hearn, P.W.: BI as an Assertion Language for Mutable Data Structures. In: Hankin, C., Schmidt, D. (eds.) POPL 2001, pp. 14–26 (2001)
24. Itzhaky, S., Bjørner, N., Reps, T.W., Sagiv, M., Thakur, A.V.: Property-Directed Shape Analysis. In: Biere, A., Bloem, R. (eds.) CAV 2014, pp. 35–51 (2014). https://doi.org/10.1007/978-3-319-08867-9_3
25. Kulkarni, S., Mangal, R., Zhang, X., Naik, M.: Accelerating Program Analyses by Cross-Program Training. In: Visser, E., Smaragdakis, Y. (eds.) OOPSLA 2016, pp. 359–377 (2016). https://doi.org/10.1145/2983990.2984023
26. Le, Q.L., Gherghina, C., Qin, S., Chin, W.: Shape Analysis via Second-Order Bi-Abduction. In: Biere, A., Bloem, R. (eds.) CAV 2014, pp. 52–68 (2014). https://doi.org/10.1007/978-3-319-08867-9_4
27. Le, Q.L., Sun, J., Chin, W.: Satisfiability Modulo Heap-Based Programs. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016, pp. 382–404 (2016). https://doi.org/10.1007/978-3-319-41528-4_21
28. Le, Q.L., Sun, J., Qin, S.: Frame Inference for Inductive Entailment Proofs in Separation Logic. In: Beyer, D., Huisman, M. (eds.) TACAS 2018, pp. 41–60 (2018). https://doi.org/10.1007/978-3-319-89960-2_3
29. Le, Q.L., Tatsuta, M., Sun, J., Chin, W.: A Decidable Fragment in Separation Logic with Inductive Predicates and Arithmetic. In: Majumdar, R., Kuncak, V. (eds.) CAV 2017, pp. 495–517 (2017). https://doi.org/10.1007/978-3-319-63390-9_26
30. Le, T.C., Zheng, G., Nguyen, T.: SLING: Using Dynamic Analysis to Infer Program Invariants in Separation Logic. In: McKinley, K.S., Fisher, K. (eds.) PLDI 2019, pp. 788–801 (2019). https://doi.org/10.1145/3314221.3314634
31. Leino, K.R.M.: Dafny: An Automatic Program Verifier for Functional Correctness. In: Clarke, E.M., Voronkov, A. (eds.) LPAR 2010, pp. 348–370 (2010). https://doi.org/10.1007/978-3-642-17511-4_20
32. Li, Y., Tarlow, D., Brockschmidt, M., Zemel, R.S.: Gated Graph Sequence Neural Networks. CoRR abs/1511.05493 (2015)
33. Malík, V., Hruška, M., Schrammel, P., Vojnar, T.: Template-Based Verification of Heap-Manipulating Programs. In: Bjørner, N., Gurfinkel, A. (eds.) FMCAD 2018, pp. 1–9 (2018). https://doi.org/10.23919/FMCAD.2018.8603009
34. Miné, A.: The Octagon Abstract Domain. Higher-Order and Symbolic Computation (1), 31–100 (2006). https://doi.org/10.1007/s10990-006-8609-1
35. Mühlberg, J.T., White, D.H., Dodds, M., Lüttgen, G., Piessens, F.: Learning Assertions to Verify Linked-List Programs. In: Calinescu, R., Rumpe, B. (eds.) SEFM 2015, pp. 37–52 (2015). https://doi.org/10.1007/978-3-319-22969-0_3
36. O'Hearn, P.W., Reynolds, J.C., Yang, H.: Local Reasoning about Programs that Alter Data Structures. In: Fribourg, L. (ed.) CSL 2001, pp. 1–19 (2001). https://doi.org/10.1007/3-540-44802-0_1
37. Pacheco, C., Lahiri, S.K., Ernst, M.D., Ball, T.: Feedback-Directed Random Test Generation. In: ICSE 2007, pp. 75–84 (2007). https://doi.org/10.1109/ICSE.2007.37
38. Padhi, S., Sharma, R., Millstein, T.D.: Data-driven Precondition Inference with Learned Features. In: Krintz, C., Berger, E. (eds.) PLDI 2016, pp. 42–56 (2016). https://doi.org/10.1145/2908080.2908099
39. Pham, L.H., Le, Q.L., Phan, Q.S., Sun, J.: Concolic Testing Heap-Manipulating Programs. In: FM 2019. To appear
40. Pham, L.H., Le, Q.L., Phan, Q.S., Sun, J., Qin, S.: Enhancing Symbolic Execution of Heap-based Programs with Separation Logic for Test Input Generation. In: ATVA 2019. To appear
41. Pham, L.H., Le, Q.L., Phan, Q.S., Sun, J., Qin, S.: Testing Heap-based Programs with Java StarFinder. In: Chaudron, M., Crnkovic, I., Chechik, M., Harman, M. (eds.) ICSE 2018, pp. 268–269. ACM (2018). https://doi.org/10.1145/3183440.3194964
42. Pham, L.H., Thi, L.T., Sun, J.: Assertion Generation Through Active Learning. In: Duan, Z., Ong, L. (eds.) ICFEM 2017, pp. 174–191 (2017). https://doi.org/10.1007/978-3-319-68690-5_11
43. Piskac, R., Wies, T., Zufferey, D.: Automating Separation Logic with Trees and Data. In: Biere, A., Bloem, R. (eds.) CAV 2014, pp. 711–728 (2014). https://doi.org/10.1007/978-3-319-08867-9_47
44. Piskac, R., Wies, T., Zufferey, D.: GRASShopper - Complete Heap Verification with Mixed Specifications. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014, pp. 124–139 (2014). https://doi.org/10.1007/978-3-642-54862-8_9
45. Reynolds, J.C.: Separation Logic: A Logic for Shared Mutable Data Structures. In: LICS 2002, pp. 55–74 (2002). https://doi.org/10.1109/LICS.2002.1029817
46. Sagiv, S., Reps, T.W., Wilhelm, R.: Parametric Shape Analysis via 3-Valued Logic. In: Appel, A.W., Aiken, A. (eds.) POPL 1999, pp. 105–118 (1999). https://doi.org/10.1145/292540.292552
47. Zhu, H., Petri, G., Jagannathan, S.: Automatically Learning Shape Specifications. In: Krintz, C., Berger, E. (eds.) PLDI 2016, pp. 491–507 (2016). https://doi.org/10.1145/2908080.2908125