Spotting Silent Buffer Overflows in Execution Trace through Graph Neural Network Assisted Data Flow Analysis
Zhilong Wang, Li Yu, Suhang Wang and Peng Liu
College of Information Sciences and Technology, The Pennsylvania State University, [email protected], [email protected], [email protected], [email protected]
Abstract—A software vulnerability could be exploited without any visible symptoms. When no source code is available, although such silent program executions could cause very serious damage, the general problem of analyzing silent yet harmful executions is still an open problem. In this work, we propose a graph neural network (GNN) assisted data flow analysis method for spotting silent buffer overflows in execution traces. The new method combines a novel graph structure (denoted DFG+) beyond data-flow graphs, a tool to extract DFG+ from execution traces, and a modified Relational Graph Convolutional Network as the GNN model to be trained. The evaluation results show that a well-trained model can be used to analyze vulnerabilities in execution traces (of previously-unseen programs) without support of any source code. Our model achieves 94.39% accuracy on the test data, and successfully locates 29 out of 30 real-world silent buffer overflow vulnerabilities. Leveraging deep learning, the proposed method is, to our best knowledge, the first general-purpose analysis method for silent buffer overflows. It is also the first method to spot silent buffer overflows in global variables, stack variables, or heap variables without crossing the boundary of allocated chunks.
I. INTRODUCTION
A fundamental challenge in cybersecurity is that vulnerabilities widely exist in all kinds of programs [1], even though software engineers and security analysts have spent a great deal of effort to avoid and test for them. These vulnerabilities could be exploited by attackers and pose a huge threat to individuals, organizations, and governments [2]. Although researchers have proposed a variety of techniques to automatically discover and analyze software vulnerabilities [3], [4], [5], [6], almost all existing techniques rely on visible "symptoms" (e.g., crashes, failing assertions, and errors found by integrity checkers). Most vulnerability discovery methods use such symptoms to distinguish (potentially) harmful program executions, in which a vulnerability is triggered, from benign executions [7], [8], [9].

However, a software vulnerability could be exploited without any visible symptoms, and the corresponding program execution is often called a "silent" yet harmful execution. For example, some silent buffer overflows, silent use-after-free bugs, and silent information leaks could happen given specific malicious inputs. All these silent yet harmful program executions, though not as frequently seen as executions carrying visible symptoms, could still be leveraged by attackers to compromise the system [10] and cause serious damage (e.g., altering program variables, leaking critical information).

When the source code is available, silent yet harmful executions can in principle be identified and analyzed. By leveraging semantic information obtained from source code, researchers have developed various tools to identify and analyze them [11], [12], [13]. For example, Konstantin et al. [11] developed AddressSanitizer to detect memory errors and diagnose root causes (of silent yet harmful executions) through source-code-level instrumentation.

However, in many cases the commercial software and legacy code targeted by attackers has no source code available (to organizations using the software), and it is widely recognized in the research community that when source code is not available, analyzing silent yet harmful executions is an extremely difficult problem. A fundamental challenge in solving this problem is the lack of high-level, semantically rich information about data structures in the executables [14].

Due to this fundamental challenge, the general problem of identifying and analyzing silent yet harmful executions is yet to be solved. In the literature, only a small portion of silent vulnerabilities can be identified. For example, there is a spectrum of silent buffer overflows, but only overflows across the boundary of allocated chunks in the heap can be detected by existing methods [15], [16]. These methods capture the length of a dynamically allocated buffer by hooking the heap allocation functions, then check the integrity of each buffer access by comparing the buffer length with the offset of the access. So far, no effective method has been proposed to analyze silent buffer overflows in global variables, stack variables, or heap variables without crossing the boundary of allocated chunks. As stated by Dinesh et al. [16], binary disassembly is insufficient to recover data section layouts and semantic information lost during compilation. Not surprisingly, silent vulnerability analysis without source code is still an open problem, and there lacks a general-purpose analysis method for even one main category of silent vulnerabilities.

In this work, we seek to develop a general-purpose analysis method for silent buffer overflows, one of the most important categories of silent vulnerabilities. The proposed method is based on a key observation:
Key Observation.
In silent yet harmful program executions, the data flows towards the variables corrupted by the silent buffer overflow, and the memory space layout of some of the corrupted variables, are inherently different from those of non-affected variables.
It is worth noting that human analysts have to examine enough data flows and memory layout patterns, which is usually very time consuming, before they could leverage this key observation and identify the exact differences between corrupted and non-affected variables. In light of this, we propose to leverage Graph Neural Networks [17] to significantly reduce manual effort. In fact, our method is close to 100% automatic.

Our insight is that critical information about the difference between corrupted and non-affected variables could be represented by a novel graph structure. A Graph Neural Network (GNN) could then learn essential features (from graphs extracted from execution traces) through representation learning. The learned representations could enable the GNN model to "analyze" a given execution trace by classifying the nodes in the graph extracted from the trace. Finally, the nodes classified as "vulnerable" may provide enough information for automatically locating the addresses of vulnerable instructions and vulnerable buffers, respectively.

Specifically, we design a novel graph data structure to hold important features obtained from program executions, including data flows, variables' spatial information, and some useful implicit information flows. During the model training phase, we utilize a dynamic analyzer based on Intel Pin [18] to build the newly designed graph automatically from execution traces (of various programs) and customize AddressSanitizer [11] to help assign labels to nodes in the graph. A node with the label "vulnerable" corresponds to a variable corrupted by a silent buffer overflow. Using the labeled graphs as training data, we design and train a Bi-directional Propagation Relational Graph Convolutional Network (BRGCN) to perform node classification. After the
BRGCN model is well trained and deployed, the model can be used to analyze vulnerabilities in execution traces (of previously-unseen programs) without support of any source code. The experiments show that our model achieves 94.39% accuracy on the balanced test data, and successfully locates 29 out of 30 real-world vulnerabilities which we obtained from a public vulnerability database [19], [20]. The evaluation results show that graph neural network assisted data flow analysis is an effective general-purpose method for spotting silent buffer overflows when source code is not available.

In summary, we made the following contributions:
• To the best of our knowledge, this work proposes the first graph neural network assisted data flow analysis method for spotting silent buffer overflows in execution traces. It can analyze a full spectrum of silent buffer overflows.
• We designed a new type of graph data structure, DFG+, to represent programs' data flow, variables' spatial information, and implicit information flow in an integrated manner.
• We implemented a tool based on Intel Pin to automatically generate DFG+ from program executions and customized AddressSanitizer to help assign ground-truth labels for nodes in DFG+.
• We modified the Relational Graph Convolutional Network (RGCN) [21] by introducing bi-directional relation types to make it more effective in program analysis.
• We evaluated the effectiveness of the newly designed DFG+ and the newly designed BRGCN model, and compared them with other baseline methods.

In our view, the proposed method is neither a "competitor" nor an extension of existing fuzzing tools. Without source code, existing fuzzing tools, though very efficient, simply cannot identify silent buffer overflows in global variables, stack variables, or heap variables without crossing the boundary of allocated chunks. Hence, comparing the proposed method and fuzzing tools could result in "comparing apples and oranges."

II. BACKGROUND AND RELATED WORK
A. Buffer Overflow
For decades, buffer overflow (BOF) has remained one of the main security threats plaguing cyberspace, owing to the prevalence of buffer overflow bugs in commodity software and the fundamental difficulty of finding and fixing them. Conventionally, a BOF vulnerability refers to a category of software vulnerabilities which could corrupt adjacent memory regions due to insufficient bounds checking. The buffer associated with the vulnerability is called the vulnerable buffer. According to the location of the vulnerable buffer, BOFs can be grouped into heap, stack, and global BOFs. When a BOF happens, it can cause the program to crash by corrupting data/code pointers (e.g., a return address or a jump table), or change the program state by altering some non-control data [10]. In this paper, we classify wild BOFs into three categories according to their symptoms. A wild BOF means the input is not manually crafted by an analyst, e.g., in exploit generation.

1) Visible Buffer Overflow. We call a BOF visible if it shows visible symptoms, such as the program crashing, an assertion failing, or a garbled string being displayed on the screen.

2) Silent Buffer Overflow. Another kind of BOF happens without any visible symptoms. For instance, some BOFs that only corrupt some dead variables (please see the definition of dead variable in Section V-A), or only corrupt some local variables in the stack, will not cause a crash.

3) Innocent Buffer Overflow. A BOF has no effect on the program if it only overwrites a padding space, which is inserted by some compilers at the boundary of buffers due to data alignment specified by variable attributes [22] or language features [23].

In general, visible BOFs are believed to be easily discovered and analyzed when they happen, owing to their visible symptoms. If source code is available, silent BOFs can also be identified through a sanitizer [11] or bounds checking [12]. However, it is extremely challenging to analyze silent BOFs when source code is unavailable, because critical high-level information, such as the lengths of buffers and the types of variables, is lost in the binary during compilation. Our literature review shows that all the existing works [15], [16], [24], [25], [26] can only identify silent heap BOFs which overflow across the boundary of allocated memory chunks. The details of these approaches will be discussed shortly in Section II-D. Therefore, the general problem of silent BOF analysis is still an open problem.

It is worth noting that certain BOF vulnerabilities in executables could display either a visible symptom or no visible symptom in different executions, given different inputs. In this case, if the vulnerability is triggered by one or more visible executions, the resulting BOF would be a visible BOF, even though the vulnerability could also be exploited by some silent executions. Based on this fact, some works in program testing try to find vulnerabilities "shared by" visible BOFs and silent BOFs by varying input lengths [7], [9].

    int age, i, total = 0, ages[0x20];
    for(i = 0; i <= 0x20; i++){
        age = receive();
        if(age == -1)
            break;
        ages[i] = age; //overflow when i=0x20
        total += ages[i];
    }

Code 1: A piece of code with a silent buffer overflow.

However, this is only feasible when the vulnerability is caused by the excess length of inputs, and it obviously cannot solve the general problem of silent BOF analysis. Firstly, many BOF vulnerabilities only have silent BOF (execution) instances. An illustrative example is shown in Code 1. In the code, ages is an integer array of length 0x20 and i is the index used in the for loop to access ages. We can see there will be a BOF when i equals 0x20, but the program will not terminate at this point. Secondly, the length of a buffer access does not necessarily depend on the length of the input. It can depend on the length of a portion of the input, or on the value of one or several bytes in the input. Under these circumstances, it is almost impossible to know which segment of the input crashes the program.

B. Data Flow Graph
The data flow graph was first introduced in data flow machines to describe parallel computation [27]. A data flow graph is a directed graph G(N, E), where nodes in N represent instructions and edges in E represent data dependencies among the nodes. Data flow graphs are widely used in compiler optimization, such as register allocation, instruction scheduling, and dead code elimination [28]. Although there is no universally accepted definition, data flow analysis generally refers to the process of collecting and deriving information about the way the variables of a program are defined and used [29].

Data flow graphs and data flow analysis are also widely used in software security to analyze software defects, enforce security policies, and so on. Compared with a control flow graph, a data flow graph is more informative, as it contains semantic information about the program.

C. Graph Neural Networks
In recent years, deep neural networks have shown increasingly noticeable success in security domains, such as security-oriented program analysis [30], [31], [32] and anomaly detection [33], [34], due to their remarkable representation learning capabilities [35]. Some representative genres of deep neural networks are the convolutional neural network (CNN), the recurrent neural network (RNN), and the graph neural network (GNN). CNNs are developed to capture information from grid data, whereas RNNs are designed to capture sequential information. Given the nature of our proposed graph data structure DFG+, a GNN is more compelling because of its great ability in representation learning on graphs.

Convolutional graph neural networks (ConvGNNs), among other GNNs, adopt convolution operations on graphs to capture local and global structural patterns, through specially designed convolution and readout functions [36]. Standard convolutions on images or text embeddings are not applicable to graphs because graphs have irregular structures, so special convolutions have to be designed to work on graph data [17]. Depending on how the convolution is performed, existing ConvGNNs can be classified into two categories, i.e., spectral-based ConvGNNs and spatial-based ConvGNNs [37].
1) Spectral-based ConvGNN: Spectral-based convolution is defined based on spectral graph theory [38]. In this framework, a graph Laplacian is defined, and signals on graphs are filtered using the eigen-decomposition of the graph Laplacian. The graph convolution operators are introduced by defining the graph Fourier transform. However, despite the solid mathematical foundations, such approaches suffer from a large computational burden, spatial non-localization issues, and generalization problems. Considering the computational complexity of spectral-based ConvGNNs, we do not adopt them in our analysis of DFG+.

(Footnote: Some compilers may change the layout of variables in the stack. In order to keep the example simple, we do not consider variable reordering here.)
2) Spatial-based ConvGNN: The main idea of spatial-based ConvGNNs (message-passing GNNs) is to generate a node v's representation by aggregating its own features x_v and its neighbors' features x_u, where u is in the set of neighbors of v. Generally, a spatial ConvGNN layer can be defined as:

H^{(l+1)} = \sigma\Big( \sum_{s} C^{(s)} H^{(l)} W^{(l,s)} \Big),   (1)

where H^{(l)} \in R^{n \times d_l} is the latent representation of the n nodes in the l-th layer and d_l is the number of features. C^{(s)} is the s-th convolution kernel that defines how node features are propagated to neighboring nodes, W^{(l,s)} \in R^{d_l \times d_{l+1}} is the trainable weight matrix that maps the d_l-dimensional features into d_{l+1} dimensions, and \sigma is the activation function, such as ReLU or tanh. Equation 1 covers a broad class of ConvGNNs, and different designs of spatial ConvGNNs are distinguished by their convolution kernels C^{(s)} and the variability induced by W^{(l,s)} in Equation 1.

In each layer, the message passing algorithm (neighborhood aggregation) defined in Equation 1 aggregates features from a node's local neighborhood. Therefore, the node representation learned by a k-layer ConvGNN includes not only the features of the node itself, but also the features of its k-hop neighborhood and the local graph structure [36]. The node representation learning ability of ConvGNNs has been intensively investigated by recent research [39]. The great ability of ConvGNNs in modeling graph-structured data has facilitated various domains such as social network mining [40], [41], knowledge graphs [42], bioinformatics [43], and recently code similarity comparison [44]. Thus, it is promising to adopt ConvGNNs for representation learning on DFG+ to detect vulnerabilities.
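As a concrete illustration of Equation 1, the following numpy sketch implements one such layer with two toy kernels (a self-loop kernel and a mean-over-neighbors kernel). The graph, features, and weights here are made-up placeholders for illustration, not a trained model:

```python
import numpy as np

# One spatial ConvGNN layer per Equation 1:
#   H^{(l+1)} = sigma( sum_s C^(s) H^(l) W^(l,s) )
def conv_layer(H, kernels, weights):
    Z = sum(C @ H @ W for C, W in zip(kernels, weights))
    return np.maximum(Z, 0.0)          # sigma = ReLU

rng = np.random.default_rng(0)
n, d_in, d_out = 4, 3, 2               # 4 nodes, 3 input features, 2 output features

# A toy path graph 0-1-2-3 and two convolution kernels C^(s)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C_self = np.eye(n)                                      # keep each node's own features
C_nbr = A / np.maximum(A.sum(1, keepdims=True), 1.0)    # mean over neighbors

H0 = rng.normal(size=(n, d_in))                         # initial node features
Ws = [rng.normal(size=(d_in, d_out)) for _ in range(2)]  # one W^(l,s) per kernel

H1 = conv_layer(H0, [C_self, C_nbr], Ws)
print(H1.shape)    # (4, 2): d_l-dimensional features mapped to d_{l+1} dimensions
```

Stacking k such layers gives each node a representation that depends on its k-hop neighborhood, which is the property the section above describes.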
D. Related Works
In this section, we introduce works that spot BOFs at the source code and binary levels, and discuss their limitations.
Source Code based Schemes.
The source code based schemes usually adopt source code analysis to collect semantic information (e.g., buffer lengths) and enforce their detection rules through source code instrumentation. Among several existing works [11], [12], [13], we choose one representative scheme, AddressSanitizer (ASAN), to introduce. Briefly, ASAN detects spatial bugs by reserving a Redzone around heap, stack, and global objects, and detects temporal bugs by quarantining heap and stack objects to delay the reuse of dead objects. Technically, it leverages shadow memory to mark whether an address in the program space belongs to a Redzone or not, and checks the legality of target addresses before instructions access variables in memory. Despite its effectiveness in detecting all kinds of BOFs in global, stack, and heap memory, ASAN has two limitations: firstly, ASAN needs the program's source code, and thereby cannot detect bugs in legacy code and commercial software; secondly, it fails to detect non-linear buffer overflows (an access that jumps over a Redzone).

TABLE I: Comparison of the related works' effectiveness in detecting buffer overflow.

Defence Tools        | Requires Source Code | Silent Heap BOF | Silent Stack BOF | Silent Global BOF
BOIL                 | No                   | Partial         | No               | No
AddressSanitizer     | Yes                  | Yes             | Yes              | Yes
TaintCheck           | No                   | No              | No               | No
Memcheck             | No                   | Partial         | No               | No
Fuzzing              | No                   | No              | No               | No
Symbolic Execution   | No                   | No              | No               | No
The Proposed Method  | No                   | Yes             | Yes              | Yes
Static Approaches on Binary.
Rawat et al. [45] researched the detection of potential stack-based BOF vulnerabilities in binary code. Different from traditional works that usually define vulnerability patterns at the syntactic level (e.g., function names), they considered more features of vulnerabilities at the semantic level and defined buffer overflow inducing loops (BOIL) to summarize the semantic patterns of potentially vulnerable loops. Based on the proposed patterns, they developed a prototype to identify potentially vulnerable loops in executables. The advantage of their approach is that it does not need to execute the program and can achieve high code coverage. However, as pointed out in their paper, their scheme can only deal with a special case of BOF. We think this is due to the challenge of summarizing all patterns through human effort. In addition, the positive functions reported by this scheme can only be viewed as potentially vulnerable functions and need further verification, because no concrete input is available to verify the reported BOF.
Dynamic Approaches on Binary.
Dynamic approaches analyze BOF vulnerabilities by finding vulnerable executions. At the binary level, taint analysis (also known as data flow tracking) is a popular method for debugging vulnerabilities. TaintCheck [46], proposed by James Newsome and Dawn Song, locates BOFs based on one simple assumption: in a normal data flow, pre-defined taint sources, such as user inputs, environment variables, and network data, will not propagate to pointers. Therefore, their approach cannot detect silent BOFs which do not violate this assumption.

Memcheck [47] is a well-known vulnerability analysis tool implemented on top of the dynamic binary instrumentation framework Valgrind [48]. Technically speaking, it obtains the addresses and sizes of buffers in the heap by hooking function calls to heap allocation functions and parsing their parameters and return values. By comparing the offset of a buffer access with the length of the allocated buffer in the heap, it detects heap BOFs which write outside the allocated heap chunk. As mentioned in [11], Memcheck and some other tools (Dr. Memory [24], Purify [25], and Intel Parallel Inspector [26]) that adopt similar approaches are not capable of finding out-of-bounds bugs in the stack (other than beyond the top of the stack frame), in global memory, or in the heap if the overwrite does not cross the boundary of the allocated chunk.

Fuzzing [7], symbolic execution [49], gradient descent [9], and hybrid approaches [8] are widely adopted path exploration methods to automatically find vulnerable paths in software. These schemes try to generate inputs that achieve high coverage and find inputs that can trigger bugs. However, they select positive inputs based on whether they can crash the program. In such a case, silent BOFs are ignored. Recent research works [15], [16] that try to detect silent vulnerabilities during fuzzing suffer from the same limitations as the works discussed in the last paragraph.

Table I summarizes the limitations of the related works mentioned above.
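The chunk-granularity check that Memcheck-style tools perform (record each heap allocation, then compare accesses against chunk bounds) can be sketched in a few lines. This is a simplified illustration of the idea, not Memcheck's actual implementation:

```python
# Simplified sketch of chunk-granularity heap checking: remember each
# allocation's base and size, and flag any access outside every live chunk.
chunks = {}                       # base address -> size

def on_malloc(base, size):
    """Called when the hooked allocator returns a new chunk."""
    chunks[base] = size

def check_access(addr):
    """True if addr lies inside some allocated chunk (access considered legal)."""
    return any(base <= addr < base + size for base, size in chunks.items())

on_malloc(0x1000, 0x20)           # a 32-byte heap buffer
print(check_access(0x101f))       # True:  last byte of the chunk
print(check_access(0x1020))       # False: one byte past the end is detected
```

An overflow that stays inside an allocated chunk (e.g., overrunning one field into another within the same allocation) still passes this check, which is precisely the class of silent BOFs these tools miss.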
In conclusion, the general problem of silent BOF analysis is still an open problem, and it remains a fundamental challenge to cope with the full spectrum of silent BOFs at the binary level.

III. PROBLEM FORMULATION AND CHALLENGES
In this section, we formalize the silent BOF analysis problem and present the challenges in solving it.
A. The Problem and Research Goal
In the example shown in Code 1, an integer array ages with a fixed length is defined and allocated on the stack. The loop copies one excess integer to the array, and the nearest variable total at the adjacent stack address will be overflowed, without disturbing the normal execution of the program. We name an out-of-bound access (write/read) during execution an invalid operation or invalid access, define the instruction address of the invalid operation as the overflow point, name an execution with an invalid operation a BOF instance, and call a collection of runtime information an execution trace.

With the above notations, our research goal is to locate the overflow point of a silent BOF in an executable by analyzing its execution trace. To be more specific, we want to:
1) distinguish BOF instances from normal executions;
2) locate the invalid operations in the execution trace and the overflow point in the executable;
3) pinpoint each of them separately if there is more than one overflow point in one execution trace.

B. Why is This Problem Hard?
Due to the unavailability of type information in the binary, it is not possible to identify invalid operations by comparing the offset of a buffer read/write with the length of the target buffer. As shown in Code 2, which is the assembly code generated from Code 1, the instruction at line 1 allocates memory for the local variables:

    1  sub  $0x94,%esp
    2  movl $0x0,-0x10(%ebp)
    3  movl $0x0,-0x14(%ebp)
    4  jmp  target2
    5  target1:
    6  call

Code 2: Assembly code generated from Code 1.
The challenges faced by traditional methods motivate us to solve the problem with deep neural networks. Although the complexity of the patterns needed to identify silent BOFs is a daunting challenge for human analysts developing heuristic rules, it may not be a challenge for a deep learning algorithm given enough training data. Accordingly, we propose to spot silent BOFs based on graph neural network assisted data flow analysis. In this section, we first provide several insights based on our domain knowledge, which motivate our choice of technical approach; these insights will be verified through several experiments. Then, we provide an overview of our proposed approach and point out the challenges we must address.
A. Insights

1) The Essential Information to be Captured for BOF Analysis: Through dynamic binary instrumentation, lots of information can be collected along with a program's execution, such as the control flow, data flow, accessed memory, values of operands for each instruction, and the executed instruction sequence. However, not all of this information is useful for identifying silent BOFs. If unnecessary information gets included in the training data, it will introduce noise and reduce the accuracy of the model. Hence, two questions are raised, and the answers to them are associated with domain knowledge about buffer overflow and dynamic program analysis:

Q1. What information should be selected?

Q2. How should the data structure that holds the data be designed?

Firstly, as discussed in Section II-A, most silent BOFs do not violate a valid control flow (through corrupted code pointers), but they always violate a valid data flow. Therefore, the data flow is meaningful to integrate into our training data. A data flow graph (DFG) is the most popular way to represent a program's data flow. Secondly, the spatial information (variable layout) is useful for diagnosing BOF, because BOF is a spatial error [1]. We will discuss the challenge of representing the spatial information later on. Thirdly, some other (implicit) information flows, such as the flow from a data pointer to the variable it points to, and the flow from condition variables to branch targets, could be useful to infer whether variables are pointers or loop control variables. This information could be of great importance, as many BOFs happen due to unsafe pointer dereferences in a loop.

Fourthly, besides the information flow, the information itself, i.e., the values of variables, could also be useful. In fact, the values of certain variables like loop control variables could be very useful for analyzing BOFs. However, 1) there is no deterministic relationship between the value of a loop control variable and a BOF, and 2) the values of variables can be very noisy and hard to interpret at the binary level. Therefore, we decided not to include variable values in our data. Instead, we propose to incorporate some attributes of variables, such as whether a variable is an immediate or is copied from the user input.

Based on the above insights, we leverage a novel graph data structure to capture the essential information. Since the graph we build is based on the program's runtime data flow graph, together with some spatial information, we call it Data Flow Graph Plus (DFG+).
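To make the idea concrete, here is a hypothetical sketch of such a structure: nodes carrying simple attributes (e.g., whether a value is an immediate or comes from user input), plus typed edges for data flow, spatial adjacency, and implicit flow. The node names, attribute set, and edge-type names are illustrative assumptions on our part, not the exact DFG+ schema:

```python
from collections import namedtuple

# Illustrative (assumed) node attributes and edge types for a DFG+-like graph.
Node = namedtuple("Node", "nid is_immediate from_input")
Edge = namedtuple("Edge", "src dst etype")

EDGE_TYPES = {"data_flow", "adjacent", "pointer_deref", "branch_cond"}

class ToyDFGPlus:
    def __init__(self):
        self.nodes, self.edges = {}, []

    def add_node(self, nid, is_immediate=False, from_input=False):
        self.nodes[nid] = Node(nid, is_immediate, from_input)

    def add_edge(self, src, dst, etype):
        assert etype in EDGE_TYPES            # every edge carries a relation type
        self.edges.append(Edge(src, dst, etype))

g = ToyDFGPlus()
g.add_node("ages[i]")
g.add_node("total")
g.add_node("i")
g.add_edge("ages[i]", "total", "adjacent")    # spatial layout encoded as a relation
g.add_edge("i", "ages[i]", "data_flow")       # the index value flows into the access
print(len(g.nodes), len(g.edges))             # 3 2
```

The key design point reflected here is that spatial layout is stored as a relation between nodes rather than as a node feature, which matches the discussion of spatial information in Section V-A.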
2) Model Selection:

Q3. Why is this a node classification problem?

Given a graph structure with its nodes and edges, graph analysis tasks can be grouped into three categories [17]: graph classification, node classification, and link prediction. Graph classification aims to classify graphs into different types. When applied to our DFG+, the problem becomes classifying whether a program execution contains a silent BOF or not. Since the goal of our work is not only to identify a vulnerable execution, but also to locate the invalid operations inside the execution, we cannot follow the graph classification task. Link prediction is the problem of inferring missing relationships between entities (nodes), which also does not fit our need. Node classification, on the other hand, aims to classify nodes into different categories. If adopted, it could distinguish vulnerable nodes from benign nodes in the graph, and the vulnerable point in the execution trace can be located by mapping nodes from the graph to the program trace. Hence, the proposed research goals can be achieved by solving a node classification problem.
Q4. Why is a graph neural network a promising approach?

Firstly, a graph neural network can learn from both node features and graph structure, which is exactly how the DFG+ encodes data flow information. Secondly, deep learning has shown very promising results in some reverse engineering works, such as [30], [31]. In these works, it has shown superior performance compared with traditional methods, which indicates its great learning ability. Compared with other machine learning algorithms, deep neural networks have two advantages, as stated in [30]: "first, neural networks can learn directly from the original representation with minimal feature engineering" and "second, neural networks can learn end-to-end, where each of its constituent stages are trained simultaneously in order to best solve the end goal".
Q5. Why do we choose a relational graph neural network?

Given the DFG+ with multiple types of edges as the training data, it is natural to adopt a relational graph neural network. RGCN [21] was originally proposed to represent knowledge bases with entities and triples as directed labeled multi-graphs. The entities are treated as nodes and the triples of the form (subject, predicate, object) are encoded by labeled edges, which is similar to the data structure in DFG+. There are other models capable of modeling graph-structured data, but we think they are not suitable in our case. Graph recurrent neural networks (GRNN) [50] work on dynamic graphs where the graphs are evolving over time [51]. The GraphSAGE network [52] does not consider different types of edges in node classification. Heterogeneous graph neural networks [53] aggregate heterogeneous attributes or contents associated with nodes, which is overly complicated for DFG+, where nodes contain relatively easy-to-encode attributes. Finally, label propagation [54] relies on a nearest neighbor graph to generate pseudo-labels for nodes and is often used in semi-supervised learning [55].

Fig. 1: Approach overview (training phase: source code instrumentation and a runtime analyzer produce labeled DFG+ for BRGCN model training; testing phase: the runtime analyzer produces unlabeled DFG+ from executables, and the trained BRGCN outputs vulnerability information).
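The bi-directional relation idea can be illustrated with a minimal numpy sketch of an RGCN-style layer that, for each relation type, also propagates along the inverse edge direction with a separate weight matrix. The graph and weights below are random placeholders for illustration, not the trained BRGCN:

```python
import numpy as np

# RGCN-style layer with bi-directional relation types: each relation s gets a
# forward weight matrix (propagation along edges) and a backward weight matrix
# (propagation along the transposed adjacency, i.e., the inverse relation).
def bidirectional_relational_layer(H, adjacencies, weights):
    Z = np.zeros((H.shape[0], weights[0][0].shape[1]))
    for A, (W_fwd, W_bwd) in zip(adjacencies, weights):
        Z += A @ H @ W_fwd            # forward relation
        Z += A.T @ H @ W_bwd          # inverse relation (bi-directional pass)
    return np.maximum(Z, 0.0)         # ReLU activation

rng = np.random.default_rng(1)
n, d = 5, 4                           # 5 nodes, 4 features
# One toy relation type (e.g., data flow); DFG+ would contribute several.
A_dataflow = (rng.random((n, n)) < 0.3).astype(float)
H0 = rng.normal(size=(n, d))
Ws = [(rng.normal(size=(d, d)), rng.normal(size=(d, d)))]  # (forward, backward)

H1 = bidirectional_relational_layer(H0, [A_dataflow], Ws)
print(H1.shape)    # (5, 4)
```

Fitting this into Equation 1, each directed relation simply contributes two convolution kernels, the adjacency matrix and its transpose, each with its own trainable weights.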
B. Overview

Fig. 1 provides an overview of our proposed method, which consists of two major phases. In the training phase, we develop a runtime analyzer based on Intel Pin to trace program runtime information and organize it into a graph structure (i.e., DFG+). In the training samples, the locations of invalid operations in silent BOF executions are obtained through source code instrumentation. The invalid operations are reflected as vulnerable labels on nodes in the graph. We then train a BRGCN model on the labeled DFG+ data for testing in the following phase.

In the testing phase, the trained BRGCN model is used to predict silent BOFs in programs for which only the binary is available. The analyzer traces the program runtime information and constructs DFG+ without node labels. It also generates maps that map each node of DFG+ to instructions in the program and in the execution trace. After labels are predicted for each node in DFG+, the mapping can help us locate the invalid operation of the silent BOF.
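The final step of the testing phase can be sketched as follows: given predicted node labels and a node-to-instruction map, report the overflow points. The map format, names, and addresses here are illustrative assumptions, not the analyzer's actual output format:

```python
# Hypothetical sketch of mapping predicted "vulnerable" nodes back to the
# executable and the trace. node_map is an assumed format:
#   node_id -> (instruction address in executable, index in execution trace)
def locate_overflow_points(pred_labels, node_map):
    points = [node_map[n] for n, lab in pred_labels.items() if lab == "vulnerable"]
    return sorted(points, key=lambda p: p[1])   # order by position in the trace

labels = {"n1": "benign", "n2": "vulnerable", "n3": "benign"}
nmap = {"n1": (0x8048a10, 17), "n2": (0x8048a3c, 42), "n3": (0x8048a50, 99)}
print(locate_overflow_points(labels, nmap))     # the overflow point for node n2
```

Sorting by trace index lets multiple overflow points in one execution trace be reported separately, matching research goal 3) in Section III-A.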
C. Challenges

Challenge 1: How to apply a model trained on multiple programs/graphs to a previously-unseen program? Due to the specificity of each program, selecting more semantic information from program executions inevitably introduces knowledge tied to particular program logic into our dataset, which may not hold in other programs. Such knowledge, if learned by the model, hurts the model's generalization ability. As a result, some previous works [56], [57] on neural-network-assisted fuzzing can only train and test a model on the same program. We discuss how to cope with this challenge when presenting the design of DFG+ and the graph neural network in Section V-A and Section V-D, respectively.
Challenge 2: How to generate labels for DFG+? Generally, training a high-quality model requires a fair amount of training samples with ground truth. We do not want to manually label the nodes in DFG+, which would require substantial human effort. Hence, we need a tool that labels the data samples automatically. The details of how the data are labeled are presented in Section V-C1.
Challenge 3: How to represent spatial information in a deep-learning-friendly manner? Adding variable addresses to the training data would be the simplest way to include spatial information. However, it is hard for a deep learning model to learn the variable layout from raw variable addresses. We discuss how we represent spatial information so that the deep learning model can quickly capture it in Section V-A.
Challenge 4: How to cope with an extremely unbalanced dataset? Each DFG+ generated from a program execution has more than 200,000 nodes on average, but only a few of them are vulnerability nodes, which means the dataset is extremely unbalanced.

V. DESIGN AND IMPLEMENTATION
In this section, we first introduce the design of DFG+, the technical details of the compiler plugin and runtime analyzer, and how they work together to generate labeled DFG+. Then, we present the BRGCN and how it helps to spot silent BOFs.

A. DFG+

1) Spatial Information:
The trained model should capture general information and ignore program-specific information, so that a model trained on one set of DFG+ can be applied to another set. General information is knowledge shared among programs, such as the knowledge needed to determine whether two variables are adjacent to each other. Program-specific information is knowledge that only comes with a specific program, for example, the fact that a particular integer variable is located at a particular address.

To encode spatial information, there are two potential methods: integrate the address of each variable into the variable attributes in the data flow graph, or use relations to reflect the adjacency relationships of variables. We did not choose the first method due to two observations: 1) spatial information, such as the adjacency of two variables, consists of specific relations between variables, rather than entities or attributes, and therefore should not be encoded as node features; 2) the value of a variable's address is always tied to a concrete execution and will change if a program is compiled with different compilers or options, run in a different system environment (e.g., a different heap allocation), or even across executions (e.g., a different loading address due to ASLR [58]). Integrating addresses into the data flow graph would introduce program-specific information that is not helpful for the model. Therefore, we instead use relations (edges in a DFG+) to indicate whether two variables are adjacent to each other. In this way, we can represent spatial information in a deep-learning-friendly manner.
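The address-agnostic encoding can be sketched as follows. This is our own illustration, not the paper's tool: the variable names, sizes, and addresses are invented, and the point is only that the derived a-edge relation is identical across executions even when the raw addresses differ (e.g., under ASLR).

```python
# Sketch: derive a-edges from a snapshot of live-variable addresses, so the
# graph stores only the *relation* "adjacent", never concrete addresses.

def adjacency_edges(variables):
    """variables: list of (name, address, size); returns directed a-edges
    pointing from the lower-address variable to the higher one."""
    ordered = sorted(variables, key=lambda v: v[1])
    edges = []
    for (n1, a1, s1), (n2, a2, _) in zip(ordered, ordered[1:]):
        if a1 + s1 == a2:           # byte-adjacent in memory
            edges.append((n1, n2))  # edge direction encodes low -> high
    return edges

# Two executions of the same program: addresses shift under ASLR,
# but the extracted adjacency relation is identical.
run1 = [("var1", 0xFFFFD000, 4), ("buf1", 0xFFFFD004, 8), ("var2", 0xFFFFD00C, 4)]
run2 = [(n, a + 0x2000, s) for n, a, s in run1]  # different load address
assert adjacency_edges(run1) == adjacency_edges(run2) == [("var1", "buf1"), ("buf1", "var2")]
```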
2) Basic Design: Using the terminology of data flow analysis [59], a live variable is defined when an instruction writes a value to the variable, and a live variable is used when an instruction reads the value of the variable. A variable is live at a program point p if the current value of the variable will be used in the future. A live variable v is dead at a program point q if, after q, the value of v is redefined before it is used, or is never used again.

Nodes of Graph. A node in DFG+ represents a live variable. Therefore, multiple nodes will be created for a variable if the variable is defined and redefined along the program execution. Note that a "variable" in our context refers not only to variables defined in source code; it can be any instruction operand (e.g., the return address on the stack, a register, or an immediate value). According to the attributes of the variable that a node corresponds to, we group nodes into 4 types:
- Memory Node (m-node) denotes a live variable stored in memory.
- Register Node (r-node) denotes a live variable stored in a register.
- Immediate Node (i-node) denotes an immediate operand.
- External Node (e-node) denotes a variable defined by a system call. The e-node is a special type of node for variables associated with external data (e.g., user inputs, environment variables, and so on). Such input data usually contain dangerous variables, which could result in a BOF.
Edges of Graph. We define 5 classes of directed edges in DFG+ to reflect a program's direct or implicit information flow and its spatial information. Note that each "variable" in the following list corresponds to a node in the graph:
- Data Flow Edge (d-edge) denotes a direct information flow from a source variable to a target variable. A direct information flow exists if the value of the source variable is used to calculate the value of the target variable.
- Adjacency Edge (a-edge) denotes that two variables are adjacent to each other. The direction of an a-edge denotes the relative order (higher or lower) of the two variable addresses.
- Index Edge (i-edge) denotes an implicit information flow (implicit data flow). An information flow is implicit if a pointer or offset a is used to address a variable b to be read or written.
- Redefine Edge (r-edge) denotes that a live variable is covered by another live variable. The r-edge not only indicates that the two live variables are at the same address, but also implies the order of data flow for this variable.
- Comparison Edge (c-edge) denotes another kind of implicit information flow, which happens when a live variable is compared with another live variable. The values of the operands affect the value in the eflags register, which then affects the target of a conditional branch.

Fig. 2 shows a DFG+ generated from the execution of the piece of code in Code 2. Let us take the nodes and edges generated from sub $0x94, %esp as an example to demonstrate how the graph is generated. Three nodes represent the immediate $0x94, the source operand %esp, and the destination operand %esp, respectively. There is a d-edge and an r-edge between the two %esp nodes because the old value in %esp was used to calculate the new value of %esp, and the live variable in %esp is redefined. Besides, an a-edge denotes that the live variables in -0x10(%ebp) and -0x14(%ebp) are adjacent to each other. An i-edge denotes that -0x14(%ebp) is used to address the variable in -0x10(%ebp). A c-edge denotes that the comparison between the live variable in -0x10(%ebp) and the immediate $0x20 determines the value in %eflags.

Fig. 2: A DFG+ generated from the execution of a piece of code.

Labels of Nodes.
There are two types of labels for graph nodes: 1) the vulnerable label indicates that the node was generated from an invalid operation in a silent BOF; 2) the benign label indicates that it was generated from a normal operation.
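The per-instruction construction described above can be sketched in a few lines. This is our own minimal scaffolding, not the paper's Pin-based constructor; the node ids and the tiny graph class are invented for illustration, and only the d-edge/r-edge handling for sub $0x94, %esp is shown.

```python
# Illustrative sketch of how a single instruction grows the DFG+.

class DFGPlus:
    def __init__(self):
        self.nodes = {}   # node id -> node kind
        self.edges = []   # (src id, dst id, edge type)
        self.live = {}    # location (register/address) -> current live node id
        self._next = 0

    def new_node(self, kind, loc=None):
        nid = self._next; self._next += 1
        self.nodes[nid] = kind
        if loc is not None:
            if loc in self.live:  # old live value at this location is covered:
                self.edges.append((self.live[loc], nid, "r-edge"))
            self.live[loc] = nid
        return nid

g = DFGPlus()
imm  = g.new_node("i-node")          # $0x94
esp0 = g.new_node("r-node", "%esp")  # source operand %esp (live before)
esp1 = g.new_node("r-node", "%esp")  # destination %esp (redefines the register)
# old %esp and the immediate are both used to compute the new %esp:
g.edges.append((esp0, esp1, "d-edge"))
g.edges.append((imm,  esp1, "d-edge"))
assert (esp0, esp1, "r-edge") in g.edges  # redefinition recorded automatically
assert g.nodes[imm] == "i-node" and g.live["%esp"] == esp1
```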
3) Reflection: Through this novel design, the variable types, information flows, and adjacency relationships of variables gathered through program tracing are represented by node features and by graph structure through different types of edges (relations). From the graph, we can clearly see the different structures associated with different types of operations, but also how difficult it would be to compare the graph structures of vulnerable and benign nodes manually. In Section V-D, we show how a graph neural network captures these features through representation learning.

Here, we discuss how the design of DFG+ helps to overcome Challenge 1. The design aims to encode general information and eliminate program-specific information, so that a model trained on some programs can be applied to other programs. Specifically, variable addresses, variable values, and opcodes, which are tightly associated with a specific program, are not included in DFG+. Instead, we select address-agnostic and value-agnostic features (information flow, variable adjacency, and general variable features) from the execution trace, and encode them as different types of edges and node features in DFG+, so that a model trained on the training set can be applied to predict labels on the testing DFG+.

B. Compiler Plugin for Data Labeling
We implement a tool that inserts code into the binary through source code instrumentation, which can automatically distinguish vulnerable and benign operations in a program execution. As discussed in the related work, ASAN can detect out-of-bounds memory accesses (i.e., invalid operations) in BOF executions. Therefore, we leverage ASAN to detect the invalid vulnerable operations, which helps the graph constructor (to be discussed in the next subsection) label the nodes. However, ASAN has four features that pose 4 problems in our scenario: 1) ASAN inserts extra instructions before memory allocation, access, and destruction. 2) ASAN inserts Redzones among variables, which changes the adjacency relationships of variables. 3) ASAN reports memory errors by outputting vulnerability information and then terminating the execution. 4) ASAN can only detect BOFs at the function level for functions linked from external libraries. Specifically, ASAN hooks calls to library functions and provides wrapper functions that check whether a BOF happens by analyzing the parameters passed to these library functions.

The extra instructions inserted by ASAN introduce irrelevant information flow, and the inserted Redzones break some a-edges in the constructed DFG+. Besides, if the execution terminates at the point of the first invalid access, the data flow afterwards will be missing, so we have to modify ASAN to make it report invalid operations without terminating the program's execution. In the following paragraphs, we show how we solve these problems.
1) How To Exclude Irrelevant Data Flow from Instructions Inserted by ASAN?: To deal with the first problem, we need to distinguish the program's original instructions from the instructions inserted by ASAN. To achieve this goal, we modify the compiler plugin of ASAN to insert a pair of instructions (i.e., prefetcht1 and prefetcht2) at the beginning and end of each piece of code inserted by ASAN. The pair of instructions serves as indicators that can be easily recognized and skipped when the runtime analyzer builds the DFG+ along with the program execution. We adopt prefetch instructions because prefetch has no side effect on the program's runtime state and we can easily disable them.
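The marker-based skipping logic amounts to a small state machine over the trace. A minimal sketch, with simplified string mnemonics standing in for decoded instructions (the real analyzer works on Pin's instruction objects, not strings):

```python
# Sketch: drop everything between a prefetcht1 / prefetcht2 marker pair
# (i.e., the ASAN-inserted check code) from an instruction trace.

def filter_trace(trace):
    keep, in_asan = [], False
    for ins in trace:
        if ins == "prefetcht1":    # start of ASAN-inserted region: stop tracing
            in_asan = True
        elif ins == "prefetcht2":  # end of region: resume tracing
            in_asan = False
        elif not in_asan:
            keep.append(ins)       # original program instruction
    return keep

trace = ["mov", "prefetcht1", "cmp", "ja", "prefetcht2", "sub", "mov"]
assert filter_trace(trace) == ["mov", "sub", "mov"]
```

Note that this naive version would also drop program instructions that float into the marked region; the floated-instruction heuristic discussed later in Section V-C1 exists precisely to avoid that loss.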
2) How to Restore the Relation of Variable Adjacency?: To handle the second problem, we leverage the shadow memory to restore the original adjacency relationships of variables. Specifically, the shadow memory maintained by ASAN's runtime environment records the locations of the inserted Redzones in the address space of the target program. The compiler plugin saves the configuration of the shadow memory and shares it with the graph constructor. Through this configuration, the graph constructor can query the shadow memory and restore the original adjacency relationships of variables. We discuss the details of how the adjacency relationships are restored in Section V-C2.
3) How to Label Nodes Generated from Vulnerable Operations?: To solve this problem, we let the compiler plugin emit prefetcha as an indicator before each suspicious instruction (one that results in an out-of-bounds read/write). ASAN checks the validity of the target address before each suspicious memory access, so prefetcha will only be executed when a memory error is detected, before it actually happens. In this way, the runtime analysis routine is notified through prefetcha and assigns different labels to nodes accordingly. Thus, we achieve our goal without terminating the program execution or introducing any irrelevant data flow.
4) How to Identify Vulnerable Operations in Library Functions?: To solve the last problem, we instrument the necessary libraries with our customized compiler, then link the instrumented library functions into the target program. However, we observe that the most commonly used library on Linux, glibc, cannot be compiled by LLVM due to some unsupported features, and llvm-libc is still in the planning phase [60]. Alternatively, we only instrument vulnerable functions in glibc, such as scanf and strcpy. Then, in the runtime library (runtime-rt [61]) of LLVM, we hook calls to these vulnerable functions and redirect execution to the instrumented ones. In such a case, the vulnerable nodes in glibc can be labeled accurately.

CAVEAT. The customized compiler plugin is only used to help the runtime analyzer assign labels to vulnerable nodes in the built graph. The runtime analyzer assumes all other nodes in the graphs are benign. Therefore, there is no need to instrument functions without vulnerabilities in the libraries. However, the memory allocation and free functions, such as malloc and free, are special cases: even though no BOF happens in these functions, we still need to instrument them to update the shadow memory.
C. DFG+ Construction based on Runtime Analyzer
The runtime analyzer is implemented based on Intel Pin [18], and builds the DFG+ along with the program's execution. Intel Pin provides comprehensive APIs for code inspection and instrumentation: the inspection APIs help to analyze instructions in the binary, and the code instrumentation APIs help to instrument code according to the results of the inspection. The developed runtime analyzer consists of three components: dynamic code analysis and instrumentation, memory layout restoration, and graph construction. Fig. 3 demonstrates the whole workflow.
1) Dynamic Code Analysis and Instrumentation:
The dynamic code instrumentation consists of three phases: code inspection, code instrumentation, and runtime analysis. Before code instrumentation, the analyzer first analyzes instructions and system calls. Three types of callback functions are registered according to the analysis results:
• Instruction Callback. The structure of the information flow is easily understood from examples: the code analysis routine defines the structures of information flow in mov 0x8048000, %eax and sub %eax, %ebx, respectively. Callback functions are then registered to instructions according to the types and structures of information flow, as demonstrated in Fig. 3.
• System Call Callback. Some system calls copy external data into the program space, and the variables therein should be recognized as e-nodes. Callback functions are registered to these system calls to label the corresponding memory regions at runtime.
• Control Callback.
Two callback functions are registered to prefetcht1 and prefetcht2 to stop and resume the runtime tracing, respectively, so that the pieces of code inserted by ASAN can be skipped. A callback function is registered to prefetcha to receive the signal about invalid operations and assign labels to the corresponding vulnerable nodes accordingly.

The compiler plugin based on LLVM instruments code on the intermediate representation (IR) during compilation. During experiments, we observe that some instructions that reside outside of a prefetcht1-prefetcht2 pair at the IR level float, due to instruction reordering [62], to positions that are enclosed by prefetcht1-prefetcht2. In such a case, the information flow resulting from the floated instructions would be lost if we simply stopped the analysis process when execution enters code enclosed by prefetcht1-prefetcht2.

To solve the problem, we adopt static data flow analysis to identify the floated instructions inside the prefetcht1-prefetcht2 pair based on one heuristic rule: an instruction
Fig. 3: Workflow of the runtime analyzer to build DFG+ with node labels.

i inside a prefetcht1-prefetcht2 pair is a floated instruction if there is a data dependency between i and an instruction j that comes after prefetcht2. Accordingly, we do not exclude the information flow resulting from the floated instructions in the runtime analyzer. Then, along with the program execution, the callback functions capture the information flow and the accessed memory addresses from executed instructions, and send them to the graph constructor and the adjacency relationship restorer.
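The heuristic rule above can be sketched as a simple def-use check. This is our own simplification: instructions are modeled as (name, reads, writes) tuples over abstract locations, whereas the real analyzer resolves operands from the IR.

```python
# Sketch of the floated-instruction heuristic: an instruction inside the
# prefetcht1-prefetcht2 pair is treated as floated program code if a later
# instruction, outside the pair, reads a location it writes.

def floated(inside, after):
    """inside: instructions between the markers; after: instructions past
    prefetcht2. Returns names of inside-instructions with a dependency out."""
    used_later = set()
    for _, reads, _ in after:
        used_later |= set(reads)
    return [name for name, _, writes in inside if set(writes) & used_later]

inside = [("i1", ["%eax"], ["shadow"]),  # pure ASAN check: result unused later
          ("i2", ["%ebp"], ["%ecx"])]    # floated: %ecx is consumed afterwards
after  = [("j1", ["%ecx"], ["%edx"])]
assert floated(inside, after) == ["i2"]
```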
2) Adjacency Relationships Restoration:
We observe that there are three kinds of changed adjacency relationships of variables, each requiring a different treatment. We show these three cases based on Fig. 4, which shows the layout of a memory fragment before and after instrumentation by ASAN.

Firstly, case ① does not need any restoration. For byte(s) inside a buffer or variable, the inserted Redzones do not affect the adjacency relationships, so no restoration is needed.

Secondly, case ② needs restoration. For byte(s) on the boundary of a buffer or variable, the adjacency relationships change because of the inserted Redzone. For example, the adjacent bytes of byte i+4 in the ML are bytes i+3 and i+5. However, in the ML w/ Redzone, the adjacent bytes of byte j+12 are bytes j+11 and j+13, and byte j+11 is located in a Redzone. To restore the adjacency relationships for this kind of byte, we find the real adjacent bytes by skipping the bytes in the Redzone. By skipping Redzone2, the real adjacent byte of byte j+12, i.e., j+7, can be found.

(a) Original memory layout (ML): ... | var2 (i+12) | buf1 (i+4) | var1 (i)
(b) Memory layout after code instrumentation (w/ Redzone): ... | red4 (j+28) | var2 (j+24) | red3 (j+20) | buf1 (j+12) | red2 (j+8) | var1 (j+4) | red1 (j)
Fig. 4: Comparison between memory layouts with and without Redzone.

Thirdly, case ③, which happens in BOF, also needs restoration. When an out-of-bounds access happens in the ML w/ Redzone, one or several bytes (e.g., x) in a Redzone will be read/written. If the invalid access is mapped to the ML, the out-of-bounds read/write touches byte(s) near the vulnerable buffer at a higher address. The following three steps find the corresponding byte(s) in the ML w/ Redzone:
1) First, find the boundary byte (b) of the BOF, which is the byte next to the first overflowed byte at the lower address.
2) Second, calculate the distance (d) between b and x.
3) Third, find the byte(s) (y) by shifting d bytes upward from b, while skipping all bytes in Redzones.
After mapping the byte(s) x in a Redzone to byte(s) y outside the Redzone, the adjacent bytes found through the strategies of cases ①, ②, and ③ for y are the restored adjacent bytes for x. For example, through the aforementioned strategies, we can map the byte at j+21 to the byte at j+25, and find its real adjacent bytes at j+24 and j+26.
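The three-step mapping can be sketched directly on the Fig. 4 layout. The helper below is our own illustration; the step count follows the worked example (j+21 maps to j+25), and the byte offsets and redzone extent are the assumed Fig. 4 values.

```python
# Sketch of the three-step redzone-to-original-layout mapping (case 3).

def map_redzone_byte(x, boundary, is_redzone):
    """x: overflowed offset inside a Redzone; boundary: last valid byte of the
    vulnerable buffer (step 1). Returns an offset y outside any Redzone."""
    d = x - boundary            # step 2: distance between b and x
    y = boundary
    while d > 0:                # step 3: shift d bytes toward higher
        y += 1                  # addresses, skipping every Redzone byte
        if not is_redzone(y):
            d -= 1
    return y

# Fig. 4(b): buf1 ends at j+19, red3 occupies j+20..j+23, var2 starts at j+24.
redzone = lambda off: 20 <= off <= 23
assert map_redzone_byte(21, 19, redzone) == 25   # byte j+21 maps to j+25
```

Once y is known, its neighbors (here j+24 and j+26) are taken as the restored adjacent bytes for x, exactly as in the prose example.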
3) Graph Construction:
After the information flows are captured and filtered through the callback functions at runtime, and the adjacency relationships are restored through the aforementioned three strategies, it is straightforward to construct the DFG+. We do not cover the details of graph construction.
Supporting Data.
Some data, referred to as supporting data in Fig. 3, are important not only in the graph construction phase but also in the vulnerability identification phase. Examples include: 1) a map from each node in DFG+ to the address of its corresponding variable, and 2) a map from each node to the instruction that creates the node. We show how this information is used in Section V-E.

CAVEAT. After the model is trained, we no longer need source code to generate labels, as the model predicts them for us. Building the unlabeled graphs for binary-only programs in the testing phase, as shown in Fig. 1, is much easier: since the analyzed programs are not instrumented, there is no need to exclude irrelevant instructions or restore the adjacency relationships of variables.
D. Our Graph Neural Network
DFG+ is a novel graph data structure that holds variable attributes, program information flow, and variable layout. Generally, the vulnerable data flow in the execution context and the variable layout for variables corrupted by a silent buffer overflow could differ from those of non-affected variables. In other words, the local graph centered at a vulnerable node would be slightly different from that of a benign node. Thus, detecting the vulnerability is equivalent to node classification considering the local graph centered at each node, and we need a model able to learn node representations that capture the local graph structure and neighborhood information, which facilitates differentiating vulnerable nodes from benign nodes. Thanks to the message-passing mechanism, GNNs are good at learning node representations by aggregating a node's neighborhood information. Thus, we adopt GNNs for DFG+. As DFG+ has different types of nodes and edges, we propose to adopt RGCN [21] as our basic model, because it was developed for representation learning on knowledge graphs, which also have different types of nodes and edges.

Essentially, a multi-layer RGCN learns the representation of a node v_i by aggregating the features (attributes) of v_i and its neighbors through message passing. In particular, for different types of edges/nodes, it uses different parameters during message passing, thereby preserving the edge/node type information. The propagation rule of RGCN in the l-th layer for calculating the forward-pass update of a node v_i is:

h_i^{(l+1)} = \sigma\Big( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_s^{(l)} h_i^{(l)} \Big),    (2)

where N_i^r denotes the set of neighbor indices of node i under relation r ∈ R, and R is the set of relations (edge types). c_{i,r} is a normalization constant that we set to the count of neighbors of node i under relation r. W_r^{(l)} is the relation-specific transformation matrix for relation r, which enables relation-specific message passing, thus preserving the edge-type relationship. To ensure that the representation of a node at layer l+1 is also informed by its representation at layer l, a self-connection term (i.e., W_s^{(l)}) is added. All messages passed along incoming edges are aggregated through an element-wise activation function σ(·). W_r^{(l)} and W_s^{(l)} are the parameters to be learned. By stacking K layers of RGCN together, the representation of node v_i can capture the K-hop local graph information centered at v_i.

Limitation of RGCN.
Equation 2 is the basic design of RGCN, which has shown promising results in early research [21]. For node classification in DFG+, however, the features of a node x's outgoing nodes are not used in an appropriate way. If N_i^r is defined as the set of neighbor indices of node v_i under relation r through incoming edges, messages can only pass along those directions. As a consequence, the node representation learned by the network only aggregates features from incoming nodes, and some important features from outgoing nodes are lost. For example, if a global variable is used twice at runtime, its corresponding node x in the DFG+ will have two outgoing d-edges, to nodes y and z respectively, i.e., y ← x → z. In this case, features cannot propagate from y to z or from z to y through x, which is undesirable because node y could be very useful for classifying node z and vice versa. For nodes without incoming links, such as x in the previous example, no information will propagate to them at all, and thus we cannot learn good representations for them.

RGCN with Bi-directional Propagation. One straightforward solution to the above issues would be to ignore the direction of each edge: if N_i^r is defined as the set of neighbor indices of node i under relation r through either incoming or outgoing edges, messages get processed with the same relation-specific transformation W_r^{(l)}. However, this ignores the difference between the incoming and outgoing directions. From the observations above, we extend the basic design to bi-directional propagation for directed graphs. Specifically, we adopt two sets of parameters for each type of edge:
1) W_r^{in} is used to propagate messages along the direction of a directed edge;
2) W_r^{out} is used to propagate messages against the direction of a directed edge.
We define the propagation rule as:

h_i^{(l+1)} = \sigma\Big( \sum_{r \in \mathcal{R}} \Big( \sum_{j \in \mathcal{IN}_i^r} \frac{1}{c_{i,r}} W_r^{in(l)} h_j^{(l)} + \sum_{k \in \mathcal{OUT}_i^r} \frac{1}{c_{i,r}} W_r^{out(l)} h_k^{(l)} \Big) + W_0^{(l)} h_i^{(l)} \Big),    (3)

where IN_i^r and OUT_i^r denote the sets of incoming and outgoing neighbors of node i under relation r ∈ R, respectively. The transformation W^{(l)} applied depends on the type and direction of the edge. By designing two sets of weights for the two directions, we make sure the information of node y has a chance to propagate to node z and vice versa (the example shown above). In our evaluation, we quantitatively evaluate the model represented by Equation 3 and compare its effectiveness with the basic design denoted by Equation 2.

Moreover, we deprecate the common one-hot encoding of node IDs adopted by [21]. Instead, we use the type of a node as its features, and expect the model to behave the same regardless of node order. We evaluate its effectiveness in Section VII.

Fig. 5(a) shows the framework of BRGCN, which takes
DFG+ as input and predicts the labels for each node. The DFG+ consists of different types of nodes and edges, which are marked with different colors in the figure. The W matrices in Equation 3 are parameters of the model, learned during the training phase. Initially, the node features in the DFG+ are embedded and fed into the model as the input of the first layer. Then, layer l computes the updated feature (latent node representation h^{(l+1)}) for each node v_i by aggregating features from its neighbors and itself. The output of the previous layer becomes the input to the next layer. Finally, in the output layer, a softmax(·) activation is applied to generate label probabilities.

Fig. 5(b) illustrates the message passing when calculating the updated feature for a node i in layer l. Features from neighboring nodes are gathered and then transformed for each relation type individually, with a different transformation matrix W for each type and direction of edge. For example, W_blue^{out(l)} is the transformation matrix for outgoing blue edges. The resulting representations are accumulated and normalized. We choose ReLU as the activation function in our model.
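The bi-directional rule of Equation 3 can be checked on the y ← x → z example with a minimal numpy sketch. This is our own toy illustration: a single relation type, identity weight matrices instead of learned parameters, and per-direction normalization c_{i,r} set to the neighbor count, as assumed simplifications.

```python
import numpy as np

def brgcn_layer(h, edges, W_in, W_out, W0):
    """One bi-directional propagation step (Equation 3) with ReLU as sigma."""
    out = h @ W0.T                                 # self-connection term W0 h_i
    for i in range(h.shape[0]):
        IN  = [j for (j, k) in edges if k == i]    # incoming neighbors of i
        OUT = [k for (j, k) in edges if j == i]    # outgoing neighbors of i
        for j in IN:
            out[i] += (W_in @ h[j]) / len(IN)      # along edge direction
        for k in OUT:
            out[i] += (W_out @ h[k]) / len(OUT)    # against edge direction
    return np.maximum(out, 0)                      # element-wise ReLU

h = np.eye(3)                  # nodes 0=x, 1=y, 2=z with distinct features
edges = [(0, 1), (0, 2)]       # y <- x -> z: x's two outgoing d-edges
I = np.eye(3)
h1 = brgcn_layer(h,  edges, I, I, I)
h2 = brgcn_layer(h1, edges, I, I, I)
# After two layers, y's feature has reached z through x (and vice versa),
# which incoming-only propagation (Equation 2) can never achieve here.
assert h1[2, 1] == 0 and h2[2, 1] > 0 and h2[1, 2] > 0
```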
A DFG+ is a directed graph G = (V, F, E, R) whose nodes (entities) v_i ∈ V have features f_i ∈ F, with labeled edges (relations) (v_i, r, v_j) ∈ E, where r ∈ R is a relation type. In each layer l, node features are updated through the function defined in Equation 3. For each node, its old features (h_i^{(l)}) and its neighbors' old features (h_j^{(l)}) are passed along the edges ((v_i, r, v_j) ∈ E ∨ (v_j, r, v_i) ∈ E), and then aggregated through a normalized sum (Σ(·)) and an activation function (σ(·)) to obtain the updated features (h_i^{(l+1)}), where h_i^{(1)} = f_i in the input layer. An n-layer network allows message passing across n hops in the graph; therefore, the representation of a node x learned by an n-layer BRGCN model aggregates node features from the n-hop subgraph centered on x. Besides, the different sets of weights W_r^{d(l)} for the different types of edges and the sum aggregation adopted in Equation 3 help to learn the graph structures corresponding to information flow and variable adjacency, respectively. By learning the different graph structures and node features, we believe the network can distinguish vulnerable from benign nodes.

Fig. 5: Bi-directional Relational Graph Convolutional Network. (a) The model overview. (b) Message propagation for one node.

To train the model, we minimize the following cross-entropy loss over all labeled nodes:

\min_\theta L = - \sum_{G \in \mathcal{G}} \sum_{i \in \mathcal{Y}} \sum_{k=1}^{K} w_k \cdot y_{ik} \ln h_{ik}^{(L)},    (4)

where G is a graph in the training set 𝒢, and 𝒴 is the set of labeled nodes in our training samples. h_i^{(L)} is the output of the BRGCN for node i. Note that we use a softmax function in the last layer, so h_i^{(L)} denotes the predicted class distribution for node i, with h_{ik}^{(L)} being the probability of node i belonging to class k, k ∈ {0, 1}. w_k is the weight for class k, and y_{ik} denotes the respective ground-truth label for node i. We introduce w_k in our loss function because the class distribution in DFG+ is extremely imbalanced, i.e., the majority of nodes are negative (benign) nodes, while the positive (vulnerable) nodes make up only a very small portion.
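The effect of the class weights w_k in Equation 4 can be illustrated with a small numpy sketch; the probabilities, label assignments, and weight values below are invented for the example, not taken from the paper's experiments.

```python
import numpy as np

def weighted_ce(probs, labels, w):
    """Class-weighted cross-entropy of Equation 4.
    probs: softmax outputs (N, K); labels: one-hot (N, K); w: per-class (K,)."""
    return -np.sum(w * labels * np.log(probs))

# 4 well-predicted benign nodes and 1 badly-predicted vulnerable node:
probs  = np.array([[0.9, 0.1]] * 4 + [[0.6, 0.4]])
labels = np.array([[1, 0]] * 4 + [[0, 1]])
uniform  = weighted_ce(probs, labels, np.array([1.0, 1.0]))
weighted = weighted_ce(probs, labels, np.array([1.0, 10.0]))
# With w_vulnerable = 10, the single mispredicted vulnerable node now
# dominates the loss instead of being drowned out by the benign majority.
assert weighted > uniform
```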
To prevent the majority class from dominating the loss function, we assign a larger weight to the positive class. θ = {W_0^{(l)}, W_r^{in(l)}, W_r^{out(l)}; r ∈ R}_{l=1}^{L} is the set of model parameters. After the loss is calculated in each training epoch, backward propagation computes the gradient of the loss function with respect to the trainable parameters θ, and the parameters are updated to minimize the loss.

Model Parameter Size and Time Complexity.
For simplicity of the analysis, we first define the dimensionalities W_r^{in(l)} ∈ R^{d_l × d_{l+1}} and W_r^{out(l)} ∈ R^{d_l × d_{l+1}}, where d_l and d_{l+1} are the dimensionalities of the node representations in the l-th and (l+1)-th layers, respectively. Since θ = {W_0^{(l)}, W_r^{in(l)}, W_r^{out(l)}; r ∈ R}_{l=1}^{L} is the set of parameters of BRGCN, the model parameter size is O(Σ_{l=1}^{L} Σ_{r∈R} d_l · d_{l+1}) = O(Σ_{l=1}^{L} d_l · d_{l+1} · |R|).

For the forward pass of BRGCN, the main time cost in the l-th layer for node v_i is the calculation of Equation 3, which is O(Σ_{r∈R} (|IN_i^r| + |OUT_i^r|) · d_l · d_{l+1}). This is equivalent to O(D_i · d_l · d_{l+1}), where D_i = Σ_{r∈R} (|IN_i^r| + |OUT_i^r|) is the sum of the in-degree and out-degree of node v_i. Thus, the time complexity of BRGCN for node v_i is O(D_i · Σ_{l=1}^{L} d_l · d_{l+1}), and the computational cost for a whole DFG+ graph is O(Σ_i D_i · Σ_{l=1}^{L} d_l · d_{l+1}), which is equivalent to O(|E| · Σ_{l=1}^{L} d_l · d_{l+1}), where E is the set of edges in the DFG+. The complexity of backward propagation via gradient descent is the same as that of the forward pass, so the total cost of one iteration is O(|E| · Σ_{l=1}^{L} d_l · d_{l+1}).

E. Vulnerability Identification
In this subsection, we demonstrate how the trained model achieves the goal proposed in Section III-A. In the training phase, we leverage the source code of vulnerable programs, inputs that can trigger the vulnerabilities, and the tools presented in the earlier subsections to generate labeled DFG+ and save the corresponding supporting data. We then train the model on the DFG+ generated from the vulnerable executions. Through the forward propagation (message passing) rules defined in Equation 3, the loss function defined in Equation 4, and backward propagation to update the trainable parameters in the model, we obtain an effective model which can predict labels for the nodes of a given unlabeled DFG+.

In the testing phase, we apply the trained model to predict the labels of unlabeled DFG+ generated from binary-only software. Through the maps in the supporting data created by the runtime analyzer, the vulnerable nodes in a DFG+ can be mapped to the corresponding instruction addresses in the binary code and the execution trace. Since the execution trace contains the addresses of memory operands, the address of the corrupted variable can also be identified. Note that in cases where one execution triggers several silent BOF vulnerabilities, the vulnerable instructions and corrupted variables can be identified separately.
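The testing-phase lookup described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's tooling: the map and trace formats (`node_to_insn`, `trace`) are hypothetical stand-ins for the supporting data produced by the runtime analyzer.

```python
# Hypothetical sketch: predicted vulnerable node IDs are resolved to
# instruction addresses and corrupted-variable addresses via the maps
# saved by the runtime analyzer. All names here are illustrative.

def locate_vulnerabilities(predicted_labels, node_to_insn, trace):
    """predicted_labels: {node_id: 0/1}; node_to_insn: {node_id: trace_index};
    trace: list of (insn_addr, mem_addr) tuples from the execution trace."""
    findings = []
    for node_id, label in predicted_labels.items():
        if label != 1:                    # keep only nodes classified vulnerable
            continue
        idx = node_to_insn[node_id]       # map the node back into the trace
        insn_addr, mem_addr = trace[idx]  # memory operand = corrupted variable
        findings.append({"node": node_id,
                         "insn": hex(insn_addr),
                         "var": hex(mem_addr)})
    return findings

# One execution may trigger several silent BOFs; each vulnerable node
# is reported separately.
example = locate_vulnerabilities(
    {3: 1, 7: 0},
    {3: 0, 7: 1},
    [(0x80484b6, 0xbfff0010), (0x80484c2, 0xbfff0020)])
```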
F. Implementation
We implement our system on a 32-bit Linux system with the Intel x86 instruction set architecture. The compiler plugin is built on LLVM-5.0.0 and its runtime library on compiler-rt-5.0.0. Compared with the implementation of ASAN, the plugin and the runtime library consist of 82 and 2236 lines of new code, respectively. The dynamic binary analyzer and the graph constructor are developed on Intel Pin 3.10 and consist of 9900 lines of C++ code in total. The graph model consists of 800 lines of Python code and is implemented on top of DGL v0.4.3 [63], a high-performance and scalable Python package for deep learning on graph-typed data.

VI. EXPERIMENT AND RESULTS
A. Data Generation and Preprocessing
We select 30 reproducible CVEs, shown in Table II, from a repository of Linux vulnerabilities [64]. We generate three labeled DFG+ for each CVE, from three different executions. In the first execution, we compile the program with the compiler plugin and find an input that triggers the vulnerability. In the second execution, we change the input so that it overflows the buffer with a different length. In the third execution, we change the length of the vulnerable buffer in the program's source code, recompile it through our compiler plugin, and run the modified program again. In all three executions, the inputs trigger the vulnerability without crashing the execution. Since the length of the vulnerable buffer or of the input can hardly be changed in some programs, we finally obtain 86 labeled DFG+s with over 35 million (35,084,810) nodes, of which only 6708 are positive.

We observe that the constructed DFG+ vary largely in size (number of nodes), from a few thousand to a few million. It is impossible to fit an entire DFG+ into BRGCN for end-to-end training, especially for those DFG+ with more than 3 million nodes. To alleviate this problem, we propose a graph cutting algorithm (the details of Algorithm 1 are in the appendices). The cutting algorithm first cuts a big graph into several small sub-graphs by removing the edges that connect different sub-graphs; all the nodes in the sub-graphs are sample nodes. It then adds n-hop neighbors to each sub-graph as supporting nodes, where n is the number of model layers. When training the model on the sub-graphs, both supporting nodes and sample nodes are involved in forward propagation, whereas only the sample nodes are considered when calculating the loss.

As can be noticed above, the dataset is extremely imbalanced: the ratio between positive and negative nodes is roughly 1/5230. To further reduce the number of negative nodes, we exclude all the r-nodes and i-nodes from the sample nodes, because a BOF can only overwrite variables in memory. We also exclude nodes without any incoming d-edge, because the live variables associated with vulnerable nodes must be written through an invalid operation in a BOF. After the exclusion, we reduce the ratio to 1/659. Note that by excluding we mean we do not choose these nodes as sample nodes in the sub-graphs; instead, we select them as supporting nodes if they are neighbors of sample nodes, to help classify the sample nodes in the sub-graphs. Finally, we further reduce the number of negative nodes through random sampling.

B. Evaluation
We experimented with different numbers of layers, sizes of hidden states, and dropout rates to find the best-performing model. Currently, BRGCN has 4 layers, including an input layer and an output layer, and each layer has hidden states of dimension 16. 10 sets of parameters (W) are used for the 5 types of edges (2 sets of parameters per type).

After obtaining the best configuration, we adopt 8-fold cross-validation to comprehensively evaluate the model. In each

TABLE II: Information and testing results of each CVE case.
CVE-ID        | Name            | Region | Detected
CVE-2004-0597 | pngslap         | stack  | ✓
CVE-2004-1120 | proz            | stack  | ✓
CVE-2004-1255 | 2fax            | stack  | ✓
CVE-2004-1257 | abc2mtex        | stack  | ✓
CVE-2004-1261 | asp2php         | stack  | ✓
CVE-2004-1262 | bsb2ppm         | stack  | ✓
CVE-2004-1275 | html2hdml       | stack  | ✓
CVE-2004-1278 | jcabc2ps        | stack  | ✓
CVE-2004-1279 | jpegtoavi       | stack  | ✓
CVE-2004-1287 | nasm            | stack  | ✓
CVE-2004-1288 | o3read          | stack  | ✓
CVE-2004-1289 | pcal            | stack  | ✓
CVE-2004-1290 | pgn2web         | stack  | ✓
CVE-2004-1292 | ringtonetools   | stack  | ✓
CVE-2004-1293 | rtf2latex2e.bin | stack  | ✓
CVE-2004-1297 | unrtf           | stack  | ✓
CVE-2004-2093 | rsync           | stack  | ✓
CVE-2004-2167 | latex2rtf       | stack  | ✗
CVE-2005-0101 | newspost        | stack  | ✓
CVE-2005-3862 | unalz           | stack  | ✓
CVE-2005-4807 | as-new          | stack  | ✓
CVE-2007-1465 | dproxy          | stack  | ✓
CVE-2009-1759 | ctorrent        | stack  | ✓
CVE-2009-2286 | compface        | stack  | ✓
CVE-2009-5018 | gif2png         | stack  | ✓
CVE-2010-2891 | smisubtree      | stack  | ✓
EDB-890       | psnup           | stack  | ✓
EDB-9264      | stftp           | stack  | ✓
EDB-14904     | fcrackzip       | stack  | ✓
EDB-15062     | rarcrack        | stack  | ✓

round of the cross-validation, we select 75%, 12.5%, and 12.5% of the 86 graphs as the training, validation, and testing sets, respectively. Table III presents the
Accuracy, Precision, Recall, and F1 on the test set. Our model achieves 94.39% accuracy and a 94.18% F1 score on the sampled dataset. Since we are the first to analyze silent vulnerabilities through deep learning, we cannot find similar works for comparison. However, we compare our design with some other potential designs in the next section.

We then examine our model's ability to identify vulnerable operations in silent BOFs. Since a silent BOF results in one or more vulnerable nodes, we can successfully locate the vulnerable operation as long as one vulnerable node is identified. As a result, the vulnerability detection rate is much better than the vulnerable-node detection rate. Table II shows the detection results when we map the vulnerable nodes in the test phase to the executables. Due to the limited number of global/heap buffer overflows in the vulnerability database, we did not find a reproducible one for our evaluation, but we modified the vulnerable stack buffers in several of the cases displayed in Table II to global and heap buffers.

TABLE III: The overall performance of our proposed models.

Fold    | Accuracy | Precision | Recall | F1
fold-1  | 0.8871   | 0.9828    | 0.7703 | 0.8636
fold-2  | 0.9623   | 1.0000    | 0.9298 | 0.9636
fold-3  | 0.9712   | 0.9455    | 1.0000 | 0.9719
fold-4  | 0.9072   | 0.9167    | 0.8958 | 0.9060
fold-5  | 0.9617   | 0.9657    | 0.9574 | 0.9615
fold-6  | 0.9503   | 0.9244    | 0.9821 | 0.9523
fold-7  | 0.9359   | 0.8864    | 1.0000 | 0.9397
fold-8  | 0.9757   | 0.9537    | 1.0000 | 0.9763
Average | 0.9439   | 0.9469    | 0.9419 | 0.9418
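The per-fold figures in Table III follow the standard definitions of these metrics; as a sanity reference, they can be reproduced from raw confusion counts. The function and the example counts below are illustrative, not the paper's evaluation code.

```python
# Illustrative computation of the per-fold metrics reported in Table III
# from raw confusion counts (tp, fp, fn, tn).

def fold_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# e.g. a hypothetical fold with 57 true positives, 2 false positives,
# no false negatives, and 100 true negatives:
acc, prec, rec, f1 = fold_metrics(tp=57, fp=2, fn=0, tn=100)
```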
VII. EXPLAINABILITY
When designing DFG+ and BRGCN, we made several design choices based on our intuitions; in this section, we try to explain their effectiveness through several experiments. Accordingly, we put forward several evaluation questions: 1) Can sequence models such as RNNs and LSTMs solve the problem by analyzing the instruction sequence directly? 2) Is a homogeneous graph, rather than a more complex relational graph, enough to classify vulnerable nodes in DFG+? 3) Is BRGCN, defined in Equation 3, more effective than RGCN, defined in Equation 2? 4) Can BRGCN effectively identify vulnerable nodes in traditional data flow graphs? 5) Can BRGCN effectively identify vulnerable nodes in DFG+ with IDs as node attributes? 6) Does BRGCN really benefit from training on multiple graphs?

To answer these questions, we set up 4 groups of experiments. In the first group, we first generate an instruction trace which includes the executed instructions and the accessed memory addresses; second, we split the instruction trace into fixed-length sequences so that each ends with a memory-access instruction; third, if the last instruction of a sequence results in a vulnerable operation in a silent BOF, we label the sequence as vulnerable, otherwise we label it as benign; fourth, we sample the same positive and negative samples as those sampled for training BRGCN. Finally, we adopt an open-source implementation of memory-augmented RNNs and LSTMs [65] to classify the execution traces; the results are reported in Table IV. From the experimental results, we can easily conclude that RNNs and LSTMs cannot help identify vulnerable operations by analyzing instruction sequences with accessed memory addresses.

In the second group of experiments, we adopt ConvGNN and RGCN and train the models on DFG+. In ConvGNN, all types of edges are treated homogeneously and processed with the same weight matrix W. RGCN adopts different propagation rules for different edge types and propagates node features along the incoming direction of edges. The experimental results in Table IV show that BRGCN outperforms RGCN, and RGCN outperforms ConvGNN. This indicates that adopting two sets of parameters for each type of edge is more effective than one set of parameters per type, which in turn is more effective than a single set of parameters regardless of edge type.

In the third group of experiments, we change the structure of DFG+. There are two variants: 1) graphs with only program runtime data flow, and 2) graphs whose nodes are assigned unique IDs as node attributes. We then train the BRGCN model on the two sets of modified graphs and display the results in Table IV. From the results we learn: 1) BRGCN cannot distinguish significant differences between the local graph structures of invalid operations and of benign operations in a data-flow-only graph, so the adoption of spatial information and other implicit information flow plays an important role in the node classification problem; and 2) when training a model on different graphs, adopting node IDs as node attributes is harmful.

TABLE IV: The performance comparison of different neural networks and graph structures.
Group | Setting             | Accuracy | Precision | Recall | F1
1     | RNN                 | 0.4977   | 0.4994    | 0.5557 | 0.5260
1     | LSTM                | 0.4948   | 0.4929    | 0.5136 | 0.5030
2     | ConvGCN             | 0.7914   | 0.8105    | 0.7619 | 0.7616
2     | RGCN                | 0.8411   | 0.8699    | 0.8158 | 0.8175
3     | BRGCN w/DF-Only     | 0.7001   | 0.6126    | 0.7702 | 0.6793
3     | BRGCN w/Node-ID     | 0.7686   | 0.7769    | 0.7466 | 0.7577
4     | BRGCN w/One-Program | 0.5741   | 0.4839    | 0.3562 | 0.4215
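The gap between RGCN and BRGCN in the ablation above comes down to the propagation rule: RGCN aggregates only along incoming edges with one weight matrix per relation, while BRGCN adds a second matrix per relation for the outgoing direction. The following numpy sketch contrasts the two rules; the normalization and shapes are simplified assumptions, not the paper's exact trained model.

```python
import numpy as np

# Minimal sketch of the two propagation rules compared above.
# h: (n, d) node features; edges: {relation: [(src, dst), ...]};
# W_in/W_out: one (d, d') matrix per relation; W0: self-loop matrix.

def rgcn_layer(h, edges, W_in, W0):
    out = h @ W0
    for r, pairs in edges.items():
        for src, dst in pairs:
            # aggregate along incoming edges only (uniform normalization)
            out[dst] += h[src] @ W_in[r] / max(1, len(pairs))
    return np.maximum(out, 0)                       # ReLU

def brgcn_layer(h, edges, W_in, W_out, W0):
    out = h @ W0
    for r, pairs in edges.items():
        for src, dst in pairs:
            out[dst] += h[src] @ W_in[r] / max(1, len(pairs))
            # second parameter set propagates against edge direction
            out[src] += h[dst] @ W_out[r] / max(1, len(pairs))
    return np.maximum(out, 0)

# Tiny two-node example with one d-edge 0 -> 1: BRGCN also updates node 0
# from its outgoing neighbor, which RGCN never does.
h = np.eye(2)
edges = {"d": [(0, 1)]}
r_out = rgcn_layer(h, edges, {"d": np.eye(2)}, np.eye(2))
b_out = brgcn_layer(h, edges, {"d": np.eye(2)}, {"d": 2 * np.eye(2)}, np.eye(2))
```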
In the last group of experiments, we train our model on graphs generated from a single program and test it on other programs. The evaluation results show that the model trained on a single program is significantly worse than the model trained on multiple programs. We conclude that, by carefully designing DFG+, our model can benefit from different programs/graphs. This indicates that common semantic features of BOF vulnerabilities are shared by different programs.

VIII. LIMITATIONS AND CONCLUSION
Our approach suffers from several limitations. First, although we achieve a high detection rate for silent BOFs, there are considerable false-positive predictions on nodes, meaning that some benign nodes are falsely classified as vulnerable. This stems from the extreme imbalance between the numbers of positive and negative nodes, which has a ratio as low as 1/5230. Although we have down-sampled the negative nodes and applied a class-weighted loss function during training, it remains an issue, because any single-percentage-point drop in classification precision leads to considerable false-positive predictions. Second, although our model can test a graph very quickly (less than 0.2 seconds on average), there is an overhead in collecting program runtime data flow and building DFG+. Unless we can trace program data flow on the fly, it is not practical to deploy our framework for detecting vulnerabilities in a real-time production environment.

In this paper, we design a novel graph data structure DFG+ to represent the program's runtime information flow and variables' spatial information. A runtime analyzer is implemented to construct DFG+, and AddressSanitizer is customized to help label the nodes. We further propose BRGCN to analyze DFG+ and detect vulnerable nodes with 94.39% accuracy. By mapping the vulnerable nodes back to the execution trace, we are able to locate the vulnerable points in the program at the binary level. We believe the DFG+ and the BRGCN proposed in our work have wide applications. Our proposed scheme could be used in vulnerability analysis to help locate vulnerable points, in software patching to help generate patches at the binary level, in exploit generation to help attack vulnerable software, and in software testing to help find software bugs.

Finally, we would like to suggest some future work that could supplement our approach. Possible avenues include applying the GNN-based approach to other silent vulnerable executions, such as detecting buffer over-reads, or to obfuscated programs.
REFERENCES
[1] L. Szekeres, M. Payer, T. Wei, and D. Song, "SoK: Eternal war in memory," in 2013 IEEE Symposium on Security and Privacy. IEEE, 2013, pp. 48–62.
[2] B. Liu, L. Shi, Z. Cai, and M. Li, "Software vulnerability discovery techniques: A survey," Nov. 2012, pp. 152–156.
[3] J. C. King, "Symbolic execution and program testing," Commun. ACM, vol. 19, no. 7, pp. 385–394, Jul. 1976.
[4] I. Yun, S. Lee, M. Xu, Y. Jang, and T. Kim, "QSYM: A practical concolic execution engine tailored for hybrid fuzzing," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS '16. New York, NY, USA: ACM, 2016, pp. 529–540.
[6] A. Arora, R. Krishnan, R. Telang, and Y. Yang, "An empirical analysis of software vendors' patch release behavior: Impact of vulnerability disclosure," Information Systems Research, vol. 21, no. 1, pp. 115–132, 2010.
[7] M. Zalewski, "American fuzzy lop," 2014.
[8] I. Yun, S. Lee, M. Xu, Y. Jang, and T. Kim, "QSYM: A practical concolic execution engine tailored for hybrid fuzzing," in Proceedings of the 27th USENIX Security Symposium (Security), Baltimore, MD, Aug. 2018.
[9] P. Chen and H. Chen, "Angora: Efficient fuzzing by principled search," IEEE, 2018, pp. 711–725.
[10] S. Chen, J. Xu, E. C. Sezer, P. Gauriar, and R. K. Iyer, "Non-control-data attacks are realistic threats," in USENIX Security Symposium, vol. 5, 2005.
[11] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, "AddressSanitizer: A fast address sanity checker," in 2012 USENIX Annual Technical Conference (USENIX ATC), 2012, pp. 309–318.
[12] R. W. Jones and P. H. Kelly, "Backwards-compatible bounds checking for arrays and pointers in C programs," in AADEBUG. Citeseer, 1997, pp. 13–26.
[13] L. Lam and T. Chiueh, "Checking array bound violation using segmentation hardware," 2005, pp. 388–397.
[14] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, and G. Vigna, "SoK: (State of) the art of war: Offensive techniques in binary analysis," 2016, pp. 138–157.
[15] A. Fioraldi, D. C. D'Elia, and L. Querzoni, "Fuzzing binaries for memory safety errors with QASan," IEEE, 2020, pp. 23–30.
[16] S. Dinesh, N. Burow, D. Xu, and M. Payer, "RetroWrite: Statically instrumenting COTS binaries for fuzzing and sanitization," IEEE, 2020, pp. 1497–1511.
[17] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, "A comprehensive survey on graph neural networks," IEEE Transactions on Neural Networks and Learning Systems, 2020.
[18] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood, "Pin: Building customized program analysis tools with dynamic instrumentation," ACM SIGPLAN Notices.
International Symposium on Code Generation and Optimization (CGO 2011). IEEE, 2011, pp. 213–223.
[25] B. J. Reed Hastings, "Purify: Fast detection of memory leaks and access errors," in Proc. of the Winter 1992 USENIX Conference. Citeseer, 1991.
[26] "Intel Parallel Inspector," http://software.intel.com/en-us/intel-parallel-inspector/, Intel.
[27] A. R. Hurson and K. M. Kavi, "Dataflow computers: Their history and future," Wiley Encyclopedia of Computer Science and Engineering, 2007.
[28] K. Kennedy, A survey of data flow analysis techniques. IBM Thomas J. Watson Research Division, 1979.
[29] J. Badlaney, R. Ghatol, and R. Jadhwani, "An introduction to data-flow testing," North Carolina State University, Dept. of Computer Science, Tech. Rep., 2006.
[30] E. C. R. Shin, D. Song, and R. Moazzezi, "Recognizing functions in binaries with neural networks," in USENIX Security Symposium (USENIX Security 15), 2015, pp. 611–626.
[31] Z. L. Chua, S. Shen, P. Saxena, and Z. Liang, "Neural nets can learn function type signatures from binaries," 2017, pp. 99–116.
[32] W. Guo, D. Mu, X. Xing, M. Du, and D. Song, "DEEPVSA: Facilitating value-set analysis with deep learning for postmortem program analysis," 2019, pp. 1787–1804.
[33] M. Du, F. Li, G. Zheng, and V. Srikumar, "DeepLog: Anomaly detection and diagnosis from system logs through deep learning," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS '17. New York, NY, USA: ACM, 2017, pp. 1285–1298.
[34] W. Meng, Y. Liu, Y. Zhu, S. Zhang, D. Pei, Y. Liu, Y. Chen, R. Zhang, S. Tao, P. Sun, and R. Zhou, "LogAnomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs," in Proceedings of the 28th International Joint Conference on Artificial Intelligence, ser. IJCAI '19. AAAI Press, 2019, pp. 4739–4745.
[35] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[36] Z. Zhang, P. Cui, and W. Zhu, "Deep learning on graphs: A survey," IEEE Transactions on Knowledge and Data Engineering, 2020.
[37] M. Balcilar, G. Renton, P. Héroux, B. Gauzere, S. Adam, and P. Honeine, "Bridging the gap between spectral and spatial domains in graph neural networks," arXiv preprint arXiv:2003.11702, 2020.
[38] F. R. Chung and F. C. Graham, Spectral graph theory. American Mathematical Soc., 1997, no. 92.
[39] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, "How powerful are graph neural networks?" arXiv preprint arXiv:1810.00826, 2018.
[40] J. Qiu, J. Tang, H. Ma, Y. Dong, K. Wang, and J. Tang, "DeepInf: Social influence prediction with deep learning," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2110–2119.
[41] Q. Tan, N. Liu, and X. Hu, "Deep representation learning for social network analysis," Frontiers in Big Data, vol. 2, p. 2, 2019.
[42] H. Wang, F. Zhang, M. Zhang, J. Leskovec, M. Zhao, W. Li, and Z. Wang, "Knowledge-aware graph neural networks with label smoothness regularization for recommender systems," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 968–977.
[43] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, "Protein interface prediction using graph convolutional networks," in Advances in Neural Information Processing Systems, 2017, pp. 6530–6539.
[44] A. Nair, A. Roy, and K. Meinke, "funcGNN: A graph neural network approach to program similarity," in Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2020, pp. 1–11.
[45] S. Rawat and L. Mounier, "Finding buffer overflow inducing loops in binary executables," IEEE, 2012, pp. 177–186.
[46] J. Newsome and D. Song, "Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software," in NDSS, vol. 5. Citeseer, 2005, pp. 3–4.
[47] J. Seward and N. Nethercote, "Using Valgrind to detect undefined value errors with bit-precision," in USENIX Annual Technical Conference, General Track, 2005, pp. 17–30.
[48] N. Nethercote and J. Seward, "Valgrind: A framework for heavyweight dynamic binary instrumentation," ACM SIGPLAN Notices, vol. 42, no. 6, pp. 89–100, 2007.
[49] N. Stephens, J. Grosen, C. Salls, A. Dutcher, R. Wang, J. Corbetta, Y. Shoshitaishvili, C. Kruegel, and G. Vigna, "Driller: Augmenting fuzzing through selective symbolic execution," in NDSS, vol. 16, no. 2016, 2016, pp. 1–16.
[50] Y. Seo, M. Defferrard, P. Vandergheynst, and X. Bresson, "Structured sequence modeling with graph convolutional recurrent networks," in International Conference on Neural Information Processing. Springer, 2018, pp. 362–373.
[51] E. Hajiramezanali, A. Hasanzadeh, K. Narayanan, N. Duffield, M. Zhou, and X. Qian, "Variational graph recurrent neural networks," in Advances in Neural Information Processing Systems, 2019, pp. 10701–10711.
[52] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.
[53] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla, "Heterogeneous graph neural network," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 793–803.
[54] M. Karasuyama and H. Mamitsuka, "Multiple graph label propagation by sparse integration," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 12, pp. 1999–2012, 2013.
[55] A. Iscen, G. Tolias, Y. Avrithis, and O. Chum, "Label propagation for deep semi-supervised learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5070–5079.
[56] M. Rajpal, W. Blum, and R. Singh, "Not all bytes are equal: Neural byte sieve for fuzzing," arXiv preprint arXiv:1711.04596, 2017.
[57] D. She, K. Pei, D. Epstein, J. Yang, B. Ray, and S. Jana, "NEUZZ: Efficient fuzzing with neural program smoothing," IEEE, 2019, pp. 803–817.
[58] K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, and A.-R. Sadeghi, "Just-in-time code reuse: On the effectiveness of fine-grained address space layout randomization," IEEE, 2013, pp. 574–588.
[59] U. Khedker, A. Sanyal, and B. Sathe, Data flow analysis: Theory and practice. CRC Press, 2017.
[60] "Memory-augmented recurrent neural networks," 2019. [Online]. Available: https://github.com/suzgunmirac/marnns
[61] "compiler-rt runtime libraries," LLVM project, 2020. [Online]. Available: https://compiler-rt.llvm.org
[62] R. R. Heisch, "Method and system for reordering the instructions of a computer program to optimize its execution," Dec. 21, 1999, US Patent 6,006,033.
[63] M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai, T. Xiao, T. He, G. Karypis, J. Li, and Z. Zhang, "Deep Graph Library: A graph-centric, highly-performant package for graph neural networks," arXiv preprint arXiv:1909.01315, 2019.
[64] D. Mu, "LinuxFlaw," 2019. [Online]. Available: https://github.com/mudongliang/LinuxFlaw
[65] "llvm-libc C standard library," LLVM project, 2020. [Online]. Available: https://llvm.org/docs/Proposals/LLVMLibC.html

APPENDIX A
GRAPH CUT ALGORITHM
Algorithm 1 Graph Cut Algorithm

INPUT: graph G(N, E); number of layers l; number of sub-graphs n
OUTPUT: a set of n subgraphs C = {G_i | 0 ≤ i < n}, and the IDs of the sampled nodes S_i in each subgraph G_i

initialize a set of n subgraphs: C = {G_i | 0 ≤ i < n}, where G_i = (N_i, E_i), N_i = ∅, E_i = ∅
divide the nodes of G into n samples S_i (0 ≤ i < n), satisfying |S_i| ≤ ⌈|N|/n⌉ and ∪_{i=0}^{n-1} S_i = N
for i = 0 to n − 1 do
    N_i := S_i
    for j = 1 to l do
        for e ∈ E do
            /* src(e), dst(e) denote the source node and destination node of e */
            if src(e) ∈ N_i and dst(e) ∉ N_i then
                add dst(e) to N_i
            end if
            if src(e) ∉ N_i and dst(e) ∈ N_i then
                add src(e) to N_i
            end if
        end for
    end for
    for e ∈ E do
        if src(e) ∈ N_i or dst(e) ∈ N_i then
            add e to E_i
        end if
    end for
end for
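Under the assumption that the graph is given as a node collection and a list of (src, dst) edge pairs, the algorithm above can be transcribed into Python as follows. This is a sketch: the node partitioning is a simple contiguous split, and each of the l rounds grows the sub-graph by exactly one hop, matching the l-hop supporting-node expansion described in Section VI-A.

```python
# Sketch of the graph cut: partition the nodes into n sample sets, then for
# each set add l hops of neighbors as supporting nodes and collect the
# incident edges. Graph representation (node list + edge-pair list) is an
# assumption for illustration, not the paper's actual data structures.

def graph_cut(nodes, edges, l, n):
    nodes = list(nodes)
    size = -(-len(nodes) // n)                 # ceil(|N| / n)
    samples = [set(nodes[i * size:(i + 1) * size]) for i in range(n)]
    subgraphs = []
    for S in samples:
        Ni = set(S)                            # sample nodes
        for _ in range(l):                     # one hop of supporting nodes
            grown = set(Ni)                    # per model layer
            for src, dst in edges:
                if src in Ni and dst not in Ni:
                    grown.add(dst)
                if dst in Ni and src not in Ni:
                    grown.add(src)
            Ni = grown
        Ei = [(s, d) for s, d in edges if s in Ni or d in Ni]
        subgraphs.append((Ni, Ei))
    return subgraphs, samples

# Chain 0 -> 1 -> 2 -> 3 cut into two sub-graphs with 1-hop support:
subgraphs, samples = graph_cut([0, 1, 2, 3],
                               [(0, 1), (1, 2), (2, 3)], l=1, n=2)
```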