Advanced Graph-Based Deep Learning for Probabilistic Type Inference
FANGKE YE, Georgia Institute of Technology, USA
JISHENG ZHAO, Georgia Institute of Technology, USA
VIVEK SARKAR, Georgia Institute of Technology, USA

Dynamically typed languages such as JavaScript and Python have emerged as the most popular programming languages in use today. However, when possible to do so, there are also important benefits that accrue from including static type annotations in dynamically typed programs, e.g., improved documentation, improved static analysis of program errors, and improved code optimization. This approach to gradual typing is exemplified by the TypeScript programming system, which allows programmers to specify partially typed programs and then uses static analysis to infer as many remaining types as possible. However, in general, static type inference is unable to infer all types in a program; and, in practice, the effectiveness of static type inference depends on the complexity of the program's structure and the initial types specified by the programmer. As a result, there is a strong motivation for new approaches that can advance the state of the art in statically predicting types in dynamically typed programs, and that do so with acceptable performance for use in interactive programming environments.

Previous work has demonstrated the promise of probabilistic type analysis techniques that use deep learning methods such as recurrent neural networks and graph neural networks (GNNs) to predict types for variable declarations and occurrences. In this paper, we advance past work by introducing a range of graph-based deep learning models that operate on a novel type flow graph (TFG) representation. The TFG represents an input program's elements as graph nodes connected with syntax edges and over-approximated data flow edges, and our GNN models are trained to predict the type labels in the TFG for a given input program.

We study different design choices for our GNN-based type inference system for the 100 most common types in our evaluation corpus, and show that our best GNN configuration for accuracy (R-GNN-NS-CTX) achieves a top-1 accuracy of 87.76%. This outperforms the two most closely related deep learning type inference approaches from past work: DeepTyper, with a top-1 accuracy of 84.62%, and LambdaNet, with a top-1 accuracy of 79.45%. Stated differently, the error (100% minus accuracy) of R-GNN-NS-CTX is 0.80x that of DeepTyper and 0.60x that of LambdaNet. Further, the average inference throughput of R-GNN-NS-CTX is 353.8 files/second, compared to 186.7 files/second for DeepTyper and 1,050.3 files/second for LambdaNet. If inference throughput is a higher priority, then the recommended model from our approach is the next best GNN configuration from the perspective of accuracy (R-GNN-NS), which achieves a top-1 accuracy of 86.89% and an average inference throughput of 1,303.9 files/second. In summary, our work introduces advances in graph-based deep learning that yield superior accuracy and performance compared to past work on probabilistic type analysis, while also providing a range of GNN models that could be applicable in the future to other graph structures used in program analysis beyond the TFG.

Additional Key Words and Phrases: Static Analysis, Type Inference, Machine Learning

1 INTRODUCTION

Dynamically typed languages such as JavaScript and Python have emerged as some of the most popular programming languages in use today, as evidenced by their popularity ranks on GitHub (The State of the Octoverse: https://octoverse.github.com/). Compared with statically typed programs, dynamically typed programs are usually more succinct and easier to modify. However, when possible to do so, there are also important benefits that accrue from including static type annotations in dynamically typed programs, e.g., improved documentation and static analysis of program errors. The classical approach to address this problem is to perform static type inference, usually accomplished by applying control and data flow analyses to infer the relevant type lattices for expressions in a program. However, in general, static type inference is unable to infer all types in a dynamically typed program; and, in practice, the effectiveness of static type inference depends on the complexity of the program's structure, e.g., the use of implicitly polymorphic variables, object properties, and dynamic type evaluation. An alternate gradual typing approach is exemplified by the TypeScript programming system, which allows programmers to specify partially typed programs and then uses static analysis to infer as many remaining types as possible.
While an improvement over a fully static approach, gradual typing is also encumbered by the inherent challenges faced by static analysis. As a result, there is a strong motivation for new approaches that can advance the state of the art in statically predicting types in dynamically typed programs, and that do so with acceptable performance for use in interactive programming environments.

Previous work has demonstrated the promise of probabilistic type analysis techniques that use deep learning methods to predict types for variable declarations and occurrences. One approach from past work, DeepTyper, formulates type prediction as a sequence tagging problem and trains a recurrent neural network (RNN) model to solve that problem [Hellendoorn et al. 2018]. Another approach from past work, LambdaNet, employs an auxiliary analysis to infer type dependences between program elements and extract them into a graphical representation, on which it trains a graph neural network (GNN) model to perform type prediction [Wei et al. 2019]. The introduction of these approaches is a natural follow-on to work from the past two decades on applying machine learning to address a number of compilation-related problems, which has been enabled by the rapid growth of open source online code repositories such as the open-source projects hosted on GitHub. Examples of programming tasks that have been aided by the use of machine learning include property prediction, code search, anomaly detection, and code completion [Allamanis et al. 2018].

In this paper, we advance past work by introducing a GNN-based type inference system consisting of a range of GNN models operating on a novel type flow graph (TFG) representation. The TFG represents an input program's elements as graph nodes connected with syntax edges and over-approximated data flow edges, and our GNN models are trained to predict type labels from the TFG of a given input program.
The TFG is efficient to construct, and can be built via simple bottom-up traversals of the program's abstract syntax tree. Our GNN models learn to propagate type information on the graph and predict types for the graph nodes. We study different design choices for our GNN-based type inference system for the 100 most common types in our evaluation corpus, and show that our best GNN configuration for accuracy (R-GNN-NS-CTX) achieves a top-1 accuracy of 87.76%. This outperforms the two most closely related deep learning type inference approaches from past work: DeepTyper, with a top-1 accuracy of 84.62%, and LambdaNet, with a top-1 accuracy of 79.45%. Stated differently, the error (100% minus accuracy) of R-GNN-NS-CTX is 0.80x that of DeepTyper and 0.60x that of LambdaNet. Further, the average inference throughput of R-GNN-NS-CTX is 353.8 files/second, compared to 186.7 files/second for DeepTyper and 1,050.3 files/second for LambdaNet. If inference throughput is a higher priority, then the recommended model from our approach is the next best GNN configuration from the perspective of accuracy (R-GNN-NS), which achieves a top-1 accuracy of 86.89% and an average inference throughput of 1,303.9 files/second. Thus, our work introduces advances in graph-based deep learning that yield superior accuracy and performance compared to past work on probabilistic type analysis, while also providing a range of GNN models that could be applicable in the future to other graph structures used in program analysis beyond the TFG.

A summary of the contributions of this paper is as follows:
• We propose a probabilistic analysis framework for type inference based on a family of GNNs.
• We introduce a lightweight type flow graph structure that efficiently captures type flow information from a program's abstract syntax tree.
• We propose several design choices for our family of GNNs, and evaluate their accuracy and efficiency as described above.
• We compare our approach with two state-of-the-art deep learning based approaches for probabilistic type inference (DeepTyper and LambdaNet) and show that our approach yields superior accuracy and performance compared to both past approaches.

The rest of this paper is organized as follows. Section 2 provides background on type inference and graph neural networks. Section 3 introduces our GNN-based type inference framework, including graph construction and the model architecture. In Section 4, we evaluate our approach with different design choices and compare it with previous work. Section 5 presents related work, and Section 6 concludes the paper.
2 BACKGROUND

2.1 Type Inference

The goal of type inference is to automatically deduce the type of an expression in a program. The deduced types may be partial or complete, depending on the simplicity of the target programming language and the robustness of the type inference analysis.

Static type inference algorithms infer type information by solving the type constraints collected from the target program. A type constraint represents the aggregation and reduction of type information between expressions in the program. This allows type inference to be formulated as a graph analysis problem.
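As a toy illustration of this graph view (our own sketch; the function names and the worklist algorithm are illustrative, not taken from this paper), known types can be propagated along type-flow constraint edges until a fixed point is reached:

```python
# Hypothetical sketch: nodes are program expressions, edges are type
# constraints ("the type of src flows to dst"), and a worklist pass
# propagates the known types to a fixed point.

def propagate_types(known, flows):
    """known: dict expr -> type; flows: list of (src, dst) constraints."""
    types = dict(known)
    worklist = list(known)
    while worklist:
        u = worklist.pop()
        for src, dst in flows:
            if src == u and dst not in types:
                types[dst] = types[u]   # dst receives u's type
                worklist.append(dst)
    return types

# 'var x = 42; var y = x;' yields constraints 42 -> x and x -> y,
# so both x and y are deduced to have type "number".
types = propagate_types({"42": "number"}, [("42", "x"), ("x", "y")])
```

Real constraint solvers additionally aggregate and reduce constraints over type lattices; this sketch only shows the graph-propagation skeleton.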
2.2 Graph Neural Networks

Graph neural networks (GNNs) are neural network models that directly operate on graph-structured data. A graph can be defined as G = (V, E), where V is the set of nodes and E is the set of edges. The neighborhood of a node v is defined as N(v) = {u ∈ V | (u, v) ∈ E}. A node v can have attributes represented as a feature vector x_v, and similarly, an edge (u, v) can have attributes represented as a feature vector e_{uv}. GNNs learn representations of graph structures by iteratively propagating the states of nodes to their neighbors and transforming the node states into output representations. Let h_v be the state vector of node v and K be the number of propagation steps; a general form of a GNN can be defined as:

h_v^{(0)} = x_v,  v ∈ V  (1)
m_{uv}^{(k)} = f_{message}^{(k)}(h_u^{(k-1)}, e_{uv}),  k = 1..K, v ∈ V, u ∈ N(v)  (2)
a_v^{(k)} = f_{aggregate}^{(k)}(h_v^{(k-1)}, {m_{uv}^{(k)} | u ∈ N(v)}),  k = 1..K, v ∈ V  (3)
h_v^{(k)} = f_{update}^{(k)}(h_v^{(k-1)}, a_v^{(k)}),  k = 1..K, v ∈ V  (4)

The state of each node h_v is initialized with its feature vector x_v in Eq. (1). At step k, a node receives messages from its neighbors through the corresponding edges. A message vector m_{uv}^{(k)} is generated from a neighbor's hidden state h_u^{(k-1)} at step (k-1) and the feature vector of the edge the message is passed through (i.e., e_{uv}), using the message function f_{message}^{(k)} in Eq. (2). The messages are then aggregated into a single vector using the aggregation function f_{aggregate}^{(k)} in Eq. (3). At the end of step k, the node's state h_v^{(k)} is updated from its state at the previous step h_v^{(k-1)} and the aggregated message vector a_v^{(k)} using the update function f_{update}^{(k)} in Eq. (4).

The functions f_{message}^{(k)}, f_{aggregate}^{(k)}, and f_{update}^{(k)} are neural networks containing trainable parameters. These functions can be the same and share parameters across different steps (i.e., f_*^{(1)} = f_*^{(2)} = ... = f_*^{(K)}), or they can be different functions with a separate set of parameters for each step. Following the taxonomy in [Wu et al. 2020], we use the term recurrent graph neural networks to refer to GNNs with shared functions and parameters for all steps, and convolutional graph neural networks to refer to GNNs with separate functions and parameters for different steps.

(Footnote: In this paper, we use the term propagation step to represent the computation in one iteration; a propagation step is also called a GNN layer in the literature. Although there exists a more general form that allows edge features to be updated during the propagation [Battaglia et al. 2018], the one we present here is sufficient to formulate all the GNN architectures we discuss in this paper.)

2.3 Machine Learning for Static Analysis

There has been a large body of work that leverages machine learning to assist static analysis. By their tasks, these approaches can be grouped into the following categories:
Program Property Prediction. In this class of tasks, the goal is to assign property labels to (part of) a program. The typical approach is to apply supervised learning to build a model that predicts the property labels from a program's representation. The program representations adopted by previous work include token sequences [Hellendoorn et al. 2018], abstract syntax trees [Alon et al. 2019; Raychev et al. 2015], and more complex graphs [Allamanis et al. 2017; Si et al. 2018].
Learning for Parametric Program Analysis. To trade off precision against cost, parametric analyses have been applied in real-world solutions [Oh et al. 2014; Zhang et al. 2013]. Unlike manual parameterization and heuristic parameter search, which rely on the expertise of users or heuristic developers, statistical learning approaches can learn from automatically generated data and have shown a good capacity for finding optimal parameter configurations, including the granularity of program abstraction [Liang and Naik 2011; Oh et al. 2015].
Program Specification Identification. This scenario focuses on mining program specifications from code corpora, including API specifications [Chibotaru et al. 2019; Murali et al. 2017], the behaviors of a library [Bastani et al. 2018], and code idioms [Allamanis and Sutton 2014]. These techniques can be applied to detect API misuses, perform code completion, etc.

In this paper, we focus on type inference, which can be categorized as a program property prediction problem. We investigate the approach of using a graph structure to represent the input program and building machine learning models to predict program properties. A GNN-based learning framework is designed to learn from graph-structured data extracted from code and to predict types for program elements.
3 GNN-BASED TYPE INFERENCE FRAMEWORK

This section describes our methodology, which uses a GNN-based machine learning framework to perform probabilistic type inference. Section 3.1 presents the problem statement and the proposed learning framework. Section 3.2 defines the graph we use as input to our GNN models. The design of the GNN architecture is introduced in Section 3.3.
3.1 Problem Formulation and Framework Overview

We formulate type inference as a graph node label prediction problem. The prediction is performed at the file level for better scalability and flexibility. Our GNN-based type inference framework is shown in Fig. 1. For each source file, we construct a graph whose nodes correspond to program elements in the file. The graph edges include syntax edges extracted from the abstract syntax tree and over-approximated data flow edges derived from a name-matching-based static analysis. A GNN model is trained to predict types for the graph nodes from a fixed type vocabulary. The target programming language we choose is JavaScript, and the unit of program element for prediction is the identifier, which covers variables, function parameters, and object properties in JavaScript. Following the methodology of previous work [Hellendoorn et al. 2018], we train the model on a dataset of TypeScript programs. The type labels are obtained from the TypeScript compiler, and the graphs are constructed from the code without using any information from existing type annotations.
Fig. 1. The graph neural network based type inference framework: (a) model training; (b) type inference.

3.2 Type Flow Graph

A data-driven approach relies on high-quality data containing features that are relevant to the target problem. For the type inference problem, the relevant features include the sources of type information (e.g., literals), the code patterns that have implications about types, and the paths of type propagation. Ideally, the input data to the learning model should at least cover those features. In this work, we define a type flow graph (TFG) that carries the features mentioned above by encoding syntactic and approximate semantic information in a program.

A TFG is a directed graph whose nodes represent program elements in the code and whose edges encode type-relevant relationships between the nodes. It is constructed from a program's abstract syntax tree (AST). We define the nodes and edges and describe how they can be extracted in the following paragraphs.
Graph Node. The nodes in a TFG represent the program elements that either carry types or provide hints about types. They are categorized into the following node types:
• Identifier node (IdentNode): represents an identifier token in a program and corresponds to an occurrence of a named element in the program, including variables, object properties, and functions.
• Token node (TokNode): represents a non-identifier token that appears as a leaf in an AST. Most token nodes in a program are literals.
• Expression node (ExprNode): represents an expression in a program, including binary and unary expressions, and function and variable declarations.
• Variable symbol node (VarSymNode): represents a variable and its occurrences within its live range.
• Object property node (ObjPropNode): represents an object property name in a program. It can be viewed as an over-approximation of an object property symbol.
• Context node (CtxNode): represents the context where an expression appears. Every CtxNode is associated with an expression node in the AST whose parent is a statement node.

Each node has an associated feature. An IdentNode's feature is its name, and a TokNode's feature is its token type. A CtxNode's feature is a (statement_type, child_name) pair, where statement_type is the type of a statement node in the AST and child_name is the name of a child node of that statement node (e.g., for an if-statement node with an expression child node serving as its condition, the expression node in the TFG will be connected to a CtxNode with the feature (IfStmt, condition)). The features of the other node types are their node types.

IdentNodes and ExprNodes are the units for type prediction. TokNodes serve as sources of type information since they contain literals that have known types. A VarSymNode acts as a "hub" that helps exchange type information between different occurrences of the same variable, and an ObjPropNode has a similar functionality for object properties that share the same name. A CtxNode provides hints about the type of an expression based on its role in the enclosing statement.
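The node categories and their features described above can be summarized in a small sketch (the class and function names here are our own illustration, not the paper's implementation):

```python
# Sketch of the six TFG node kinds and their kind-specific features,
# as described in the text. Names are illustrative, not from the paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class TFGNode:
    kind: str       # IdentNode, TokNode, ExprNode, VarSymNode, ObjPropNode, CtxNode
    feature: tuple  # kind-specific feature used for embedding lookup


def make_node(kind, **info):
    if kind == "IdentNode":            # feature: the identifier's name
        return TFGNode(kind, (info["name"],))
    if kind == "TokNode":              # feature: the token's type
        return TFGNode(kind, (info["token_type"],))
    if kind == "CtxNode":              # feature: (statement_type, child_name)
        return TFGNode(kind, (info["statement_type"], info["child_name"]))
    # ExprNode / VarSymNode / ObjPropNode: feature is just the node kind
    return TFGNode(kind, (kind,))


# the if-condition example from the text
ctx = make_node("CtxNode", statement_type="IfStmt", child_name="condition")
```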
Graph Edge. The functionality of graph edges is to establish information flow paths between the nodes. To enable efficient information exchange, we make all edges bi-directional, and each edge type described below has a corresponding backward edge type in order to distinguish between forward and backward edges. The edges are categorized into the following types:
• Expression edge (ExpEdge): connects an IdentNode/TokNode/ExprNode to an ExprNode. The source and the destination of this edge should correspond to a child-parent pair in the AST.
• Variable symbol edge (VarSymEdge): connects an IdentNode to a VarSymNode. The IdentNode must represent a variable whose symbol corresponds to the VarSymNode.
• Object property edge (ObjPropEdge): connects an IdentNode to an ObjPropNode. The IdentNode must represent an object property whose name corresponds to the ObjPropNode.
• Return edge (RetEdge): connects an ExprNode representing a returned expression to the ExprNode representing its enclosing function declaration.
• Call edge (CallEdge): can connect an ExprNode representing a returned expression to an ExprNode representing a call expression that is a potential caller of the enclosing function of that returned expression. It can also connect an ExprNode representing an argument in a call expression to an ExprNode representing a parameter of a function that is a potential callee of that call expression.
• Context edge (CtxEdge): connects a CtxNode to the ExprNode it is associated with.

Each edge has an associated feature. The feature of an ExpEdge, presented as (c, p), and of its backward edge, (p, c), is an (expression_type, child_name, direction) tuple, where expression_type is the expression type of p, child_name is the child name of c in the AST, and direction is either f for a forward edge or b for a backward edge. For the other edge types, their features are the edge types.

Fig. 2 gives an example of a TFG. Fig. 2a shows a piece of code, Fig. 2b displays its AST, and the corresponding TFG is shown in Fig. 2c. To simplify the presentation, the edge features for ExpEdges are not shown in this example.

    function foo(a) {
      if (a.val)
        x = "Hello";
      return x;
    }
    r.val = true;
    let c = foo(r);

Fig. 2. An example of TFG: (a) code example; (b) abstract syntax tree; (c) type flow graph.

It can be seen that ExpEdges establish type information flow paths within expressions. VarSymEdges and ObjPropEdges allow type information to be exchanged between the different occurrences of the same variable and object property. RetEdges allow type information to flow between a function declaration and its return statements. CallEdges enable information exchange between call sites and their potential callees. CtxEdges allow expression nodes to get type hints from their context within a statement.

(Footnote: This is equivalent to applying a flow-insensitive and context-insensitive alias analysis for object properties, which is over-approximated since it only checks whether properties' names are the same.)
Graph Extraction. Graph extraction is based on a bottom-up traversal of an input program's AST. For each AST node that matches a TFG node pattern, the graph extractor creates the corresponding TFG node and edges based on the definitions described above. For CallEdge creation, the graph extractor runs a lightweight pre-pass that scans the program file to collect function declaration information for later use.
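The bottom-up extraction can be sketched as follows. This is our own simplification covering only ExpEdges; the AST encoding (nested dicts with `id`, `kind`, `children`) and the `Expr` kind-suffix convention are assumptions for illustration, not the paper's actual extractor.

```python
# Minimal sketch of bottom-up TFG extraction: walk an AST post-order and,
# for every child-parent pair whose parent is an expression, emit a
# forward ExpEdge plus its corresponding backward edge.

def extract_exp_edges(ast):
    """ast: nested dict {'id': ..., 'kind': ..., 'children': [...]}."""
    edges = []
    for child in ast.get("children", []):
        edges += extract_exp_edges(child)          # visit children first
        if ast["kind"].endswith("Expr"):           # parent is an expression
            edges.append((child["id"], ast["id"], "ExpEdge"))
            edges.append((ast["id"], child["id"], "ExpEdge_bwd"))
    return edges


# x + 1  =>  a BinExpr node with children x and 1
ast = {"id": "e0", "kind": "BinExpr", "children": [
    {"id": "x", "kind": "Ident"}, {"id": "1", "kind": "NumLit"}]}
edges = extract_exp_edges(ast)   # two forward edges + two backward edges
```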
3.3 Model Architecture

Here we introduce our machine learning model design for static type inference. The learning model is presented as a customizable system composed of a core GNN model with different building blocks and several other components that convert node/edge features into vector representations. As shown in Fig. 3, the learning system contains six components organized into three phases: (1) feature embedding for nodes and edges; (2) GNN message propagation; and (3) label prediction.
Fig. 3. Overall model architecture: (1) node feature embedding; (2) edge feature embedding; (3) message; (4) aggregation; (5) update; (6) prediction.
The feature embedding phase (components 1 and 2) translates node/edge features in the type flow graph into vector representations (i.e., embedding vectors). For edge feature embedding and the basic design of node feature embedding (shown in Fig. 4a), we assign a trainable vector to each feature in the vocabulary. Section 3.3.2 discusses several alternative methods for embedding identifier names, which are the features of IdentNodes.

(Footnote: The call graph construction uses the same name matching mechanism as that for object properties and is thus also over-approximated.)
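The embedding lookup of components 1 and 2 can be sketched as a simple table from features to trainable vectors (a hypothetical sketch with random initialization standing in for learned parameters; the fallback vector for unseen features is our own assumption):

```python
# Sketch of a feature embedding table: each feature in a fixed vocabulary
# maps to a (trainable) vector; unseen features share an "unknown" vector.
import random


class EmbeddingTable:
    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.table = {f: [rng.uniform(-1, 1) for _ in range(dim)] for f in vocab}
        self.unk = [0.0] * dim          # shared fallback for unseen features

    def __call__(self, feature):
        return self.table.get(feature, self.unk)


emb = EmbeddingTable({"IfStmt", "NumLit"}, dim=4)
vec = emb("NumLit")                     # a 4-dimensional vector
```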
In the GNN message propagation phase, the node feature embedding vectors are used as the nodes' initial states, and the edge embedding vectors are used to generate messages on the corresponding edges. The GNN iteratively propagates and updates node states for K steps using the message, aggregation, and update functions (components 3, 4, and 5).

In the label prediction phase (component 6), we use a fully-connected layer followed by a softmax function to transform a node's final state after K steps of propagation into a probability distribution over candidate types, and output the most probable type as the inferred type for the node.

Identifier names can sometimes provide hints about the types of associated program elements. For example, a variable named "num" usually has an integral type. We design our model to utilize such type information by treating identifier names as node features of the corresponding IdentNodes and encoding them into real-valued embedding vectors, which are used as the initial states of those IdentNodes. The GNN then iteratively propagates the type information encoded in the state vectors to other nodes. Fig. 4 shows four methods of encoding identifier names that are adopted by our model. Fig. 4a is a basic method that uses a single embedding layer mapping each unique name to a trainable parameter vector. Figs. 4b and 4c demonstrate two techniques to improve the name encoding, which are described in detail in the following paragraphs. The last method, shown in Fig. 4d, is the combination of the two techniques.

Fig. 4. IdentNode state initialization.
Name Segmentation. The simple embedding method in Fig. 4a requires a fixed-size vocabulary of identifier names. However, the size of such a vocabulary can be huge due to the diversity of identifier names in programs, making it difficult to learn the information encoded in the names. Previous work has shown that segmenting identifier names can help reduce the vocabulary size and boost the performance of machine learning models in code modeling [Allamanis et al. 2017; Karampatsis et al. 2020]. Similar to those prior works, we introduce a name segmentation mechanism, shown in Fig. 4b. First, an identifier name is split into several segments. Then each segment is embedded into a vector through an embedding layer. Finally, a bi-directional RNN is applied to the sequence of segment embedding vectors of an identifier name to generate a single encoding vector, which is used as the initial state vector of the IdentNode corresponding to the identifier name. We perform name segmentation by first splitting an identifier name into subtokens according to the CamelCase and snake_case patterns, and then further segmenting the subtokens with byte pair encoding [Gage 1994; Sennrich et al. 2015].
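The two-stage segmentation can be sketched as below. The CamelCase/snake_case split is straightforward; the byte pair encoding stage is replaced here by a toy fixed merge table, since a real BPE vocabulary is learned from the corpus (so the merge rules shown are assumptions for illustration):

```python
# Sketch of the two-stage name segmentation described above.
import re


def split_subtokens(name):
    """Split on snake_case, then on CamelCase boundaries (incl. acronyms)."""
    out = []
    for part in name.split("_"):
        out += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+", part)
    return [s.lower() for s in out if s]


def bpe(token, merges):
    """Toy greedy BPE: repeatedly merge the first adjacent pair in `merges`."""
    seq = list(token)
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 1):
            if (seq[i], seq[i + 1]) in merges:
                seq[i:i + 2] = [seq[i] + seq[i + 1]]
                changed = True
                break
    return seq


segments = split_subtokens("getHTTPResponse_count")
# -> ["get", "http", "response", "count"]
```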
Contextualized Initialization. The context (surrounding tokens) of an identifier name can also provide useful type information, and thus it could be beneficial to encode it into the initial state of an IdentNode. Although propagation on the type flow graph can help gather information from the context, the size of such a context is limited by the number of GNN propagation steps, which is usually small. As shown in Fig. 4c, to encode information from a broader range of context for IdentNodes, we introduce a bi-directional RNN layer that takes the sequence of code token embedding vectors as input and generates an output vector for each token. The output vectors that correspond to identifiers are used as the initial states of IdentNodes. We call this RNN layer the contextual layer.

The GNN architecture we designed follows the general framework described in Eqs. (1) to (4). We describe the instantiation of the framework in the following paragraphs. Unless otherwise specified, we define the functions below as components of recurrent GNNs and thus do not distinguish between parameters in different propagation steps.
Message Function . The functionality of the message function is to produce a message vector m uv ∈ R d h that passed from u to its neighbor v that encodes the type information of u . In TFG,we introduce a feature for each edge that is embedded to vector e uv ∈ R d e . The edge featureembedding vector is integrated in to the message vector by the following message function: m ( k ) uv = W MO (cid:16)(cid:16) W MI h ( k − ) u + b MI (cid:17) ⊙ e uv (cid:17) + b MO v ∈ V , u ∈ N ( v ) (5) W MI ∈ R d e × d h , W MO ∈ R d h × d e , b MI ∈ R d e , and b MO ∈ R d h are learnable parameters; ⊙ stands forelement-wise multiplication.Alternatively, we can ignore edge features in the TFG and use the following identity function asthe message function: m ( k ) uv = h ( k − ) u v ∈ V , u ∈ N ( v ) (6) Aggregation Function . After the calculation of message vector m uv for all u ∈ N ( v ) , theaggregation function aggregates those message vectors for v into a vector a v ∈ R d h . A simplesolution is to apply a mean aggregation shown below: a ( k ) v = |N ( v )| (cid:213) u ∈N( v ) m ( k ) uv v ∈ V (7) dvanced Graph-Based Deep Learning for Probabilistic Type Inference 1:11 During type information propagation on the graph, it is likely the neighbors do not pass equallyimportant/useful information to a node. For example, the type information coming from a literalnode may be more informative than that coming from an identifier node. Consider this simpleexample: x = "hello" + y . The string literal node "hello" presents stronger information thanidentifier node y . Thus it could be beneficial to assign different weights to messages from differentneighbors during aggregation.To this end, we introduce an alternative aggregation process that contains an attention mechanismto weigh the importance of different incoming messages. It is adapted from the one in graph attentionnetworks [Velickovic et al. 
2018] and is shown in the following equations: α ( k ) uv = exp (cid:16) LeakyReLU (cid:16) w ⊤ (cid:104) W QK h ( k − ) v ; W QK m ( k ) uv (cid:105) (cid:17)(cid:17)(cid:205) u ′ ∈N( v ) exp (cid:16) LeakyReLU (cid:16) w ⊤ (cid:104) W QK h ( k − ) v ; W QK m ( k ) u ′ v (cid:105) (cid:17)(cid:17) v ∈ V , u ∈ N ( v ) (8) a ( k ) v = (cid:213) u ∈N( v ) α ( k ) uv W V m ( k ) uv v ∈ V (9) W QK ∈ R d h × d h , W V ∈ R d h × d h and w ∈ R d h are parameters to be learned. LeakyReLU is anon-linear activation function [Xu et al. 2015]. Update Function . In type inference, the type information could be propagated many stepsacross multiple expressions or even procedure boundaries. To enable effective learning from thepotential long-term dependencies, our recurrent GNN architectures integrate the gated recurrentunit (GRU) [Cho et al. 2014] as our update function. This update function has also been adopted inother recurrent GNN architectures [Li et al. 2016]. The update function can be written as: h ( k ) v = GRU (cid:16) a ( k ) v , h ( k − ) v (cid:17) v ∈ V (10)Our GNN architecture can also be configured to be a convolutional GNN, i.e., parameters infunctions defined above do not share between propagation steps. In this case, the update functionwill not be a recurrent unit, but the one specified below: h ( k ) v = ReLU (cid:16) W ( k ) h h ( k − ) v + b ( k ) v (cid:17) v ∈ V (11) h ( k ) v ∈ R d h × d h and b ( k ) v ∈ R d h are learnable parameters of step k .As an exception, we do not update the states for TokNode s and
CtxNodes (i.e., their update function is the identity function), because the type information they contain is known in advance and thus no update is needed.
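A minimal NumPy sketch of one propagation step may help make Eqs. (5)-(9) concrete. The toy dimensions, randomly initialized parameters, and two-neighbor graph fragment are assumptions for illustration only; the actual models are trained end-to-end in PyTorch with the sizes given in the implementation details.

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_e = 8, 4  # toy sizes; the paper uses 128-dim node states and 256-dim edge embeddings

# Hypothetical learnable parameters of the message function (Eq. (5)).
W_MI = rng.normal(size=(d_e, d_h)) * 0.1
b_MI = np.zeros(d_e)
W_MO = rng.normal(size=(d_h, d_e)) * 0.1
b_MO = np.zeros(d_h)

def message(h_u, e_uv):
    """m_uv = W_MO((W_MI h_u + b_MI) ⊙ e_uv) + b_MO, as in Eq. (5)."""
    return W_MO @ ((W_MI @ h_u + b_MI) * e_uv) + b_MO

def aggregate_mean(messages):
    """Mean aggregation over incoming messages, as in Eq. (7)."""
    return np.mean(messages, axis=0)

# Attention-based alternative, Eqs. (8)-(9).
W_QK = rng.normal(size=(d_h, d_h)) * 0.1
W_V = rng.normal(size=(d_h, d_h)) * 0.1
w = rng.normal(size=2 * d_h)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def aggregate_attention(h_v, messages):
    # One attention score per neighbor, normalized with a softmax.
    scores = np.array([leaky_relu(w @ np.concatenate([W_QK @ h_v, W_QK @ m]))
                       for m in messages])
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    return sum(a * (W_V @ m) for a, m in zip(alpha, messages))

# Toy graph fragment: node v receives messages from neighbors u1 and u2.
h = {name: rng.normal(size=d_h) for name in ("u1", "u2", "v")}
e = {("u1", "v"): rng.normal(size=d_e), ("u2", "v"): rng.normal(size=d_e)}
msgs = [message(h[u], e[(u, "v")]) for u in ("u1", "u2")]
a_mean = aggregate_mean(msgs)
a_attn = aggregate_attention(h["v"], msgs)
```

Either aggregated vector would then be fed to the update function (the GRU of Eq. (10), or the per-step feed-forward update of Eq. (11)) to produce the next node state.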
In this section, we evaluate our GNN-based type inference system to answer the following research questions:

• RQ.1: What are the impacts of our GNN model design choices on the accuracy of type inference?
• RQ.2: How efficient is our GNN-based type inference?
• RQ.3: How does our approach compare with state-of-the-art deep learning type inference approaches?

Experimental Platform. We conducted our experiments on a server with two Intel Xeon E5-2623 v4 processors, 128GB of RAM, and an NVIDIA Tesla V100 GPU with 16GB of graphics memory. The neural network models were trained and evaluated on the GPU.
Dataset and Preprocessing. We constructed our dataset from a corpus of popular TypeScript projects on GitHub that were also used in past work on type inference for JavaScript [Hellendoorn et al. 2018]. We obtained the projects from that set that are still publicly available and, following the methodology in [Hellendoorn et al. 2018], removed all files containing more than 5,000 tokens so as to enable efficient batching. The resulting set of projects was then randomly split into training, validation, and test subsets, which respectively contain 789, 99, and 99 projects, or 74,801, 3,729, and 3,838 files.

Our tool chain was used to construct a type flow graph (TFG) for each file. Fig. 5 shows the distributions of the TFG sizes in the training, validation, and test sets, in terms of numbers of nodes and edges. Note that TypeScript type annotations are not used when constructing the TFGs.
Fig. 5. Distributions of the numbers of nodes/edges in a graph for the training, validation, and test sets, which contain 74,801, 3,729, and 3,838 files/graphs respectively, shown as box-and-whiskers plots. Each box spans values from the first quartile (Q1) to the third quartile (Q3), with the median shown as a horizontal line in the box. The endpoint whiskers indicate the values Q1 − 1.5 × (Q3 − Q1) and Q3 + 1.5 × (Q3 − Q1) for the respective datasets.

To identify type labels in the training set, we used the TypeScript compiler to extract the types for all nodes in our TFGs, including identifier nodes and other expression nodes. Since TypeScript permits partial type annotations, these type labels can include types provided by the developer as well as types inferred by the TypeScript compiler. The extracted types were then preprocessed with the following steps: (1) function types are mapped to their return types; (2) the type parameters in parametric types are removed (e.g., Array<...> becomes Array); (3) literal types are mapped to their base types (e.g., the literal type "a"|"b" becomes string); (4) types with names that consist of a single character are filtered out, as they usually represent type parameters (e.g., T, U, V). A type vocabulary was built after this preprocessing, and the models were trained to predict types from this vocabulary. Similar to [Wei et al. 2019], we picked the 100 most frequent types from the training set, excluding the any type. We did not use a larger type vocabulary because our focus is on predicting types that are used across different projects rather than types defined locally in a project; prediction of locally defined types is beyond the scope of this paper.

We constructed fixed-size vocabularies for graph edge features, non-name node features, and identifier names (or name segments), so that they can be embedded as trainable vectors in the neural networks. We included all edge features in the training set in the edge vocabulary; the resulting edge feature vocabulary has a size of 210. We constructed a vocabulary of non-name features in the same manner; its size is 184. For names, we constructed a vocabulary of full names when we do not perform name segmentation, and a vocabulary of name segments otherwise. The full name vocabulary contains the 10,000 most frequent identifier names in the training set, plus a special UNKNOWN token to represent out-of-vocabulary names. For name segmentation, we generated the segment vocabulary by performing up to 10,000 merges during byte pair encoding on the training set.

When performing inference on files in the validation and test sets, we followed the methodology adopted by prior work [Hellendoorn et al. 2018; Wei et al. 2019] and only used labels available in type annotations provided by the developer (ignoring the types inferred by the TypeScript compiler). Doing so only assigns labels to identifier nodes.
We then repeated steps (1) to (4) listed above for these extracted types in the validation and test sets. We also removed edges with out-of-vocabulary edge features from the type flow graphs built from the validation and test sets.
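The label preprocessing steps (1)-(4) and the identifier-name segmentation described above can be sketched as follows. This is an illustrative simplification: the function names and regex-based rewrites are our own assumptions, and a greedy longest-match split stands in for the learned byte-pair-encoding merges.

```python
import re

def normalize_type(t):
    """Hypothetical sketch of preprocessing rules (1)-(4); not the paper's actual pipeline."""
    if "=>" in t:                       # (1) function type -> return type
        t = t.split("=>")[-1].strip()
    t = re.sub(r"<.*>", "", t).strip()  # (2) drop type parameters: Array<...> -> Array
    if re.fullmatch(r'(\s*"[^"]*"\s*\|?)+', t):
        t = "string"                    # (3) string literal types -> base type
    if len(t) == 1:                     # (4) single-character names are likely type params
        return None
    return t

def segment_identifier(name, vocab):
    """Greedy longest-match split of an identifier into known segments
    (a stand-in for BPE over identifier names)."""
    segments, i = [], 0
    while i < len(name):
        for j in range(len(name), i, -1):
            piece = name[i:j].lower()
            if piece in vocab or j == i + 1:  # fall back to a single character
                segments.append(piece)
                i = j
                break
    return segments
```

For example, `normalize_type("(x: number) => string")` yields `"string"`, and `segment_identifier("getFileName", {"get", "file", "name"})` yields `["get", "file", "name"]`.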
Evaluated Model Designs. As mentioned in Section 3.3.3, we studied different GNN model design choices, including the setup for node state initialization and the GNN architecture. The motivation for evaluating different model designs is to gain insight into which is the best learning system for type inference, and to answer RQ.1: "What are the impacts of our GNN model design choices on the accuracy of type inference?"

Table 1 lists the various GNN designs (along with their different design options) that are encompassed by our evaluation.
R-GNN is our baseline model: a recurrent GNN (i.e., a GNN that shares parameters across propagation steps) that uses the GRU update function and mean message aggregation. It takes in edge features but does not perform name segmentation or contextualized initialization. C-GNN is a variant of R-GNN that does not share parameters across steps and uses the update function defined in Eq. (11). R-GAT extends R-GNN with the attention-based message aggregation function defined in Eqs. (8) and (9). R-GNN_NS introduces name segmentation into the node initial state generation phase of R-GNN. Similarly, R-GNN_CTX introduces the contextual layer to R-GNN, and R-GNN_NS-CTX adds both name segmentation and the contextual layer to R-GNN. We observe that none of these eight GNN variants has been studied before in past work on type inference.

All of the GNN designs described above are based on the TFG with edge features. To evaluate the impact of not including edge features in TFGs, and the effectiveness of the attention mechanism in such situations, we built two more models that do not take edge features as input: R-GNN_NEF and R-GAT_NEF, which correspond to R-GNN and R-GAT respectively.
Table 1. Designs of GNN Architectures.

                   C-GNN          R-GNN      R-GAT      R-GNN_NS   R-GNN_CTX  R-GNN_NS-CTX  R-GNN_NEF  R-GAT_NEF
GNN Type           Convolutional  Recurrent  Recurrent  Recurrent  Recurrent  Recurrent     Recurrent  Recurrent
Attention                                    ✓                                                         ✓
Name Segmentation                                       ✓                     ✓
Contextual Layer                                                   ✓          ✓
Edge Features      ✓              ✓          ✓          ✓          ✓          ✓
Evaluation Metrics. We evaluated the models in terms of their type prediction accuracy and efficiency. The accuracy metrics are top-1 and top-5 accuracy for predicting types in the test set. These metrics are obtained by having the model output the top-1 and top-5 most probable types for each prediction location and then computing how often the ground truth type label matches the output type(s). We divided the types in the vocabulary into two categories, the 10 most frequent types in the training set and the other 90 types, and measured top-1 and top-5 accuracy for all 100 types and for each of the two categories separately. The efficiency metrics we used are the number of parameters in the model and its inference throughput (files/sec).
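The top-1/top-5 metrics can be sketched as follows; the helper name and the toy logits and labels are our own illustration, not the paper's evaluation harness.

```python
import numpy as np

def top_k_accuracy(logits, labels, k):
    """Fraction of prediction locations whose true label is among the k most probable types."""
    topk = np.argsort(-logits, axis=1)[:, :k]  # indices of the k highest scores per row
    return float(np.mean([y in row for y, row in zip(labels, topk)]))

# Toy example: a 3-type vocabulary and 3 prediction locations (values hypothetical).
logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6]])
labels = np.array([1, 2, 0])
```

Here only the first location is correct at top-1 (accuracy 1/3), while the first and third are correct at top-2 (accuracy 2/3).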
Implementation Details. We used the TypeScript compiler to extract type labels and generate ASTs for the source code files in the dataset; our TFGs were constructed from those ASTs. The neural network architectures used in our experiments were built using PyTorch [Paszke et al. 2019]. In all experiments, we fixed the batch size to 64 and trained the models for 60 epochs using the AdamW optimizer [Loshchilov and Hutter 2019] with a learning rate of 10^−. We evaluated the model on the validation set after each epoch and selected the model from the epoch with the lowest validation loss for evaluation on the test set. We used 128-dimensional node type/name embeddings, 32-dimensional name segment embeddings, 32-dimensional RNN hidden states for name segment sequence encoding, 256-dimensional edge type embeddings, 128-dimensional node states, and 128-dimensional RNN hidden states for both directions in the contextual layer.

RQ.1: What are the impacts of our GNN model design choices on the accuracy of type inference?

As discussed in Section 2.2, our GNN-based type inference system allows multiple design choices across its different components. We studied how they affect the accuracy of type prediction by evaluating various combinations of those design choices on our dataset.
The GNN architecture includes a K-step iterative process of propagating information over the input graph, which means a node can gather type information from nodes that are at most K hops away. Intuitively, a larger value of K would likely lead to higher type prediction accuracy, since more information can be utilized for making the prediction; however, as K increases, the model needs more computational resources. To select an appropriate value of K, we conducted an empirical study: we trained the R-GNN model with values of K ranging from 2 to 12 and measured the resulting accuracy on the test set. The results are shown in Fig. 6. As can be seen from the figure, the top-1 accuracy increases monotonically as K gets larger, but begins to saturate around K = 8. Based on this observation, we chose this value of K for the remaining experiments.

Fig. 6. The impact of the number of propagation steps (K) on type prediction accuracy (%) for the R-GNN model.

A recurrent GNN architecture uses the same set of functions (message, aggregation, and update functions) and parameters at every step to aggregate messages from neighbors and update node states, while a convolutional GNN does not share parameters or even functions across steps. As a result, recurrent GNNs can learn to propagate the same kind of information between nodes through several time steps, whereas convolutional GNNs usually learn a different kind of node representation at each step. We hypothesize that recurrent GNN architectures fit the type inference problem better because they can potentially model type propagation rules on TFGs. To validate this hypothesis, we compared the performance of two architectures: R-GNN, our baseline recurrent GNN, and its convolutional variant, C-GNN. As shown in Part 1 of Table 2, R-GNN outperforms C-GNN in all 6 accuracy metrics, indicating that our hypothesis holds for the datasets used in our evaluation. However, since the difference in the accuracy metrics is small in some cases, further study may be warranted to better understand the capability of both architectures on type inference as part of future work.
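The claim above that K propagation steps bound a node's receptive field to its K-hop neighborhood can be illustrated with a toy breadth-first search; the graph, node names, and helper function here are hypothetical.

```python
from collections import deque

def k_hop_sources(adj, target, k):
    """Nodes whose information can reach `target` within k message-passing steps."""
    seen, frontier = {target}, deque([(target, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:          # information from further away needs more steps
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

# Toy undirected chain a-b-c-d-e: with K = 2 steps, node 'a' can only
# gather information from a, b, and c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
```

With K = 4, information from the entire chain reaches node 'a', which mirrors why accuracy saturates once K covers the distances over which type information actually flows in a TFG.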
The attention mechanism in message aggregation allows different importances to be assigned to the messages a node receives from its neighborhood, and can thus help filter out noise or irrelevant information coming from the neighbors. Here we study its effectiveness when applied to our type inference framework. Based on the results shown in Table 2 parts 1 and 2, the accuracy of R-GNN, which applies mean aggregation (Eq. (7)), is better than that of R-GAT, which applies attention-based aggregation (Eqs. (8) and (9)), across all accuracy metrics. This implies that attention did not provide a benefit for our framework on the inference problem studied in our evaluation.

Since the messages are the result of combining neighbors' node states and edge features (see Eq. (5)), the edge features may already encode enough information about a message's importance, which could reduce the usefulness of an attention mechanism. To determine the impact of edge features and their interaction with the attention mechanism, we also compared the accuracy of models that do not take edge features as input. As shown in Table 2 part 2, the two models R-GNN_NEF and R-GAT_NEF, which correspond to R-GNN and R-GAT but with no edge feature inputs, yield worse accuracy than their counterparts, indicating that edge features play an important role in the message aggregation process. However, R-GAT_NEF still fails to show a significant advantage over R-GNN_NEF. This indicates that the attention mechanism may not be effective on our TFG structure, whether or not edge features are present.
There is a general belief that identifier names convey programmers' intuition related to types. The IdentNode state initialization techniques discussed in Section 3.3.2 explore the potential of using identifier names and their context to help with type prediction. There are two improvements in node state initialization relative to the baseline method used in R-GNN, which simply assigns a trainable embedding vector to a name and uses it as the initial state for the corresponding identifier nodes.

Table 2. The accuracy comparison among various GNN designs and prior work. Each row reports top-1 and top-5 accuracy over all 100 types, over the top-10 most frequent types, and over the other 90 types. Part 1: recurrent GNN vs. convolutional GNN (C-GNN, R-GNN). Part 2: the impact of the attention mechanism and edge features (R-GAT, R-GNN_NEF, R-GAT_NEF). Part 3: the impact of name segmentation and the contextual layer (R-GNN_NS, R-GNN_CTX, R-GNN_NS-CTX). Part 4: state-of-the-art deep learning type inference (DeepTyper, LambdaNet).
R-GNN_NS incorporates the first improvement, name segmentation, into R-GNN, and R-GNN_CTX adds the second improvement, the contextual layer, to R-GNN. R-GNN_NS-CTX combines the two improvements. In Table 2 part 3, we compare the accuracy of the three improved models with R-GNN. All three models produced higher accuracy than R-GNN, and the model that combines the two techniques (R-GNN_NS-CTX) gave better accuracy than the two models that incorporate only one of them (R-GNN_NS and R-GNN_CTX). This indicates that the combination of name segmentation and the contextual layer can effectively capture type-relevant information and produce high-quality name embeddings to serve as IdentNode initial states, which benefits GNN-based type prediction.

RQ.2: How efficient is our GNN-based type inference?

For our approach to be applicable to real-world interactive programming environments, it is important for our techniques to be efficient in practice. To evaluate the efficiency of the models in our framework, we measured their sizes (i.e., the number of model parameters) and inference throughputs. The results are shown in Fig. 7. Here we only compare the models that take in edge features.

The number of parameters in a model affects its applicability in several ways: a larger model costs more memory to load and can sometimes bring more computational workload. The blue bars in Fig. 7a show the number of parameters of our GNN models.
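The size gap between parameter-sharing and non-sharing designs can be made concrete with a toy count of the message-function parameters from Eq. (5). This is a sketch under assumptions: d_h = 128 and d_e = 256 are taken from the implementation details, K = 8 from the saturation study, and only the message function is counted (the real models also have embedding, update, and output parameters).

```python
def message_param_count(d_h, d_e):
    """Parameters of the edge-feature message function (Eq. (5)): W_MI, b_MI, W_MO, b_MO."""
    return (d_e * d_h) + d_e + (d_h * d_e) + d_h

K = 8  # assumed number of propagation steps
shared = message_param_count(128, 256)        # recurrent GNN: one shared copy
unshared = K * message_param_count(128, 256)  # convolutional GNN: one copy per step
```

The K-fold duplication of per-step parameters is the reason C-GNN is the largest model in Fig. 7a.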
C-GNN contains the largest number of parameters due to its non-recurrent design, which requires a separate set of parameters for each step. R-GAT is slightly larger than R-GNN because its attention mechanism requires extra parameters. Among the R-GNN variants, the two with name segmentation are significantly smaller than the others thanks to their smaller name segment embedding size (32-dimensional), although the contextual layer increases the model size by bringing in more parameters.

Fig. 7b compares the computational efficiency of the models by showing their inference throughputs. This metric is computed as the number of source files in the test set divided by the total runtime of performing inference on those files. The batch size used for inference is set to 64 graphs for all models. We took 6 measurements for each model and report their mean and standard deviation. C-GNN and R-GNN are the two simplest models in terms of architecture, require the least computation during inference, and thus produced the highest throughputs. The attention mechanism in R-GAT adds computational overhead, causing it to produce a lower throughput than R-GNN. Similarly, name segmentation and the contextual layer both slow down the models that contain them. It is worth noting that the contextual layer brings the most additional overhead, because it is based on a recurrent neural network whose workload is determined by the length of the input code token sequence.

Fig. 7. The comparison of model size and computational efficiency among our GNN architectures and models proposed in prior work (DeepTyper, LambdaNet, C-GNN, R-GAT, R-GNN, R-GNN_NS, R-GNN_CTX, R-GNN_NS-CTX): (a) number of parameters in each model (millions); (b) inference throughput (files/sec) on the test set, with error bars showing the standard deviation of 6 measurements.

RQ.3: How does our approach compare with state-of-the-art deep learning type inference approaches?

We compared our approach with two state-of-the-art deep learning type inference approaches: DeepTyper [Hellendoorn et al. 2018] and LambdaNet [Wei et al. 2019].

DeepTyper treats a source file as a sequence of tokens and runs a two-layer bidirectional RNN on it to predict types for identifier tokens. To improve the consistency of type predictions for the same variable, the model also includes a consistency layer between the two RNN layers that takes the mean of the hidden states for identifiers with the same name and adds it back to those hidden states. We implemented DeepTyper in our evaluation framework (https://github.com/DeepTyper/DeepTyper) and used the same hyperparameters as suggested in their paper and open-sourced implementation (300-dimensional embeddings and 650-dimensional hidden layers).

LambdaNet is a GNN-based neural type inference model. It extracts a project-level type dependency graph through static analysis and uses a convolutional GNN to perform type inference for the graph nodes, each of which is associated with a unique variable or expression. We also implemented LambdaNet in our evaluation framework. Unlike the original LambdaNet, our implementation gives a prediction for each occurrence of a variable instead of one prediction per declaration, and uses a prediction space consisting of types in a fixed vocabulary. We did not change the neural network architecture and used the hyperparameters suggested in their paper and open-sourced implementation (32-dimensional embeddings and hidden layers, 6 steps of graph propagation).

Table 2 part 4 shows the accuracy DeepTyper and LambdaNet achieved in our experiments. DeepTyper delivered a similar level of accuracy to R-GNN, our baseline GNN model; however, it is outperformed in top-1 accuracy by the three variants of R-GNN that integrate name segmentation and/or the contextual layer. LambdaNet failed to reach the same level of accuracy as R-GNN. In terms of top-1 accuracy for predicting all 100 types, our best model, R-GNN_NS-CTX, outperformed DeepTyper by 3.14% (absolute) and LambdaNet by 8.31%. Note that since we ported both prior approaches into our experimental framework, which uses different data processing procedures, the accuracy numbers we obtained cannot be directly compared with those reported in their papers. This is especially the case for LambdaNet, due to the changes we made in our implementation.

The orange bars in Fig. 7 show the sizes and inference throughputs of DeepTyper and LambdaNet. As can be seen from Fig. 7a, DeepTyper contains significantly more parameters than the other models due to its large hidden state size. In contrast, LambdaNet employs a relatively small hidden state size, resulting in a more compact model; the sizes of our GNN models fall in between and are closer to LambdaNet's. As shown in Fig. 7b, DeepTyper yielded the lowest throughput among all the models being compared, because it runs a multi-layer RNN over the code token sequence, which can be very long. Our models with the contextual layer (R-GNN_CTX and R-GNN_NS-CTX) suffer from the same issue, but still gave higher throughputs than DeepTyper because the contextual layer uses only a single RNN layer. The other four GNN models, which lack the contextual layer, achieved higher throughputs than LambdaNet: LambdaNet's graphs generally contain more edges than our TFGs, requiring more computation during message passing and aggregation.
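As a quick arithmetic cross-check, the absolute gaps quoted above and the error ratios given in the conclusion both follow from the top-1 accuracies over all 100 types reported in this paper (87.76% for R-GNN_NS-CTX, 84.62% for DeepTyper, 79.45% for LambdaNet):

```python
# Top-1 accuracies (%) over all 100 types, as reported in this paper's conclusion.
acc = {"R-GNN_NS-CTX": 87.76, "DeepTyper": 84.62, "LambdaNet": 79.45}

gap_deeptyper = acc["R-GNN_NS-CTX"] - acc["DeepTyper"]   # absolute top-1 gap
gap_lambdanet = acc["R-GNN_NS-CTX"] - acc["LambdaNet"]

err = {name: 100.0 - a for name, a in acc.items()}       # error = 100% - accuracy
```

Rounding to two decimal places reproduces the 3.14% and 8.31% gaps above, and the 0.80x and 0.60x error ratios stated in the conclusion.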
As mentioned earlier, there has been a significant amount of work during the past two decades on applying machine learning to compilation-related problems, enabled by the rapid growth of open-source online code repositories. We only discuss the most closely related efforts from past work in this section.

JSNice [Raychev et al. 2015] is a program property predictor for variable name and type information. It applies probabilistic reasoning to infer primitive types in JavaScript code. Its input is a dependency network among the variables in a JavaScript program; the trained model learns the statistical correlations among the nodes in the networks extracted from the code corpus, and predicts type information for variables by ranking probabilities. More recently, a deep-neural-network-based type inference mechanism, DeepTyper, was proposed [Hellendoorn et al. 2018]. Inspired by past work on natural language processing, DeepTyper takes a tokenized source program as input and produces type tokens for identifiers. Its architecture employs a bidirectional gated recurrent unit network to capture a large context around each token, and it also incorporates naming information (e.g., variable names) into the type inference. LambdaNet [Wei et al. 2019] (implementation: https://github.com/MrVPlusOne/LambdaNet) used a graph neural network to perform type inference. It leveraged an auxiliary static analysis to build a dataflow-graph-like structure, the type dependence graph, as input, which establishes the relationships among program elements. Section 4 included a detailed comparison of the accuracy and efficiency of our approach relative to DeepTyper and LambdaNet.

The application of GNNs to program analyses can be seen in other recent work as well. One example is the gated graph neural network (GGNN) [Li et al. 2016], which has been used to learn program representations [Allamanis et al. 2017]. The graph used in that work is based on an extension of the abstract syntax tree that adds a few types of edges to represent extra information, including lexical order and variable usages; the approach is applied to program property prediction tasks such as variable misuse detection. [Brockschmidt et al. 2019] introduced a generative code modeling approach, also based on GGNNs, that can be applied to program repair and variable-misuse tasks. [Hellendoorn et al. 2019] used a combination of sequence layers and graph message-passing layers in their model to capture contextual information from the input program; by leveraging more program context through the sequence layers, their model outperformed the earlier work of [Allamanis et al. 2017]. [Si et al. 2018] introduced a reinforcement learning system to infer loop invariants in an input program, using a GNN to construct a structural external memory representation of the program.

Compared with past work, our approach introduces a lightweight GNN-based type inference system that constructs a simple graph structure, the type flow graph, via an efficient traversal of the abstract syntax tree. To enhance the system's accuracy and achieve high efficiency, we evaluated several GNN design choices that can improve information representation and computational efficiency, including a recurrent update process, an attention mechanism, and a contextual layer for improving graph node state initialization. Based on our neural architecture design approach, our GNN-based system shows strong potential as a viable candidate for lightweight analyses with superior accuracy, for use in a variety of interactive programming tasks.

In this paper, we advanced past work by introducing a range of graph-based deep learning models that operate on a novel type flow graph (TFG) representation. Our GNN-based models learn to propagate type information on the graph and then predict types for the graph nodes.
We studied different design choices for our GNN-based type inference system for the 100 most common types in our evaluation corpus, and showed that our best GNN configuration for accuracy (R-GNN_NS-CTX) achieves a top-1 accuracy of 87.76%. This outperforms the two most closely related deep learning type inference approaches from past work: DeepTyper, with a top-1 accuracy of 84.62%, and LambdaNet, with a top-1 accuracy of 79.45%. Alternatively, we can state that the error (100% − accuracy) of R-GNN_NS-CTX is 0.80× that of DeepTyper and 0.60× that of LambdaNet. Further, the average inference throughput of R-GNN_NS-CTX is 353.8 files/second, compared to 186.7 files/second for DeepTyper and 1,050.3 files/second for LambdaNet. If inference throughput is a higher priority, then the recommended model from our approach is the next best GNN configuration from the perspective of accuracy (R-GNN_NS), which achieved a top-1 accuracy of 86.89% and an average inference throughput of 1,303.9 files/second. Thus, our work introduces advances in graph-based deep learning that yield superior accuracy and performance relative to past work on probabilistic type analysis, while also providing a range of GNN models that could be applicable in the future to other graph structures used in program analysis beyond the TFG.

REFERENCES
Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A Survey of Machine Learning for Big Code and Naturalness. ACM Comput. Surv. 51, 4, Article 81 (July 2018), 37 pages. https://doi.org/10.1145/3212695
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2017. Learning to Represent Programs with Graphs. CoRR abs/1711.00740 (2017). http://arxiv.org/abs/1711.00740
Miltiadis Allamanis and Charles Sutton. 2014. Mining Idioms from Source Code. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, 472–483. https://doi.org/10.1145/2635868.2635901
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. code2vec: Learning Distributed Representations of Code. PACMPL 3, POPL (2019), 40:1–40:29. https://doi.org/10.1145/3290353
Osbert Bastani, Rahul Sharma, Alex Aiken, and Percy Liang. 2018. Active Learning of Points-to Specifications. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). ACM, 678–692. https://doi.org/10.1145/3192366.3192383
Peter Battaglia, Jessica Blake Chandler Hamrick, Victor Bapst, Alvaro Sanchez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andy Ballard, Justin Gilmer, George E. Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Jayne Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. 2018. Relational Inductive Biases, Deep Learning, and Graph Networks. arXiv (2018). https://arxiv.org/pdf/1806.01261.pdf
Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. 2019. Generative Code Modeling with Graphs. In International Conference on Learning Representations (ICLR). OpenReview.net. https://openreview.net/forum?id=Bke4KsA5FX
Victor Chibotaru, Benjamin Bichsel, Veselin Raychev, and Martin T. Vechev. 2019. Scalable Taint Specification Inference with Big Code. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). ACM, 760–774. https://doi.org/10.1145/3314221.3314648
Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. In Proceedings of SSST@EMNLP 2014. Association for Computational Linguistics, 103–111. https://doi.org/10.3115/v1/W14-4012
Philip Gage. 1994. A New Algorithm for Data Compression. C Users J. 12, 2 (Feb. 1994), 23–38.
Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and Miltiadis Allamanis. 2018. Deep Learning Type Inference. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). ACM, 152–162. https://doi.org/10.1145/3236024.3236051
Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. 2019. Global Relational Models of Source Code. In International Conference on Learning Representations (ICLR). OpenReview.net. https://openreview.net/pdf?id=Hkx6hANtwH
Rafael-Michael Karampatsis, Hlib Babii, Romain Robbes, Charles Sutton, and Andrea Janes. 2020. Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code. arXiv preprint arXiv:2003.07914 (2020).
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. 2016. Gated Graph Sequence Neural Networks. In International Conference on Learning Representations (ICLR). http://arxiv.org/abs/1511.05493
Percy Liang and Mayur Naik. 2011. Scaling Abstraction Refinement via Pruning. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2011). ACM, 590–601. https://doi.org/10.1145/1993498.1993567
Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=Bkg6RiCqY7
Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine. 2017. Bayesian Specification Learning for Finding API Usage Errors. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM, 151–162. https://doi.org/10.1145/3106237.3106284
Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. 2014. Selective Context-Sensitivity Guided by Impact Pre-Analysis. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2014). ACM, 475–484. https://doi.org/10.1145/2594291.2594318
Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. 2015. Learning a Strategy for Adapting a Program Analysis via Bayesian Optimisation. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). ACM, 572–588. https://doi.org/10.1145/2814270.2814309
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019). 8024–8035. http://papers.nips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library
Veselin Raychev, Martin T. Vechev, and Andreas Krause. 2015. Predicting Program Properties from "Big Code". In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2015). ACM, 111–124. https://doi.org/10.1145/2676726.2677009
Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015. Neural Machine Translation of Rare Words with Subword Units. CoRR abs/1508.07909 (2015). http://arxiv.org/abs/1508.07909
Xujie Si, Hanjun Dai, Mukund Raghothaman, Mayur Naik, and Le Song. 2018. Learning Loop Invariants for Program Verification. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018). 7762–7773. http://papers.nips.cc/paper/8001-learning-loop-invariants-for-program-verification
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In International Conference on Learning Representations (ICLR). OpenReview.net. https://openreview.net/forum?id=rJXMpikCZ
Jiayi Wei, Maruth Goyal, Greg Durrett, and Isil Dillig. 2019. LambdaNet: Probabilistic Type Inference Using Graph Neural Networks. In International Conference on Learning Representations (ICLR). OpenReview.net. https://openreview.net/pdf?id=Hkx6hANtwH
Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu. 2020. A Comprehensive Survey on Graph Neural Networks.
IEEETransactions on Neural Networks and Learning Systems (2020), 1–21.Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. 2015. Empirical Evaluation of Rectified Activations in ConvolutionalNetwork.
CoRR abs/1505.00853 (2015). arXiv:1505.00853 http://arxiv.org/abs/1505.00853Xin Zhang, Mayur Naik, and Hongseok Yang. 2013. Finding optimum abstractions in parametric dataflow analysis. In