Graph Neural Network to Dilute Outliers for Refactoring Monolith Application
Utkarsh Desai, Sambaran Bandyopadhyay, Srikanth Tamilselvam
IBM
[email protected], [email protected], [email protected]
Abstract
Microservices are becoming the de facto design choice for software architecture. The style partitions software components into finer modules such that development can happen independently. It also provides natural benefits when deployed on the cloud, since resources can be allocated dynamically to necessary components based on demand. Therefore, enterprises, as part of their journey to cloud, are increasingly looking to refactor their monolith application into one or more candidate microservices, wherein each service contains a group of software entities (e.g., classes) that are responsible for a common functionality. Graphs are a natural choice to represent a software system: each software entity can be represented as a node and its dependencies with other entities as links. This problem of refactoring can therefore be viewed as a graph-based clustering task. In this work, we propose a novel method that adapts recent advancements in graph neural networks to the context of code, to better understand the software and apply them in the clustering task. In the process, we also identify the outliers in the graph, which can be directly mapped to the top refactor candidates in the software. Our solution improves on state-of-the-art performance compared to works from both software engineering and existing graph representation based techniques.
Introduction

Microservices is an architectural style that structures an application as a set of smaller services (Lewis and Fowler 2014). These services are built around business functionalities and follow the "Single Responsibility Principle". This allows teams to develop business functionalities independently. They also naturally benefit from cloud deployment due to the support for differential and dynamic addition of resources such as CPU, memory and disk space to specific services based on demand. However, there are many existing monolith applications currently in use that cannot fully tap these benefits due to their architectural style. Monoliths package all the business functionalities into a single deployable unit, making them unsuitable to fully leverage cloud benefits. Therefore, there is a surge in enterprises wanting to refactor their monolith applications into microservices. This is done by mapping business functions onto the code structure and identifying the functional boundary in such a way that there are fewer dependencies across the services (Jin et al. 2019). In typical monoliths, there are classes (or programs) loaded with overlapping functionalities (Kalske et al. 2018). These can be identified by their dependencies with cross-functional classes. We refer to such classes as outliers or refactorable candidates. They typically require top attention from the developers for modification during refactoring, to make the microservices independent and deployable. But identifying functional boundaries on the existing code is a hard task (Gouigoux and Tamzalit 2017), and the effort gets multiplied when done without the help of the original developers, which is typically the case with legacy applications.

In the software engineering community, the problem is often referred to as software decomposition, and several approaches (Fritzsch et al. 2018) have been proposed, ranging from process mining and genetic algorithms to graph based clustering. Graphs are a natural way to represent the application implementation structure. The classes in the application can be considered as nodes and their interactions with other classes as edges. Further, the nodes can carry multiple features based on their type and their invocation pattern. Figure 1 demonstrates the translation of an application into a graph. Therefore, the application refactoring problem can be viewed as a graph based clustering task. In the past, many clustering techniques have been applied on code (Shtern and Tzerpos 2012), but they often consider only the structural features of the application, i.e., the dependency of classes. Also, none of these approaches have looked into attributed graph networks or attempted to minimize the effect of outlier nodes during clustering.

Graph based mining tasks have received significant attention in recent years due to the development of graph representation learning, which maps the nodes of a graph to a vector space (Perozzi, Al-Rfou, and Skiena 2014; Hamilton, Ying, and Leskovec 2017). It has also been applied to a diverse set of applications such as social networks (Kipf and Welling 2017), drug discovery (Gilmer et al. 2017), transportation and traffic networks (Guo et al. 2019), etc. In this work, we propose a novel graph neural network based solution to refactor monolith applications into a desired number of microservices. The main contributions of our paper are listed below.
1. We propose a novel way to translate the application implementation structure to an attributed graph structure through static program analysis.
2. We introduce two types of outliers that reflect the top refactoring program candidates.
3. We propose a novel graph neural network (GNN), referred to as CO-GCN (Clustering and Outlier aware Graph Convolution Network), which unifies node representation, outlier node detection & dilution, and node clustering into the same framework for refactoring monolith applications.
4. We improve the state-of-the-art performance with respect to both software engineering and graph representation based techniques to refactor four publicly available monolith applications. We attach the source code in the supplementary material for reproducibility of the results (code available at: https://github.com/utkd/cogcn).

Related Work

(Fritzsch et al. 2018) presented a survey of ten different approaches towards refactoring a monolith application into microservices. Of these, only four works were applied directly on application code; the rest used other application artefacts such as logs, commit histories, UML diagrams, etc. However, all of these works have drawbacks, since they either (1) focus only on structural features; (2) propose partitions focusing more on technical layers, which is not desirable (Taibi and Lenarduzzi 2018); or (3) partition only a subset of program files, such as EJBs in Java. (Mazlami, Cito, and Leitner 2017) proposed a graph based clustering approach with a focus on version history. (Jin et al. 2019) proposed hierarchical clustering of program files, but it requires access to the runtime behavior of the application, which is practically difficult. Moreover, these approaches do not exploit the power of representation learning and graph neural networks. Also, they do not recommend refactorable classes.

Graph representation learning (Hamilton, Ying, and Leskovec 2017) shows promising results on multiple downstream graph mining tasks. Graph neural networks (Wu et al. 2020) apply neural networks directly on a graph structure. In the graph convolution networks introduced by (Kipf and Welling 2017), a localized first-order approximation of spectral graph convolutions is proposed and evaluated for semi-supervised node classification. An unsupervised variant, the GCN autoencoder, is proposed in (Kipf and Welling 2016). GNNs have also been proposed for supervised (Chen, Li, and Bruna 2019) and unsupervised community detection (Zhang et al. 2019) in graphs. Recently, a self-supervised learning based GNN, Deep Graph Infomax (DGI) (Veličković et al. 2019), was proposed for obtaining node representations using the principle of information maximization. Outlier nodes are present in any real-world graph and are shown to have an adverse effect on the embeddings of regular nodes (Liang et al. 2018). Unsupervised algorithms to minimize the effect of outliers in the framework of graph representation learning have been proposed recently (Bandyopadhyay, Lokesh, and Murty 2019; Bandyopadhyay et al. 2020; Bandyopadhyay, Vivek, and Murty 2020). However, minimizing the effect of outliers in the framework of GNNs has not been addressed in the literature.
Figure 1: Representation of a sample Java application as a graph: the method order() from class A invokes the method set() from class B, establishing a direct relation between the two classes. If we represent classes A and B as nodes in a graph, we can define a directed edge, e(A, B), from A to B.

Given a monolith application, we want to partition the monolith into K clusters of classes, with K provided by a subject matter expert (SME), where each cluster is a group of classes that performs a well-defined functionality. The clusters should exhibit high cohesion, i.e., strong interaction within the cluster, and low coupling, i.e., little interaction between clusters. We also want to identify the following outlier classes from a monolith application (Bandyopadhyay, Lokesh, and Murty 2019), to be handled by an SME:
• Structural Outlier: A class which has high interaction with classes from different clusters.
• Attribute Outlier: A class which has attributes, such as usage patterns, similar to attributes from other clusters.
We now describe our approach to represent an application as a graph, given its source code. Consider a simple Java application comprising multiple classes, as shown in Figure 1. Each class in the application can be represented as a node in a graph. We denote the set of such nodes as V. We establish a directed edge from node A to node B if there is a method in class A that calls a method from class B. We perform static analysis of the application code (e.g., using Soot, https://github.com/soot-oss/soot) to identify all such method calls between classes and obtain a set of edges, E, between the corresponding nodes. The edges are unweighted, and multiple method calls from class A to class B are still represented by a single edge from A to B.

We now describe the process to generate the attribute matrix, X, corresponding to the nodes V of the graph. Most modern web applications expose multiple APIs that perform various functions. These APIs (UI elements in the case of non web-based applications) are referred to as EntryPoint Specifications (Dietrich, Gauthier, and Krishnan 2018), or simply, Entrypoints. The methods invoked through these APIs are specially annotated as such and are called entrypoint methods in this work. Figure 1 shows an example of such entrypoint methods annotated with @API. We refer to the classes containing such entrypoint methods as entrypoint classes. Each entrypoint class can thus be associated with multiple Entrypoints. Starting from an entrypoint method, we can follow the call sequence of methods through the application, keeping track of all classes invoked during the execution trace of that Entrypoint. If P is the set of Entrypoints in an application, we can define a matrix EP of size |V| × |P|, such that EP(i, p) = 1 if class i is present in the execution trace of entrypoint p, and 0 otherwise. Additionally, we define C of size |V| × |V| such that C(i, j) is the number of Entrypoint execution traces that contain both classes i and j. If a class is not invoked in the execution trace of any Entrypoint, we remove the corresponding node from the graph. Finally, classes may also inherit from other classes or interfaces. In Figure 1, class A inherits from class Base. Although this establishes a dependency between the classes, it does not involve a direct method invocation. Hence, we do not include this dependency as an edge in the graph, but as a node attribute. Therefore, we set In(i, j) = In(j, i) = 1 if classes i and j are related via an inheritance relationship, and 0 otherwise. The attribute matrix X is the concatenation of the EP, C and In matrices. Thus, X ∈ R^{|V|×F} where F = |P| + 2|V|. Each constituent of X is row-normalized individually. The application can thus be represented as a graph G = (V, E, X).
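To make the construction concrete, the sketch below assembles A and X from the outputs of a static-analysis pass. It is a minimal illustration, not the paper's implementation: the input formats (class list, call edges, entrypoint traces, inheritance pairs) and the function name build_graph are our assumptions, and the removal of classes that appear in no trace is omitted for brevity.

```python
import numpy as np

def build_graph(classes, call_edges, entrypoint_traces, inheritance_pairs):
    """Assemble the adjacency matrix A and attribute matrix X = [EP | C | In].

    classes           : list of class names (the nodes V)
    call_edges        : iterable of (caller_class, callee_class) pairs
    entrypoint_traces : dict {entrypoint: set of classes on its execution trace}
    inheritance_pairs : iterable of (subclass, superclass/interface) pairs
    """
    idx = {c: i for i, c in enumerate(classes)}
    n, p = len(classes), len(entrypoint_traces)

    # Directed, unweighted adjacency: a single edge per (caller, callee) pair.
    A = np.zeros((n, n))
    for src, dst in call_edges:
        A[idx[src], idx[dst]] = 1.0

    # EP(i, p) = 1 if class i appears on the execution trace of entrypoint p.
    EP = np.zeros((n, p))
    for j, trace in enumerate(entrypoint_traces.values()):
        for cls in trace:
            EP[idx[cls], j] = 1.0

    # C(i, j) = number of entrypoint traces containing both classes i and j.
    C = EP @ EP.T

    # In(i, j) = In(j, i) = 1 for inheritance / interface-implementation pairs.
    In = np.zeros((n, n))
    for sub, sup in inheritance_pairs:
        In[idx[sub], idx[sup]] = In[idx[sup], idx[sub]] = 1.0

    def row_norm(M):
        s = M.sum(axis=1, keepdims=True)
        return np.divide(M, s, out=np.zeros_like(M), where=s > 0)

    # Each constituent is row-normalised individually, so F = |P| + 2|V|.
    X = np.hstack([row_norm(EP), row_norm(C), row_norm(In)])
    return A, X
```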
Given the graph G, we want to develop a graph neural network which can jointly (i) derive vector representations (embeddings) of the nodes, (ii) minimize the effect of outlier nodes on the embeddings of the other, regular nodes, and (iii) obtain communities in the graph. Let A ∈ R^{|V|×|V|} denote the adjacency matrix of G, where A_ij is the weight of edge e_ij if it exists and A_ij = 0 otherwise. We use a 2-layered graph convolution encoder (Kipf and Welling 2017) to obtain the representation of each node:

Z = f(X, A) = ReLU( Â ReLU( Â X W^(0) ) W^(1) )    (1)

where each row of Z ∈ R^{|V|×F'} contains the corresponding node representation. We compute Ã = A + I, where I ∈ R^{|V|×|V|} is the identity matrix, and the diagonal degree matrix D̃ with D̃_ii = Σ_{j∈V} Ã_ij, ∀i ∈ V. We set Â = D̃^{-1/2} Ã D̃^{-1/2}. W^(0) and W^(1) are the trainable parameter matrices of the GCN encoder. Traditionally, these parameters are trained on a node classification or link prediction loss (Kipf and Welling 2016). However, our objective in this work is to account for and minimize the effect of outlier nodes within the framework of graph convolution. We also want to do this in an unsupervised way, as obtaining ground-truth cluster labels and outlier information is extremely difficult for monolith applications. Towards this, we use the following GCN based decoder to map the F'-dimensional node embeddings back to the input feature space:
X̂ = f(Z, A) = ReLU( Â ReLU( Â Z W^(2) ) W^(3) )    (2)

Here, X̂ ∈ R^{|V|×F}, and W^(2) and W^(3) are the trainable parameters of the decoder. Let W = {W^(0), ..., W^(3)} denote the parameters of the encoder and decoder combined. In the ideal scenario where no outlier node is present in a graph, one can train the parameters of the GCN autoencoder by directly minimizing some reconstruction loss. But as mentioned in Section 1, outliers are prevalent in monolith applications and, if not handled properly, they can adversely affect the embeddings of regular nodes in a graph (Bandyopadhyay et al. 2020). To address them, we use the framework of multi-task learning, where we design two loss components to detect structural and attribute outliers respectively. We denote the structural and attribute outlierness (positive scalars) by O^s_i and O^a_i respectively, for each node i ∈ V.

First, we ensure that the presence of an edge is preserved by the similarity of the two corresponding node embeddings in the vector space for the regular nodes. Structural outliers, being inconsistent in their link structure, do not necessarily follow this assumption. Hence, we design the following loss component, which is minimized with respect to the parameters of the GCN and the structural outlierness of the nodes:

L_str = Σ_{i∈V} log(1/O^s_i) ||A_i: − Z_i: Z^T||²    (3)

Here, A_i: is the i-th row of the adjacency matrix and Z_i: is the i-th row (the embedding of node i) of the node representation matrix. Clearly, the higher the value of O^s_i, i.e., the higher the outlierness of node i, the smaller the value of log(1/O^s_i). Consequently, the contribution of the structural outlier nodes to this loss component will be smaller. We also assume that the total structural outlierness in a graph is bounded, so we set Σ_{i∈V} O^s_i = 1. Without such a bound, the optimization in Equation 3 would reach a degenerate solution with each O^s_i assigned to +∞ at the infimum. We also tried replacing 1 with a hyperparameter μ as the bound, but that does not have much impact on the quality of the final solution.

Next, to preserve the impact of node attributes in the node representations, we want the attributes reconstructed by the GCN decoder in Equation 2 to match the initial node attributes for most of the regular nodes in the graph. However, for attribute outliers, whose node attributes are significantly different from the attributes of their respective neighboring nodes, we reduce their contribution to the attribute reconstruction loss as follows:

L_att = Σ_{i∈V} log(1/O^a_i) ||X_i: − X̂_i:||²    (4)

Here, X and X̂ are the given and reconstructed node feature matrices. Similar to the case of structural outlierness, nodes with a higher attribute outlierness score O^a_i have less impact in Equation 4, and consequently the optimizer is able to focus more on the regular nodes of the graph. Again, we assume that O^a_i > 0, ∀i ∈ V, and Σ_{i∈V} O^a_i = 1.

Minimizing the loss components in Equations 3 and 4 with respect to the parameters of the GCN and the outlier scores provides unsupervised node embeddings. This also detects the outlier nodes while minimizing their negative impact on the other nodes of the graph. However, as discussed in Section 1, our main goal in this work is to separate microservices within a monolith application, which requires discovering clusters of nodes (communities) in the graph.
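The following sketch shows one possible realisation of the encoder/decoder (Eqs. 1-2) and of the outlier-weighted reconstruction losses (Eqs. 3-4), treating the outlier scores as fixed inputs at this step. It is an illustration under our own assumptions (PyTorch, dense matrices, and the class/function names and layer shapes are ours), not the released implementation.

```python
import torch
import torch.nn as nn

class GCNAutoencoder(nn.Module):
    """Two-layer GCN encoder (Eq. 1) and two-layer GCN decoder (Eq. 2).
    A_hat is the symmetrically normalised adjacency D^-1/2 (A + I) D^-1/2."""
    def __init__(self, in_dim, hid_dim, emb_dim):
        super().__init__()
        self.w0 = nn.Linear(in_dim, hid_dim, bias=False)   # W^(0)
        self.w1 = nn.Linear(hid_dim, emb_dim, bias=False)  # W^(1)
        self.w2 = nn.Linear(emb_dim, hid_dim, bias=False)  # W^(2)
        self.w3 = nn.Linear(hid_dim, in_dim, bias=False)   # W^(3)

    def forward(self, X, A_hat):
        Z = torch.relu(A_hat @ self.w1(torch.relu(A_hat @ self.w0(X))))
        X_rec = torch.relu(A_hat @ self.w3(torch.relu(A_hat @ self.w2(Z))))
        return Z, X_rec

def normalize_adj(A):
    """Compute A_hat = D~^-1/2 (A + I) D~^-1/2 for a dense adjacency matrix."""
    A_tilde = A + torch.eye(A.shape[0])
    d_inv_sqrt = torch.diag(A_tilde.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

def outlier_weighted_losses(A, X, Z, X_rec, O_s, O_a):
    """L_str (Eq. 3) and L_att (Eq. 4); O_s and O_a are the outlier score vectors."""
    str_err = ((A - Z @ Z.T) ** 2).sum(dim=1)   # per-node structural error
    att_err = ((X - X_rec) ** 2).sum(dim=1)     # per-node attribute error
    L_str = (torch.log(1.0 / O_s) * str_err).sum()
    L_att = (torch.log(1.0 / O_a) * att_err).sum()
    return L_str, L_att
```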
One can potentially obtain the node embeddings first and then use a clustering algorithm (for example, k-means++ (Arthur and Vassilvitskii 2006)) as a post-processing step. But such a decoupled approach often leads to a suboptimal solution, as shown in (Yang et al. 2017). Hence, we integrate node embedding, outlier detection and node clustering into a joint graph neural network framework. To achieve this, we use the following loss to cluster the nodes in the graph, assuming their embeddings are given:

L_clus = Σ_{i=1}^{N} Σ_{k=1}^{K} M_ik ||Z_i: − C_k||²    (5)

where M ∈ {0,1}^{|V|×K} is the binary cluster assignment matrix. We assume the number of clusters K is known. M_ik = 1 if node i belongs to the k-th cluster and M_ik = 0 otherwise. C_k ∈ R^{F'} is the center of the k-th cluster in the embedding space. Equation 5 is minimized with respect to M and C = [C_1 ... C_K]^T to obtain the clustering. We call this method CO-GCN (Clustering and Outlier aware Graph Convolution Network), and the joint loss function is:

min_{W, O, M, C}  L_total = α_1 L_str + α_2 L_att + α_3 L_clus    (6)
such that  Σ_{i∈V} O^s_i = Σ_{i∈V} O^a_i = 1    (7)
M ∈ {0,1}^{|V|×K},  O^s_i, O^a_i > 0  ∀i ∈ V    (8)

The nature of the optimization problem in Eq. 6 is different with respect to the different variables. We use an alternating minimization technique, where we minimize the objective with respect to one set of variables, keeping the others fixed.
Parameters of GCN
The set W contains all the parameters of the GCN encoder and decoder as described in Section 3. We use the standard ADAM optimizer (Kingma and Ba 2014) to minimize the total loss w.r.t. W, keeping the other variables fixed, with an initial learning rate that is decayed exponentially at regular intervals during training.

Outliers
One can show that the optimization in Equation 6 is convex with respect to each outlier variable when all other variables are fixed. This is because 0 < O^s_i, O^a_i ≤ 1, ∀i, and log(·) is a concave function, so −log(·) is convex. Finally, the L2 norms in both Equations 3 and 4 are non-negative. We aim to find closed-form update rules for the outlier terms to speed up the optimization process.

Taking the Lagrangian of Eq. 6 with respect to the constraint Σ_{i∈V} O^s_i = 1, we get (after ignoring terms that do not include O^s_i):

∂/∂O^s_i [ Σ_{j∈V} log(1/O^s_j) ||A_j: − Z_j: Z^T||² + λ ( Σ_{j∈V} O^s_j − 1 ) ]

where λ ∈ R is the Lagrange multiplier. Equating the partial derivative w.r.t. O^s_i to 0:

−||A_i: − Z_i: Z^T||² / O^s_i + λ = 0  ⟹  O^s_i = ||A_i: − Z_i: Z^T||² / λ

But Σ_{j∈V} O^s_j = 1 implies Σ_{j∈V} ||A_j: − Z_j: Z^T||² / λ = 1. Hence,

O^s_i = ||A_i: − Z_i: Z^T||² / Σ_{j∈V} ||A_j: − Z_j: Z^T||²    (9)

The final update rule for structural outliers turns out to be quite intuitive. Our goal while deriving the loss in Equation 3 was to approximate the adjacency structure of the graph by the similarity in the embedding space, with outliers being discounted. The structural outlierness of a node in Equation 9 is proportional to the difference between the two after every iteration. In other words, if a node is not able to preserve its adjacency structure in the embedding space, it is more prone to be a structural outlier.

Similarly, the update rule for attribute outlierness at each iteration can be derived as:

O^a_i = ||X_i: − X̂_i:||² / Σ_{j∈V} ||X_j: − X̂_j:||²    (10)

Because of the convexity of the total loss in Equation 6 w.r.t. the individual outlier scores, the derivations of the update rules for the outlier scores ensure the following lemma.

Lemma 1
Keeping all other variables fixed, the total loss in Equation 6 decreases after every update of the outlier scores by Equations 9 and 10, until it reaches a stationary point.
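A minimal sketch of the closed-form updates of Eqs. 9-10, assuming dense numpy arrays; the function name and signature are our choices.

```python
import numpy as np

def update_outlier_scores(A, X, Z, X_rec):
    """Closed-form updates of Eq. 9 (structural) and Eq. 10 (attribute).
    The scores stay positive and each set sums to 1 by construction."""
    str_err = ((A - Z @ Z.T) ** 2).sum(axis=1)   # per-node structural residual
    att_err = ((X - X_rec) ** 2).sum(axis=1)     # per-node attribute residual
    O_s = str_err / str_err.sum()
    O_a = att_err / att_err.sum()
    return O_s, O_a
```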
Clustering Parameters
The total loss of CO-GCN also involves the clustering parameters M and C. With all other variables fixed, the cluster assignment matrix M can be obtained as:

M(i, k) = 1, if k = argmin_{k' ∈ {1,...,K}} ||Z_i: − C_k'||²;  M(i, k) = 0, otherwise    (11)

In the next step, the k-th row of the cluster center matrix C can be obtained as (Arthur and Vassilvitskii 2006):

C_k = (1 / N_k) Σ_{i ∈ C_k} Z_i:    (12)

where C_k = {i ∈ V | M_ik = 1} is the k-th cluster and N_k = |C_k| is the size of the k-th cluster.
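The sketch below mirrors these two updates (Eq. 11, then Eq. 12) on numpy arrays; the names and the empty-cluster handling are our choices.

```python
import numpy as np

def update_clusters(Z, C):
    """Eq. 11: assign each node to its nearest center; Eq. 12: recompute centers."""
    K = C.shape[0]
    # Squared distances of every embedding to every center: shape (N, K).
    dists = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    assign = dists.argmin(axis=1)            # hard assignment per node
    M = np.eye(K)[assign]                    # binary |V| x K assignment matrix
    for k in range(K):
        members = Z[assign == k]
        if len(members) > 0:                 # keep the old center if a cluster empties
            C[k] = members.mean(axis=0)
    return M, C
```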
Dataset     Description          Language
DayTrader   Trading App          Java       111   203   8
PBW         Online plant store   Java        36    47   6
Acme-Air    Airline App          Java        38    20   4
DietApp     Diet Tracker         C#

Table 1: Details about the monolith applications studied
To run CO-GCN, we first pre-train the GCN encoder and decoder by minimizing L_str and L_att in Equations 3 and 4 respectively, initializing O^s_i, O^a_i ∀i ∈ V to uniform values. We also use k-means++ (Arthur and Vassilvitskii 2006) to initialize the cluster assignment and cluster center matrices. Then, over iterations, we sequentially solve L_total by the alternating minimization technique described in Section 3.3 with respect to the different variables. The overall procedure of CO-GCN is presented in Algorithm 1.

Algorithm 1 CO-GCN
Input: Class dependencies and Entrypoint definitions
1: Convert the application to a graph representation as defined in Section 3.1 and obtain V, E and X
2: Initialize outlier scores O^s_i and O^a_i uniformly ∀i ∈ V
3: Pre-train the GCN encoder and decoder
4: Use k-means++ to initialize the cluster assignment and cluster centers
5: for T iterations do
6:   Update outlier scores O by Eq. 9 and 10
7:   Update cluster assignment and center matrices by Eq. 11 and 12
8:   Update the parameters of the GCN encoder and decoder by minimizing Eq. 6 using ADAM
9: end for
Output: Cluster assignment matrix M, cluster center matrix C and the outlier scores O
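As an end-to-end illustration of Algorithm 1, the loop below reuses the sketches introduced earlier (GCNAutoencoder, normalize_adj, outlier_weighted_losses, update_outlier_scores, update_clusters). The layer sizes, learning rate, loss weights and step counts are placeholders of ours, not the values used in the paper.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans

def train_cogcn(A_np, X_np, K, T=500, alphas=(1.0, 1.0, 1.0),
                pretrain_steps=200, hid_dim=256, emb_dim=32, lr=1e-3):
    """Alternating-minimisation loop of Algorithm 1 (illustrative only)."""
    A = torch.tensor(A_np, dtype=torch.float32)
    X = torch.tensor(X_np, dtype=torch.float32)
    A_hat = normalize_adj(A)
    n = A.shape[0]

    model = GCNAutoencoder(X.shape[1], hid_dim, emb_dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    O_s = O_a = torch.full((n,), 1.0 / n)          # uniform initial outlier scores

    # Step 3: pre-train on the outlier-weighted reconstruction losses only.
    for _ in range(pretrain_steps):
        opt.zero_grad()
        Z, X_rec = model(X, A_hat)
        L_str, L_att = outlier_weighted_losses(A, X, Z, X_rec, O_s, O_a)
        (alphas[0] * L_str + alphas[1] * L_att).backward()
        opt.step()

    # Step 4: initialise clusters with k-means++ on the pre-trained embeddings.
    Z_np = model(X, A_hat)[0].detach().numpy()
    km = KMeans(n_clusters=K, init="k-means++", n_init=10).fit(Z_np)
    C = km.cluster_centers_.copy()

    # Steps 5-9: alternate closed-form updates with one ADAM step on Eq. 6.
    for _ in range(T):
        Z, X_rec = model(X, A_hat)
        O_s_np, O_a_np = update_outlier_scores(
            A_np, X_np, Z.detach().numpy(), X_rec.detach().numpy())
        O_s = torch.tensor(O_s_np, dtype=torch.float32)
        O_a = torch.tensor(O_a_np, dtype=torch.float32)
        M, C = update_clusters(Z.detach().numpy(), C)

        opt.zero_grad()
        Z, X_rec = model(X, A_hat)
        L_str, L_att = outlier_weighted_losses(A, X, Z, X_rec, O_s, O_a)
        centers = torch.tensor(C[M.argmax(axis=1)], dtype=torch.float32)
        L_clus = ((Z - centers) ** 2).sum()
        (alphas[0] * L_str + alphas[1] * L_att + alphas[2] * L_clus).backward()
        opt.step()

    return M, C, O_s_np, O_a_np
```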
Time Complexity

The time taken by the GCN encoder and decoder is O(|E| F F'). Updating each outlier score takes O(N F'), and the total time to update all outlier scores is O(N² F'). Updating the cluster assignment and cluster center matrices takes O(N F' K) time. Thus, each iteration of CO-GCN takes O(|E| F F' + N² F' + N F' K). The outlier update rules, although expensive, converge quickly because of the closed-form solution and the theoretical guarantee (Lemma 1). Also, for most real-world monolith applications, the number of classes is not very large (in the 1000s), so the quadratic dependency of the runtime on the number of classes is not a bottleneck. However, one can use negative sampling approaches (Goldberg and Levy 2014) to approximate the similarity between the embeddings in the outlier update rules for other applications, if needed.

To study the effectiveness of our approach, we chose four publicly-available web-based monolith applications, namely Daytrader (https://github.com/WASdev/sample.daytrader7), Plantsbywebsphere (https://github.com/WASdev/sample.plantsbywebsphere), Acme-Air (https://github.com/acmeair/acmeair) and DietApp (https://github.com/SebastianBienert/DietApp/). They vary in programming languages, technologies, objectives and complexity in terms of lines of code, function sizes, etc. Details of the monoliths are provided in Table 1.

To evaluate the quality of the clusters identified as microservice candidates, we define four metrics. The first two aim to capture the structural quality of the clusters recommended as microservices and are the primary metrics in the evaluation. The other two metrics define additional desirable properties of the clusters. A code sketch that computes all four metrics follows the list.
1. Modularity: Modularity is a commonly used metric to evaluate the quality of clusters in a graph (Newman and Girvan 2004; Newman 2006). It measures the fraction of edges that fall between members of the same cluster, relative to the expected fraction for the same partition if the edges of the graph were generated randomly. Higher values of Modularity indicate a stronger community structure.
2. Structural Modularity: An alternate measure of structural soundness of a cluster that is more suited to software applications is defined in (Jin et al. 2019). Structural Modularity (SM) is defined as
SM = (1/K) Σ_{k=1}^{K} u_k / N_k²  −  (1 / (K(K−1)/2)) Σ_{k1 ≠ k2} σ_{k1,k2} / (2 N_{k1} N_{k2})

where u_k is the number of edges that lie completely within a cluster k, σ_{k1,k2} is the number of edges between cluster k1 and cluster k2, and N_{k1} and N_{k2} are the number of members in clusters k1 and k2 respectively.
3. Non-Extreme Distribution (NED): It is desired that a microservice does not have too many or too few classes. We therefore measure how evenly distributed the sizes of the recommended clusters are as
NED = ( Σ_{k=1, k not extreme}^{K} n_k ) / |V|

where n_k is the number of classes in cluster k and V is the set of classes. A cluster k is not extreme if its size lies within predefined lower and upper bounds. NED captures the architectural soundness of the clusters (Wu, Hassan, and Holt 2005; Bittencourt and Guerrero 2009). For better interpretability, we report 1 − NED, and lower values are favorable.
4.
Interface Number (IFN): As defined in (Jin et al. 2019), this is the average number of published interfaces of a microservice partitioning:
IFN = (1/K) Σ_{k=1}^{K} ifn_k,  with  ifn_k = |I_k|

where I_k is the set of published interfaces in microservice k and K is the number of such microservices. We define a published interface as any class in the microservice that is referenced by another class from a different microservice. Lower values of IFN are preferred.
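To make the four definitions concrete, the sketch below computes them for a given partition. It is our illustration rather than the paper's evaluation code: the helper name, the use of networkx's modularity routine on the undirected call graph, and the configurable NED size bounds are assumptions (the paper's exact bounds are not reproduced).

```python
import numpy as np
import networkx as nx

def evaluate_partition(A, clusters, min_size=5, max_size=20):
    """Compute Modularity, SM, 1-NED and IFN for a candidate partition.

    A        : directed 0/1 adjacency matrix (numpy array)
    clusters : dict {cluster_id: set of node indices}; every node must appear
               in exactly one cluster (required by networkx modularity).
    """
    G = nx.from_numpy_array(A, create_using=nx.DiGraph)
    communities = [set(c) for c in clusters.values()]
    K = len(communities)
    n = A.shape[0]

    # Newman's modularity on the undirected view of the call graph.
    mod = nx.algorithms.community.modularity(G.to_undirected(), communities)

    # Structural Modularity (Jin et al. 2019): cohesion minus coupling terms.
    coh = sum(A[np.ix_(list(c), list(c))].sum() / (len(c) ** 2) for c in communities)
    sep = 0.0
    for i in range(K):
        for j in range(i + 1, K):
            ci, cj = list(communities[i]), list(communities[j])
            inter = A[np.ix_(ci, cj)].sum() + A[np.ix_(cj, ci)].sum()
            sep += inter / (2.0 * len(ci) * len(cj))
    sm = coh / K - sep / (K * (K - 1) / 2.0)

    # Non-Extreme Distribution: share of classes in reasonably sized clusters.
    ned = sum(len(c) for c in communities if min_size <= len(c) <= max_size) / n

    # Interface Number: average count of classes referenced from other clusters.
    owner = {v: k for k, c in enumerate(communities) for v in c}
    published = [set() for _ in range(K)]
    for u, v in G.edges():
        if owner[u] != owner[v]:
            published[owner[v]].add(v)
    ifn = sum(len(p) for p in published) / K

    return {"Modularity": mod, "SM": sm, "1-NED": 1.0 - ned, "IFN": ifn}
```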
For each application in Table 1, we generate the adjacency matrix A and the feature matrix X. The CO-GCN encoder comprises two graph convolution layers; the decoder consists of one layer followed by another of the appropriate feature dimension. We pre-train the autoencoder for a fixed number of iterations and set T = 500 in Algorithm 1. The final values of M(i, k) are used as the cluster assignments from our algorithm. We set the loss weights α_1, α_2, α_3 in Eq. 6 to fixed values.

We evaluate our approach against multiple unsupervised baselines for learning node representations: DeepWalk (Perozzi, Al-Rfou, and Skiena 2014), Node2vec (Grover and Leskovec 2016), ONE (Bandyopadhyay, Lokesh, and Murty 2019), GCN (Kipf and Welling 2016) and DGI (Veličković et al. 2019). Among these, ONE accounts for the effects of outliers in learning node embeddings. For all our experiments, we use the same node embedding size for every method. We use the k-means++ algorithm on the embeddings generated by these baselines to obtain clusters. K is carefully chosen based on online sources and SME inputs. In contrast to these representation learning based baselines, the method of (Mazlami, Cito, and Leitner 2017) is a state-of-the-art approach for extracting microservices from a monolith application. It leverages Semantic Coupling (SC) information with graph partitioning to identify clusters, and we also use it as a baseline. Since the implementation of the SC method does not support .Net applications, we do not use it for DietApp.

Figure 2: Comparison of the CO-GCN method with the baselines across the four applications on the (a) Structural Modularity, (b) Modularity, (c) 1-NED and (d) IFN metrics. The CO-GCN method clearly outperforms the baselines considered.

Figure 2 shows the metric values on all four applications for the evaluated methods. The three attributed graph neural network based methods (GCN, DGI and CO-GCN) outperform the rest of the methods by a significant margin. The CO-GCN method consistently achieves better modularity and structural modularity scores, which clearly validates the inclusion of the outlier and clustering objectives in the training. The CO-GCN method also achieves better NED and IFN scores in most cases. Another interesting observation is the negative scores for many of the baseline methods. This implies that there are many inter-cluster edges for the clusters recommended by these methods, hinting at the fact that monolith applications may have several high-traffic nodes, and assigning them to appropriate clusters is difficult but critical.

Figure 3: Clusters and top 5 outliers identified for the PBW application, with manual labels about their functionality.

Figure 3 shows the identified clusters for the PBW application and our manual annotations to highlight the functionalities offered. We can notice the clear distinction of functionalities across the clusters.
The values of O^s_i and O^a_i at the end of training represent the final outlier scores of each node. The ranked list of outlier nodes represents the top candidates for refactoring as part of microservices decomposition. Figure 3 highlights the combined top outliers detected (across structural and attribute outlier scores) for the PBW application by our approach. Among the baselines, we report outlier detection results only for GCN and DGI, as they performed well for obtaining microservices. Since GCN and DGI do not output outliers directly, we use Isolation Forest (Liu, Ting, and Zhou 2008) on the embeddings generated by them to detect outliers.

To study the goodness of the outliers, we performed a qualitative study with five software engineers who have a minimum of seven years of industrial experience. We randomly presented them with two out of the four monoliths and shared their code repositories. We asked them to rank the top five refactor candidate classes, and compared these with the outliers identified by GCN, DGI and CO-GCN. On average, the top five outliers provided by the annotators overlapped with our approach by 60%, with GCN by 45% and with DGI by 55%. Details of the top outliers detected by each approach, our questionnaire and the results are provided in the supplementary material. We can conclude that the outliers identified by our approach are more relevant. The low overlap numbers indicate the highly difficult and subjective nature of this task.
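A minimal sketch of the Isolation Forest step used for the GCN and DGI baselines above; the function name, forest size and ranking convention are our choices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def outliers_from_embeddings(Z, class_names, top_k=5):
    """Rank classes by Isolation Forest anomaly score computed on node embeddings.
    Z is the |V| x d embedding matrix produced by a baseline such as GCN or DGI."""
    iso = IsolationForest(n_estimators=100, random_state=0).fit(Z)
    scores = -iso.score_samples(Z)        # score_samples: higher = more normal, so negate
    order = np.argsort(-scores)[:top_k]   # most anomalous classes first
    return [(class_names[i], float(scores[i])) for i in order]
```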
We perform another set of experiments to measure the usefulness of individual components of CO-GCN.
1. We remove the clustering objective from L_total, i.e., set α_3 = 0 in Equation 6. Comparing the performance of this variant with CO-GCN shows the marginal contribution of integrating the clustering loss. We denote this variant as CO-GCN~(C) and use k-means++ on the node embeddings generated by this approach to obtain the clusters.
2. We remove the effect of O^s_i and O^a_i on L_str and L_att respectively by removing the log(·) terms. This is equivalent to traditional link and attribute reconstruction, combined with the clustering loss L_clus. The goal is to evaluate the usefulness of minimizing the effect of outliers for identifying good clusters. We denote this variant as CO-GCN~(O).

Figure 4: Results from the ablation study on the structural modularity and modularity metrics across the applications.

The results of the ablation study are shown in Figure 4. In general, incorporating the outlier scores and the clustering objective does result in higher modularity and structural modularity scores. However, the degree to which these components contribute to the overall clustering quality varies with the application and the metric used. For instance, in the Daytrader application, removing the clustering objective reduces structural modularity significantly but has no effect on modularity. Conversely, removing the outlier information reduces the modularity score but has negligible effect on structural modularity. This effect is also visible in the other applications. Interestingly, removing the outlier information leads to improved modularity for PBW, but this is balanced by a reduced structural modularity score. We can still conclude that including the outlier scores and the clustering loss in the training objective improves cluster quality in general.

Finally, we also evaluate the effect of the node embedding size on the modularity and structural modularity values for each application, experimenting with a range of embedding sizes.

Figure 5: Sensitivity analysis on embedding size.

The results are presented in Figure 5. We notice that the modularity scores do not vary significantly with a change in node embedding size. There is relatively more variation in the structural modularity scores with changing embedding sizes, and once again this variation is application dependent. There is not enough evidence to make any substantial claims, but in general the performance seems to be better at higher embedding sizes.

Conclusion
We introduced the traditional software engineering problem of monolith-to-microservices decomposition as a clustering task built upon graph representation learning. We showed how the application implementation structure can be translated into an attributed graph network. We then proposed a novel multi-objective Graph Convolution Network (GCN) based framework that not only generates clusters which can be candidate microservices, but also identifies the outliers in the graph, which can be treated as the important refactor classes for the architect to focus on. Our approach improved the state of the art on multiple metrics from both the graph and software engineering literature, and performed better than the others in a human evaluation of the outlier detection. In future, we want to extend this work to automatically identify the number of microservices and to expand the studies to procedural programming languages like COBOL.
We believe this work doesn’t have any direct societal or eth-ical impact.
We would like to thank Giriprasad Sridhara, Amith Singhee, Shivali Agarwal and Raunak Sinha from IBM India Research Labs, and Yasu Kastuno, Ai Ishida, Aki Tozawa and Fumiko Satoh from IBM Tokyo Research Labs, for their insightful suggestions during this work and valuable feedback towards improving the paper.
References
Arthur, D.; and Vassilvitskii, S. 2006. k-means++: The advantages of careful seeding. Technical report, Stanford.
Bandyopadhyay, S.; Lokesh, N.; and Murty, M. N. 2019. Outlier aware network embedding for attributed networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 12–19.
Bandyopadhyay, S.; Lokesh, N.; Vivek, S. V.; and Murty, M. 2020. Outlier Resistant Unsupervised Deep Architectures for Attributed Network Embedding. In Proceedings of the 13th International Conference on Web Search and Data Mining, 25–33.
Bandyopadhyay, S.; Vivek, S. V.; and Murty, M. N. 2020. Integrating Network Embedding and Community Outlier Detection via Multiclass Graph Description. In ECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, 976–983. IOS Press. doi:10.3233/FAIA200191.
Bittencourt, R. A.; and Guerrero, D. D. S. 2009. Comparison of graph clustering algorithms for recovering software architecture module views. 251–254. IEEE.
Chen, Z.; Li, L.; and Bruna, J. 2019. Supervised Community Detection with Line Graph Neural Networks. In International Conference on Learning Representations. URL https://openreview.net/forum?id=H1g0Z3A9Fm.
Dietrich, J.; Gauthier, F.; and Krishnan, P. 2018. Driver Generation for Java EE Web Applications. 121–125. IEEE.
Fritzsch, J.; Bogner, J.; Zimmermann, A.; and Wagner, S. 2018. From Monolith to Microservices: A Classification of Refactoring Approaches. CoRR abs/1807.10059. URL http://arxiv.org/abs/1807.10059.
Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; and Dahl, G. E. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 1263–1272.
Goldberg, Y.; and Levy, O. 2014. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.
Gouigoux, J.-P.; and Tamzalit, D. 2017. From monolith to microservices: Lessons learned on an industrial migration to a web oriented architecture. 62–65. IEEE.
Grover, A.; and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855–864.
Guo, S.; Lin, Y.; Feng, N.; Song, C.; and Wan, H. 2019. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 922–929.
Hamilton, W. L.; Ying, R.; and Leskovec, J. 2017. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584.
Jin, W.; Liu, T.; Cai, Y.; Kazman, R.; Mo, R.; and Zheng, Q. 2019. Service candidate identification from monolithic systems based on execution traces. IEEE Transactions on Software Engineering.
Kalske, M.; et al. 2018. Transforming monolithic architecture towards microservice architecture.
Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kipf, T. N.; and Welling, M. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308.
Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations.
Lewis, J.; and Fowler, M. 2014. Microservices: a definition of this new architectural term. https://martinfowler.com/articles/microservices.html.
Liang, J.; Jacobs, P.; Sun, J.; and Parthasarathy, S. 2018. Semi-supervised embedding in attributed networks with outliers. In Proceedings of the 2018 SIAM International Conference on Data Mining, 153–161. SIAM.
Liu, F. T.; Ting, K. M.; and Zhou, Z.-H. 2008. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, 413–422. IEEE.
Mazlami, G.; Cito, J.; and Leitner, P. 2017. Extraction of Microservices from Monolithic Software Architectures. In 2017 IEEE International Conference on Web Services (ICWS), 524–531.
Newman, M. E. 2006. Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103(23): 8577–8582.
Newman, M. E. J.; and Girvan, M. 2004. Finding and evaluating community structure in networks. Physical Review E 69(2): 026113.
Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 701–710.
Shtern, M.; and Tzerpos, V. 2012. Clustering Methodologies for Software Engineering. Advances in Software Engineering.
Taibi, D.; and Lenarduzzi, V. 2018. On the definition of microservice bad smells. IEEE Software 35(3): 56–62.
Veličković, P.; Fedus, W.; Hamilton, W. L.; Liò, P.; Bengio, Y.; and Hjelm, R. D. 2019. Deep Graph Infomax. In International Conference on Learning Representations. URL https://openreview.net/forum?id=rklz9iAcKQ.
Wu, J.; Hassan, A. E.; and Holt, R. C. 2005. Comparison of clustering algorithms in the context of software evolution. 525–535. IEEE.
Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; and Philip, S. Y. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems.
Yang, B.; Fu, X.; Sidiropoulos, N. D.; and Hong, M. 2017. Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In International Conference on Machine Learning, 3861–3870. PMLR.
Zhang, X.; Liu, H.; Li, Q.; and Wu, X.-M. 2019. Attributed graph clustering via adaptive graph convolution. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, 4327–4333. AAAI Press.
A Notation
Different notations used in the paper are summarized in Table 1.
B Outliers
B.1 Examples of outliers from the source code
We define outliers as classes that have no unique business-functionality identity. Primarily, these are classes that are overloaded with multiple business functionalities, as seen in Figure 6.
Daytrader is an online stock trading monolith application. The application allows users to set up the trading platform, e.g., configuring the database, user account and trading amount. Once set up, the platform allows users to view the market summary, check stock quotes, and buy or sell stocks. TradeDirect is one of the core classes in the application and implements the TradeServices interface. This interface exposes six different abstract functionalities through roughly twenty methods to implement. TradeDirect implements the interface and provides logical support to all six business functionalities. The identity of this class is therefore overloaded, and it becomes a top refactor candidate for the developers. Developers tend to break down the methods and separate them into six classes, each specific to a business functionality. There can be other types of outliers too, such as classes that do not support any business functionality themselves but on which the core business function classes depend for completion. They can be identified by dependencies from multiple classes. Typically, utility classes display such behavior. Table 2 lists the top five outliers determined by the three attributed graph neural network based methods (CO-GCN, GCN and DGI) for each of the four monolith applications, namely Daytrader, Plantsbywebsphere (PBW), Acme-Air and DietApp.

B.2 Details of Outlier Evaluation Study
To study the goodness of the outliers, we performed a qualitative study with five software engineers who have a minimum of seven years of industrial experience. We randomly presented each of them with an overview of two out of the four monoliths and shared their code repositories. We requested them to manually go through the program classes to understand the functionality exposed and the classes that come together to support it. We intentionally did not explain the application details, since that might influence their perspective of outliers. Application understanding is a time-consuming and specialized activity. All the annotators had prior experience in building web applications and are proficient in the languages used in the monolith applications. Each annotator took roughly twenty-four hours to study the monoliths. For the study, we asked each annotator to rank the top five refactor candidate classes for the two applications they were presented. We then compared them with the outliers identified by the CO-GCN, GCN and DGI methods. On average, the top five outliers provided by the annotators overlapped with our approach by 60%, with GCN by 45% and with DGI by 55%. Details of the overlap for the annotators for each of the approaches for the four applications are listed in Table 3, Table 4, Table 5 and Table 6. We can conclude that the outliers identified by our approach are more relevant. The low overlap numbers also indicate the highly difficult and subjective nature of this task.

Figure 6: The TradeDirect class implements the TradeServices interface to support six different business functionalities through its twenty-two methods. This class with overloaded business functionality is considered an outlier class and becomes a candidate refactorable class for the developer to break into six classes containing only the methods specific to each business function.

Table 2: Top five outliers detected using each of the approaches for the four monolith applications

Daytrader
  CO-GCN: 1. com.ibm.websphere.samples.daytrader.web.jsf.AccountDataJSF; 2. com.ibm.websphere.samples.daytrader.web.prims.PingJSONP; 3. com.ibm.websphere.samples.daytrader.util.FinancialUtils; 4. com.ibm.websphere.samples.daytrader.direct.TradeDirect; 5. com.ibm.websphere.samples.daytrader.TradeAction
  GCN: 1. com.ibm.websphere.samples.daytrader.web.prims.PingCDIBean; 2. com.ibm.websphere.samples.daytrader.util.TradeConfig; 3. com.ibm.websphere.samples.daytrader.util.Log; 4. com.ibm.websphere.samples.daytrader.util.TradeDirect; 5. com.ibm.websphere.samples.daytrader.web.prims.PingJSONP
  DGI: 1. com.ibm.websphere.samples.daytrader.web.jsf.QuoteData; 2. com.ibm.websphere.samples.daytrader.util.KeyBlock; 3. com.ibm.websphere.samples.daytrader.util.Log; 4. com.ibm.websphere.samples.daytrader.web.prims.PingUpgradeServlet; 5. com.ibm.websphere.samples.daytrader.web.prims.PingServletCDI

PBW
  CO-GCN: 1. com.ibm.websphere.samples.pbw.ejb.ResetDBBean; 2. com.ibm.websphere.samples.pbw.war.ShoppingBean; 3. com.ibm.websphere.samples.pbw.war.AccountBean; 4. com.ibm.websphere.samples.pbw.ejb.CustomerMgr; 5. com.ibm.websphere.samples.pbw.jpa.OrderItem
  GCN: 1. com.ibm.websphere.samples.pbw.ejb.EMailMessage; 2. com.ibm.websphere.samples.pbw.ejb.MailerBean; 3. com.ibm.websphere.samples.pbw.war.AccountBean; 4. com.ibm.websphere.samples.pbw.war.LoginInfo; 5. com.ibm.websphere.samples.pbw.jpa.Supplier
  DGI: 1. com.ibm.websphere.samples.pbw.ejb.EMailMessage; 2. com.ibm.websphere.samples.pbw.war.LoginInfo; 3. com.ibm.websphere.samples.pbw.jpa.Supplier; 4. com.ibm.websphere.samples.pbw.utils.ListProperties; 5. com.ibm.websphere.samples.pbw.war.AccountBean

Acme-Air
  CO-GCN: 1. com.acmeair.service.ServiceLocator; 2. com.acmeair.mongo.services.BookingServiceImpl; 3. com.acmeair.AirportCodeMapping; 4. com.acmeair.loader.BookingLoader; 5. com.acmeair.mongo.services.CustomerServiceImpl
  GCN: 1. com.acmeair.loader.Loader; 2. com.acmeair.config.AcmeAirConfiguration.ServiceData; 3. com.acmeair.web.BookingsREST; 4. com.acmeair.config.LoaderREST; 5. com.acmeair.mongo.services.BookingServiceImpl
  DGI: 1. com.acmeair.config.AcmeAirConfiguration.ServiceData; 2. com.acmeair.mongo.services.AuthServiceImpl; 3. com.acmeair.service.AuthService; 4. com.acmeair.service.ServiceLocator; 5. com.acmeair.service.FlightService

DietApp
  CO-GCN: 1. WebApplication1.Controllers.ManageController; 2. WebApplication1.Controllers.ProductController; 3. WebApplication1.Services.ProductService; 4. WebApplication1.Services.DietService; 5. WebApplication1.Models.DietDbContext
  GCN: 1. WebApplication1.RouteConfig; 2. WebApplication1.FilterConfig; 3. WebApplication1.Services.ProductService; 4. WebApplication1.BundleConfig; 5. WebApplication1.Repositories.Concrete.UserRepostiory
  DGI: 1. WebApplication1.Controllers.ManageController; 2. WebApplication1.Services.DietService; 3. WebApplication1.Repositories.Concrete.UserRepostiory; 4. WebApplication1.Services.ProductService; 5. WebApplication1.Controllers.EntryController
Daytrader
  Annotator1: CO-GCN 3, GCN 2, DGI 3
  Annotator2: CO-GCN 3, GCN 3, DGI 2

Table 3: Overlapping outliers for the three approaches from the two annotators for the Daytrader monolith application
PBW
  Annotator1: CO-GCN 3, GCN 2, DGI 2
  Annotator2: CO-GCN 3, GCN 3, DGI 3

Table 4: Overlapping outliers for the three approaches from the two annotators for the PlantsByWebsphere monolith application
Acme-Air
  Annotator1: CO-GCN 2, GCN 2, DGI 3
  Annotator2: CO-GCN 3, GCN 2, DGI 2

Table 5: Overlapping outliers for the three approaches from the two annotators for the Acme-Air monolith application