DeepNC: Deep Generative Network Completion
Cong Tran, Student Member, IEEE, Won-Yong Shin, Senior Member, IEEE, Andreas Spitz, and Michael Gertz
Abstract—Most network data are collected from only partially observable networks with both missing nodes and edges, for example, due to limited resources and privacy settings specified by users on social media. Thus, it stands to reason that inferring the missing parts of the networks by performing network completion should precede downstream mining or learning tasks on the networks. However, despite this need, the recovery of missing nodes and edges in such incomplete networks is an insufficiently explored problem. In this paper, we present DeepNC, a novel method for inferring the missing parts of a network that is based on a deep generative model of graphs. Specifically, our method first learns a likelihood over edges via an autoregressive generative model, and then identifies the graph that maximizes the learned likelihood conditioned on the observable graph topology. Moreover, we propose a computationally efficient DeepNC algorithm that consecutively finds individual nodes that maximize the probability in each node generation step, as well as an enhanced version using the expectation-maximization algorithm. The runtime complexities of both algorithms are shown to be almost linear in the number of nodes in the network. We empirically demonstrate the superiority of DeepNC over state-of-the-art network completion approaches.
Index Terms—Autoregressive generative model; deep generative model of graphs; inference; network completion; partially observable network

• C. Tran is with the Department of Computer Science and Engineering, Dankook University, Yongin 16890, Republic of Korea, and also with the Department of Computational Science and Engineering, Yonsei University, Seoul 03722, Republic of Korea. E-mail: [email protected].
• W.-Y. Shin is with the Department of Computational Science and Engineering, Yonsei University, Seoul 03722, Republic of Korea. E-mail: [email protected].
• A. Spitz is with the School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland. E-mail: andreas.spitz@epfl.ch.
• M. Gertz is with the Institute of Computer Science, Heidelberg University, Heidelberg 69120, Germany. E-mail: [email protected].
(Corresponding author: Won-Yong Shin.)
1 INTRODUCTION
Real-world networks extracted from various biological, social, technological, and information systems tend to be only partially observable and thus missing both nodes and edges [1]. For example, users and organizations may have limited access to data due to insufficient resources or a lack of authority. In social networks, a source of incompleteness stems from privacy settings specified by users who partially or completely hide their identities and/or friendships [2]. As an example, consider a demographic analysis of Facebook users in New York City in June 2011 that showed 52.6% of the users to be hiding their Facebook friends [3]. Using such incomplete network data may severely degrade the performance of downstream analyses such as community detection, link prediction, and node classification due to significantly altered estimates of structural properties (see, e.g., [1], [4], [5], [6] and references therein).

This motivates us to conduct network completion to infer the missing part (i.e., a set of both missing nodes and associated edges) prior to performing downstream applications. While intuitively similar, network completion is a much more challenging task than the well-studied link prediction, since it jointly infers both missing nodes and edges, whereas link prediction infers missing edges only. Although there has been an attempt to recover both missing nodes and edges [5], it suffers from several limitations. A state-of-the-art network completion method that aims at inferring the missing part of a network based on the Kronecker graph model, dubbed KronEM [5], suffers from three major problems: 1) setting the size of the Kronecker generative parameter is not trivial; 2) the Kronecker graph model is inherently designed under the assumption of a pure power-law degree distribution, which not all real-world networks necessarily follow; and 3) its inference accuracy is not satisfactory.

As a way of further enhancing the performance of network completion, our study is intuitively motivated by the existence of structurally similar graphs with respect to a graph distance, whose topologies are almost entirely observable. Such similar graphs can be retrieved from the same domain as that of the target graph (see [7], [8], [9] for more information). Suppose that many citizens residing in country A strongly protect the privacy of their social relationships, while citizens of country B tend to provide their friendship relations on social media. Intuitively, as long as the graph structures of the two countries are similar to each other, latent information within the (almost) complete data collected from country B can be uncovered and leveraged to infer the missing part of the data collected from country A.
Additionally, the use of deep learning on graphs has been actively studied by exploiting this structural similarity of graphs (see, e.g., [10], [11] and references therein), which enables us to model complex structures over graphs with high accuracy. For example, the frameworks of recurrent neural networks (RNNs) and generative adversarial networks (GANs) were recently introduced to construct deep generative models of graphs [10], [11]. Thus, a natural question is how such structural similarity can be incorporated into the problem of network completion by taking advantage of effective deep learning-based approaches.

1. Note that, in this paper, we use the terms "network" and "graph" interchangeably.

In this paper, we introduce
DeepNC, a novel method for completing the missing part of an observed incomplete network $G_O$ based on a deep generative model of graphs. Specifically, we first learn a likelihood over edges (i.e., a latent representation) via an autoregressive generative model of graphs, e.g., GraphRNN [10] built upon RNNs, by using a set of structurally similar graphs as training data, and then infer the missing part of the network. Unlike GraphRNN, which is only applicable to fully observable graphs, our method is capable of accommodating both observable and missing parts by imputing a number of missing nodes and edges with sampled values from a multivariate Bernoulli distribution. To this end, we formulate a new optimization problem with the aim of finding the graph that maximizes the learned likelihood conditioned on the observable graph topology. To efficiently solve the problem, we first propose a low-complexity DeepNC algorithm, termed DeepNC-L, that consecutively finds a single node maximizing the probability in each node generation step in a greedy fashion under the assumption that there are no missing edges between two nodes in a partially observable network $G_O$. We then present judicious approximation and computational reduction techniques for DeepNC-L by exploiting the sparseness of real-world networks. Second, by relaxing this assumption to deal with a more realistic scenario in which there are missing edges in $G_O$, we propose an enhanced version of DeepNC using the expectation-maximization (EM) algorithm, termed DeepNC-EM, which enables us to jointly find both the missing edges between nodes in $G_O$ and the edges associated with missing nodes by executing DeepNC-L iteratively. That is, the DeepNC-EM algorithm jointly solves network completion and link prediction in a single module. We show that the computational complexity of both DeepNC algorithms is almost linear in the number of nodes in the network. By adopting the graph edit distance (GED) [12] as a performance metric, we empirically evaluate the performance of both DeepNC algorithms in various environments. Experimental results show that our algorithms consistently outperform state-of-the-art network completion approaches by up to 68.25% in terms of GED. The results also demonstrate the robustness of our method not only on various real-world networks that do not necessarily follow a power-law degree distribution, but also in three more difficult and challenging situations where 1) a large portion of nodes are missing, 2) training graphs are only partially observed, and 3) a large portion of edges between nodes in $G_O$ are missing. Additionally, we analyze and empirically validate the computational complexity of the DeepNC algorithms. Our main contributions are five-fold and summarized as follows:
• We introduce DeepNC, a deep learning-based network completion method for partially observable networks;
• We formalize our problem as the imputation of missing data in an optimization problem that maximizes the conditional probability of a generated node sequence;
• We design two computationally efficient DeepNC algorithms to solve the problem by exploiting the sparsity of networks;
• We validate DeepNC through extensive experiments using real-world datasets across various domains, as well as synthetic datasets;
• We analyze and empirically validate the computational complexity of DeepNC.

To the best of our knowledge, this study is the first work that applies deep learning to network completion.
The remainder of this paper is organized as follows. In Section 2, we summarize significant studies that are related to our work. In Section 3, we explain the methodology of our work, including the problem definition and an overview of our DeepNC method. Section 4 describes the implementation details of the two DeepNC algorithms and analyzes their computational complexities. Experimental results are discussed in Section 5. Finally, we provide a summary and concluding remarks in Section 6.

Table 1 summarizes the notation that is used in this paper. This notation will be formally defined in the following sections when we introduce our methodology and the technical details.

TABLE 1: Summary of notations

Notation     Description
$G_T$        true graph
$G_O$        partially observable graph
$V_O$        set of nodes in $G_O$
$E_O$        set of edges in $G_O$
$V_M$        set of missing nodes
$E_M$        set of missing edges
$G_I$        training graph
$p_{model}$  probability distribution over edges of a graph
$\Theta$     parameter of $p_{model}$
$\hat{G}$    recovered graph
$\pi$        node ordering
$S^\pi$      a sequence of nodes and edges under $\pi$
2 RELATED WORK
The method that we propose in this paper is related to three broader areas of research, namely generative models of graphs, link prediction, and network completion.
Generative models of graphs.
The study of generative models of graphs has a long history, beginning with the first random model of graphs that robustly assigns probabilities to large classes of graphs, introduced by Erdős and Rényi [13]. Another well-known model generates new nodes based on preferential attachment [14]. More recently, a generative model based on Kronecker graphs, the so-called KronFit, was introduced in [15], which generates synthetic networks that match many of the structural properties of real-world networks, such as constant and shrinking diameters. Recent advances in deep learning-based approaches have made further progress towards generative models for complex networks [10], [11], [16], [17], [18], [19], [20], [21]. GraphRNN [10] and graph recurrent attention networks (GRAN) [16] were presented to learn a distribution over edges by decomposing the graph generation process into sequences of node and edge formations via autoregressive generative models; an approach using the Wasserstein GAN objective in the training process was applied to generate discrete output samples [11]; variational autoencoders (VAEs) were employed to design another deep learning-based generative model of graphs [17], [18]; a graph convolutional policy network was presented for goal-directed graph generation (e.g., drug molecules) using reinforcement learning [19]; a multi-scale graph generative model, named Misc-GAN, was introduced by modeling the underlying distribution of graph structures at different levels of granularity with the aim of generating graphs having similar community structures [20]; and a more general deep generative model was presented to learn distributions over arbitrary graphs via graph neural networks [21]. Among the aforementioned methods, autoregressive generative models such as GraphRNN and GRAN are the most scalable and flexible approaches in terms of graph size, while others are beneficial in generating non-topological information such as node attributes. Table 2 summarizes the aforementioned deep generative models of graphs.

TABLE 2: Summary of deep generative models of graphs

Deep generative models of graphs   Scalable   Flexible   Attributed
Autoregressive [10], [16]          ✓          ✓
GAN [11], [20]                                           ✓
VAE [17], [18]                                           ✓
Reinforcement learning [19]                   ✓          ✓
General neural network [21]                   ✓          ✓
Link prediction.
Inferring the presence of links in a given network according to the neighborhood similarity of existing connections is a longstanding task in network science. Although numerous algorithms have been developed based on traditional statistical measures [22] and deep learning approaches such as graph neural networks [18], [23], existing link prediction methods are not inherently designed to solve the network completion problem, which jointly recovers missing nodes and edges in partially observable networks. Specifically, when a node is completely missing from the underlying network, link prediction models can no longer exploit structural neighborhood information.
Network completion.
Observing a partial sample of a network and inferring the remainder of the network is referred to as network completion. As the most influential study, KronEM, an approach to solving the network completion problem based on Kronecker graphs and the EM algorithm, was suggested by Kim and Leskovec [5]. MISC was developed to tackle the missing node identification problem, in which the information on connections between missing nodes and observable nodes is assumed to be available [24]. A follow-up study of MISC [25] incorporated metadata, such as demographic information and the nodes' historical behavior, into the inference process. Furthermore, a graph upscaling method, termed EvoGraph [26], can be regarded as a network completion method using a preferential attachment mechanism.
Discussions.
Despite these contributions, there has been no prior work in the literature that exploits the power of deep generative models in the context of network completion. Although generative models of graphs such as GraphRNN can be used as a network completion method, nontrivial extra tasks are required, including computationally expensive graph matching to find the correspondence between generated graphs and the partially observable network. Furthermore, MISC and other follow-up studies do not truly address network completion, since they solve the node identification problem under the assumption that the connections between missing nodes and observable nodes are known beforehand, which is not feasible in a setting where only partial observation of nodes is possible.
3 METHODOLOGY
As a basis for the proposed DeepNC algorithms in Section 4, we first describe our network model with basic assumptions and formulate our problem. Then, we explain a deep generative graph model and the research methodology that adopts this model to solve the problem of network completion.
Let us denote a partially observable network as $G_O = (V_O, E_O)$, where $V_O$ and $E_O$ are the set of vertices and the set of edges, respectively. The network $G_O$ with $|V_O|$ observable nodes can be interpreted as a subgraph taken from an underlying true network $G_T = (V_O \cup V_M, E_O \cup E_M)$, where $V_M$ is the set of unobservable (missing) nodes and $E_M$ is the set of three types of unobservable (missing) edges: i) the edges connecting two nodes in $V_M$; ii) the edges connecting one node in $V_O$ and another node in $V_M$; and iii) the missing edges connecting two nodes in $V_O$. More specifically, the set of observable edges, $E_O$, is regarded as a subset of all true edges connecting nodes in $V_O$. In contrast to the conventional setting that assumes no missing edges between two nodes in $V_O$ [5], we relax this assumption by not requiring that $G_O$ is a complete subgraph. In the following, we assume both $G_O$ and $G_T$ to be undirected unweighted networks without self-loops or repeated edges.

Let us denote $p_{model}$ as a family of probability distributions over the edges of a graph, parameterized by a set of model parameters $\Theta$, i.e., $(p_{model}; \Theta)$. In this paper, we suppose that $G_T$ is a sample drawn from the distribution $p_{model}$. Furthermore, we assume that the number of missing nodes, $|V_M|$, is available or can be estimated. In practice, $|V_M|$ can be readily estimated by standard statistical methods; for example, a latent non-random mixing model in [27] is capable of estimating a network size $|V_O \cup V_M|$ by asking respondents how many people they know in specific subpopulations. For an overview of network-relevant notations, see Fig. 1.

Fig. 1: The schematic overview of our DeepNC method.
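To make this observation model concrete, the following minimal Python sketch (ours, for illustration only; the function and parameter names are assumptions, and networkx is used for graph handling) builds a partially observable network $G_O$ from a true network $G_T$ by hiding a node set $V_M$ with all incident edges, together with a fraction of the edges among the remaining observable nodes.

import random
import networkx as nx

def make_partial_observation(G_T, num_missing_nodes, missing_edge_frac=0.1, seed=0):
    # Sample G_O from G_T: drop |V_M| nodes (their incident edges vanish
    # with them), then hide a fraction of the remaining observable edges,
    # i.e., the third type of missing edges in E_M.
    rng = random.Random(seed)
    V_M = set(rng.sample(sorted(G_T.nodes()), num_missing_nodes))
    G_O = G_T.copy()
    G_O.remove_nodes_from(V_M)
    hidden = rng.sample(sorted(G_O.edges()),
                        int(missing_edge_frac * G_O.number_of_edges()))
    G_O.remove_edges_from(hidden)
    return G_O, V_M

# Example: a 100-node Barabasi-Albert graph with 30 hidden nodes.
G_T = nx.barabasi_albert_graph(100, 4, seed=0)
G_O, V_M = make_partial_observation(G_T, 30)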
In the following, we formally define the network completion problem, the idea behind our approach, and the problem formulation.

Definition 1. Network completion problem.
Given a partially observable network $G_O$, network completion aims to recover all missing edges connecting nodes in the true network $G_T$ so that the inferred network, denoted by $\hat{G}$, is equivalent to $G_T$ (up to isomorphism).

As illustrated in Fig. 1, a network $\hat{G}$ is inferred using the partially observable network $G_O$ as input of DeepNC. We tackle this problem by minimizing a distance metric $\delta(G_T, \hat{G})$ that measures the difference between $G_T$ and $\hat{G}$. Since the true network $G_T$ is not available, our main idea is to analyze the connectivity patterns of one (or multiple) fully observed network(s) $G_I$ whose structure is similar to that of $G_T$ (i.e., $\delta(G_T, G_I)$ is sufficiently small) and then to make use of this information for recovering the network $G_O$, where $G_I$ is a sample drawn from the distribution $p_{model}$. To this end, we first learn $(p_{model}; \Theta)$ by using $G_I$ as the training data under a deep generative model of graphs described in Section 3.2. Afterwards, we generate graphs with similar structures via the set of learned model parameters $\Theta$. Among all generated graphs $G$, each of which has $|V_O| + |V_M|$ nodes, we find the most likely graph configuration $\hat{G}$ given the observable part $G_O$. In this context, our optimization problem can be formulated as follows:

$$\hat{G} = \arg\max_{G} P(G \mid G_O, \Theta) \quad \text{s.t.} \quad |V_G| = |V_O| + |V_M|, \tag{1}$$

where $|V_G|$ denotes the number of nodes in $G$. The overall procedure of our approach is visualized in Fig. 1.

2. The number of nodes in $G_I$ should be greater than or equal to that in $G_T$ so that the information (i.e., the distribution $p_{model}$) encoded by the learned parameters $\Theta$ is sufficient to infer $G_T$.

Deep generative models of graphs have the ability to approximate any distribution of graphs with minimal assumptions about their structures [10], [21]. Among recently introduced deep generative models, GraphRNN [10] is adopted in our study due to its state-of-the-art performance in generating diverse graphs that match the structural characteristics of a target set, as well as its scalability to much larger graphs than those handled by other deep generative models (refer to Section 4 and Corollary 1 in [10] for more details). In this subsection, we briefly describe a variant of GraphRNN, termed simplified GraphRNN (GraphRNN-S), in which the edge connection probabilities of a node are assumed to be independent of each other. This method effectively learns $(p_{model}; \Theta)$ from the set of structurally similar network(s) $G_I$.

We first describe how to vectorize a graph. Given a graph $G$ with a number of nodes equal to $|V_O| + |V_M|$, we define a node ordering $\pi$ that maps nodes to rows or columns of a given adjacency matrix of $G$ as a permutation function over the set of nodes. Thus, $\{\pi(v_1), \cdots, \pi(v_{|V_O|+|V_M|})\}$ is a permutation of $\{v_1, \cdots, v_{|V_O|+|V_M|}\}$, yielding $(|V_O|+|V_M|)!$ possible node permutations. Given a node ordering $\pi$, a sequence $S^\pi$ is then defined as

$$S^\pi \triangleq (S_1^\pi, \cdots, S_{|V_O|+|V_M|}^\pi), \tag{2}$$

where each element $S_i^\pi \in \{0,1\}^{i-1}$ for $i \in \{2, \cdots, |V_O|+|V_M|\}$ is a binary adjacency vector representing the edges between node $\pi(v_i)$ and the previous nodes $\pi(v_j)$ for $j \in \{1, \cdots, i-1\}$ that already exist in the graph, and $S_1^\pi = \emptyset$. Here, $S_i^\pi$ can be expressed as

$$S_i^\pi = (a_{1,i}^\pi, \cdots, a_{i-1,i}^\pi), \quad \forall i \in \{2, \cdots, |V_O|+|V_M|\}, \tag{3}$$

where $a_{u,v}^\pi$ denotes the $(u,v)$-th element of the adjacency matrix $A^\pi \in \{0,1\}^{(|V_O|+|V_M|) \times (|V_O|+|V_M|)}$ for $u, v \in \{1, \cdots, |V_O|+|V_M|\}$ (refer to Fig. 2 for an illustration of the sequence). Because graphs are discrete objects, the graph generation process involves discrete decisions that are not differentiable and therefore problematic for backpropagation. Thus, instead of directly learning the distribution $p(G)$, we sample $\pi$ from the set of $(|V_O|+|V_M|)!$ node permutations to generate the sequences $S^\pi$ and learn the distribution $p(S^\pi)$.

Next, we explain how to characterize the probability $p(S^\pi)$. Due to the sequential nature of $S^\pi$, the probability $p(S^\pi)$ can be decomposed into the product of conditional probability distributions over the elements as follows:

$$p(S^\pi) = \prod_{i=2}^{|V_O|+|V_M|} p(S_i^\pi \mid S_1^\pi, \cdots, S_{i-1}^\pi). \tag{4}$$

For ease of presentation, we simplify $p(S_i^\pi \mid S_1^\pi, \cdots, S_{i-1}^\pi)$ as $p(S_i^\pi \mid S_{<i}^\pi)$. In GraphRNN-S, this conditional distribution is parameterized by two neural networks: a state-transition function $f_{trans}$ (an RNN maintaining a hidden state $h$) and an output function $f_{out}$ that maps the hidden state to edge existence probabilities,

$$h_{i-1} = f_{trans}(h_{i-2}, S_{i-1}^\pi), \tag{5}$$
$$\phi_i = f_{out}(h_{i-1}), \tag{6}$$

where each entry of $\phi_i \in (0,1)^{i-1}$ represents the probability that the corresponding edge of node $\pi(v_i)$ exists.
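As an illustration of the sequence representation in (2) and (3), the following is a minimal sketch (ours) that converts a graph and a node ordering $\pi$ into the sequence $S^\pi$ of binary adjacency vectors.

import numpy as np
import networkx as nx

def graph_to_sequence(G, pi):
    # pi[i-1] is the node placed at position i of the ordering; S^pi_1 is
    # the empty vector, and S^pi_i lists edges toward the i-1 earlier nodes.
    S = [np.zeros(0, dtype=np.int8)]
    for i in range(1, len(pi)):
        s_i = np.array([1 if G.has_edge(pi[i], pi[j]) else 0
                        for j in range(i)], dtype=np.int8)
        S.append(s_i)
    return S

G = nx.karate_club_graph()
S = graph_to_sequence(G, list(G.nodes()))   # one of (|V|)! possible orderings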
We now describe our DeepNC method, which recovers the missing part of the true network $G_T$ based on the deep generative model. We first present the problem formulation built upon (1). Then, we describe the approach that seamlessly accommodates both observable and missing parts of $G_T$ into the graph generation process using the trained functions $f_{trans}$ and $f_{out}$ in Section 3.2.

By modeling graphs as sequences and incorporating the information from the observed graph $G_O$ into the generation process, we reformulate our optimization problem in (1) as finding a sequence $\hat{S}^\pi$ that maximizes $p(S^\pi \mid G_O; \Theta)$ under a node ordering $\pi$ as follows:

$$\hat{S}^\pi = \arg\max_{S^\pi} p(S^\pi \mid G_O; \Theta), \tag{7}$$

where $S^\pi$ is given by (2) and $\Theta$ is the set of learned parameters of both $f_{trans}$ and $f_{out}$.

We aim at finding $\hat{S}^\pi$ by applying data imputation to the missing part (i.e., the unknown entries) in the sequence $S^\pi$, where indices of missing nodes correspond to placeholders (e.g., $M_1$ and $M_2$ in Fig. 3). The unknown entries also include non-existent edges between nodes in $G_O$. Let $\tilde{S}^\pi = (\tilde{S}_1^\pi, \cdots, \tilde{S}_{|V_O|+|V_M|}^\pi)$ denote the sequence after data imputation under a node ordering $\pi$, which contains both observable edges taken directly from $S^\pi$, corresponding to the set $E_O$, and possible instances of all missing entries. Then, we impute each missing entry in $S^\pi$ with either 0 or 1, thereby yielding $2^{\frac{(|V_O|+|V_M|)(|V_O|+|V_M|-1)}{2} - |E_O|}$ possible outcomes of $\tilde{S}^\pi$, where data imputation for non-existent edges between nodes in $G_O$ (i.e., the orange entries in $S^\pi$ of Fig. 3) can be thought of as link prediction, since structural neighborhood information regarding observable nodes is available. For each outcome, we use the trained $f_{trans}$ and $f_{out}$ to obtain the corresponding $\phi_i$ for $i \in \{2, \cdots, |V_O|+|V_M|\}$. Since each entry of $\phi_i$ represents the likelihood of edge existence, the conditional probability $p(S^\pi \mid G_O; \Theta)$ in (7) can be computed as

$$p(S^\pi \mid G_O; \Theta) = p(\tilde{S}^\pi; \Theta) = \prod_{i=2}^{|V_O|+|V_M|} p(\tilde{S}_i^\pi; \phi_i) = \prod_{i=2}^{|V_O|+|V_M|} \prod_{\tilde{s}_{i,j}^\pi = 1} \phi_{i,j} \prod_{\tilde{s}_{i,j}^\pi = 0} (1 - \phi_{i,j}), \tag{8}$$

where $\tilde{s}_{i,j}^\pi$ denotes the $j$-th element of the binary vector $\tilde{S}_i^\pi$ for $i \in \{2, \cdots, |V_O|+|V_M|\}$ and $j \in \{1, \cdots, i-1\}$, and $\phi_{i,j} \in (0,1)$ is the $j$-th element of $\phi_i$. An example visualizing our DeepNC method is presented in Fig. 3, where we observe a network $G_O$ consisting of three nodes (i.e., A, B, and C) and two edges, instead of the true network $G_T$ with 5 nodes (i.e., A, B, C, $M_1$, and $M_2$).

Fig. 3: An example illustrating the schematic overview of our DeepNC method, where three nodes (i.e., A, B, and C) and two edges with solid lines are observable instead of the true graph $G_T$ consisting of five nodes and all associated edges. Both white and orange entries in $S^\pi$ are imputed with either 0 or 1, while grey entries in $S^\pi$ remain unchanged.

To solve (7), we would need to compute $p(S^\pi \mid G_O; \Theta)$ via exhaustive search over $(|V_O|+|V_M|)!$ node permutations. Since computing $p(\tilde{S}^\pi; \Theta)$ in (8) requires $(|V_O|+|V_M|)^2$ multiplication operations and data imputation yields $2^{\frac{(|V_O|+|V_M|)(|V_O|+|V_M|-1)}{2} - |E_O|}$ possible outcomes of $\tilde{S}^\pi$, the computational complexity is bounded by $O\big((|V_O|+|V_M|)^2 \cdot 2^{\frac{(|V_O|+|V_M|)(|V_O|+|V_M|-1)}{2} - |E_O|} \cdot (|V_O|+|V_M|)!\big)$. This motivates us to introduce a low-complexity algorithm for efficiently solving the problem in the next section.
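Because (8) factorizes over independent Bernoulli terms, the likelihood of an imputed sequence is straightforward to evaluate once the $\phi_i$ are available. Below is a minimal sketch (ours), in which phi stands in for the outputs of the trained $f_{out}$.

import numpy as np

def sequence_log_likelihood(S_tilde, phi):
    # log p(S~^pi; Theta) per Eq. (8); S_tilde[i] and phi[i] are the
    # length-i binary and probability vectors of the (i+1)-th node.
    ll = 0.0
    for s_i, phi_i in zip(S_tilde[1:], phi[1:]):   # the product starts at i = 2
        ll += float(np.sum(s_i * np.log(phi_i) + (1 - s_i) * np.log1p(-phi_i)))
    return ll

# Toy usage with hypothetical model outputs for a 3-node graph.
S_tilde = [np.zeros(0), np.array([1]), np.array([1, 0])]
phi = [None, np.array([0.8]), np.array([0.7, 0.4])]
print(sequence_log_likelihood(S_tilde, phi))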
4 DEEPNC ALGORITHMS
In this section, we introduce the two algorithms that we design to efficiently solve the network completion problem in (7). In designing these algorithms, we focus on how to compute the likelihood of edge existence in the form of a tuple $(\hat{\pi}, \Phi)$, where $\hat{\pi}$ represents a node ordering to be inferred and $\Phi = \{\phi_2, \cdots, \phi_{|V_O|+|V_M|}\}$. Then, $\hat{S}^\pi$ in (7) can be acquired by sampling from $(\hat{\pi}, \Phi)$. First, we present DeepNC-L, a low-complexity deep network completion algorithm, which operates under the assumption that the partially observable graph $G_O$ is a complete subgraph with no missing edges. Second, we present an enhanced version of DeepNC-L using the EM algorithm [30], dubbed DeepNC-EM, to deal with the case where edges are missing in $G_O$. The overall architecture of both DeepNC algorithms is illustrated in Fig. 4, where the notation and detailed descriptions are given later. We also analyze their computational complexities.
We propose DeepNC-L, which approximates the optimal solution to (7) under the assumption that there are no missing edges in $G_O$; this implies that the non-existent edges between nodes in $G_O$ are regarded as observable entries in $S^\pi$. Since $\Phi$ indicates the set of edge existence probabilities and is thus obtained from the set of learned model parameters $\Theta$ for each $\pi$, (7) can be simplified to the problem of finding a node ordering $\hat{\pi}$ such that

$$\hat{\pi} = \arg\max_{\pi} p(\tilde{S}^\pi; \Theta), \tag{9}$$

where $\tilde{S}^\pi$ is the sequence after data imputation under a given $\pi$.

Fig. 4: The overall architecture of DeepNC algorithms.

To efficiently solve (9), we present two judicious approximation methods in the following. First, we design a greedy strategy that selects a single node at each inference (generation) step. More precisely, instead of exhaustively searching for the node ordering maximizing $p(\tilde{S}^\pi; \Theta)$ among $(|V_O|+|V_M|)!$ possible permutations, we aim to consecutively find a single node $\hat{v} \in V^{(i)}$ such that

$$\hat{v} = \arg\max_{v \in V^{(i)}} p(\tilde{S}_i^\pi; \phi_i) \quad \text{subject to} \quad \pi(v) = i \tag{10}$$

for each step $i \in \{2, \cdots, |V_O|+|V_M|\}$, where $V^{(i)}$ is the set of nodes that have not been generated until the $i$-th inference step and $\hat{v}$ is removed from $V^{(i)}$ after each inference step (that is, $V^{(i+1)} \leftarrow V^{(i)} \setminus \{\hat{v}\}$) (refer to Fig. 4 for the node removal). We note that the first node can be arbitrarily chosen in the generation process. Second, we further approximate the solution to (10) by treating all unknown entries (i.e., missing data) in $\tilde{S}_i^\pi$ equally during the computation while retrieving $\hat{v}$ from the set $V^{(i)}$, rather than computing the likelihoods in (10) over all entries in $\tilde{S}_i^\pi$. Let us define two types of nodes: observable nodes and missing nodes. Then, we select a node of either type at random, in proportion to the number of nodes belonging to each type in $V^{(i)}$, to ensure that there is no bias in the node selection. When the selected node type is "missing", we choose $\hat{v}$ at random from all missing nodes in $V^{(i)}$ without any computation, since all missing nodes are treated equally. In contrast, when the selected node type is "observable", we choose an observable node based solely on the computation for the observable entries in $S_i^\pi$ by reformulating our problem as follows:

$$\hat{v} = \arg\max_{v \in V_O \cap V^{(i)}} p(O_i^\pi; \phi_i) \quad \text{subject to} \quad \pi(v) = i \tag{11}$$

for each step $i \in \{2, \cdots, |V_O|+|V_M|\}$, where $O_i^\pi$ denotes the set of observable entries in $S_i^\pi$; $p(O_i^\pi; \phi_i) = \prod_{s_{i,j}^\pi = 1} \phi_{i,j} \prod_{s_{i,j}^\pi = 0} (1 - \phi_{i,j})$ from (8), with the products taken over the observable entries; and $V_O \cap V^{(i)}$ indicates the set of remaining observable nodes after $i-1$ inference steps. Note that $p(O_i^\pi; \phi_i)$ is non-computable if there is no observable entry in $S_i^\pi$.

Fig. 5: An illustration of the mechanism of DeepNC-L. The first three steps are shown as an example.

Now, we are ready to give a stepwise description of the DeepNC-L algorithm.
1. Initialization: For $i = 1$, we set $V^{(1)}$ to $V_O \cup V_M$ and randomly choose a node in $V^{(1)}$ to be $\hat{v}$.
2. Node selection: For $i \in \{2, \cdots, |V_O|+|V_M|\}$, we find $\hat{v}$ by either randomly selecting a missing node in $V^{(i)}$ or solving (11), depending on which node type is selected.
3. Data imputation: After finding $\hat{v}$, we apply a data imputation strategy to the missing part (i.e., the unknown entries) in $S_i^\pi$ through the inference process of GraphRNN-S. To be specific, suppose that $\pi(u) = i$ and $\pi(v) = j$, which means that the $i$-th and $j$-th nodes in a given node ordering $\pi$ are $u$ and $v$, respectively. Then, we have

$$\tilde{s}_{i,j}^\pi = \begin{cases} \text{Bernoulli}(\phi_i[j]), & \text{if } u \notin V_O \text{ or } v \notin V_O \\ s_{i,j}^\pi, & \text{otherwise}, \end{cases} \tag{12}$$

where the Bernoulli trial with probability $\phi_i[j]$ maps the value of the unknown entry to 1 if the outcome "success" occurs and to 0 otherwise.
4. Repetition: We iterate the second and third steps $|V_O| + |V_M| - 1$ times until the recovered graph is fully generated.

For a more intuitive understanding, consider the following example.

Example 1: As illustrated in Fig. 5, let us describe the three steps that select the first three nodes of a given graph according to the aforementioned procedure. We start by randomly assigning the first node of the inference process to node $M_1$ (i.e., $\pi(M_1) = 1$ and $V^{(2)} \leftarrow V^{(1)} \setminus \{M_1\}$). Since we do not have any information about the connections of the unseen node $M_1$, the entry $s_{\pi(v),1}^\pi$ is unknown for all nodes $v \in V^{(2)}$. Suppose that we generate an observable node at this step by random selection. Since there is no observable entry in $S_2^\pi$, we randomly choose node A among the three nodes in $V_O \cap V^{(2)}$ as the second node and set $\pi(A) = 2$, resulting in $V^{(3)} \leftarrow V^{(2)} \setminus \{A\}$. Assuming that the Bernoulli trial with probability $\phi_2[1]$ returns 1, we impute $\tilde{s}_{2,1}^\pi$ with 1 according to (12). Let us turn to the next step in order to select the third node. In this case, since nodes B and C belong to the type of observable nodes, $\tilde{s}_{3,2}^\pi$ takes the value of either 1 or 0, depending on the connections to node A. Suppose that we again generate an observable node at this step. When either $\pi(B) = 3$ or $\pi(C) = 3$, the likelihood $p(O_3^\pi; \phi_3)$ can be computed as:

• If $\pi(B) = 3$, then it follows that $p(O_3^\pi; \phi_3) = \phi_{3,2}$ using (8), since B is connected to A.
• If $\pi(C) = 3$, then it follows that $p(O_3^\pi; \phi_3) = 1 - \phi_{3,2}$ in a similar manner, since C is not connected to A.

Based on the above results, with $\phi_{3,2} < 0.5$ in this example, setting $\pi(C)$ to 3 leads to the maximum value of $p(O_3^\pi; \phi_3)$, which is thus the solution to the problem in (11) for $i = 3$. As depicted in Fig. 5, node C is chosen in this step. By assuming that the Bernoulli trial with probability $\phi_3[1]$ returns 1, we finally have $\tilde{S}_3^\pi = [1, 0]$.
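For illustration, the comparison carried out in Example 1 can be written out as a small Python sketch (ours; the value of phi_3 is hypothetical), scoring each candidate observable node by $p(O_i^\pi; \phi_i)$ from (11).

import numpy as np

def score_candidates(candidates, observed_rows, phi_i):
    # p(O^pi_i; phi_i) for each candidate v; observed_rows[v] holds
    # (index, value) pairs for the observable entries of S^pi_i when v
    # is placed at position i. Unknown entries are simply skipped.
    scores = {}
    for v in candidates:
        p = 1.0
        for j, s in observed_rows[v]:
            p *= phi_i[j] if s == 1 else (1.0 - phi_i[j])
        scores[v] = p
    return scores

# Third step of Example 1: entry 0 (toward M1) is unknown and skipped;
# entry 1 (toward A) is 1 for B and 0 for C.
phi_3 = np.array([0.5, 0.3])   # hypothetical model output
scores = score_candidates(['B', 'C'], {'B': [(1, 1)], 'C': [(1, 0)]}, phi_3)
best = max(scores, key=scores.get)   # picks C, since 1 - 0.3 > 0.3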
From now on, we turn to examining how to efficiently compute the likelihoods in (11) through a complexity reduction technique. We start by making a helpful observation, as illustrated in Fig. 6. Suppose that nodes $M_1$, A, B, and E from the original graph with 8 observable nodes and 3 missing nodes have already been generated sequentially after four inference steps, as depicted in Fig. 6. Then, one can see that all observable entries in $O_5^\pi$ take the value of 0 when node D, G, or H is selected in the fifth step (i.e., $\pi(D) = 5$, $\pi(G) = 5$, or $\pi(H) = 5$), since each of these three nodes has no connection to the nodes A, B, and E that have already been generated. Consequently, the likelihood $p(O_5^\pi; \phi_5)$ is identical for these three cases. We generalize this observation in the following lemma.

Fig. 6: An example illustrating the fifth inference step of DeepNC-L, where nodes $M_1$, A, B, and E have been generated sequentially.

Lemma 1. Let $L^{(i)}$ denote the set of not-yet-selected direct neighbors of the observable nodes generated during the first $i-1$ inference steps, expressed as

$$L^{(i)} = \begin{cases} (L^{(i-1)} \cup N(\hat{v})) \cap V^{(i)}, & \text{if } \hat{v} \in V_O \\ L^{(i-1)} \cap V^{(i)}, & \text{otherwise}, \end{cases} \tag{13}$$

where $i \in \{2, \cdots, |V_O|+|V_M|\}$, $L^{(1)} = \emptyset$, $\hat{v}$ is the node selected in the $(i-1)$-th step, and $N(\hat{v})$ is the set of (direct) neighbors of $\hat{v}$. Then, the likelihood $p(O_i^\pi; \phi_i)$ in (11) is the same for all $u \notin L^{(i)}$, where $u \in V_O$ and $\pi(u) = i$.

Proof. For an observable node $u$ that does not belong to the set $L^{(i)}$ and has not been generated during the first $i-1$ inference steps, all observable entries in $S_i^\pi$ (i.e., the entries in $O_i^\pi$) take the value of 0 since there is no associated edge. Thus, it follows that $p(O_i^\pi; \phi_i) = \prod_{s_{i,j}^\pi = 0} (1 - \phi_{i,j})$, which is identical for all $u \notin L^{(i)}$, where $u \in V_O$ and $\pi(u) = i$. This completes the proof of this lemma.

Lemma 1 allows us to compute the likelihood $p(O_i^\pi; \phi_i)$ only once for all nonselected observable nodes $u \notin L^{(i)}$ when solving (11); this corresponds to the case where node D, G, or H is selected in the fifth step in Fig. 6, while $L^{(5)} = \{C, F\}$ indicates the set of nonselected neighbors of nodes A, B, and E.
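The frontier set $L^{(i)}$ of (13) can be maintained incrementally with ordinary set operations; a small sketch (ours):

def update_frontier(L_prev, v_hat, remaining, neighbors, observable):
    # Eq. (13): L^(i) collects the not-yet-selected direct neighbors of the
    # observable nodes generated so far. neighbors maps a node to its
    # neighbor set in G_O; remaining is V^(i); observable is V_O.
    if v_hat in observable:
        return (L_prev | neighbors[v_hat]) & remaining
    return L_prev & remaining

# Toy usage: after selecting observable node 'A', its still-unselected
# neighbors enter the frontier.
L = update_frontier(set(), 'A', {'B', 'C', 'D'},
                    {'A': {'B', 'C'}}, {'A', 'B', 'C', 'D'})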
Next, we explain how to efficiently solve the problem in (11) without computing the likelihoods $p(O_i^\pi; \phi_i)$ for all observable nodes. From Fig. 6, one can see that the entry of $O_5^\pi$ that takes the value of 1 when node C is selected (marked with diagonal lines in $O_5^\pi$) is the only term that differs between the two sets $O_5^\pi$ for the cases in which node C is selected and in which node D, G, or H is selected, which implies that it may not be necessary to compute the likelihoods of the remaining zero-valued entries for node selection. Thus, from the fact that most of the entries in $O_i^\pi$ tend to be 0 in many real-world networks, which are usually sparse, the computational complexity can be greatly reduced if we compare the likelihoods in (11) based only on the entries in $O_i^\pi$ that have a value of 1. To this end, we eliminate all the terms $(1 - \phi_{i,j})$ corresponding to $s_{i,j}^\pi = 0$ from $p(O_i^\pi; \phi_i)$ when a node $v \in V_O \cap V^{(i)}$ is selected. For computational convenience, we define

$$D_v = \frac{\prod_{s_{i,j}^\pi = 1} \phi_{i,j} \prod_{s_{i,j}^\pi = 0} (1 - \phi_{i,j})}{\prod_{s_{i,j}^\pi \in O_i^\pi} (1 - \phi_{i,j})} = \prod_{s_{i,j}^\pi = 1} \frac{\phi_{i,j}}{1 - \phi_{i,j}} \tag{14}$$

for $v \in V_O \cap V^{(i)}$. Since the denominator in (14) is the same for all $v \in V_O \cap V^{(i)}$, it is obvious that $\hat{v} = \arg\max_v D_v$ is the solution to (11). We note that computing $D_v$ is less computationally expensive than computing $p(O_i^\pi; \phi_i)$ when the number of entries with the value of 1 in $O_i^\pi$ is low. As a special case in which all observable entries in $S_i^\pi$ take the value of 0, the denominator in (14) is equivalent to $p(O_i^\pi; \phi_i)$, from which it follows that $D_u = 1$ when a node $u \notin L^{(i)}$ is selected. Thus, if $D_v < 1$ for all $v \in L^{(i)}$, then the likelihood in (11) for selecting a node $u \notin L^{(i)}$ is higher than that for selecting a node $v \in L^{(i)}$. In this case, we randomly choose a node $\hat{v} \notin L^{(i)}$ without further computation, by Lemma 1. In consequence, we compute $D_v$ only for nodes in the set $L^{(i)}$, rather than for all nodes in $V_O \cap V^{(i)}$. The following example describes how the computational complexity can be reduced according to the aforementioned technique by revisiting Fig. 6.

Example 2: Suppose that we generate an observable node at the fifth inference step. In this step, one can see that $L^{(5)} = \{C, F\}$; thus, instead of computing the likelihood $p(O_5^\pi; \phi_5)$ in (11) five times for all nonselected observable nodes C, D, F, G, and H in $V^{(5)}$, we only compute $D_C$ and $D_F$ from (14), each consisting of a single ratio $\phi_{5,j}/(1 - \phi_{5,j})$ for the corresponding 1-valued entry. Since both $D_C$ and $D_F$ are smaller than 1 in this example, we randomly choose one of the three observable nodes D, G, and H that are not in $L^{(5)}$ as $\hat{v}$.

Algorithm 1: DeepNC-L
Input: $G_O$, $|V_M|$, $f_{out}$, $f_{trans}$
Output: $(\hat{\pi}, \Phi)$
1: Initialization: $i \leftarrow 2$; $h_1 \leftarrow$ random initialization; $\tilde{S}_1^\pi \leftarrow \emptyset$; $\hat{v} \leftarrow$ random node $v \in V_O \cup V_M$; $\pi(\hat{v}) \leftarrow 1$; $L^{(1)} \leftarrow \emptyset$; update $L^{(i)}$ according to (13)
2: function DeepNC-L
3:   while $i \leq |V_O| + |V_M|$ do
4:     $h_{i-1} \leftarrow f_{trans}(h_{i-2}, \tilde{S}_{i-1}^\pi)$
5:     $\phi_i \leftarrow f_{out}(h_{i-1})$
6:     Select a node type
7:     if the selected node type is "observable" then
8:       for $v \in L^{(i)}$ do
9:         Compute $D_v$ according to (14)
10:      if ($D_v < 1$ for all $v$ or $L^{(i)} = \emptyset$) and $L^{(i)} \neq V_O \cap V^{(i)}$ then
11:        Randomly select an observable node $\hat{v} \notin L^{(i)}$
12:      else
13:        $\hat{v} \leftarrow \arg\max_v D_v$
14:      Update $L^{(i)}$ according to (13)
15:    else
16:      Randomly select an unobservable node $\hat{v}$
17:    $\tilde{S}_i^\pi \leftarrow$ impute $S_i^\pi$ according to (12)
18:    $\pi(\hat{v}) \leftarrow i$
19:    $i \leftarrow i + 1$
20:  return $(\hat{\pi}, \Phi)$

We summarize the overall procedure of our
DeepNC-L algorithm in Algorithm 1. We initially select the first node at random, and then start the inference process by identifying connections for the next node according to the following four stages:

Stage 1: Using the two functions $f_{trans}$ and $f_{out}$ in (5) and (6), respectively, we obtain $\phi_i$ (refer to lines 4-5).

Stage 2: Let $m$ denote the cardinality of the set of missing nodes that can potentially be generated in the $i$-th step. We then randomly select a node type so that the selected node is missing with probability $\frac{m}{|V_O| + |V_M| - i + 1}$ (refer to line 6).

Stage 3: If the type of observable nodes is selected, then we compute $D_v$, which is a function of $\phi_i$, according to (14) for all $v \in L^{(i)}$. When $D_v < 1$ for all $v \in L^{(i)}$ or $L^{(i)} = \emptyset$, we randomly select an observable node $\hat{v} \notin L^{(i)}$, provided that $L^{(i)} \neq V_O \cap V^{(i)}$. Otherwise, we select the node $\hat{v}$ that maximizes $D_v$. Afterwards, we update $L^{(i)}$ by including the neighbors of the selected node $\hat{v}$ (refer to lines 7-14). If the type of missing nodes is selected, then we select one node $\hat{v}$ randomly among all missing nodes that have not been generated until the $i$-th step (refer to lines 15-16).

Stage 4: The data imputation process takes place before the next iteration of node generation. Finally, we update the node ordering $\pi$ by including the selected node $\hat{v}$ for the $i$-th step. The algorithm continues by repeating stages 1-4 and terminates when a fully inferred sequence $S^\pi$ is generated (refer to lines 17-20).

We remark that the node ordering $\hat{\pi}$ is found given a set of edge existence probabilities $\Phi$, which is inferred by our model parameters $\Theta$ while assuming that $G_O$ is a complete subgraph; thus, the resulting tuple $(\hat{\pi}, \Phi)$ may not be accurate when there are missing edges in $G_O$. This motivates us to develop the DeepNC-EM algorithm in the following subsection.
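A compact sketch (ours) of the node-selection stage of Algorithm 1 (lines 7-14) is given below: it computes $D_v$ from (14) only over the frontier $L^{(i)}$ and falls back to a random non-frontier node when every $D_v$ is below 1. The probability values are hypothetical.

import random

def select_observable_node(L_i, ones_entries, phi_i, non_frontier, rng):
    # ones_entries[v] lists the indices j with s^pi_{i,j} = 1 for candidate v;
    # D_v = prod_j phi_i[j] / (1 - phi_i[j]) per Eq. (14).
    best_v, best_D = None, 1.0
    for v in L_i:
        D_v = 1.0
        for j in ones_entries[v]:
            D_v *= phi_i[j] / (1.0 - phi_i[j])
        if D_v > best_D:
            best_v, best_D = v, D_v
    if best_v is None and non_frontier:   # all D_v < 1 (or the frontier is empty)
        return rng.choice(sorted(non_frontier))
    return best_v

rng = random.Random(0)
# Hypothetical fifth step of Fig. 6: frontier {C, F}, non-frontier {D, G, H}.
phi_5 = [0.2, 0.3, 0.1, 0.4]
v = select_observable_node({'C', 'F'}, {'C': [1], 'F': [3]},
                           phi_5, {'D', 'G', 'H'}, rng)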
In this subsection, we introduce DeepNC-EM to further improve the performance of DeepNC-L by relaxing the assumption that there are no missing edges between two nodes in $G_O$. A naïve recovery of $G_O$, even with state-of-the-art link prediction methods, before conducting network completion may lead to suboptimal performance, since the network structure of $G_O$ is potentially distorted due to the effect of missing nodes and their missing incident edges. Thus, we aim to find the most likely configuration of the three types of missing edges in the set $E_M$ specified in Section 3.1.1 by jointly estimating a tuple $(\pi, \Phi)$. To this end, we solve (7) by designing another DeepNC method using the EM algorithm.

We now describe the proposed DeepNC-EM, which is built upon the DeepNC-L algorithm in Section 4.1. Let $(\pi^{(0)}, \Phi^{(0)})$ and $Z$ denote the initial output of DeepNC-L and the set of non-existent edges between nodes in $G_O$, respectively. First, we estimate the potential existence likelihoods of the edges in $Z$, denoted by $\Phi_Z$, by extracting the $\binom{|V_O|}{2} - |E_O|$ elements corresponding to $Z$ from the likelihoods $\Phi^{(0)}$ of all edges under the node ordering $\pi^{(0)}$. Then, the E-step samples $Z^{(t)}$ from $p(Z^{(t)} \mid \Phi_Z^{(t)})$ via Bernoulli trials to create multiple instances of $G_O^{(t)}$, where the superscript $(t)$ denotes the EM iteration index. In the M-step, we adopt DeepNC-L to subsequently optimize the parameters $\Phi_Z$ given the samples obtained in the E-step. The EM iteration alternates between performing the E-step and the M-step according to the following expressions, respectively:

$$\text{E-step:} \quad Z^{(t)} \sim p(Z \mid \Phi_Z^{(t)}),$$
$$\text{M-step:} \quad \Phi_Z^{(t+1)} = \arg\max_{\Phi_Z} \mathbb{E}[p(Z^{(t)} \mid \Phi_Z)].$$
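Schematically, the alternation can be expressed as follows (a sketch of ours; m_step stands in for running DeepNC-L on $G_O$ augmented with the sampled edges and extracting refreshed likelihoods for $Z$, which is where the actual model is invoked):

import numpy as np

def deepnc_em(phi_Z, n_samples, n_iters, m_step, seed=0):
    # phi_Z: existence probabilities of the candidate edges Z among
    # observable nodes. Each iteration samples Delta_s instances of Z
    # (E-step) and averages the refreshed estimates (M-step).
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        estimates = []
        for _ in range(n_samples):
            z = (rng.random(phi_Z.shape) < phi_Z).astype(float)  # Bernoulli trials
            estimates.append(m_step(z))
        phi_Z = np.mean(estimates, axis=0)
    return phi_Z

# Toy usage with a placeholder m_step that pulls estimates toward the sample.
phi = deepnc_em(np.full(5, 0.5), n_samples=4, n_iters=6,
                m_step=lambda z: 0.5 * z + 0.25)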
The overall procedure of DeepNC-EM is summarized in Algorithm 2. Here, Filter($\pi^{(t)}[i]$, $\Phi^{(t)}[i]$) in lines 1 and 10 is invoked to retrieve $\Phi_Z^{(t)}$ from $\Phi^{(t)}$; $\eta > 0$ is an arbitrarily small threshold indicating a stopping criterion for the algorithm; $\Delta_s$ denotes the number of samples in each E-step; and $[i]$ indicates the sample index.

Algorithm 2: DeepNC-EM
Input: $\pi^{(0)}$, $\Phi^{(0)}$, $G_O$, $|V_M|$, $f_{out}$, $f_{trans}$, $\Delta_s$
Output: $(\hat{\pi}, \hat{\Phi})$
1: Initialization: $t \leftarrow 0$; $\Phi_Z^{(0)} \leftarrow$ Filter($\pi^{(0)}$, $\Phi^{(0)}$)
2: function DeepNC-EM
3:   do
4:     E-step: for $i \in \{1, \cdots, \Delta_s\}$ do
5:       $Z^{(t)}[i] \sim p(Z \mid \Phi_Z^{(t)})$
6:       $G_O^{(t)}[i] \leftarrow$ add edges sampled from $Z^{(t)}[i]$
7:     M-step: for $i \in \{1, \cdots, \Delta_s\}$ do
8:       $(\pi^{(t+1)}[i], \Phi^{(t+1)}[i]) \leftarrow$
9:         DeepNC-L($G_O^{(t)}[i]$, $|V_M|$, $f_{out}$, $f_{trans}$)
10:      $\Phi_Z^{(t+1)}[i] \leftarrow$ Filter($\pi^{(t+1)}[i]$, $\Phi^{(t+1)}[i]$)
11:    $\Phi_Z^{(t+1)} \leftarrow \frac{1}{\Delta_s} \sum_i \Phi_Z^{(t+1)}[i]$
12:    $t \leftarrow t + 1$
13:  while $\|\Phi_Z^{(t)} - \Phi_Z^{(t-1)}\| \geq \eta$
14:  $\hat{Z} \sim p(Z \mid \Phi_Z^{(t)})$
15:  $\hat{G}_O \leftarrow$ add edges from $\hat{Z}$
16:  $(\hat{\pi}, \hat{\Phi}) \leftarrow$ DeepNC-L($\hat{G}_O$, $|V_M|$, $f_{out}$, $f_{trans}$)
17:  return $(\hat{\pi}, \hat{\Phi})$

In this subsection, we analyze the computational complexities of the
DeepNC-L and
DeepNC-EM algorithms.
We start by examining the complexity of each inference step $i \in \{2, \cdots, |V_O|+|V_M|\}$ of DeepNC-L. It is not difficult to show that the case in which the node selected in the inference process is an observable node dominates the complexity. Note that it is possible to compute $D_v$ in constant time, as the average degree of a network is typically regarded as a constant [31]. Thus, the complexity of this step is bounded by $O(|L^{(i)}|)$, since we exhaustively compute $D_v$ over the nodes $v \in L^{(i)}$. The data imputation process is computable in constant time when parallelization is applied, since the Bernoulli trials are independent of each other. As our algorithm is composed of $|V_O| + |V_M| - 1$ inference steps, the total complexity is finally given by $O((|V_O| + |V_M|) \cdot |L^{(i)}|)$, which can be rewritten as $O(|V_O| \cdot |L^{(i)}|)$ due to the fact that $|V_M| \ll |V_O|$. The following theorem states a comprehensive analysis of the computational complexity.
Theorem 1. Lower and upper bounds on the computational complexity of the proposed DeepNC-L algorithm are given by $\Omega(|V_O|)$ and $O(|V_O|^2)$, respectively.

Proof. The parameter $L^{(i)}$ is the set of nodes neighboring the observable nodes that have already been generated by the $i$-th step, and its cardinality depends on the network topology. In the best case, where all nodes are isolated with no neighbors, we always have $|L^{(i)}| = 0$ for each generation step; thus, each step is computable in constant time, yielding a total complexity of $\Omega(|V_O|)$. In the worst case, corresponding to a fully connected graph, it follows that $|L^{(i)}| = |V_O| + |V_M| - i$ for each generation step, thus yielding a total complexity of $O(|V_O|^2)$. This completes the proof of this theorem.

From Theorem 1, it is possible to establish the following corollary.
Corollary 1. The computational complexity of the DeepNC-L algorithm scales as $\Theta(|V_O|^{1+\epsilon})$, where $0 \leq \epsilon \leq 1$ depends on the given network topology, e.g., the sparsity of the network.

We shall validate the assertion in Corollary 1 via empirical evaluation on various datasets in the next section by identifying that $\epsilon$ is indeed small, which implies that the complexity of
DeepNC-L is almost linear in $|V_O|$.

We now turn to examining the computational complexity of each EM step in order to analyze the overall complexity of DeepNC-EM. In the E-step, we can parallelize both the Bernoulli trials for edge sampling and the operation adding sampled edges to $G_O^{(t)}[i]$ in lines 5 and 6, respectively. Consequently, the computational complexity of each E-step is given by $O(\Delta_s)$, where $\Delta_s$ is the number of samples in each E-step. The M-step is dominated by DeepNC-L, as the function Filter($\cdot$, $\cdot$) can also be executed in parallel since all operations therein are performed independently of each other. Thus, the computational complexity of each M-step is given by $O(\Delta_s |V_O|^{1+\epsilon})$. When the number of EM iterations is $k_{EM}$, determined by the threshold $\eta$, and there are a total of $\Delta_s$ samples, the complexity of DeepNC-EM is finally given as $\Theta(k_{EM} \Delta_s |V_O|^{1+\epsilon})$ based on Corollary 1. Since both $k_{EM}$ and $\Delta_s$ are regarded as constants, as in [5], the total computational complexity scales as $\Theta(|V_O|^{1+\epsilon})$.

5 EXPERIMENTAL EVALUATION
In this section, we first describe both the synthetic and real-world datasets that we use in the evaluation. We also present three state-of-the-art methods for network completion as comparisons. After presenting a performance metric and our experimental settings, we intensively evaluate the performance of our DeepNC algorithms.
Two synthetic and three real-world datasets across various domains (e.g., social, citation, and biological networks) are used as series of homogeneous networks (graphs), denoted by $G_I$, and are described in sequence. For all experiments, we treat graphs as undirected and only consider the largest connected component without isolated nodes. The statistics of each dataset, including the number of similar graphs and the range of the number of nodes, are described in Table 3. In the following, we summarize important characteristics of the datasets.

Lancichinetti-Fortunato-Radicchi (LFR) [32]. We construct a synthetic graph generated using the LFR model in which the degree exponent of a power-law distribution, the average degree, the minimum community size, the community size exponent, and the mixing parameter are set to 3, 5, 20, 1.5, and 0.1, respectively. Refer to the original paper [32] for a detailed description of these parameters.
Barabási-Albert (B-A) [14]. We generate further synthetic graphs using the B-A model. The attachment parameter of the model is set in such a way that each newly added node is connected to four existing nodes, unless otherwise stated.
Protein [8]. The protein structure data form a biological network. Each protein is represented by a graph in which nodes represent amino acids. Two nodes are connected if they are less than 6 Angstroms apart.

TABLE 3: Statistics of the 5 datasets, where NG and NN denote the number of similar graphs and the range of the number of nodes in each dataset, respectively, including training graphs $G_I$ and a test graph $G_T$. Here, k denotes $10^3$.

Name           NG    NN
LFR            500   1.6k-2k
B-A            500   1.6k-2k
Protein        918   100-500
Ego-CiteSeer   737   50-399
Ego-Facebook   10    52-1,034

Ego-CiteSeer [7]. This CiteSeer dataset is an online citation network and a frequently used benchmark. Nodes and edges represent publications and citations, respectively.
Ego-Facebook [9]. This Facebook dataset is a social friendship network extracted from Facebook. Nodes and edges represent people and friendship ties, respectively.
In this subsection, we present three state-of-the-art network completion approaches for comparison.
KronEM [5]. This approach aims to infer the missing part of a true network based solely on the connectivity patterns in the observed part via a generative graph model based on Kronecker graphs, where the parameters are estimated via an EM algorithm.
EvoGraph [26]. To solve the network completion problem, EvoGraph infers the missing nodes and edges in such a way that the topological properties of the observable network are preserved via an efficient preferential attachment mechanism.
A variant of GraphRNN-S. As a naïve approach to network completion using deep generative models of graphs, we modify the inference process of the original GraphRNN-S [10] so that it can be used as a network completion method as follows. Under a random ordering of observable nodes, we first obtain the sequence $\{S_1^\pi, \cdots, S_{|V_O|}^\pi\}$ along with the observable entries from $G_O$. Then, by invoking the inference process of GraphRNN-S, we generate $|V_M|$ missing nodes using the trained $f_{trans}$ and $f_{out}$ based on $\{S_1^\pi, \cdots, S_{|V_O|}^\pi\}$. This variant of GraphRNN-S for network completion is termed vGraphRNN in our study.

To assess the performance of our proposed method and the other competing approaches, we need to quantify the degree of agreement between the recovered graph and the original one. To this end, we adopt the GED as a well-known performance metric.
Definition 2. Graph edit distance (GED) [12]. Given a set of graph edit operations, the GED between a recovered graph $\hat{G}$ and the true graph $G$ is defined as

$$\text{GED}(\hat{G}, G) = \min_{(e_1, \ldots, e_k) \in \mathcal{P}(\hat{G}, G)} \sum_{i=1}^{k} c(e_i), \tag{15}$$

where $\mathcal{P}(\hat{G}, G)$ denotes the set of edit paths transforming $\hat{G}$ into a graph isomorphic to $G$ and $c(e_i) \geq 0$ is the cost of each graph edit operation $e_i$.

Note that only four operations are allowed in our setup: vertex substitution, edge insertion, edge deletion, and edge substitution; $c(e_i)$ is identically set to one for all operations. Since the problem of computing the GED is NP-complete [33], we adopt an efficient approximation algorithm proposed in [34]. In our experiments, the GED is normalized by the average size of the two graphs.

We first describe the settings of the neural networks. In our experiments, the function $f_{trans}$ is implemented using 4 layers of GRU cells with a 128-dimensional hidden state, and the function $f_{out}$ is implemented using a two-layer perceptron with a 64-dimensional hidden state and a sigmoid activation function. The Adam optimizer [35] is used for minibatch training with a learning rate of 0.001, where each minibatch contains 32 graph sequences. We train the model for 32,000 batches in all experiments.

To test the performance of our method, we randomly select one graph from each dataset to act as the underlying true network $G_T$. From each dataset, we select all remaining similar graphs as training data $G_I$ unless otherwise stated. To create a partially observable network from the true network $G_T$, we adopt the following two graph sampling strategies from [36]. The first strategy, called random node (RN) sampling, selects nodes uniformly at random to create a sample graph. The second strategy, forest fire (FF) sampling, starts by picking a seed node uniformly at random and adding it to a sample graph (referred to as burning). Then, FF sampling burns a fraction of the outgoing links with the nodes attached to them. This process is repeated recursively for each neighbor that is burned until no new node is selected to be burned. Afterwards, we sample uniformly at random a portion of the edges of the complete subgraph sampled from $G_T$ to finally acquire $G_O$. In our experiments, the partially observable network $G_O$ is constructed from 90% of the edges in a complete subgraph consisting of 70% of the nodes sampled from $G_T$, unless otherwise specified. Each experimental result is the average over 10 executions.

In this subsection, our empirical study is designed to answer the following five key research questions.

• Q1. How much does the performance of DeepNC-EM improve with respect to the number of EM iterations?
• Q2. How much do the DeepNC algorithms improve the accuracy of network completion over the state-of-the-art approaches?
• Q3. How beneficial are the DeepNC algorithms in more difficult situations where either a large number of nodes and edges are missing or the training data are also incomplete?
• Q4. How robust is DeepNC-EM to the portion of missing edges in $G_O$ in comparison with the other state-of-the-art approaches?
• Q5. How scalable are the DeepNC algorithms with the size of the graph?

To answer these questions, we carry out six comprehensive experiments as follows.
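Before proceeding, we note that for small graphs the GED of Definition 2 can be computed (or approximated) directly with off-the-shelf tooling; below is a sketch using networkx (ours; the experiments themselves rely on the approximation algorithm of [34]).

import networkx as nx

G1 = nx.path_graph(5)
G2 = nx.cycle_graph(5)
# Exact GED search is exponential in general; networkx also provides
# optimize_graph_edit_distance as an anytime approximation.
ged = nx.graph_edit_distance(G1, G2)
# Normalization by the average size of the two graphs (here, node counts).
norm_ged = ged / ((G1.number_of_nodes() + G2.number_of_nodes()) / 2)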
Fig. 7: GED of DeepNC-EM over the number of EM iterations for (a) the LFR dataset and (b) the B-A dataset. Here, the performance of DeepNC-L corresponds to the case where the number of EM iterations is zero.
In Fig. 7, we show the performance of the proposed DeepNC-EM algorithm from Section 4.2 with respect to GED according to the number of EM iterations, using the two synthetic datasets, i.e., the LFR and B-A models. From Fig. 7, we discuss our findings as follows:

• For both RN and FF sampling strategies, the GED of DeepNC-EM decreases as the number of EM iterations increases.
• The number of EM iterations required to achieve a sufficiently low GED value is relatively small compared to the network size. This can be seen from the LFR dataset, where the performance improvement is marginal after four iterations.
• We observe that DeepNC-EM exhibits fewer fluctuations over EM iterations when the LFR dataset is used. This might be caused by the fact that graphs generated using the LFR model are denser than those generated using the B-A model under our setting, which makes the algorithm more likely to correctly recover the edges connecting two nodes in the set $V_O$.

In the subsequent experiments, the number of EM iterations is set to 6.
DeepNC al-gorithms and three state-of-the-art network completionmethods, including vGraphRNN, KronEM [5], and Evo-Graph [26], with respect to GED is presented in Table 4 forall five datasets. We note that
DeepNC-EM , DeepNC-L , andvGraphRNN use structurally similar graphs as training data G I ; meanwhile, both KronEM and EvoGraph operate basedsolely on the partially observable graph G O without anytraining phase. We observe the following: • The improvement rates of
DeepNC-EM overvGraphRNN, KronEM, and EvoGraph are up to40.16%, 54.55%, and 68.25%, respectively. These max-imum gains are achieved for the Ego-CiteSeer andB-A datasets. • The
DeepNC-L and
DeepNC-EM algorithms are in-sensitive to sampling strategies for creating a par-tially observable network, whereas the performance of EvoGraph depends on the sampling strategy.Specifically, sampling via FF results in better perfor-mance than that via RN sampling when EvoGraphis used due to the fact that the FF sampling strategytends to preserve the network properties such as thedegree distribution [36]. In reality, if the samplingstrategy is unknown and one only acquires randomlysampled data, then graph upscaling methods suchas EvoGraph would certainly perform poorly. Thisresult displays the robustness of our
DeepNC algo-rithms to graph samplings. • Even with deletions of only 10% of edges, the addi-tional gain of
DeepNC-EM over
DeepNC-L is still sig-nificant for all datasets. The maximum improvementrate of 13.58% is achieved on the Protein dataset. • Let us compare the performance of KronEM andEvoGraph. In most cases, KronEM performs betterthan EvoGraph. However, KronEM is inferior toEvoGraph in the case where the degree distributionof a network does not strictly follow the pure power-law degree distribution. EvoGraph consistently out-performs KronEM in the Protein dataset. • The standard deviation of GED is relatively highwhen vGraphRNN is employed (e.g., 0.2514 for theEgo-CiteSeer dataset), which demonstrates that arandom node ordering of observable nodes for net-work completion does not guarantee a stable solu-tion.Consequently,
DeepNC-EM consistently outperforms allstate-of-the-art methods for all synthetic and real-worlddatasets, which reveals the robustness of our method to-ward diverse network topologies.
Our DeepNC algorithms are compared to the three state-of-the-art network completion methods in more difficult settings that often occur in real environments: 1) the case in which a large portion of nodes are missing and 2) the case in which training graphs are also only partially observed. In these experiments, we only show the results for the RN sampling strategy, since the results from FF sampling follow similar trends.

First, we create a partially observable network $G_O$ consisting of only 30% of the nodes of the underlying true graph $G_T$ via sampling. The performance comparison between the DeepNC algorithms and the three state-of-the-art methods with respect to GED is presented in Table 5 for all five datasets. As shown in Tables 4 and 5, a large number of missing nodes and edges results in significant performance degradation for KronEM and EvoGraph, while DeepNC-EM, DeepNC-L, and vGraphRNN are more robust, as these three methods take advantage of the topological information from similar graphs (i.e., training data) to infer the missing part.

Next, we perform RN sampling so that only a part of the nodes in the training graphs is observable. In Fig. 8, we compare the GED of the two DeepNC algorithms and the three state-of-the-art methods, where the degree of observability in the training graphs is varied in our algorithms. We find that the DeepNC algorithms still outperform the state-of-the-art methods on all datasets, with the exception of the Ego-Facebook dataset, where the performance of DeepNC-L is slightly inferior to that of KronEM when 90% of nodes in the training graphs are observable.

TABLE 4: Performance comparison in terms of GED (average ± standard deviation) among DeepNC-EM, DeepNC-L, vGraphRNN, KronEM, and EvoGraph, including the gain (%) of DeepNC-EM over each of the latter three methods. Here, the best method for each dataset is highlighted using bold fonts.

TABLE 5: Performance comparison in terms of GED when 70% of nodes are missing (average ± standard deviation). Here, the best method for each dataset is highlighted using bold fonts.
DeepNC-L is slightly inferior to that of KronEM when 90% of nodes intraining graphs are observable. G O (Q4) We evaluate the GED performance in the second fringe sce-nario, in which a partially observable network G O is createdby deleting a large portion of edges uniformly at randomfrom a complete subgraph that consists of 70% of nodessampled from G T . In Fig. 9, the performance of the DeepNC algorithms is compared to the state-of-the-art network com-pletion methods using two synthetic datasets, where thefraction of missing edges is set to { , , } % . Our mainfindings are: 1) DeepNC-L outperforms the three state-of-the-art methods for all the cases; 2) the gain of
DeepNC-EM over
DeepNC-L is higher when the LFR dataset is used sincemissing edges are inferred more accurately; and 3) both
DeepNC algorithms exhibit less performance degradation asthe number of missing edges increases, which demonstratesthe robustness of our method for various degrees of edgeobservability.From Tables 4–5 and Figs. 8–9, it is worth noting thatthe proposed
DeepNC-EM algorithm outperforms all state-of-the-art methods for all types of datasets under variousfringe scenarios and experimental settings.
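For concreteness, the inputs to these fringe scenarios can be constructed as in the following sketch, assuming networkx; the function names and fractions are illustrative rather than the authors' exact pipeline.

import random
import networkx as nx

def rn_sample(G_T, node_frac, seed=0):
    # RN sampling: keep a random fraction of nodes and their induced edges.
    rng = random.Random(seed)
    kept = rng.sample(list(G_T.nodes()), int(node_frac * G_T.number_of_nodes()))
    return G_T.subgraph(kept).copy()

def delete_edges(G, edge_frac, seed=0):
    # Delete a fraction of the remaining edges uniformly at random.
    rng = random.Random(seed)
    removed = rng.sample(list(G.edges()), int(edge_frac * G.number_of_edges()))
    H = G.copy()
    H.remove_edges_from(removed)
    return H

G_T = nx.barabasi_albert_graph(500, 4, seed=42)      # underlying true graph
G_O_nodes = rn_sample(G_T, 0.3)                      # 70% of nodes missing (Table 5)
G_O_edges = delete_edges(rn_sample(G_T, 0.7), 0.15)  # 15% of edges missing (Fig. 9)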
Finally, we empirically examine the average runtime complexity via experiments using the three sets of B-A synthetic graphs, as it is convenient to scale up such graphs while preserving the same structural properties; the number of connections from each new node to existing nodes, denoted by c, is set to 2, 4, and 8. In these experiments, we focus on evaluating the complexity of DeepNC-EM, since each EM iteration executes DeepNC-L and the number of iterations is constant. In each set of graphs, the number of nodes, |V_O| + |V_M|, varies from 200 to 2,000 in increments of 200, and 30% of the nodes and their associated edges are deleted by RN sampling to create partially observable networks. Other parameter settings follow those in Section 5.4. In Fig. 10, we illustrate the log-log plot of the execution time in seconds versus |V_O|, where each point represents the average over 10 executions of DeepNC-EM. In the figure, dotted lines obtained from the analytical result with a proper bias are also shown; the slopes of the lines for c ∈ {2, 4, 8} are approximately 1.16, 1.26, and 1.41, respectively. This indicates that the computational complexity of DeepNC-EM depends on the average degree of the given graph. Moreover, an almost linear complexity in |V_O|, i.e., Θ(|V_O|^(1+ε)) for a small ε > 0, is attainable, since the slopes are at most 1.41 even for the relatively dense graph corresponding to c = 8.

Fig. 10: The computational complexity of DeepNC-EM, where the log-log plot of the execution time versus |V_O| is shown.
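The measurement behind Fig. 10 can be reproduced in outline as follows, with a hypothetical complete_network stub in place of DeepNC-EM; the empirical scaling exponent is the slope of a least-squares fit on the log-log points (about 1 here by construction, versus the reported 1.16 to 1.41 for the real algorithm).

import random
import time
import numpy as np
import networkx as nx

def complete_network(G_O):
    # Hypothetical stand-in for DeepNC-EM with a linear-time dummy workload.
    time.sleep(1e-5 * G_O.number_of_nodes())

obs_sizes, runtimes = [], []
for n in range(200, 2001, 200):
    G_T = nx.barabasi_albert_graph(n, 4, seed=0)                     # c = 4
    kept = random.Random(0).sample(list(G_T.nodes()), int(0.7 * n))  # RN sampling, 30% removed
    G_O = G_T.subgraph(kept).copy()
    start = time.perf_counter()
    complete_network(G_O)
    runtimes.append(time.perf_counter() - start)
    obs_sizes.append(G_O.number_of_nodes())

# Slope of log(runtime) versus log(|V_O|).
slope = np.polyfit(np.log(obs_sizes), np.log(runtimes), 1)[0]
print(f"empirical scaling exponent: {slope:.2f}")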
CONCLUDING REMARKS

In this paper, we introduced a novel method, termed DeepNC, that infers both the missing nodes and edges of an underlying true network via deep learning. Specifically, we presented an approach that first learns a likelihood over edges via an RNN-based generative graph model, using structurally similar graphs as training data, and then infers the missing parts of the network by applying an imputation strategy for the missing data. Furthermore, we proposed two DeepNC algorithms whose runtime complexities are almost linear in |V_O|. Using various synthetic and real-world datasets, we demonstrated that our DeepNC algorithms not only remarkably outperform the vGraphRNN, KronEM, and EvoGraph methods but are also robust to many difficult and challenging situations that often occur in real environments, such as 1) a significant portion of unobservable nodes, 2) training graphs that are only partially observable, or 3) a large portion of missing edges between nodes in the observed network. Additionally, we analytically and empirically showed the scalability of our DeepNC algorithms.

Potential avenues of future research include the design of a unified framework for improving the performance of various downstream mining and learning tasks, such as multi-label node classification, community detection, and influence maximization, when DeepNC is adopted in partially observable networks. This would be challenging, since task-specific preprocessing should accompany network completion to guarantee satisfactory performance of each individual task.

ACKNOWLEDGMENTS
This research was supported by the Yonsei University Research Fund of 2020 (2020-22-0101).

REFERENCES

[1] G. Kossinets, "Effects of missing data in social networks," Soc. Netw., vol. 28, no. 3, pp. 247–268, Jul. 2006.
[2] A. Acquisti, L. Brandimarte, and G. Loewenstein, "Privacy and human behavior in the age of information," Science, vol. 347, no. 6221, pp. 509–514, Jan. 2015.
[3] R. Dey, Z. Jelveh, and K. Ross, "Facebook users have become much more private: A large-scale study," in Proc. IEEE Int. Conf. Pervasive Comput. Commun. Worksh., Lugano, Switzerland, Mar. 2012, pp. 346–352.
[4] J. H. Koskinen, G. L. Robins, P. Wang, and P. E. Pattison, "Bayesian analysis for partially observed network data, missing ties, attributes and actors," Soc. Netw., vol. 35, no. 4, pp. 514–527, Oct. 2013.
[5] M. Kim and J. Leskovec, "The network completion problem: Inferring missing nodes and edges in networks," in Proc. 2011 SIAM Int. Conf. Data Mining (SDM '11), Mesa, AZ, USA, Apr. 2011, pp. 47–58.
[6] C. Tran, W.-Y. Shin, and A. Spitz, "Community detection in partially observable social networks," arXiv preprint arXiv:1801.00132, 2017.
[7] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Magazine, vol. 29, no. 3, pp. 93–106, 2008.
[8] P. D. Dobson and A. J. Doig, "Distinguishing enzyme structures from non-enzymes without alignments," J. Molecular Bio., vol. 330, no. 4, pp. 771–783, Jul. 2003.
[9] A. L. Traud, E. D. Kelsic, P. J. Mucha, and M. A. Porter, "Comparing community structure to characteristics in online collegiate social networks," SIAM Rev., vol. 53, no. 3, pp. 526–543, Aug. 2011.
[10] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec, "GraphRNN: Generating realistic graphs with deep auto-regressive models," in Proc. Int. Conf. Machine Learning (ICML '18), Stockholm, Sweden, Jul. 2018, pp. 5694–5703.
[11] A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann, "NetGAN: Generating graphs via random walks," in Proc. Int. Conf. Machine Learning (ICML '18), Stockholm, Sweden, Jul. 2018, pp. 609–618.
[12] A. Sanfeliu and K.-S. Fu, "A distance measure between attributed relational graphs for pattern recognition," IEEE Trans. Syst. Man Cybernetics, vol. SMC-13, no. 3, pp. 353–362, Jun. 1983.
[13] P. Erdős and A. Rényi, "On random graphs I," Publ. Math. Debrecen, vol. 6, pp. 290–297, 1959.
[14] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509–512, Oct. 1999.
[15] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani, "Kronecker graphs: An approach to modeling networks," J. Mach. Learning Res., vol. 11, pp. 985–1042, Feb. 2010.
[16] R. Liao, Y. Li, Y. Song, S. Wang, W. Hamilton, D. K. Duvenaud, R. Urtasun, and R. Zemel, "Efficient graph generation with graph recurrent attention networks," in Proc. Advances Neural Inf. Processing Syst. (NIPS '19), Vancouver, Canada, Dec. 2019, pp. 4257–4267.
[17] M. Simonovsky and N. Komodakis, "GraphVAE: Towards generation of small graphs using variational autoencoders," in Proc. Int. Conf. Artificial Neural Netw. Machine Learning (ICANN '18), Rhodes, Greece, Oct. 2018, pp. 412–422.
[18] T. N. Kipf and M. Welling, "Variational graph auto-encoders," in NIPS Worksh. Bayesian Deep Learning, Montréal, Canada, Dec. 2018.
[19] J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec, "Graph convolutional policy network for goal-directed molecular graph generation," in Proc. Advances Neural Inf. Processing Syst. (NIPS '18), Montréal, Canada, Dec. 2018, pp. 6410–6421.
[20] D. Zhou, L. Zheng, J. Xu, and J. He, "Misc-GAN: A multi-scale generative model for graphs," Front. Big Data, vol. 2, pp. 3:1–3:10, Apr. 2019.
[21] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia, "Learning deep generative models of graphs," arXiv preprint arXiv:1803.03324, 2018.
[22] L. Lü and T. Zhou, "Link prediction in complex networks: A survey," Phys. A: Stat. Mech. Appl., vol. 390, no. 6, pp. 1150–1170, Mar. 2011.
[23] M. Zhang and Y. Chen, "Link prediction based on graph neural networks," in Proc. Advances Neural Inf. Processing Syst. (NIPS '18), Montréal, Canada, Dec. 2018, pp. 5165–5175.
[24] R. Eyal, A. Rosenfeld, S. Sina, and S. Kraus, "Predicting and identifying missing node information in social networks," ACM Trans. Knowl. Disc. Data, vol. 8, no. 3, pp. 14:1–14:35, Jun. 2014.
[25] S. Sina, A. Rosenfeld, and S. Kraus, "Solving the missing node problem using structure and attribute information," in Proc. 2013 IEEE/ACM Int. Conf. Advances Social Netw. Analysis Mining (ASONAM '13), Niagara Falls, Canada, Aug. 2013, pp. 744–751.
[26] H. Park and M.-S. Kim, "EvoGraph: An effective and efficient graph upscaling method for preserving graph properties," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining (KDD '18), London, United Kingdom, Aug. 2018, pp. 2051–2059.
[27] T. H. McCormick, M. J. Salganik, and T. Zheng, "How many people do you know?: Efficiently estimating personal network size," J. Am. Stat. Assoc., vol. 105, no. 489, pp. 59–70, Sep. 2010.
[28] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," in Proc. Deep Learning and Representation Learning Worksh., Montréal, Canada, Dec. 2014.
[29] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[30] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc. Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.
[31] M. E. Newman, "Random graphs as models of networks," Proc. National Acad. Sci., vol. 99, no. 1, pp. 2566–2572, 2002.
[32] A. Lancichinetti and S. Fortunato, "Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities," Phys. Rev. E, vol. 80, no. 1, pp. 016118:1–016118:8, Apr. 2009.
[33] Z. Zeng, A. K. H. Tung, J. Wang, J. Feng, and L. Zhou, "Comparing stars: On approximating graph edit distance," Proc. VLDB Endow., vol. 2, no. 1, pp. 25–36, Aug. 2009.
[34] A. Fischer, K. Riesen, and H. Bunke, "Improved quadratic time approximation of graph edit distance by combining Hausdorff matching and greedy assignment," Pattern Recogn. Lett., vol. 87, pp. 55–62, Feb. 2017.
[35] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learning Rep. (ICLR '15), San Diego, CA, May 2015.
[36] J. Leskovec and C. Faloutsos, "Sampling from large graphs," in