DeepNC: Deep Generative Network Completion
Cong Tran, Student Member, IEEE, Won-Yong Shin, Senior Member, IEEE, Andreas Spitz, and Michael Gertz
Abstract—Most network data are collected from only partially observable networks with both missing nodes and edges, for example, due to limited resources and privacy settings specified by users on social media. Thus, it stands to reason that inferring the missing parts of the networks by performing network completion should precede downstream mining or learning tasks on the networks. However, despite this need, the recovery of missing nodes and edges in such incomplete networks is an insufficiently explored problem. In this paper, we present DeepNC, a novel method for inferring the missing parts of a network that is based on a deep generative model of graphs. Specifically, our method first learns a likelihood over edges via an autoregressive generative model, and then identifies the graph that maximizes the learned likelihood conditioned on the observable graph topology. Moreover, we propose a computationally efficient DeepNC algorithm that consecutively finds individual nodes that maximize the probability in each node generation step, as well as an enhanced version using the expectation-maximization algorithm. The runtime complexities of both algorithms are shown to be almost linear in the number of nodes in the network. We empirically demonstrate the superiority of DeepNC over state-of-the-art network completion approaches.
Index Terms—Autoregressive generative model; deep generative model of graphs; inference; network completion; partially observable network

• C. Tran is with the Department of Computer Science and Engineering, Dankook University, Yongin 16890, Republic of Korea, and also with the Department of Computational Science and Engineering, Yonsei University, Seoul 03722, Republic of Korea. E-mail: [email protected].
• W.-Y. Shin is with the Department of Computational Science and Engineering, Yonsei University, Seoul 03722, Republic of Korea. E-mail: [email protected].
• A. Spitz is with the School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland. E-mail: andreas.spitz@epfl.ch.
• M. Gertz is with the Institute of Computer Science, Heidelberg University, Heidelberg 69120, Germany. E-mail: [email protected].
(Corresponding author: Won-Yong Shin.)
1 INTRODUCTION
Real-world networks extracted from various biological, social, technological, and information systems tend to be only partially observable and thus missing both nodes and edges [1]. For example, users and organizations may have limited access to data due to insufficient resources or a lack of authority. In social networks, a source of incompleteness stems from privacy settings specified by users who partially or completely hide their identities and/or friendships [2]. As an example, consider a demographic analysis of Facebook users in New York City in June 2011 that showed 52.6% of the users to be hiding their Facebook friends [3]. Using such incomplete network data may severely degrade the performance of downstream analyses such as community detection, link prediction, and node classification due to significantly altered estimates of structural properties (see, e.g., [1], [4], [5], [6] and references therein).

This motivates us to conduct network completion to infer the missing part (i.e., a set of both missing nodes and associated edges) prior to performing downstream applications. While intuitively similar, network completion is a much more challenging task than the well-studied link prediction, since it jointly infers both missing nodes and edges, whereas link prediction infers missing edges only. Although there has been an attempt to recover both missing nodes and edges [5], it suffers from several limitations. A state-of-the-art network completion method that aims at inferring the missing part of a network based on the Kronecker graph model, dubbed KronEM [5], suffers from three major problems: 1) setting the size of the Kronecker generative parameter is not trivial; 2) the Kronecker graph model is inherently designed under the assumption of a pure power-law degree distribution, which not all real-world networks necessarily follow; and 3) its inference accuracy is not satisfactory.

As a way of further enhancing the performance of network completion, our study is intuitively motivated by the existence of structurally similar graphs with respect to a graph distance, whose topologies are almost entirely observable. Such similar graphs can be retrieved from the same domain as that of the target graph (see [7], [8], [9] for more information). Suppose that many citizens residing in country A strongly protect the privacy of their social relationships, while citizens of country B tend to provide their friendship relations on social media. Intuitively, as long as the graph structures of the two countries are similar to each other, latent information within the (almost) complete data collected from country B can be uncovered and leveraged to infer the missing part of the data collected from country A.
Additionally, the use of deep learning on graphs has been actively studied by exploiting this structural similarity of graphs (see, e.g., [10], [11] and references therein), which enables us to model complex structures over graphs with high accuracy. For example, the frameworks of recurrent neural networks (RNNs) and generative adversarial networks (GANs) were recently introduced to construct deep generative models of graphs [10], [11]. Thus, a natural question is how such structural similarity can be incorporated into the problem of network completion by taking advantage of effective deep learning-based approaches.

1. Note that, in this paper, we use the terms "network" and "graph" interchangeably.

In this paper, we introduce
DeepNC, a novel method for completing the missing part of an observed incomplete network $G_O$ based on a deep generative model of graphs. Specifically, we first learn a likelihood over edges (i.e., a latent representation) via an autoregressive generative model of graphs, e.g., GraphRNN [10] built upon RNNs, by using a set of structurally similar graphs as training data, and then infer the missing part of the network. Unlike GraphRNN, which is only applicable to fully observable graphs, our method is capable of accommodating both observable and missing parts by imputing a number of missing nodes and edges with sampled values from a multivariate Bernoulli distribution. To this end, we formulate a new optimization problem with the aim of finding the graph that maximizes the learned likelihood conditioned on the observable graph topology. To efficiently solve the problem, we first propose a low-complexity DeepNC algorithm, termed DeepNC-L, that consecutively finds a single node maximizing the probability in each node generation step in a greedy fashion under the assumption that there are no missing edges between two nodes in a partially observable network $G_O$. We then present judicious approximation and computational reduction techniques for DeepNC-L by exploiting the sparseness of real-world networks. Second, by relaxing this assumption to deal with a more realistic scenario in which there are missing edges in $G_O$, we propose an enhanced version of DeepNC using the expectation-maximization (EM) algorithm, termed DeepNC-EM, which enables us to jointly find both the missing edges between nodes in $G_O$ and the edges associated with missing nodes by executing DeepNC-L iteratively. That is, the DeepNC-EM algorithm jointly solves network completion and link prediction in a single module. We show that the computational complexity of both DeepNC algorithms is almost linear in the number of nodes in the network. By adopting the graph edit distance (GED) [12] as a performance metric, we empirically evaluate the performance of both DeepNC algorithms in various environments. Experimental results show that our algorithms consistently outperform state-of-the-art network completion approaches by up to 68.25% in terms of GED. The results also demonstrate the robustness of our method not only on various real-world networks that do not necessarily follow a power-law degree distribution, but also in three more difficult and challenging situations where 1) a large portion of nodes are missing, 2) training graphs are only partially observed, and 3) a large portion of edges between nodes in $G_O$ are missing. Additionally, we analyze and empirically validate the computational complexity of the DeepNC algorithms. Our main contributions are five-fold and summarized as follows:
• We introduce DeepNC, a deep learning-based network completion method for partially observable networks;
• We formalize our problem as the imputation of missing data in an optimization problem that maximizes the conditional probability of a generated node sequence;
• We design two computationally efficient DeepNC algorithms to solve the problem by exploiting the sparsity of networks;
• We validate DeepNC through extensive experiments using real-world datasets across various domains, as well as synthetic datasets;
• We analyze and empirically validate the computational complexity of DeepNC.

To the best of our knowledge, this study is the first work that applies deep learning to network completion.
The remainder of this paper is organized as follows. In Section 2, we summarize significant studies that are related to our work. In Section 3, we explain the methodology of our work, including the problem definition and an overview of our DeepNC method. Section 4 describes the implementation details of the two DeepNC algorithms and analyzes their computational complexities. Experimental results are discussed in Section 5. Finally, we provide a summary and concluding remarks in Section 6.

Table 1 summarizes the notation that is used in this paper. This notation will be formally defined in the following sections when we introduce our methodology and the technical details.

TABLE 1: Summary of notations

Notation     Description
$G_T$        true graph
$G_O$        partially observable graph
$V_O$        set of nodes in $G_O$
$E_O$        set of edges in $G_O$
$V_M$        set of missing nodes
$E_M$        set of missing edges
$G_I$        training graph
$p_{model}$  probability distribution over edges of a graph
$\Theta$     parameter of $p_{model}$
$\hat{G}$    recovered graph
$\pi$        node ordering
$S^\pi$      a sequence of nodes and edges under $\pi$
2 RELATED WORK
The method that we propose in this paper is related to three broader areas of research, namely generative models of graphs, link prediction, and network completion.
Generative models of graphs.
The study of generative models of graphs has a long history, beginning with the first random model of graphs that robustly assigns probabilities to large classes of graphs, introduced by Erdős and Rényi [13]. Another well-known model generates new nodes based on preferential attachment [14]. More recently, a generative model based on Kronecker graphs, the so-called KronFit, was introduced in [15], which generates synthetic networks that match many of the structural properties of real-world networks, such as constant and shrinking diameters. Recent advances in deep learning-based approaches have made further progress towards generative models for complex networks [10], [11], [16], [17], [18], [19], [20], [21]. GraphRNN [10] and graph recurrent attention networks (GRAN) [16] were presented to learn a distribution over edges by decomposing the graph generation process into sequences of node and edge formations via autoregressive generative models; an approach using the Wasserstein GAN objective in the training process was applied to generate discrete output samples [11]; variational autoencoders (VAEs) were employed to design another deep learning-based generative model of graphs [17], [18]; a graph convolutional policy network was presented for goal-directed graph generation (e.g., drug molecules) using reinforcement learning [19]; a multi-scale graph generative model, named Misc-GAN, was introduced by modeling the underlying distribution of graph structures at different levels of granularity with the aim of generating graphs having similar community structures [20]; and a more general deep generative model was presented to learn distributions over arbitrary graphs via graph neural networks [21]. Among the aforementioned methods, autoregressive generative models such as GraphRNN and GRAN are the most scalable and flexible approaches in terms of graph size, while others are beneficial in generating non-topological information such as node attributes. Table 2 summarizes the aforementioned deep generative models of graphs.

TABLE 2: Summary of deep generative models of graphs

Deep generative models of graphs   Scalable   Flexible   Attributed
Autoregressive [10], [16]          ✓          ✓
GAN [11], [20]                                           ✓
VAE [17], [18]                                           ✓
Reinforcement learning [19]                   ✓          ✓
General neural network [21]                   ✓          ✓
Link prediction.
Inferring the presence of links in a given network according to the neighborhood similarity of existing connections is a longstanding task in network science. Although numerous algorithms have been developed based on traditional statistical measures [22] and deep learning approaches such as graph neural networks [18], [23], existing link prediction methods are not inherently designed to solve the network completion problem, which jointly recovers missing nodes and edges in partially observable networks. Specifically, when a node is completely missing from the underlying network, link prediction models can no longer exploit structural neighborhood information.
Network completion.
Observing a partial sample of a network and inferring the remainder of the network is referred to as network completion. As the most influential study, KronEM, an approach to solving the network completion problem based on Kronecker graphs and the EM algorithm, was suggested by Kim and Leskovec [5]. MISC was developed to tackle the missing node identification problem, in which the information on connections between missing nodes and observable nodes is assumed to be available [24]. A follow-up study of MISC [25] incorporated metadata, such as demographic information and the nodes' historical behavior, into the inference process. Furthermore, a graph upscaling method, termed EvoGraph [26], can be regarded as a network completion method using a preferential attachment mechanism.
Discussions.
Despite these contributions, there has been no prior work in the literature that exploits the power of deep generative models in the context of network completion. Although generative models of graphs such as GraphRNN can be used as a network completion method, nontrivial extra tasks are required, including computationally expensive graph matching to find the correspondence between generated graphs and the partially observable network. Furthermore, MISC and other follow-up studies do not truly address network completion, since they solve the node identification problem under the assumption that the connections between missing nodes and observable nodes are known beforehand, which is not feasible in a setting where only partial observation of nodes is possible.
3 METHODOLOGY
As a basis for the proposed DeepNC algorithms in Section 4, we first describe our network model with basic assumptions and formulate our problem. Then, we explain a deep generative graph model and the research methodology that adopts this model to solve the problem of network completion.
Let us denote a partially observable network as $G_O = (V_O, E_O)$, where $V_O$ and $E_O$ are the set of vertices and the set of edges, respectively. The network $G_O$ with $|V_O|$ observable nodes can be interpreted as a subgraph taken from an underlying true network $G_T = (V_O \cup V_M, E_O \cup E_M)$, where $V_M$ is the set of unobservable (missing) nodes and $E_M$ is the set of three types of unobservable (missing) edges: i) the edges connecting two nodes in $V_M$; ii) the edges connecting one node in $V_O$ and another node in $V_M$; and iii) the missing edges connecting two nodes in $V_O$. More specifically, the set of observable edges, $E_O$, is regarded as a subset of all true edges connecting nodes in $V_O$. In contrast to the conventional setting that assumes no missing edges between two nodes in $V_O$ [5], we relax this assumption by not requiring that $G_O$ is a complete subgraph. In the following, we assume both $G_O$ and $G_T$ to be undirected unweighted networks without self-loops or repeated edges.

Let us denote $p_{model}$ as a family of probability distributions over the edges of a graph, parameterized by a set of model parameters $\Theta$, i.e., $(p_{model}; \Theta)$. In this paper, we suppose that $G_T$ is a sample drawn from the distribution $p_{model}$. Furthermore, we assume that the number of missing nodes, $|V_M|$, is available or can be estimated. In practice, $|V_M|$ can be readily estimated by standard statistical methods; for example, a latent non-random mixing model in [27] is capable of estimating a network size $|V_O \cup V_M|$ by asking respondents how many people they know in specific subpopulations. For an overview of network-relevant notations, see Fig. 1.

Fig. 1: The schematic overview of our DeepNC method.
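To make this observation model concrete, the following minimal Python sketch (ours, for illustration only; the function and parameter names are assumptions, and networkx is used for graph handling) builds a partially observable network $G_O$ from a true network $G_T$ by hiding a node set $V_M$ with all incident edges, together with a fraction of the edges among the remaining observable nodes.

import random
import networkx as nx

def make_partial_observation(G_T, num_missing_nodes, missing_edge_frac=0.1, seed=0):
    # Sample G_O from G_T: drop |V_M| nodes (their incident edges vanish
    # with them), then hide a fraction of the remaining observable edges,
    # i.e., the third type of missing edges in E_M.
    rng = random.Random(seed)
    V_M = set(rng.sample(sorted(G_T.nodes()), num_missing_nodes))
    G_O = G_T.copy()
    G_O.remove_nodes_from(V_M)
    hidden = rng.sample(sorted(G_O.edges()),
                        int(missing_edge_frac * G_O.number_of_edges()))
    G_O.remove_edges_from(hidden)
    return G_O, V_M

# Example: a 100-node Barabasi-Albert graph with 30 hidden nodes.
G_T = nx.barabasi_albert_graph(100, 4, seed=0)
G_O, V_M = make_partial_observation(G_T, 30)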
In the following, we formally define the network completion problem, the idea behind our approach, and the problem formulation.

Definition 1. Network completion problem.
Given a partially observable network $G_O$, network completion aims to recover all missing edges connecting nodes in the true network $G_T$ so that the inferred network, denoted by $\hat{G}$, is equivalent to $G_T$ (up to isomorphism).

As illustrated in Fig. 1, a network $\hat{G}$ is inferred using the partially observable network $G_O$ as input of DeepNC. We tackle this problem by minimizing a distance metric $\delta(G_T, \hat{G})$ that measures the difference between $G_T$ and $\hat{G}$. Since the true network $G_T$ is not available, our main idea is to analyze the connectivity patterns of one (or multiple) fully observed network(s) $G_I$ whose structure is similar to that of $G_T$ (i.e., $\delta(G_T, G_I)$ is sufficiently small) and then to make use of this information for recovering the network $G_O$, where $G_I$ is a sample drawn from the distribution $p_{model}$. To this end, we first learn $(p_{model}; \Theta)$ by using $G_I$ as the training data under a deep generative model of graphs described in Section 3.2. Afterwards, we generate graphs with similar structures via the set of learned model parameters $\Theta$. Among all generated graphs $G$, each of which has $|V_O| + |V_M|$ nodes, we find the most likely graph configuration $\hat{G}$ given the observable part $G_O$. In this context, our optimization problem can be formulated as follows:

$$\hat{G} = \arg\max_{G} P(G \mid G_O, \Theta) \quad \text{s.t.} \quad |V_G| = |V_O| + |V_M|, \tag{1}$$

where $|V_G|$ denotes the number of nodes in $G$. The overall procedure of our approach is visualized in Fig. 1.

2. The number of nodes in $G_I$ should be greater than or equal to that in $G_T$ so that the information (i.e., the distribution $p_{model}$) encoded by the learned parameters $\Theta$ is sufficient to infer $G_T$.

Deep generative models of graphs have the ability to approximate any distribution of graphs with minimal assumptions about their structures [10], [21]. Among recently introduced deep generative models, GraphRNN [10] is adopted in our study due to its state-of-the-art performance in generating diverse graphs that match the structural characteristics of a target set, as well as its scalability to much larger graphs than those handled by other deep generative models (refer to Section 4 and Corollary 1 in [10] for more details). In this subsection, we briefly describe a variant of GraphRNN, termed simplified GraphRNN (GraphRNN-S), in which the edge connection probabilities of a node are assumed to be independent of each other. This method effectively learns $(p_{model}; \Theta)$ from the set of structurally similar network(s) $G_I$.

We first describe how to vectorize a graph. Given a graph $G$ with a number of nodes equal to $|V_O| + |V_M|$, we define a node ordering $\pi$ that maps nodes to rows or columns of a given adjacency matrix of $G$ as a permutation function over the set of nodes. Thus, $\{\pi(v_1), \cdots, \pi(v_{|V_O|+|V_M|})\}$ is a permutation of $\{v_1, \cdots, v_{|V_O|+|V_M|}\}$, yielding $(|V_O|+|V_M|)!$ possible node permutations. Given a node ordering $\pi$, a sequence $S^\pi$ is then defined as

$$S^\pi \triangleq (S_1^\pi, \cdots, S_{|V_O|+|V_M|}^\pi), \tag{2}$$

where each element $S_i^\pi \in \{0,1\}^{i-1}$ for $i \in \{2, \cdots, |V_O|+|V_M|\}$ is a binary adjacency vector representing the edges between node $\pi(v_i)$ and the previous nodes $\pi(v_j)$ for $j \in \{1, \cdots, i-1\}$ that already exist in the graph, and $S_1^\pi = \emptyset$. Here, $S_i^\pi$ can be expressed as

$$S_i^\pi = (a_{1,i}^\pi, \cdots, a_{i-1,i}^\pi), \quad \forall i \in \{2, \cdots, |V_O|+|V_M|\}, \tag{3}$$

where $a_{u,v}^\pi$ denotes the $(u,v)$-th element of the adjacency matrix $A^\pi \in \{0,1\}^{(|V_O|+|V_M|) \times (|V_O|+|V_M|)}$ for $u, v \in \{1, \cdots, |V_O|+|V_M|\}$ (refer to Fig. 2 for an illustration of the sequence). Because graphs are discrete objects, the graph generation process involves discrete decisions that are not differentiable and therefore problematic for backpropagation. Thus, instead of directly learning the distribution $p(G)$, we sample $\pi$ from the set of $(|V_O|+|V_M|)!$ node permutations to generate the sequences $S^\pi$ and learn the distribution $p(S^\pi)$.

Next, we explain how to characterize the probability $p(S^\pi)$. Due to the sequential nature of $S^\pi$, the probability $p(S^\pi)$ can be decomposed into the product of conditional probability distributions over the elements as follows:

$$p(S^\pi) = \prod_{i=2}^{|V_O|+|V_M|} p(S_i^\pi \mid S_1^\pi, \cdots, S_{i-1}^\pi). \tag{4}$$

For ease of presentation, we simplify $p(S_i^\pi \mid S_1^\pi, \cdots, S_{i-1}^\pi)$ as $p(S_i^\pi \mid S_{<i}^\pi)$. In GraphRNN-S, this conditional distribution is parameterized by two neural networks: a state-transition function $f_{trans}$ (an RNN maintaining a hidden state $h$) and an output function $f_{out}$ that maps the hidden state to edge existence probabilities,

$$h_{i-1} = f_{trans}(h_{i-2}, S_{i-1}^\pi), \tag{5}$$
$$\phi_i = f_{out}(h_{i-1}), \tag{6}$$

where each entry of $\phi_i \in (0,1)^{i-1}$ represents the probability that the corresponding edge of node $\pi(v_i)$ exists.
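As an illustration of the sequence representation in (2) and (3), the following is a minimal sketch (ours) that converts a graph and a node ordering $\pi$ into the sequence $S^\pi$ of binary adjacency vectors.

import numpy as np
import networkx as nx

def graph_to_sequence(G, pi):
    # pi[i-1] is the node placed at position i of the ordering; S^pi_1 is
    # the empty vector, and S^pi_i lists edges toward the i-1 earlier nodes.
    S = [np.zeros(0, dtype=np.int8)]
    for i in range(1, len(pi)):
        s_i = np.array([1 if G.has_edge(pi[i], pi[j]) else 0
                        for j in range(i)], dtype=np.int8)
        S.append(s_i)
    return S

G = nx.karate_club_graph()
S = graph_to_sequence(G, list(G.nodes()))   # one of (|V|)! possible orderings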
We now describe our DeepNC method, which recovers the missing part of the true network $G_T$ based on the deep generative model. We first present the problem formulation built upon (1). Then, we describe the approach that seamlessly accommodates both observable and missing parts of $G_T$ into the graph generation process using the trained functions $f_{trans}$ and $f_{out}$ in Section 3.2.

By modeling graphs as sequences and incorporating the information from the observed graph $G_O$ into the generation process, we reformulate our optimization problem in (1) as finding a sequence $\hat{S}^\pi$ that maximizes $p(S^\pi \mid G_O; \Theta)$ under a node ordering $\pi$ as follows:

$$\hat{S}^\pi = \arg\max_{S^\pi} p(S^\pi \mid G_O; \Theta), \tag{7}$$

where $S^\pi$ is given by (2) and $\Theta$ is the set of learned parameters of both $f_{trans}$ and $f_{out}$.

We aim at finding $\hat{S}^\pi$ by applying data imputation to the missing part (i.e., the unknown entries) in the sequence $S^\pi$, where indices of missing nodes correspond to placeholders (e.g., $M_1$ and $M_2$ in Fig. 3). The unknown entries also include non-existent edges between nodes in $G_O$. Let $\tilde{S}^\pi = (\tilde{S}_1^\pi, \cdots, \tilde{S}_{|V_O|+|V_M|}^\pi)$ denote the sequence after data imputation under a node ordering $\pi$, which contains both observable edges taken directly from $S^\pi$, corresponding to the set $E_O$, and possible instances of all missing entries. Then, we impute each missing entry in $S^\pi$ with either 0 or 1, thereby yielding $2^{\frac{(|V_O|+|V_M|)(|V_O|+|V_M|-1)}{2} - |E_O|}$ possible outcomes of $\tilde{S}^\pi$, where data imputation for non-existent edges between nodes in $G_O$ (i.e., the orange entries in $S^\pi$ of Fig. 3) can be thought of as link prediction, since structural neighborhood information regarding observable nodes is available. For each outcome, we use the trained $f_{trans}$ and $f_{out}$ to obtain the corresponding $\phi_i$ for $i \in \{2, \cdots, |V_O|+|V_M|\}$. Since each entry of $\phi_i$ represents the likelihood of edge existence, the conditional probability $p(S^\pi \mid G_O; \Theta)$ in (7) can be computed as

$$p(S^\pi \mid G_O; \Theta) = p(\tilde{S}^\pi; \Theta) = \prod_{i=2}^{|V_O|+|V_M|} p(\tilde{S}_i^\pi; \phi_i) = \prod_{i=2}^{|V_O|+|V_M|} \prod_{\tilde{s}_{i,j}^\pi = 1} \phi_{i,j} \prod_{\tilde{s}_{i,j}^\pi = 0} (1 - \phi_{i,j}), \tag{8}$$

where $\tilde{s}_{i,j}^\pi$ denotes the $j$-th element of the binary vector $\tilde{S}_i^\pi$ for $i \in \{2, \cdots, |V_O|+|V_M|\}$ and $j \in \{1, \cdots, i-1\}$, and $\phi_{i,j} \in (0,1)$ is the $j$-th element of $\phi_i$. An example visualizing our DeepNC method is presented in Fig. 3, where we observe a network $G_O$ consisting of three nodes (i.e., A, B, and C) and two edges, instead of the true network $G_T$ with 5 nodes (i.e., A, B, C, $M_1$, and $M_2$).

Fig. 3: An example illustrating the schematic overview of our DeepNC method, where three nodes (i.e., A, B, and C) and two edges with solid lines are observable instead of the true graph $G_T$ consisting of five nodes and all associated edges. Both white and orange entries in $S^\pi$ are imputed with either 0 or 1, while grey entries in $S^\pi$ remain unchanged.

To solve (7), we would need to compute $p(S^\pi \mid G_O; \Theta)$ via exhaustive search over $(|V_O|+|V_M|)!$ node permutations. Since computing $p(\tilde{S}^\pi; \Theta)$ in (8) requires $(|V_O|+|V_M|)^2$ multiplication operations and data imputation yields $2^{\frac{(|V_O|+|V_M|)(|V_O|+|V_M|-1)}{2} - |E_O|}$ possible outcomes of $\tilde{S}^\pi$, the computational complexity is bounded by $O\big((|V_O|+|V_M|)^2 \cdot 2^{\frac{(|V_O|+|V_M|)(|V_O|+|V_M|-1)}{2} - |E_O|} \cdot (|V_O|+|V_M|)!\big)$. This motivates us to introduce a low-complexity algorithm for efficiently solving the problem in the next section.
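Because (8) factorizes over independent Bernoulli terms, the likelihood of an imputed sequence is straightforward to evaluate once the $\phi_i$ are available. Below is a minimal sketch (ours), in which phi stands in for the outputs of the trained $f_{out}$.

import numpy as np

def sequence_log_likelihood(S_tilde, phi):
    # log p(S~^pi; Theta) per Eq. (8); S_tilde[i] and phi[i] are the
    # length-i binary and probability vectors of the (i+1)-th node.
    ll = 0.0
    for s_i, phi_i in zip(S_tilde[1:], phi[1:]):   # the product starts at i = 2
        ll += float(np.sum(s_i * np.log(phi_i) + (1 - s_i) * np.log1p(-phi_i)))
    return ll

# Toy usage with hypothetical model outputs for a 3-node graph.
S_tilde = [np.zeros(0), np.array([1]), np.array([1, 0])]
phi = [None, np.array([0.8]), np.array([0.7, 0.4])]
print(sequence_log_likelihood(S_tilde, phi))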
4 DEEPNC ALGORITHMS
In this section, we introduce the two algorithms that we design to efficiently solve the network completion problem in (7). In designing these algorithms, we focus on how to compute the likelihood of edge existence in the form of a tuple $(\hat{\pi}, \Phi)$, where $\hat{\pi}$ represents a node ordering to be inferred and $\Phi = \{\phi_2, \cdots, \phi_{|V_O|+|V_M|}\}$. Then, $\hat{S}^\pi$ in (7) can be acquired by sampling from $(\hat{\pi}, \Phi)$. First, we present DeepNC-L, a low-complexity deep network completion algorithm, which operates under the assumption that the partially observable graph $G_O$ is a complete subgraph with no missing edges. Second, we present an enhanced version of DeepNC-L using the EM algorithm [30], dubbed DeepNC-EM, to deal with the case where edges are missing in $G_O$. The overall architecture of both DeepNC algorithms is illustrated in Fig. 4, where the notation and detailed descriptions are given later. We also analyze their computational complexities.
We propose DeepNC-L, which approximates the optimal solution to (7) under the assumption that there are no missing edges in $G_O$; this implies that the non-existent edges between nodes in $G_O$ are regarded as observable entries in $S^\pi$. Since $\Phi$ indicates the set of edge existence probabilities and is thus obtained from the set of learned model parameters $\Theta$ for each $\pi$, (7) can be simplified to the problem of finding a node ordering $\hat{\pi}$ such that

$$\hat{\pi} = \arg\max_{\pi} p(\tilde{S}^\pi; \Theta), \tag{9}$$

where $\tilde{S}^\pi$ is the sequence after data imputation under a given $\pi$.

Fig. 4: The overall architecture of DeepNC algorithms.

To efficiently solve (9), we present two judicious approximation methods in the following. First, we design a greedy strategy that selects a single node at each inference (generation) step. More precisely, instead of exhaustively searching for the node ordering maximizing $p(\tilde{S}^\pi; \Theta)$ among $(|V_O|+|V_M|)!$ possible permutations, we aim to consecutively find a single node $\hat{v} \in V^{(i)}$ such that

$$\hat{v} = \arg\max_{v \in V^{(i)}} p(\tilde{S}_i^\pi; \phi_i) \quad \text{subject to} \quad \pi(v) = i \tag{10}$$

for each step $i \in \{2, \cdots, |V_O|+|V_M|\}$, where $V^{(i)}$ is the set of nodes that have not been generated until the $i$-th inference step and $\hat{v}$ is removed from $V^{(i)}$ after each inference step (that is, $V^{(i+1)} \leftarrow V^{(i)} \setminus \{\hat{v}\}$) (refer to Fig. 4 for the node removal). We note that the first node can be arbitrarily chosen in the generation process. Second, we further approximate the solution to (10) by treating all unknown entries (i.e., missing data) in $\tilde{S}_i^\pi$ equally during the computation while retrieving $\hat{v}$ from the set $V^{(i)}$, rather than computing the likelihoods in (10) over all entries in $\tilde{S}_i^\pi$. Let us define two types of nodes: observable nodes and missing nodes. Then, we select a node of either type at random, in proportion to the number of nodes belonging to each type in $V^{(i)}$, to ensure that there is no bias in the node selection. When the selected node type is "missing", we choose $\hat{v}$ at random from all missing nodes in $V^{(i)}$ without any computation, since all missing nodes are treated equally. In contrast, when the selected node type is "observable", we choose an observable node based solely on the computation for the observable entries in $S_i^\pi$ by reformulating our problem as follows:

$$\hat{v} = \arg\max_{v \in V_O \cap V^{(i)}} p(O_i^\pi; \phi_i) \quad \text{subject to} \quad \pi(v) = i \tag{11}$$

for each step $i \in \{2, \cdots, |V_O|+|V_M|\}$, where $O_i^\pi$ denotes the set of observable entries in $S_i^\pi$; $p(O_i^\pi; \phi_i) = \prod_{s_{i,j}^\pi = 1} \phi_{i,j} \prod_{s_{i,j}^\pi = 0} (1 - \phi_{i,j})$ from (8), with the products taken over the observable entries; and $V_O \cap V^{(i)}$ indicates the set of remaining observable nodes after $i-1$ inference steps. Note that $p(O_i^\pi; \phi_i)$ is non-computable if there is no observable entry in $S_i^\pi$.

Fig. 5: An illustration of the mechanism of DeepNC-L. The first three steps are shown as an example.

Now, we are ready to give a stepwise description of the DeepNC-L algorithm.
1. Initialization: For $i = 1$, we set $V^{(1)}$ to $V_O \cup V_M$ and randomly choose a node in $V^{(1)}$ to be $\hat{v}$.
2. Node selection: For $i \in \{2, \cdots, |V_O|+|V_M|\}$, we find $\hat{v}$ by either randomly selecting a missing node in $V^{(i)}$ or solving (11), depending on which node type is selected.
3. Data imputation: After finding $\hat{v}$, we apply a data imputation strategy to the missing part (i.e., the unknown entries) in $S_i^\pi$ through the inference process of GraphRNN-S. To be specific, suppose that $\pi(u) = i$ and $\pi(v) = j$, which means that the $i$-th and $j$-th nodes in a given node ordering $\pi$ are $u$ and $v$, respectively. Then, we have

$$\tilde{s}_{i,j}^\pi = \begin{cases} \text{Bernoulli}(\phi_i[j]), & \text{if } u \notin V_O \text{ or } v \notin V_O \\ s_{i,j}^\pi, & \text{otherwise}, \end{cases} \tag{12}$$

where the Bernoulli trial with probability $\phi_i[j]$ maps the value of the unknown entry to 1 if the outcome "success" occurs and to 0 otherwise.
4. Repetition: We iterate the second and third steps $|V_O| + |V_M| - 1$ times until the recovered graph is fully generated.

For a more intuitive understanding, consider the following example.

Example 1: As illustrated in Fig. 5, let us describe the three steps that select the first three nodes of a given graph according to the aforementioned procedure. We start by randomly assigning the first node of the inference process to node $M_1$ (i.e., $\pi(M_1) = 1$ and $V^{(2)} \leftarrow V^{(1)} \setminus \{M_1\}$). Since we do not have any information about the connections of the unseen node $M_1$, the entry $s_{\pi(v),1}^\pi$ is unknown for all nodes $v \in V^{(2)}$. Suppose that we generate an observable node at this step by random selection. Since there is no observable entry in $S_2^\pi$, we randomly choose node A among the three nodes in $V_O \cap V^{(2)}$ as the second node and set $\pi(A) = 2$, resulting in $V^{(3)} \leftarrow V^{(2)} \setminus \{A\}$. Assuming that the Bernoulli trial with probability $\phi_2[1]$ returns 1, we impute $\tilde{s}_{2,1}^\pi$ with 1 according to (12). Let us turn to the next step in order to select the third node. In this case, since nodes B and C belong to the type of observable nodes, $\tilde{s}_{3,2}^\pi$ takes the value of either 1 or 0, depending on the connections to node A. Suppose that we again generate an observable node at this step. When either $\pi(B) = 3$ or $\pi(C) = 3$, the likelihood $p(O_3^\pi; \phi_3)$ can be computed as:

• If $\pi(B) = 3$, then it follows that $p(O_3^\pi; \phi_3) = \phi_{3,2}$ using (8), since B is connected to A.
• If $\pi(C) = 3$, then it follows that $p(O_3^\pi; \phi_3) = 1 - \phi_{3,2}$ in a similar manner, since C is not connected to A.

Based on the above results, with $\phi_{3,2} < 0.5$ in this example, setting $\pi(C)$ to 3 leads to the maximum value of $p(O_3^\pi; \phi_3)$, which is thus the solution to the problem in (11) for $i = 3$. As depicted in Fig. 5, node C is chosen in this step. By assuming that the Bernoulli trial with probability $\phi_3[1]$ returns 1, we finally have $\tilde{S}_3^\pi = [1, 0]$.
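For illustration, the comparison carried out in Example 1 can be written out as a small Python sketch (ours; the value of phi_3 is hypothetical), scoring each candidate observable node by $p(O_i^\pi; \phi_i)$ from (11).

import numpy as np

def score_candidates(candidates, observed_rows, phi_i):
    # p(O^pi_i; phi_i) for each candidate v; observed_rows[v] holds
    # (index, value) pairs for the observable entries of S^pi_i when v
    # is placed at position i. Unknown entries are simply skipped.
    scores = {}
    for v in candidates:
        p = 1.0
        for j, s in observed_rows[v]:
            p *= phi_i[j] if s == 1 else (1.0 - phi_i[j])
        scores[v] = p
    return scores

# Third step of Example 1: entry 0 (toward M1) is unknown and skipped;
# entry 1 (toward A) is 1 for B and 0 for C.
phi_3 = np.array([0.5, 0.3])   # hypothetical model output
scores = score_candidates(['B', 'C'], {'B': [(1, 1)], 'C': [(1, 0)]}, phi_3)
best = max(scores, key=scores.get)   # picks C, since 1 - 0.3 > 0.3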
From now on, we turn to examining how to efficiently compute the likelihoods in (11) through a complexity reduction technique. We start by making a helpful observation, as illustrated in Fig. 6. Suppose that nodes $M_1$, A, B, and E from the original graph with 8 observable nodes and 3 missing nodes have already been generated sequentially after four inference steps, as depicted in Fig. 6. Then, one can see that all observable entries in $O_5^\pi$ take the value of 0 when node D, G, or H is selected in the fifth step (i.e., $\pi(D) = 5$, $\pi(G) = 5$, or $\pi(H) = 5$), since each of these three nodes has no connection to the nodes A, B, and E that have already been generated. Consequently, the likelihood $p(O_5^\pi; \phi_5)$ is identical for these three cases. We generalize this observation in the following lemma.

Fig. 6: An example illustrating the fifth inference step of DeepNC-L, where nodes $M_1$, A, B, and E have been generated sequentially.

Lemma 1. Let $L^{(i)}$ denote the set of not-yet-selected direct neighbors of the observable nodes generated during the first $i-1$ inference steps, expressed as

$$L^{(i)} = \begin{cases} (L^{(i-1)} \cup N(\hat{v})) \cap V^{(i)}, & \text{if } \hat{v} \in V_O \\ L^{(i-1)} \cap V^{(i)}, & \text{otherwise}, \end{cases} \tag{13}$$

where $i \in \{2, \cdots, |V_O|+|V_M|\}$, $L^{(1)} = \emptyset$, $\hat{v}$ is the node selected in the $(i-1)$-th step, and $N(\hat{v})$ is the set of (direct) neighbors of $\hat{v}$. Then, the likelihood $p(O_i^\pi; \phi_i)$ in (11) is the same for all $u \notin L^{(i)}$, where $u \in V_O$ and $\pi(u) = i$.

Proof. For an observable node $u$ that does not belong to the set $L^{(i)}$ and has not been generated during the first $i-1$ inference steps, all observable entries in $S_i^\pi$ (i.e., the entries in $O_i^\pi$) take the value of 0 since there is no associated edge. Thus, it follows that $p(O_i^\pi; \phi_i) = \prod_{s_{i,j}^\pi = 0} (1 - \phi_{i,j})$, which is identical for all $u \notin L^{(i)}$, where $u \in V_O$ and $\pi(u) = i$. This completes the proof of this lemma.

Lemma 1 allows us to compute the likelihood $p(O_i^\pi; \phi_i)$ only once for all nonselected observable nodes $u \notin L^{(i)}$ when solving (11); this corresponds to the case where node D, G, or H is selected in the fifth step in Fig. 6, while $L^{(5)} = \{C, F\}$ indicates the set of nonselected neighbors of nodes A, B, and E.
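The frontier set $L^{(i)}$ of (13) can be maintained incrementally with ordinary set operations; a small sketch (ours):

def update_frontier(L_prev, v_hat, remaining, neighbors, observable):
    # Eq. (13): L^(i) collects the not-yet-selected direct neighbors of the
    # observable nodes generated so far. neighbors maps a node to its
    # neighbor set in G_O; remaining is V^(i); observable is V_O.
    if v_hat in observable:
        return (L_prev | neighbors[v_hat]) & remaining
    return L_prev & remaining

# Toy usage: after selecting observable node 'A', its still-unselected
# neighbors enter the frontier.
L = update_frontier(set(), 'A', {'B', 'C', 'D'},
                    {'A': {'B', 'C'}}, {'A', 'B', 'C', 'D'})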
Next, we explain how to efficiently solve the problem in (11) without computing the likelihoods $p(O_i^\pi; \phi_i)$ for all observable nodes. From Fig. 6, one can see that the entry of $O_5^\pi$ that takes the value of 1 when node C is selected (marked with diagonal lines in $O_5^\pi$) is the only term that differs between the two sets $O_5^\pi$ for the cases in which node C is selected and in which node D, G, or H is selected, which implies that it may not be necessary to compute the likelihoods of the remaining zero-valued entries for node selection. Thus, from the fact that most of the entries in $O_i^\pi$ tend to be 0 in many real-world networks, which are usually sparse, the computational complexity can be greatly reduced if we compare the likelihoods in (11) based only on the entries in $O_i^\pi$ that have a value of 1. To this end, we eliminate all the terms $(1 - \phi_{i,j})$ corresponding to $s_{i,j}^\pi = 0$ from $p(O_i^\pi; \phi_i)$ when a node $v \in V_O \cap V^{(i)}$ is selected. For computational convenience, we define

$$D_v = \frac{\prod_{s_{i,j}^\pi = 1} \phi_{i,j} \prod_{s_{i,j}^\pi = 0} (1 - \phi_{i,j})}{\prod_{s_{i,j}^\pi \in O_i^\pi} (1 - \phi_{i,j})} = \prod_{s_{i,j}^\pi = 1} \frac{\phi_{i,j}}{1 - \phi_{i,j}} \tag{14}$$

for $v \in V_O \cap V^{(i)}$. Since the denominator in (14) is the same for all $v \in V_O \cap V^{(i)}$, it is obvious that $\hat{v} = \arg\max_v D_v$ is the solution to (11). We note that computing $D_v$ is less computationally expensive than computing $p(O_i^\pi; \phi_i)$ when the number of entries with the value of 1 in $O_i^\pi$ is low. As a special case in which all observable entries in $S_i^\pi$ take the value of 0, the denominator in (14) is equivalent to $p(O_i^\pi; \phi_i)$, from which it follows that $D_u = 1$ when a node $u \notin L^{(i)}$ is selected. Thus, if $D_v < 1$ for all $v \in L^{(i)}$, then the likelihood in (11) for selecting a node $u \notin L^{(i)}$ is higher than that for selecting a node $v \in L^{(i)}$. In this case, we randomly choose a node $\hat{v} \notin L^{(i)}$ without further computation, by Lemma 1. In consequence, we compute $D_v$ only for nodes in the set $L^{(i)}$, rather than for all nodes in $V_O \cap V^{(i)}$. The following example describes how the computational complexity can be reduced according to the aforementioned technique by revisiting Fig. 6.

Example 2: Suppose that we generate an observable node at the fifth inference step. In this step, one can see that $L^{(5)} = \{C, F\}$; thus, instead of computing the likelihood $p(O_5^\pi; \phi_5)$ in (11) five times for all nonselected observable nodes C, D, F, G, and H in $V^{(5)}$, we only compute $D_C$ and $D_F$ from (14), each consisting of a single ratio $\phi_{5,j}/(1 - \phi_{5,j})$ for the corresponding 1-valued entry. Since both $D_C$ and $D_F$ are smaller than 1 in this example, we randomly choose one of the three observable nodes D, G, and H that are not in $L^{(5)}$ as $\hat{v}$.

Algorithm 1: DeepNC-L
Input: $G_O$, $|V_M|$, $f_{out}$, $f_{trans}$
Output: $(\hat{\pi}, \Phi)$
1: Initialization: $i \leftarrow 2$; $h_1 \leftarrow$ random initialization; $\tilde{S}_1^\pi \leftarrow \emptyset$; $\hat{v} \leftarrow$ random node $v \in V_O \cup V_M$; $\pi(\hat{v}) \leftarrow 1$; $L^{(1)} \leftarrow \emptyset$; update $L^{(i)}$ according to (13)
2: function DeepNC-L
3:   while $i \leq |V_O| + |V_M|$ do
4:     $h_{i-1} \leftarrow f_{trans}(h_{i-2}, \tilde{S}_{i-1}^\pi)$
5:     $\phi_i \leftarrow f_{out}(h_{i-1})$
6:     Select a node type
7:     if the selected node type is "observable" then
8:       for $v \in L^{(i)}$ do
9:         Compute $D_v$ according to (14)
10:      if ($D_v < 1$ for all $v$ or $L^{(i)} = \emptyset$) and $L^{(i)} \neq V_O \cap V^{(i)}$ then
11:        Randomly select an observable node $\hat{v} \notin L^{(i)}$
12:      else
13:        $\hat{v} \leftarrow \arg\max_v D_v$
14:      Update $L^{(i)}$ according to (13)
15:    else
16:      Randomly select an unobservable node $\hat{v}$
17:    $\tilde{S}_i^\pi \leftarrow$ impute $S_i^\pi$ according to (12)
18:    $\pi(\hat{v}) \leftarrow i$
19:    $i \leftarrow i + 1$
20:  return $(\hat{\pi}, \Phi)$

We summarize the overall procedure of our
DeepNC-L algorithm in Algorithm 1. We initially select the first node at random, and then start the inference process by identifying connections for the next node according to the following four stages:

Stage 1: Using the two functions $f_{trans}$ and $f_{out}$ in (5) and (6), respectively, we obtain $\phi_i$ (refer to lines 4-5).

Stage 2: Let $m$ denote the cardinality of the set of missing nodes that can potentially be generated in the $i$-th step. We then randomly select a node type so that the selected node is missing with probability $\frac{m}{|V_O| + |V_M| - i + 1}$ (refer to line 6).

Stage 3: If the type of observable nodes is selected, then we compute $D_v$, which is a function of $\phi_i$, according to (14) for all $v \in L^{(i)}$. When $D_v < 1$ for all $v \in L^{(i)}$ or $L^{(i)} = \emptyset$, we randomly select an observable node $\hat{v} \notin L^{(i)}$, provided that $L^{(i)} \neq V_O \cap V^{(i)}$. Otherwise, we select the node $\hat{v}$ that maximizes $D_v$. Afterwards, we update $L^{(i)}$ by including the neighbors of the selected node $\hat{v}$ (refer to lines 7-14). If the type of missing nodes is selected, then we select one node $\hat{v}$ randomly among all missing nodes that have not been generated until the $i$-th step (refer to lines 15-16).

Stage 4: The data imputation process takes place before the next iteration of node generation. Finally, we update the node ordering $\pi$ by including the selected node $\hat{v}$ for the $i$-th step. The algorithm continues by repeating stages 1-4 and terminates when a fully inferred sequence $S^\pi$ is generated (refer to lines 17-20).

We remark that the node ordering $\hat{\pi}$ is found given a set of edge existence probabilities $\Phi$, which is inferred by our model parameters $\Theta$ while assuming that $G_O$ is a complete subgraph; thus, the resulting tuple $(\hat{\pi}, \Phi)$ may not be accurate when there are missing edges in $G_O$. This motivates us to develop the DeepNC-EM algorithm in the following subsection.
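A compact sketch (ours) of the node-selection stage of Algorithm 1 (lines 7-14) is given below: it computes $D_v$ from (14) only over the frontier $L^{(i)}$ and falls back to a random non-frontier node when every $D_v$ is below 1. The probability values are hypothetical.

import random

def select_observable_node(L_i, ones_entries, phi_i, non_frontier, rng):
    # ones_entries[v] lists the indices j with s^pi_{i,j} = 1 for candidate v;
    # D_v = prod_j phi_i[j] / (1 - phi_i[j]) per Eq. (14).
    best_v, best_D = None, 1.0
    for v in L_i:
        D_v = 1.0
        for j in ones_entries[v]:
            D_v *= phi_i[j] / (1.0 - phi_i[j])
        if D_v > best_D:
            best_v, best_D = v, D_v
    if best_v is None and non_frontier:   # all D_v < 1 (or the frontier is empty)
        return rng.choice(sorted(non_frontier))
    return best_v

rng = random.Random(0)
# Hypothetical fifth step of Fig. 6: frontier {C, F}, non-frontier {D, G, H}.
phi_5 = [0.2, 0.3, 0.1, 0.4]
v = select_observable_node({'C', 'F'}, {'C': [1], 'F': [3]},
                           phi_5, {'D', 'G', 'H'}, rng)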
In this subsection, we introduce DeepNC-EM to further improve the performance of DeepNC-L by relaxing the assumption that there are no missing edges between two nodes in $G_O$. A naïve recovery of $G_O$, even with state-of-the-art link prediction methods, before conducting network completion may lead to suboptimal performance, since the network structure of $G_O$ is potentially distorted due to the effect of missing nodes and their missing incident edges. Thus, we aim to find the most likely configuration of the three types of missing edges in the set $E_M$ specified in Section 3.1.1 by jointly estimating a tuple $(\pi, \Phi)$. To this end, we solve (7) by designing another DeepNC method using the EM algorithm.

We now describe the proposed DeepNC-EM, which is built upon the DeepNC-L algorithm in Section 4.1. Let $(\pi^{(0)}, \Phi^{(0)})$ and $Z$ denote the initial output of DeepNC-L and the set of non-existent edges between nodes in $G_O$, respectively. First, we estimate the potential existence likelihoods of the edges in $Z$, denoted by $\Phi_Z$, by extracting the $\binom{|V_O|}{2} - |E_O|$ elements corresponding to $Z$ from the likelihoods $\Phi^{(0)}$ of all edges under the node ordering $\pi^{(0)}$. Then, the E-step samples $Z^{(t)}$ from $p(Z^{(t)} \mid \Phi_Z^{(t)})$ via Bernoulli trials to create multiple instances of $G_O^{(t)}$, where the superscript $(t)$ denotes the EM iteration index. In the M-step, we adopt DeepNC-L to subsequently optimize the parameters $\Phi_Z$ given the samples obtained in the E-step. The EM iteration alternates between performing the E-step and the M-step according to the following expressions, respectively:

$$\text{E-step:} \quad Z^{(t)} \sim p(Z \mid \Phi_Z^{(t)}),$$
$$\text{M-step:} \quad \Phi_Z^{(t+1)} = \arg\max_{\Phi_Z} \mathbb{E}[p(Z^{(t)} \mid \Phi_Z)].$$
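Schematically, the alternation can be expressed as follows (a sketch of ours; m_step stands in for running DeepNC-L on $G_O$ augmented with the sampled edges and extracting refreshed likelihoods for $Z$, which is where the actual model is invoked):

import numpy as np

def deepnc_em(phi_Z, n_samples, n_iters, m_step, seed=0):
    # phi_Z: existence probabilities of the candidate edges Z among
    # observable nodes. Each iteration samples Delta_s instances of Z
    # (E-step) and averages the refreshed estimates (M-step).
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        estimates = []
        for _ in range(n_samples):
            z = (rng.random(phi_Z.shape) < phi_Z).astype(float)  # Bernoulli trials
            estimates.append(m_step(z))
        phi_Z = np.mean(estimates, axis=0)
    return phi_Z

# Toy usage with a placeholder m_step that pulls estimates toward the sample.
phi = deepnc_em(np.full(5, 0.5), n_samples=4, n_iters=6,
                m_step=lambda z: 0.5 * z + 0.25)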
The overall procedure of DeepNC-EM is summarized in Algorithm 2. Here, Filter($\pi^{(t)}[i]$, $\Phi^{(t)}[i]$) in lines 1 and 10 is invoked to retrieve $\Phi_Z^{(t)}$ from $\Phi^{(t)}$; $\eta > 0$ is an arbitrarily small threshold indicating a stopping criterion for the algorithm; $\Delta_s$ denotes the number of samples in each E-step; and $[i]$ indicates the sample index.

Algorithm 2: DeepNC-EM
Input: $\pi^{(0)}$, $\Phi^{(0)}$, $G_O$, $|V_M|$, $f_{out}$, $f_{trans}$, $\Delta_s$
Output: $(\hat{\pi}, \hat{\Phi})$
1: Initialization: $t \leftarrow 0$; $\Phi_Z^{(0)} \leftarrow$ Filter($\pi^{(0)}$, $\Phi^{(0)}$)
2: function DeepNC-EM
3:   do
4:     E-step: for $i \in \{1, \cdots, \Delta_s\}$ do
5:       $Z^{(t)}[i] \sim p(Z \mid \Phi_Z^{(t)})$
6:       $G_O^{(t)}[i] \leftarrow$ add edges sampled from $Z^{(t)}[i]$
7:     M-step: for $i \in \{1, \cdots, \Delta_s\}$ do
8:       $(\pi^{(t+1)}[i], \Phi^{(t+1)}[i]) \leftarrow$
9:         DeepNC-L($G_O^{(t)}[i]$, $|V_M|$, $f_{out}$, $f_{trans}$)
10:      $\Phi_Z^{(t+1)}[i] \leftarrow$ Filter($\pi^{(t+1)}[i]$, $\Phi^{(t+1)}[i]$)
11:    $\Phi_Z^{(t+1)} \leftarrow \frac{1}{\Delta_s} \sum_i \Phi_Z^{(t+1)}[i]$
12:    $t \leftarrow t + 1$
13:  while $\|\Phi_Z^{(t)} - \Phi_Z^{(t-1)}\| \geq \eta$
14:  $\hat{Z} \sim p(Z \mid \Phi_Z^{(t)})$
15:  $\hat{G}_O \leftarrow$ add edges from $\hat{Z}$
16:  $(\hat{\pi}, \hat{\Phi}) \leftarrow$ DeepNC-L($\hat{G}_O$, $|V_M|$, $f_{out}$, $f_{trans}$)
17:  return $(\hat{\pi}, \hat{\Phi})$

In this subsection, we analyze the computational complexities of the
DeepNC-L and
DeepNC-EM algorithms.
We start by examining the complexity of each inference step $i \in \{2, \cdots, |V_O|+|V_M|\}$ of DeepNC-L. It is not difficult to show that the case in which the node selected in the inference process is an observable node dominates the complexity. Note that it is possible to compute $D_v$ in constant time, as the average degree of a network is typically regarded as a constant [31]. Thus, the complexity of this step is bounded by $O(|L^{(i)}|)$, since we exhaustively compute $D_v$ over the nodes $v \in L^{(i)}$. The data imputation process is computable in constant time when parallelization is applied, since the Bernoulli trials are independent of each other. As our algorithm is composed of $|V_O| + |V_M| - 1$ inference steps, the total complexity is finally given by $O((|V_O| + |V_M|) \cdot |L^{(i)}|)$, which can be rewritten as $O(|V_O| \cdot |L^{(i)}|)$ due to the fact that $|V_M| \ll |V_O|$. The following theorem states a comprehensive analysis of the computational complexity.
Theorem 1. Lower and upper bounds on the computational complexity of the proposed DeepNC-L algorithm are given by $\Omega(|V_O|)$ and $O(|V_O|^2)$, respectively.

Proof. The parameter $L^{(i)}$ is the set of nodes neighboring the observable nodes that have already been generated by the $i$-th step, and its cardinality depends on the network topology. In the best case, where all nodes are isolated with no neighbors, we always have $|L^{(i)}| = 0$ for each generation step; thus, each step is computable in constant time, yielding a total complexity of $\Omega(|V_O|)$. In the worst case, corresponding to a fully connected graph, it follows that $|L^{(i)}| = |V_O| + |V_M| - i$ for each generation step, thus yielding a total complexity of $O(|V_O|^2)$. This completes the proof of this theorem.

From Theorem 1, it is possible to establish the following corollary.
Corollary 1. The computational complexity of the DeepNC-L algorithm scales as $\Theta(|V_O|^{1+\epsilon})$, where $0 \leq \epsilon \leq 1$ depends on the given network topology, e.g., the sparsity of the network.

We shall validate the assertion in Corollary 1 via empirical evaluation on various datasets in the next section by identifying that $\epsilon$ is indeed small, which implies that the complexity of
DeepNC-L is almost linear in $|V_O|$.

We now turn to examining the computational complexity of each EM step in order to analyze the overall complexity of DeepNC-EM. In the E-step, we can parallelize both the Bernoulli trials for edge sampling and the operation adding sampled edges to $G_O^{(t)}[i]$ in lines 5 and 6, respectively. Consequently, the computational complexity of each E-step is given by $O(\Delta_s)$, where $\Delta_s$ is the number of samples in each E-step. The M-step is dominated by DeepNC-L, as the function Filter($\cdot$, $\cdot$) can also be executed in parallel since all operations therein are performed independently of each other. Thus, the computational complexity of each M-step is given by $O(\Delta_s |V_O|^{1+\epsilon})$. When the number of EM iterations is $k_{EM}$, determined by the threshold $\eta$, and there are a total of $\Delta_s$ samples, the complexity of DeepNC-EM is finally given as $\Theta(k_{EM} \Delta_s |V_O|^{1+\epsilon})$ based on Corollary 1. Since both $k_{EM}$ and $\Delta_s$ are regarded as constants, as in [5], the total computational complexity scales as $\Theta(|V_O|^{1+\epsilon})$.

5 EXPERIMENTAL EVALUATION
In this section, we first describe both the synthetic and real-world datasets that we use in the evaluation. We also present three state-of-the-art methods for network completion as comparisons. After presenting a performance metric and our experimental settings, we intensively evaluate the performance of our DeepNC algorithms.
Two synthetic and three real-world datasets across various domains (e.g., social, citation, and biological networks) are used as series of homogeneous networks (graphs), denoted by $G_I$, and are described in sequence. For all experiments, we treat graphs as undirected and only consider the largest connected component without isolated nodes. The statistics of each dataset, including the number of similar graphs and the range of the number of nodes, are described in Table 3. In the following, we summarize important characteristics of the datasets.

Lancichinetti-Fortunato-Radicchi (LFR) [32]. We construct a synthetic graph generated using the LFR model in which the degree exponent of a power-law distribution, the average degree, the minimum community size, the community size exponent, and the mixing parameter are set to 3, 5, 20, 1.5, and 0.1, respectively. Refer to the original paper [32] for a detailed description of these parameters.
Barabási-Albert (B-A) [14]. We generate further synthetic graphs using the B-A model. The attachment parameter of the model is set in such a way that each newly added node is connected to four existing nodes, unless otherwise stated.
Protein [8]. The protein structure data form a biological network. Each protein is represented by a graph in which nodes represent amino acids. Two nodes are connected if they are less than 6 Angstroms apart.

TABLE 3: Statistics of the 5 datasets, where NG and NN denote the number of similar graphs and the range of the number of nodes in each dataset, respectively, including training graphs $G_I$ and a test graph $G_T$. Here, k denotes $10^3$.

Name           NG    NN
LFR            500   1.6k-2k
B-A            500   1.6k-2k
Protein        918   100-500
Ego-CiteSeer   737   50-399
Ego-Facebook   10    52-1,034

Ego-CiteSeer [7]. This CiteSeer dataset is an online citation network and a frequently used benchmark. Nodes and edges represent publications and citations, respectively.
Ego-Facebook [9]. This Facebook dataset is a social friendship network extracted from Facebook. Nodes and edges represent people and friendship ties, respectively.
In this subsection, we present three state-of-the-art network completion approaches for comparison.
KronEM [5]. This approach aims to infer the missing part of a true network based solely on the connectivity patterns in the observed part via a generative graph model based on Kronecker graphs, where the parameters are estimated via an EM algorithm.
EvoGraph [26]. To solve the network completion problem, EvoGraph infers the missing nodes and edges in such a way that the topological properties of the observable network are preserved via an efficient preferential attachment mechanism.
A variant of GraphRNN-S. As a naïve approach to network completion using deep generative models of graphs, we modify the inference process of the original GraphRNN-S [10] so that it can be used as a network completion method as follows. Under a random ordering of observable nodes, we first obtain the sequence $\{S_1^\pi, \cdots, S_{|V_O|}^\pi\}$ along with the observable entries from $G_O$. Then, by invoking the inference process of GraphRNN-S, we generate $|V_M|$ missing nodes using the trained $f_{trans}$ and $f_{out}$ based on $\{S_1^\pi, \cdots, S_{|V_O|}^\pi\}$. This variant of GraphRNN-S for network completion is termed vGraphRNN in our study.

To assess the performance of our proposed method and the other competing approaches, we need to quantify the degree of agreement between the recovered graph and the original one. To this end, we adopt the GED as a well-known performance metric.
Definition 2. Graph edit distance (GED) [12]. Given a set of graph edit operations, the GED between a recovered graph $\hat{G}$ and the true graph $G$ is defined as

$$\text{GED}(\hat{G}, G) = \min_{(e_1, \ldots, e_k) \in \mathcal{P}(\hat{G}, G)} \sum_{i=1}^{k} c(e_i), \tag{15}$$

where $\mathcal{P}(\hat{G}, G)$ denotes the set of edit paths transforming $\hat{G}$ into a graph isomorphic to $G$ and $c(e_i) \geq 0$ is the cost of each graph edit operation $e_i$.

Note that only four operations are allowed in our setup: vertex substitution, edge insertion, edge deletion, and edge substitution; $c(e_i)$ is identically set to one for all operations. Since the problem of computing the GED is NP-complete [33], we adopt an efficient approximation algorithm proposed in [34]. In our experiments, the GED is normalized by the average size of the two graphs.

We first describe the settings of the neural networks. In our experiments, the function $f_{trans}$ is implemented using 4 layers of GRU cells with a 128-dimensional hidden state, and the function $f_{out}$ is implemented using a two-layer perceptron with a 64-dimensional hidden state and a sigmoid activation function. The Adam optimizer [35] is used for minibatch training with a learning rate of 0.001, where each minibatch contains 32 graph sequences. We train the model for 32,000 batches in all experiments.

To test the performance of our method, we randomly select one graph from each dataset to act as the underlying true network $G_T$. From each dataset, we select all remaining similar graphs as training data $G_I$ unless otherwise stated. To create a partially observable network from the true network $G_T$, we adopt the following two graph sampling strategies from [36]. The first strategy, called random node (RN) sampling, selects nodes uniformly at random to create a sample graph. The second strategy, forest fire (FF) sampling, starts by picking a seed node uniformly at random and adding it to a sample graph (referred to as burning). Then, FF sampling burns a fraction of the outgoing links with the nodes attached to them. This process is repeated recursively for each neighbor that is burned until no new node is selected to be burned. Afterwards, we sample uniformly at random a portion of the edges of the complete subgraph sampled from $G_T$ to finally acquire $G_O$. In our experiments, the partially observable network $G_O$ is constructed from 90% of the edges in a complete subgraph consisting of 70% of the nodes sampled from $G_T$, unless otherwise specified. Each experimental result is the average over 10 executions.

In this subsection, our empirical study is designed to answer the following five key research questions.

• Q1. How much does the performance of DeepNC-EM improve with respect to the number of EM iterations?
• Q2. How much do the DeepNC algorithms improve the accuracy of network completion over the state-of-the-art approaches?
• Q3. How beneficial are the DeepNC algorithms in more difficult situations where either a large number of nodes and edges are missing or the training data are also incomplete?
• Q4. How robust is DeepNC-EM to the portion of missing edges in $G_O$ in comparison with the other state-of-the-art approaches?
• Q5. How scalable are the DeepNC algorithms with the size of the graph?

To answer these questions, we carry out six comprehensive experiments as follows.
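Before proceeding, we note that for small graphs the GED of Definition 2 can be computed (or approximated) directly with off-the-shelf tooling; below is a sketch using networkx (ours; the experiments themselves rely on the approximation algorithm of [34]).

import networkx as nx

G1 = nx.path_graph(5)
G2 = nx.cycle_graph(5)
# Exact GED search is exponential in general; networkx also provides
# optimize_graph_edit_distance as an anytime approximation.
ged = nx.graph_edit_distance(G1, G2)
# Normalization by the average size of the two graphs (here, node counts).
norm_ged = ged / ((G1.number_of_nodes() + G2.number_of_nodes()) / 2)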
Fig. 7: GED of DeepNC-EM over the number of EM iterations for (a) the LFR dataset and (b) the B-A dataset. Here, the performance of DeepNC-L corresponds to the case where the number of EM iterations is zero.
In Fig. 7, we show the performance of the proposed DeepNC-EM algorithm from Section 4.2 with respect to GED according to the number of EM iterations, using the two synthetic datasets, i.e., the LFR and B-A models. From Fig. 7, we discuss our findings as follows:

• For both RN and FF sampling strategies, the GED of DeepNC-EM decreases as the number of EM iterations increases.
• The number of EM iterations required to achieve a sufficiently low GED value is relatively small compared to the network size. This can be seen from the LFR dataset, where the performance improvement is marginal after four iterations.
• We observe that DeepNC-EM exhibits fewer fluctuations over EM iterations when the LFR dataset is used. This might be caused by the fact that graphs generated using the LFR model are denser than those generated using the B-A model under our setting, which makes the algorithm more likely to correctly recover the edges connecting two nodes in the set $V_O$.

In the subsequent experiments, the number of EM iterations is set to 6.
DeepNC al-gorithms and three state-of-the-art network completionmethods, including vGraphRNN, KronEM [5], and Evo-Graph [26], with respect to GED is presented in Table 4 forall five datasets. We note that
DeepNC-EM , DeepNC-L , andvGraphRNN use structurally similar graphs as training data G I ; meanwhile, both KronEM and EvoGraph operate basedsolely on the partially observable graph G O without anytraining phase. We observe the following: • The improvement rates of
DeepNC-EM overvGraphRNN, KronEM, and EvoGraph are up to40.16%, 54.55%, and 68.25%, respectively. These max-imum gains are achieved for the Ego-CiteSeer andB-A datasets. • The
DeepNC-L and
DeepNC-EM algorithms are in-sensitive to sampling strategies for creating a par-tially observable network, whereas the performance of EvoGraph depends on the sampling strategy.Specifically, sampling via FF results in better perfor-mance than that via RN sampling when EvoGraphis used due to the fact that the FF sampling strategytends to preserve the network properties such as thedegree distribution [36]. In reality, if the samplingstrategy is unknown and one only acquires randomlysampled data, then graph upscaling methods suchas EvoGraph would certainly perform poorly. Thisresult displays the robustness of our
DeepNC algo-rithms to graph samplings. • Even with deletions of only 10% of edges, the addi-tional gain of
DeepNC-EM over
DeepNC-L is still sig-nificant for all datasets. The maximum improvementrate of 13.58% is achieved on the Protein dataset. • Let us compare the performance of KronEM andEvoGraph. In most cases, KronEM performs betterthan EvoGraph. However, KronEM is inferior toEvoGraph in the case where the degree distributionof a network does not strictly follow the pure power-law degree distribution. EvoGraph consistently out-performs KronEM in the Protein dataset. • The standard deviation of GED is relatively highwhen vGraphRNN is employed (e.g., 0.2514 for theEgo-CiteSeer dataset), which demonstrates that arandom node ordering of observable nodes for net-work completion does not guarantee a stable solu-tion.Consequently,
DeepNC-EM consistently outperforms allstate-of-the-art methods for all synthetic and real-worlddatasets, which reveals the robustness of our method to-ward diverse network topologies.
Our DeepNC algorithms are compared to the three state-of-the-art network completion methods in more difficult settings that often occur in real environments: 1) the case in which a large portion of nodes are missing and 2) the case in which training graphs are also only partially observed. In these experiments, we only show the results for the RN sampling strategy, since the results from FF sampling follow similar trends.

First, we create a partially observable network $G_O$ consisting of only 30% of the nodes of the underlying true graph $G_T$ via sampling. The performance comparison between the DeepNC algorithms and the three state-of-the-art methods with respect to GED is presented in Table 5 for all five datasets. As shown in Tables 4 and 5, a large number of missing nodes and edges results in significant performance degradation for KronEM and EvoGraph, while DeepNC-EM, DeepNC-L, and vGraphRNN are more robust, as these three methods take advantage of the topological information from similar graphs (i.e., training data) to infer the missing part.

Next, we perform RN sampling so that only a part of the nodes in the training graphs is observable. In Fig. 8, we compare the GED of the two DeepNC algorithms and the three state-of-the-art methods, where the degree of observability in the training graphs is varied in our algorithms. We find that the DeepNC algorithms still outperform the state-of-the-art methods on all datasets, with the exception of the Ego-Facebook dataset, where the performance of DeepNC-L is slightly inferior to that of KronEM when 90% of nodes in the training graphs are observable.

TABLE 4: Performance comparison in terms of GED (average ± standard deviation) among DeepNC-EM, DeepNC-L, vGraphRNN, KronEM, and EvoGraph, including the gain (%) of DeepNC-EM over each of the latter three methods. Here, the best method for each dataset is highlighted using bold fonts.

TABLE 5: Performance comparison in terms of GED when 70% of nodes are missing (average ± standard deviation). Here, the best method for each dataset is highlighted using bold fonts.
DeepNC-L is slightly inferior to that of KronEM when 90% of nodes intraining graphs are observable. G O (Q4) We evaluate the GED performance in the second fringe sce-nario, in which a partially observable network G O is createdby deleting a large portion of edges uniformly at randomfrom a complete subgraph that consists of 70% of nodessampled from G T . In Fig. 9, the performance of the DeepNC algorithms is compared to the state-of-the-art network com-pletion methods using two synthetic datasets, where thefraction of missing edges is set to { , , } % . Our mainfindings are: 1) DeepNC-L outperforms the three state-of-the-art methods for all the cases; 2) the gain of
DeepNC-EM over
DeepNC-L is higher when the LFR dataset is used sincemissing edges are inferred more accurately; and 3) both
DeepNC algorithms exhibit less performance degradation asthe number of missing edges increases, which demonstratesthe robustness of our method for various degrees of edgeobservability.From Tables 4–5 and Figs. 8–9, it is worth noting thatthe proposed
DeepNC-EM algorithm outperforms all state-of-the-art methods for all types of datasets under variousfringe scenarios and experimental settings.
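For concreteness, the inputs to these fringe scenarios can be constructed as in the following sketch, assuming networkx; the function names and fractions are illustrative rather than the authors' exact pipeline.

import random
import networkx as nx

def rn_sample(G_T, node_frac, seed=0):
    # RN sampling: keep a random fraction of nodes and their induced edges.
    rng = random.Random(seed)
    kept = rng.sample(list(G_T.nodes()), int(node_frac * G_T.number_of_nodes()))
    return G_T.subgraph(kept).copy()

def delete_edges(G, edge_frac, seed=0):
    # Delete a fraction of the remaining edges uniformly at random.
    rng = random.Random(seed)
    removed = rng.sample(list(G.edges()), int(edge_frac * G.number_of_edges()))
    H = G.copy()
    H.remove_edges_from(removed)
    return H

G_T = nx.barabasi_albert_graph(500, 4, seed=42)      # underlying true graph
G_O_nodes = rn_sample(G_T, 0.3)                      # 70% of nodes missing (Table 5)
G_O_edges = delete_edges(rn_sample(G_T, 0.7), 0.15)  # 15% of edges missing (Fig. 9)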
Finally, we empirically examine the average runtime complexity via experiments using the three sets of B-A synthetic graphs, as it is convenient to scale up such graphs while preserving the same structural properties; the number of connections from each new node to existing nodes, denoted by c, is set to 2, 4, and 8. In these experiments, we focus on evaluating the complexity of DeepNC-EM, since each EM iteration executes DeepNC-L and the number of iterations is constant. In each set of graphs, the number of nodes, |V_O| + |V_M|, varies from 200 to 2,000 in increments of 200, and 30% of the nodes and their associated edges are deleted by RN sampling to create partially observable networks. Other parameter settings follow those in Section 5.4. In Fig. 10, we illustrate the log-log plot of the execution time in seconds versus |V_O|, where each point represents the average over 10 executions of DeepNC-EM. In the figure, dotted lines obtained from the analytical result with a proper bias are also shown; the slopes of the lines for c ∈ {2, 4, 8} are approximately 1.16, 1.26, and 1.41, respectively. This indicates that the computational complexity of DeepNC-EM depends on the average degree of the given graph. Moreover, an almost linear complexity in |V_O|, i.e., Θ(|V_O|^(1+ε)) for a small ε > 0, is attainable, since the slopes are at most 1.41 even for the relatively dense graph corresponding to c = 8.

Fig. 10: The computational complexity of DeepNC-EM, where the log-log plot of the execution time versus |V_O| is shown.
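The measurement behind Fig. 10 can be reproduced in outline as follows, with a hypothetical complete_network stub in place of DeepNC-EM; the empirical scaling exponent is the slope of a least-squares fit on the log-log points (about 1 here by construction, versus the reported 1.16 to 1.41 for the real algorithm).

import random
import time
import numpy as np
import networkx as nx

def complete_network(G_O):
    # Hypothetical stand-in for DeepNC-EM with a linear-time dummy workload.
    time.sleep(1e-5 * G_O.number_of_nodes())

obs_sizes, runtimes = [], []
for n in range(200, 2001, 200):
    G_T = nx.barabasi_albert_graph(n, 4, seed=0)                     # c = 4
    kept = random.Random(0).sample(list(G_T.nodes()), int(0.7 * n))  # RN sampling, 30% removed
    G_O = G_T.subgraph(kept).copy()
    start = time.perf_counter()
    complete_network(G_O)
    runtimes.append(time.perf_counter() - start)
    obs_sizes.append(G_O.number_of_nodes())

# Slope of log(runtime) versus log(|V_O|).
slope = np.polyfit(np.log(obs_sizes), np.log(runtimes), 1)[0]
print(f"empirical scaling exponent: {slope:.2f}")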
CONCLUDING REMARKS

In this paper, we introduced a novel method, termed DeepNC, that infers both the missing nodes and edges of an underlying true network via deep learning. Specifically, we presented an approach that first learns a likelihood over edges via an RNN-based generative graph model, using structurally similar graphs as training data, and then infers the missing parts of the network by applying an imputation strategy for the missing data. Furthermore, we proposed two DeepNC algorithms whose runtime complexities are almost linear in |V_O|. Using various synthetic and real-world datasets, we demonstrated that our DeepNC algorithms not only remarkably outperform the vGraphRNN, KronEM, and EvoGraph methods but are also robust to many difficult and challenging situations that often occur in real environments, such as 1) a significant portion of unobservable nodes, 2) training graphs that are only partially observable, or 3) a large portion of missing edges between nodes in the observed network. Additionally, we analytically and empirically showed the scalability of our DeepNC algorithms.

Potential avenues of future research include the design of a unified framework for improving the performance of various downstream mining and learning tasks, such as multi-label node classification, community detection, and influence maximization, when DeepNC is adopted in partially observable networks. This would be challenging, since task-specific preprocessing should accompany network completion to guarantee satisfactory performance of each individual task.

ACKNOWLEDGMENTS
This research was supported by the Yonsei University Research Fund of 2020 (2020-22-0101).

REFERENCES

[1] G. Kossinets, "Effects of missing data in social networks," Soc. Netw., vol. 28, no. 3, pp. 247–268, Jul. 2006.
[2] A. Acquisti, L. Brandimarte, and G. Loewenstein, "Privacy and human behavior in the age of information," Science, vol. 347, no. 6221, pp. 509–514, Jan. 2015.
[3] R. Dey, Z. Jelveh, and K. Ross, "Facebook users have become much more private: A large-scale study," in Proc. IEEE Int. Conf. Pervasive Comput. Commun. Worksh., Lugano, Switzerland, Mar. 2012, pp. 346–352.
[4] J. H. Koskinen, G. L. Robins, P. Wang, and P. E. Pattison, "Bayesian analysis for partially observed network data, missing ties, attributes and actors," Soc. Netw., vol. 35, no. 4, pp. 514–527, Oct. 2013.
[5] M. Kim and J. Leskovec, "The network completion problem: Inferring missing nodes and edges in networks," in Proc. 2011 SIAM Int. Conf. Data Mining (SDM '11), Mesa, AZ, USA, Apr. 2011, pp. 47–58.
[6] C. Tran, W.-Y. Shin, and A. Spitz, "Community detection in partially observable social networks," arXiv preprint arXiv:1801.00132, 2017.
[7] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Magazine, vol. 29, no. 3, pp. 93–106, 2008.
[8] P. D. Dobson and A. J. Doig, "Distinguishing enzyme structures from non-enzymes without alignments," J. Molecular Bio., vol. 330, no. 4, pp. 771–783, Jul. 2003.
[9] A. L. Traud, E. D. Kelsic, P. J. Mucha, and M. A. Porter, "Comparing community structure to characteristics in online collegiate social networks," SIAM Rev., vol. 53, no. 3, pp. 526–543, Aug. 2011.
[10] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec, "GraphRNN: Generating realistic graphs with deep auto-regressive models," in Proc. Int. Conf. Machine Learning (ICML '18), Stockholm, Sweden, Jul. 2018, pp. 5694–5703.
[11] A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann, "NetGAN: Generating graphs via random walks," in Proc. Int. Conf. Machine Learning (ICML '18), Stockholm, Sweden, Jul. 2018, pp. 609–618.
[12] A. Sanfeliu and K.-S. Fu, "A distance measure between attributed relational graphs for pattern recognition," IEEE Trans. Syst. Man Cybernetics, vol. SMC-13, no. 3, pp. 353–362, Jun. 1983.
[13] P. Erdős and A. Rényi, "On random graphs I," Publ. Math. Debrecen, vol. 6, pp. 290–297, 1959.
[14] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509–512, Oct. 1999.
[15] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani, "Kronecker graphs: An approach to modeling networks," J. Mach. Learning Res., vol. 11, pp. 985–1042, Feb. 2010.
[16] R. Liao, Y. Li, Y. Song, S. Wang, W. Hamilton, D. K. Duvenaud, R. Urtasun, and R. Zemel, "Efficient graph generation with graph recurrent attention networks," in Proc. Advances Neural Inf. Processing Syst. (NIPS '19), Vancouver, Canada, Dec. 2019, pp. 4257–4267.
[17] M. Simonovsky and N. Komodakis, "GraphVAE: Towards generation of small graphs using variational autoencoders," in Proc. Int. Conf. Artificial Neural Netw. Machine Learning (ICANN '18), Rhodes, Greece, Oct. 2018, pp. 412–422.
[18] T. N. Kipf and M. Welling, "Variational graph auto-encoders," in NIPS Worksh. Bayesian Deep Learning, Montréal, Canada, Dec. 2018.
[19] J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec, "Graph convolutional policy network for goal-directed molecular graph generation," in Proc. Advances Neural Inf. Processing Syst. (NIPS '18), Montréal, Canada, Dec. 2018, pp. 6410–6421.
[20] D. Zhou, L. Zheng, J. Xu, and J. He, "Misc-GAN: A multi-scale generative model for graphs," Front. Big Data, vol. 2, pp. 3:1–3:10, Apr. 2019.
[21] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia, "Learning deep generative models of graphs," arXiv preprint arXiv:1803.03324, 2018.
[22] L. Lü and T. Zhou, "Link prediction in complex networks: A survey," Phys. A: Stat. Mech. Appl., vol. 390, no. 6, pp. 1150–1170, Mar. 2011.
[23] M. Zhang and Y. Chen, "Link prediction based on graph neural networks," in Proc. Advances Neural Inf. Processing Syst. (NIPS '18), Montréal, Canada, Dec. 2018, pp. 5165–5175.
[24] R. Eyal, A. Rosenfeld, S. Sina, and S. Kraus, "Predicting and identifying missing node information in social networks," ACM Trans. Knowl. Disc. Data, vol. 8, no. 3, pp. 14:1–14:35, Jun. 2014.
[25] S. Sina, A. Rosenfeld, and S. Kraus, "Solving the missing node problem using structure and attribute information," in Proc. 2013 IEEE/ACM Int. Conf. Advances Social Netw. Analysis Mining (ASONAM '13), Niagara Falls, Canada, Aug. 2013, pp. 744–751.
[26] H. Park and M.-S. Kim, "EvoGraph: An effective and efficient graph upscaling method for preserving graph properties," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining (KDD '18), London, United Kingdom, Aug. 2018, pp. 2051–2059.
[27] T. H. McCormick, M. J. Salganik, and T. Zheng, "How many people do you know?: Efficiently estimating personal network size," J. Am. Stat. Assoc., vol. 105, no. 489, pp. 59–70, Sep. 2010.
[28] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," in Proc. Deep Learning and Representation Learning Worksh., Montréal, Canada, Dec. 2014.
[29] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[30] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc. Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.
[31] M. E. Newman, "Random graphs as models of networks," Proc. National Acad. Sci., vol. 99, no. 1, pp. 2566–2572, 2002.
[32] A. Lancichinetti and S. Fortunato, "Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities," Phys. Rev. E, vol. 80, no. 1, pp. 016118:1–016118:8, Apr. 2009.
[33] Z. Zeng, A. K. H. Tung, J. Wang, J. Feng, and L. Zhou, "Comparing stars: On approximating graph edit distance," Proc. VLDB Endow., vol. 2, no. 1, pp. 25–36, Aug. 2009.
[34] A. Fischer, K. Riesen, and H. Bunke, "Improved quadratic time approximation of graph edit distance by combining Hausdorff matching and greedy assignment," Pattern Recogn. Lett., vol. 87, pp. 55–62, Feb. 2017.
[35] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learning Rep. (ICLR '15), San Diego, CA, May 2015.
[36] J. Leskovec and C. Faloutsos, "Sampling from large graphs," in