Adversarial Directed Graph Embedding
Shijie Zhu, Jianxin Li, Hao Peng, Senzhang Wang, Philip S. Yu and Lifang He
Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Department of Computer Science, University of Illinois at Chicago
Department of Computer Science and Engineering, Lehigh University
Beijing, China; Nanjing, China; Chicago, IL, USA; Bethlehem, PA, USA
{zhusj,lijx,penghao}@act.buaa.edu.cn, [email protected], [email protected], [email protected]

Abstract
Node representation learning for directed graphs is critically important to facilitate many graph mining tasks. To capture the directed edges between nodes, existing methods mostly learn two embedding vectors for each node, a source vector and a target vector. However, these methods learn the source and target vectors separately. For a node with very low indegree or outdegree, the corresponding target vector or source vector cannot be effectively learned. In this paper, we propose a novel Directed Graph embedding framework based on Generative Adversarial Network, called DGGAN. The main idea is to use adversarial mechanisms to deploy a discriminator and two generators that jointly learn each node's source and target vectors. For a given node, the two generators are trained to generate its fake target and source neighbor nodes from the same underlying distribution, and the discriminator aims to distinguish whether a neighbor node is real or fake. The two generators are formulated into a unified framework and could mutually reinforce each other to learn more robust source and target vectors. Extensive experiments show that DGGAN consistently and significantly outperforms existing state-of-the-art methods across multiple graph mining tasks on directed graphs.
Introduction

Graph embedding aims to learn a low-dimensional vector representation of each node in a graph, and has gained increasing research attention recently due to its wide and practical applications, such as link prediction [Liben-Nowell and Kleinberg, 2007], graph reconstruction [Tsitsulin et al., 2018], node recommendation [Ying et al., 2018], and node classification [Bhagat et al., 2011]. Most existing methods, such as DeepWalk [Perozzi et al., 2014], node2vec [Grover and Leskovec, 2016], LINE [Tang et al., 2015b], and GraphGAN [Wang et al., 2018], are designed to handle undirected graphs and ignore the directions of the edges, even though the directionality often carries important asymmetric semantic information in directed graphs such
as social networks, citation networks and webpage networks.

Figure 1: The left figure is a toy example of a directed graph. Considering the edge from A to B, B is the target neighbor of A and A is the source neighbor of B. The right two figures are statistics from the social network of Twitter [Cha et al., 2010] and the citation network of CiteSeer [Bollacker et al., 1998], respectively.

To preserve the directionality of the edges, some recent works try to use two node embedding spaces to represent the source role and the target role of the nodes, one corresponding to the outgoing direction and one to the incoming direction. We argue that there are two major limitations of existing directed graph embedding methods: (1) Methods like HOPE [Ou et al., 2016] rely on strict proximity measures like Katz [Katz, 1953] and a low-rank assumption on the graph. Thus, they are difficult to generalize to different types of graphs [Khosla et al., 2018]. Moreover, HOPE is not scalable to large graphs, as it requires the entire graph matrix as input and then adopts matrix factorization [Tsitsulin et al., 2018]. (2) Existing shallow methods focus on preserving the structural proximities but ignore the underlying distribution of the nodes. Methods like APP [Zhou et al., 2017] adopt a directed random walk sampling technique which follows the outgoing direction to sample node pairs. These methods use negative sampling to randomly select existing nodes from the graph as negative samples. However, for nodes with only outgoing edges or only incoming edges, the target or source vectors cannot be effectively trained. Figure 1 presents a toy example. Although both of the nodes A and C have no incoming edges, an edge from C to A is more likely to exist than one the other way round.
However, the proximities of the node pairs (A, C) and (C, A) predicted by APP are both zero, since the two node pairs are regarded as negative samples. The proximity measures adopted by HOPE, such as Katz [Katz, 1953], likewise predict both proximities to be zero. As shown in Figure 1, the nodes with zero indegree or zero outdegree (e.g., A and B) account for a large proportion of the graph. The directed graph embedding methods mentioned above treat the source and target roles of each node separately, which makes these methods less robust. However, a node's source role and target role are two properties of the same node and are likely to be implicitly related. For instance, on a social network like Twitter, fans who follow a star may be followed by other fans with common interests.

In this paper, we propose DGGAN, a novel Directed Graph embedding framework based on Generative Adversarial Networks (GAN) [Goodfellow et al., 2014]. Specifically, we train one discriminator and two generators which jointly generate the target and source neighborhoods for each node from the same underlying continuous distribution. Compared with existing methods, DGGAN generates fake nodes directly from a continuous distribution and is not sensitive to different graph structures. Furthermore, the two generators are formulated into a unified framework and can naturally benefit from each other for better generation. Under such a framework, DGGAN can learn an effective target vector for node A in Figure 1, and will predict a high proximity for the node pair (C, A). The discriminator is trained to distinguish whether the generated neighborhood is real or fake.
Competition between the generators and the discriminator drives both of them to improve their capability until the generated neighborhoods are indistinguishable from the true connectivity distribution. The key contributions of this paper are as follows:
• To the best of our knowledge, DGGAN is the first deep (GAN-based) method for directed graph embedding that jointly learns the source vector and target vector for each node.
• The two generators deployed in DGGAN are able to generate effective negative samples for nodes with low or zero out- or in-degree, which makes the model learn more robust node embeddings across various graphs.
• Through extensive experiments on four real-world network datasets, we show that the proposed DGGAN consistently and significantly outperforms various state-of-the-art methods on both the link prediction and the graph reconstruction tasks.
Related Work

Graph embedding methods can be classified into three categories: matrix factorization-based models, random walk-based models and deep learning-based models. Matrix factorization-based models, such as GraRep [Cao et al., 2015] and M-NMF [Wang et al., 2017], first build a preprocessed adjacency matrix that preserves the graph structure, and then decompose this matrix to obtain graph embeddings. It has been shown that many recently emerged random walk-based models, such as DeepWalk [Perozzi et al., 2014], LINE [Tang et al., 2015b], PTE [Tang et al., 2015a] and node2vec [Grover and Leskovec, 2016], can be unified into the matrix factorization framework with closed forms [Qiu et al., 2018]. Deep learning-based models like SDNE [Wang et al., 2016] and DNGR [Cao et al., 2016] learn graph embeddings with deep autoencoder models.
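To make the matrix-factorization family concrete, the following is a minimal sketch (not any one of the cited models): preprocess the adjacency matrix into a proximity matrix, then decompose it into two low-dimensional factors that serve as embeddings. The row-normalization preprocessing and the SVD decomposition are our illustrative choices.

```python
import numpy as np

def factorization_embedding(adj, dim):
    """Toy matrix-factorization embedding: preprocess the adjacency
    matrix, then factorize the result into two low-rank factors."""
    # Illustrative preprocessing: row-normalize into a transition matrix.
    deg = adj.sum(axis=1, keepdims=True)
    proximity = adj / np.maximum(deg, 1)
    # Truncated SVD yields a source-side and a target-side factor whose
    # product approximates the preprocessed proximity matrix.
    u, s, vt = np.linalg.svd(proximity)
    scale = np.sqrt(s[:dim])
    return u[:, :dim] * scale, vt[:dim, :].T * scale

# A tiny directed graph as a dense adjacency matrix.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0]], dtype=float)
src, tgt = factorization_embedding(adj, dim=2)
# src @ tgt.T is a rank-2 approximation of the proximity matrix.
```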
Adversarial Graph Embedding
Recently, Generative Adversarial Networks (GAN) [Goodfellow et al., 2014] have received increasing attention due to their impressive performance on unsupervised tasks. GAN can be viewed as a minimax game between a generator G and a discriminator D. Formally, the objective function is defined as follows:

$$\min_{\theta_G} \max_{\theta_D} \; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x; \theta_D)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z; \theta_G); \theta_D)\big)\big], \quad (1)$$

where θ_G and θ_D denote the parameters of G and D, respectively. G tries to generate close-to-real fake samples from noise z drawn from a predefined distribution p_z(z), while D aims to distinguish real samples from the distribution p_data(x) against the fake samples. Several methods have been proposed to apply GAN to graph embedding to improve model robustness and generalization. GraphGAN [Wang et al., 2018] generates the sampling distribution to sample negative nodes. ANE [Dai et al., 2018] imposes a prior distribution on graph embeddings through adversarial learning. NetRA [Yu et al., 2018] and ARGA [Pan et al., 2019] adopt adversarially regularized autoencoders to learn smooth embeddings. DWNS [Dai et al., 2019] applies adversarial training by defining adversarial perturbations in the embedding space.

The methods mentioned above mainly focus on undirected graphs and thus cannot capture the directions of edges. There are some works for directed graph embedding, which commonly learn a source embedding and a target embedding for each node. HOPE [Ou et al., 2016] derives the node-similarity matrix by approximating high-order proximity measures like the Katz measure [Katz, 1953] and Rooted PageRank [Song et al., 2009], and then decomposes the node-similarity matrix to obtain node embeddings. APP [Zhou et al., 2017] is a directed random walk-based method that implicitly preserves Rooted PageRank proximity. NERD [Khosla et al., 2018] uses an alternating random walk strategy to sample node neighborhoods from a directed graph. ATP [Sun et al.
, 2019] incorporates graph hierarchy and reachability to construct an asymmetric matrix. However, these methods are all shallow methods, failing to capture the highly non-linear properties of graphs and to learn robust node embeddings.
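For concreteness, the empirical value of the GAN objective in Eq. (1) on a mini-batch can be sketched as below. This is a toy numerical illustration under our own conventions, not code from any of the cited systems; the input arrays are invented discriminator outputs.

```python
import numpy as np

def gan_objective(d_real, d_fake):
    """Empirical value of the two expectations in Eq. (1).
    d_real: D(x) on real samples; d_fake: D(G(z)) on generated samples."""
    eps = 1e-12  # numerical guard against log(0)
    # The discriminator maximizes this full objective ...
    d_value = np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))
    # ... while the generator minimizes the second term alone.
    g_value = np.mean(np.log(1.0 - d_fake + eps))
    return d_value, g_value

# A discriminator that is usually right yields a value close to 0 (its maximum).
d_value, g_value = gan_objective(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
```

A perfect discriminator (D = 1 on real samples, D = 0 on fakes) attains the maximum value 0; the generator pushes the second term down by making D(G(z)) approach 1.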
Methodology

In this section, we first introduce the notations to be used. Then we present an overview of DGGAN, followed by detailed descriptions of our generator and discriminator.

Figure 2: The architecture of DGGAN. The node pair (u, v) denotes a real node pair. For node u, the two generators share an underlying distribution and jointly generate a fake source neighbor u_s and a fake target neighbor u_t. Likewise, fake source and target neighbors can be generated for node v. The fake node pairs aim to fool the discriminator with the highest probability, while the discriminator is trained to distinguish between real node pairs and fake node pairs.

We define a directed graph as G = {V, E}, where V is the node set and E is the directed edge set. For nodes u, v ∈ V, (u, v) ∈ E represents a directed edge from u to v. To preserve the asymmetric proximity, each node u has two different roles, the source role and the target role, represented by the d-dimensional vectors s_u ∈ R^{d×1} and t_u ∈ R^{d×1}, respectively.

The objective of DGGAN is to jointly learn the source and target vectors for each node on a directed graph. Figure 2 demonstrates the proposed framework of DGGAN, which mainly consists of two components: the generator and the discriminator. Given a node (e.g., u), two generators are deployed to jointly generate its fake source neighborhood and target neighborhood from the same underlying continuous distribution, and one discriminator is set to distinguish whether the source neighborhood and the target neighborhood of the given node are real or fake.
With the minimax game between the generators and the discriminator, DGGAN is able to learn more robust source and target embeddings for nodes with low indegree or outdegree, even for nodes with zero indegree or outdegree (e.g., u and v). Next, we introduce the details of the generator and the discriminator.

Directed, Generalized and Robust Generator
The goal of our generator G is threefold: (1) It should generate close-to-real fake samples with respect to a specific direction. Thus, given a node u ∈ V, the generator G aims to generate a fake source neighborhood u_s and a fake target neighborhood u_t, where u_s and u_t should be as close as possible to real nodes. (2) It should generalize to non-existent nodes. In other words, the fake nodes u_s and u_t can be latent and are not restricted to the original graph. (3) It should be able to generate effective fake source and target neighborhoods for nodes with low or zero out- or in-degree.

To address the first aim, we design the generator G to consist of two generators: a source neighborhood generator G_s and a target neighborhood generator G_t. For the second and third aims, we introduce a latent variable z ∼ p_z(z) shared between G_s and G_t to generate samples. Rather than directly generating samples from p_z(z), we integrate a multi-layer perceptron (MLP) into the generator to enhance the expressiveness of the fake samples, as deep neural networks have shown a strong ability to capture the highly non-linear properties of a network [Gao et al., 2019; Hu et al., 2019]. Therefore, our generator G is formulated as follows:

$$G_s(u; \theta_{G_s}) = f_s(z; \theta_{f_s}), \quad G_t(u; \theta_{G_t}) = f_t(z; \theta_{f_t}), \quad G(u; \theta_G) = \{G_s(u; \theta_{G_s}), G_t(u; \theta_{G_t})\}, \quad (2)$$

where f_s and f_t are implemented by MLPs, and θ_{f_s} and θ_{f_t} denote the parameters of f_s and f_t, respectively. z serves as a bridge between G_s and G_t: with the help of z, G_s and G_t are not independent, and update each other indirectly to generate better fake source and target neighbors. In particular, we draw z from the following Gaussian distribution:

$$p_z(z) = \mathcal{N}(z_u^\top, \sigma^2 I), \quad (3)$$

where z_u ∈ R^{d×1} is a learnable variable that stands for the latent representation of u ∈ V.
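Equations (2) and (3) can be sketched for a single node as follows. The MLP layer sizes, the random weights, and the value of σ are our illustrative assumptions; the point is only that both generators consume the same noise sample drawn around the learnable latent z_u.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 8, 0.1  # embedding dimension and noise scale (illustrative values)

def mlp(x, w1, w2):
    """A tiny two-layer network standing in for f_s / f_t in Eq. (2)."""
    return np.tanh(x @ w1) @ w2

# Learnable latent variable z_u for one node u, plus separate MLP weights
# for the source-neighbor generator G_s and the target-neighbor generator G_t.
z_u = rng.normal(size=d)
w1_s, w2_s = rng.normal(size=(d, 16)), rng.normal(size=(16, d))
w1_t, w2_t = rng.normal(size=(d, 16)), rng.normal(size=(16, d))

# Eq. (3): both generators share the SAME noise sample drawn around z_u ...
z = rng.normal(loc=z_u, scale=sigma)
# Eq. (2): ... and map it through their own networks, giving the embedding
# of a fake source neighbor and a fake target neighbor of u.
fake_source = mlp(z, w1_s, w2_s)
fake_target = mlp(z, w1_t, w2_t)
```

Because z is shared, a gradient step on either generator also updates z_u, which is how G_s and G_t influence each other indirectly.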
The parameters of G_s and G_t are thus θ_{G_s} = {z_u^⊤ : u ∈ V, θ_{f_s}} and θ_{G_t} = {z_u^⊤ : u ∈ V, θ_{f_t}}, respectively. Since θ_{G_s} and θ_{G_t} share the parameters z_u^⊤, the parameters of the generator G can be obtained as follows:

$$\theta_G = \{\theta_{G_s}, \theta_{G_t}\} = \{z_u^\top : u \in \mathcal{V}, \theta_{f_s}, \theta_{f_t}\}. \quad (4)$$

The generator G aims to fool the discriminator D by generating close-to-real fake samples. To this end, the loss function of the generator is defined as follows:

$$\mathcal{L}_G = \mathbb{E}_{u \in \mathcal{V}} \big[\log\big(1 - D(u_s, u)\big) + \log\big(1 - D(u, u_t)\big)\big], \quad (5)$$

where u_s and u_t denote the fake source neighborhood and fake target neighborhood of u, respectively. D outputs the probability that the input node pair is real, and will be introduced in the next subsection. The source vector of u_s and the target vector of u_t can be obtained from G_s and G_t, i.e., s_{u_s} ∼ G_s(u; θ_{G_s}) and t_{u_t} ∼ G_t(u; θ_{G_t}). The parameters θ_G of the generator can be optimized by minimizing L_G.

Directed Discriminator
The discriminator D tries to distinguish the positive samples from the input graph G against the negative samples produced by the generator G. Thus, D enforces G to more accurately fit the real graph distribution p_G. Note that for a given node pair (u, v), D essentially outputs the probability that node v is connected to u in the outgoing direction. For this purpose, we define D as the sigmoid function of the inner product of the input node pair (u, v):

$$D(u, v; \theta_D) = \frac{1}{1 + \exp(-s_u^\top \cdot t_v)}, \quad (6)$$

where θ_D = {s_u, t_u : u ∈ V} is the parameter set of D, i.e., the union of the source role embeddings and target role embeddings of all real nodes on the observed G. Specifically, the input node pairs can be divided into the following two cases.

Positive Sample
There indeed exists a directed edge from u to v on G, i.e., (u, v) ∈ E, such as the pair (u, v) shown in Figure 2. Such a node pair (u, v) is considered positive and can be modeled by the following loss:

$$\mathcal{L}_{D_{pos}} = \mathbb{E}_{(u,v) \sim p_{\mathcal{G}}} \big[-\log D(u, v)\big]. \quad (7)$$

Negative Sample
For a given node u ∈ V, u_s and u_t denote its fake source neighborhood and fake target neighborhood generated by G_s and G_t, respectively, i.e., s_{u_s} ∼ G_s(u; θ_{G_s}) and t_{u_t} ∼ G_t(u; θ_{G_t}), such as the pairs (u_s, u) and (u, u_t) shown in Figure 2. Such node pairs (u_s, u) and (u, u_t) are considered negative and can be modeled by the following loss:

$$\mathcal{L}_{D_{neg}} = \mathbb{E}_{u \in \mathcal{V}} \big[-\log\big(1 - D(u_s, u)\big) - \log\big(1 - D(u, u_t)\big)\big]. \quad (8)$$

Note that the fake node embeddings s_{u_s} and t_{u_t} are not included in θ_D, and the discriminator D simply treats them as non-learnable input. We integrate the above two parts to train the discriminator:

$$\mathcal{L}_D = \mathcal{L}_{D_{pos}} + \mathcal{L}_{D_{neg}}. \quad (9)$$

The parameters θ_D of the discriminator can be optimized by minimizing L_D.

In each training epoch, we alternate the training between the discriminator D and the generator G with mini-batch gradient descent. Specifically, we first fix θ_G and let the two generators jointly generate fake neighborhoods for each node pair on the graph to optimize θ_D. Then we fix θ_D and optimize θ_G to generate close-to-real fake neighborhoods for each node under the guidance of the discriminator D. The discriminator and generator play against each other until DGGAN converges. The overall training procedure for DGGAN is summarized in Algorithm 1.

Algorithm 1 DGGAN framework
Require: directed graph G; number of maximum training epochs n_epoch; numbers of generator and discriminator training iterations per epoch n_G, n_D; number of samples n_s
Ensure: θ_G, θ_D
  Initialize θ_G and θ_D for G and D, respectively
  for epoch = 0; epoch < n_epoch do
    for n = 0; n < n_D do
      Generate n_s fake source neighborhoods u_s, v_s and fake target neighborhoods u_t, v_t for each node pair (u, v) ∈ E
      Update θ_D according to Eq. (9)
    end for
    for n = 0; n < n_G do
      Generate n_s fake source neighborhoods u_s and fake target neighborhoods u_t for each node u ∈ V
      Update θ_G according to Eq. (5)
    end for
  end for

Experiments

In this section, we conduct extensive experiments on several datasets to investigate the performance of DGGAN.
Datasets

We use four different types of directed graphs, including a citation network, a social network, a trust network and a hyperlink network, to evaluate the performance of the model. The details of the datasets are as follows:
Cora is a citation network of academic papers. It contains 23,166 nodes representing academic papers and 91,500 directed edges indicating citation relationships between papers. Twitter is a social network. It contains 465,017 nodes representing users and 834,797 directed edges representing following relationships between users. Epinions is a trust network from the online social network Epinions. It contains 75,879 nodes representing users and 508,837 directed edges representing trust between users. Google is a hyperlink network of pages within Google's sites. It contains 15,763 nodes representing pages and 171,206 directed edges representing hyperlinks between pages.

Dataset sources:
http://konect.uni-koblenz.de/networks/subelj_cora
http://konect.uni-koblenz.de/networks/munmun_twitter_social
http://konect.uni-koblenz.de/networks/soc-Epinions1
http://konect.uni-koblenz.de/networks/cfinder-google

Table 1: Area Under Curve (AUC) scores of link prediction on directed graphs, with different fractions of the positive edges (except bi-directional edges) reversed to create negative edges in the test set. (Columns: Cora, Twitter, Epinions and Google, each at 0%, 50% and 100% reversed.)

To verify the performance of DGGAN, we compare it with several state-of-the-art methods:
• Traditional undirected graph embedding methods: DeepWalk [Perozzi et al., 2014] uses local information obtained from truncated random walks to learn node embeddings. LINE [Tang et al., 2015b] learns large-scale information network embeddings using first-order and second-order proximities. node2vec [Grover and Leskovec, 2016] is a variant of DeepWalk that utilizes a biased random walk algorithm to more efficiently explore the neighborhood structure.
• GAN-based undirected graph embedding methods: GraphGAN [Wang et al., 2018] generates the sampling distribution to sample negative nodes from the graph. ANE [Dai et al.
, 2018] proposes to train a discriminator to push the embedding distribution to match a fixed prior.
• Directed graph embedding methods: HOPE [Ou et al., 2016] preserves the asymmetric role information of the nodes by approximating high-order proximity measures. APP [Zhou et al., 2017] proposes a random walk-based method to encode Rooted PageRank proximity.
• DGGAN* is a simplified version of our proposed DGGAN which uses only one generator, G_t, to generate the target neighborhood of each node. We omit the other simplified version, which uses only one generator G_s, as we do not observe a significant performance difference compared with DGGAN*.

For DeepWalk, node2vec and APP, the number of walks, the walk length and the window size are set to 10, 80 and 10, respectively, for a fair comparison. For LINE, we utilize both the first-order and second-order proximities; for the second-order proximities, node embeddings are treated as source embeddings and context embeddings as target embeddings. In addition, the number of negative samples is empirically set to 5. For GraphGAN, ANE and HOPE, we follow the parameter settings in the original papers. Note that we do not report the results of GraphGAN on the Twitter and Epinions datasets, since it cannot run on these two large datasets. For DGGAN* and DGGAN, we choose parameters by cross validation, and we fix the numbers of generator and discriminator training iterations per epoch to n_G = 5 and n_D = 15 across all datasets and tasks. The dimension of node embeddings is set to 128 for all methods.

Link Prediction

In the link prediction task, we predict missing edges given a network with a fraction of edges removed. A fraction of edges is removed randomly to serve as the test split, while the remaining network is utilized for training. When removing edges randomly, we make sure that no node is isolated, to avoid meaningless embedding vectors. Specifically, we remove 50% of the edges for the Cora, Epinions and Google datasets, and 40% of the edges for the Twitter dataset.
Note that the test split is balanced, with negative edges sampled from random node pairs that have no edges between them. Since we are interested in both the existence of an edge between two nodes and the direction of the edge, we reverse a fraction of the node pairs in the positive samples to replace the original negative samples if the edges are not bi-directional. A value in (0, 1] determines what fraction of positive edges from the test split is inverted at most to create negative examples, and a value of 0 corresponds to the classical undirected graph setting where all the negative edges are sampled from random node pairs.

We summarize the Area Under Curve (AUC) scores for all methods in Table 1. Note that some methods like DeepWalk, which mainly focus on undirected graphs, also achieve good performance on the Cora dataset with random negative edges in the test set. But their performance decreases rapidly as the fraction of reversed positive edges increases, since they cannot model the asymmetric proximity, and their AUC scores approach 0.5 as expected. HOPE shows good performance on the Twitter dataset but does not perform well on other datasets like Cora and Epinions. This suggests that HOPE is difficult to generalize to different types of graphs, as mentioned above. Note that on the Epinions dataset, up to 31.5% of the nodes have no incoming edges and 20.5% have no outgoing edges. Directed graph embedding methods like APP show poor performance on the Epinions dataset. The reason is that these methods treat the source role and target role of each node separately, which renders them not robust. We can see that DGGAN* performs much better than HOPE and APP across datasets. This is because the negative samples of DGGAN* are generated directly from a continuous distribution, and thus DGGAN* is not sensitive to different graph structures.
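The reversed-edge evaluation protocol described above can be sketched as follows. This is a toy illustration under our own assumptions: the graph, the fraction, and the random embeddings (standing in for trained ones) are invented, and the pair score follows the inner-product form of Eq. (6).

```python
import numpy as np

rng = np.random.default_rng(1)

def pair_score(s, t, u, v):
    """Directed proximity of pair (u, v): sigmoid of source(u) . target(v), as in Eq. (6)."""
    return 1.0 / (1.0 + np.exp(-s[u] @ t[v]))

# Toy test split of positive directed edges over 5 nodes.
positives = [(0, 1), (2, 3), (4, 0)]
edge_set = set(positives)

# Reverse a fraction of the non-bidirectional positives to form "hard"
# direction-sensitive negatives (Reversed=0.5 in the paper's notation).
frac = 0.5
n_rev = int(frac * len(positives))
negatives = [(v, u) for (u, v) in positives[:n_rev] if (v, u) not in edge_set]

s = rng.normal(size=(5, 8))  # random stand-ins for trained source embeddings
t = rng.normal(size=(5, 8))  # random stand-ins for trained target embeddings
scores = [pair_score(s, t, u, v) for (u, v) in positives + negatives]
labels = [1] * len(positives) + [0] * len(negatives)
# AUC over (labels, scores) then measures whether edge direction is recovered.
```

A direction-blind model scores (u, v) and (v, u) identically, so on fully reversed negatives its AUC collapses to about 0.5, exactly the behavior reported for the undirected baselines.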
Moreover, DGGAN outperforms DGGAN*, as DGGAN utilizes two generators which mutually update each other to learn more robust source and target vectors. Compared with the baselines, the performance of DGGAN does not change much under different fractions of reversed positive test edges. Overall, DGGAN is more robust across datasets and outperforms all methods on link prediction.

Figure 3: Precision@k of graph reconstruction on the (a) Google and (b) Epinions datasets.
Graph Reconstruction

As effective representations of a graph, node embeddings maintain the edge information and are expected to reconstruct the original graph well. We reconstruct the graph edges based on the reconstructed proximity between nodes. Since adjacent nodes should be close in the embedding space, we use the inner product between node vectors to reconstruct the proximity matrix. For a given k, we obtain the k-nearest target neighborhoods ranked by the reconstructed proximity for each method. We perform the graph reconstruction task on the Google and Epinions datasets. To create the test set, we randomly sample 10% of the nodes of each graph.

We plot the average precision corresponding to different values of k in Figure 3. The results show that for both datasets, DGGAN outperforms the baselines, including HOPE and APP, especially when k is small. On the Google dataset, DGGAN shows an improvement of around 33% for k = 1 over the second best performing method, HOPE. This shows the benefit of jointly learning the source and target vectors for each node. Some of the methods that focus on undirected graphs, like node2vec, exhibited good performance in link prediction. However, these methods show poor performance in graph reconstruction. This is because this task is harder than link prediction, as the model needs to distinguish a small number of positive edges from a huge number of negative edges. Besides, we note that all the precision curves converge to points with small values when k becomes large, since most of the real target neighborhoods have been correctly predicted by these methods.

Model Analysis

In this subsection, we analyze the performance of different models under different levels of network sparsity, as well as the converging performance of DGGAN. We choose the Google dataset as it is much denser than the others. We first investigate how the sparsity of the network affects the three directed graph embedding methods HOPE, APP and DGGAN.
The setting of the training procedure in this experiment is the same as in link prediction, and 50% of the positive edges in the test set are reversed to form negative edges. We randomly select different ratios of edges from the original network to construct networks with different levels of sparsity.

Figure 4: Performance change on the Google link prediction task: (a) sparsity (AUC score vs. training ratio (%) for HOPE, APP and DGGAN); (b) learning curve (AUC score vs. iteration for Reversed=0.0, 0.5 and 1.0).

Figure 4(a) shows the results w.r.t. the training ratio of edges on the Google dataset. One can see that DGGAN consistently and significantly outperforms HOPE and APP across different training ratios. Moreover, DGGAN still achieves much better performance when the network is very sparse, while HOPE and APP suffer severely from nodes with very low outdegree or indegree, as mentioned before. This demonstrates that the novel adversarial learning framework DGGAN, which is designed to jointly learn a node's source and target vectors, can significantly improve robustness.

Next, we investigate the performance change w.r.t. the training iterations of the discriminator D. Recall that we set the number of discriminator training iterations per epoch to n_D = 15. Figure 4(b) shows the converging performance of DGGAN on the Google dataset with different percentages of reversed positive edges in the test set (results on other datasets show similar trends and are not included here). As the iterations of D increase, the performance for Reversed=0.0 (i.e., random negative edges in the test set) first keeps stable and then slightly increases. Besides, the trend of the training curve for Reversed=1.0 (i.e., all positive edges except bi-directional edges reversed to create negative edges in the test set) changes every 15 iterations (i.e., one epoch). Note that the training curve of Reversed=1.0 rises gently during the second epoch, as the generator G is still poorly trained at that point, and rises steeply in the following epoch once G becomes able to generate close-to-real fake samples.

Conclusion

In this paper, we proposed DGGAN, a novel directed graph embedding framework based on GAN.
Specifically, we designed two generators which generate a fake source neighborhood and a fake target neighborhood for each node directly from the same continuous distribution. With the joint learning framework, the two generators can be mutually enhanced, which makes the proposed DGGAN generalize to various graphs and learn more robust node embeddings. The experimental results on four real-world directed graph datasets demonstrate that DGGAN consistently and significantly outperforms various state-of-the-art methods on the link prediction and graph reconstruction tasks.

References

[Bhagat et al., 2011] Smriti Bhagat, Graham Cormode, and S Muthukrishnan. Node classification in social networks. In
Social Network Data Analytics, pages 115–148. Springer, 2011.
[Bollacker et al., 1998] Kurt D Bollacker, Steve Lawrence, and C Lee Giles. CiteSeer: An autonomous web agent for automatic retrieval and identification of interesting publications. In Proceedings of the Second International Conference on Autonomous Agents, pages 116–123. ACM, 1998.
[Cao et al., 2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. In CIKM, pages 891–900. ACM, 2015.
[Cao et al., 2016] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In AAAI, 2016.
[Cha et al., 2010] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P Gummadi. Measuring user influence in Twitter: The million follower fallacy. In AAAI, 2010.
[Dai et al., 2018] Quanyu Dai, Qiang Li, Jian Tang, and Dan Wang. Adversarial network embedding. In AAAI, 2018.
[Dai et al., 2019] Quanyu Dai, Xiao Shen, Liang Zhang, Qiang Li, and Dan Wang. Adversarial training methods for network embedding. In WWW, pages 329–339, 2019.
[Gao et al., 2019] Hongchang Gao, Jian Pei, and Heng Huang. ProGAN: Network embedding via proximity generative adversarial network. In KDD, pages 1308–1316, 2019.
[Goodfellow et al., 2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
[Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855–864. ACM, 2016.
[Hu et al., 2019] Binbin Hu, Yuan Fang, and Chuan Shi. Adversarial learning on heterogeneous information networks. In KDD, pages 120–129, 2019.
[Katz, 1953] Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
[Khosla et al., 2018] Megha Khosla, Jurek Leonhardt, Wolfgang Nejdl, and Avishek Anand. Node representation learning for directed graphs. arXiv preprint arXiv:1810.09176, 2018.
[Liben-Nowell and Kleinberg, 2007] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. JASIST, 58(7):1019–1031, 2007.
[Ou et al., 2016] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In KDD, pages 1105–1114. ACM, 2016.
[Pan et al., 2019] Shirui Pan, Ruiqi Hu, Sai-fu Fung, Guodong Long, Jing Jiang, and Chengqi Zhang. Learning graph embedding with adversarial training methods. arXiv preprint arXiv:1901.01250, 2019.
[Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In KDD, pages 701–710. ACM, 2014.
[Qiu et al., 2018] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM, pages 459–467. ACM, 2018.
[Song et al., 2009] Han Hee Song, Tae Won Cho, Vacha Dave, Yin Zhang, and Lili Qiu. Scalable proximity estimation and link prediction in online social networks. In IMC, pages 322–335. ACM, 2009.
[Sun et al., 2019] Jiankai Sun, Bortik Bandyopadhyay, Armin Bashizade, Jiongqian Liang, P Sadayappan, and Srinivasan Parthasarathy. ATP: Directed graph embedding with asymmetric transitivity preservation. In AAAI, volume 33, pages 265–272, 2019.
[Tang et al., 2015a] Jian Tang, Meng Qu, and Qiaozhu Mei. PTE: Predictive text embedding through large-scale heterogeneous text networks. In KDD, pages 1165–1174. ACM, 2015.
[Tang et al., 2015b] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In WWW, pages 1067–1077. International World Wide Web Conferences Steering Committee, 2015.
[Tsitsulin et al., 2018] Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. VERSE: Versatile graph embeddings from similarity measures. In WWW, 2018.
[Wang et al., 2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In KDD, pages 1225–1234. ACM, 2016.
[Wang et al., 2017] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. Community preserving network embedding. In AAAI, 2017.
[Wang et al., 2018] Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. GraphGAN: Graph representation learning with generative adversarial nets. In AAAI, 2018.
[Ying et al., 2018] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In KDD. ACM, 2018.
[Yu et al., 2018] Wenchao Yu, Cheng Zheng, Wei Cheng, Charu C Aggarwal, Dongjin Song, Bo Zong, Haifeng Chen, and Wei Wang. Learning deep network representations with adversarially regularized autoencoders. In KDD, pages 2663–2671. ACM, 2018.
[Zhou et al., 2017] Chang Zhou, Yuqiong Liu, Xiaofei Liu, Zhongyi Liu, and Jun Gao. Scalable graph embedding for asymmetric proximity. In