Decoupled Variational Embedding for Signed Directed Networks
XU CHEN, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China
JIANGCHAO YAO, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China
MAOSEN LI, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China
YA ZHANG∗, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China
YANFENG WANG, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China

Node representation learning for signed directed networks has received considerable attention in many real-world applications such as link sign prediction, node classification and node recommendation. The challenge lies in how to adequately encode the complex topological information of the networks. Recent studies mainly focus on preserving the first-order network topology, which indicates the closeness relationships of nodes. However, these methods generally fail to capture the high-order topology, which indicates the local structures of nodes and serves as an essential characteristic of the network topology. In addition, for the first-order topology, the additional value of non-existent links is largely ignored. In this paper, we propose to learn more representative node embeddings by simultaneously capturing the first-order and high-order topology in signed directed networks. In particular, we reformulate the representation learning problem on signed directed networks from a variational auto-encoding perspective and further develop a decoupled variational embedding (DVE) method. DVE leverages a specially designed auto-encoder structure to capture both the first-order and high-order topology of signed directed networks, and thus learns more representative node embeddings. Extensive experiments are conducted on three widely used real-world datasets. Comprehensive results on both the link sign prediction and node recommendation tasks demonstrate the effectiveness of DVE. Qualitative results and analysis are also given to provide a better understanding of DVE. Codes are available online: https://github.com/xuChenSJTU/DVE-master
CCS Concepts: • Information systems → Social networks.

Additional Key Words and Phrases: decoupled variational embedding, signed directed networks, graph convolution, network embedding
ACM Reference Format:
Xu Chen, Jiangchao Yao, Maosen Li, Ya Zhang, and Yanfeng Wang. 2019. Decoupled Variational Embedding for Signed Directed Networks.
ACM Trans. Web
1, 1 (August 2019), 30 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

∗ Prof. Ya Zhang is the corresponding author.

Authors' addresses: Xu Chen, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, 800 Dongchuan Rd, Shanghai, China, [email protected]; Jiangchao Yao, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China, [email protected]; Maosen Li, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China, [email protected]; Ya Zhang, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China, [email protected]; Yanfeng Wang, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China, [email protected].
1 INTRODUCTION

In recent years, learning node representation on graphs, which is called network embedding or graph embedding, has drawn great interest across various academic topics. Study in this field benefits many learning paradigms, such as semi-supervised learning [20, 29] and relational inference [2, 10, 28], as well as practical data mining tasks such as link prediction [15, 35], community detection [14, 40, 69] and node classification [4, 64].

Many social networks, such as Epinions and Slashdot, have both directed and signed (positive and negative) links; these are called signed directed networks. Negative links in social networks hold opposite semantic meaning and contain additional information [6, 31, 58] that helps many tasks, e.g., link sign prediction and node classification. In addition, link direction indicates the asymmetric relationship between two nodes, which is important for node (i.e., user) recommendation in social networks [1, 39]. For example, stars may not follow common people, while common people tend to follow stars. The key to representation learning on signed directed networks lies in how to encode the complex topological information into low-dimensional embeddings for nodes. In particular, the topological information is composed of both the high-order and the first-order topology. The high-order topology indicates the local structures, since it is generated by information propagation among a node's neighbors, while the first-order topology indicates the closeness relationships between a node and its directly linked neighbors. Both are intrinsic characteristics of signed directed networks. To make this clearer, we give an example in Figure 1.

However, existing embedding methods fail to capture both the first-order and high-order topology of signed directed networks. Firstly, the majority of them concentrate on mining the first-order topology, namely preserving the closeness relationships of nodes. For example, MF [22] performs matrix factorization on the signed directed adjacency matrix to learn low-dimensional embeddings for nodes. SNE [68] exploits random walks and a log-bilinear model to learn node embeddings with signed links. SiNE [63] learns node embeddings through a deep neural network model based on social theory. These works model the closeness relationships with restrictive distance metrics or usually ignore the additional value of non-existent links. Secondly, the high-order topology, indicating the local structures of nodes, is difficult to extract in signed directed networks, because it is coupled with the signed directed links: different signs and directions have distinctive information propagation influence. SNE [68] with random walks applies homophily effects on different signs and fails to capture the high-order pattern in signed directed networks. How to encode both the intrinsic high-order and first-order topological information is therefore an important problem for representation learning on signed directed networks.

In this paper, we propose to learn more representative node embeddings by simultaneously capturing the first-order and high-order topology in signed directed networks. In particular, we reformulate the representation learning on signed directed networks from a variational auto-encoding perspective and further propose a decoupled variational embedding (DVE) method. DVE is a specially designed auto-encoder structure which contains a decoupled variational encoder and a structure decoder. The general architecture of DVE is shown in Figure 2.
Fig. 1. An example to illustrate the high-order topology and first-order topology in signed directed networks. Red arrows mean positive directed edges and blue arrows indicate negative directed edges. (a) is a signed directed network example G_e. (b) indicates the high-order topology of node v in G_e, namely the local structures of node v; different depths of colored shades represent different local structure orders for node v. (c) shows the first-order topology of node v in G_e, namely the closeness relationships between node v and its directly linked neighbors; the concentric circles with node v as the center indicate the closeness between the center node v and its positively linked nodes, non-linked node, and negatively linked node.

Fig. 2. General architecture of DVE: a decoupled variational encoder followed by a structure decoder.

In the decoupled variational encoder, the representation of a node is decoupled into source node embeddings and target node embeddings according to link direction. Both the source node embeddings and the target node embeddings contain the local structure pattern that is extracted by graph convolution on the decoupled positive and negative graphs according to link sign. The structure decoder is formulated as a novel Balance Pair-wise Ranking (BPWR) loss, which is developed from the Extended
Structural Balance Theory [43, 44]. BPWR extracts the closeness relationships among positive links, negative links and non-existent links in a Bayesian personalized ranking manner, and meanwhile refines the embeddings learned from the encoder. The auto-encoding formulation encourages DVE to preserve the network topology in an end-to-end manner. In brief, our contributions are summarized as follows:

• We propose a variational auto-encoding based method named DVE to learn more representative node embeddings for signed directed networks. To the best of our knowledge, DVE is the first model that simultaneously models both the first-order and the high-order topology in signed directed networks;
• We develop a novel Balance Pair-wise Ranking (BPWR) loss, derived from the Extended Structural Balance Theory, which also works as the decoder of DVE. BPWR adequately mines the closeness relationships of nodes indicated by positive links, negative links and non-existent links in a ranking form, rather than with the point-wise or distance-level metrics of existing methods;
• Extensive experiments are conducted on three widely used real-world datasets. The superior performance of DVE compared with recent competitive baselines illustrates the effectiveness of DVE both quantitatively and qualitatively.

The rest of this paper is organized as follows. Section 2 introduces the related work. Section 3 gives the problem definition and a concise introduction to graph convolutional networks. Section 4 demonstrates the details of the proposed DVE. Section 5 provides the experiments and analysis on both the link sign prediction task and the node recommendation task, as well as the qualitative results. Finally, the conclusion and future work are given in Section 6.
2 RELATED WORK

This section reviews network embedding methods for both unsigned undirected networks and signed directed networks. Furthermore, some works on variational auto-encoding are introduced to better contextualize the proposed model in Section 4.
Network embedding has arisen as a hot research topic that aims to learn representative node embeddings for a given network. It benefits many network analysis tasks such as link prediction [15, 35], node classification [4, 50, 64], online voting [60] and sentiment analysis [61]. Various methods have been proposed for network embedding. For example, spectral analysis is performed on the Laplacian matrix decomposition in [3]. Similarity based node embedding methods such as Adamic/Adar and Katz are utilized in Liben-Nowell and Kleinberg [35]. Recently, inspired by the skip-gram model for word representation in Natural Language Processing (NLP) [38], DeepWalk [41] learns node embeddings from random walk sequences in social networks. LINE [53] defines first-order and second-order proximities to describe the context of a node and trains node embeddings via negative sampling. Node2Vec [18] extends DeepWalk by designing a biased random walk to balance Breadth First Search (BFS) and Depth First Search (DFS). Embedding methods for directed networks are studied in HOPE [39] and APP [76]. Qiu et al. [45] unified DeepWalk, LINE and Node2Vec into one matrix factorization framework. SDNE [59] is a semi-supervised deep model that captures the highly non-linear graph structure.

Recently, Graph Convolutional Networks (GCN) [11] were proposed to analyze graph signal processing in the spectral domain. Kipf and Welling [29] simplified the GCNs of Defferrard et al. [11] into a deep learning method by stacking multiple graph convolutional layers. Several variants of GCN have been proposed, such as GAT [56] and GraphSAGE [19]: GAT applies multi-head attention to GCN, while GraphSAGE introduces neighborhood sampling and different aggregation manners to enable inductive graph convolution on large graphs. Some works also study how to accelerate GCN via importance sampling [8] and variance reduction [9]. In general, GCN based methods are superior to random walk based methods both in performance and in end-to-end training, which has introduced a new perspective for graph representation learning. For more details about GCN, we recommend [70].
The network embedding methods discussed above are designed for unsigned or undirected networks. In reality, directed and signed (positive and negative) links are ubiquitous in social media. Negative links have been proven to have distinct properties and added value over positive links [34, 52]. Several works have studied how to distinctly model the positive and negative links in signed directed networks. Degree based features like the number of positive-incoming and negative-incoming links are explored in Leskovec et al. [34]. However, such hand-crafted
features are limited and insufficient in many situations. Instead, in Kunegis et al. [32], spectral analysis is extended to signed networks. Matrix Factorization (MF) [22] is also adopted to learn low-dimensional embeddings for signed directed networks. To reduce the computational burden of matrix decomposition, a specific aggregation manner for learning node embeddings in signed networks is proposed in Derr et al. [12]. It follows the principle that the enemy of a friend is an enemy and the enemy of an enemy is a friend, which extends the positive and negative neighbors of a specific node.

Due to the superior representation learning ability of deep learning, researchers have attempted to use deep learning techniques to learn more representative node embeddings on networks. A deep learning framework for signed networks named SiNE is proposed in Wang et al. [63], where the objective function is guided by social theory. Although this framework leverages non-linearity to learn node representations, it does not model link direction, which is an important factor for some asymmetric tasks. SNE is proposed in Yuan et al. [68], where a log-bilinear model is extended to support sign and direction modeling. SNE trains node embeddings based on uniform random walks and node context rather than social theory. However, the random walk in SNE applies homophily effects on different signed links and fails to capture the local structures in signed directed networks; it also does not support end-to-end training. SIDE [26] is another random walk based method, built on social balance theory [7]. SNEA [62] exploits both the network structures and node attributes simultaneously for network embedding on attributed signed networks; specifically, a margin ranking loss is proposed in SNEA. However, the margin ranking loss is non-smooth and difficult to optimize with gradient based algorithms. Bayesian Personalized Ranking (BPR) [46], derived from maximizing the posterior of observations, is also a ranking method, with advantages such as flexibility and easy optimization by gradient based algorithms. It has been successfully applied in many areas such as recommendation [21, 36, 47]. Despite the great potential of BPR for modeling pairwise relationships, it has not been explored in signed directed networks. In this paper, based on the Extended Structural Balance Theory and BPR, we develop an objective function called Balance Pair-wise Ranking (BPWR) to mine the first-order topology in signed directed networks.

From the above, we see that most existing works focus on capturing the first-order topology, namely learning the closeness relationships of nodes. In this respect, these methods [12, 22, 63, 68] extract the first-order topology with restrictive distance metrics, and some of them ignore the additional value of non-existent links. Besides, although some methods have introduced random walks [68], they fail to capture the high-order topology of signed directed networks since they apply homophily effects on different signs. The proposed DVE reformulates the representation learning problem on signed directed networks from a variational auto-encoding perspective and simultaneously models the first-order and high-order topology.
Variational auto-encoding (VAE) has attracted enormous attention in recent years and has become one of the most popular techniques in unsupervised representation learning [13]. VAE theory is appealing since it is built on standard Bayes theory and meanwhile can be trained with stochastic gradient descent. VAE first emerged in Kingma and Welling [27], where the authors aim to perform efficient inference and learning in directed probabilistic graphical models even with intractable posteriors or large datasets. In Kingma and Welling [27], the authors first derive the variational evidence lower bound (ELBO) of the marginal log-likelihood of the observed datapoints. Then a reparameterization trick is applied to approximate the intractable posteriors, which also enables VAE to be straightforwardly optimized using standard stochastic gradient based methods. After [27], many researchers have studied VAE from various perspectives, which has advanced the whole community of variational auto-encoding. Recent advances in VAE
theory can be categorized into two aspects. First, regarding more expressive likelihoods, the standard VAE [27] assumes that the likelihood factorizes over dimensions, which may cause poor approximation for tasks involving images. Thereby, a sequential auto-encoding framework is proposed in DRAW [17] to perform image generation. Also, Gulrajani et al. proposed to model the dependencies within an image and further developed an auto-regressive decoder in VAE for fine-grained image generation. Moreover, some works derive more expressive likelihoods from information theory, such as [71, 73, 74]. Second, regarding more expressive posteriors, the main idea is that the standard VAE uses a mean field approach, which lacks expressiveness for modeling complex posteriors. Thus, IWAE [5] weights the samples in the posterior approximation process, which increases the model's flexibility to capture complex posteriors. Also, normalizing flows [24] are introduced in VAE to transform a simple approximate posterior into a more expressive one through multiple successive invertible transformations. Apart from the advances in VAE theory, there are various applications involving VAE, such as hand-written digits [49], segmentation [51] and graph representation learning [30]. Since variational auto-encoding is a huge topic and we mainly concentrate on signed directed networks, we cannot cover it comprehensively here. For more details about variational auto-encoding, we recommend [13, 72].
3 PRELIMINARIES

In this section, we give the problem definition of node representation learning on signed directed networks, as well as an introduction to graph convolutional networks (GCNs). The introduction to GCNs illustrates how signals on graphs are convolved and builds a foundation for demonstrating the proposed model.
A signed directed network is defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E}_p, \mathcal{E}_n)$, where $\mathcal{V}$ is the set of all nodes and $\mathcal{E}_p$ ($\mathcal{E}_n$) represents the positive (negative) links. Let $\mathcal{E} = \mathcal{E}_p \cup \mathcal{E}_n$ be the observed links in $\mathcal{G}$. Each link $e \in \mathcal{E}$ is represented as $e_{u \to v} = (u, v, \epsilon_{u \to v})$, where $u \to v$ denotes the direction from source node $u$ to target node $v$, and $\epsilon_{u \to v}$ indicates the sign value of link $e_{u \to v}$, i.e., $\epsilon_{u \to v} = 1$ if $e_{u \to v} \in \mathcal{E}_p$ and $\epsilon_{u \to v} = -1$ if $e_{u \to v} \in \mathcal{E}_n$. When the nodes $\mathcal{V}$ have raw features, the node feature matrix of $\mathcal{G}$ is represented as $X \in \mathbb{R}^{N \times F}$, where $F$ indicates the raw feature dimension. Given $\mathcal{G}$, the objective of node representation learning on signed directed networks is to embed nodes into low-dimensional embeddings $Z \in \mathbb{R}^{N \times d}$ that facilitate downstream tasks such as node recommendation, node classification and link prediction. A minimal sketch of this data setting is given below.
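To make the data setting concrete, the following minimal Python sketch (our own illustration, not part of the paper or its released code) reads an edge list of $(u, v, \epsilon_{u \to v})$ triplets, splits it into $\mathcal{E}_p$ and $\mathcal{E}_n$, and builds the binary undirected adjacency matrices $A_p$ and $A_n$ used later in Section 4; the file format and helper names are assumptions.

```python
import numpy as np
from scipy.sparse import coo_matrix

def load_signed_digraph(path, num_nodes):
    """Read 'u v sign' triplets (sign in {+1, -1}) into edge sets and adjacencies."""
    edges = np.loadtxt(path, dtype=np.int64)            # one row per link (u, v, sign)
    e_pos = edges[edges[:, 2] > 0]                      # E_p: positive directed links
    e_neg = edges[edges[:, 2] < 0]                      # E_n: negative directed links

    def undirected_adj(e):
        # Build a directed adjacency, then symmetrize; entries are in {0, 1}.
        a = coo_matrix((np.ones(len(e)), (e[:, 0], e[:, 1])),
                       shape=(num_nodes, num_nodes))
        return ((a + a.T) > 0).astype(np.float32)       # A_p or A_n

    return e_pos, e_neg, undirected_adj(e_pos), undirected_adj(e_neg)
```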
The notations used in this paper are summarized in Table 1.

Table 1. Notations in this paper.

$\mathcal{G}$: signed directed graph
$\mathcal{V}$: node set of $\mathcal{G}$
$\mathcal{E}_p$: positive link set of $\mathcal{G}$
$\mathcal{E}_n$: negative link set of $\mathcal{G}$
$\mathcal{E}$: observed links in $\mathcal{G}$
$L$: the symmetric normalized Laplacian matrix of an unsigned undirected graph
$I_N$: an identity matrix of size $N$
$d$: the latent embedding dimension
$\phi_s$: parameters of the source node encoder
$\phi_t$: parameters of the target node encoder
$Z_s$: source node embeddings
$Z_t$: target node embeddings
$A_p$: adjacency matrix of the undirected positive graph
$A_n$: adjacency matrix of the undirected negative graph
$X$: the raw feature matrix of nodes
$\mathcal{L}_{KL}^s$: KL divergence loss for the source node encoder
$\mathcal{L}_{BPWR}$: Balance Pair-wise Ranking (BPWR) loss as the structure decoder
$\mathcal{L}_{DVE}$: loss of the DVE method

Graph Convolutional Networks (GCNs) are an essential ingredient of DVE, so we give a concise introduction. GCN is a type of neural network that learns superior node representations by capturing the local structures of nodes; it can be regarded as a feature extractor working on graphs, equipped with a variety of models and applied in a variety of tasks [19, 29, 57]. GCN is first derived from the spectral convolution on graphs, defined as the multiplication of a signal $x \in \mathbb{R}^N$ with a parameterized filter $g_\theta$ in the Fourier domain. Let $\star$ be the convolution operation; the convolution of GCN can be expressed as:

$$g_\theta \star x = U g_\theta(\Lambda) U^\top x \qquad (1)$$

where $U$ is the eigenvector matrix and $\Lambda$ the eigenvalue matrix of the graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^\top$, and $U^\top x$ is the graph Fourier transform of $x$. According to Defferrard et al. [11], a polynomial filter is usually taken, $g_\theta(\Lambda) = \sum_{k=0}^{K} \theta_k \Lambda^k$.

The convolution filter defined in Eq. 1 involves the eigen-decomposition of $L$ and might be computationally expensive for large graphs. To circumvent this problem, according to Defferrard et al. [11], $g_\theta(\Lambda)$ with the polynomial filter can be well-approximated by a truncated expansion in terms of Chebyshev polynomials $T_k(x)$ up to the $K$-th order:

$$g_\theta(\Lambda) = \sum_{k=0}^{K} \theta_k \Lambda^k \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda}) \qquad (2)$$

where $\tilde{\Lambda} = \frac{2}{\lambda_{max}} \Lambda - I_N$ is a rescaled version of $\Lambda$ and $\lambda_{max}$ indicates the largest eigenvalue of $L$. The Chebyshev polynomials are recursively defined as $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$ with $T_0(x) = 1$ and $T_1(x) = x$. Taking Eq. 2 into Eq. 1, we obtain:

$$g_\theta \star x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L}) x \qquad (3)$$

where $\tilde{L} = \frac{2}{\lambda_{max}} L - I_N$. Eq. 3 is also called a $K$-localized convolution on graphs since it is a $K$-th order polynomial in the Laplacian; a short sketch of this filter is given below.
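As a concrete illustration of Eqs. 2 and 3, the sketch below applies the truncated Chebyshev filter to a signal. It assumes a dense numpy Laplacian and hypothetical coefficients `theta`, and is meant only to show the recursion, not an optimized implementation.

```python
import numpy as np

def chebyshev_filter(L, x, theta, lam_max=2.0):
    """Evaluate g_theta * x ~= sum_k theta_k T_k(L_tilde) x (Eq. 3)."""
    n = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(n)  # rescaled Laplacian
    t_prev, t_curr = x, L_tilde @ x            # T_0(L_tilde) x and T_1(L_tilde) x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        # Chebyshev recursion: T_k = 2 L_tilde T_{k-1} - T_{k-2}
        t_next = 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_next
        t_prev, t_curr = t_curr, t_next
    return out
```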
Inspired by the idea that high-order convolutions can be built by stacking multiple convolutional layers [25], Kipf and Welling [29] achieve the $K$-th order convolution by stacking multiple convolutional layers of Eq. 3, each followed by a point-wise non-linear function. In particular, the layer-wise convolution takes Eq. 3 with $K = 1$, a linear function on the graph Laplacian spectrum. Additionally, Kipf and Welling [29] approximate $\lambda_{max} \approx 2$, which leads to:

$$g_\theta \star x \approx \theta_0 x + \theta_1 (L - I_N) x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x \qquad (4)$$

where $\theta_0$ and $\theta_1$ are two free parameters. In practice, GCNs constrain $\theta = \theta_0 = -\theta_1$ to avoid over-fitting, which leads to the following expression:

$$g_\theta \star x \approx \theta (I_N + D^{-1/2} A D^{-1/2}) x \qquad (5)$$

Note that $I_N + D^{-1/2} A D^{-1/2}$ has eigenvalues in the range $[0, 2]$; repeatedly applying Eq. 5 can therefore lead to numerical instabilities and even exploding/vanishing gradients when stacking layers. To alleviate this problem, a renormalization trick is introduced: $I_N + D^{-1/2} A D^{-1/2} \to \hat{D}^{-1/2} (A + I_N) \hat{D}^{-1/2}$. Then, given a signal matrix $X \in \mathbb{R}^{N \times F}$, where $N$ denotes the number of samples and $F$ the feature dimension, the layer-wise graph convolution in Kipf and Welling [29] is defined as follows:

$$Z^{l+1} = \tilde{A} H^l \Theta^l, \quad H^l = h(Z^l), \quad H^0 = X \qquad (6)$$

where the propagation matrix is $\tilde{A} = \hat{D}^{-1/2}(A + I_N)\hat{D}^{-1/2}$ and $\hat{D}$ is the degree matrix of $A + I_N$. $H^l$ is the activation matrix in the $l$-th layer, each row of which is the vector representation of a node; $\Theta^l$ is the matrix of filter parameters in the $l$-th layer, and $h(\cdot)$ is the non-linear ReLU function. $Z^{l+1}$ is the node representation of the $(l+1)$-th layer. This layer-wise convolution also connects the graph convolution operation in the spectral domain to that in the vertex domain. For more details about GCN, we recommend [29].

From the above, we can see that GCN learns a node's representation by aggregating its neighbors, which are also called the receptive field. The receptive field is enlarged through stacking layers, like the $L$-hop neighborhood in a graph. When $\tilde{A} = I_N$, GCN degrades to a multi-layer perceptron (MLP) model that does not consider the graph structure, and the receptive field of a node is just itself. In our model, we highlight the intrinsic high-order local structures in signed directed networks; we thus have $\tilde{A} \neq I_N$, namely GCN will not degrade to an MLP here. The sketch below illustrates the renormalized propagation of Eqs. 5 and 6.
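The following minimal numpy sketch (dense matrices for readability; the function names are ours) implements the renormalization trick and one layer of Eq. 6:

```python
import numpy as np

def renormalized_propagation(A):
    """A_tilde = D_hat^{-1/2} (A + I_N) D_hat^{-1/2} (renormalization trick)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_tilde, H, Theta, activate=True):
    """One layer of Eq. 6: Z^{l+1} = A_tilde H^l Theta^l, with ReLU between layers."""
    Z = A_tilde @ H @ Theta
    return np.maximum(Z, 0.0) if activate else Z
```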
4 THE PROPOSED METHOD

In this section, our decoupled variational embedding method for signed directed networks is introduced. The model architecture is shown in Figure 3. Details are illustrated as follows.
In this subsection, we formulate the node representation learning problem on signed directed networks from a variational auto-encoding perspective. Link direction and sign are the two key elements describing signed directed networks. The link direction between two nodes indicates an asymmetric relationship that implies the different roles of the two nodes in an interaction. This asymmetric information is an essential factor that facilitates information propagation in signed directed networks. However, it is inappropriate to apply some GNN methods such as [11, 29, 66] on directed graphs, since they require a symmetric Laplacian matrix for graph convolutions. Since a node in a directed relationship may present as both the source node and the target node, we leverage the asymmetric information by decoupling node embeddings into source node embeddings $Z_s$ and target node embeddings $Z_t$. From the variational auto-encoding perspective, we assume the semantics of edges are drawn from some underlying distributions. To clarify, we denote $\theta$ as the parameter symbol for all non-specified models and $p_\theta(\mathcal{E})$ as the probability density function of $\mathcal{E}$. The probability distribution function of the observed signed directed links $\mathcal{E}$ is represented as $P(\mathcal{E})$ and can be written as:

$$P(\mathcal{E}) = \int_{Z_s, Z_t} p_\psi(\mathcal{E} | Z_s, Z_t)\, p_\theta(Z_s, Z_t)\, dZ_s\, dZ_t \qquad (7)$$
Fig. 3. Model architecture of DVE (best viewed in color), consisting of the decoupled variational encoder (source node encoder and target node encoder) and the structure decoder (Balance Pair-wise Ranking loss). We first decouple the signed directed graph into an undirected positive graph indicated by $A_p$ and an undirected negative graph indicated by $A_n$. Then our decoupled variational encoder encodes $A_p$ and $A_n$ into the source node representation $Z_s$ and the target node representation $Z_t$, respectively. Finally, $Z_s$ and $Z_t$ are used to compute the balance pair-wise ranking loss, which also acts as the structure decoder of DVE. The source node $i$ is taken from $Z_s$ and the target nodes $j$, $k$, $r$ from $Z_t$; $f(\cdot, \cdot)$ represents the positive link existence score defined in this paper.

In Eq. 7, $Z_s$ and $Z_t$ indicate the latent variables of source nodes and target nodes, respectively. By modeling node embeddings through two different latent variables, the asymmetric relationship can be well captured. The true posterior distribution of $Z_s, Z_t$ can be written as:

$$p_\theta(Z_s, Z_t | \mathcal{E}) = \frac{p_\psi(\mathcal{E} | Z_s, Z_t)\, p_\theta(Z_s, Z_t)}{p_\theta(\mathcal{E})} \qquad (8)$$

The true posterior $p_\theta(Z_s, Z_t|\mathcal{E})$ in Eq. 8 is intractable because of the moderately complicated likelihood function $p_\psi(\mathcal{E}|Z_s, Z_t)$, e.g., a neural network with non-linear layers [23, 27, 75]. We thus introduce a tractable posterior $q_\phi(Z_s, Z_t|\mathcal{E})$ to approximate $p_\theta(Z_s, Z_t|\mathcal{E})$. In this case, the marginal log-likelihood $\log P(\mathcal{E})$ can be rewritten as:

$$\log P(\mathcal{E}) = D_{KL}[q_\phi(Z_s, Z_t|\mathcal{E}) \,\|\, p_\theta(Z_s, Z_t|\mathcal{E})] + \mathcal{L} \qquad (9)$$

where $D_{KL}$ denotes the Kullback-Leibler (KL) divergence and $\mathcal{L}$ is the (variational) evidence lower bound (ELBO) of $\log P(\mathcal{E})$. Since the KL divergence term is non-negative, we can maximize the log-likelihood $\log P(\mathcal{E})$ by maximizing $\mathcal{L}$. Denoting the joint prior for $Z_s$ and $Z_t$ as $p_\theta(Z_s, Z_t)$, $\mathcal{L}$ is derived as:

$$\mathcal{L} = -D_{KL}[q_\phi(Z_s, Z_t|\mathcal{E}) \,\|\, p_\theta(Z_s, Z_t)] + \mathbb{E}_{q_\phi(Z_s, Z_t|\mathcal{E})}[\log p_\psi(\mathcal{E}|Z_s, Z_t)] \qquad (10)$$

where $p_\psi(\mathcal{E}|Z_s, Z_t)$ indicates the probabilistic decoder parameterized by $\psi$. In our method, we simplify the joint prior by assuming $p_\theta(Z_s, Z_t) = p_\theta(Z_s)\, p_\theta(Z_t)$. Complex priors are a specific research topic in variational inference [48, 55, 67]; we do not explore them further here, since we mainly focus on the general variational auto-encoding idea for modeling both the first-order and high-order topology in signed directed networks. By decoupling the node embeddings into source node embeddings $Z_s$ and target node embeddings $Z_t$, we have the following proposition:
Proposition 1.
Given the observed links $\mathcal{E}$ in signed directed networks, the latent variable for source node embeddings $Z_s$ and the latent variable for target node embeddings $Z_t$ are conditionally independent.

Thereby, with Proposition 1, we rewrite the approximate posterior $q_\phi(Z_s, Z_t|\mathcal{E})$ as:

$$q_\phi(Z_s, Z_t|\mathcal{E}) = q_{\phi_s}(Z_s|\mathcal{E})\, q_{\phi_t}(Z_t|\mathcal{E}) \qquad (11)$$

where $q_{\phi_s}(Z_s|\mathcal{E})$ and $q_{\phi_t}(Z_t|\mathcal{E})$ are the approximate posteriors parameterized by $\phi_s$ and $\phi_t$, respectively. Denoting the priors for $Z_s$ and $Z_t$ as $p_\theta(Z_s)$ and $p_\theta(Z_t)$, we rewrite the ELBO in Eq. 10 as:

$$\mathcal{L} = -D_{KL}[q_{\phi_s}(Z_s|\mathcal{E}) \,\|\, p_\theta(Z_s)] - D_{KL}[q_{\phi_t}(Z_t|\mathcal{E}) \,\|\, p_\theta(Z_t)] + \mathbb{E}_{q_{\phi_s}(Z_s|\mathcal{E})\, q_{\phi_t}(Z_t|\mathcal{E})}[\log p_\psi(\mathcal{E}|Z_s, Z_t)] \qquad (12)$$

More detailed derivation is provided in Appendix A; the key step is sketched below.
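Since Appendix A is not reproduced here, the following short derivation sketch shows the key step from Eq. 10 to Eq. 12 using only Eq. 11 and the factorized prior: the joint KL term splits into the two per-variable KL terms.

```latex
\begin{aligned}
D_{KL}\big[q_{\phi_s}(Z_s|\mathcal{E})\, q_{\phi_t}(Z_t|\mathcal{E}) \,\big\|\, p_\theta(Z_s)\, p_\theta(Z_t)\big]
&= \mathbb{E}_{q_{\phi_s} q_{\phi_t}}\Big[\log \tfrac{q_{\phi_s}(Z_s|\mathcal{E})}{p_\theta(Z_s)}
   + \log \tfrac{q_{\phi_t}(Z_t|\mathcal{E})}{p_\theta(Z_t)}\Big] \\
&= D_{KL}\big[q_{\phi_s}(Z_s|\mathcal{E}) \,\big\|\, p_\theta(Z_s)\big]
   + D_{KL}\big[q_{\phi_t}(Z_t|\mathcal{E}) \,\big\|\, p_\theta(Z_t)\big]
\end{aligned}
```

The second equality holds because each log-ratio depends on only one of the two independent latent variables, so the expectation over the other integrates to one. Substituting this decomposition into Eq. 10 yields Eq. 12.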
DVE tries to learn $Z_s$ and $Z_t$ by maximizing the above ELBO. To better understand DVE, we first introduce the two variational approximate posteriors $q_{\phi_s}(Z_s|\mathcal{E})$ and $q_{\phi_t}(Z_t|\mathcal{E})$; modeling these two distributions constitutes the decoupled variational encoder in Figure 3. The conditional distribution $p_\psi(\mathcal{E}|Z_s, Z_t)$, which indicates the structure decoder, will be discussed later.

In this part, we introduce how the decoupled variational encoder in Figure 3 works. In our expectation, $Z_s$ and $Z_t$ are the representations for the source node and target node, respectively. These two representations should capture the intrinsic local structures of nodes involved in both positive links and negative links. Take the source node representation $Z_s$ as an example: directly representing $Z_s$ through existing GCN methods is not appropriate, because this makes GCN apply homophily effects on different signs. Instead, we decouple the signed directed graph into an undirected positive graph and an undirected negative graph, and consider that $Z_s$ can be generated from the node representation $Z_s^p$ involved in the undirected positive graph and the node representation $Z_s^n$ involved in the undirected negative graph. In other words, $Z_s$ can be represented as $Z_s = f_s(Z_s^p, Z_s^n)$, where $f_s$ is the generative function. A proper choice of $f_s$ can capture the interactions between positive and negative links.

In particular, in the learning process of $Z_s^p$ and $Z_s^n$, denoting $A_p$ as the adjacency matrix of the undirected positive graph and $A_n$ as that of the undirected negative graph, a variational GCN is applied on $A_p$ and $A_n$. In other words, $Z_s \sim q_{\phi_s}(Z_s|\mathcal{E})$ is represented by the combination of $Z_s^p \sim q_{\phi_s^p}(Z_s^p|A_p, X)$, $Z_s^n \sim q_{\phi_s^n}(Z_s^n|A_n, X)$ and $f_s$, where $q_{\phi_s^p}(Z_s^p|A_p, X)$ and $q_{\phi_s^n}(Z_s^n|A_n, X)$ indicate the approximate posteriors for the true posteriors $p_\theta(Z_s^p|A_p, X)$ and $p_\theta(Z_s^n|A_n, X)$. We set $f_s$ as the concatenation operation here for simplicity. Note that the adjacency matrices of both the undirected positive graph and the undirected negative graph are composed of 0 and 1, where 1 means linked and 0 otherwise. The variational inference procedure for $q_{\phi_s}(Z_s|\mathcal{E})$, namely the source node encoder in Figure 3, is introduced in the following part.

Let the node feature matrix be $X \in \mathbb{R}^{N \times F}$, where $N$ is the number of nodes and $F$ is the feature dimension. (Since we do not have node features in our experiments, we simply set $X = I_N$, an identity matrix of size $N$.) Let $Z_{s,i}^p \in \mathbb{R}^{1 \times d}$ and $Z_{s,i}^n \in \mathbb{R}^{1 \times d}$ be the source node embeddings of the $i$-th node in the undirected positive graph and the undirected negative graph, respectively. Denoting $q_{\phi_s^p}(Z_s^p|A_p, X)$ and $q_{\phi_s^n}(Z_s^n|A_n, X)$ as the variational distributions for source nodes in the undirected positive graph and the undirected negative graph respectively, we have:

$$q_{\phi_s^p}(Z_s^p|A_p, X) = \prod_{i=1}^{N} q_{\phi_s^p}(Z_{s,i}^p|A_p, X) \qquad (13)$$

$$q_{\phi_s^n}(Z_s^n|A_n, X) = \prod_{i=1}^{N} q_{\phi_s^n}(Z_{s,i}^n|A_n, X) \qquad (14)$$

Inspired by the idea that different semantics can come from the same family of functions (e.g., Gaussian), since these semantics are modeled by different parameters in different spaces [27, 33, 42], we assume that both $q_{\phi_s^p}(Z_s^p|A_p, X)$ and $q_{\phi_s^n}(Z_s^n|A_n, X)$ follow Gaussian distributions. The reparametrization Gaussian parameters $\mu_s^{p,l} \in \mathbb{R}^{N \times d}$, $\sigma_s^{p,l} \in \mathbb{R}^{N \times d}$, $\mu_s^{n,l} \in \mathbb{R}^{N \times d}$, $\sigma_s^{n,l} \in \mathbb{R}^{N \times d}$ in the $l$-th layer are defined as (here $s$ marks the source node; $\mu$ and $\sigma$ denote the mean and standard deviation of the Gaussian; $p$ and $n$ indicate the undirected positive and negative graph; $l$ indexes the layer):

$$\mu_s^{p,l+1} = \tilde{A}_p H_{s,\mu}^{p,l} W_{s,\mu}^{p,l}, \quad H_{s,\mu}^{p,l} = h(\mu_s^{p,l}), \quad H_{s,\mu}^{p,0} = X; \quad \log \sigma_s^{p,l+1} = \tilde{A}_p H_{s,\sigma}^{p,l} W_{s,\sigma}^{p,l}, \quad H_{s,\sigma}^{p,l} = h(\log \sigma_s^{p,l}), \quad H_{s,\sigma}^{p,0} = X \qquad (15)$$

$$\mu_s^{n,l+1} = \tilde{A}_n H_{s,\mu}^{n,l} W_{s,\mu}^{n,l}, \quad H_{s,\mu}^{n,l} = h(\mu_s^{n,l}), \quad H_{s,\mu}^{n,0} = X; \quad \log \sigma_s^{n,l+1} = \tilde{A}_n H_{s,\sigma}^{n,l} W_{s,\sigma}^{n,l}, \quad H_{s,\sigma}^{n,l} = h(\log \sigma_s^{n,l}), \quad H_{s,\sigma}^{n,0} = X \qquad (16)$$

where $\tilde{A}_p = \hat{D}_p^{-1/2}(A_p + I_N)\hat{D}_p^{-1/2}$ and $\tilde{A}_n = \hat{D}_n^{-1/2}(A_n + I_N)\hat{D}_n^{-1/2}$ are the propagation matrices; $\hat{D}_p$ and $\hat{D}_n$ are the degree matrices of $A_p + I_N$ and $A_n + I_N$, respectively; and $h(\cdot)$ denotes the non-linear ReLU function. $W_{s,\mu}^{p,l} \in \mathbb{R}^{F \times d}$ and $W_{s,\sigma}^{p,l} \in \mathbb{R}^{F \times d}$ denote the $l$-th layer reparametrization parameters for $Z_s^p$; similarly, $W_{s,\mu}^{n,l}$ and $W_{s,\sigma}^{n,l}$ are the $l$-th layer reparametrization parameters for $Z_s^n$. Accordingly, denoting $p_\theta(Z_s^p)$ and $p_\theta(Z_s^n)$ as the prior distributions for $Z_s^p$ and $Z_s^n$, the prior regularization loss on $q_{\phi_s^p}(Z_s^p|A_p, X)$ and $q_{\phi_s^n}(Z_s^n|A_n, X)$ is written as:

$$\min_{\phi_s} \mathcal{L}_{KL}^s = D_{KL}[q_{\phi_s^p}(Z_s^p|A_p, X) \,\|\, p_\theta(Z_s^p)] + D_{KL}[q_{\phi_s^n}(Z_s^n|A_n, X) \,\|\, p_\theta(Z_s^n)] \qquad (17)$$

where $\phi_s = \{\phi_s^p, \phi_s^n\} = \{W_{s,\mu/\sigma}^{p,l}, W_{s,\mu/\sigma}^{n,l}\}$ denotes the parameters of the source node encoder. Therefore, the representation for the source node is obtained as $Z_s = Z_s^p \oplus Z_s^n$ ($\oplus$ means concatenation), where $Z_s^p \sim q_{\phi_s^p}(Z_s^p|A_p, X)$ and $Z_s^n \sim q_{\phi_s^n}(Z_s^n|A_n, X)$. The target node representation $Z_t$ is obtained by a similar procedure.

It is worthwhile to highlight that applying GCN on both $A_p$ and $A_n$ with different parameters models the distinctive effects of different link signs. Conducting GCN on $A_p$ and $A_n$ is reasonable since GCN does not specify the positive or negative meaning of links in graphs; instead, GCN emphasizes the correlation that links two nodes. How the information from $A_p$ and $A_n$ is leveraged in the subsequent modules determines the positive or negative semantics. In our case, we use GCN to summarize the correlation pattern among nodes, and then ask the following module to employ the positive semantics in $A_p$ and the negative semantics in $A_n$. In this way, the signed local structures can be captured in a decoupled manner. A PyTorch-style sketch of this encoder is given below.
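To make Eqs. 13–17 concrete, here is a PyTorch-style sketch of the source node encoder (our own simplification, not the released implementation; dropout and the target encoder are omitted). One two-layer variational GCN branch runs on each of $\tilde{A}_p$ and $\tilde{A}_n$, produces $\mu$ and $\log\sigma$ via Eqs. 15–16, samples with the reparameterization trick, and concatenates the two samples into $Z_s$:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VGCNBranch(nn.Module):
    """Two-layer variational GCN on one propagation matrix (Eq. 15 or Eq. 16)."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w_mu = nn.ModuleList([nn.Linear(in_dim, hid_dim, bias=False),
                                   nn.Linear(hid_dim, out_dim, bias=False)])
        self.w_sig = nn.ModuleList([nn.Linear(in_dim, hid_dim, bias=False),
                                    nn.Linear(hid_dim, out_dim, bias=False)])

    def forward(self, A_tilde, X):
        mu, log_sig = X, X                                # H^0 = X for both chains
        for i in range(2):
            h_mu = mu if i == 0 else F.relu(mu)           # H^l = h(mu^l)
            h_sig = log_sig if i == 0 else F.relu(log_sig)
            mu = A_tilde @ self.w_mu[i](h_mu)             # mu chain of Eq. 15/16
            log_sig = A_tilde @ self.w_sig[i](h_sig)      # log-sigma chain of Eq. 15/16
        return mu, log_sig

def reparameterize(mu, log_sig):
    # z = mu + sigma * eps with eps ~ N(0, I)
    return mu + torch.exp(log_sig) * torch.randn_like(mu)

def encode_source(branch_p, branch_n, Ap_tilde, An_tilde, X):
    """Source encoder: Z_s is the concatenation of Z_s^p and Z_s^n (f_s)."""
    mu_p, ls_p = branch_p(Ap_tilde, X)
    mu_n, ls_n = branch_n(An_tilde, X)
    Zs = torch.cat([reparameterize(mu_p, ls_p),
                    reparameterize(mu_n, ls_n)], dim=1)
    return Zs, (mu_p, ls_p), (mu_n, ls_n)
```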
In auto-encoding theory, the decoder is an essential module; our structure decoder is introduced here. The structure decoder is expected to reconstruct the signed directed links and guide the encoder learning. This requires that the structure decoder preserve the structural characteristics of signed directed networks. Note that the Extended Structural Balance Theory [43] states the closeness of users in signed networks. The essential insight of this theory is that for four users $i$, $j$, $k$, $r$ with link signs $\epsilon_{ij} = 1$, $\epsilon_{ik} = 0$, $\epsilon_{ir} = -1$, the closeness among them follows:

$$g(i, j) < g(i, k) < g(i, r) \qquad (18)$$

where $g(i, j)$ denotes the distance between users $i$ and $j$. For example, if a positive link means trust and a negative link means distrust in social networks, user $i$ prefers to trust $j$ over $k$, and trusts $k$ more than $r$. This theory actually states the first-order topology, i.e., the closeness relationships among nodes. By combining this theory with Bayesian Personalized Ranking [46], we naturally develop a novel Balance Pair-wise Ranking (BPWR) loss to guide the whole model learning. To clarify, we recast the distances in Eq. 18 as existence scores of positive links: the higher the score, the more probably a positive link exists. Thus, the Extended Structural Balance Theory can be interpreted as:

$$f(i \to j) > f(i \to k) > f(i \to r) \qquad (19)$$

where $f(i \to j)$ indicates the existence score of a positive link from source node $i$ to target node $j$. If $j >_i k$ indicates that the relation score of $i \to j$ is larger than that of $i \to k$ with node $i$ as the reference object, then for samples $(i, j, k, r)$ with $\epsilon_{i \to j} = 1$, $\epsilon_{i \to k} = 0$, $\epsilon_{i \to r} = -1$, the maximum posteriors satisfy:

$$\max_{\phi_s, \phi_t} \prod_{(i,j,k)} p(\phi_s, \phi_t | j >_i k) \propto \prod_{(i,j,k)} p(j >_i k | \phi_s, \phi_t)\, p(\phi_s, \phi_t)$$
$$\max_{\phi_s, \phi_t} \prod_{(i,k,r)} p(\phi_s, \phi_t | k >_i r) \propto \prod_{(i,k,r)} p(k >_i r | \phi_s, \phi_t)\, p(\phi_s, \phi_t) \qquad (20)$$

where $\phi_s$ and $\phi_t$ are the parameters of the decoupled variational encoder that produce $Z_s$ and $Z_t$. $p(j >_i k|\phi_s, \phi_t)$ and $p(k >_i r|\phi_s, \phi_t)$ indicate the likelihood functions, written as:

$$p(j >_i k | \phi_s, \phi_t) = \sigma(f(i \to j) - f(i \to k)), \quad p(k >_i r | \phi_s, \phi_t) = \sigma(f(i \to k) - f(i \to r)) \qquad (21)$$

where $f(i \to j)$ is calculated as the inner product of the source node embedding $Z_{s,i}$ of node $i$ and the target node embedding $Z_{t,j}$ of node $j$; $f(i \to k)$ and $f(i \to r)$ are obtained in the same way, and $\sigma$ is the sigmoid function. Therefore, following [46], the Balance Pair-wise Ranking (BPWR) loss of our structure decoder can be written as:

$$\min_{\phi_s, \phi_t} \mathcal{L}_{BPWR} = -\mathbb{E}_{(i,j,k) \sim P(\mathcal{E})} \ln \sigma(Z_{s,i}^\top Z_{t,j} - Z_{s,i}^\top Z_{t,k}) - \mathbb{E}_{(i,k,r) \sim P(\mathcal{E})} \ln \sigma(Z_{s,i}^\top Z_{t,k} - Z_{s,i}^\top Z_{t,r}) \qquad (22)$$

where $i$, $j$, $k$, $r$ are node indexes that satisfy $e_{i \to j} \in \mathcal{E}_p$ and $e_{i \to r} \in \mathcal{E}_n$, and $e_{i \to k}$ is a sampled non-existent link. When $Z_s$ and $Z_t$ are not learned from the decoupled variational encoder, they can be initialized as trainable embedding matrices; in other words, BPWR can serve as an independent model to learn node embeddings in signed directed networks. A sketch of this loss is given below.
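A direct transcription of Eq. 22 into PyTorch might look as follows; this is a sketch where the embedding and index tensor names are ours, and the triplets are assumed to be pre-sampled as described above:

```python
import torch
import torch.nn.functional as F

def bpwr_loss(Zs, Zt, i_pos, j_pos, k_pos, i_neg, k_neg, r_neg):
    """Balance Pair-wise Ranking loss (Eq. 22).

    (i_pos, j_pos, k_pos): e_{i->j} positive, e_{i->k} sampled non-existent.
    (i_neg, k_neg, r_neg): e_{i->k} sampled non-existent, e_{i->r} negative.
    """
    def score(i, j):
        # f(i -> j) = <Z_{s,i}, Z_{t,j}>, the positive link existence score
        return (Zs[i] * Zt[j]).sum(dim=1)

    loss_pos = -F.logsigmoid(score(i_pos, j_pos) - score(i_pos, k_pos)).mean()
    loss_neg = -F.logsigmoid(score(i_neg, k_neg) - score(i_neg, r_neg)).mean()
    return loss_pos + loss_neg
```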
Putting the encoder and decoder together, we can write the objective function of DVE as follows (note that we take the expectation form here for scaled loss values instead of summation):

$$\min_{\phi_s, \phi_t} \mathcal{L}_{DVE} = -\mathbb{E}_{(i,j,k) \sim P(\mathcal{E})} \ln \sigma(Z_{s,i}^\top Z_{t,j} - Z_{s,i}^\top Z_{t,k}) - \mathbb{E}_{(i,k,r) \sim P(\mathcal{E})} \ln \sigma(Z_{s,i}^\top Z_{t,k} - Z_{s,i}^\top Z_{t,r})$$
$$+ \frac{1}{N} \sum_{i=1}^{N} \big\{ D_{KL}[q_{\phi_s^p}(Z_{s,i}^p|A_p, X) \,\|\, p_\theta(Z_s^p)] + D_{KL}[q_{\phi_s^n}(Z_{s,i}^n|A_n, X) \,\|\, p_\theta(Z_s^n)] \big\}$$
$$+ \frac{1}{N} \sum_{i=1}^{N} \big\{ D_{KL}[q_{\phi_t^p}(Z_{t,i}^p|A_p, X) \,\|\, p_\theta(Z_t^p)] + D_{KL}[q_{\phi_t^n}(Z_{t,i}^n|A_n, X) \,\|\, p_\theta(Z_t^n)] \big\} \qquad (23)$$

where $\phi_s = \{\phi_s^p, \phi_s^n\} = \{W_{s,\mu/\sigma}^{p,l}, W_{s,\mu/\sigma}^{n,l}\}$ is the parameter set of the source node encoder and $\phi_t = \{\phi_t^p, \phi_t^n\} = \{W_{t,\mu/\sigma}^{p,l}, W_{t,\mu/\sigma}^{n,l}\}$ that of the target node encoder. The source node embeddings and target node embeddings are respectively denoted as $Z_s = Z_s^p \oplus Z_s^n$ and $Z_t = Z_t^p \oplus Z_t^n$. All priors $p_\theta(Z_s^p)$, $p_\theta(Z_s^n)$, $p_\theta(Z_t^p)$ and $p_\theta(Z_t^n)$ are standard Gaussian distributions. In DVE, the matrix multiplication is conducted between a sparse adjacency matrix and a dense matrix, e.g., in Eqs. 15 and 16, which can be implemented with high efficiency in recent deep learning frameworks. For each positive link $e_{i \to j}$ and negative link $e_{i \to r}$, we randomly sample $n_{noise}$ non-linked nodes to play the role of $k$ and construct the training triplets $(i, j, k)$ and $(i, k, r)$. We adopt the Dropout technique for regularization rather than the $L_2$ norm. Many widely used optimization algorithms such as RMSProp can be applied for model learning; a sketch of the combined objective is given below.
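Because all priors are standard Gaussians, each KL term in Eq. 23 has the usual closed form, so the full objective can be sketched as below (our own assembly of the pieces above; the 1/N scaling of Eq. 23 corresponds to averaging over nodes):

```python
import torch

def gaussian_kl(mu, log_sig):
    """KL[N(mu, sigma^2) || N(0, I)] averaged over nodes (the 1/N sums in Eq. 23)."""
    kl_per_node = 0.5 * torch.sum(
        mu.pow(2) + torch.exp(2.0 * log_sig) - 2.0 * log_sig - 1.0, dim=1)
    return kl_per_node.mean()

def dve_loss(bpwr, source_stats, target_stats):
    """Eq. 23: the BPWR reconstruction term plus the four KL regularizers."""
    kl = sum(gaussian_kl(mu, log_sig)
             for mu, log_sig in (*source_stats, *target_stats))
    return bpwr + kl
```

Here `source_stats` and `target_stats` would each hold the two $(\mu, \log\sigma)$ pairs of the positive-graph and negative-graph branches returned by the encoder sketch above.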
There are both differences and connections between DVE and existing methods. A key difference is that DVE integrally captures both the first-order and high-order topology of signed directed networks, whereas most existing methods [22, 62, 63] mainly focus on modeling the first-order topology. Some works [26, 68] are based on random walks, but they fail to capture the high-order topology since they apply homophily effects on different link signs. Meanwhile, there are connections in terms of first-order topology modeling: both DVE and existing methods model the first-order topology with respect to the signed directed links. Existing methods work with restrictive distance metrics and usually ignore the mediator function of non-existent links; instead, BPWR of DVE, working in a personalized ranking scheme, has more potential to capture the closeness relationships of nodes. Besides, it is worthwhile to point out that the margin ranking (MR) loss in SNEA [62] and the BPWR loss in DVE have similar targets, both being developed from the Extended Structural Balance Theory. However, MR is a deterministic non-smooth metric, while BPWR, derived from maximizing the posterior of the observations, is smooth and easy to optimize with gradient based techniques [46]. The superior performance of BPWR over SNEA-MR is verified in Section 5.2.

Stochastic training of DNN methods involves two steps: the forward and backward computations. DVE supports mini-batch training, and its time cost lies in the decoupled variational encoder and the structure decoder. We thus decompose the time complexity of DVE into these two parts. In each batch of
DVE, the decoupled variational encoder learns node embeddings for all nodes. Thus, following the analysis of GCN in Wu et al. [65], the time complexity of the decoupled variational encoder is $O(2|\mathcal{E}_p| + 2|\mathcal{E}_n|) = O(2|\mathcal{E}|)$, where $|\mathcal{E}_p|$, $|\mathcal{E}_n|$ and $|\mathcal{E}|$ denote the numbers of edges of the undirected positive graph, the undirected negative graph and the signed directed graph, respectively. Note that the source node encoder and the target node encoder can be run in parallel; in this case, the time complexity of the decoupled variational encoder reduces to $O(|\mathcal{E}_p| + |\mathcal{E}_n|) = O(|\mathcal{E}|)$. Furthermore, the graph convolutions on $A_p$ and $A_n$ to learn $Z_s$ and $Z_t$ could also be parallelized, which reduces the time complexity to $O(\max\{|\mathcal{E}_p|, |\mathcal{E}_n|\})$. As for the structure decoder, in each batch we compute the BPWR loss with non-existent link sampling; denoting the sampling size as $n_{noise}$ and the batch size of training positive/negative links as $B$, the time complexity is $O(n_{noise} B)$.

In summary, the per-batch time complexity of the non-parallel DVE is $O(2|\mathcal{E}| + n_{noise} B)$, that of the half-parallel counterpart is $O(|\mathcal{E}| + n_{noise} B)$, and that of the quarter-parallel counterpart is $O(\max\{|\mathcal{E}_p|, |\mathcal{E}_n|\} + n_{noise} B)$. Generally, the main time cost lies in the decoupled variational encoder, since we usually have $2|\mathcal{E}| > |\mathcal{E}| > \max\{|\mathcal{E}_p|, |\mathcal{E}_n|\} \gg n_{noise} B$. The time complexity of the decoupled variational encoder actually depends on the specific graph convolutional network. For datasets where $\max\{|\mathcal{E}_p|, |\mathcal{E}_n|\}$ is too large, the $O(\max\{|\mathcal{E}_p|, |\mathcal{E}_n|\})$ cost per batch may still be time-consuming; this can be addressed by using other kinds of GCN [8, 9, 19] that reduce it to the scale of the training batch size $B$, which makes DVE scalable to much larger datasets. We do not explore this further here, since we mainly focus on the general idea of variational auto-encoding to capture both the first-order and high-order topological information of signed directed networks.

5 EXPERIMENTS

In this section, we conduct experiments on three widely used datasets. Both the link sign prediction task and the node recommendation task are performed to verify the effectiveness of DVE. Further ablation studies and qualitative analysis provide a deeper understanding of DVE.
Dataset Description. We conduct the experiments on three widely used real-world datasets. Epinions is a popular product review site in which users can create both trust (positive) and distrust (negative) links to others (https://snap.stanford.edu/data/soc-sign-epinions.html). Slashdot is a technology news platform where users can create friend (positive) and foe (negative) links to others (https://snap.stanford.edu/data/soc-sign-Slashdot090216.html). Wiki is a dataset collected from the Wikipedia site, where users vote for or against other users to determine promotion to administrator (https://snap.stanford.edu/data/wiki-RfA.html). For each dataset, we randomly sample a subset of links as our experimental dataset, and we filter out users who have no links with others. The statistics of the processed data are shown in Table 2. From the table, it is obvious that both the undirected positive graph and the undirected negative graph are very sparse.

Table 2. The statistics of Epinions, Slashdot and Wiki utilized in our experiments. (Only the header row, Dataset: Epinions / Slashdot / Wiki, survives; the numeric statistics were lost in extraction.)

Baselines. We compare DVE with 9 competitive baselines.

• LINE [53]: LINE defines loss functions to preserve the first-order or second-order proximity between nodes in a graph. We only apply LINE to the positive links since it does not work on signed graphs, and we report the performance of LINE's first-order proximity, which usually performs better than the second-order one.
• MF [22]: Matrix factorization is a popular technique for network embedding. We perform MF with the same noise sampling method as DVE to learn low-dimensional node embeddings for signed directed networks.
• SNE [68]: This method develops a log-bilinear model with random walks to learn low-dimensional node embeddings for signed networks. On signed directed networks, we apply directed random walks for SNE.
• SiNE [63]: SiNE is a deep neural network method that makes a distinction between positively linked nodes and negatively linked nodes. It is capable of capturing the non-linear patterns in signed directed networks.
• SIDE [26]: SIDE is a random walk based method, which formulates social balance theory into a likelihood for signed directed networks.
• SNEA-MR [62]: SNEA is a method for attributed signed social networks. Considering that the margin ranking (MR) loss in SNEA (Eq. 5 in the original paper) is also based on the Extended Structural Balance Theory, we extend it here as a baseline for comparison with BPWR.
• BPWR (Ours): As the structure decoder of DVE, the Balance Pair-wise Ranking loss can serve as an independent model and as a comparison to the losses in SiNE and SNEA-MR.
• SLVE (Ours): SLVE substitutes the decoupled variational encoder in DVE with a non-decoupled one by leveraging the signed Laplacian matrix [16].
• DE (Ours): DE is the non-variational variant of the DVE method.
Parameter Settings. For each baseline, we follow the parameter settings in their papers or codes. The batch size is 1000 for all methods. For our model, we set the training epoch number to 200 and the number of GCN layers to $l = 2$. The dropout probability is 0.2 and the learning rate is 0.01. The RMSProp optimizer [54] is adopted to optimize our objective function. According to the training loss, the size of the randomly sampled noise ($e_{i \to k} = 0$) is set to 5 for Epinions and 20 for Slashdot and Wiki. The embedding sizes are $d = 128$ and $d = 64$ on all three datasets. We randomly split each dataset into 80% training data and 20% test data. For every model, we conduct 10 runs and report the averaged best performance on the test set as the model performance. These settings are collected in the sketch below.
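For reference, the reported hyperparameters can be gathered into one configuration; the key names below are illustrative, not from the released code:

```python
# Hyperparameters as reported above; key names are our own.
CONFIG = {
    "batch_size": 1000,
    "epochs": 200,
    "gcn_layers": 2,
    "dropout": 0.2,
    "learning_rate": 0.01,
    "optimizer": "RMSProp",
    "n_noise": {"Epinions": 5, "Slashdot": 20, "Wiki": 20},
    "embedding_dims": (128, 64),      # reported as d = 128 and d = 64
    "train_test_split": (0.8, 0.2),
    "runs": 10,
}
```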
Link Sign Prediction. We first compare the model performance on the link sign prediction task, which aims to predict the unobserved signs of existing links. Following the evaluation protocols of existing works [63, 68], we train a binary classifier, a two-layer MLP with ReLU as the non-linear function. We use the signed links from the model training stage as the training data for the binary classifier and predict the signs of the test links. More specifically, we concatenate two node embeddings as the link representation and take the link representation as the input of the binary classifier. Due to the unbalanced signs in the test links, AUC and F1 are adopted to assess the performance, and we consider sign +1 as the positive class. A sketch of this protocol is given below.
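A sketch of this evaluation protocol using scikit-learn (the helper and its exact classifier configuration are our assumptions; the paper only specifies a two-layer MLP with ReLU):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score, f1_score

def evaluate_link_sign(Z, train_edges, test_edges):
    """Train an MLP on concatenated node embeddings and report AUC / F1.

    Each edge is (u, v, sign) with sign in {+1, -1}; sign +1 is the positive class.
    """
    def featurize(edges):
        X = np.hstack([Z[edges[:, 0]], Z[edges[:, 1]]])   # link representation
        y = (edges[:, 2] > 0).astype(int)
        return X, y

    X_tr, y_tr = featurize(train_edges)
    X_te, y_te = featurize(test_edges)
    clf = MLPClassifier(hidden_layer_sizes=(64,), activation="relu")
    clf.fit(X_tr, y_tr)
    prob = clf.predict_proba(X_te)[:, 1]
    pred = (prob > 0.5).astype(int)
    return roc_auc_score(y_te, prob), f1_score(y_te, pred)
```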
The results are shown in Table 3.

Table 3. Link sign prediction performance (AUC and F1 on Epinions, Slashdot and Wiki). Names with * refer to our methods. Compared to SiNE, the absolute improvement percentages of DE and DVE are given. Compared to DVE, t-test results of the other baselines are shown: ‡ means p-value < 0.01, † means p-value < 0.05 and − means p-value > 0.05. (Most numeric entries were lost in extraction; the surviving Epinions AUC values are: LINE 0.906, MF 0.934, SNE 0.952, SiNE 0.929, SIDE 0.807, SNEA-MR 0.864.)

From this table, we summarize that:

• The proposed DVE outperforms recent competitive methods and reaches the best performance. For example, regarding AUC on Slashdot, DVE obtains a 3.5% improvement over SiNE and a 3.6% improvement over SNE. This verifies that DVE learns more representative node embeddings for signed directed networks.
• Comparing BPWR with the other baselines (SiNE, SNE, MF), BPWR outperforms them on AUC and F1 on all three datasets. This exposes the deficiencies of the baselines in mining the first-order topology. Developed from the Extended Structural Balance Theory, BPWR, working in a personalized pair-wise ranking manner, is more capable of mining the closeness relationships among nodes. It is worthwhile to point out that although SNEA-MR and BPWR are both developed from the Extended Structural Balance Theory and have a similar training goal, the objective of SNEA-MR is not smooth, while BPWR, based on maximizing the posterior of signed directed links, is smooth and easy to optimize with gradient based methods. The comparison between SNEA-MR and BPWR also verifies the effectiveness of BPWR.
• Modeling the high-order topology facilitates learning more representative node embeddings in signed directed networks. Comparing DVE and DE with BPWR: DVE and DE, in an auto-encoder formulation, model both the first-order and the high-order topology, whereas BPWR as an independent model can only capture the first-order topology. The gap between DVE/DE and BPWR is more obvious in the following node recommendation task.
• DVE always outperforms SLVE, which indicates the importance of our decoupling idea. In particular, SLVE is the non-decoupled variant of DVE obtained by applying the signed Laplacian matrix [16] in GCN, so the only difference between SLVE and DVE is the encoder part. From the comparison between SLVE and BPWR, we can see that SLVE even damages its own decoder's (BPWR) performance. This highlights the necessity of applying distinctive effects to different types of links in signed directed networks.
• Regarding DVE and DE: DVE models the uncertainty of node embeddings in signed directed networks and performs better on Slashdot and Wiki than the non-variational DE. A more informative prior matching the complex data, rather than a standard Gaussian, would be better for variational inference; thus, if provided with a more proper prior, the advantages of modeling uncertainty are expected to be more obvious.
Table 4. Node recommendation performance on Epinions. Names with ∗ refer to our methods. The metricsfor this task are Recall @ k and Precision @ k . We pick k=10,20,50 here. Compared to SiNE, the absoluteimprovement percentage of DVE is given in the table. Compared to DVE, the t-test results of other baselinesare as well shown in the table. ‡ means p-value<0.01, † indicates p-value<0.05 and − means p-value>0.05. Dataset EpinionsMethods R@10 R@20 R@50 P@10 P@20 P@50LINE 0.004 ‡ ‡ ‡ ‡ ‡ ‡ MF 0.022 ‡ ‡ ‡ ‡ ‡ ‡ SNE 0.002 ‡ ‡ ‡ ‡ ‡ ‡ SiNE 0.027 ‡ ‡ ‡ ‡ ‡ ‡ SIDE 5.5e-4 ‡ ‡ ‡ ‡ ‡ ‡ SNEA-MR 0.024 ‡ ‡ ‡ ‡ ‡ ‡ BPWR ∗ ‡ ‡ ‡ ‡ ‡ ‡ SLVE ∗ ‡ ‡ ‡ ‡ ‡ ‡ DE ∗ ‡ ‡ ‡ ‡ ‡ ‡ DVE ∗ Table 5. Node recommendation performance on Slashdot. Names with ∗ refer to our methods. The metricsfor this task are Recall @ k and Precision @ k . We pick k=10,20,50 here. Compared to SiNE, the absoluteimprovement percentage of DVE is given in the table. Compared to DVE, the t-test results of other baselinesare as well shown in the table. ‡ means p-value<0.01, † indicates p-value<0.05 and − means p-value>0.05. Dataset SlashdotMethods R@10 R@20 R@50 P@10 P@20 P@50LINE 0.011 ‡ ‡ ‡ ‡ ‡ ‡ MF 0.015 ‡ ‡ ‡ ‡ ‡ ‡ SNE 0.002 ‡ ‡ ‡ ‡ ‡ ‡ SiNE 0.052 ‡ ‡ ‡ ‡ ‡ ‡ SIDE 8.2e-4 ‡ ‡ ‡ ‡ ‡ ‡ SNEA-MR 0.005 ‡ ‡ ‡ ‡ ‡ ‡ BPWR ∗ ‡ ‡ ‡ ‡ ‡ ‡ SLVE ∗ ‡ ‡ ‡ ‡ ‡ ‡ DE ∗ ‡ ‡ ‡ ‡ ‡ ‡ DVE ∗ • Regarding DVE and DE, DVE models the uncertainty of node embeddings in signed directednetworks. DVE performs better on Slashdot and Wiki compared to the non-variational DE. Amore informative prior matching the complex data rather than standard Gaussian will bebetter for variational inference. Thus, if provided with a more proper prior, the advantagesof modeling uncertainty are expected to be more obvious.
Node Recommendation. Another practical application of network embedding in signed directed networks is node recommendation, which corresponds to friend recommendation in social media. We thus conduct the node recommendation task here to investigate the quality of the learned node embeddings. In particular, for a specific node, we recommend the nodes that have a high probability of building positive directed links with it. For example, denoting i as the source node, we want to recommend a target
node list J_i = [j_1, j_2, ..., j_k], ranked according to the predicted scores of building positive links, where k is the cut-off number. Specifically, we use the learned embeddings and compute the prediction scores with the trained model on the test nodes. We take Recall@k and Precision@k as the evaluation metrics (a minimal sketch of these metrics follows the observations below). The results are shown in Tables 4, 5 and 6.

Table 6. Node recommendation performance on Wiki. Names with * refer to our methods. The metrics for this task are Recall@k and Precision@k; we pick k = 10, 20, 50. Compared to SiNE, the absolute improvement percentage of DVE is given in the table. Compared to DVE, the t-test results of the other baselines are also shown. ‡ means p-value < 0.01, † indicates p-value < 0.05 and − means p-value > 0.05.

Method   | R@10    | R@20 | R@50 | P@10 | P@20 | P@50
LINE     | 0.037 ‡ | ‡    | ‡    | ‡    | ‡    | ‡
MF       | 0.011 ‡ | ‡    | ‡    | ‡    | ‡    | ‡
SNE      | 0.002 ‡ | ‡    | ‡    | ‡    | ‡    | ‡
SiNE     | 0.033 ‡ | ‡    | ‡    | ‡    | ‡    | ‡
SIDE     | 0.001 ‡ | ‡    | ‡    | ‡    | ‡    | ‡
SNEA-MR  | 0.002 ‡ | ‡    | ‡    | ‡    | ‡    | ‡
BPWR*    | ‡       | ‡    | ‡    | ‡    | ‡    | ‡
SLVE*    | ‡       | ‡    | ‡    | ‡    | ‡    | ‡
DE*      | ‡       | ‡    | ‡    | ‡    | ‡    | ‡
DVE*     |         |      |      |      |      |

From these tables, we have the following observations:
• DVE outperforms the other baselines on Recall@k and Precision@k on all datasets. Compared to SiNE on Recall@50, DVE even reaches a 2.6% improvement on Epinions, a 2.7% improvement on Slashdot and a 6.8% increase on Wiki. Compared to the baseline methods that ignore the high-order topology, DVE integrally extracts both the first-order and high-order topology, and learns more representative node embeddings for signed directed networks.
• Compared with the other baselines (SiNE, SNE, MF), BPWR is better at mining the relative closeness relationships among nodes. For example, SiNE defines a limited distance metric that considers only signed links; in contrast, BPWR also mines the mediator value of non-existent links. Furthermore, compared to SNEA-MR, BPWR in its personalized ranking formulation has a smooth objective function and can be easily optimized by gradient-based algorithms.
To investigate whether the improvement of our method is statistically significant, we further conduct t-tests with 10 runs for each setting. Results are shown in Tables 3, 4, 5 and 6. From these tables, it is clear that the improvement of DVE is statistically significant with p-value < 0.01 in comparison with the baseline methods. The statistical improvement of DVE over BPWR indicates the importance of the high-order topological information extracted by the decoupled variational encoder. Note that on the link sign prediction task for the Epinions dataset, the p-value between DVE and DE is larger than 0.05, which indicates the improvement there is not statistically significant, while for the other two datasets we reach the opposite conclusion. This is because DE is the non-variational variant of DVE, and whether the variational version performs better depends on the data distribution and the prior distribution, which does not conflict with our main idea.
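For reference, here is a minimal sketch of the two metrics for a single source node; the argument names and the score-array layout are illustrative assumptions.

```python
import numpy as np

def recall_precision_at_k(scores, positives, k=50):
    """Recall@k and Precision@k for one source node.

    scores: predicted score for every candidate target node (1-D array);
    positives: set of target indices with a true positive link.
    """
    top_k = np.argsort(-scores)[:k]          # k highest-scoring candidates
    hits = len(set(top_k.tolist()) & positives)
    recall = hits / max(len(positives), 1)   # fraction of true links recovered
    precision = hits / k                     # fraction of recommendations that hit
    return recall, precision

# Example: 5 candidate targets, true positive links to nodes {0, 3}
r, p = recall_precision_at_k(np.array([0.9, 0.1, 0.4, 0.8, 0.2]), {0, 3}, k=2)
print(r, p)  # 1.0 1.0
```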
The Effect of Sparse Training Data. We investigate the effect of sparse training data on model performance. In particular, we vary the ratio of the 80% training data used as the new training set and keep the 20% test data fixed. Results are shown in Figure 4 (a minimal sketch of this subsampling protocol follows the observations below). From this figure, we see that:
• Generally, the performance of each model decreases with the decline of training data, which indicates that sparse data deteriorates model performance. Meanwhile, DVE consistently reaches the best performance in most sparse cases, which shows that DVE adapts better than the baseline methods.
• For the node recommendation task in Figure 4 (d)(e)(f), both MF and SNE perform worse than the other methods (e.g., SiNE, BPWR) in most cases, because MF and SNE are not trained with ranking-based losses. In contrast, SiNE and BPWR are both based on ranking losses, which is more advantageous for the node recommendation task.
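For completeness, a minimal sketch of the subsampling protocol described above; the function and argument names are illustrative assumptions.

```python
import numpy as np

def subsample_train(train_edges, ratio, seed=0):
    """Keep a random `ratio` fraction of the 80% training edges;
    the 20% test split stays fixed elsewhere."""
    rng = np.random.default_rng(seed)
    m = len(train_edges)
    keep = rng.choice(m, size=int(ratio * m), replace=False)
    return train_edges[keep]
```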
[Figure 4: line plots; x-axis: x% of train data (20 to 100); curves: MF, SNE, SiNE, BPWR, SLVE, DE, DVE.]
Fig. 4. Comparison of methods with different training data on Epinions, Slashdot and Wiki for two tasks. AUC in (a)(b)(c) is the metric for the link sign prediction task and Recall@50 in (d)(e)(f) is the metric for the node recommendation task.
The Effect of Different Latent Dimensions. The latent dimension of the embeddings is an important factor in the performance of network embedding methods. We thus conduct an experiment to investigate the effect of different latent dimensions, varying them in [16, 32, 64, 128, 256]. The results are shown in Figure 5. From this figure, we have the following observations:
• The proposed methods (BPWR, DE and DVE) consistently outperform the other baselines across latent dimensions. DVE achieves the best performance in most cases, because it is the only method that simultaneously captures both the first-order and high-order topological information in signed directed networks.
• It is worth noticing that DVE tends to reach its best performance at a higher dimension than SiNE. Since DVE considers both the high-order and the first-order topology, it requires a higher dimension to encode the additional information. For the baselines that only consider the first-order topology, when the dimension is high, the information in the learned embeddings tends to become redundant, which yields unsatisfying performance on the test set.
[Figure 5: line plots; x-axis: latent dimension (16, 32, 64, 128, 256); curves: MF, SNE, SiNE, BPWR, SLVE, DE, DVE.]
Fig. 5. Comparison of methods with different latent dimensions on Epinions, Slashdot and Wiki for two tasks. AUC in (a)(c)(e) is the metric for the link sign prediction task and Recall@50 in (b)(d)(f) is the metric for the node recommendation task.
Empirical Running Time Analysis. To investigate the time complexity, we conduct an experiment comparing the empirical running time per epoch of the different methods. In particular, we set the number of training batches to 1,000 for all methods. For the baselines (MF, SNE, SiNE, SIDE), we follow the hyper-parameter settings in the source codes provided by the authors. All these methods are implemented with deep learning frameworks such as PyTorch,
TensorFlow or Theano. For DVE, we implement it with TensorFlow, and the sampling size n_noise of non-existent links is 5, 5 and 20 on Epinions, Slashdot and Wiki, respectively. Since DVE has parallel versions, we denote DVE-N as the non-parallel one, DVE-H as the half-parallel one and DVE-Q as the quarter-parallel one. We run the experiments 10 times on the same machine with one Nvidia 1080 GPU. The mean running time per epoch is reported in Figure 6.

[Figure 6: bar charts of running time (s) per epoch for MF, SNE, SiNE, SIDE, BPWR, DVE-N, DVE-H and DVE-Q on (a) Epinions, (b) Slashdot and (c) Wiki.]
Fig. 6. The empirical running time per epoch of different methods. In this figure, DVE-N indicates the non-parallel DVE, DVE-H the half-parallel one and DVE-Q the quarter-parallel one.

We see that:
• MF costs the least time because of its simple scheme. SNE and SIDE, which involve the softmax operation, consume much more time than the other methods. The proposed DVE in its non-parallel version (DVE-N) generally costs more time than MF and SiNE, because DVE does more work to capture both the high-order and first-order topological information.
• DVE-H and DVE-Q take much less time than DVE-N, and are even faster than SiNE in some cases, while DVE provides better performance than SiNE. In summary, the decoupling idea helps accelerate the training process as well as learn more representative node embeddings for signed directed networks.

Table 7. Different generative functions for f_s. In this table, [·, ·] denotes the concatenation operation and W_C ∈ R^{2d×d} is the weight of the MLP for concatenation; ⊙ denotes the element-wise product and W_E ∈ R^{d×d} is the weight of the MLP for the element-wise product.

type                     | formula
concat                   | Z_s = [Z_s^p, Z_s^n]
concat+MLP               | Z_s = ([Z_s^p, Z_s^n]) W_C
element-wise product     | Z_s = Z_s^p ⊙ Z_s^n
element-wise product+MLP | Z_s = (Z_s^p ⊙ Z_s^n) W_E
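To make the variants in Table 7 concrete, here is a minimal NumPy sketch of the four generative functions; the array names, shapes and random initializations are illustrative assumptions, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                        # number of nodes, latent dimension
Zp = rng.normal(size=(n, d))       # embeddings from the positive-link encoder
Zn = rng.normal(size=(n, d))       # embeddings from the negative-link encoder
W_C = rng.normal(size=(2 * d, d))  # MLP weight for the concatenation variant
W_E = rng.normal(size=(d, d))      # MLP weight for the element-wise variant

Z_concat     = np.hstack([Zp, Zn])        # [Z_s^p, Z_s^n]        -> (n, 2d)
Z_concat_mlp = np.hstack([Zp, Zn]) @ W_C  # ([Z_s^p, Z_s^n]) W_C  -> (n, d)
Z_elem       = Zp * Zn                    # Z_s^p ⊙ Z_s^n         -> (n, d)
Z_elem_mlp   = (Zp * Zn) @ W_E            # (Z_s^p ⊙ Z_s^n) W_E   -> (n, d)
```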
The Effect of Different Generative Functions. Recall that we assume the source node embeddings Z_s can be generated through Z_s = f_s(Z_s^p, Z_s^n), where f_s is the generative function. To explore the influence of different generative functions, we conduct an experiment with the functions defined in Table 7. Note that we only use f_s as an example to illustrate the experimental setting here; the target node representation Z_t has a similar formulation with notations Z_t^p, Z_t^n and f_t. The results are shown in Figure 7.

[Figure 7: bar charts over the interaction styles concat, concat+MLP, element-wise product and element-wise product+MLP.]
Fig. 7. Performance of DVE with different generative functions. AUC in (a)(b)(c) refers to the metric for the link sign prediction task. Recall@50 in (d)(e)(f) is the metric for the node recommendation task.

From Figure 7, we observe two interesting phenomena: 1) concatenation does better than concatenation+MLP, and element-wise product does better than element-wise product+MLP; 2) concatenation performs better than element-wise product, and concatenation+MLP performs better than element-wise product+MLP. The main reason for the first phenomenon may be that the additional trainable parameters lead to over-fitting on the sparse graph data. The second phenomenon arises because Z_s^p and Z_s^n are learned with distinctive deep neural networks, which means they live in different latent spaces; the aligned element-wise product may therefore lose information when representing the source node embeddings Z_s. Hence the concatenation operation tends to be the most suitable choice among them in terms of both effectiveness and ease of implementation.

Hyper-Parameter Sensitivity. In DVE, the two hyper-parameters are the number of GCN layers n_GCN and the noise sampling size n_noise. n_GCN controls the order of a node's local structures and n_noise controls the sampling size of non-existent links. We here investigate the model's sensitivity to these two hyper-parameters. The results are shown in Figure 8, from which we have the following observations:
• DVE achieves its best performance on the different datasets with a small n_GCN, which indicates that most useful topological information lies within low-order neighborhoods. In contrast, the best noise sampling size n_noise varies a lot across datasets, which means n_noise is better chosen per dataset.
• From Figure 8 (a)(b)(c) for the link sign prediction task, we see that the link sign prediction performance does not change much under different n_GCN and n_noise. However, as shown in Figure 8 (d)(e)(f), the node recommendation performance changes noticeably under different n_GCN and n_noise.
[Figure 8: line plots versus the number of GCN layers, with curves for n_noise = 5, 10, 15, 20.]
Fig. 8. Performance of DVE with different parameter settings. AUC in (a)(b)(c) is the metric for the link sign prediction task and Recall@50 in (d)(e)(f) is the metric for the node recommendation task.

[Figure 9: test AUC and test Recall@50 versus training epoch, with curves for dropout rates 0.0, 0.2, 0.4, 0.6.]
Fig. 9. Performance of DVE with different dropout rates. Dropout rate = 0.0 means the dropout keep probability is 1.0 during training. AUC in (a)(b)(c) is the metric for the link sign prediction task and Recall@50 in (d)(e)(f) is the metric for the node recommendation task.
Dropout Regularization. In our method, we apply dropout for regularization. To study its influence, we investigate the model performance with different dropout rates along the training process. The results are shown in Figure 9. From Figure 9, we can see that different dropout rates have different influences on different datasets. In Figure 9 (a)(d) for Epinions, it is obvious that DVE reaches its best performance when the dropout rate is 0.0, i.e., a dropout keep probability of 1.0 is better for Epinions. The reason may be that the data distribution is complex and no dropout encourages the model to fit the data better. In contrast, in Figure 9 (b)(e) for Slashdot and Figure 9 (c)(f) for Wiki, a dropout rate of 0.2 facilitates better performance, which indicates that DVE needs some regularization on these two datasets to avoid over-fitting.

[Figure 10: t-SNE scatter plots for (a) MF, (b) SNE, (c) SiNE, (d) BPWR, (e) SLVE, (f) DVE.]
Fig. 10. t-SNE visualization of topology preservation in Epinions. The node in red is the sampled central node i as source node. The nodes in blue are the positively linked neighbors N_p(i) and the nodes in green are the negatively linked neighbors N_n(i), while the yellow ones are randomly sampled non-linked nodes N_un(i) for the central node i. Both N_p(i) and N_n(i) are target nodes.

Topology Preservation.
Signed directed networks have complex topology patterns with some obvious topological characteristics. If we denote i as a source node, N_p(i) as its positively linked target neighbors, N_n(i) as its negatively linked target neighbors and N_un(i) as its non-linked neighbors, the topology of signed directed networks exhibits several characteristics:
• N_p(i), N_n(i) and N_un(i) tend to form three clusters since they play different roles for node i;
• the closeness between N_p(i) and node i tends to be larger than that between N_un(i) and node i;
• the closeness between N_un(i) and node i is larger than that between N_n(i) and node i.
To study whether the learned node embeddings preserve the above characteristics, we conduct a node embedding visualization experiment. In particular, we randomly sample from Epinions a source node i whose number of directly linked neighbors is larger than 100. The positively linked neighbors N_p(i) and negatively linked neighbors N_n(i) are both target nodes. Next, we also randomly sample some non-linked nodes N_un(i). Finally, we visualize the corresponding embeddings with t-SNE [37] for the 6 methods (a minimal sketch of this visualization procedure is given after the observations below). The results are shown in Figure 10. From the figure, we can summarize that:
• DVE has the best visualization performance in terms of well-clustered nodes and a clear closeness pattern among the different types of nodes. For SNE in Figure 10 (b), the closest neighbors of the central node are non-linked nodes and the positively linked nodes are not well clustered. For SiNE in Figure 10 (c), the nodes are not well distributed and some nodes are even impossible to recognize. MF in Figure 10 (a) clusters the positively linked nodes and non-linked nodes well but fails for the negatively linked nodes, and the central node in red lies in the marginal part, which is not reasonable given the actual central-node pattern. Compared to these three competitive baselines, our proposed methods BPWR and DVE in Figure 10 (d) and (f) are capable of learning well-distributed and well-clustered node embeddings: the central node in red is surrounded by the positively linked nodes in blue, and the clear closeness pattern among the different types of nodes matches the characteristics illustrated above. These advantages benefit from modeling both the first-order and high-order topology in signed directed networks.
• DVE models the distinctive influence of message propagation in signed directed networks, and thus yields better topology preservation. Compared to DVE in Figure 10 (f), SLVE in Figure 10 (e) tends to mix positively linked nodes and non-linked nodes. Moreover, its central node in red is falsely positioned between the positively linked nodes and the non-linked nodes. This is because SLVE applies homophily effects to links of different signs, which cannot model the distinctive influence of message propagation. In contrast, DVE with the decoupled variational encoder can learn distinctive effects for different signs and better preserve the network topology.
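A minimal sketch of the visualization procedure, assuming the learned embeddings are a NumPy array and using scikit-learn's t-SNE; the grouping and color scheme follow Figure 10, while function and variable names are illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_neighborhood(emb, center, pos, neg, non):
    """Project a center node and its neighbor groups to 2-D with t-SNE.

    emb: (n, d) learned node embeddings; center: node index;
    pos / neg / non: lists of positively linked, negatively linked and
    non-linked node indices. Assumes enough sampled nodes
    (more than the default t-SNE perplexity of 30).
    """
    idx = [center] + list(pos) + list(neg) + list(non)
    pts = TSNE(n_components=2, random_state=0).fit_transform(emb[idx])
    groups = [(range(1, 1 + len(pos)), "blue", "positive"),
              (range(1 + len(pos), 1 + len(pos) + len(neg)), "green", "negative"),
              (range(1 + len(pos) + len(neg), len(idx)), "yellow", "non-linked")]
    for rows, color, label in groups:
        rows = list(rows)
        plt.scatter(pts[rows, 0], pts[rows, 1], c=color, label=label, s=12)
    plt.scatter(pts[0, 0], pts[0, 1], c="red", label="center", s=40)
    plt.legend()
    plt.show()
```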
Closeness Distribution.
In signed directed networks, positive edges mean trust/friendship while negative edges represent distrust/enmity, and non-existent edges may turn out to be either positive or negative. According to Extended Structural Balance Theory, different types of node pairs pose different closeness distributions, giving the following rules:
• the similarity between positively linked node pairs is expected to be large because of the semantics of positive edges;
• the similarity between negatively linked node pairs should be small due to the negative meaning of negative edges;
• node pairs with non-existent edges have the potential to take either a positive or a negative relation, and should lie in the middle, between positively linked node pairs and negatively linked ones.
We therefore conduct an experiment to investigate whether DVE better preserves this closeness distribution pattern. In particular, we visualize the estimated Probability Density Function (PDF) of different node pairs on Slashdot for the 6 methods: we calculate the cosine similarity of all positively linked node pairs, all negatively linked node pairs and randomly sampled non-linked node pairs, using the learned embeddings of the 6 methods (a minimal sketch of this estimation is given after the observations below). The estimated PDF curves of the cosine similarity are shown in Figure 11. The red, yellow and green curves indicate the estimated PDFs for positively linked, negatively linked and non-linked node pairs, respectively.

[Figure 11: estimated PDF curves of cosine similarity for (a) MF, (b) SNE, (c) SiNE, (d) BPWR, (e) SLVE, (f) DVE.]
Fig. 11. Estimated probability density function of different types of node pairs on Slashdot for 6 methods. The red curve is the estimated PDF of the cosine similarity among positively linked node pairs; similarly, the yellow and green curves denote the estimated PDFs among the negatively linked and non-linked node pairs, respectively.

From this figure, we have the following observations:
• From Figure 11 (a)(b)(c), we can see that the baseline methods MF, SNE and SiNE all exhibit a high overlap among the different curves, especially SNE. This indicates that these methods are not capable of capturing the different closeness distribution patterns of the different node pairs. By contrast, considering the results of BPWR, SLVE and DVE in Figure 11 (d)(e)(f), it is obvious that the three curves show different distributions. Meanwhile, BPWR and DVE follow the closeness rules: positively linked node pairs have the highest cosine similarity, non-linked node pairs come second, and negatively linked ones come last.
• In addition, in Figure 11 (d)(e) for BPWR and SLVE, the estimated PDF of the non-linked node pairs (in green) tends to overlap more with the other curves, which may lead to indistinguishable node embeddings. Instead, DVE in Figure 11 (f) presents distinguishable estimated PDF curves with smaller overlap and an obvious cosine similarity gap among the different kinds of node pairs. This indicates that DVE better preserves the closeness distribution pattern in signed directed networks.
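A minimal sketch of the PDF estimation, using cosine similarity plus a Gaussian kernel density estimate from SciPy; the names and exact KDE settings are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

def pair_cosine(emb, pairs):
    """Cosine similarity for an array of (i, j) node pairs."""
    a, b = emb[pairs[:, 0]], emb[pairs[:, 1]]
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)

def plot_pdfs(emb, pos_pairs, neg_pairs, non_pairs):
    """Kernel-density estimates of the cosine-similarity distributions."""
    xs = np.linspace(-1, 1, 200)
    for pairs, color, label in [(pos_pairs, "red", "positive"),
                                (neg_pairs, "yellow", "negative"),
                                (non_pairs, "green", "non-linked")]:
        sims = pair_cosine(emb, pairs)
        plt.plot(xs, gaussian_kde(sims)(xs), color=color, label=label)
    plt.xlabel("cosine similarity")
    plt.ylabel("estimated PDF")
    plt.legend()
    plt.show()
```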
CONCLUSION

In this paper, we reformulate the representation learning problem on signed directed networks from a variational auto-encoding perspective and further propose a decoupled variational embedding (DVE) method to learn representative node embeddings. DVE is capable of preserving both the first-order and high-order topology of signed directed networks. In particular, DVE consists of a decoupled variational encoder and a structure decoder. The decoupled variational encoder captures local structures and provides informative node embeddings for the structure decoder. Meanwhile, the structure decoder mines the closeness relationships among positive, negative and non-existent links in a pair-wise ranking manner, and also supervises the embedding learning in the encoder
module. Performance on two tasks over three real-world datasets proves the superiority of DVE compared to recent competitive baselines.

Recall that DVE constructs the source node embeddings Z_s only through the limited concatenation of the two latent embeddings Z_s^p and Z_s^n. Observing the imbalance between positive and negative links in the data, the source node embeddings Z_s may follow some distribution induced by Z_s^p and Z_s^n. We will explore how to better model the interaction between Z_s^p and Z_s^n, and construct the source node embeddings Z_s more reasonably to pursue better performance.

ACKNOWLEDGMENTS
This work is supported by the National Key Research and Development Program of China (No. 2019YFB1804304), SHEITC (No. 2018-RGZN-02046), the 111 Plan (No. BP0719010), STCSM (No. 18DZ2270700), and the State Key Laboratory of UHD Video and Audio Production and Presentation.
REFERENCES
[1] Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. 2012. Friendship Prediction and Homophily in Social Media. ACM Trans. Web 6, 2, Article 9 (June 2012), 33 pages. https://doi.org/10.1145/2180861.2180866
[2] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018).
[3] Mikhail Belkin and Partha Niyogi. 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems. 585–591.
[4] Smriti Bhagat, Graham Cormode, and S. Muthukrishnan. 2011. Node classification in social networks. In Social Network Data Analytics. Springer, 115–148.
[5] Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. 2015. Importance weighted autoencoders. arXiv preprint arXiv:1509.00519 (2015).
[6] Fidel Cacheda, Roi Blanco, and Nicola Barbieri. 2018. Characterizing and Predicting Users' Behavior on Local Search Queries. ACM Trans. Web 12, 2, Article 11 (May 2018), 32 pages. https://doi.org/10.1145/3157059
[7] Dorwin Cartwright and Frank Harary. 1956. Structural balance: a generalization of Heider's theory. Psychological Review 63, 5 (1956), 277.
[8] Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247 (2018).
[9] Jianfei Chen, Jun Zhu, and Le Song. 2017. Stochastic training of graph convolutional networks with variance reduction. arXiv preprint arXiv:1710.10568 (2017).
[10] Wenhu Chen, Wenhan Xiong, Xifeng Yan, and William Wang. 2018. Variational Knowledge Graph Reasoning. arXiv preprint arXiv:1803.06581 (2018).
[11] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852.
[12] Tyler Derr, Yao Ma, and Jiliang Tang. 2018. Signed Graph Convolutional Network. arXiv preprint arXiv:1808.06354 (2018).
[13] Carl Doersch. 2016. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908 (2016).
[14] Yuxiao Dong, Jing Zhang, Jie Tang, Nitesh V. Chawla, and Bai Wang. 2015. CoupledLP: Link prediction in coupled networks. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 199–208.
[15] Rossano Gaeta. 2018. A Model of Information Diffusion in Interconnected Online Social Networks. ACM Trans. Web 12, 2, Article 13 (June 2018), 21 pages. https://doi.org/10.1145/3160000
[16] Jean Gallier. 2016. Spectral theory of unsigned and signed graphs. Applications to graph clustering: a survey. arXiv preprint arXiv:1601.04692 (2016).
[17] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. DRAW: A recurrent neural network for image generation. In Proceedings of the 32nd International Conference on Machine Learning. PMLR 37, 1462–1471.
[18] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 855–864.
[19] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024–1034.
[20] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584 (2017).
[21] Ruining He and Julian McAuley. 2016. VBPR: visual Bayesian personalized ranking from implicit feedback. In Thirtieth AAAI Conference on Artificial Intelligence.
[22] Cho-Jui Hsieh, Kai-Yang Chiang, and Inderjit S. Dhillon. 2012. Low rank modeling of signed networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 507–515.
[23] Huangjie Zheng, Jiangchao Yao, Ya Zhang, and Ivor W. Tsang. 2018. Degeneration in VAE: in the Light of Fisher Information Loss. arXiv preprint arXiv:1802.06677 (2018).
[24] Danilo Jimenez Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770 (2015).
[25] Andrej Karpathy et al. 2016. CS231n convolutional neural networks for visual recognition. Neural Networks (2016).
[26] Junghwan Kim, Haekyu Park, Ji-Eun Lee, and U Kang. 2018. SIDE: Representation Learning in Signed Directed Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 509–518.
[27] Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013).
[28] Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, and Richard Zemel. 2018. Neural relational inference for interacting systems. arXiv preprint arXiv:1802.04687 (2018).
[29] Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[30] Thomas N. Kipf and Max Welling. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016).
[31] Jérôme Kunegis, Julia Preusse, and Felix Schwagereit. 2013. What is the added value of negative links in online social networks?. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 727–736.
[32] Jérôme Kunegis, Stephan Schmidt, Andreas Lommatzsch, Jürgen Lerner, Ernesto W. De Luca, and Sahin Albayrak. 2010. Spectral analysis of signed graphs for clustering, prediction and visualization. In Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, 559–570.
[33] Matt J. Kusner, Brooks Paige, and José Miguel Hernández-Lobato. 2017. Grammar variational autoencoder. In Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 1945–1954.
[34] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web. ACM, 641–650.
[35] David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58, 7 (2007), 1019–1031.
[36] Qiang Liu, Shu Wu, and Liang Wang. 2017. DeepStyle: Learning user preferences for visual recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 841–844.
[37] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, Nov (2008), 2579–2605.
[38] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[39] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1105–1114.
[40] Symeon Papadopoulos, Yiannis Kompatsiaris, Athena Vakali, and Ploutarchos Spyridonos. 2012. Community detection in social media. Data Mining and Knowledge Discovery 24, 3 (2012), 515–554.
[41] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 701–710.
[42] Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, and Lawrence Carin. 2016. Variational autoencoder for deep learning of images, labels and captions. In Advances in Neural Information Processing Systems. 2352–2360.
[43] Yi Qian and Sibel Adali. 2013. Extended structural balance theory for modeling trust in social networks. In 2013 Eleventh Annual International Conference on Privacy, Security and Trust (PST). IEEE, 283–290.
[44] Yi Qian and Sibel Adali. 2014. Foundations of Trust and Distrust in Networks: Extended Structural Balance Theory. ACM Trans. Web 8, 3, Article 13 (July 2014), 33 pages. https://doi.org/10.1145/2628438
[45] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 459–467.
[46] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 452–461.
[47] Steffen Rendle and Lars Schmidt-Thieme. 2010. Pairwise interaction tensor factorization for personalized tag recommendation. In Proceedings of the Third ACM International Conference on Web Search and Data Mining. ACM, 81–90.
[48] Danilo Jimenez Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770 (2015).
[49] Tim Salimans, Diederik Kingma, and Max Welling. 2015. Markov chain Monte Carlo and variational inference: Bridging the gap. In International Conference on Machine Learning. 1218–1226.
[50] Xiaobo Shen, Shirui Pan, Weiwei Liu, Yew-Soon Ong, and Quan-Sen Sun. 2018. Discrete network embedding. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, 3549–3555.
[51] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems. 3483–3491.
[52] Jiliang Tang, Shiyu Chang, Charu Aggarwal, and Huan Liu. 2015. Negative link prediction in social media. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 87–96.
[53] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1067–1077.
[54] Tijmen Tieleman and Geoffrey Hinton. 2014. RMSprop gradient optimization. (2014).
[55] Jakub M. Tomczak and Max Welling. 2017. VAE with a VampPrior. arXiv preprint arXiv:1705.07120 (2017).
[56] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
[57] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 1, 2 (2017).
[58] Patricia Victor, Nele Verbiest, Chris Cornelis, and Martine De Cock. 2013. Enhancing the Trust-based Recommendation Process with Explicit Distrust. ACM Trans. Web 7, 2, Article 6 (May 2013), 19 pages. https://doi.org/10.1145/2460383.2460385
[59] Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1225–1234.
[60] Hongwei Wang, Jia Wang, Miao Zhao, Jiannong Cao, and Minyi Guo. 2017. Joint topic-semantic-aware social recommendation for online voting. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management. ACM, 347–356.
[61] Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, and Qi Liu. 2018. SHINE: Signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 592–600.
[62] Suhang Wang, Charu Aggarwal, Jiliang Tang, and Huan Liu. 2017. Attributed signed network embedding. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management. ACM, 137–146.
[63] Suhang Wang, Jiliang Tang, Charu Aggarwal, Yi Chang, and Huan Liu. 2017. Signed network embedding in social media. In Proceedings of the 2017 SIAM International Conference on Data Mining. SIAM, 327–335.
[64] Suhang Wang, Jiliang Tang, Charu Aggarwal, and Huan Liu. 2016. Linked document embedding for classification. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management. ACM, 115–124.
[65] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2019. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596 (2019).
[66] Bingbing Xu, Huawei Shen, Qi Cao, Keting Cen, and Xueqi Cheng. 2019. Graph convolutional networks using heat kernel for semi-supervised learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 1928–1934.
[67] Mingzhang Yin and Mingyuan Zhou. 2018. Semi-implicit variational inference. In International Conference on Machine Learning (2018).
[68] Shuhan Yuan, Xintao Wu, and Yang Xiang. 2017. SNE: signed network embedding. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 183–195.
[69] Xianchao Zhang, Zhaoxing Li, Shaoping Zhu, and Wenxin Liang. 2016. Detecting Spam and Promoting Campaigns in Twitter. ACM Trans. Web 10, 1, Article 4 (Feb. 2016), 28 pages. https://doi.org/10.1145/2846102
[70] Ziwei Zhang, Peng Cui, and Wenwu Zhu. 2018. Deep learning on graphs: A survey. arXiv preprint arXiv:1812.04202 (2018).
[71] Shengjia Zhao, Jiaming Song, and Stefano Ermon. 2017. InfoVAE: Information maximizing variational autoencoders. arXiv preprint arXiv:1706.02262 (2017).
[72] Shengjia Zhao, Jiaming Song, and Stefano Ermon. 2017. Towards deeper understanding of variational autoencoding models. arXiv preprint arXiv:1702.08658 (2017).
[73] Huangjie Zheng, Jiangchao Yao, Ya Zhang, and Ivor W. Tsang. 2018. Degeneration in VAE: in the light of Fisher information loss. arXiv preprint arXiv:1802.06677 (2018).
[74] Huangjie Zheng, Jiangchao Yao, Ya Zhang, Ivor W. Tsang, and Jia Wang. 2019. Understanding VAEs in Fisher-Shannon plane. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5917–5924.
[75] Huangjie Zheng, Jiangchao Yao, Ya Zhang, and Ivor Wai-Hung Tsang. 2019. Understanding VAEs in Fisher-Shannon Plane. In Proceedings of the 33rd Association for the Advancement of Artificial Intelligence.
[76] Chang Zhou, Yuqiong Liu, Xiaofei Liu, Zhongyi Liu, and Jun Gao. 2017. Scalable Graph Embedding for Asymmetric Proximity. In AAAI. 2942–2948.
A DETAILED DERIVATION
The detailed derivation of the ELBO in Eq. 12 is as follows.

\begin{align}
\log p_\theta(\mathcal{E}) &= \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log p_\theta(\mathcal{E}) \, dZ_s \, dZ_t &(24)\\
&= \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log \frac{p_\theta(\mathcal{E}, Z_s, Z_t)}{p_\theta(Z_s, Z_t \mid \mathcal{E})} \, dZ_s \, dZ_t &(25)\\
&= \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log \left[ \frac{p_\theta(\mathcal{E}, Z_s, Z_t)}{q_\phi(Z_s, Z_t \mid \mathcal{E})} \cdot \frac{q_\phi(Z_s, Z_t \mid \mathcal{E})}{p_\theta(Z_s, Z_t \mid \mathcal{E})} \right] dZ_s \, dZ_t &(26)\\
&= \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log \frac{p_\theta(\mathcal{E}, Z_s, Z_t)}{q_\phi(Z_s, Z_t \mid \mathcal{E})} \, dZ_s \, dZ_t + \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log \frac{q_\phi(Z_s, Z_t \mid \mathcal{E})}{p_\theta(Z_s, Z_t \mid \mathcal{E})} \, dZ_s \, dZ_t &(27)\\
&= \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log \frac{p_\theta(\mathcal{E}, Z_s, Z_t)}{q_\phi(Z_s, Z_t \mid \mathcal{E})} \, dZ_s \, dZ_t + D_{KL}\big[ q_\phi(Z_s, Z_t \mid \mathcal{E}) \,\|\, p_\theta(Z_s, Z_t \mid \mathcal{E}) \big] &(28)
\end{align}

We then have the formulation of the ELBO in Eq. 10 as:

\begin{align}
\mathcal{L} &= \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log \frac{p_\theta(\mathcal{E}, Z_s, Z_t)}{q_\phi(Z_s, Z_t \mid \mathcal{E})} \, dZ_s \, dZ_t &(29)\\
&= \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log \frac{p_\theta(Z_s, Z_t)}{q_\phi(Z_s, Z_t \mid \mathcal{E})} \, dZ_s \, dZ_t + \int q_\phi(Z_s, Z_t \mid \mathcal{E}) \log p_\psi(\mathcal{E} \mid Z_s, Z_t) \, dZ_s \, dZ_t &(30)\\
&= -D_{KL}\big[ q_\phi(Z_s, Z_t \mid \mathcal{E}) \,\|\, p_\theta(Z_s, Z_t) \big] + \mathbb{E}_{q_\phi(Z_s, Z_t \mid \mathcal{E})}\big[ \log p_\psi(\mathcal{E} \mid Z_s, Z_t) \big] &(31)
\end{align}

Following the proposition and the prior assumption, we have the ELBO in Eq. 12 as:

\begin{align}
\mathcal{L} &= -D_{KL}\big[ q_\phi(Z_s, Z_t \mid \mathcal{E}) \,\|\, p_\theta(Z_s, Z_t) \big] + \mathbb{E}_{q_\phi(Z_s, Z_t \mid \mathcal{E})}\big[ \log p_\psi(\mathcal{E} \mid Z_s, Z_t) \big] &(32)\\
&= \int q_{\phi_s}(Z_s \mid \mathcal{E}) \, q_{\phi_t}(Z_t \mid \mathcal{E}) \log \frac{p_\theta(Z_s) \, p_\theta(Z_t)}{q_{\phi_s}(Z_s \mid \mathcal{E}) \, q_{\phi_t}(Z_t \mid \mathcal{E})} \, dZ_s \, dZ_t + \mathbb{E}_{q_\phi(Z_s, Z_t \mid \mathcal{E})}\big[ \log p_\psi(\mathcal{E} \mid Z_s, Z_t) \big] &(33)\\
&= -D_{KL}\big[ q_{\phi_s}(Z_s \mid \mathcal{E}) \,\|\, p_\theta(Z_s) \big] - D_{KL}\big[ q_{\phi_t}(Z_t \mid \mathcal{E}) \,\|\, p_\theta(Z_t) \big] + \mathbb{E}_{q_\phi(Z_s, Z_t \mid \mathcal{E})}\big[ \log p_\psi(\mathcal{E} \mid Z_s, Z_t) \big] &(34)
\end{align}
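As a quick numerical sanity check of the step from Eq. 33 to Eq. 34, the following sketch (with illustrative diagonal-Gaussian posteriors and standard-normal priors, which are assumptions for the example only) verifies that the joint KL term factorizes into the two marginal KL terms.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 2
# Illustrative diagonal Gaussians: q_s, q_t (posteriors), standard-normal priors.
mu_s, var_s = rng.normal(size=d), np.exp(rng.normal(size=d))
mu_t, var_t = rng.normal(size=d), np.exp(rng.normal(size=d))

def kl_diag_gauss_vs_std(mu, var):
    # Closed-form KL( N(mu, diag(var)) || N(0, I) )
    return 0.5 * np.sum(var + mu**2 - 1.0 - np.log(var))

# Joint KL via Monte Carlo: q(Z_s, Z_t) = q_s(Z_s) q_t(Z_t), p = p(Z_s) p(Z_t)
zs = mu_s + np.sqrt(var_s) * rng.normal(size=(100000, d))
zt = mu_t + np.sqrt(var_t) * rng.normal(size=(100000, d))
log_q = (multivariate_normal.logpdf(zs, mu_s, np.diag(var_s))
         + multivariate_normal.logpdf(zt, mu_t, np.diag(var_t)))
log_p = (multivariate_normal.logpdf(zs, np.zeros(d), np.eye(d))
         + multivariate_normal.logpdf(zt, np.zeros(d), np.eye(d)))
mc_joint_kl = np.mean(log_q - log_p)

split_kl = kl_diag_gauss_vs_std(mu_s, var_s) + kl_diag_gauss_vs_std(mu_t, var_t)
print(mc_joint_kl, split_kl)  # the two values agree up to Monte Carlo error
```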