Analysis of node2vec random walks on networks

Abstract

Random walks have proven useful for constructing various algorithms to gain information on networks. The node2vec algorithm employs biased random walks to embed nodes into low-dimensional spaces, which can then be used for tasks such as multi-label classification and link prediction. The usefulness of node2vec in these applications is considered to be contingent upon properties of the random walks that the algorithm uses. In the present study, we theoretically and numerically analyze the random walks used by node2vec. The node2vec random walk is a second-order Markov chain. We exploit the mapping of its transition rule to a transition probability matrix among directed edges to analyze the stationary probability, relaxation times, and coalescence time. In particular, we provide multiple lines of evidence that the node2vec random walk accelerates diffusion when its parameters are tuned such that walkers avoid both backtracking and visiting a neighbor of the previously visited node, but not excessively.


Lingqi Meng (a), Naoki Masuda (a,b,∗)
(a) Department of Mathematics, University at Buffalo, State University of New York, Buffalo, NY 14260-2900, USA
(b) Computational and Data-Enabled Science and Engineering Program, University at Buffalo, State University of New York, Buffalo, NY 14260-5030, USA


Keywords: Diffusion, relaxation time, coalescence time, second-order Markov chain, community structure, ring network.

1. Introduction

Random walks on finite networks have been a favorite research topic for decades [1, 2, 3, 4]. Perhaps more importantly, random walks are a core technique for building algorithms to extract useful information from network data. Such applications of random walks include community detection, ranking of nodes and edges, dimension reduction of data, and sampling, to name a few [4, 5]. Many theoretical, computational, and algorithmic studies have employed simple random walks on unweighted networks, in which, by definition, a walker moves to one of its neighbors with equal probability in each time step. However, there are also various other types of random walks, many of which have been fed to random walk algorithms.

The random walks developed for the algorithmic framework called node2vec are one such example [6]. Unlike simple random walks, transitions of node2vec random walkers depend not only on the degree of the currently visited node (or its variant with edge weights) but also on the structure of the local network and the last visited node. Grover and Leskovec proposed node2vec for scalable feature learning on networks, which can be used in tasks such as multi-label classification and link prediction. In node2vec, one can tune the weight of local versus global search of the network by modulating parameter values [6].

To date, not much is known about the behavior of node2vec random walks. Note that, among various properties of random walks, the stationary probability plays a key role in ranking the nodes [4, 7, 8], and the relaxation time affects, for example, the rate of convergence of random-walk algorithms and the quality of community structure [4]. In the present study, we theoretically and numerically examine the node2vec random walks on finite networks. In particular, we provide multiple lines of evidence that diffusion is accelerated when the parameters of node2vec random walks are tuned such that backtracking and visiting the neighbors of the last visited node are suppressed and exploration of the rest of the network, similar to depth-first sampling, is explicitly promoted. This is the case unless the avoidance of local sampling, including backtracking, is excessive.

∗ Corresponding author. Email address: [email protected] (Naoki Masuda). Preprint posted to arXiv [physics.soc-ph].

2. Model

Consider a finite network $G(V, E)$, where $V = \{1, \ldots, N\}$ is a finite set of nodes, $N$ is the number of nodes, and $E = \{(i, j) \mid (i, j) \in V \times V \text{ and } i \neq j\}$ is a set of edges. In the present study, we assume undirected and possibly weighted networks that are free of self-loops and multiple edges, although the node2vec random walks and the formalism developed below are also valid for directed networks. Denote by $v_t$ ($t = 0, 1, \ldots$) the position of a random walker at discrete time $t$. We say that a discrete-time random walk is node2vec if its transition probability $p_{i \to j}(t)$ at time $t$, where $(i, j) \in E$, $v_t = i$, and $v_{t+1} = j$, is given by

$$p_{i \to j}(t) \propto \begin{cases} \alpha w_{ij} & \text{if } v_{t-1} = v_{t+1}, \\ \beta w_{ij} & \text{if } (v_{t-1}, v_{t+1}) \in E, \\ \gamma w_{ij} & \text{if } (v_{t-1}, v_{t+1}) \notin E, \end{cases} \tag{1}$$

where $w_{ij}$ is the weight of edge $(i, j)$, and the symbol $\propto$ means "proportional to" [6]. The normalization is given by $\sum_{j=1}^{N} p_{i \to j}(t) = 1$ for all $i \in V$ and $t > 0$. Variable $\alpha$ represents the propensity for the random walk to backtrack, $\beta$ the weight of reaching a common neighbor of the currently visited node and the node visited in the last step, and $\gamma$ the weight of exploring any of the other nodes. A large $\beta$ value implies an approximate breadth-first sampling, and a large $\gamma$ value implies an approximate depth-first sampling [6]. If $\alpha = \beta = \gamma \neq 0$, the node2vec random walk is reduced to a simple random walk. If $\alpha = 0$ and $\beta = \gamma \neq 0$, the node2vec random walk is a non-backtracking random walk [9, 10]. Possible one-step transitions of the node2vec random walk are schematically shown in Fig. 1.

Figure 1: Schematic of the node2vec random walk. We assume that the network is unweighted. The transition probability to one of the four neighbors at time $t$ in this example is given by $\alpha/(\alpha + \beta + 2\gamma)$, $\beta/(\alpha + \beta + 2\gamma)$, or $\gamma/(\alpha + \beta + 2\gamma)$.

Equation (1) implies that a node2vec random walk is a second-order Markov chain [6]. In other words, the transition probability $p_{i \to j}(t)$ depends on the currently visited node $i$ and the node visited in the last time step (i.e., at time $t - 1$), but not on the further history of the walk.

To transform the node2vec random walk into a first-order Markov chain, we change the state space from the nodes of the network to the directed edges of the network, similar to the formation of memory networks [11, 12]. Let $M$ denote the number of undirected edges. Let $\tilde{E} = \{e_1, \ldots, e_{2M}\}$ be the set of directed edges, which consists of each undirected edge $(u, v) \in E$ duplicated as directed edges $(u, v)$ and $(v, u)$. To keep the notation light, we use $(\cdot, \cdot)$ to represent either an undirected or a directed edge here and in the following text; the meaning is clear from the context. For $e = (u, v) \in \tilde{E}$, we denote $e(0) = u$ and $e(1) = v$. Under this transformation, the $2M \times 2M$ transition probability matrix $T$ is given by

$$T_{i,j} \propto \begin{cases} \alpha w_{e_i(1), e_j(1)} & \text{if } e_i(1) = e_j(0) \text{ and } e_i(0) = e_j(1), \\ \beta w_{e_i(1), e_j(1)} & \text{if } e_i(1) = e_j(0) \text{ and } (e_i(0), e_j(1)) \in E, \\ \gamma w_{e_i(1), e_j(1)} & \text{if } e_i(1) = e_j(0) \text{ and } (e_i(0), e_j(1)) \notin E. \end{cases} \tag{2}$$

The normalization is given by $\sum_{j=1}^{2M} T_{i,j} = 1$ for $i = 1, 2, \ldots, 2M$.
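The construction of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `node2vec_edge_transition_matrix`, the dense-matrix representation, and the small weighted test network `W` are assumptions made here, and the sketch presumes that every row has positive total weight (for instance, $\alpha > 0$ whenever a node has degree 1).

```python
import numpy as np

def node2vec_edge_transition_matrix(W, alpha, beta, gamma):
    """Build the directed-edge transition matrix T of Eq. (2).

    W is a symmetric (N x N) weight matrix with zero diagonal.
    Returns (edges, T), where edges[i] = (u, v) is the i-th directed edge.
    """
    W = np.asarray(W, dtype=float)
    N = W.shape[0]
    # Each undirected edge {u, v} contributes directed edges (u, v) and (v, u).
    edges = [(u, v) for u in range(N) for v in range(N) if W[u, v] > 0]
    index = {e: i for i, e in enumerate(edges)}
    T = np.zeros((len(edges), len(edges)))
    for i, (u, v) in enumerate(edges):      # the walker has just moved u -> v
        for x in range(N):                  # candidate next node
            if W[v, x] <= 0:
                continue
            if x == u:
                bias = alpha                # backtracking
            elif W[u, x] > 0:
                bias = beta                 # common neighbor of u and v
            else:
                bias = gamma                # node not adjacent to u
            T[i, index[(v, x)]] = bias * W[v, x]
        T[i] /= T[i].sum()                  # row normalization
    return edges, T

# Hypothetical test network: a triangle 0-1-2 plus a pendant node 3
# attached to node 2 by an edge of weight 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 2],
              [0, 0, 2, 0]])
edges, T = node2vec_edge_transition_matrix(W, alpha=0.5, beta=2.0, gamma=1.0)
```

Setting `alpha == beta == gamma` recovers the simple random walk, in which the move from $v$ to $x$ has probability $w_{vx}$ divided by the strength of $v$ regardless of the previous node.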

3. Results

3.1. Stationary probability in special cases

We start by briefly reviewing some definitions. A directed network is strongly connected if there exists a directed path from $u$ to $v$ and from $v$ to $u$ for any nodes $u$ and $v$. We say that a network is aperiodic if the greatest common divisor of the lengths of all the closed directed paths is equal to 1. Because most empirical networks are aperiodic unless they are bipartite networks by construction, we assume aperiodicity throughout this paper.

A node2vec random walk on a strongly connected aperiodic finite network with state space $\tilde{E}$ induces a unique positive probability vector $q^* = (q_1^*, \ldots, q_{2M}^*)$, where $q_j^*$ is the stationary probability on directed edge $e_j$ ($j = 1, 2, \ldots, 2M$), such that

$$q^* T = q^*. \tag{3}$$

Denote $p^* = (p_1^*, \ldots, p_N^*)$, where $p_i^*$ is the stationary probability at node $i$ ($i = 1, \ldots, N$). Probability vectors $p^*$ and $q^*$ are related by

$$p_i^* = \sum_{\substack{e_j \in \tilde{E} \\ e_j(1) = i}} q_j^*. \tag{4}$$

In particular, if the network is undirected and the random walk is simple (i.e., $\alpha = \beta = \gamma$), one obtains

$$q^* = \frac{1}{2M}(1, \ldots, 1). \tag{5}$$

Therefore, for a simple random walk on undirected networks, we recover the well-known result given by

$$p_i^* = \frac{d_i}{2M}, \tag{6}$$

where $d_i$ is the weighted degree, which is called the node strength, of node $i$.

We say that a network is simple if it is unweighted, undirected, and free of self-loops and multiple edges. Non-backtracking random walks on a simple finite network with degree $d_i \geq 2$ ($i = 1, \ldots, N$) have the same stationary distribution as the simple random walk [9]. Here we present a slight generalization of this result, stated as follows.

Theorem 1.

For a node2vec random walk on a simple finite network, the stationary distribution is the same as that for the simple random walk if $\beta = \gamma$ and $\alpha > 0$. In other words, it is given by Eq. (5). Therefore, the stationary distribution for nodes is given by Eq. (6).

Proof. Let $\beta = \gamma$. In this case, we do not have to distinguish whether or not edges $(v_{t-1}, v_t)$, $(v_{t-1}, v_{t+1})$, and $(v_t, v_{t+1})$ form a triangle. Therefore, the transition probability matrix is given by

$$T_{i,j} = \begin{cases} \dfrac{\alpha}{\alpha + (d_{e_j(0)} - 1)\beta} & \text{if } e_i(1) = e_j(0) \text{ and } e_i(0) = e_j(1), \\[2ex] \dfrac{\beta}{\alpha + (d_{e_j(0)} - 1)\beta} & \text{if } e_i(1) = e_j(0) \text{ and } e_i(0) \neq e_j(1), \end{cases} \tag{7}$$

and $T_{i,j} = 0$ otherwise. It is straightforward to verify that $T$ has a left eigenvector $\mathbf{1} = (1, \ldots, 1)$ such that $\mathbf{1} T = \mathbf{1}$. Indeed, the column of $T$ indexed by $e_j = (u, v)$ receives $\alpha / (\alpha + (d_u - 1)\beta)$ from the backtracking edge $(v, u)$ and $\beta / (\alpha + (d_u - 1)\beta)$ from each of the other $d_u - 1$ directed edges entering $u$, so that the column sums to 1. Because of the uniqueness of the Perron-Frobenius vector, the stationary distribution is given by Eq. (5). □

We remark that Theorem 1 allows nodes with degree 1.

We now examine how symmetry in the network constrains the stationary distribution of the node2vec random walk. Consider a network $G(V, E)$ and its corresponding adjacency matrix $A$, where $G$ can be directed or undirected, and weighted or unweighted. An automorphism $\pi$ of network $G$ is a permutation of the nodes that preserves the adjacency of the nodes [13, 14, 15, 16]. In other words, automorphism $\pi : V \to V$ is a bijection that satisfies $A_{ij} = A_{\pi(i)\pi(j)}$ for any $i, j = 1, \ldots, N$. Two nodes, denoted by $v$ and $v'$, are said to be automorphically equivalent if there is an automorphism that maps one node to the other, i.e., $\pi(v) = v'$ [13, 14]. A vertex-transitive network is an undirected network in which any pair of nodes is automorphically equivalent [15, 17].

Theorem 2.

If nodes $u$ and $v$ are automorphically equivalent in undirected network $G(V, E)$, then they have the same stationary probability of being visited by a node2vec random walker, i.e., $p_u^* = p_v^*$.

Proof. Let $\pi$ be an automorphism of $G$. Let $\tilde{E} = \{e_1, e_2, \ldots, e_{2M}\}$ be an ordered set of the directed edges in the undirected network $G$, in which each undirected edge $(u, v) \in E$ is duplicated as directed edges $(u, v) \in \tilde{E}$ and $(v, u) \in \tilde{E}$. Define $\phi(\tilde{E}) = \{\phi(e_1), \phi(e_2), \ldots, \phi(e_{2M})\}$, where $\phi(e_i) := (\pi(e_i(0)), \pi(e_i(1)))$ is a directed edge for $i = 1, \ldots, 2M$. Because $\phi(e_i) \in \tilde{E}$ and $\phi(e_i) \neq \phi(e_j)$ if $i \neq j$, set $\phi(\tilde{E})$ is also an ordered set of the directed edges in $G$. Therefore, $\phi$ is a permutation of $\tilde{E}$.

First, we show that $\phi$ is an automorphism of the directed weighted network $\tilde{G}$, of which the set of nodes is given by $\tilde{E}$ and the set of edges is specified by the weighted adjacency matrix $T$ given by Eq. (2). For arbitrary $e_i, e_j \in \tilde{E}$, ordered pair $(\phi(e_i), \phi(e_j))$ is an edge of $\tilde{G}$ if and only if $(e_i, e_j)$ is an edge of $\tilde{G}$, because $\pi(e_i(1)) = \pi(e_j(0))$ if and only if $e_i(1) = e_j(0)$. We also obtain

$$T_{e_i, e_j} = T_{\phi(e_i), \phi(e_j)} \propto \begin{cases} \alpha w_{e_i(1) e_j(1)} & \text{if } e_i(1) = e_j(0) \text{ and } e_i(0) = e_j(1), \\ & \text{equivalently, if } \pi(e_i(1)) = \pi(e_j(0)) \text{ and } \pi(e_i(0)) = \pi(e_j(1)), \\ \beta w_{e_i(1) e_j(1)} & \text{if } e_i(1) = e_j(0) \text{ and } (e_i(0), e_j(1)) \in E, \\ & \text{equivalently, if } \pi(e_i(1)) = \pi(e_j(0)) \text{ and } (\pi(e_i(0)), \pi(e_j(1))) \in E, \\ \gamma w_{e_i(1) e_j(1)} & \text{if } e_i(1) = e_j(0) \text{ and } (e_i(0), e_j(1)) \notin E, \\ & \text{equivalently, if } \pi(e_i(1)) = \pi(e_j(0)) \text{ and } (\pi(e_i(0)), \pi(e_j(1))) \notin E. \end{cases} \tag{8}$$

Therefore, $\phi$ is an automorphism of $\tilde{G}$. Note that, in Eq. (8), we used, for example, $e_i$ rather than $i$ to refer to the row and column of $T$ to avoid an abuse of notation.

Second, we show that automorphically equivalent nodes in $\tilde{G}$ have the same stationary probability of the random walk whose transition probability matrix is given by $T$. To show this, let $T'$ be the weighted adjacency matrix of $\tilde{G}$ when the rows and columns are reordered as $\phi(\tilde{E}) = \{\phi(e_1), \ldots, \phi(e_{2M})\}$. Because $\phi$ is an automorphism, we obtain

$$T_{e_i, e_j} = T_{\phi(e_i), \phi(e_j)} = T'_{e_i, e_j} \tag{9}$$

for any $i, j = 1, \ldots, 2M$. Let $q^*$ and $\tilde{q}^*$ be the stationary probability of the random walk whose transition probability matrix is given by $T$ and $T'$, respectively. Because $T = T'$, we obtain $q^* = \tilde{q}^*$, i.e., $q^*_{e_i} = q^*_{\phi(e_i)}$ for $i = 1, \ldots, 2M$.

Finally, assume that $u \in V$ and $v \in V$ are automorphically equivalent in $G$ and connected by an automorphism $\pi$, i.e., $v = \pi(u)$. For any directed edge $e_i$ incoming to $u$, i.e., $e_i(1) = u$, directed edge $\phi(e_i)$ is incoming to $v$ because $\phi(e_i) = (\pi(e_i(0)), \pi(e_i(1))) = (\pi(e_i(0)), v)$. Because $\phi$ is an automorphism of $\tilde{G}$, we obtain $q^*_{e_i} = q^*_{\phi(e_i)}$. Because this argument holds true for any pair of $e_i \in \tilde{E}$ incoming to $u$ and the corresponding edge incoming to $v$, we use Eq. (4) to conclude that $p_u^* = p_v^*$. □

Corollary 1.

If network $G$ is vertex-transitive, $p_i^* = 1/N$ for all nodes.

Table 1: Properties of the empirical networks.

| Network | N | M |
|---|---|---|
| Dolphin | 62 | 159 |
| Enron | 143 | 623 |
| Vole | 51 | 105 |
| Coauthorship | 379 | 914 |
| Jazz | 198 | 2742 |
| Email | 1133 | 5451 |
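Theorem 2 and Corollary 1 lend themselves to a quick numerical check. The sketch below is illustrative only: the helper names `node2vec_stationary_node_probability` and `circulant_ring` are hypothetical, the network is assumed to be unweighted, and the test case (a ring in which every node is adjacent to its two nearest neighbors on each side) is chosen here simply because it is vertex-transitive and contains triangles, so that $\beta$ actually matters.

```python
import numpy as np

def node2vec_stationary_node_probability(A, alpha, beta, gamma=1.0):
    """Stationary node probabilities of a node2vec walk on an unweighted,
    undirected network with 0/1 adjacency matrix A (illustrative helper)."""
    A = np.asarray(A)
    N = A.shape[0]
    edges = [(u, v) for u in range(N) for v in range(N) if A[u, v]]
    index = {e: i for i, e in enumerate(edges)}
    T = np.zeros((len(edges), len(edges)))
    for i, (u, v) in enumerate(edges):          # the last move was u -> v
        for x in np.flatnonzero(A[v]):          # candidate next node
            x = int(x)
            bias = alpha if x == u else (beta if A[u, x] else gamma)
            T[i, index[(v, x)]] = bias
        T[i] /= T[i].sum()
    # Left eigenvector of T for the eigenvalue closest to 1, i.e., q* T = q*.
    eigvals, eigvecs = np.linalg.eig(T.T)
    q = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    q /= q.sum()
    # Eq. (4): add up q* over the directed edges that point into each node.
    p = np.zeros(N)
    for (u, v), qe in zip(edges, q):
        p[v] += qe
    return p

def circulant_ring(N, offsets=(1, 2)):
    """Ring in which node i is adjacent to nodes i +/- s for each s in offsets."""
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        for s in offsets:
            A[i, (i + s) % N] = A[(i + s) % N, i] = 1
    return A

p = node2vec_stationary_node_probability(circulant_ring(12), alpha=0.3, beta=2.5)
```

For this network every entry of `p` should equal $1/12$ up to floating-point error, whatever positive values of `alpha` and `beta` are supplied. The same helper can be used to probe Theorem 1 by setting `beta == gamma` on a network with heterogeneous degrees.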

The relaxation speed of the random walk is governed by the second largest eigenvalue of $T$ in modulus [2, 4, 18]. The spectral gap, defined by $1 - |\lambda_2|$, where $\lambda_2$ is the second largest eigenvalue of $T$ in modulus, quantifies the relaxation speed. A large spectral gap implies fast convergence.

A node2vec random walk is specified by three parameters $\alpha$, $\beta$, and $\gamma$. Because only the ratio among $\alpha$, $\beta$, and $\gamma$ specifies the transition probabilities, we set $\gamma = 1$. Note that we are not interested in the case $\gamma = 0$ because it implies that the walker always backtracks or visits a neighbor of the previously visited node without exploring a node different from $v_{t-1}$ or its neighbors. In this section, we examine the relaxation time of node2vec random walks on empirical and synthetic networks.

We study node2vec random walks on six empirical networks. Basic properties of the data sets are shown in Table 1. All the networks are treated as unweighted and undirected networks. All the data sets were downloaded from [19].

The dolphin network is a social network in which nodes are bottlenose dolphins, and an edge occurs if there is a frequent association between two bottlenose dolphins [20]. The Enron email data set was collected and prepared by the CALO (A Cognitive Assistant that Learns and Organizes) project [21]. Each node represents a manager or an employee of the Enron Corporation. There is an edge between two nodes if at least one email was exchanged between the two individuals. The vole network is one of the wild vole networks gathered in Kielder Forest on the English–Scottish border around […] [22]. Each node denotes a vole. An edge is present if two voles were caught in at least one common trap. The coauthorship network represents coauthor relationships between authors who published papers on network science up to 2006 [23]. The original data set has […] nodes, and we only use the largest connected component. The jazz network is constructed based on collaboration between jazz bands [24]. Each node denotes a band. Two nodes are adjacent if they had a musician in common at any time between 1912 and 1940. The email network was gathered at the Universitat Rovira i Virgili in Tarragona, Spain, and contains 1669 users [25]. Each node represents an email address. An edge occurs between two nodes if there was email communication between them at least once. Among the 1669 nodes, 1133 belong to the largest connected component, which we use in the following analysis.

Figure 2: Spectral gap for node2vec random walks on empirical networks. (a) Dolphin. (b) Enron. (c) Vole. (d) Coauthorship. (e) Jazz. (f) Email.

Figure 2 shows the numerically calculated spectral gap for the different empirical networks when we vary the $\alpha$ and $\beta$ values while keeping $\gamma = 1$. The figure suggests that the spectral gap largely decreases as $\alpha$ or $\beta$ increases for all the networks. The global maximum of the spectral gap is obtained near $(\alpha, \beta) = (0, […])$. Therefore, smaller $\alpha$ and $\beta$ values, which imply a larger probability of exploring the network without backtracking or visiting common neighbors of the presently visited node and the last visited node, accelerate relaxation. In Figs. 2(d), 2(e), and 2(f), the spectral gap is small for excessively small $\beta$ even when $\alpha$ is relatively large. This is probably because a tiny $\beta$ value compels the random walker to leave the local neighborhood of a node, such as a community, before it sufficiently explores the neighborhood with a breadth-first sampling mechanism.

Empirical networks are heterogeneous in terms of the node's degree and the local abundance of triangles. Therefore, the stationary probability depends on the $\alpha$ and $\beta$ values given $\gamma = 1$, unless $\beta = 1$.
Therefore, the result that small $\alpha$ and $\beta$ values largely accelerate the exploration of node2vec random walkers may partly rely on the change in the stationary probability as $\alpha$ or $\beta$ changes. To exclude this possibility, in this section and Section 3.2.3, we consider model networks whose stationary probability does not depend on $\alpha$ or $\beta$. Specifically, in this section we consider the extended ring network shown in Fig. 3(a). It is the Watts-Strogatz network with the rewiring probability equal to 0 [26]. As the figure indicates, each node has degree $k = 4$, and all the nodes are automorphically equivalent to each other. Therefore, Theorem 2 implies that, owing to the symmetry induced by the vertex-transitivity of the network, the stationary probability of the node2vec random walk is given by $p^* = \mathbf{1}/N$ regardless of the values of $\alpha$, $\beta$, and $\gamma$.

Figure 3: Schematic of the extended ring network and the method to label its directed edges. (a) Extended ring network with $N = 20$. (b) The corresponding labeling method for its directed edges. The nodes and the corresponding directed edges are labeled counterclockwise.

To analyze the spectral gap, given $k \times k$ matrices $B_i$, where $i = 1, 2, \ldots, n$, we define the $kn \times kn$ block circulant matrix $\mathrm{bcirc}(B_1, B_2, \ldots, B_n)$ by

$$\mathrm{bcirc}(B_1, B_2, \ldots, B_n) := \begin{pmatrix} B_1 & B_2 & \cdots & B_{n-1} & B_n \\ B_n & B_1 & B_2 & \cdots & B_{n-1} \\ \vdots & B_n & B_1 & \ddots & \vdots \\ B_3 & \ddots & \ddots & \ddots & B_2 \\ B_2 & B_3 & \cdots & B_n & B_1 \end{pmatrix}. \tag{10}$$

Consider the extended ring network and the set of directed edges $\tilde{E}$. Note that there are $2M = kN = 4N$ directed edges in $\tilde{E}$. We order the directed edges in $\tilde{E}$ as illustrated in Fig. 3(b). Then, the transition probability matrix $T$ is block circulant and is given by

$$T = \mathrm{bcirc}(0, A, B, 0, \ldots, 0, C, D), \tag{11}$$

where

$$A = \frac{1}{\alpha + 2\beta + 1}\,[\,\cdots\,], \tag{12}$$

$$B = \frac{1}{\alpha + \beta + 2}\,[\,\cdots\,], \tag{13}$$

$$C = \frac{1}{\alpha + \beta + 2}\,[\,\cdots\,], \tag{14}$$

and

$$D = \frac{1}{\alpha + 2\beta + 1}\,[\,\cdots\,]. \tag{15}$$

Here each $[\,\cdots\,]$ denotes a $4 \times 4$ matrix with entries in $\{0, \alpha, \beta, 1\}$ whose layout follows the labeling of Fig. 3(b); the individual entries are not reproduced here. We let

$$\rho_j = e^{2\pi \mathrm{i} j / N} \tag{16}$$

denote the $N$th roots of unity, where $\mathrm{i}$ is the imaginary unit and $j = 0, 1, 2, \ldots, N - 1$. Then, we define $4 \times 4$ matrices

$$H_j = A\rho_j + B\rho_j^2 + C\rho_j^{N-2} + D\rho_j^{N-1}, \tag{17}$$

where $j = 0, 1, \ldots, N - 1$. In particular,

$$H_0 = A + B + C + D \tag{18}$$

has a right eigenvector $\mathbf{1}^\top = (1, 1, 1, 1)^\top$ corresponding to eigenvalue 1. Theorem 3 in Ref. [27] guarantees that

$$\mathrm{spec}(T) = \bigcup_{j=0}^{N-1} \mathrm{spec}(H_j), \tag{19}$$

where $\mathrm{spec}(\cdot)$ denotes the spectrum of the matrix, i.e., the set of all its eigenvalues (also see Ref. [16]).

Equation (19) allows us to calculate $\mathrm{spec}(T)$, and therefore the spectral gap of $T$, by calculating the spectrum of $N$ matrices of size $4 \times 4$. This method reduces the time for computing the spectral gap from $O(N^3)$ to $O(N)$. The method can be generalized to the $k$-regular extended ring without difficulty, where $k$ is an even number larger than […].

The spectral gap of $T$ for the 4-regular extended ring network with $N = 100$ nodes is shown in Fig. 4. The figure indicates that the spectral gap is large when $\alpha$ and $\beta$ are small. However, the spectral gap is not the largest when $\alpha$ and $\beta$ are the smallest. These results are roughly consistent with the results for the empirical networks shown in Section 3.2.1. When $\alpha$ and $\beta$ are both extremely small, the random walker has to go clockwise or counterclockwise for a long time before changing direction. We consider that the spectral gap is small when $\alpha$ and $\beta$ are both tiny because the walker skips some nodes when unidirectionally sweeping the ring.

Figure 4: Spectral gap for node2vec random walks on the extended ring network with $N = 100$.

Similarly, one can also semi-analytically calculate the spectral gap of the transition probability matrix of node2vec random walks on two-layer extended ring networks defined as follows.
Consider a pair of extended ring networks, each of which has $N'$ nodes labeled $1, 2, \ldots, N'$ in the same manner, e.g., counterclockwise. Then, we connect the nodes with the same label in the different layers by an edge with weight $w$ (Fig. 5(a)). We assume that the edges within each extended ring have weight 1. The obtained network is an undirected weighted network with $N = 2N'$ nodes. Note that each node $v$ has degree 5; four edges in the same layer as $v$ have weight 1, and the other edge, connecting the two layers, has weight $w$. The network is composed of two communities when $w$ is small. Furthermore, it can be regarded as a multilayer network with two layers under the so-called ordinal coupling [28, 29, 30].

Consider the node2vec random walk on this network. For example, viewed as a first-order random walk on the $2M$ directed edges (Fig. 5(b)), the walker moves from an inter-layer directed edge to any given intra-layer directed edge leaving its head node with probability $\gamma/(\alpha w + 4\gamma)$, and from a directed edge joining nearest neighbors on a ring to the inter-layer directed edge leaving its head node with probability $\gamma w/(\alpha + 2\beta + \gamma + \gamma w)$. Because the network is vertex-transitive, Theorem 2 implies that the stationary probability is $p^* = \mathbf{1}/N$. To analyze the spectral gap of this network, we label the $5N$ directed edges as shown in Fig. 5(b).

Figure 5: Schematic of the two-layer extended ring network. (a) Two-layer extended ring network with $N' = 10$. The solid and dashed lines represent the edges with weight 1 and $w$, respectively. (b) Labeling convention for its directed edges.

The transition probability matrix $T$ is a block circulant matrix given by

$$T = \mathrm{bcirc}(M_1, M_2) = \begin{pmatrix} M_1 & M_2 \\ M_2 & M_1 \end{pmatrix}, \tag{20}$$

where the $5N' \times 5N'$ matrices $M_1$ and $M_2$ are themselves block circulant matrices. Matrices $M_1$ and $M_2$ are given by

$$M_1 = \mathrm{bcirc}(0, A, B, 0, \ldots, 0, C, D) \tag{21}$$

and

$$M_2 = \mathrm{bcirc}(E, 0, \ldots, 0), \tag{22}$$

where

$$A = \frac{1}{\alpha + 2\beta + w + 1}\,[\,\cdots\,], \tag{23}$$

$$B = \frac{1}{\alpha + \beta + w + 2}\,[\,\cdots\,], \tag{24}$$

$$C = \frac{1}{\alpha + \beta + w + 2}\,[\,\cdots\,], \tag{25}$$

$$D = \frac{1}{\alpha + 2\beta + w + 1}\,[\,\cdots\,], \tag{26}$$

$$E = \frac{1}{\alpha w + 4}\,[\,\cdots\,]. \tag{27}$$

Here each $[\,\cdots\,]$ denotes a $5 \times 5$ matrix with entries in $\{0, 1, \alpha, \beta, w, \alpha w\}$ whose layout follows the labeling of Fig. 5(b); the individual entries are not reproduced here. Theorem 3 in Ref. [27] yields

$$\mathrm{spec}(T) = \mathrm{spec}(M_1 + M_2) \cup \mathrm{spec}(M_1 - M_2). \tag{28}$$

We define

$$H_j = E + A\rho_j + B\rho_j^2 + C\rho_j^{N'-2} + D\rho_j^{N'-1} \tag{29}$$

and

$$G_j = -E + A\rho_j + B\rho_j^2 + C\rho_j^{N'-2} + D\rho_j^{N'-1}, \tag{30}$$

where $\rho_j$ is given by Eq. (16) with $N$ replaced by $N'$, because $M_1$ and $M_2$ consist of $N' \times N'$ blocks. Because $M_1 + M_2$ and $M_1 - M_2$ are block circulant, one obtains

$$\mathrm{spec}(M_1 + M_2) = \bigcup_{j=0}^{N'-1} \mathrm{spec}(H_j) \tag{31}$$

and

$$\mathrm{spec}(M_1 - M_2) = \bigcup_{j=0}^{N'-1} \mathrm{spec}(G_j). \tag{32}$$

Therefore, the spectrum of $T$ is given by

$$\mathrm{spec}(T) = \bigcup_{j=0}^{N'-1} \left[ \mathrm{spec}(H_j) \cup \mathrm{spec}(G_j) \right]. \tag{33}$$

Similar to the case of mono-layer extended ring networks, this method enables practical computation of the spectrum and the spectral gap for two-layer extended ring networks of various sizes and can be easily generalized to two-layer $k$-regular extended ring networks. Equation (33) implies that one can reduce the computation time from $O(N^3)$ to $O(N)$.

Numerically calculated spectral gaps for the two-layer extended ring networks with $N = 200$ nodes are shown in Fig. 6 for various $\alpha$ and $\beta$ values and four values of $w$. We find that backtracking (i.e., large $\alpha$) slows down mixing for all the $w$ values. When $w$ is small, the spectral gap increases as $\alpha$ or $\beta$ decreases (Figs. 6(a), 6(b), and 6(c)). These results are consistent with the results for the empirical networks and the mono-layer extended ring network. When $w$ is large, movements between the two layers are frequent. In this case, the spectral gap decreases as $\alpha$ increases, whereas it is relatively insensitive to $\beta$ within the range of $\beta$ values that we have explored (Fig. 6(d)).
In this situation, a random walker that visits more neighbors within the same layer by the breadth-first sampling mechanism (i.e., large $\beta$) mixes roughly as fast as a walker that frequently switches the layer (i.e., small $\beta$).

Last, Fig. 6 indicates that the spectral gap is not monotonic in terms of $w$ for any given $\alpha$ and $\beta$ values. When $w$ is small (Fig. 6(a)), walkers find it difficult to transit from one layer to the other, which poses a bottleneck for diffusion. The spectral gap is the largest (i.e., relaxation is the fastest) for an intermediate value of $w$ ($w = 0.[…]$ among the four values of $w$; Fig. 6(b)). When $w$ is larger (Figs. 6(c) and 6(d)), the diffusion is decelerated, presumably because exploration within the individual layers is not enough relative to inter-layer moves. This deceleration is opposite to the previous result that, for simple random walks, strong inter-layer coupling makes the spectral gap larger than for random walks confined to the individual layers [31]. The difference may be ascribed to the different types of random walks employed in these studies, i.e., simple random walks in Ref. [31] and node2vec random walks in the present study.

Figure 6: Spectral gap for node2vec random walks on the two-layer extended ring network with $N' = 100$. (a) $w = 0.[…]$. (b) $w = 0.[…]$. (c) $w = 1$. (d) $w = 10$.

In this section, we provide an analysis that is different from the spectral gap with the aim of supporting our main claim that diffusion accelerates with small $\alpha$ and $\beta$ values. The voter model is a linear stochastic model of collective opinion formation, where each node in the network has one of two opinions, denoted by A and B [32]. At least in finite networks, the consensus of opinion A and that of opinion B are the only absorbing states. The duality relationship guarantees that the mean time to consensus is given by the mean time for $N$ coalescing random walkers, initially deployed one on each node of the edge-reversed network, to coalesce into one walker [2, 4, 32, 33].
There are two random walkers just before all the $N$ walkers coalesce into one walker. Therefore, in this section, we evaluate the mean time to coalescence of two node2vec random walkers as an alternative measure of the speed of diffusion.

We consider a weighted network composed of two cliques, each of which has $N' = N/2$ nodes, where the two cliques are connected by one edge with weight $w$, which we call the bridge (Fig. 7). We refer to the two nodes that are incident to the bridge as portal nodes. Unless $w$ is extremely large, this network is composed of two well-distinguished communities such that diffusion needs a long time when $N$ is large. Because the two portal nodes are automorphically equivalent and so are the $N - 2$ non-portal nodes, the stationary probability for a single node2vec random walker is given by

$$p_i^* = \begin{cases} \dfrac{1}{\mathcal{N}} \left[ \dfrac{N' - 1}{w} \cdot \dfrac{\alpha + (N' - 2)\beta + w}{\alpha w + N' - 1} \right] & \text{if node } i \text{ is a non-portal node}, \\[3ex] \dfrac{1}{\mathcal{N}} \left[ \dfrac{N' - 1}{w} \cdot \dfrac{\alpha + (N' - 2)\beta + w}{\alpha w + N' - 1} + 1 \right] & \text{if node } i \text{ is a portal node}, \end{cases} \tag{34}$$

where

$$\mathcal{N} = \frac{N(N' - 1)}{w} \cdot \frac{\alpha + (N' - 2)\beta + w}{\alpha w + N' - 1} + 2.$$

Note that $p_i^* \approx 1/N$ for all nodes when $w = o(1)$.

The state of two coalescing node2vec random walkers is described by the currently visited node and the last visited node of each walker. In every time step, we update the position of one of the two walkers using the link dynamics rule [34, 35]. In other words, we select one of the two walkers with equal probability (i.e., $1/2$), and then the selected walker makes a single move according to the rule of node2vec. This dynamics repeats until the two walkers meet at the same node to coalesce.

Specifying the currently visited and last visited nodes for the two walkers is equivalent to specifying two directed edges (while the network is assumed to be undirected).
By exploiting the automorphic equivalence of the two portal nodes and that of the $N - 2$ non-portal nodes, we only need to distinguish the following types of pairs of directed edges for specifying the state of the pair of walkers. The possible states are enumerated in Table 2 and schematically shown in Fig. 7.

A first level of classification of the pair of directed edges is whether they are in the same clique, in different cliques, or on the bridge. Owing to the symmetry, if the two directed edges are contained in the same clique, we do not need to know which of the two cliques contains the two edges. There are ten such states. Alternatively, the two edges may belong to the opposite cliques. There are six such states. As the third and last possibility, one of the two edges may be on the bridge. There are five such states. Note that it is impossible for both edges to be on the bridge because it would mean that the walkers coalesced in a previous time step.

A second level of classification is based on whether or not, and how, the two directed edges share a node. At this classification level, we distinguish between four configurations, which are schematically shown in Fig. 8. First, we say that two directed edges $e_1$ and $e_2$ are disjoint if they do not share a node, i.e., $e_1(0) \neq e_2(0)$, $e_1(0) \neq e_2(1)$, $e_1(1) \neq e_2(0)$, and $e_1(1) \neq e_2(1)$ (Fig. 8(a)). Second, $e_1$ and $e_2$ are divergent if $e_1(0) = e_2(0)$ and $e_1(1) \neq e_2(1)$ (Fig. 8(b)). Third, the two edges are said to be chasing if $e_1(1) = e_2(0)$ and $e_1(0) \neq e_2(1)$, or $e_1(0) = e_2(1)$ and $e_1(1) \neq e_2(0)$ (Fig. 8(c)). Fourth, if $e_1(1) = e_2(1)$, we say that the two edges are confluent (Fig. 8(d)), which implies the coalescence of the two walkers.

In some cases, in addition to applying the aforementioned two levels of the classification scheme, one has to distinguish between different states depending on whether or not, and how, the nodes coincide with the portal node.
For example, Table 2 indicates that there are three states for a pair of directed edges that qualify as "same clique" (according to the first-level classification) and "disjoint" (second-level). The exhaustive classification yields 21 states excluding the coalescent (i.e., confluent) state. We use the state numbers 1 through 21 to index the rows and columns of the transition probability matrix. We assign state 22 to the coalescent state.

Figure 7: Schematic of the 21 states of the coalescing node2vec random walk. The coalescent state is omitted.

Figure 8: Classification of a pair of directed edges in the two-clique network. (a) Disjoint. (b) Divergent. (c) Chasing. (d) Confluent.

Let $p_i(t)$ be the probability that the two walkers are in state $i$ ($i = 1, 2, \ldots, 21$) at time $t$ and $r(t)$ the probability that the two walkers coalesce at time $t$. Let $T^{\mathrm{CRW}}$ be the $22 \times 22$ transition probability matrix derived in the Appendix, and $S$ be the submatrix of $T^{\mathrm{CRW}}$ that one obtains by removing its last row and column, which correspond to the confluent state. Note that $T^{\mathrm{CRW}}_{22,j} = \delta_{j,22}$, where $\delta$ is the Kronecker delta. We obtain [36]

$$p(t) = p(0) S^t, \tag{35}$$

where $p(t) = (p_1(t), \ldots, p_{21}(t))$, and

$$r(t + 1) = p(t) v, \tag{36}$$

where $v = (T^{\mathrm{CRW}}_{1,22}, \ldots, T^{\mathrm{CRW}}_{21,22})^\top$. The mean coalescence time $\langle \tau \rangle$ is given by

$$\langle \tau \rangle = \sum_{t=1}^{\infty} t \cdot r(t) = p(0) (I - S)^{-2} v = p(0) (I - S)^{-1} \mathbf{1}^\top. \tag{37}$$

The first equality follows from substituting $r(t) = p(0) S^{t-1} v$, and the last one uses $v = (I - S)\mathbf{1}^\top$, which follows from the row normalization of $T^{\mathrm{CRW}}$.

We consider the two-clique network with $N = 200$ nodes (i.e., $N' = 100$ nodes in each clique) and three initial conditions, i.e., two walkers starting from the same clique, the opposite cliques, or either clique with probability $1/2$ independently for the different walkers. Specifically, we define the initial condition under which the two walkers start from the same clique by $p_j(0) = 1/[…]$ for $j = 1, […]$, and $p_j(0) = 0$ otherwise.
The initial condition under which the two walkers start from the opposite cliques is defined by $p_j(0) = 1/9$ for $j = 11, 12, 13, 14, 15, 16, 17, 19, 20$, and $p_j(0) = 0$ otherwise. The initial condition under which the two walkers start from a uniformly randomly selected clique is defined by $p_j(0) = 1/21$ for $j = 1, \ldots, 21$.

We show the mean coalescence time numerically calculated using Eq. (37) in Fig. 9 for the three initial conditions and two values of $w$ (i.e., $w = 1$ and $w = 10$). As expected, the mean coalescence time is considerably smaller if the two walkers start in the same clique (Figs. 9(a) and 9(d)) than in the opposite cliques (Figs. 9(b) and 9(e)). The results for the uniformly random initial condition (Figs. 9(c) and 9(f)) are intermediate between the other two initial conditions. Under each initial condition, the mean coalescence time is smaller for $w = 10$ (Figs. 9(d)–(f)) than for $w = 1$ (Figs. 9(a)–(c)) because a large $w$ enables the two walkers to move between cliques relatively frequently so that they have more chances to coalesce.

Figure 9: Mean coalescence time of two node2vec random walkers on the two-clique network with $N = 200$ nodes. (a)–(c) $w = 1$. (d)–(f) $w = 10$. The two walkers are initially in the same clique ((a) and (d)), opposite cliques ((b) and (e)), or uniformly randomly selected cliques ((c) and (f)).
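
The mean coalescence time implied by Eqs. (35) and (36) is the mean absorption time of an absorbing Markov chain. The following Python sketch illustrates the calculation on a small hypothetical chain with two transient states and one absorbing state, not on the 22-state matrix $T^{\mathrm{CRW}}$. The function names and the toy matrix are assumptions made for illustration. It compares the closed form $\mathbf{p}(0)(I - S)^{-1}\mathbf{1}^{\top}$ with a truncated evaluation of $\sum_t t \cdot r(t)$.

```python
import numpy as np


def mean_absorption_time(T, p0):
    """Mean number of steps until absorption in the last state of T.

    T  : (n+1) x (n+1) row-stochastic matrix whose last state is absorbing.
    p0 : length-n initial distribution over the n transient states.
    """
    T = np.asarray(T, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    S = T[:-1, :-1]                      # transitions among transient states
    n = S.shape[0]
    # Solve (I - S) x = 1 rather than forming the inverse explicitly.
    x = np.linalg.solve(np.eye(n) - S, np.ones(n))
    return float(p0 @ x)


def mean_absorption_time_series(T, p0, t_max=10_000):
    """Truncated sum of t * r(t), with r(t + 1) = p(t) v and p(t) = p(0) S^t."""
    T = np.asarray(T, dtype=float)
    S, v = T[:-1, :-1], T[:-1, -1]
    p = np.asarray(p0, dtype=float)
    total = 0.0
    for t in range(1, t_max + 1):
        total += t * float(p @ v)        # r(t) = p(t - 1) v
        p = p @ S                        # advance p(t - 1) to p(t)
    return total


# Hypothetical toy chain: states 0 and 1 are transient, state 2 is absorbing.
T_toy = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.0, 0.0, 1.0],
])
p0_toy = np.array([0.5, 0.5])

tau_closed = mean_absorption_time(T_toy, p0_toy)
tau_series = mean_absorption_time_series(T_toy, p0_toy)
print(tau_closed, tau_series)
```

Solving the linear system avoids an explicit matrix inverse. For the 21 transient states considered here either approach would be inexpensive.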

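The entries of $T^{\mathrm{CRW}}$ are built from the single-walker node2vec step on the two-clique network. As a rough sketch, we assume the usual node2vec weighting: weight $\alpha$ for back-tracking, $\beta$ for moving to a neighbor of the previously visited node, and $\gamma = 1$ otherwise, each multiplied by the edge weight. The Python code below builds a small two-clique network and computes the step probabilities. The helper names `two_clique_weights` and `node2vec_step`, the node labeling, and the parameter values are hypothetical choices for illustration.

```python
import numpy as np


def two_clique_weights(n_clique, w):
    """Weighted adjacency matrix of two cliques joined by one bridge.

    Nodes 0..n_clique-1 form one clique and the remaining nodes the other.
    Nodes 0 and n_clique are taken as the portal nodes (an assumed labeling);
    the bridge between them has weight w and all other edges have weight 1.
    """
    n = 2 * n_clique
    W = np.zeros((n, n))
    W[:n_clique, :n_clique] = 1.0
    W[n_clique:, n_clique:] = 1.0
    np.fill_diagonal(W, 0.0)
    W[0, n_clique] = W[n_clique, 0] = w
    return W


def node2vec_step(W, prev, cur, alpha, beta, gamma=1.0):
    """Probabilities of the next node given the last directed edge prev -> cur."""
    # beta for neighbors of prev, gamma for nodes that are not adjacent to prev.
    bias = np.where(W[prev] > 0, float(beta), float(gamma))
    bias[prev] = alpha                   # back-tracking to prev
    weights = W[cur] * bias              # zero for non-neighbors of cur
    return weights / weights.sum()


n_clique, w, alpha, beta = 5, 10.0, 0.5, 2.0
W = two_clique_weights(n_clique, w)

# Walker on a non-portal node (2), having arrived from non-portal node 1.
p_plain = node2vec_step(W, prev=1, cur=2, alpha=alpha, beta=beta)
# Walker on portal node 0, having arrived from non-portal node 1.
p_inside = node2vec_step(W, prev=1, cur=0, alpha=alpha, beta=beta)
# Walker on portal node 0, having just crossed the bridge from portal node 5.
p_bridge = node2vec_step(W, prev=n_clique, cur=0, alpha=alpha, beta=beta)
print(p_plain.round(3), p_inside.round(3), p_bridge.round(3))
```

Under these assumptions, the three cases have normalization constants $\alpha + (N' - 2)\beta$, $\alpha + (N' - 2)\beta + w$, and $w\alpha + (N' - 1)$, respectively.
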
4. Discussion

Node2vec has been recognized as a competitive network embedding algorithm and has inspired further network embedding algorithms [37, 38]. However, theoretical properties of the node2vec random walks, which are considered to affect the performance and applicability of node2vec, have been underexplored. A previous study provided a theoretical foundation of the stationary probability of node2vec random walks [39]. In the present study, we have investigated properties of node2vec random walks with a particular focus on diffusion speed. We have shown that diffusion measured in terms of the spectral gap and coalescence time is faster when random walkers are encouraged to explore the network without backtracking or visiting common neighbors of the currently visited node and the last visited node. We have confirmed this conclusion for several empirical and model networks except for some cases in which the avoidance of backtracking or visiting the common neighbors is excessive.

Node2vec random walks are a second-order Markov process. Second-order Markov processes have been shown to be a promising representation of temporal network data, as opposed to first-order (i.e., memoryless) Markov processes [11, 12]. For temporal network data, second-order random walks find various applications such as community detection and ranking of nodes. Therefore, apart from network embedding, for which the node2vec random walks were originally used [6], node2vec random walks themselves may find applications in, for example, community detection, ranking of nodes, network search, and collaborative filtering [4, 5]. For example, one may be able to accelerate network search and sampling by setting $\alpha$ and $\beta$ to small values. However, we have pointed out that the stationary probability depends on the parameters of node2vec random walks, i.e., $\alpha$ and $\beta$ assuming $\gamma = 1$ (also see Ref. [39]).
Therefore, applications that depend on the stationary probability have to be carefully considered; one may have to calibrate the dependence of the stationary probability on the $\alpha$ and $\beta$ values to realize such applications.

In the analysis of the spectral gap of model networks (Sections 3.2.2 and 3.2.3), we analyzed networks whose stationary probability is independent of the $\alpha$ and $\beta$ values. To this end, we used vertex-transitive networks, in which all nodes are automorphically equivalent to each other. We avoided the complete graph, which is trivially vertex-transitive, because all the triplets of nodes form a triangle such that the approximate depth-first sampling, which is defined to occur with the probability proportional to $\gamma$, is irrelevant. Both of the vertex-transitive networks that we have employed have a large average path length because they are essentially one-dimensional. This choice allowed us to employ a theorem in Ref. [27] for conveniently calculating the spectrum of block circulant matrices. However, these networks do not resemble most of the empirical networks that have a small average path length relative to the number of nodes, $N$ [8, 26]. In fact, there are various named vertex-transitive networks, and methods to construct vertex-transitive networks such as Cayley graphs are available in algebraic graph theory [15]. Analysis of the diffusion speed in vertex-transitive and small-world networks (i.e., having a small average path length and reasonably many triangles) warrants future work.

Appendix: Transition probabilities for a pair of coalescent node2vec random walkers

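In this section, we list the transition probabilities for a pair of coalescent random walkers on the two-clique graph. The non-zero elements of the $22 \times 22$ transition probability matrix, $T^{\mathrm{CRW}}$, are enumerated as follows:

$$ T^{\mathrm{CRW}}_{1,1} = \frac{\alpha + (N'-5)\beta}{\alpha + (N'-2)\beta}, \tag{38} $$

$$ T^{\mathrm{CRW}}_{1,3} = T^{\mathrm{CRW}}_{1,7} = T^{\mathrm{CRW}}_{1,22} = T^{\mathrm{CRW}}_{2,22} = T^{\mathrm{CRW}}_{4,3} = T^{\mathrm{CRW}}_{4,22} = T^{\mathrm{CRW}}_{5,22} = T^{\mathrm{CRW}}_{11,14} = \frac{\beta}{\alpha + (N'-2)\beta}, \tag{39} $$

$$ T^{\mathrm{CRW}}_{2,1} = T^{\mathrm{CRW}}_{7,1} = \frac{(N'-4)\beta}{2[\alpha + (N'-2)\beta]}, \tag{40} $$

$$ T^{\mathrm{CRW}}_{2,2} = T^{\mathrm{CRW}}_{3,3} = T^{\mathrm{CRW}}_{7,4} = \frac{\alpha + (N'-4)\beta}{2[\alpha + (N'-2)\beta]}, \tag{41} $$

$$ T^{\mathrm{CRW}}_{2,3} = T^{\mathrm{CRW}}_{6,10} = T^{\mathrm{CRW}}_{8,6} = T^{\mathrm{CRW}}_{12,14} = T^{\mathrm{CRW}}_{16,15} = T^{\mathrm{CRW}}_{19,20} = \frac{\alpha}{2[\alpha + (N'-2)\beta]}, \tag{42} $$

$$ T^{\mathrm{CRW}}_{2,7} = T^{\mathrm{CRW}}_{2,9} = T^{\mathrm{CRW}}_{3,10} = T^{\mathrm{CRW}}_{7,3} = T^{\mathrm{CRW}}_{7,6} = T^{\mathrm{CRW}}_{7,7} = T^{\mathrm{CRW}}_{8,9} = T^{\mathrm{CRW}}_{9,10} = T^{\mathrm{CRW}}_{12,16} = T^{\mathrm{CRW}}_{14,15} = T^{\mathrm{CRW}}_{17,20} = \frac{\beta}{2[\alpha + (N'-2)\beta]}, \tag{43} $$

$$ T^{\mathrm{CRW}}_{3,2} = \frac{\alpha + (N'-4)\beta}{2[\alpha + (N'-2)\beta + w]}, \tag{44} $$

$$ T^{\mathrm{CRW}}_{3,8} = T^{\mathrm{CRW}}_{10,8} = \frac{\beta}{2[\alpha + (N'-2)\beta + w]}, \tag{45} $$

$$ T^{\mathrm{CRW}}_{3,17} = T^{\mathrm{CRW}}_{6,17} = T^{\mathrm{CRW}}_{9,19} = T^{\mathrm{CRW}}_{10,17} = T^{\mathrm{CRW}}_{14,18} = T^{\mathrm{CRW}}_{16,21} = \frac{w}{2[\alpha + (N'-2)\beta + w]}, \tag{46} $$

$$ T^{\mathrm{CRW}}_{3,22} = T^{\mathrm{CRW}}_{6,22} = \frac{\beta}{2[\alpha + (N'-2)\beta]} + \frac{\beta}{2[\alpha + (N'-2)\beta + w]}, \tag{47} $$

$$ T^{\mathrm{CRW}}_{4,1} = \frac{(N'-4)\beta}{\alpha + (N'-2)\beta}, \tag{48} $$

$$ T^{\mathrm{CRW}}_{4,7} = T^{\mathrm{CRW}}_{5,9} = T^{\mathrm{CRW}}_{13,16} = \frac{\alpha}{\alpha + (N'-2)\beta}, \tag{49} $$

$$ T^{\mathrm{CRW}}_{5,2} = \frac{(N'-3)\beta}{\alpha + (N'-2)\beta}, \tag{50} $$

$$ T^{\mathrm{CRW}}_{6,2} = T^{\mathrm{CRW}}_{10,2} = \frac{(N'-3)\beta}{2[\alpha + (N'-2)\beta + w]}, \tag{51} $$

$$ T^{\mathrm{CRW}}_{6,3} = T^{\mathrm{CRW}}_{8,2} = T^{\mathrm{CRW}}_{8,4} = T^{\mathrm{CRW}}_{9,3} = \frac{(N'-3)\beta}{2[\alpha + (N'-2)\beta]}, \tag{52} $$

$$ T^{\mathrm{CRW}}_{6,8} = \frac{\alpha}{2[\alpha + (N'-2)\beta + w]}, \tag{53} $$

$$ T^{\mathrm{CRW}}_{7,22} = T^{\mathrm{CRW}}_{8,22} = \frac{\alpha + \beta}{2[\alpha + (N'-2)\beta]}, \tag{54} $$

$$ T^{\mathrm{CRW}}_{9,5} = \frac{\alpha + (N'-3)\beta}{2[\alpha + (N'-2)\beta + w]}, \tag{55} $$

$$ T^{\mathrm{CRW}}_{9,22} = \frac{\alpha}{2[\alpha + (N'-2)\beta]} + \frac{\beta}{2[\alpha + (N'-2)\beta + w]}, \tag{56} $$

$$ T^{\mathrm{CRW}}_{10,6} = T^{\mathrm{CRW}}_{12,12} = T^{\mathrm{CRW}}_{14,14} = T^{\mathrm{CRW}}_{17,17} = T^{\mathrm{CRW}}_{18,18} = \frac{\alpha + (N'-3)\beta}{2[\alpha + (N'-2)\beta]}, \tag{57} $$

$$ T^{\mathrm{CRW}}_{10,22} = \frac{\beta}{2[\alpha + (N'-2)\beta]} + \frac{\alpha}{2[\alpha + (N'-2)\beta + w]}, \tag{58} $$

$$ T^{\mathrm{CRW}}_{11,11} = \frac{\alpha + (N'-3)\beta}{\alpha + (N'-2)\beta}, \tag{59} $$

$$ T^{\mathrm{CRW}}_{12,11} = T^{\mathrm{CRW}}_{16,14} = T^{\mathrm{CRW}}_{19,17} = T^{\mathrm{CRW}}_{21,18} = \frac{(N'-2)\beta}{2[\alpha + (N'-2)\beta]}, \tag{60} $$

$$ T^{\mathrm{CRW}}_{13,12} = \frac{(N'-2)\beta}{\alpha + (N'-2)\beta}, \tag{61} $$

$$ T^{\mathrm{CRW}}_{14,12} = T^{\mathrm{CRW}}_{16,13} = T^{\mathrm{CRW}}_{20,19} = \frac{\alpha + (N'-2)\beta}{2[\alpha + (N'-2)\beta + w]}, \tag{62} $$

$$ T^{\mathrm{CRW}}_{15,16} = \frac{\alpha + (N'-2)\beta}{\alpha + (N'-2)\beta + w}, \tag{63} $$

$$ T^{\mathrm{CRW}}_{15,22} = \frac{w}{\alpha + (N'-2)\beta + w}, \tag{64} $$

$$ T^{\mathrm{CRW}}_{17,18} = T^{\mathrm{CRW}}_{18,17} = T^{\mathrm{CRW}}_{19,21} = T^{\mathrm{CRW}}_{21,19} = \frac{w\alpha}{2[w\alpha + (N'-1)]}, \tag{65} $$

$$ T^{\mathrm{CRW}}_{17,12} = T^{\mathrm{CRW}}_{19,13} = T^{\mathrm{CRW}}_{20,16} = \frac{N'-1}{2[w\alpha + (N'-1)]}, \tag{66} $$

$$ T^{\mathrm{CRW}}_{18,2} = \frac{N'-3}{2[w\alpha + (N'-1)]}, \tag{67} $$

$$ T^{\mathrm{CRW}}_{18,8} = \frac{1}{2[w\alpha + (N'-1)]}, \tag{68} $$

$$ T^{\mathrm{CRW}}_{18,22} = \frac{\beta}{2[\alpha + (N'-2)\beta]} + \frac{1}{2[w\alpha + (N'-1)]}, \tag{69} $$

$$ T^{\mathrm{CRW}}_{20,22} = \frac{w}{2[\alpha + (N'-2)\beta + w]} + \frac{w\alpha}{2[w\alpha + (N'-1)]}, \tag{70} $$

$$ T^{\mathrm{CRW}}_{21,5} = \frac{N'-2}{2[w\alpha + (N'-1)]}, \tag{71} $$

$$ T^{\mathrm{CRW}}_{21,22} = \frac{\alpha}{2[\alpha + (N'-2)\beta]} + \frac{1}{2[w\alpha + (N'-1)]}, \tag{72} $$

and

$$ T^{\mathrm{CRW}}_{22,22} = 1. \tag{73} $$

All the other elements of $T^{\mathrm{CRW}}$ are equal to 0.

Acknowledgments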

N.M. acknowledges the support provided through AFOSR European Office (FA9550-19-1-7024).

References

∼ ∼ ./enron/. Accessed on March 7, 2020.
[22] S. Davis, B. Abbasi, S. Shah, S. Telfer, M. Begon, Spatial analyses of wildlife contact networks, J. R. Soc. Interface 12 (2015) 20141004.
[23] M. E. Newman, Finding community structure in networks using the eigenvectors of matrices, Phys. Rev. E 74 (2006) 036104.
[24] P. M. Gleiser, L. Danon, Community structure in jazz, Adv. Comp. Syst. 6 (2003) 565–573.
[25] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, A. Arenas, Self-similar community structure in a network of human interactions, Phys. Rev. E 68 (2003) 065103.
[26] D. J. Watts, S. H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature 393 (1998) 440.
[27] G. J. Tee, Eigenvectors of block circulant and alternating circulant matrices, New Zealand J. Math. 36 (2007) 195–211.
[28] M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, M. A. Porter, Multilayer networks, J. Comp. Netw. 2 (2014) 203–271.
[29] S. Boccaletti, G. Bianconi, R. Criado, C. I. Del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, M. Zanin, The structure and dynamics of multilayer networks, Phys. Rep. 544 (2014) 1–122.
[30] G. Bianconi, Multilayer Networks: Structure and Function, Oxford University Press, Oxford, 2018.
[31] S. Gómez, A. Díaz-Guilera, J. Gómez-Gardeñes, C. J. Pérez-Vicente, Y. Moreno, A. Arenas, Diffusion dynamics on multiplex networks, Phys. Rev. Lett. 110 (2013) 028701.
[32] T. M. Liggett, Interacting Particle Systems, Springer, Berlin, 2012.
[33] P. Donnelly, D. Welsh, Finite particle systems and infection models, Math. Proc. Camb. Philos. Soc. 94 (1983) 167–182.
[34] T. Antal, S. Redner, V. Sood, Evolutionary dynamics on degree-heterogeneous graphs, Phys. Rev. Lett. 96 (2006) 188104.
[35] V. Sood, T. Antal, S. Redner, Voter models on heterogeneous networks, Phys. Rev. E 77 (2008) 041121.
[36] N. Masuda, Voter model on the two-clique graph, Phys. Rev. E 90 (2014) 012802.
[37] H. Cai, V. W. Zheng, K. C.-C. Chang, A comprehensive survey of graph embedding: Problems, techniques, and applications, IEEE Trans. Knowl. Data Eng. 30 (2018) 1616–1637.
[38] P. Goyal, E. Ferrara, Graph embedding techniques, applications, and performance: A survey, Knowledge-Based Systems 151 (2018) 78–94.
[39] J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, J. Tang, Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec, in: Proc. Eleventh ACM International Conference on Web Search and Data Mining, 2018, pp. 459–467.

Table 2: States of a pair of directed edges in two-clique networks. If the two edges are in the same clique and chasing, we assume $e_1(1) = e_2(0)$ without loss of generality within this table. We do so to distinguish between states 8, 9, and 10. Note that this convention does not apply to states 20 and 21. If one of the two edges coincides with the bridge, we assume that a designated one of the two edges coincides with the bridge without loss of generality within this table. This convention is to distinguish between states 17 and 18 and between states 20 and 21. For states 17, 18, 20, and 21, $e(0)$ and $e(1)$ in the last column refer to the bridge edge.

State  First level         Second level  Additional condition
1      same clique         disjoint      No edge touches a portal node.
2      same clique         disjoint      $e_1(0)$ or $e_2(0)$, not both, is a portal node.
3      same clique         disjoint      $e_1(1)$ or $e_2(1)$, not both, is a portal node.
4      same clique         divergent     No edge touches a portal node.
5      same clique         divergent     $e_1(0) = e_2(0)$ is a portal node.
6      same clique         divergent     $e_1(1)$ or $e_2(1)$, not both, is a portal node.
7      same clique         chasing       No edge touches a portal node.
8      same clique         chasing       $e_1(0)$ is a portal node.
9      same clique         chasing       $e_2(0)$ $(= e_1(1))$ is a portal node.
10     same clique         chasing       $e_2(1)$ is a portal node.
11     opposite cliques    disjoint      No edge touches a portal node.
12     opposite cliques    disjoint      $e_1(0)$ or $e_2(0)$, not both, is a portal node.
13     opposite cliques    disjoint      Both $e_1(0)$ and $e_2(0)$ are portal nodes.
14     opposite cliques    disjoint      $e_1(1)$ or $e_2(1)$, not both, is a portal node.
15     opposite cliques    disjoint      Both $e_1(1)$ and $e_2(1)$ are portal nodes.
16     opposite cliques    disjoint      $e_1(0)$ and $e_2(1)$, or $e_1(1)$ and $e_2(0)$, are portal nodes.
17     one edge on bridge  disjoint      The non-bridge edge and node $e(0)$ are in the same clique (so, $e(1)$ is in the other clique).
18     one edge on bridge  disjoint      The non-bridge edge and node $e(1)$ are in the same clique (so, $e(0)$ is in the other clique).
19     one edge on bridge  divergent     $e_1(0) = e_2(0)$ is a portal node.
20     one edge on bridge  chasing       The non-bridge edge and node $e(0)$ are in the same clique (so, $e(1)$ is in the other clique).
21     one edge on bridge  chasing       The non-bridge edge and node $e(1)$ are in the same clique (so, $e(0)$ is in the other clique).
22     -                   confluent     -
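
The second-level classification used in Table 2 (disjoint, divergent, chasing, confluent) depends only on which end nodes the two directed edges share. A minimal Python sketch of this classification follows, assuming that each directed edge is represented as a `(tail, head)` tuple, i.e., `(e(0), e(1))`. The function name `classify_pair` and the extra label `"other"` are hypothetical and used only for illustration.

```python
def classify_pair(e1, e2):
    """Classify two directed edges given as (tail, head) = (e(0), e(1)) tuples."""
    (t1, h1), (t2, h2) = e1, e2
    if h1 == h2:
        return "confluent"   # both walkers are on the same node
    if t1 == t2:
        return "divergent"   # common tail, different heads
    if (h1 == t2 and t1 != h2) or (t1 == h2 and h1 != t2):
        return "chasing"     # the head of one edge is the tail of the other
    if len({t1, h1, t2, h2}) == 4:
        return "disjoint"    # the two edges share no node
    return "other"           # e.g., mutually reversed edges; not one of the four types


examples = {
    "disjoint": classify_pair((1, 2), (3, 4)),
    "divergent": classify_pair((1, 2), (1, 3)),
    "chasing": classify_pair((1, 2), (2, 3)),
    "confluent": classify_pair((1, 3), (2, 3)),
}
print(examples)
```

Distinguishing the 21 states of Table 2 would additionally require the first-level classification (same clique, opposite cliques, or one edge on the bridge) and the position of the portal nodes, which this sketch does not cover.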