Higher-order temporal network effects through triplet evolution
TTriplet Evolution
Qing Yao, Bingsheng Chen, Tim S. Evans, Kim Christensen
Blackett Laboratory and Centre for Complexity Science, Imperial College London, SouthKensington Campus, London SW7 2AZ, United Kingdom
Abstract
We study the evolution of networks through ‘triplets’ — three-node graphlets. We develop amethod to compute a transition matrix of these triplets to describe their evolution in temporalnetworks. To identify the network dynamics’ non-pairwise interactions, we compare both arti-ficial and real-world data to a pairwise interaction model. The significant differences betweenthe computed matrix and the calculated matrix from the fitted parameters demonstrate thatnon-pairwise interactions exist for various real-world data sets. Furthermore, different entriesof the matrix differences reveal the real-world systems have different higher-order interactionpatterns which are seldomly reported in the previous researches. We then use these transitionmatrices as the basis of a link prediction algorithm. We investigate our algorithm’s performanceon four temporal networks comparing our approach against ten other link prediction methods.Our results show that higher-order interactions play a crucial role in the evolution of networksas we find our method along with two other methods based on non-local interactions, give thebest overall performance. The results also confirm the concept that the higher-order interactionpatterns, i.e., triplet dynamics, can help us understand and predict different real-world systems’evolution.
The collective behaviours of a complex system can not be merely understood and predicted byconsidering the units of the system in isolation [1, 2]. In a complex system, some connections aremade among groups of more than two participants and the interactions of these connections that arenot equal to the sum of the pairwise interactions, are called higher-order interactions. Researches haverevealed that empirical systems display higher-order interactions, for example in social systems [3],neuroscience [4, 5], biology [6] and ecology [7]. Therefore understanding the dynamics of real complexsystems with those interactions requires the description of non-pairwise interactions. A commonlyused modelling tool for complex systems is network modelling [8, 9]. The approaches to analysingthe group interactions of the networks are bipartite graph representations (or hypergraphs) [10,11], motifs statistics [12] and communities detection [13, 15]. Although they have demonstratedthe importance of special structures within the complex systems, they cannot explicitly describegroup interactions. One of the few examples of research with higher-order interactions is simplicialcomplexes [16, 17], describing higher-order interactions, and the hypergraphs are relaxing conditionsfor simplicial complexes [2]. However, the temporal measures of the higher-order interactions are stilllacking. In this paper, we introduce a method of measuring higher-order interactions in the temporalnetworks – triplet evolution as the three-node interactions are the simplest graphlet beyond pairwiseinteractions.This paper is organised in the following way: In the first section, we introduce the transitionmatrix that describes the Markovian evolution of the triplet. In the second section, we propose anull model that can demonstrate the higher-order interactions indeed existing in many real-worldnetworks. In the third part, based on the triplet evolution, we create an algorithm that can predictthe existence of the links in the temporal networks.
Three node combination is the simplest case beyond the pairwise interactions, and the current studyof three-node combination in temporal networks includes temporal motifs and simplicial complexes.1 a r X i v : . [ phy s i c s . s o c - ph ] J a n ost of the temporal motifs researches focus on the number of different types of motifs changingwhile the network is evolving. On the other hand, the study of simplicial complexes is interestedin how the fully connected triangles affect the structures of networks. However, there is a gapbetween understanding how three-node combinations will evolve and how that evolution will affectthe evolution of the whole networks. This gap motivates us to design a method to investigate whetherany non-pairwise interactions will be observed by measuring three-node dynamics. To measure the three-node dynamics, we start with temporal graphs G ( s ) with a node set V , where s indicates the time step of the graphs. Then we define a variable, M s ( u, v, w ), a node triplet,often abbreviated to M s to indicate the state of a three-node triplet ( u, v, w ) at time s . That is M s ≡ M s ( u, v, w ) is a map from a node triplet ( u, v, w ), where u, v, w ∈ V , to the induced subgraph,known as a graphlet, the maximal subgraph in G ( s ) containing the nodes u , v and w . M is the setof all the possible states of three node combinations and m i ∈ M represents the possible state in M . M s : V × V × V → M , ( u, v, w ) (cid:55)→ M s ( u, v, w ) = m i . (2.1)The choice of M is not unique, and it depends on whether we consider the characteristics of thenodes and links. If we characterise the states by the number of links among the three nodes, thereare just four distinct states in M , namely 0 , , M to represent the state set thatis charactered by the number of links. By contrast, if we want to consider the labels (identities) ofthe nodes, there are eight states in M as each link between the three pairs of nodes can either bepresent or absent. That means the links between different pairs of nodes are distinct. This stateset is represented by M . The detailed illustration is in Figure 1. Any three nodes selected from anetwork will be in one of the states in M at a given time step. We then define a transition matrix T . T ij ( s ) gives the probability of a triplet of nodes in the state m i in G ( s ) at time s becoming thetriplet m j at the next time step, s + 1, mathematically: T ij ( s ) = Pr( M s +1 = m j | M s = m i ) . (2.2)All entries of T are non-negative, T ij ( s ) ≥
0, and each row in the transition matrix satisfies a nor-malisation condition (cid:88) j T ij ( s ) = (cid:88) m j ∈M Pr( M s +1 = m j | M s = m i ) = 1, ∀ m i ∈ M . (2.3)In practice we have to use an estimate (cid:98) T ( s ) for the transition matrix T ( s ). We do this by usinga set of distinct node triplets T s ⊆ V and counting how often the associated subgraph transformsfrom triplet m i to m j . More formally we define (cid:98) T ij ( s ) = 1 k i (cid:88) ( u,v,w ) ∈T s δ ( M s +1 ( u, v, w ) , m j ) δ ( M s ( u, v, w ) , m i ) , (2.4a) k i = (cid:88) j (cid:88) ( u,v,w ) ∈T δ ( M s +1 ( u, v, w ) , m j ) δ ( M s ( u, v, w ) , m i ) (2.4b)where δ ( G , G ) = 1 (0) if graphlets G and G are isomorphic (not isomorphic). The best estimate (cid:98) T ij ( s ) of T ij ( s ) is produced if T s is the set of all possible distinct node triplets.To describe the construction of a triplet transition matrix in the simplest way, we want to startwith M = { m , m , m , m } . Note that the subscripts of T , i, j = 0 , , , T start from 0, corresponding to subscripts of statesin M , see an example of a network of 5 nodes in Figure 2.2 m m m ab cab cab c ab cab c ab cab cab c
000 100 010 001 110 101 011 111
Figure 1: The triplet states illustration of M . In the upper row, when the states are only characterisedby the number of links of the three nodes combinations, there are four possible states denoted by m i ∈ M , i = 0 , , ,
3. In the lower row, there are eight distinct states of triplets with nodes positions(identities/labels), denoted by n i ∈ M , i = 0 , . . . (a) (b) (c) (d) (e)(f) (g) (h) (i) (j) Figure 2: An illustration of how the empirical transition matrix (cid:98) T shown top centre can be computedfrom evolution of the network G ( s ) shown top left, with links at time s , to G ( s + 1) shown top right,with links at time s + 1. When the states are only characterised by the number of links of the threenodes combinations, there are four possible states being denoted by m i , i = 0 , , ,
3. The subscript i represents the number of the links i in the graphlet m i , so m is the triplet with zero links, m is a triplet with one link and so on. There are (cid:0) (cid:1) = 10 ways of choosing 3 nodes for a set of 5nodes, and the evolution of all ten triplets are shown in the remaining rows in the figure above. Forinstance, the subgraph induced by a triplet (a) is the triplet m . This triplet loses an link in thenext graphlet, so the same node triplet is now associated with the two link triplet m . This change,therefore, contributes to the (cid:98) T entry. As this is the only induced subgraph (graphlet) in the earlier G ( s ) graph which is isomorphic to the m triad, this means (cid:98) T = 1 while (cid:98) T j = 0 for j = 0 , , m triad appears twice as an induced subgraph of node tripletsin G ( s ), namely (c) and (d). These triplets have two different triads in G ( s + 1) leading to two nonzero entries in the (cid:98) T j row, (cid:98) T = (cid:98) T = 1 / N nodes there are (cid:0) N (cid:1) = N · ( N − · ( N − / N = 100 , , (cid:0) N (cid:1) =1 . · , Therefore, we will use random sampling of triplets of a sufficient amount to produce ourestimates. The estimations are detailed in Appendix A. To investigate the effectiveness of three-node interactions, we use a model whose dynamics is drivenonly by pair-wise relationships as a null model when analysing the transition matrix for artificial andreal-world networks.
The ‘pairwise model’ is a random graph model for dynamic networks [18], the evolution of the linksis based on a pairwise mechanism. In moving from one network, G ( s ) to the next network in thesequence, G ( s + 1), any pair of nodes with no existing link gains a link with probability p otherwisewith probability (1 − p ) the pair of nodes remains unconnected. Similarly, every existing link e ∈ E ( s )is removed with probability q otherwise with probability (1 − q ) the link remains. The number oflinks is not preserved in this model. This null model is a lower order description of the interactionwhich does not depend on any neighbours of each node pair.We can write down the form of the pairwise model’s transition matrix exactly as T pw . Thedetailed calculation can be found in Appendix B.If we assume the graph evolution follows this pairwise mechanism; we can estimate values ˆ p ( s )and ˆ q ( s ) for the parameters p and q respectively by looking at how links changed in the previoustime step, i.e., from G ( s ) to G ( s + 1). By looking at how existing links survive and how often newlinks appear in G ( s + 1) we estimate q and p for (cid:98) T ( s ) repectively, using ˆ q ( s ) and ˆ p ( s ), which may bewritten formally as ˆ q ( s ) = 1 − |E ( s ) ∩ E ( s + 1) ||E ( s ) | , (3.1)ˆ p ( s ) = |E ( s + 1) \ ( E ( s ) ∩ E ( s + 1)) | N ( N − / − |E ( s ) | . (3.2)This gives us our pairwise model prediction for the triplet transition matrix (cid:98) T (pw) ( s ), where wesubstitute ˆ p ( s ) from Eq. (3.2) and ˆ q ( s ) from Eq. (3.1) for p and q in T pw in Eq. (B1), that is (cid:98) T (pw) ( s ) = T pw (ˆ p ( s ) , ˆ q ( s )) . (3.3) The pairwise model captures the effects of purely pairwise node interaction. We can use this in theform (cid:98) T (pw) ( s ) in Eq. (3.3) as a benchmark to show how much higher-order information our tripletbased analysis using (cid:98) T ( s ) in Eq. (2.4) captures. One way to study this is to look at the difference∆ (cid:98) T ( s ) of the transition probabilities between the empirical triplet transition and the assumed pairwisenull model: ∆ (cid:98) T ( s ) = (cid:98) T ( s ) − (cid:98) T (pw) ( s ) , (3.4)where each entry represents the difference between the real triplet transition probability and pairwisetransition probability for different states.By using the three simple models to generate artificial temporal networks to test our approach,we show that this approach can differentiate the non-pairwise interaction model from the pairwiseinteraction models, see Appendix B.We then apply this framework with M to analyse some real data sets, see Figure 3. Thedescription of the data sets can be found in Appendix C. There are apparent differences between4 a) (b) (c) (d) Figure 3: Triplet transition matrix (cid:98) T evaluate from Eq.(2.4) (first row) and (cid:104) ∆ (cid:98) T ( s ) (cid:105) (second row) inEq.(3.4) for (a) Turkish shareholder networks (∆ t = 2yr), (b) Wikipedia Mathematicians networks(∆ t = 1yr), (c) College Message networks (∆ t = 1mo), and (d) Email networks (∆ t = 1mo). Thedescription of the data sets can be found in Appendix C.the triplet transition matrix (cid:98) T and the simple pairwise reference model of (cid:98) T (pw) , especially in theTurkish shareholder network Figure 3(a). For example, compared to the probability from the pairwisemodel, the real probability of the state becoming not connected in the next snapshot (i.e., from m i → m , i ∈ { , , , } , the first column of the transition matrix) is much less, showing that thissubgraph is very stable compared with the pairwise case in the shareholder network. Additionally,the state is much more likely to evolve to m at s + 1 (the second column of the transition matrix),which demonstrates that in many real networks, the interactions are beyond the pairwise interaction.The significance test using Z-score can be found in Appendix D. In our real data analysis, we use∆ t to denote the time unit 1 and use the time interval between network snapshots by experience ordata availability. The Figure 3(a) and Figure 3(d) Email networks hare some common patterns, i.e.∆ (cid:98) T and ∆ (cid:98) T and different behaviours in some entries, for example, ∆ (cid:98) T and ∆ (cid:98) T , indicatingthat the evolution of three nodes of a binary tree: on the one hand, it indeed shows that this type ofconnection is more likely to be closed as a triangle compared to pairwise interactions, on the otherhand, expect for transforming into a triangle, (a) is more likely to remain connected at least onelink while the (d) is more likely to be disconnected. The Figure 3(b) Wikipedia Mathematiciansnetworks and Figure 3(d) have similar difference matrices ∆ (cid:98) T yet different transition matrices (cid:98) T ( s ),suggesting that different dynamics of networks can have similar higher-order interaction patterns. The analysis of the transition matrix of triplet, reveal that non-pairwise interaction exists in networkevolution. Are these higher-order interaction patterns essential for network evolution? We want toinvestigate this question by performing link predictions for dynamic networks.
The idea behind the algorithm is that the formation of links between a node pair is the projectionfrom the triplet interactions and the transition matrix provides information about triplet evolution.Thus, the likelihood of a link appearing between a node pair can be obtained by summing up theprobabilities of the node pair having a link through triplet interactions.This idea implies two assumptions: First, we assume that at each step, one transition occurs foreach type of triplet. This constrains the largest number of transition of the time window for our5ethod. Second, we assume that the transition depends only on the last snapshot of the network,which means we assume the process is Markovian. More snapshots can be easily included in prin-ciple, but doing so will increase the complexity of the method and may decrease effectiveness if theinformation available is spread too thinly.The algorithm assigns a link appearing score to each node pair separately. We take a node pairof interest, say ( u, v ), and compute the state distribution of all ( N −
2) triplets in G ( s ) containingnodes u, v , denoting as ψ ( u, v ; s ) at time s : ψ i ( u, v ; s ) = 1 N − (cid:88) w ∈V\{ u,v } δ ( M s ( u, v, w ) , m i ) . (4.1)The transition matrix (cid:98) T ( s ) tells us about the evolution of this triplet state distribution ψ ( s ) inEq.(4.1) and the probability L β that a pair of nodes u, v has an link, β = 1, or no link, β = 0, in thegraph G ( s + 1) is given by the projection from predicted ψ ( s + 1): L β ( u, v ; s + 1) = (cid:88) i,j ψ i ( u, v ; s ) (cid:98) T ij ( s ) P (out) jβ , (4.2)where P (out) jβ is a projection vector for the triplet evolution and (cid:80) β =0 P (out) jβ = 1. With the other defini-tion (cid:80) j (cid:98) T ij = 1, and (cid:80) i ψ i ( u, v ; s ) = 1, then we have that L β is properly normalised (cid:80) β =0 L β ( u, v ) =1. It is this L β in Eq. (4.2) that we use as a score for link prediction.Take the case of M as an example, the projection from triplet distribution onto links uses P (out) = / / / /
30 1 . (4.3)The state set M can not distinguish the link in one triplet state. Take P (out)21 for example; thereare three possible ways of mapping the m graph onto our original a pair of nodes u, v . One of thesethree orientations tells us the u, v, node pair will remain without a link ( β = 0) while the other twoorientations give a prediction that there will be a link ( β = 1) between u, v, in the network at thenext time index, G ( s + 1). In order to evaluate our approach to link prediction, we will use our approach alongside severaldifferent existing methods listed in Table 1 [23, 24, 25], applying all of them to a diverse collection ofevolving networks. In general, these link prediction methods assign a similarity score between pairsof nodes (these same scores are also used in many other contexts), and then this is used to predictlinks. The basic conjecture is that the higher the similarity between a pair of nodes, the more likelywe are to find a link between these two nodes. Full details of each method can be found in thereferences given in Table 1 with a summary given in Appendix E.The various link prediction methods can be categorised based on the type of information usedto make a prediction about each node pair: those which use only local interactions probing a fixeddistance from the node pair, and those global methods which use nodes arbitrarily far from the nodepair of interest.For all the algorithms considered, each algorithm uses a previous snapshot of the networks, G ( s ),to produce a score for each pair of nodes for G ( s + 1). And the extra information recorded by TTmethod is the dynamics of the triplets. A high score indicates that a link is likely to be present inthe next snapshot G ( s + 1), while a low score indicates that the node pair is unlikely to be connectedin the next snapshot. All methods, therefore, require a way to turn the scores into predictions,essentially to define what is meant by a ‘high’ or a ‘low’ score by introducing a classification ∞ OwnMFI Matrix Forest Index [26] ∞ OwnLPI Local Path Index [27] 3 OwnRAI Resource Allocating Index [27] 2 nx AAI Adar-Academic Index [23] 2 nx CN Common Neighbour [23] 2 nx JC Jaccard Coefficient [23] 2 nx PA Preferential Attachment [23] 2 nx LLHN Local Leicht-Holme-Newman [29] 2 OwnTable 1: The link prediction methods used and their abbreviations. The length scale given indicatesthe longest path length involved in the method. Under code nx indicates NetworkX [14] routine used. threshold . Often this is done very simply by ranking the scores and using a fixed number ofthe most highly ranked node pairs to predict a link. This method is typically used only for linkaddition in which one is only trying to predict when an unlinked node pair gains a link, that is the A ij ( s ) = 0 to A ij ( s + 1) = 1 process.We are interested in more general predictions, looking at all four possible changes for node pairsfrom one snapshot in time to the next, that is all the four possible A ij ( s ) = 0 , A ij ( s + 1) = 0 , link evolution rather than link addition. In order to make these more generalpredictions for any method, we use k -means clustering methods to separate the prediction scoresproduced by each method into two classes: a high score group and a low score group of node pairs.Any node pair with a score in the high scoring group will be predicted to have a link in the nextsnapshot; node pairs in the low scoring group will be predicted to have no link.To show how this works, we give some examples of how the score from our triplet transitionmethod produces a natural split into low and high scores which is easily discovered by an automatedclustering method such as k -means, as seen in the first-row of Figure 4. Natural split is not a propertyseen in many other algorithms, for example Figure E4 in Appendix E.In our link prediction algorithm, we use M (second row in Figure 1) as the state set, treatingeach node pair distinctively. Thus, if the selected node pair is ( a, b ) in Figure 1, the P (out) j =(0 , , , , , , ,
1) while P (out) j = (1 , , , , , , ,
0) and again (cid:80) β =0 P (out) jβ = 1. With M , thealgorithm does not change the confusion matrix but increases the separation between two scoreclusters compared to using M . The increased separation distance between clusters indicates thatnot all triplet states are equally likely, and the evolution of the triplet is not random; instead, theywill evolve in some specific patterns. We have four transitions ( A ij ( s ) → A ij ( s + 1)) of node pair: 0 →
0, 0 →
0, 1 →
0, and 1 → A ij ( s + 1), so a positive result is where alink is present in snapshot ( s + 1) regardless of the state that pair started in. A false positive iswhere we predict no link for a node pair which end up with a link, while a true negative is wherewe correctly predict a link will not connect a pair of nodes. Then traditional binary classificationmeasures such as used here can be applied, and the table of predicted classes and actual classes isshown in Table 4.3. To evaluate the performance of our of Triplet Transition (TT) method, we usestandard ranking metrics (see Appendix F for more details) – Area under curve and precision and ameasure to indicate the level of changes in the network from snapshot to snapshot. The measure of7 a.1) (b.1) (c.1) (d.1)(a.2) (b.2) (c.2) (d.2) Figure 4: The histogram of scores in our triplet transition (TT) method for (a) Turkish Shareholdernetworks, (b) Wikipedia Mathematician networks, (c) College Message networks, and (d) Emailnetworks. It can be seen that in each case, the existing score separates into two clear clusters, onecluster c appears to have a ‘low’ probability that a link will exist in the next snapshot and the othercluster has ‘high’ probability. We predict that if the node pair has a score in the lower score cluster c , then a link will not exist in the next snapshot. Conversely, if a node pair exists in the higherscore cluster c , then we predict a link will exist between this node pair in the next snapshot. Thefirst row, where the histogram is plotted in blue is the clustering results for the M while the secondrow, where the histogram is plotted in green is the clustering results for the M . It shows that the M has higher the separation between two clusters than M . The results for each network are basedon samples which contain at least 2000 node-pairs with an initial link and at least 2000 more nodepairs which started without a link. Actual Classes A ij ( s + 1) = 1 A ij ( s + 1) = 0Predicted Classes A ij ( s + 1) = 1 True positive ( 01+ , − , − ) A ij ( s + 1) = 0 False negative ( 00 − , − ) True negative ( 00+ , s to s + 1 is denoted by f E ( s ), f E ( s ) = 1 − |E ( s + 1) ∩ E ( s ) ||E ( s ) | , (4.4)where E ( s ) is the link set of the network in snapshot s . This is the same as the measured probabilitythat an link disappears in the next time step in Eq.(3.1).We use a different sampling method from that used in Figure 4 to take account of the variations inthe number of connected and disconnected node pairs. In the following analysis, we predict the nodepairs uniformly randomly selected from all the possible node pairs of a whole network. We expectpredictions can capture evolving characteristics of networks and the Mathematician networks changetiny fractions which are not sufficient enough to evaluate the predictions. Therefore, we replace theMathematician networks with Hypertext networks (see Appendix C for more details of this dataset)in the following prediction analysis. Time data is sampled over different time intervals ∆ t . ForTurkish Shareholder networks, the smallest time separation is 2 years, and we consider four or moreyears too long for the evolution; for the Hypertext data, which records the short communications,we choose ∆ t as 40 and 60 min; for the College Message and Email networks, we consider that 88 ype Algorithm Shareholder Hypertext CollegeMsg Email Avg.(Time scale) (2 years) (1h) (1 mon) (1 mon) Rank[ f E ( s )] [0 .
63] [0 .
52] [0 .
93] [0 . .
27 5 0 .
19 (9) 2 0 .
11 (4) 2 . ( ) 1 2 . β = 0 .
01) 0 .
25 8 . ( ) 1 0 .
11 (4) 2 . ( ) 1 3 . .
19 9 0 .
19 (9) 2 . ( ) 1 0 . . . .
09 (4) 5 0 .
009 (2) 7 0 .
08 (3) 7 5 . .
28 4 0 .
10 (6) 4 0 .
019 (4) 5 . .
18 (7) 4 4 . .
33 2 0 .
09 (5) 5 0 .
008 (2) 8 0 .
13 (5) 5 5 . .
31 3 0 .
09 (5) 5 0 .
008 (2) 8 0 .
08 (1) 7 5 . .
25 7 0 .
08 (6) 8 0 . .
10 (4) 6 . . .
27 6 0 .
07 (6) 9 0 .
03 (1) 4 0 .
059 (6) 9 7 . .
09 10 0 .
06 (5) 11 0 . .
03 (1) 10 10 . . .
067 10 0 .
010 6 0 .
014 11 9 . Table 2: Summary of the average precision scores (on the left) and their ranks (on the right)forselected ∆ t based on k-means clustering. The errors quoted are standard deviations from all thecomputed network snapshots of the selected time scale, except for the shareholder networks whichhave one prediction for three snapshots.hour to 1 month communication frequency is reasonable. Precision
The precision score is defined as the number of times that we predict a link exists between node pairsin the later snapshot correctly (a true positive, N + N ) divided by the number of times wepredict a link, correctly (true positive) or incorrectly (false positive, N − + N − ). A high precisionscore means we can trust that links predicted by the algorithm will exist. We have S prec = N + N N + N + N − + N − , (4.5)We set a baseline for the precision, using a simple model which predicts an link in snapshot ( s + 1)for any given node pair with a probability ρ ( s ), the density of the network in snapshot s , i.e., thefraction of node pairs which have an link. In this case the precision Eq. (4.5) is equal to the densityin snapshot s , S prec , base = ρ ( s ), see (F10) in Appendix F.A very low precision score for baseline method indicates that the links existing between nodepairs are not random. Some of the local measurement algorithms give lower scores than the baseline,suggesting that those types of local information are less likely to drive the evolution of the networks.There are some clear patterns in our results. First, these algorithms have a similar performance onthree of the networks, the Hypertext network, the College Message Network and the Email network.Three algorithms are consistently better than the others though with similar performances relativeto one another: our Triplet Transition (TT) methods, the Katz method and the MFI method. All ofthese are probing non-local information in our networks, suggesting this is necessary to understandthe time evolution of these networks.However, the results relative to other algorithms often change when we look at precision mea-surements for the Turkish Shareholder networks. In particular, the three methods picked out by theother networks at now in a group of second-best methods. AUC
We also evaluate the methods independent of the classification threshold chosen by using the areaunder the receiver operating characteristic curve (AUC). This curve is defined for binary classifier9 a) (b)(c) (d)
Figure 5: Precision scores for the link prediction results for ten algorithms applied to four temporalnetworks constructed from real data sets: (a) Turkish Shareholder networks, (b) Hypertext networks,(c) College Message networks, (d) Email networks. Results for our TT algorithm based on transitionmatrix denoted by the triangle symbol (cid:52) . For the networks in (b), (c) and (d), we also show theresults for different time scales (window).algorithms. We defined our binary classifier at the beginning of this section, where we explainedthat the correctness of link existence regardless of the initial states. The curve is plotted using thefraction of positive results (also known as recall) which are correct against the fraction of negativeresults which incorrect. This assessment varies the classifier threshold and gives an overview of howeffective the methods in predicting the link existence and non-existence. The higher the AUC is, thebetter the overall performance regardless of the clustering method.Figure 6 shows a comparison of AUC for the ten algorithms when we apply them to node pairssampled uniformly at random from all possible node pairs. Our approach (TT) is the left-most pointin each figure. In the Hypertext networks (b) and the College Message networks (c), our TT methodhas the highest AUC though the PA, Katz, MFI and LPI algorithms perform almost as well. Forthe Turkish shareholder network (a) the AUC of our TT method is again the highest though with asimilar AUC value for various algorithms. Note that PA, Katz, and MFI are now weaker. Only forthe Email network (d) is our TT outperformed, in this case by the Katz, MFI and LPI algorithms.Overall we see some similarity in these AUC results, summarised in Table 3 as we saw for precisionin Table 2. The same three algorithms, our Triplet Transition (TT) methods, the Katz method, andthe MFI method, show similar high performance on our have a similar performance on the samethree networks, the Hypertext network, the College Message Network and the Email network. Forthese three networks, we might also pick out the PA algorithm. However, now the AUC values for theshareholder network show that our TT method continues to perform well, unlike for the precisionmeasurements. The Katz, MFI and PA methods are though in a weaker group of algorithms asmeasured by their AUC performance on the shareholder network.10 ype Algorithm Shareholder Hypertext CollegeMsg Email Avg.(Time scale) (2 years) (1 h) (1 mon) (1 mon) Rank[ f E ( s )] [0 .
63] [0 .
52] [0 .
93] [0 . . . ( ) 1 . ( ) 1 0 .
80 (2) 4 1 . β = 0 .
01) 0 .
65 8 . ( ) 1 0 .
69 (8) 2 . ( ) 1 3 . .
65 8 . ( ) 1 0 .
69 (8) 2 . ( ) 1 3 . .
69 7 0 .
69 (9) 5 0 .
69 (6) 2 . ( ) 1 3 . . .
63 (8) 6 0 .
58 (3) 6 0 .
78 (2) 5 4 . . .
63 (8) 6 0 .
58 (3) 6 0 .
78 (2) 5 4 . . .
63 (8) 6 0 .
58 (3) 6 0 .
77 (2) 8 5 . . .
62 (8) 9 0 .
58 (2) 6 0 .
77 (2) 8 6 . .
63 10 0 .
70 (7) 4 0 .
69 (6) 2 0 .
78 (1) 5 5 . .
70 6 0 .
62 (7) 9 0 .
57 (2) 10 0 .
77 (2) 8 8 . . . . . . Table 3: Summary of AUC-ROC performance, ∆ t is taken by selecting the highest AUC-ROC. TheAUC scores for each network are in the left-hand column, the rank of the score in the right-handcolumn. The errors quoted on the AUC scores are standard deviations from all the computed networksnapshots of the selected time scale expect for shareholder networks where two predictions are made.We highlight the highest result and any whose result is within the error quoted on the largest result.The baseline method for AUC is the random cluster the node pairs into two groups, and the AUC is0 . Discussion of Different Methods
The Triplet Transition (TT) proposed here considers is used to predict a link between two nodes byconsidering these alongside a third node which can be in any position in the network. In this way,this third node not only captures the higher-order interactions but also enables the method to encodeboth local and non-local information about the link of interest. We find that our method and the twoother global methods used here, Katz Index method (Katz) [23, 28, 25] and the Matrix Forest Index(MFI) [26] are generally the best for most of networks we studied, in particular for the Hypertextnetworks, the College Message networks and the Email networks. As the most successful methodshere perform better than other approaches based on local measures; this shows that in most systems,the pattern of connections depends on the broader structure of interactions. This dependence of thebehaviour of systems on structure beyond nearest neighbours is the crucial motivation for using thelanguage of networks rather than just looking at the statistics of pairs [50]. For instance, the Katzmethod counts the number of paths between each pair of nodes, probing all paths though giving lessweight to longer paths, so nearest neighbours contribute the most.The results for our Shareholder networks were a little different. In this case, predictions based onlocal measurements (paths of length two), the semi-local Local Path Index method (LPI), as well asour non-local triplet transition method outperformed the other global methods in terms of AUC, seeTable 3. The global methods also perform poorly on precision, see Table 2. An algorithm with lowprecision and high AUC, such as Jaccard Coefficient (JC), is predicting the disconnected pairs well.11 a) (b)(c) (d)
Figure 6: The area under the curve (AUC) results for four temporal networks constructed from realdata sets: (a) Turkish Shareholder networks, (b) Hypertext networks, (c) College Message networks,(d) Email networks. The results compare ten different methods of Table 1 including our TT algorithmdenoted by the ‘triangle’ symbol. For networks (b),(c) and (d), we show the results for different timescales (windows).
In this paper, we considered the temporal evolution of networks by looking at a sequence of snapshotsof each network. The network G ( s ) defined for snapshot s contains all the links present between nodesfor a certain period, ∆ t . Each snapshot s covers the period ∆ t immediately following the latest timeincluded in the previous snapshot. In some cases, we have looked at the effect of changing the sizeof the temporal windows ∆ t on our results. The main tool we use to study the network evolution isa transition matrix T of (2.2) which are derived from the evolution of three-node combinations in anetwork from one temporal snapshot to the next. These transition matrices are obtained from realdata by counting the different states of three nodes found in consecutive network snapshots.To decode the higher-order interaction patterns of network dynamics, we fitted a pairwise inter-action model to the real data and computed the transition matrices from the fitted parameters. Bycomparing the actual transition matrices found numerically with those predicted using a simple pair-wise model, as shown in Figure 4, we showed that higher-order interactions are needed to understandthe evolution of networks. For example, in the Email network, derived from the emails within anEU Institution, if three nodes are connected in the previous temporal snapshot, they are less likelyto be disconnected in the following snapshot.To show that we must look beyond the interactions of neighbours, we then designed a linkprediction algorithm based on the transition matrix T of (2.2), our Triplet Transition (TT) method.We compared our method with nine other methods as well as to simple baseline measures. Whatwe found was that on a range of different temporal networks, our Triplet Transition method was12s good as two methods based on non-local (global) information in the network, namely, the KatzIndex method [23, 28, 25] and the Matrix Forest Index (MFI) method [26]. While not always thebest on every network or every measure, these three global methods were usually better than theother methods we studied, all of which used information on paths of length two in the network.Intriguingly, the one other method that used paths of length three, Local Path Index (LPI) s LPI [27, 28], often performed well too though rarely as well as the top three global methods.Since the most successful methods in our tests were those that access non-local information,it seems that such information is essential in the evolution of most networks and therefore, it isimportant to include this in network measures. However, including information from the wholenetwork is numerically intensive and, for any reasonably sized network, the evolution of a link isunlikely to depend directly on what is happening a long way from that link. One reason why ourTransition Triplet method works well is that it does not emphasise the vast majority of the network.Most of the information in the transition matrix network is based on neighbours of one or other ofthe link of interest. For large networks, we use sampling to add the necessary global informationinto our transition matrices. The Katz index method does include information from all scales butsuppresses contributions from more distant parts of the network. The success of the transition tripletapproach suggests that no need to access all of the global information of a network in order to knowwhat is going on locally. Notably, the overall better performance of TT method reveals that theinformation of different higher-order interaction patterns can help understand and predict differentdynamics of networks.There is also another reason why our Triplet Transition method may work better than the localmethods we look at, and that is because our methods are also probing a longer time scale as well as alonger spatial scale. That is we use two snapshots, G ( s −
1) and G ( s ) in order to create the transitionmatrix T which in turn we use for the predictions in snapshot G ( s + 1). All the other methodsused here as based n information from one snapshot. So again, the success of our approach pointsto correlations over short but non-trivial time scales as being important in understanding networkevolution. In our case, the dependence of results on the time intervals can be seen in the effect of ∆ t used to define our transition matrices are important. For different data sets, we found different ∆ t gave optimal results which of course reflects inherently different timescales in the processes encodedby our different data sets. This research is supported by the Centre for Complexity science group at Imperial College London.We thank Nanxin Wei, from Imperial College London and Junming Huang, from Princeton Universityfor developing early ideas of this paper. We thank Hardik Rajpal and Fernando Rosas for valuablediscussion of the triplets in the graph. We also appreciate another colleague Hayato Goto for valuableinteroperation of machine learning methods and its application.13 eferences [1] Anderson, Philip W. “More is different.” Science 177, no. 4047 (1972): 393-396.[2] Battiston, Federico, Giulia Cencetti, Iacopo Iacopini, Vito Latora, Maxime Lucas, Alice Patania,Jean-Gabriel Young, and Giovanni Petri. “Networks beyond pairwise interactions: structure anddynamics.” Physics Reports (2020).[3] Benson, Austin R., David F. Gleich, and Jure Leskovec. “Higher-order organization of complexnetworks.” Science 353, no. 6295 (2016): 163-166.[4] Expert, Paul, Renaud Lambiotte, Dante R. Chialvo, Kim Christensen, Henrik Jeldtoft Jensen,David J. Sharp, and Federico Turkheimer. “Self-similar correlation function in brain resting-statefunctional magnetic resonance imaging.” Journal of The Royal Society Interface 8, no. 57 (2011):472-479.[5] Petri, Giovanni, Paul Expert, Federico Turkheimer, Robin Carhart-Harris, David Nutt, Peter J.Hellyer, and Francesco Vaccarino. “Homological scaffolds of brain functional networks.” Journalof The Royal Society Interface 11, no. 101 (2014): 20140873.[6] Sanchez-Gorostiaga, Alicia, Djordje Baji´c, Melisa L. Osborne, Juan F. Poyatos, and AlvaroSanchez. “High-order interactions dominate the functional landscape of microbial consortia.”Biorxiv (2018): 333534.[7] Bairey, Eyal, Eric D. Kelsic, and Roy Kishony. “High-order species interactions shape ecosystemdiversity.” Nature communications 7, no. 1 (2016): 1-7.[8] Newman, Mark. “Networks.” (2018).[9] Strogatz, Steven H. “Exploring complex networks.” nature 410, no. 6825 (2001): 268-276.[10] Guillaume, Jean-Loup, and Matthieu Latapy. “Bipartite graphs as models of complex networks.”Physica A: Statistical Mechanics and its Applications 371, no. 2 (2006): 795-813.[11] Evans, T. S., and Renaud Lambiotte. “Line graphs, link partitions, and overlapping communi-ties.” Physical Review E 80, no. 1 (2009): 016105.[12] Milo, Ron, et al. “Network motifs: simple building blocks of complex networks.” Science 298.5594(2002): 824–827.[13] Newman, Mark EJ. “Communities, modules and large-scale structure in networks.” Naturephysics 8, no. 1 (2012): 25-31.[14] Hagberg, Aric A., Daniel A. Schult and Pieter J. Swart. “Exploring network structure, dy-namics, and function using NetworkX”, in Proceedings of the 7th Python in Science Conference(SciPy2008), G´’ael Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena, CA USA),pp. 11–15, Aug 2008.[15] Fortunato, Santo, and Darko Hric. “Community detection in networks: A user guide.” Physicsreports 659 (2016): 1-44.[16] Muhammad, Abubakr, and Magnus Egerstedt. “Control using higher order Laplacians in net-work topologies.” In Proc. of 17th International Symposium on Mathematical Theory of Networksand Systems, pp. 1024-1038. Citeseer, 2006.[17] Horak, Danijela, Slobodan Maleti´c, and Milan Rajkovi´c. “Persistent homology of complex net-works.” Journal of Statistical Mechanics: Theory and Experiment 2009, no. 03 (2009): P03034.1418] Zhang, Xiao, Cristopher Moore, and Mark EJ Newman. “Random graph models for dynamicnetworks.” The European Physical Journal B 90.10 (2017): 200.[19] KONECT — The Koblenz Network Collection http://konect.cc/ .[20] Kunegis, J. “KONECT – The Koblenz Network Collection” in Proc. Int. Conf. on World WideWeb Companion, 2013, pp. 1343–1350.[21] Leskovec, Jure and Andrej Krevl, SNAP Datasets: Stanford Large Network Dataset Collection, http://snap.stanford.edu/data (2014).[22] Tore Opsahl and Pietro Panzarasa, “Clustering in weighted networks”, , 155–163 (2009). DOI: [23] Liben-Nowell, David and Kleinberg, Jon, The link-prediction problem for social networks, J.Am. Soc. Inf. Sci., 2007, 58, 1019–1031, DOI: [24] L´’u, Linyuan and Zhou, Tao, Link prediction in complex networks: A survey, Physica A, 2011,390, 1150–1170, DOI: , arXiv:1010.0725 .[25] G. Abuoda, G. D. F. Morales, and A. Aboulnaga, “Link prediction via higher-order motiffeatures,”[26] Chebotarev, Pavel and Shamis, Elena, The matrix-forest theorem and measuring relations insmall social groups, Autom. Remote Control, 1997, 58, 1505, arXiv:math/0602070 .[27] T. Zhou, L. L´’u, and Y.-C. Zhang, “Predicting missing links via local information,” The Euro-pean Physical Journal B - Condensed Matter and Complex Systems , vol. 71, no. 4, pp. 623–630,2009.[28] L. L´’u, C.-H. Jin, and T. Zhou, “Similarity index based on local paths for link prediction ofcomplex networks,”
Physical Review E , vol. 80, no. 4, p. 046122, 2009.[29] Leicht, E. A. and Holme, Petter and Newman, M. E. J., Vertex similarity in networks, Phys.Rev. E, 2006, 73, 026120, DOI: [30] Yao, Qing, Tim S. Evans, and Kim Christensen. “How the network properties of shareholdersvary with investor type and country.” PloS one 14.8 (2019).[31] KONECT, “Haggle network dataset”, 2016, http://konect.uni-koblenz.de/networks/contact [32] KONECT, “Hypertext 2009 network dataset”, 2016, http://konect.uni-koblenz.de/networks/sociopatterns-hypertext [33] J. Kunegis, KONECT – “The Koblenz Network Collection”. In Proc. Int. Conf. on World WideWeb Companion, pages 1343–1350, 2013.[34] P. Panzarasa, T. Opsahl, and K. M. Carley. “Patterns and dynamics of users’ behavior andinteraction: Network analysis of an online community.” Journal of the American Society for In-formation Science and Technology 60.5 (2009): 911-932.[35] Pietro Panzarasa and Tore Opsahl and Kathleen M. Carley, “Patterns and dynamics of users’behavior and interaction: Network analysis of an online community”, , 911–93 (2009), DOI: ,[36] Jure Leskovec and Jon Kleinberg and Christos Faloutsos, “Graph evolution: Densification andshrinking diameters”, ,1537] B. Chen, Z. L and T. Evans,”Analysis of the Wikipedia Network of Mathematicians”, arXiveprint: 1902.07622,[38] K. Yu, W. Chu, S. Yu, V. Tresp, and Z. Xu, “Stochastic relational models for discriminativelink prediction,” in Advances in neural information processing systems , pp. 1553–1560, 2007.[39] Kovanen, Lauri, et al. “Temporal motifs reveal homophily, gender-specific patterns, and grouptalk in call sequences.” Proceedings of the National Academy of Sciences 110.45 (2013): 18070-18075.[40] Newman, M. E. J., Scientific collaboration networks. I. Network construction and fundamentalresults, Phys. Rev. E, 2001, 016131, 64, DOI: ,[41] Barab´asi, Albert-Laszlo and Jeong, Hawoong and N´eda, Zoltan and Ravasz, Erzsebet and Schu-bert, Andras and Vicsek, Tamas, Evolution of the social network of scientific collaborations, Phys-ica A, 590–614, 311, 2002[42] Salton, G. & McGill, M. J., Introduction to Modern Information Retrieval, McGrawHill, 1983[43] Lada A. Adamic and Eytan Adar, Friends and Neighbors on the Web, Social Networks, 25,211–230, 2003.[44] R. R. Sarukkai, “Link prediction and path analysis using markov chains,”
Computer Networks ,vol. 33, no. 1-6, pp. 377–386, 2000.[45] A. Popescul and L. H. Ungar, “Statistical relational learning for link prediction,” in
IJCAIworkshop on learning statistical models from relational data , vol. 2003, Citeseer, 2003.[46] J. Zhu, J. Hong, and J. G. Hughes, “Using markov models for web site link prediction,” in
Proceedings of the thirteenth ACM conference on Hypertext and hypermedia , pp. 169–170, 2002.[47] M. Bilgic, G. M. Namata, and L. Getoor, “Combining collective classification and link predic-tion,” in
Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007) ,pp. 381–386, IEEE, 2007.[48] A. Clauset, C. Moore, and M. E. Newman, “Hierarchical structure and the prediction of missinglinks in networks,”
Nature , vol. 453, no. 7191, pp. 98–101, 2008.[49] M. Molloy and B. Reed, A critical point for random graphs with a given degree sequence,Random Structures and Algorithms, 6, 161–180, 1995[50] Ulrik Brandes, Garry Robins, Ann McCranie, and Stanley Wasserman, What is network sci-ence?, Network Science, 1, 1–15, 2013. DOI: ,16 ppendicesA Transition matrix estimation
The memory needed for the calculations in out Triplet Transition (TT) method scales with thenumber of combination of triplet in network, that is (cid:0) N (cid:1) = N ( N − N − / ∼ O ( N ) and thiscan be seen in Figure A1.Figure A1: Time and memory needed for generating all three node graphlet combinations against adifferent number of nodes. N = 2 n n = 6 , , , ,
10. For each n , the time and memory for each runare measured and the error bars are corresponding to standard deviation. The time used scales asexpected order (the slope of the fitted line is 3 . ± .
02) of the system size. The slope of the fittedline for the memory used is 1 . ± . (cid:98) T , weuse all three-node combinations if the number of nodes in the system is smaller than 10 ; otherwise,we sample 10 triplets chosen uniformly at random from the set of all three-node combinations. Ifthe calculated (cid:98) T is stable, we choose this sample number for the following analysis. Otherwise, wecontinue to add sample a further sample of 10 triplets until the (cid:98) T is stable. B Simple Models
The simplest null model for the evolution of the network is one in which we assign a probability p that in each time step, a pair of nodes changes from disconnected to connect with probability p . Incontrast, a connected pair becomes disconnected with probability q . We can then write down thetransition matrix in this Pairwise Null model T pw ( p, q ) (often abbreviated to T pw ) for our three-nodecombinations in terms of the set of four states M = { m , m , m , m } where m i is the configurationin which three nodes have i links between them. Namely T pw ( p, q ) = (1 − p ) p (1 − p ) p (1 − p ) p q (1 − p ) (1 − q )(1 − p ) + 2 qp (1 − p ) 2(1 − q ) p (1 − p ) + qp (1 − q ) p q (1 − p ) 2(1 − q ) q (1 − p )+ q p (1 − q ) (1 − p ) + 2 qp (1 − q ) (1 − q ) pq − q ) q q (1 − q ) (1 − q ) . (B1)Here T pw ij is the probability that three nodes connected with i nodes, in state m i , evolves in the nexttime step to a configuration m j with j edges between those three nodes. For instance, T pw03 denotesthe probability of evolving from three disconnected nodes m to state m of three fully connectednodes. The addition of an edge for each of the three edges gives a single factor of p , so T pw03 = p overall. When looking at the transition from a single edge to two edges in a triplet, the entry T pw12 ,we can do this in two ways. First, we can add one new edge with probability p , keeping the otherpair of unconnected nodes in that state, probability (1-p), and keeping the original single connectedpair in that state with probability (1 − q ). There are two ways of choosing where to add the extra17dge, so we have a factor of 2 p (1 − p )(1 − q ). However, we could also remove the existing edge andadd two new edges between the other pairs of edges edge in one of two positions, giving the secondfactor of p q seen in the entry for T pw12 in (B1). We can check that the rows sum to one, 1 = (cid:80) j T pw ij since we conserve the total number of triplets in our models.Our second null model, our “edge swap model”, we swap the ends of a pair of edges so edges( u, v ) and ( w, x ) in snapshot s are removed and are replaced by edges ( u, x ) and ( w, v ) in the nextsnapshot. This preserves the degree of every node. For each update from G ( s ) to G ( s + 1) we update20% edges.The final model, our “random walk model”, is also an edge rewiring model but it is based onhigher-order structures as we use random walks to select the new edges. The model starts with anErd˝os-R´enyi graph. The initial node for a random walker, say u , is chosen uniformly at random fromthe set of nodes. Then one of the edges from u , say the edge to a node y is chosen uniformly fromthe set of neighbours. Finally, three non-backtracking steps are made on the network starting from u and ending at a node x , in which edges are always chosen uniformly from those available excludingany edge used in the previous step of the random walk. The existing edge ( u, y ) is removed andreplaced by a new edge ( u, x ). The final graph is then created through a projection where nodes areconnected if they share a common neighbour, that is if directed edges from u to v and w to v exists,then the projected graph has an undirected link between u and v . This rewiring and projectionprocedure maintains the number of edges and nodes in the original graph but not in the projectedgraph.To produce the time evolution, we rewire 20% of the edges using this rewiring procedure and thenuse this new network as the next snapshot. In our context, our random walk model is used to producetest networks with local correlations between nodes. While motivated by real-world examples andbuilding on existing experience with the model [30], here it is used as a toy model to illustrate theapproach.We can use these three simple models to generate artificial temporal networks to test our approach.The results in terms of the transition matrix (cid:98) T derived from the artificial networks is shown inFigure B2.The behaviour of our simple “pairwise model” should be completely captured by the referencetransition matrix (cid:98) T (pw) ( s ) and, as expected, the numerical results shown in Figure B2a show nosignificant difference between numerical data (cid:98) T (pw) ( s ) and theoretical T pw .For Figure B2b, we use the artificial networks generated by the “edge swap model”. While thisinvolves two pairs of edges, so in principle is a higher-order model, in a sparse graph, the four nodesselected by a pair of randomly selected edges are unlikely to be linked by other edges. So in practice,in terms of the triplet graphlets, this model behaves much like the pairwise model and shows littledifference from that model.It is only with the networks generated using our three-step random walk that we see significantdifferences between the data and the pairwise model. This is to be expected as higher-order processeswere used to create the numerical networks. C Data Sets
In this project, we use several different datasets to produce temporal networks with different reso-lutions. The resolution of a network is the time interval used to create each snapshot G ( s ) of ournetwork. This can be done in two ways, depending on the context. In either case, the resolution canbe ‘seconds’, ‘hours’, ‘days’ or ‘months’ and should be chosen to suit the context.In the first type of temporal data, the data capture interactions between pairs of nodes whichoccur briefly on the time scale of the network resolution. A list of the times of phone calls betweenmembers of a social network would be an example of this type of data. In this case, each edge in asingle snapshot indicates that an event linking the two nodes occurred during the time interval.The second approach is where the pairwise interactions recorded in the data typically last for18 a) (b) (c) Figure B2: The triplet transition matrix (cid:98) T estimated from the artificial data. Each of the six four-by-four heat maps shows values where the rows represent one initial triplet graphlet i and the columnsrepresenting the final graphlet j . The top three heat maps show the average value of entries (cid:104) (cid:98) T ij (cid:105) .The bottom three heat maps give the values of the difference matrix (cid:104) ∆ (cid:98) T (cid:105) = (cid:98) T (pw) − (cid:98) T of (3.4)in which the numerical data is compared to the analytical form predicted from the simple pairwisemodel of (B1). Any large entries in these lower rows of heatmaps indicate higher-order effects notpresent in our simple pairwise model of (cid:98) T (pw) . The three columns of heat maps show results for thethree different artificial temporal networks created numerically using the stochastic models defined inSection B: (a) our “pairwise model”, (b) our “edge swap model”, and (c) our “random walk model”.Only the networks formed with random walks show significant higher-order effects.much longer than the resolution, but they do change slowly over time. An example of this would behyperlinks between webpages. In this second case, the snapshots are the network at one instant intime, and the interval is now the time between these snapshots.We use five different data sets which are as follows. • Wikipedia Mathematicians ( WikiMath ). Each biographical Wikipedia page of an individualmathematicians forms a node. If a hyperlink links two biographies (in either or both directions),a link is present in the network. The edges are edited by users and are both added and removedover time. Each snapshot represents the state of these webpages at one moment in time. Thedata is taken at one point in three different years, 2013,2017, and 2018, so the intervals are notconstant in this case. See [37] for further details on this dataset. • Hypertext (
Hypertext ). This is the network of face-to-face contacts of the attendees of theAssociation of Computing Machinery (ACM) Hypertext 2009 conference. In the network, anode represents a conference visitor, and an edge represents a face-to-face contact that wasactive for at least 20 seconds [32, 19, 20]. • Email (Email). This is derived from the emails at a large European research institution sentbetween October 2003 and May 2005 (18 months). Each node corresponds to an email address.An edge in a given snapshot indicates that an email was sent between the nodes in the timeinterval corresponding to that snapshot [36, 33, 21]. • College Message ( CollegeMsg ). The nodes are students, and an edge in a snapshot indicatesthat the students exchanged a message within the interval associated with that snapshot. Thedata was collected over a seven month period in 2004, see [35, 22, 21] for more details.19
Turkish Shareholder Network ( Shareholder ).The nodes are shareholders in Turkish companies. The shareholders are linked if they bothhold shares in the same company during the time interval associated with the snapshot [30].We provide a summary of the graph statistics as the following:
Dataset (Abbreviation) Nodes Edges Time Period Resolutions Source( N ) ( E ) ( T ) (∆ t )Wikipedia Mathematicians ( WikiMath ) 6049 36315 2013,2017,2018 1 and 4 years [37]Turkish Shareholder (
Shareholder ) 39901 68017 2010,2012,2014,2016 2 years [30]Hypertext (
Hypertext ) 113 5246 3 days in 2009 40,60 min [32, 19, 20]College Message (
CollegeMsg ) 1899 58911 6 months, 2004 7 days, 1 month [35, 22, 21]Institution Email (
Email ) 986 20780 1 year, 1970-1971 8 hours, 7 days,1 month [36, 33, 21]
Table C4: Summary of graph statistics.
D Significance test of the pairwise interactions
The significance of the transition of the triplet transition can be quantified by using the z -score(standard score). The z -score has been used to a qualitative measure of statistical significance ofdifferent motifs [12] or temporal motifs [39]. We apply similar procedures to compute z -scores fordifferent triplet transitions and for a transition from graphlet i (three nodes with i edges betweenthem) to graphlet j we define Z ij = (cid:104) (cid:98) T ij ( s ) (cid:105) − (cid:104) (cid:98) T (pw) ij ( s ) (cid:105) σ (pw) ij (D1)To calculate this, we generate an ensemble of R = 1000 realisations of the null model, the pairwisenull model. The values of p and q used in the calculation of (cid:98) T (pw) ij ( s ) are those inferred from one realnetwork as described in (3.2) and (3.1).Here σ (pw) ij is the standard deviation in the ij -th entry of the transition matrices (cid:98) T (pw) ( s ) obtainedfrom the simple pairwise model. The z -score Z ij shows us if the data is showing behaviour notaccounted for in our simple pairwise model. So it quantifies how likely a specific transition fromstate i to j is derived from higher-order processes not captured by simple pairwise interaction. The T (pw) ij and σ (pw) ij are the average and the standard deviation of the counts of triplet transition fromstate i to state j of a sufficiently large number of an ensemble of pairwise models, respectively. Theresults for Z ij for some of our networks are shown in Figure D3. The number of simulations R needed was found as follows. We start from R = 100 realisations, increase to 200,300 and so on. Each time we increase R we compare the difference in the results to those found with the ( R − R − R , we assume R is sufficient, andwe stop increasing R . Otherwise, we continue to increase the number of realisations until the results do not change. a) (b)(c) (d) Figure D3: Z-scores for triplet transition based on comparisons of data to the pairwise simple model.Each of the large four squares corresponds to a Z matrix (D1) for a different data source, labelledas follows: (a) Turkish shareholder network (∆ t = 2yr), (b) Mathematicians wiki pages (∆ t = 1yr),(c) College message data (∆ t = 1mo), and (d) Europe institutional email network data (∆ t = 1mo).Each large square represents a four by four grid of smaller squares. The rows (columns) are thetriplets at the earlier (later) arranged in order of size from smallest m to largest m going top tobottom (left to right). In each the colour represents Z ij on the scale given on the right of each largesquare, the top blue digits represent Z -scores, the middle black digits in brackets present standarddeviation, and the bottom grey digits in brackets gives ( N r ( m ) − N pw ) /N pw where N r gives thenumber of triplets in the real data and N pw gives the number predicted in the simple pairwise modelfor the same sized sample. 21 Other Link Prediction Methods
We have used several other measures as summarised in Table E. These methods use node similaritymeasures to make link predictions. In the following s ( u, v ) is a (similarity) score assigned to a pairof vertices u, v ∈ V where N ( u ) is the set of neighbours of vertex u , i.e. N ( u ) = { w | ( u, w ) ∈ E } . Thesimilarity scores are used to assign a probability p ( u, v ) that an edge should exist between u and v .Abbreviation Method Reference Length Scale CodeAAI Adar-Academic Index [23] 2 nx CN Common Neighbour [23] 2 nx JC Jaccard Coefficient [23] 2 nx Katz Katz [28] ∞ Own
LLHN Local Leicht-Holme-Newman [29] 2
Own
LPI Local Path Index [27] 3
Own
EE Edge Existence [Here] 1 -PA Preferential Attachment [23] 2 nx RAI Resource Allocating Index [27] 2 nx MFI Matrix Forest Index [26] ∞ Own
TT Triplet Transition [Here] ∞ Own
Table E5: Table of the link prediction methods used and their abbreviations. The length scale givenindicates the longest path length involved in the method or equivalently the largest power of theadjacent matrix involved in the method. Under code nx indicates that a NetworkX [14] routine wasused, Own indicates the authors’ own code was used. The Edge Existence (EE) approach was notinvestigated numerically but was included for the sake of comparison.The most obvious similarity measure is Edge Existence (EE) index. An edge between twovertices u and v indicates a close link, and this is the most local (one step) measure possible s EE ( u, v ) = A uv = (cid:88) e ∈E δ e, ( u,v ) . (E1)That is every edge between u , and v adds one to this index, so for a simple graph, this is the entryof the adjacency matrix. This is the simplest edge prediction method in that it predicts no changesat all but therefore one that is not very useful and we do not use it in our work. However, we willsee this score contributes to some of the other, more sophisticated methods, so it is useful to identifyit here.While not very useful, this Edge Existence (EE) index could produce some very good statistics.For instance, if very few edges are changing in each time interval, then this method would predictthe behaviour of the vast majority of edges as most remain unchanged. It would only do badly ifwe used statistics that specifically measured the success of node pairs changing their connectednessfrom one time step to the next. A good example of this would be when we are able to break thedata up into arbitrarily sized steps in time, and if we choose steps which are too small, we will getlittle change step because of this discretisation choice.The Common Neighbours (CN) method [23, 28, 25] simply scores the relationship betweentwo nodes based on the the number of neighbours they have in common s CN ( u, v ) = (cid:88) w ∈V A uw A wv = |N ( u ) ∩ N ( v ) | . (E2)This will tend to give large scores if u and/or v have high degrees.To illustrate this, we can estimate the number of common neighbours in a random graph withdegree distribution p ( k ). Suppose the vertices u and v have degree k u and k v respectively. Then22his means we are picking out a pair of stubs with probability k u k v / (4 E ) if there are E edges inthe simple graph. The probability that a pair of stubs are connected to the same nearest neighbournode of degree k is (1 / k ( k − p nn ( k ) where p nn ( k ) = kp ( k ) / (cid:104) k (cid:105) is the probability that the nearestneighbour has degree k . This gives us that the number of nearest neighbours in common in a randomgraph may be estimated to be s CN , rnd ( u, v ) ≈ k u E k v E ( (cid:104) k (cid:105) − (cid:104) k (cid:105) ) (cid:104) k (cid:105) ∝ s PA ( u, v ) . (E3)One way to take this bias in s CN ( u, v ) towards higher degree nodes is to make the score comparisonbetween the two. If we compare by looking at the difference between s CN and its expected value inthe configuration model we end up with s CN , diff ( u, v ) = s CN ( u, v ) − β (cid:48) s CN , rnd ( u, v ) = (cid:32)(cid:88) w ∈V A uw A wv (cid:33) − β k u E k v E (E4)which looks remarkably like a term in a Modularity index used for vertex clustering. On the otherhand, if we look at the fractional difference, we could use a score s CN , frac ( u, v ) = s CN ( u, v ) s CN , rnd ( u, v ) ∝ k u k v (cid:88) w ∈V A uw A wv (E5)and we’ll see similar forms in other indices below.Another way to compensate for the expected dependence of s CN on the degree of the nodes isto normalise by the total number of unique neighbours. This gives us the Jaccard Coefficient (JC) method [23, 25] based on the well known similarity measure [42] in which the likelihood thattwo nodes are linked is equal to the number of neighbours they have in common relative to the totalnumber of unique neighbours. s JC ( u, v ) = |N ( u ) ∩ N ( v ) ||N ( u ) ∪ N ( v ) | (E6)= |N ( u ) ∩ N ( v ) ||N ( u ) + |N ( v ) | − A uv − |N ( u ) ∩ N ( v ) | (E7)= s CN ( u, v ) k u + k v − s EE ( u, v ) − s CN ( u, v ) . (E8)For instance in a random graph we estimate that s JC , rnd ( u, v ) ≈ s CN , rnd ( u, v ) k u + k v − A uv − s rnd ( u, v ) (E9)which for large k , k ∼ O ( K ) gives s JC , rnd ( u, v ) ∼ O ( K ) while s CN , rnd ( u, v ) ∼ O ( K ).The Preferential Attachment (PA) method for link prediction [23, 25] is based on the ideathat the probability of a link between two vertices is related to the product of their degrees of thetwo vertices s PA ( u, v ) = |N ( u ) | . |N ( v ) | (E10)where N ( u ) is the set of neighbours of vertex u , i.e. N ( u ) = { w | ( u, w ) ∈ E } . This is proportionalto the number of common neighbours expected in the Configuration model [49] as has been used inthe context of collaboration (bipartite) collaboration graphs [40, 41, 23]. Amusingly in some otherindices, other is compared to this value as a fraction so we find the same term in the denominatorwhich highlights just how wide the approach to Link prediction can be.The Resource Allocating index (RAI) method [27] and the
Adamic-Adar Index (AAI)method [23, 25] are both based on the idea that if two vertices u and v share some ‘features’ f thatis very common in the whole network then that common feature is not a strong indicator that the23wo vertices should be linked. The converse is true if the common feature is rare, that is a goodindicator that the vertices u and v should be linked. So in general if W is a monotonically decreasingweight function we can use this on the frequency n ( f ) of the occurrence of feature f to give a genericsimilarity function of the form s W ( u, v ) = (cid:80) f ∈ u,v W ( n ( f )).In our case, we will not assume any meta-data exists, but we will look for methods that usefeatures which are based purely upon the topology of the network. For our simple graphs, the mostimportant feature two nodes can share is an edge between them, while the second most importantfeature is sharing a common neighbour. The Resource Allocating index (RAI) method and theAdamic-Adar Index (AAI) method ignore the presence or presence of a direct connection betweenvertices u and v (as captured by s EE (E1)) and the features considered for these two similarity scoresare the the common neighbours themselves w ∈ N ( u ) ∩N ( v ). One way to picture this is to rememberthat the adjacency list representation of a network, often used in practice numerically, is where foreach node we record the list of neighbours. We can picture this adjacency list of neighbours N ( u )as the ‘words’ on the ‘document’ u to connect with text analysis methods. The frequency of thisfeature simply is the degree |N ( w ) | of the neighbouring vertex w as that is the number of times thisfeature will occur in the adjacency lists of other vertices.The Resource Allocating index (RAI) method and the Adamic-Adar Index (AAI) method differin their choice of weighting function W in defining their similarity score. In Resource Allocating index(RAI) method [27] the inverse of the degree of the neighbour is used, so W ( n ( f )) ≡ / |N ( w ) | giving s RAI ( u, v ) = (cid:88) w ∈N ( u ) ∩N ( v ) |N ( w ) | . (E11)On the other hand, the Adamic-Adar Index (AAI) method [23, 25] is based on the similarityscore used to compare web pages in [43]. In this case, the inverse logarithm of the degree is used toweight the contribution of each common neighbour to the score, W ( n ( f )) ≡ / ln( |N ( w ) | ) so that s AAI ( u, v ) = (cid:88) w ∈N ( u ) ∩N ( v ) |N ( w ) | ) . (E12)All the indices mentioned above so far have been based either on the degree of the two nodes ofinterest s EE or on the properties of u , v and their common nearest neighbours w ∈ N ( u ) ∩ N ( v ). Putanother way; this is the information in the adjacency matrix A to the power one or two. A logicalstep forward is to include paths of length three and the Local Path Index (LPI) s LPI [27, 28] isan example of this where s LPI ( u, v ) = [ A + β A ] uv (E13)Here β is a real parameter where β = 0 reproduces the Common Neighbours score of (E2). Theterm dependent on β is counting the number of walks of length three that start at u and end at v .Note that the second term always contains paths that are a sequence of four distinct vertices say( u, w, x, v ), paths of length three. However, if there is already an edge between u and v the walksinclude backtracking paths such as the sequence u, w, u, v , so the second term also includes a termequal to A uv ( |N ( u ) | + |N ( v ) | ).The Katz Index (Katz) method [23, 28, 25] counts the number of paths between each pair ofvertices, where each path of length (cid:96) contributes a factor of β (cid:96) to the score. If the adjacency matrixfor the network is A then the score is simply the appropriate entry of the matrix [ I − β A ] − , s Katz ( u, v ) = ([ I − β A ] − ) uv (E14) This would be where the features f are edges. Note that this means that the contribution from any one node w to the total of all scores S RAI ( u, v ) = (cid:80) u,v s RAI ( u, v ) is half of the degree of w minus one, ( N ( w ) − /
2. Thus the RAI method still gives high de-gree nodes more weight. So another sensible suggestion is to define W ( n ( f )) = 2 /k w ( k w −
1) as then each node w ofdegree k w contributes a total of 1 to all scores. β is positive but must be less than the largest eigenvalue of the adjacency A . Note for low β and for a simple graph we have that s Katz ( u, v ) = βA uv + β (cid:88) w A uw A wv + O ( β ) for u (cid:54) = v (E15)= βs EE ( u, v ) + β s CN ( u, v ) + O ( β ) for u (cid:54) = v . (E16)Note that for low β , that is β (cid:28) n cn = (cid:104)|N ( u ) ∩ N ( v ) |(cid:105) we are dominated by the existence orotherwise of an edge between these pair of nodes, which may be a fair predictor for an edge in thenext time frame if few edges change per time frame.In a random graph the probability the two stubs from vertices u and v of degree k u and k v respectively are connected is simply k u k v / (4 E ). Comparing this to the estimate for the number ofcommon neighbours in a random graph, (E3), we estimate that in a random graph the Katz score isdominated by the existence of an edge if β (cid:28) ( (cid:104) k (cid:105) − (cid:104) k (cid:105) ) /
2. (here β = 0 . Local Leicht-Holme-Newman Index (LLHN) method [24] is based on the vertex sim-ilarity index of [29], but while this gives a specific motivation for the form, it is in the end just aspecific rescaling of the Katz index (E14), specifically s LLHN ( u, v ) = s Katz ( u, v ) s PA ( u, v ) (E17)= s Katz ( u, v ) |N ( u ) | |N ( v ) | = s Katz ( u, v ) k u k v (E18)= D − ( I − β A ) − D − (E19)where D is a diagonal matrix whose entries are equal to the degrees of the nodes, D uv = δ u,v |N ( u ) | .The motivation for using this normalisation is that |N ( u ) | |N ( v ) | is proportional to the numberof neighbours expected in the configuration model, as shown in (E3). So the Local Leicht-Holme-Newman Index is the Katz score relative to the Katz score expected for the same pair of nodes inthe configuration model.The Matrix Forest Index (MFI) method [26] is defined as s MFI ( u, v ) = [( I + L ) − ] uv , (E20)where L = D − A is the Laplacian. One way to see this forms a suitable similarity measure is toknow that if Q = ( I − L ) − , then D ij = Q ii + Q jj + Q ij + Q ji is a metric on the network and hence isa good distance measure. Similarity measures are often related to distance through a function thatensures the similarity measure increases if the distance between two vertices decreases.A final way to view the Matrix Forest Index (MFI) method is to consider a discrete-time versionof the diffusion process described by a Laplacian. If we have a vector w ( t ) whose entries representthe number of particles at every vertex at time t , then we can define a diffusion process where w ( t + 1) − w ( t ) = λ L w ( t ) . (E21)In this process, a fraction λ of the particles at each node flow down each edge in each time step(so λk particles leave each node at each step so large degree nodes to lose a larger fraction of everytime step)). The Laplacian gives the network flow into a given vertex, and the number of particlesis conserved (since (cid:80) i L ij + 0). The matrix Q = ( I − L ) − used to give the MFI score is therefore w ( t ) = [( I + L ) − ] w ( t + 1) . (E22)That is if we were to demand that at time t + 1 we had only had particles at one site u , then Q uv tells us how many of those particles were at vertex v at the previous time step. Note that this ishighly non-local. . . 25 d) Institutional Email(a) Turkish Shareholder Networks(c) College Messages(b) Wiki Mathematician Networks Figure E4: The histogram of scores in nine different methods for (a) Turkish shareholder networks,(b) Wiki Mathematician networks, (c) College messages, and (d) Emails among institutions. Noneof the above methods show the clustering behaviour.
E.1 Scores for other link prediction methods from under-sampling
We under-sampled 1000 link pairs and 1000 non-connected pairs to demonstrate that TT can natu-rally split the score into two clusters, which can be used to predict link existence in Figure 4. We alsocompute the performance of other similarity measures, which shows, apart from Katz score, non-ofthe methods show cluster behaviour.
E.2 From Node Scores to Link Predictions
One we have a similarity score for a pair of nodes, we have to turn this into a prediction. Generally,the node pairs with a high similarity score in G ( s ) will be predicted to have an edge in G ( s + 1),and low scores will lead to a no edge prediction. If we are looking at a link addition problem[44, 45, 47, 23, 38, 48, 28] or, more generally, an uncertain link problem [25], then the score is turnedinto a prediction for the node pair by using a standard machine learning approach to what is a binaryclassification problem. For instance, we remove edges (or add an edge between unconnected nodes[25]) using some examples (say 10%) to train the classifier and then the remain node pairs as usedto verify the effectiveness of the method. In the simplest method we could imagine ranking our nodepairs based on similarity scores, and then the n unconnected node pairs with the highest scores areassigned an edge, while the n lowest-scoring node pairs which are connected are predicted to havetheir edges removed. The values for n and n could be learnt as part of the training of this simpleclassifier. However, more sophisticated methods are normally used.26 Measures of Success and Baseline Scores
The notation N αβ ± is the number of node pairs that • start in state α (1 if the node pair is connected by an edge, 0 otherwise) in G ( s ), • which change to state β defined in the same way but in terms of the existence of an edgebetween the same node pair, in snapshot G ( s + 1), • for which the prediction made for that node pair was correct (+) or incorrect ( − ).So if the prediction for node pair ( i, j ) from snapshot s to snapshot ( s + 1) is P ( i, j ) then N αβ + = (cid:88) ( i,j ) δ ( α, A ij ( s )) δ ( β, A ij ( s + 1)) δ ( P ( i, j ) , A ij ( s + 1)) (F1)where δ ( a, b ) = 1 if a = b otherwise it is zero (the Kronecker delta function). For later conveniencewe also define N αβ = N αβ + + N αβ − = (cid:88) ( i,j ) δ ( α, A ij ( s )) δ ( β, A ij ( s + 1)) (F2) N · β = N β + N β = (cid:88) j δ ( β, A ij ( s + 1)) (F3)The precision score is the number of times we predict an edge to exist between node pairs inthe later snapshot correctly (a true positive) divided by the number of times we predict an edge,correctly (true positive) or incorrectly (false positive) S prec = N + N N + N + N − + N − . (F4)A high precision score means we can trust that edges predicted by the algorithm will exist.The recall is the fraction of positive identifications which were correct, so S rec = N + N N + N + N − + N − , (F5)The accuracy is the number of correct predictions (edge or no edge) in the final snapshot overthe total number of predictions S acc = N + N + N + N N + N + N + N + N − + N − + N − + N − , (F6)Our baseline model is that we predict that an edge will connect a node pair with probability ρ ( s )where this is the density of the network in snapshot s , that is the fraction of edge pairs with an edgein snapshot s where for N nodes in the network and ignoring self-loops ( i = j excluded) ρ ( s ) = (cid:80) ( i,j ) A i,j (cid:80) ( i,j ) N ( N − (cid:88) ( i,j ) A i,j . (F7)This baseline model is equivalent to just randomising the edges in snapshot s as a prediction for thenext snapshot; it doesn’t even preserve the degree of the nodes. This means that in our notation wehave that N α = ρ ( s ) N α , N α − = (1 − ρ ( s )) N α − , N α = (1 − ρ ( s )) N α − , N α − = ρ ( s ) N α − . (F8)27he baseline value of precision is S prec , base = ρ ( s ) N · ρ ( s ) N · + (1 − ρ ( s )) N · (F9)= ρ ( s ) . (F10)For recall we find the baseline value is S rec , base = ρ ( s ) N + ρ ( s ) N ρ ( s ) N + ρ ( s ) N + ρ ( s ) N + ρ ( s ) N = N · N · + N · (F11)= ρ ( s + 1) . (F12)where ρ ( s + 1) is the density of edges in the network in snapshot ( s + 1). Finally accuracy in ourbaseline model is S acc , base = ρ ( s ) N · + (1 − ρ ( s )) N · ρ ( s ) N · + (1 − ρ ( s )) N · + (1 − ρ ( s )) N · + ρ ( s ) N · (F13)= ρ ( s ) ρ ( s + 1) + (1 − ρ ( s ))(1 − ρ ( s + 1)) , (F14)where for definiteness we have written ρ = ρ ( s ).An even more naive model which gives us another reference value would be one in which wesimply predict an edge for any given node pair 50% of the time. In our notation, this is equivalentto saying that N αβ + = N αβ −−