Joint Subgraph-to-Subgraph Transitions -- Generalizing Triadic Closure for Powerful and Interpretable Graph Modeling
Justus Hibshman,∗ Daniel Gonzalez,† Satyaki Sikdar,‡ and Tim Weninger§
University of Notre Dame, Notre Dame, Indiana 46556, USA

∗ [email protected]
† Second and third authors contributed equally to this research; [email protected]
‡ Second and third authors contributed equally to this research; [email protected]
§ [email protected]
We generalize triadic closure, along with previous generalizations of triadic closure, under an intuitive umbrella generalization: the Subgraph-to-Subgraph Transition (SST). We present algorithms and code to model graph evolution in terms of collections of these SSTs. We then use the SST framework to create link prediction models for both static and temporal, directed and undirected graphs which produce highly interpretable results that simultaneously match state-of-the-art graph neural network performance.

FIG. 1. Edge-Addition Subgraph-to-Subgraph Transitions for Undirected Four-Node Subgraphs. The same information is depicted in two formats (A and B). A: Each arrow indicates a possible subgraph-to-subgraph transition (SST) caused by adding an edge to an undirected four-node subgraph. This format helps illustrate that subgraphs may transition to (and be transitioned to from) isomorphically distinct subgraphs. B: Each dashed missing edge indicates a possible subgraph-to-subgraph transition caused by adding the dashed edge to an undirected four-node subgraph. This format helps illustrate that each distinct edge-addition transition corresponds to an isomorphically-distinct missing edge in a subgraph.
I. INTRODUCTION
Triadic closure is a widely known, simple process for modeling the evolution and dynamics of many real-world graph processes [1, 2]. Triadic closure's use in the graph modeling community is due, in large part, to its ability to intuitively explain commonly observed social and natural phenomena. For example, social balance theory is built upon achieving consistency among individuals in social network triads [3], and social networks commonly predict friendship links that close the most triangles [4]. In addition to triads, analysis of the evolution and dynamics of other small subgraphs (i.e., graphlets, motifs, etc.) has proven to be illuminating and pleasantly interpretable for many graph mining and scientific tasks [5-7].

To this end, researchers have generalized the concept of triadic closure in different ways. For instance, Seshadhri et al. considered the many different kinds of triangle closures possible in a directed graph [8]. Yin et al. considered something similar to Seshadhri et al., but did not include bidirected edges in their enumeration of directed triadic closure types [9]. Rossi et al. considered "motif closures," by which they mean any occurrence of a motif being formed by the addition of an edge [10].

We offer an elegant generalization which encapsulates and expands upon previous generalizations of triadic closure: the Subgraph-to-Subgraph Transition (SST). In our formulation, triangle closure can be considered one specific kind of subgraph-to-subgraph transition: open-wedge to triangle. SSTs are also a generalization of "motif closures," as a motif closure only considers the resulting subgraph, not the beginning subgraph (see [10]); thus a single motif closure may correspond to many distinct SSTs.

For example, Fig. 1 depicts all of the possible four-node subgraph-to-subgraph transitions in two formats: (A) as state transitions, and (B) with added edges. Although not shown in Fig. 1, our SST algorithms can handle significantly larger subgraphs (albeit with runtime implications), over directed or undirected graphs, for both node and edge additions and deletions.

When using triadic closure to model graph evolution, the essential question is: "How many triangles would this new edge close?" Put in the language of SSTs, we would ask, "How many wedge-to-triangle transitions would this new edge cause?" Once we move beyond a single SST such as wedge-to-triangle and into the world of multiple SSTs, we can then observe that any change to a graph corresponds to a collection (i.e., a multiset) of SSTs. Consider the example of a growing binary tree illustrated in Fig. 2.
FIG. 2. Growing a Binary Tree. The graph on the left (A) illustrates adding three new nodes to a binary tree, where each new node is connected by a single edge. The table on the right (B) enumerates the connected 3-to-4-node subgraph-to-subgraph transitions (SSTs) caused by adding the new nodes. Columns u, v, and w depict the number of SSTs in which each corresponding node in (A) is present.

Here the table (B) on the right shows the 3-to-4-node SSTs caused by the addition of a new node and edge. Each addition has its own "feature vector" of associated SST counts, i.e., the number of SSTs in which each new node and edge is present. In this small example, it is quickly evident that node v's connection, which does not follow the binary nature of the rest of the graph, has a distinct vector from nodes u and w. Considering these collections grants new power for graph modeling, because it grants an important multidimensionality.

The larger the subgraphs one considers for SSTs, the more information one acquires, at least information-theoretically; meaningful human comprehension may decrease. The largest possible "subgraph"-to-subgraph transition one could consider simply consists of the full old graph and the full new graph.

In the present work we define a model for SSTs on directed and undirected graphs, both with and without node and/or edge properties. We provide several analyses to show that SSTs can be used to model graph evolution and static graphs in a variety of contexts simply by fitting linear SVM models to the SST count vectors. Notably, these models are very interpretable, yet they perform similarly to state-of-the-art neural network models on static and temporal link prediction tasks. We also demonstrate, via a short case study, SSTs intuitively modeling a known graph process.

All our code is available at https://github.com/SST-Author/Subgraph-Subgraph-Transitions.

II. PRELIMINARIES
Before we formally introduce our SST model, we first introduce some preliminary notation. We define a graph in the usual way. A simple directed or undirected graph is defined as G = (V, E), where V is the set of vertices ("nodes") and E ⊆ V × V is the set of connections ("edges"). When G is undirected, (u, v) ∈ E ↔ (v, u) ∈ E, and (u, v) is considered to be identical to (v, u). Hereafter we use the convention that in undirected graphs (u, v) = (v, u).

a. Induced Subgraph. Given a graph G = (V, E) and a set of nodes S ⊆ V, the induced subgraph G(S) is the graph consisting only of the nodes in S. Formally, G(S) = (V_{G(S)}, E_{G(S)}), where V_{G(S)} = S and E_{G(S)} = {(u, v) | (u, v) ∈ E ∧ u, v ∈ S}.

b. Node and Edge Properties. A graph's nodes and/or edges may have certain values associated with them. For instance, if an edge indicates a road, it might have a speed-limit value. In some cases these properties may be important in how to model the graph. In those cases, we redefine the graph as follows: let G = (V, E, p_V, p_E) be a graph with two property functions, p_V and p_E. Each property function maps a node/edge and a property name to a value.

c. Isomorphisms and Automorphism Orbits. Given two graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2), an isomorphism is a bijection f : V_1 → V_2 such that (u, v) ∈ E_1 ↔ (f(u), f(v)) ∈ E_2. That is, an isomorphism is a mapping of a graph's nodes to another graph's nodes in a way that lines up the structures exactly. An automorphism is simply an isomorphism of a graph with itself.

When including node and edge properties (e.g., G_1 = (V_1, E_1, p_{V_1}, p_{E_1}) and G_2 = (V_2, E_2, p_{V_2}, p_{E_2})), an isomorphism must also preserve property values. Formally, ∀v ∈ V_1, p_{V_1}(v) = p_{V_2}(f(v)) and ∀(u, v) ∈ E_1, p_{E_1}((u, v)) = p_{E_2}((f(u), f(v))).

The automorphism orbit of a node v ∈ V is the set of nodes to which v is equivalent under automorphism. Formally, AO(v) = {u | u ∈ V ∧ ∃ automorphism f such that f(v) = u}. Edges can be described as having automorphism orbits through a similar definition: AO((u, v)) = {(a, b) | (a, b) ∈ E ∧ ∃ automorphism f such that f(u) = a ∧ f(v) = b}.

Finding automorphism orbits in a graph is frequently thought of in terms of matching or refining a set of "colors," where nodes in the same orbit are given the same color and the original input graph may arbitrarily require that nodes be put in separate orbits by giving them different colors [11, 12]. We use this idea in our model to find the automorphism orbits of nodes in SSTs with node and/or edge properties.
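For the tiny subgraphs that appear in SSTs, automorphism orbits can even be computed by brute force over all node permutations. The following is a minimal illustrative sketch, not the paper's released implementation, assuming an undirected networkx graph; in practice, color refinement [11, 12] is the method of choice.

```python
# Brute-force automorphism orbits for tiny (sub)graphs; illustrative only.
from itertools import permutations
import networkx as nx

def automorphism_orbits(G):
    nodes = list(G.nodes())
    edges = {frozenset(e) for e in G.edges()}
    orbit = {v: {v} for v in nodes}
    for perm in permutations(nodes):
        f = dict(zip(nodes, perm))
        # f is an automorphism iff it maps the edge set onto itself.
        if {frozenset((f[u], f[v])) for u, v in G.edges()} == edges:
            for v in nodes:
                orbit[v].add(f[v])
    return orbit

# In the path 0-1-2, endpoints 0 and 2 share an orbit; the center 1 is alone.
print(automorphism_orbits(nx.path_graph(3)))  # {0: {0, 2}, 1: {1}, 2: {0, 2}}
```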
III. SUBGRAPH-TO-SUBGRAPH TRANSITIONS

A subgraph-to-subgraph transition T is defined to be a pair of subgraphs:

T = (G_T, G'_T), where G_T = (V_T, E_T, p_{V_T}, p_{E_T}) and G'_T = (V'_T, E'_T, p_{V'_T}, p_{E'_T}).

Thus, there are many, many possible subgraph-to-subgraph transitions (SSTs). In this work, we limit our analyses to SSTs meeting certain conditions. Specifically, we focus on modeling edge additions to a graph. Thus we restrict ourselves to the SSTs that indicate the addition of an edge. Additionally, we require that an SST does not include a change in property values. (The only allowed property change is for a new edge to receive a property value. All existing nodes' or edges' values remain unchanged in the context of the SST.) This restriction on properties is a way to simplify our model/analyses in this work, but conceptually, the SST generalization allows for changing property values as well. We discuss the ways we do allow changing property values during our graph modeling in Section IV B 1. Lastly, we require that the subgraph before, the subgraph after, or both must be connected.

Notably, the code we release along with this paper includes the ability to acquire SST information for edge deletions, node additions, and node deletions. However, only edge additions are studied in the present work.

A. Properties in SSTs
In addition to link prediction, we also use SSTs to create interpretable models of graph evolution. To do so, we put graph changes in correspondence with certain SSTs. If we allowed arbitrary numeric property values, the number of different SSTs would explode combinatorially. To address this difficulty, our modeling algorithms require that each property be treated as one of the following:

1. Class Trait: A class trait is a property which has a (ideally small) set of possible class values, where no ordering on the values is required.

2. Rank Trait: A rank trait is a property which requires that the possible values be totally ordered (e.g., numbers). When labeling an SST with rank traits, our modeling algorithm does not use the raw values of the property; rather, for each subgraph, the nodes (or edges) are sorted, and the ranks of the nodes (or edges) are used in place of the raw property values (as sketched below). For example, instead of listing the raw PageRank values of the nodes in a subgraph (e.g., PageRanks ⟨0.31, 0.12, 0.45, 0.08⟩), the subgraph would be encoded just with the relative ordering of those values (e.g., "PageRank Ordering: ⟨3, 2, 4, 1⟩").
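To make the rank-trait encoding concrete, the short sketch below replaces raw values with their relative ranks. The helper name and the tie-handling convention (tied values share a rank) are our own illustrative assumptions, not necessarily those of the released code.

```python
# Illustrative rank-trait encoding: raw values -> relative ranks (1 = smallest).
def rank_encode(values):
    rank_of = {v: r + 1 for r, v in enumerate(sorted(set(values)))}
    return [rank_of[v] for v in values]

print(rank_encode([0.31, 0.12, 0.45, 0.08]))  # [3, 2, 4, 1]
print(rank_encode([5, 2, 2, 7]))              # [2, 1, 1, 3] (ties share a rank)
```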
IV. SST GRAPH MODEL

Given our formalism, we implement a graph model that encodes and uses SSTs to model graph evolution. This graph model has three distinct modules:

• A "Transition Labeler," which takes a before and after subgraph and produces a canonical (i.e., automorphism-invariant) label for that subgraph-to-subgraph transition.

• A "Transition Counter," which takes a change to a graph (an edge addition, edge deletion, node addition, or node deletion) and enumerates all the SSTs induced by that graph change.

• Interpretable static and temporal link predictors, which make use of the Transition Counter information.

In this section, we describe the details of each module.
A. The Transition Labeler
To correctly identify a subgraph-to-subgraph transition, we need to label it in a way that maps all isomorphically equivalent SSTs to the same label (i.e., a "canonical label"). To do this, we use an adaptation of the Weisfeiler-Lehman isomorphism algorithm [13] (a process also known as "color refinement") which allows edge (and node) properties to be incorporated as "colors" [14]. This algorithm provides a canonical node ordering for the vertices involved in the SST. The Weisfeiler-Lehman algorithm is not a complete isomorphism algorithm, but it is guaranteed to work on graphs (SSTs) of up to 9 nodes [15].

As discussed earlier, an SST can be thought of as consisting of a "before" subgraph and an "after" subgraph. At first glance, it may seem that to produce a label for an SST, we must compute distinct canonical labels for the before and after subgraphs and then combine the labels. However, as discussed in Section III, we limit our algorithms to working with four kinds of graph changes: edge addition, edge deletion, node addition, and node deletion, and we do not include property value changes in our SSTs. Thus, each SST we work with can be described as a single subgraph in which the added/deleted edge/node is uniquely marked; for an example, revisit Figure 1. Thus, at a high level, the transition labeler works as follows:

1. Receive as input a graph G = (V, E, p_V, p_E), a set of nodes S ⊆ V, and a node or edge x to be added to or deleted from the subgraph induced by S. In the case of a node addition, the edges by which the new node first connects to the network must be included in G. Similarly, in the case of a node deletion, the edges incident to the deleted node must be included in G.

2. Uniquely label (i.e., color) x.

3. For any rank traits (see Section III A), temporarily replace the property values with the relative ranking of those values. For example, ("Node Degrees": ⟨5, 2, 2, 7⟩) would be replaced with ("Node Degree Ranks": ⟨2, 1, 1, 3⟩).

4. Convert node and edge trait values into node and edge "colors." Each distinct "color" corresponds to a unique combination of trait values. This ensures that nodes (or edges) with different property values will be assigned to different automorphism orbits (see the discussion of automorphism orbits in Section II).

5. Given the coloring discussed above, perform canonical color refinement on G(S) to obtain a canonical node ordering O [14].

6. Use O to serialize G(S), coupled with the information denoting the added/deleted edge or node. This produces a canonical label string H.

7. Hash string H to produce a canonical numeric label, and output this label.

The computational bottlenecks of the algorithm are sorting the values of any included rank traits and running the color refinement algorithm. If n = |V_{G(S)}| = |S|, m = |E_{G(S)}|, j = the number of node traits (properties), and k = the number of edge traits (properties), then the ordering of rank traits and other trait processing can be completed in O(jn log n + km log m) time. Similarly, using an algorithm developed by Berkholz et al. which can produce a canonical stable coloring even for edge-colored graphs, the SST labeler can run its "augmented Weisfeiler-Lehman" step in O((n + m) log n) time [14]. Note that our implementation of color refinement is algorithmically simpler but less efficient than Berkholz et al.'s; however, n is typically small enough that the difference does not matter.
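The sketch below conveys the flavor of steps 1-7 using networkx's Weisfeiler-Lehman graph hash as a stand-in for the paper's canonical color refinement. The function name and edge-marking scheme are our own, and a WL hash, unlike a true canonical labeling, can in principle collide on non-isomorphic inputs.

```python
# Rough stand-in for the transition labeler: mark the changed edge, then
# hash the marked subgraph with a WL (color-refinement-style) graph hash.
import networkx as nx

def sst_label(G, S, new_edge):
    """Label the SST induced on node set S by adding new_edge = (u, v)."""
    H = G.subgraph(S).copy()
    u, v = new_edge
    H.add_edge(u, v, mark="added")               # uniquely mark the new edge (step 2)
    for a, b in H.edges():
        H.edges[a, b].setdefault("mark", "old")  # existing edges keep a default color
    # Edge marks act as colors during refinement (steps 4-7, approximately).
    return nx.weisfeiler_lehman_graph_hash(H, edge_attr="mark")

G = nx.path_graph(4)                             # 0-1-2-3
print(sst_label(G, {0, 1, 2}, (0, 2)))           # a wedge-to-triangle transition
```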
B. The Transition Counter

To model a graph change (e.g., an edge addition), we wish to acquire counts of all the SSTs of a given size induced by the change. For the kinds of changes we model (edge addition, edge deletion, node addition, node deletion), all the SSTs will involve a few special nodes and their surrounding regions: one special node in the case of a node addition/deletion (the added/deleted node), two in the case of an edge addition/deletion (the edge's endpoints). We first note the one or two nodes involved in all of the SSTs and then employ a technique known as "reverse search" to enumerate all k-node connected subgraphs involving those nodes, where k is the desired SST size [16]. Lastly, for each of these connected subgraphs, we apply our Transition Labeler to obtain a canonical label for the SST.

At present, we are unaware of any techniques to compute the SSTs more efficiently than enumeration. Complex combinatorial tricks allow rapid counting of three-, four-, and five-node subgraphs in a graph [17, 18]. At first glance, it may seem that a simple way to avoid enumeration is to efficiently count the subgraphs before and then after the graph change. However, while this would certainly produce useful information, it would not directly produce SSTs, since to know the SST counts one must know which subgraphs turned into which subgraphs; recall from Figure 1 that one subgraph can often transition into multiple other subgraphs. Additionally, our model requires that SSTs be allowed to have node and edge property values, but the state-of-the-art subgraph counters operate on property-less graphs. Nonetheless, we expect that future researchers will create quick, combinatorial methods for counting SSTs with node and edge properties, and we hope this paper serves as the spark that ignites that project.

As it is, if we hold the SST subgraph size constant at a value k, the runtime of our Transition Counter is effectively equivalent to the number of enumerated k-node subgraphs around the changes.
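For intuition, a naive alternative to reverse search is the recursive frontier expansion sketched below, which enumerates the k-node connected subgraphs containing both endpoints of a changed edge. This function is our own illustration (it revisits node sets and deduplicates at the end), not the released implementation of [16].

```python
# Naive enumeration of connected k-node subgraphs around a changed edge (u, v).
import networkx as nx

def k_subgraphs_around(G, u, v, k):
    """Return k-node sets S containing u and v with G(S) + (u, v) connected."""
    results = set()

    def expand(S):
        if len(S) == k:
            H = G.subgraph(S).copy()
            H.add_edge(u, v)                     # include the changed edge
            if nx.is_connected(H):
                results.add(S)
            return
        frontier = {w for s in S for w in G.neighbors(s)} - S
        for w in frontier:
            expand(S | {w})                      # may revisit sets; deduped via `results`

    expand(frozenset((u, v)))
    return results

G = nx.path_graph(5)                             # 0-1-2-3-4
print(k_subgraphs_around(G, 0, 2, 3))            # {frozenset({0, 1, 2}), frozenset({0, 2, 3})}
```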
1. Trait Updaters
Our Transition Labeler forces property values to be the same in both the "before" and "after" halves of an SST. However, our modeling system can still accommodate changes in property values across time. These changes simply are not directly shown in the SSTs. The Transition Counter allows the user to define "Trait Updaters" which can update property values before a set of changes is applied, just before a change's collection of SSTs is given labels, just after a change's SSTs are labeled, and after a full set of changes is applied, as sketched below. These options provide great flexibility, which we utilize in our link predictors (Sections IV C 1 and IV C 2).
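The four hook points can be pictured as the following hypothetical interface; the class and method names are ours for illustration, not the released code's API.

```python
# Hypothetical trait-updater interface mirroring the four hook points above.
class TraitUpdater:
    def before_change_set(self, graph, changes):
        """Update traits before a set of changes is applied."""

    def before_sst_labeling(self, graph, change):
        """Update traits just before a change's collection of SSTs is labeled."""

    def after_sst_labeling(self, graph, change):
        """Update traits just after a change's SSTs are labeled."""

    def after_change_set(self, graph, changes):
        """Update traits after a full set of changes is applied."""
```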
C. Interpretable Link Predictors
Finally, to demonstrate the power of SSTs, we use them to create interpretable link predictors.

Our predictors train for link prediction by collecting vectors of SST counts which correspond to adding actual/positive edges from training samples and vectors of SST counts which correspond to adding randomly sampled non-edges; then we separate the edges' SST vectors from the random non-edges' SST vectors with a simple linear SVM. A linear SVM is certainly not the optimal model for prediction accuracy, but even a simple linear SVM with SSTs as its features performs comparably to state-of-the-art graph neural network models and, importantly, provides a simple way to interpret its predictions: the unit vector which defines the hyperplane separating real edges from non-edges.

Each component of the direction vector corresponds to a distinct SST. The magnitude of the component indicates the relative importance of the SST in distinguishing between actual edges and randomly sampled non-edges. SSTs with positive component values indicate an edge is more likely to be real; SSTs with negative component values indicate an edge is more likely to be a randomly sampled non-edge. We provide examples of interpreting SVM output in the results section.
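A minimal sketch of this recipe, assuming the SST count vectors have already been computed (the toy data and hyperparameters here are ours):

```python
# Separate real-edge SST vectors from sampled-non-edge SST vectors with a
# linear SVM, then read interpretable per-SST weights off the hyperplane.
import numpy as np
from sklearn.svm import LinearSVC

# X: one row of SST counts per candidate edge; y: 1 = real edge, 0 = non-edge.
X = np.array([[4, 0, 1], [3, 0, 2], [0, 5, 0], [1, 4, 0]])
y = np.array([1, 1, 0, 0])

svm = LinearSVC(C=1.0).fit(X, y)
w = svm.coef_[0] / np.linalg.norm(svm.coef_[0])  # unit normal of the hyperplane
for sst_id, weight in sorted(enumerate(w), key=lambda t: -abs(t[1])):
    print(f"SST {sst_id}: {weight:+.3f}")        # +: edge-like, -: non-edge-like
```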
1. Static Link Predictor
A static link predictor is given a single graph and tries to predict which edges may be missing. While SSTs are implicitly designed to model evolving graphs, we can apply them to static graphs relatively easily. To do this, we imagine each edge in the graph as having been "just added" by some temporal process. That is, for each edge, we can ask the question, "What SSTs would be involved if this edge were not present and then were added?" The imagined temporal process we uncover can then predict missing edges in terms of which edges are "most likely to be added next."

Our code "trains" on every positive edge plus α times as many randomly sampled non-edges. In our experiments we set α = 10.

In a directed graph, SSTs naturally distinguish between the two endpoints of an added edge by the direction of the new edge. While distinguishing between the two vertices being joined is not necessary, it may add useful information. Thus, if the graph is undirected, we create a "trait updater" (see Section IV B 1) to allow the SSTs to distinguish between the two nodes being connected. Whenever the addition of an undirected edge (u, v) is about to have its associated SSTs counted, this trait updater compares the degrees of u and v and marks them as being of "equal" or "higher"/"lesser" degree in order to distinguish them.
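The sampling scheme can be sketched as follows; the helper name is ours, and duplicates among the sampled non-edges are tolerated for simplicity.

```python
# Positive edges plus alpha-times-as-many randomly sampled non-edges.
import random
import networkx as nx

def training_pairs(G, alpha=10, seed=0):
    rng = random.Random(seed)
    nodes = list(G.nodes())
    positives = list(G.edges())
    negatives = []
    while len(negatives) < alpha * len(positives):
        u, v = rng.sample(nodes, 2)
        if not G.has_edge(u, v):
            negatives.append((u, v))
    return positives, negatives

pos, neg = training_pairs(nx.karate_club_graph())
print(len(pos), len(neg))  # 78 780
```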
2. Temporal Link Predictor
A temporal link predictor operates over a series of interactions (edges) with timestamps. In the context of the present work, interactions are allowed to repeat across timestamps. Our temporal link predictor uses a fraction of its training edges as the "base graph" and then computes counts for an equal number of true edges and randomly sampled non-edges.

To make use of the fact that edges have timestamps and may repeat, we create two edge traits and corresponding trait updaters (see Sections III A and IV B 1) to reflect the recency and frequency of interactions.

The recency trait is a "class trait" (see Section III A) and indicates when an edge most recently occurred. It sorts edges into four categories based on whether the edge:

1. has never occurred before ("never").
2. last occurred in the previous timestamp ("newest").
3. last occurred in the timestamp before the previous ("new").
4. last occurred at least three timestamps ago ("old").

Similarly, the frequency trait is also a class trait that sorts edges into four categories based on whether the edge:

1. has never occurred before ("0").
2. has occurred once before ("1").
3. has occurred twice before ("2").
4. has occurred three or more times before ("3+").

These edge traits allow the SSTs to carry meaning that is simultaneously structural and temporal.

In our temporal link prediction tests (Section V D), the training/validation data is bucketed into nine timestamps. During testing, our model uses the first eight timestamps' worth of interactions as the "base graph" and then computes the SST vectors for the ninth timestamp. Training on just the latest edges has a twofold benefit: the edges being trained on and the graph at time of training most closely resemble the edges/graph at test time, and using only the latest edges speeds up training.
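The two class traits can be computed as below, following the category names in the lists above; the function names and the integer-timestamp assumption are ours.

```python
# Recency and frequency class traits for a (possibly repeating) edge.
def recency_class(last_seen, now):
    if last_seen is None:
        return "never"
    age = now - last_seen
    return "newest" if age == 1 else ("new" if age == 2 else "old")

def frequency_class(count):
    return str(count) if count < 3 else "3+"

print(recency_class(None, 9), recency_class(8, 9), recency_class(6, 9))  # never newest old
print(frequency_class(0), frequency_class(2), frequency_class(5))        # 0 2 3+
```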
V. RESULTS

A. Modeling Known Graph Generators
Before proceeding to show our models operating on real-world graphs, we offer the reader a "warm-up" by demonstrating the ability of three-node SSTs to capture the well-known preferential attachment graph generation process first introduced by Barabási and Albert [19]. The preferential attachment model generates a graph by creating a new node and wiring it to m existing nodes, with higher odds of connecting to a node that already has many edges.

We generate an example preferential attachment graph with n = 1000 and m = 2 and run the temporal link predictor on the temporal sequence of edge additions.
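Such a graph can be generated, for example, with networkx; note that nx.barabasi_albert_graph produces the undirected variant, so it only approximates the directed new-node-points-to-old process analyzed below.

```python
# Preferential attachment graph with n = 1000, m = 2 (undirected variant).
import networkx as nx

G = nx.barabasi_albert_graph(n=1000, m=2, seed=42)
print(G.number_of_nodes(), G.number_of_edges())  # 1000 1996
```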
As discussed earlier, the SVM yields discrimination scores for each SST, which we use to describe the importance of each SST to the link prediction task. From these SSTs we see that our model captures many key aspects of the preferential attachment process. For example, we illustrate the top twelve in Figure 3, where the two subgraphs of an SST are combined into a single subgraph in which the source and target nodes of the new edge are indicated by shaded nodes, and edge colors indicate their recency. A positive weight above the subgraph indicates that the SST is more likely to be associated with a real edge addition than a random edge addition. A negative weight indicates the opposite.

FIG. 3. Top 12 3-Node SSTs from a preferential attachment graph process, listed in order of decreasing importance. SSTs are combined into a single subgraph where the new edge's source and target nodes are highlighted in gray and dark gray respectively.

The results indicated in this example are in line with our expectations. The link predictor's top three most important SSTs, along with SSTs 6 and 9, all indicate that a node cannot acquire out-edges at distinct timestamps. SSTs 4, 5, and 8 indicate that nodes are pointed to only after they first point to other nodes, which is a key aspect of the preferential attachment process. Likewise, SST 10 suggests that a node has a higher chance of being pointed to if it was already pointed to recently. Similarly, SSTs 7 and 11 suggest that a node will not begin to point to another node after it has been pointed at. The ordering of SSTs 4, 5, and 8 indicates that newly formed edges are more likely than randomly sampled non-edges to point to nodes with older edges; this in turn indicates that nodes with older out-edges have more in-edges. Finally, the relative lack of triangles in the top 12 SSTs suggests that triangles are either rare, uninformative, or both.

These results demonstrate the interpretable power of SSTs to capture a well-known graph generation process which does not follow triadic closure. We now proceed to analyses of real-world graphs, generating both quantitative and interpretable results from the same model.
B. Quantitative Link Prediction Metrics
Many different metrics are used to quantitatively measure the performance of link prediction systems. Some of the most common are the area under the receiver operating characteristic curve (i.e., "AUROC" or just "AUC") and Hits at K.

However, Yang et al. argue that AUC may not be a particularly meaningful metric for link prediction, and Hits at K can provide a very different picture depending on the selected K [20]. Instead, Yang et al. demonstrate that area under the precision-recall curve (AUPR) may be the best metric both in terms of what it represents and its discriminatory power. This is in part because, ultimately, a model with both high precision and high recall is of great use; but in an imbalanced setting like link prediction, a model with even a very small false positive rate and a high true positive rate (and thus a high AUC) can still produce a high number of false positives compared to the number of true positives it produces, making its link predictions of little use in a real-world setting.
Unfortunately, for area under the precision-recall curve (AUPR) to be meaningful, negative test cases must not be downsampled [20]. Otherwise, the reported AUPR values will be artificially high. However, letting a model score every possible non-existent edge can be quite time-consuming. Thus, rather than reporting link prediction results for a whole graph, Yang et al. recommend evaluating on the smaller task of link prediction between nodes at most a distance of k apart, where k is some small number. For comparability to other work, we report the AUC. For completeness, we report AUPR computed on the limited task of scoring all disconnected pairs of nodes (non-edges) within 3 hops and all connected node pairs (edges) that would be within 3 hops if they were to be disconnected.

a. Properly Calculating AUPR Curve Areas. Area under the precision-recall curve is often calculated via the trapezoidal rule, which effectively performs a linear interpolation between precision-recall points. This is incorrect, as explained by Davis and Goadrich [21], who introduce the proper interpolation in their seminal work. The difference becomes particularly relevant when models have large "gaps" in their precision-recall curves. In our analysis, these gaps occurred most with the Common Neighbors model [22, 23] and the Gravity Autoencoder models [24].
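As a concrete illustration of the pitfall (on our own toy data), scikit-learn's average_precision_score summarizes the precision-recall curve without interpolating, whereas applying the trapezoidal rule to the same curve linearly interpolates between points and can yield a misleading area:

```python
# Trapezoidal (linearly interpolated) AUPR vs. non-interpolated AUPR.
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true = np.array([1, 0, 0, 0, 0, 0, 0, 1, 0, 0])
scores = np.array([.9, .8, .7, .6, .5, .4, .3, .25, .2, .1])

precision, recall, _ = precision_recall_curve(y_true, scores)
print("trapezoidal AUPR:     ", auc(recall, precision))                # interpolated
print("non-interpolated AUPR:", average_precision_score(y_true, scores))
```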
C. Static Link Prediction
Next, we perform a quantitative and qualitative evaluation on three popular networks, detailed in Table III, which are frequently used for static link prediction. Eu-core Emails is a correspondence graph from a European research institute where an edge indicates email(s) sent between two researchers. Cora ML and Citeseer are paper citation networks where edges indicate citations between papers.
TABLE III. Datasets for link prediction.
           Dataset            Node Count   Edge Count   Temporal Edge Count
Static     Eu-core Emails          1,005       16,706             –
           Cora ML                 2,708        5,278             –
           Citeseer                3,264       45,536             –
Temporal   Eu-core Temporal          986       24,929       332,334
           College Messages        1,899       20,296        59,835
           Wikipedia             100,312      746,086     1,627,472
We use an 85%/5%/10% split of the edges for training, validation, and testing, respectively. Because there are no timestamps, the edges are partitioned randomly.

We report results for our models with both 3-node and 4-node SSTs. We compare against 2 baseline models, 4 state-of-the-art graph neural networks (GNNs) for undirected link prediction, and 1 state-of-the-art GNN for directed link prediction. For all the GNNs, we used the default hyperparameters from their source code.
TABLE I. Link prediction performance on the static undirected graphs. The best and second-best performing models are boldfaced and underlined respectively.

Model    CiteSeer (AUC / AUPR)    Cora ML (AUC / AUPR)    Eu-core Emails (AUC / AUPR)

TABLE II. Link prediction performance on the static directed graphs.

Model    CiteSeer (D) (AUC / AUPR)    Cora ML (D) (AUC / AUPR)    Eu-core Emails (D) (AUC / AUPR)

a. Baseline Models. We defined two naive baseline methods: random and common-neighbor count. The random baseline assigns edge predictions at random. The common neighbors method predicts that the more neighbors two nodes share in common, the more likely those two nodes are to connect [25]. Since the common neighbors heuristic does not directly apply to directed graphs, we count each of the four possible directed wedges connecting two nodes for directed graphs, similar to Yin et al. [9].

b. Graph Variational Autoencoders.
Graph Variational Autoencoders (GAEs) [26] have recently been developed to perform deep learning on graphs in support of tasks like link prediction and graph generation. GAEs are comprised of two parts: first, an encoder that embeds a graph into a latent space by applying convolutional layers to an adjacency matrix; second, a simple inner-product decoder that produces an adjacency matrix of the same dimensions as the original input, which can be used for generating a new graph or for evaluating link prediction on the original graph.

c. Linear Variational Autoencoders.
In response to the introduction of GAEs, Salha et al. [24] questioned whether convolutional layers are really necessary for producing high-quality node embeddings. Their proposed Linear Variational Autoencoders (LinearAEs) replace the convolutional layers in GAEs with a simpler one-hop linear model which performs competitively on static link prediction. The overall behavior is similar to GAEs in that LinearAEs embed a graph's nodes and an inner-product decoder produces a new adjacency matrix for evaluation.

d. Gravity Graph Variational Autoencoders.
A limitation of both GAEs and LinearAEs lies in their reliance on inner products of vectors in the latent space for decoding. This imposes a strong restriction on the decoded adjacency matrices, which must always be symmetric. To circumvent this limitation, with the goal of performing directed link prediction, Salha et al. [27] also introduced Gravity-Inspired Graph Variational Autoencoders (GravityAE), capable of generating non-symmetric adjacency matrices using a decoder based on taking sigmoid-activated logarithms of transformed latent vectors.
1. Quantitative Results
The undirected and directed static link prediction results are detailed in Tables I and II, respectively. The SST-based models are consistently among the top performers on static link prediction.

It is important to note that the graph neural networks were trained with their default parameters; no hyperparameter optimization was performed. The GNNs' lower performance relative to our SST models' should therefore be taken with a grain of salt, though our models have the advantage of requiring almost no parameter tuning.
TABLE IV. Link prediction performance on the temporal directed graphs. The best and second-best performing models are boldfaced and underlined respectively.

Model    College Messages (AUC / AUPR)    Eu-core Temporal (AUC / AUPR)    Wikipedia (AUC / AUPR)
FIG. 4. Top Cora SSTs with Bidirected Citations – SSTs 2, 3, and 6 indicate that if articles A and B mutually cite each other, A tends to cite whatever B cites, unless another article C that bi-cites with B does not. SSTs 4 and 5 indicate that articles are more likely to bi-cite each other if they cite the same articles.
2. Interpretation
As a case study in the interpretability of SSTs on real-world graphs, we analyze the four-node SSTs from the Cora ML paper citation graph. Recall that the SVM effectively orders SSTs by how strongly they indicate that an edge is either a genuine edge or a randomly sampled non-edge.

We find that the highest-ranked SSTs tend to involve bidirected edges (papers that cite each other, perhaps via pre-prints). Sometimes these SSTs are used to predict the presence/non-presence of bidirected edges; sometimes they simply use nearby bidirected edges as indicators of single-direction links. Predicting when a bidirected citation edge forms is a fascinating and difficult task but has limited applicability (only 2.8% of connections in the Cora ML graph are bidirected). Thus, to get a sense for the SVM's top SSTs, we look at both the top SSTs with bidirected edges and the top SSTs without. These are depicted and analyzed in Figures 4 and 5, respectively. The SSTs pick up intuitive aspects of a citation network as well as some intriguing results.
FIG. 5. Top Cora ML SSTs Without Bidirected Citations – The top SST indicates that an edge closing a 4-cycle is a strong indicator that the edge is not genuine. Similarly, SST 10 suggests that a 3-cycle is unlikely, but not as unlikely as a 4-cycle. Other than SSTs 1 and 4, the top SSTs are positive indicators. SSTs 2-9 and 11-12 all include some kind of "transitivity": nodes which cite (or are cited by) similar articles cite each other.
D. Temporal Network Evolution
For evaluating networks' behavior over time, we perform future link prediction on three topologically rich, dynamic datasets, summarized in Table III. Eu-core Temporal is a time-attributed version of the earlier Eu-core Emails dataset, incorporating timestamps on the emails. College Messages is a dynamic social network where edges indicate messages between users at certain times. Wikipedia is a temporal hyperlink network where the addition of a hyperlink from one page to another is represented by a timestamped edge.

For each network, the edges at time t indicate interactions at time t that can then be repeated at a later time.

Similar to the methodology used by Kasat et al. [28], we bucket the interactions into τ evenly sized buckets. Since a bucket may cover multiple timestamps, an interaction (edge) may occur multiple times in a single bucket. We squash these multiple occurrences into a single edge and weight the interaction by its number of occurrences. In each bucket, an edge's original timestamp is replaced with the index of that bucket. Thus we effectively have a series of τ graphs, G_1, ..., G_τ. We train on the first τ − 1 graphs and test on G_τ. In our experiments we set τ = 10. Note that neither our model nor the models we compare against make use of the weights; they just use the topology and the timestamps. However, if desired, one could add a "class trait" or "rank trait" (Section III A) to our temporal link predictor allowing it to make use of these values.

Temporal link predictors are fewer in number than their static counterparts. We compare against one state-of-the-art graph neural network and the baselines from before. As in our static evaluation, we test with both three-node SSTs and four-node SSTs.
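A sketch of this bucketing (the helper name and the equal-count reading of "evenly sized" are our assumptions):

```python
# Split time-ordered interactions into tau buckets; squash repeats within a
# bucket into one edge weighted by its occurrence count.
from collections import Counter

def bucket_interactions(interactions, tau=10):
    """interactions: list of (u, v, t) sorted by t; returns tau weighted edge sets."""
    size = (len(interactions) + tau - 1) // tau
    return [Counter((u, v) for u, v, _ in interactions[i * size:(i + 1) * size])
            for i in range(tau)]

events = [(0, 1, 5), (0, 1, 6), (1, 2, 7), (2, 3, 8), (0, 1, 9), (3, 4, 9)]
print(bucket_interactions(events, tau=2))
# [Counter({(0, 1): 2, (1, 2): 1}), Counter({(2, 3): 1, (0, 1): 1, (3, 4): 1})]
```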
1. Temporal Graph Neural Networks
As a state-of-the-art baseline for comparison on the task of temporal link prediction, we rely on the Temporal Graph Networks (TGNs) introduced by Rossi et al. [29]. The specific TGN presented here, borrowed from the authors of the aforementioned paper, is a graph autoencoder capable of temporally embedding a sequence of events on a graph (e.g., node additions or deletions) using temporal graph attention layers. A multi-layer perceptron decoder allows the TGN to score candidate edges with probabilities for evaluation of future link prediction.
2. Quantitative Results
Quantitative results are listed in Table IV. Once again, our SST-based link predictors are among the top performers. As with our static graph results, we suggest that these numbers be taken with a grain of salt because we used the GNNs' default hyperparameters without hyperparameter optimization. Note that we did not compute AUPR on the Wikipedia graph simply due to the sheer size of instantiating a false test edge set of O((10^5)^2) non-edges. Ultimately, our tests demonstrate that SSTs can produce strong prediction performance while remaining elegant and interpretable.
3. Interpretable Temporal Results
To demonstrate the interpretability of SSTs on temporal graphs, we explore the three-node SSTs on the Wikipedia edge-additions graph. We find that, contrary to the general assumption of triadic closure, our model considers many triangles unlikely to close. Only when certain connections in the wedge were formed recently (indicated by our recency trait) and for the first (or perhaps second) time (indicated by our frequency trait) is the wedge quite likely to close into a triangle. See Figure 6. This is evidenced quantitatively by the fact that the three-node SST predictor performs much better than the Common Neighbors heuristic on the Wikipedia graph.
FIG. 6. Top 3-Node SSTs for Wikipedia Link Additions – Main takeaway: wedges only close to triangles when the wedge had recent edges (e.g., "Newest") appearing for the first or maybe second time (low frequency, e.g., "1"), ideally including an edge pointing to the target node of the new edge.
VI. CONCLUSION
We defined an elegant generalization of triadic closure, the Subgraph-to-Subgraph Transition (SST). This generalization allowed us to use a simple classifier, the linear SVM, to create interpretable link prediction models which performed competitively with state-of-the-art graph neural networks. We expect that the Subgraph-to-Subgraph Transition will become a standard tool in modeling graphs and that future research will produce new and creative ways to use and efficiently count SSTs.
ACKNOWLEDGEMENTS
This research is supported by a grant from the US National Science Foundation.

[1] G. Bianconi, R. K. Darst, J. Iacovacci, and S. Fortunato, Physical Review E, 042806 (2014).
[2] P. Klimek and S. Thurner, New Journal of Physics, 063008 (2013).
[3] M. S. Granovetter, in Social Networks (Elsevier, 1977), pp. 347-367.
[4] L. A. Adamic and E. Adar, Social Networks, 211 (2003).
[5] A. Paranjape, A. R. Benson, and J. Leskovec, in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (2017), pp. 601-610.
[6] J. Ugander, L. Backstrom, and J. Kleinberg, in Proceedings of the 22nd International Conference on World Wide Web (2013), pp. 1307-1318.
[7] N. Pržulj, Bioinformatics, e177 (2007).
[8] C. Seshadhri, A. Pinar, N. Durak, and T. G. Kolda, Journal of Complex Networks, 32 (2017).
[9] H. Yin, A. R. Benson, and J. Ugander, arXiv preprint arXiv:1905.10683 (2019).
[10] R. A. Rossi, A. Rao, S. Kim, E. Koh, and N. Ahmed, in Companion Proceedings of the Web Conference 2020 (2020), pp. 42-43.
[11] E. M. Luks, Journal of Computer and System Sciences, 42 (1982).
[12] B. D. McKay and A. Piperno, Journal of Symbolic Computation, 94 (2014).
[13] B. Weisfeiler and A. A. Lehman, Nauchno-Technicheskaya Informatsia, 12 (1968).
[14] C. Berkholz, P. Bonsma, and M. Grohe, Theory of Computing Systems, 581 (2017).
[15] A. Leman, Avtomatika i Telemehanika, 75 (1970).
[16] D. Avis and K. Fukuda, Discrete Applied Mathematics, 21 (1996).
[17] M. Jha, C. Seshadhri, and A. Pinar, in Proceedings of the 24th International Conference on World Wide Web (2015), pp. 495-505.
[18] A. Pinar, C. Seshadhri, and V. Vishal, in Proceedings of the 26th International Conference on World Wide Web (2017), pp. 1431-1440.
[19] A.-L. Barabási and R. Albert, Science, 509 (1999).
[20] Y. Yang, R. N. Lichtenwalter, and N. V. Chawla, Knowledge and Information Systems, 751 (2015).
[21] J. Davis and M. Goadrich, in Proceedings of the 23rd International Conference on Machine Learning (2006), pp. 233-240.
[22] E. M. Jin, M. Girvan, and M. E. Newman, Physical Review E, 046132 (2001).
[23] J. Davidsen, H. Ebel, and S. Bornholdt, Physical Review Letters, 128701 (2002).
[24] G. Salha, R. Hennequin, and M. Vazirgiannis, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) (2020).
[25] D. Liben-Nowell and J. Kleinberg, Journal of the American Society for Information Science and Technology, 1019 (2007).
[26] T. N. Kipf and M. Welling, NIPS Workshop on Bayesian Deep Learning (2016).
[27] G. Salha, S. Limnios, R. Hennequin, V. A. Tran, and M. Vazirgiannis, in ACM International Conference on Information and Knowledge Management (CIKM) (2019).
[28] H. Kasat, S. Markan, M. Gupta, and V. Pudi, in Proceedings of the Mining and Learning on Graphs Workshop (2019).
[29] E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. Bronstein.