Efficient Algorithms to Mine Maximal Span-Trusses From Temporal Graphs
Quintino F. Lotito
University of Trento, Trento
Alberto Montresor
University of Trento, Trento
ABSTRACT
Over the last decade, there has been increasing interest in temporal graphs, driven by the growing availability of temporally annotated network data from social, biological, and financial networks.
Despite the importance of analyzing complex temporal networks, there is a large gap between the definitions, algorithms, and tools available to study large static graphs and those available for temporal graphs.
An important task in temporal graph analysis is mining dense structures, i.e., identifying high-density subgraphs together with the span in which this high density is observed.
In this paper, we introduce the concept of $(k, \Delta)$-truss (span-truss) in temporal graphs, a temporal generalization of the $k$-truss, in which $k$ captures the information about the density and $\Delta$ captures the time span in which this density holds. We then propose novel and efficient algorithms to identify maximal span-trusses, namely those for which no other span-truss has both at least the same order $k$ and a span containing $\Delta$, and evaluate them on a number of publicly available datasets.
CCS CONCEPTS
• Theory of computation → Graph algorithms analysis;
• Mathematics of computing → Graph theory;
• Information systems → Spatial-temporal systems; Data mining; Web searching and information discovery.
KEYWORDS
Community detection, Dense structures, Graph mining, Social network analysis, Temporal graphs
ACM Reference Format:
Quintino F. Lotito and Alberto Montresor. 2020. Efficient Algorithms to Mine Maximal Span-Trusses From Temporal Graphs. In
MLG ’20: 16th International Workshop on Mining and Learning With Graphs, August 24, 2020, San Diego, CA, USA.
ACM, New York, NY, USA, 7 pages.
Although graph theory has been studied for centuries, recent years have seen an explosion of interest from the research community in network-related fields. This is mainly motivated by the growing interest in social networks (sets of social entities, such as people, groups, and organizations, together with the relationships or interactions between them) and by the proliferating availability of network datasets from online social networks (e.g., Facebook, Twitter, Instagram, YouTube), biological networks (e.g., molecular interactions), and financial interactions.
So far, most of the work in social network analysis has focused on static graphs. The growing availability of temporally annotated network data from social, biological, and financial networks creates the opportunity to fill the gap between the definitions, algorithms, and tools available for large static graphs and those available to analyze temporal graphs, i.e., graphs that change over time (whose edges are not continuously active).
Figure 1: A temporal graph with time-evolving communities. It is represented as a sequence of static graphs; each static graph is a snapshot of the temporal graph at a certain time.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
MLG ’20, August 24, 2020, San Diego, CA, USA © 2020 Copyright held by the owner/author(s).
However, it is not yet clear how introducing the notion of time affects the computational complexity of combinatorial graph problems [9]. To mention just a few examples, temporal graph modelling and the analysis of temporal properties have applications in sociology and social network analysis (e.g., finding voting patterns based on social media posts); security and distributed computing (e.g., designing strategies to contain the spread of malware across computing devices); and biology (e.g., studying the set of chemical reactions that occur in a healthy organism) [9].
A property of real-world graphs is that they tend to be globally sparse but locally dense: while the entire graph is sparse (i.e., vertices have a small average degree), it contains dense subgraphs (i.e., groups of vertices with a large number of links among each other). In general, density is an indication of relevance. Dense regions in a network may indicate high degrees of interaction and mutual similarity; in real-world applications, these regions may indicate characteristics such as attractive forces or favourable environments [11].
The enumeration of the dense components of a graph can either be the main goal of an analysis task or act as a preprocessing step that reduces the graph by removing its sparse parts, in order to conduct more complex and time-consuming analyses [5].
A number of definitions of dense structures have been proposed in the literature, ranging from cliques (i.e., subgraphs in which every vertex is adjacent to every other vertex) to relaxations of the clique such as $k$-cores, $k$-trusses, and $k$-plexes.
These concepts of dense structures can be generalized to the temporal case, in which one is interested in mining high-density subgraphs together with the span in which this high density is observed.
Having a set of tools to extract these structures enables a detailed comprehension of the network dynamics and can act as a building block towards more complex tasks and applications [7].
For example, temporal dense structures can be used to mine stories from social networks (i.e., events capturing popular attention in social media), which can be identified by finding a group of entities (e.g., people, locations, companies, or products) strongly associated for a reasonable amount of time [2]; to mine well-acquainted individuals from a collaboration network and form successful teams; or to analyze protein-interaction networks and locate protein complexes that interact densely at different states, indicating possible underlying regulatory mechanisms [19].
In this paper, we follow the approach of Galimberti et al. [7], who introduced the concept of the span-cores of a temporal graph (a temporal generalization of the $k$-core), and define the concept of $(k, \Delta)$-trusses (span-trusses), a temporal generalization of the $k$-truss in which $k$ captures the information about the density and $\Delta$ captures the time span in which this density holds. We propose novel and efficient algorithms to discover the maximal span-trusses of a temporal graph, i.e., those for which no other span-truss has both at least the same order $k$ and a span containing $\Delta$.
We conclude the paper by evaluating our contributions on a number of publicly available real-world network datasets, showing that our proposals consistently outperform the baseline proposed for this task.
The $k$-truss is a dense structure that captures the relationship between edges and the triangles they participate in. It was introduced based on the observation of social cohesion, where triangles play an essential role [6]. The $k$-truss community model has three significant advantages: a strong guarantee on cohesive structure, few parameters, and low computational cost [10].
Definition 2.1. Given a graph $G = (V, E)$, a triangle in $G$ is a cycle of length 3.
Figure 2: Example of the truss decomposition of a graph [5].
Definition 2.2. Given a graph $G = (V, E)$ and an edge $e \in E$, the support $\mathrm{sup}(e)$ is the number of triangles that $e$ participates in.
Definition 2.3 ($k$-truss). Given a graph $G = (V, E)$, the $k$-truss of $G$, where $k \ge 2$, is defined as the largest subgraph $g$ of $G$ in which every edge is contained in at least $(k-2)$ triangles within the subgraph, i.e., $\mathrm{sup}_g(e) \ge k - 2 \;\; \forall e \in g$.
It is easy to see that a $k$-truss is an edge-induced subgraph.
Definition 2.4 (Maximal $k$-truss). A $k$-truss $T_k$ of a graph $G$ is said to be maximal if there does not exist any other non-empty $k'$-truss $T_{k'}$ of $G$ such that $k' > k$.
Problem 2.1. The problem of truss decomposition in a graph $G$ is to find the (non-empty) $k$-trusses of $G$ for all $k$ [22].
Observation 2.1. Each $k$-truss of a graph $G$ is a subgraph of the $(k-1)$-truss of $G$; for example, in Figure 2, the 5-truss is a subgraph of the 4-truss, which in turn is a subgraph of the 3-truss.
An algorithm to efficiently compute the truss decomposition of a static, unweighted, undirected graph $G = (V, E)$ has been proposed by Wang and Cheng [22]. This algorithm relies on an in-memory triangle counting algorithm [16] and bin sort to achieve a complexity of $O(|E|^{1.5})$.
We are given a temporal graph $G = (V, T, \tau)$, where $V$ is a set of vertices, $T = [0, 1, \dots, t_{max}] \subseteq \mathbb{N}$ is a discrete time domain, and $\tau : V \times V \times T \to \{0, 1\}$ is a function defining, for each pair of vertices $u, v \in V$ and each timestamp $t \in T$, whether edge $(u, v)$ exists at time $t$. We denote by $E = \{(u, v, t) \mid \tau(u, v, t) = 1\}$ the set of all temporal edges. Given a timestamp $t \in T$, the set of edges existing at time $t$ is $E_t = \{(u, v) \mid \tau(u, v, t) = 1\}$.
A temporal interval $\Delta = [t_s, t_e]$ is contained in another temporal interval $\Delta' = [t'_s, t'_e]$, denoted $\Delta \sqsubseteq \Delta'$, if $t'_s \le t_s$ and $t'_e \ge t_e$. Given an interval $\Delta \sqsubseteq T$, we denote by $E_\Delta = \bigcap_{t \in \Delta} E_t$ the edges existing in all timestamps of $\Delta$, and by $G_\Delta = (V, E_\Delta)$ the static graph with vertices $V$ and edges $E_\Delta$.
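As a minimal illustration of the definitions above, the following Python sketch builds $E_\Delta$ as the intersection of the snapshot edge sets and computes the truss decomposition of $G_\Delta$ by repeatedly peeling the edges with insufficient support. It is a simple peeling procedure in the spirit of the algorithm of Wang and Cheng [22], not their bin-sort implementation; the representation (a list `snapshots` holding one set of undirected edges per timestamp) and all function names are our own assumptions.

```python
from collections import defaultdict


def norm(u, v):
    """Canonical form of the undirected edge {u, v}."""
    return (u, v) if u < v else (v, u)


def edges_in_interval(snapshots, ts, te):
    """E_[ts,te]: the edges present in every snapshot ts..te (inclusive)."""
    common = {norm(u, v) for u, v in snapshots[ts]}
    for t in range(ts + 1, te + 1):
        common &= {norm(u, v) for u, v in snapshots[t]}
    return common


def truss_decomposition(edges):
    """Map every edge to its trussness: the largest k whose k-truss contains it."""
    edges = {norm(u, v) for u, v in edges}
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Initial support: number of triangles each edge participates in.
    sup = {(u, v): len(adj[u] & adj[v]) for u, v in edges}
    remaining, trussness, k = set(edges), {}, 2
    while remaining:
        # The remaining graph is the k-truss; peel edges that cannot be in the (k+1)-truss.
        stack = [e for e in remaining if sup[e] <= k - 2]
        while stack:
            u, v = stack.pop()
            if (u, v) not in remaining:
                continue  # already peeled through a duplicate stack entry
            remaining.discard((u, v))
            trussness[(u, v)] = k
            # Every triangle (u, v, w) still present loses one of its edges.
            for w in adj[u] & adj[v]:
                for e in (norm(u, w), norm(v, w)):
                    sup[e] -= 1
                    if sup[e] <= k - 2:
                        stack.append(e)
            adj[u].discard(v)
            adj[v].discard(u)
        k += 1
    return trussness


snapshots = [
    {(0, 1), (0, 2), (1, 2), (2, 3)},                  # t = 0: triangle plus pendant edge
    {(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)},  # t = 1: 4-clique
]
print(sorted(truss_decomposition(edges_in_interval(snapshots, 0, 1)).items()))
# [((0, 1), 3), ((0, 2), 3), ((1, 2), 3), ((2, 3), 2)]
```

On $E_{[1,1]}$ (the 4-clique) the same function assigns trussness 4 to all six edges, while on $E_{[0,1]}$ the pendant edge only reaches the 2-truss.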
We define the temporal support of an edge $e$ over the temporal interval $\Delta$ as its support in the graph $G_\Delta$, denoted $\mathrm{sup}_\Delta(e)$.
Definition 3.1 ($(k, \Delta)$-truss). The $(k, \Delta)$-truss, or span-truss, of a temporal graph $G = (V, T, \tau)$ is the largest subgraph of $G_\Delta$ in which every edge is contained in at least $(k-2)$ triangles within the subgraph, i.e., $\mathrm{sup}_\Delta(e) \ge k - 2$, where $\Delta \sqsubseteq T$ is a temporal interval and $k \ge 2$. We will often denote the $(k, \Delta)$-truss as $T_{k,\Delta}$.
A $(k, \Delta)$-truss is a dense subgraph (where $k$ is the cohesiveness constraint) together with its temporal span, i.e., the span $\Delta$ for which the subgraph satisfies the cohesiveness constraint.
Problem 3.1. Given a temporal graph $G$, find the set of all $(k, \Delta)$-trusses of $G$.
Observation 3.1. For a fixed temporal interval $\Delta \sqsubseteq T$, finding all span-trusses that have $\Delta$ as their span is equivalent to computing the classic truss decomposition of the static graph $G_\Delta = (V, E_\Delta)$.
As observed for span-cores [7], the total number of span-trusses may be too large for human inspection. In fact, the total number of temporal intervals contained in the whole time domain $T$ is $|T|(|T|+1)/2$, so the total number of span-trusses is $O(|T|^2 \times k_{max})$, where $k_{max}$ is the largest value of $k$ for which a $(k, \Delta)$-truss exists. For this reason, it is worthwhile to focus only on the most important trusses, the maximal ones, as defined next.
Definition 3.2. A span-truss $T_{k,\Delta}$ of a temporal graph $G$ is said to be maximal if there does not exist any other span-truss $T_{k',\Delta'}$ of $G$ such that $k \le k'$ and $\Delta \sqsubseteq \Delta'$.
In other words, a span-truss is maximal if it is not dominated by another span-truss on both the order $k$ and the span $\Delta$. In our temporal setting, the number of maximal span-trusses is $O(|T|^2)$, as, in the worst case, there may be one maximal span-truss for every temporal interval. However, as with maximal span-cores, we expect the number of maximal span-trusses to be much smaller in practice.
Problem 3.2. Given a temporal graph $G$, find the set of all maximal $(k, \Delta)$-trusses of $G$.
We now state and prove some properties that will be useful later.
Proposition 3.1. For any two span-trusses $T_{k,\Delta}$, $T_{k',\Delta'}$ of a temporal graph $G$, it holds that $k' \le k \wedge \Delta' \sqsubseteq \Delta \implies T_{k,\Delta} \subseteq T_{k',\Delta'}$.
Proof. The result follows by separately showing that (i) $k' \le k \implies T_{k,\Delta} \subseteq T_{k',\Delta}$, and (ii) $\Delta' \sqsubseteq \Delta \implies T_{k,\Delta} \subseteq T_{k,\Delta'}$; chaining the two gives $T_{k,\Delta} \subseteq T_{k',\Delta} \subseteq T_{k',\Delta'}$.
(i) holds because every edge of $T_{k,\Delta}$ is in at least $k-2$ triangles within $T_{k,\Delta}$, and thus also in at least $k'-2$ triangles, since $k' \le k$. Hence $T_{k,\Delta}$ is a subgraph of $G_\Delta$ satisfying the constraint for $k'$, and, since $T_{k',\Delta}$ is the largest such subgraph, $T_{k,\Delta} \subseteq T_{k',\Delta}$.
(ii) holds because $\Delta' \sqsubseteq \Delta \implies E_\Delta \subseteq E_{\Delta'}$, i.e., every edge of $G_\Delta$ is also an edge of $G_{\Delta'}$. Hence $T_{k,\Delta}$ is also a subgraph of $G_{\Delta'}$ in which every edge is in at least $k-2$ triangles, so $T_{k,\Delta} \subseteq T_{k,\Delta'}$. □
Definition 3.3. Let $T_{k^*}[G]$ denote the innermost truss of $G$, i.e., the non-empty $k$-truss of $G$ with the largest $k$.
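To make the scale behind Observation 3.1 concrete, the interval count can be derived by summing, for each start $t_s$, the number of admissible ends $t_e \ge t_s$ (with $|T| = t_{max} + 1$); the numeric instance simply plugs in $|T| = 307$, the number of timestamps of the prosperloans graph in Table 1.

```latex
\[
\#\{\Delta \sqsubseteq T\} \;=\; \sum_{t_s=0}^{t_{max}} (t_{max} - t_s + 1)
\;=\; \sum_{i=1}^{|T|} i \;=\; \frac{|T|\,(|T|+1)}{2},
\qquad
|T| = 307 \;\Rightarrow\; \frac{307 \cdot 308}{2} = 47{,}278 .
\]
```

Even for this moderately long time domain, a method that decomposes every $G_\Delta$ performs tens of thousands of truss decompositions.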
Figure 3: Graphical representation of the containment property. Span-trusses follow the same structure as span-cores. For a temporal span $\Delta = [t_s, t_e]$, the $(k, \Delta)$-truss is depicted as a node labeled "$k, [t_s, t_e]$"; an arrow $T_1 \to T_2$ denotes $T_1 \supseteq T_2$ [7].
Lemma 3.1. Given a temporal graph $G = (V, T, \tau)$, let $\mathcal{T}_M$ be the set of all maximal span-trusses of $G$, and let $\mathcal{T}_{inner} = \{T_{k^*}[G_\Delta] \mid \Delta \sqsubseteq T\}$ be the set of innermost trusses of all graphs $G_\Delta$. It holds that $\mathcal{T}_M \subseteq \mathcal{T}_{inner}$.
Proof. Every $T_{k,\Delta} \in \mathcal{T}_M$ is the innermost truss of the static graph $G_\Delta$: otherwise, there would exist another truss $T_{k',\Delta} \neq \emptyset$ with $k' > k$, implying that $T_{k,\Delta} \notin \mathcal{T}_M$. □
Lemma 3.2. Given a temporal graph $G = (V, T, \tau)$, consider three temporal intervals $\Delta = [t_s, t_e] \sqsubseteq T$, $\Delta' = [t_s - 1, t_e] \sqsubseteq T$, and $\Delta'' = [t_s, t_e + 1] \sqsubseteq T$. The innermost truss $T_{k^*}[G_\Delta]$ is a maximal span-truss of $G$ if and only if $k^* > \max\{k', k''\}$, where $k'$ and $k''$ are the orders of the innermost trusses of $G_{\Delta'}$ and $G_{\Delta''}$, respectively.
Proof. The "⇒" part comes directly from the definition of maximal span-truss (Definition 3.2): if $k^*$ were not larger than $\max\{k', k''\}$, then $T_{k^*}[G_\Delta]$ would be dominated by another span-truss on both the order and the span (as both $\Delta'$ and $\Delta''$ are super-intervals of $\Delta$). For the "⇐" part, every proper super-interval of $\Delta$ contains $\Delta'$ or $\Delta''$, so Lemma 3.1 and Proposition 3.1 imply that $\max\{k', k''\}$ is an upper bound on the maximum order of a span-truss of any super-interval of $\Delta$. Therefore, $k^* > \max\{k', k''\}$ implies that no other span-truss can dominate $T_{k^*}[G_\Delta]$ on both the order and the span. □
We present our solution by first giving a naïve approach, and then by introducing three versions (Baseline, Streaming, Heuristic), each of which improves on the previous one.
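Lemma 3.2 turns the global dominance check of Definition 3.2 into a local test against the two one-step super-intervals. The Python sketch below, written under our own assumptions, compares both tests on a map `orders` from intervals $(t_s, t_e)$ to the innermost-truss order $k^*$ of $G_{[t_s, t_e]}$, using 0 for intervals with no edges or lying outside $T$. The toy orders are hand-picked (not taken from any dataset) and respect the monotonicity guaranteed by Proposition 3.1, on which the lemma relies.

```python
def maximal_by_definition(orders):
    """Definition 3.2: keep (k*, Δ) unless some other Δ' containing Δ has order >= k*."""
    result = set()
    for (ts, te), k in orders.items():
        if k == 0:
            continue  # no edges in this interval, hence no span-truss
        dominated = any(
            (ts2, te2) != (ts, te) and ts2 <= ts and te <= te2 and k2 >= k
            for (ts2, te2), k2 in orders.items()
        )
        if not dominated:
            result.add((k, (ts, te)))
    return result


def maximal_by_lemma(orders):
    """Lemma 3.2: (k*, [ts, te]) is maximal iff k* > max(k', k'')."""
    result = set()
    for (ts, te), k in orders.items():
        if k == 0:
            continue
        k1 = orders.get((ts - 1, te), 0)  # k'  of Δ'  = [ts - 1, te]
        k2 = orders.get((ts, te + 1), 0)  # k'' of Δ'' = [ts, te + 1]
        if k > max(k1, k2):
            result.add((k, (ts, te)))
    return result


# Hypothetical innermost-truss orders for T = {0, 1, 2}.
orders = {(0, 0): 4, (1, 1): 3, (2, 2): 4, (0, 1): 3, (1, 2): 3, (0, 2): 2}
print(sorted(maximal_by_lemma(orders)))
# [(2, (0, 2)), (3, (0, 1)), (3, (1, 2)), (4, (0, 0)), (4, (2, 2))]
```

Here the order-3 innermost truss on $[1, 1]$ is the only one discarded by both tests, because its super-interval $[0, 1]$ reaches the same order. The local test needs only two lookups per interval, which is what the bookkeeping of Algorithm 2 exploits.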
A first approach to solve the problem could be based on Observa-tion 3.1; namely, we can repeat the truss decomposition for everypossible interval and then filter out non-maximal span-trusses.
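The naïve strategy can be sketched in Python as follows, under the same assumptions as before (one set of undirected edges per timestamp; helper names of our own choosing). The innermost truss is found by computing $k$-trusses with a simple fixpoint peeling for increasing $k$, and the final filter applies Definition 3.2 directly, so the sketch mirrors the structure of Algorithm 1 rather than any optimized implementation.

```python
from collections import defaultdict


def norm(u, v):
    return (u, v) if u < v else (v, u)


def k_truss(edges, k):
    """Largest subgraph in which every edge lies in >= k - 2 triangles (fixpoint peeling)."""
    alive = set(edges)
    adj = defaultdict(set)
    for u, v in alive:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for u, v in list(alive):
            if len(adj[u] & adj[v]) < k - 2:
                alive.discard((u, v))
                adj[u].discard(v)
                adj[v].discard(u)
                changed = True
    return alive


def innermost_truss(edges):
    """Return (k*, T_k*) of a static graph; (0, set()) if it has no edges."""
    current = {norm(u, v) for u, v in edges}
    if not current:
        return 0, set()
    k = 2  # every non-empty graph is a 2-truss
    while True:
        nxt = k_truss(current, k + 1)  # the (k+1)-truss lies inside the k-truss
        if not nxt:
            return k, current
        k, current = k + 1, nxt


def naive_maximal_span_trusses(snapshots):
    t_max = len(snapshots) - 1
    candidates = {}
    for ts in range(t_max + 1):
        edges = {norm(u, v) for u, v in snapshots[ts]}
        for te in range(ts, t_max + 1):
            if te > ts:
                edges = edges & {norm(u, v) for u, v in snapshots[te]}  # E_[ts,te]
            if not edges:
                break  # longer intervals starting at ts are empty too
            candidates[(ts, te)] = innermost_truss(edges)
    # Keep a candidate only if no other candidate dominates it (Definition 3.2).
    return {
        (ts, te): (k, truss)
        for (ts, te), (k, truss) in candidates.items()
        if not any(
            (a, b) != (ts, te) and a <= ts and te <= b and k2 >= k
            for (a, b), (k2, _) in candidates.items()
        )
    }


K4 = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
triangle_plus_pendant = {(0, 1), (0, 2), (1, 2), (2, 3)}
result = naive_maximal_span_trusses([K4, triangle_plus_pendant, K4])
print(sorted((iv, k) for iv, (k, _) in result.items()))
# [((0, 0), 4), ((0, 2), 3), ((2, 2), 4)]
```

On this toy temporal graph, the 4-cliques at $t = 0$ and $t = 2$ are maximal, and the triangle persisting over $[0, 2]$ is the only other maximal span-truss; the quadratic loop over intervals is exactly the cost that the following versions try to reduce.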
Algorithm 1
Naïve maximal span-trusses
Input: A temporal graph G = (V, T, τ).
Output: The set T_M of all maximal span-trusses of G.
    candidates ← ∅
    T_M ← ∅
    forall t_s in [0, 1, ..., t_max] do
        forall t_e in [t_s, t_s + 1, ..., t_max] such that E_[t_s, t_e] ≠ ∅ do
            Δ ← [t_s, t_e]
            candidates[Δ] ← computeMaxTruss(G_Δ)
    T_M ← maximal span-trusses from candidates
Algorithm 1 is trivially sound and complete, since it iterates over every possible interval $\Delta$, extracts the innermost truss of $G_\Delta$, and saves it as a candidate element of $\mathcal{T}_M$; by Lemma 3.1, no maximal span-truss is missed. $\mathcal{T}_M$ is then constructed by filtering out the non-maximal elements of candidates according to Definition 3.2.
As a baseline, we use a slightly better algorithm, similar to the baseline of the algorithm to mine span-cores [7]. It exploits the containment properties proved above, which are shared between span-cores and span-trusses.
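The baseline (Algorithm 2, below) can be sketched in Python as follows. This is a minimal sketch under our own assumptions: it follows the textual description of $K'$ and $k''$, storing the exact orders of $G_{[t_s - 1, t_e]}$ and $G_{[t_s, t_e + 1]}$ and testing Lemma 3.2, rather than transcribing the pseudocode line by line, and it rebuilds each $E_{[t_s, t_e]}$ from the sets $E^-(t_e)$ discussed later in this section. The helper `innermost_truss` is a deliberately simple fixpoint peeling, not the decomposition of [22].

```python
from collections import defaultdict


def innermost_truss(edges):
    """Return (k*, T_k*) for a static graph of normalized (u, v) edges; (0, set()) if empty."""

    def k_truss(es, k):
        # Largest subgraph in which every edge lies in at least k - 2 triangles.
        alive, adj = set(es), defaultdict(set)
        for u, v in alive:
            adj[u].add(v)
            adj[v].add(u)
        changed = True
        while changed:
            changed = False
            for u, v in list(alive):
                if len(adj[u] & adj[v]) < k - 2:
                    alive.discard((u, v))
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
        return alive

    current, k = set(edges), 2  # copy, so callers may keep mutating their own set
    if not current:
        return 0, set()
    while True:
        nxt = k_truss(current, k + 1)
        if not nxt:
            return k, current
        k, current = k + 1, nxt


def baseline_maximal_span_trusses(snapshots):
    """snapshots[t] is a set of normalized edges (u, v) with u < v."""
    n = len(snapshots)
    result = {}
    k_prev = [0] * n  # K'[t]: order of the innermost truss of G_[ts-1, t]
    for ts in range(n):
        # Forward pass: removed[i] = E^-(ts + i) = E_[ts, ts+i] minus E_[ts, ts+i+1].
        removed, current, te = [], set(snapshots[ts]), ts
        while current:
            nxt = current & snapshots[te + 1] if te + 1 < n else set()
            removed.append(current - nxt)
            current, te = nxt, te + 1
        t_star = ts + len(removed) - 1  # last te with a non-empty E_[ts, te]

        # Backward pass, from the largest interval to the smallest.
        edges, k_next = set(), 0  # k'': order of the innermost truss of G_[ts, te+1]
        for te in range(t_star, ts - 1, -1):
            edges |= removed[te - ts]  # E_[ts,te] = E_[ts,te+1] ∪ E^-(te)
            lb = max(k_prev[te], k_next)
            k_star, truss = innermost_truss(edges)
            if k_star > lb:  # Lemma 3.2
                result[(ts, te)] = (k_star, truss)
            k_next = k_star
            k_prev[te] = k_star  # becomes K'[te] for the next start ts + 1
    return result


K4 = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}
triangle_plus_pendant = {(0, 1), (0, 2), (1, 2), (2, 3)}
result = baseline_maximal_span_trusses([K4, triangle_plus_pendant, K4])
print(sorted((iv, k) for iv, (k, _) in result.items()))
# [((0, 0), 4), ((0, 2), 3), ((2, 2), 4)]
```

Because intervals are processed by increasing $t_s$ and decreasing $t_e$, both bounds are available exactly when they are needed, and only the disjoint sets $E^-(t_e)$ of the current start are kept in memory.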
Algorithm 2
Maximal span-trusses
Input: A temporal graph G = (V, T, τ).
Output: The set T_M of all maximal span-trusses of G.
1:  T_M ← ∅
2:  K′[t] ← 0, ∀t ∈ T
3:  forall t_s in [0, 1, ..., t_max] do
4:      t* ← max{t_e ∈ [t_s, t_max] | E_[t_s, t_e] ≠ ∅}
5:      k″ ← 0
6:      forall t_e in [t*, t* − 1, ..., t_s] do
7:          Δ ← [t_s, t_e]
8:          lb ← max{K′[t_e], k″}
9:          innermostTruss ← computeMaxTruss(G_Δ)
10:         k* ← order of innermostTruss
11:         if k* > lb then
12:             T_M ← T_M ∪ {innermostTruss}
13:             k″ ← k*
14:         K′[t_e] ← max{K′[t_e], k″}
Algorithm 2 works as follows. It iterates over all starting timestamps $t_s \in T$ in increasing order and, for each $t_s$, identifies all the maximal span-trusses whose span starts at $t_s$. Proceeding in this way guarantees that a span-truss recognized as maximal will not later be dominated by another span-truss, since an interval $[t_s, t_e]$ cannot be contained in another interval $[t'_s, t'_e]$ with $t_s < t'_s$.
To find all the maximal span-trusses whose span starts at $t_s$, the algorithm first identifies $t^* \ge t_s$, the maximum timestamp such that the edge set $E_{[t_s, t^*]}$ is not empty. Then, proceeding in decreasing order of $t_e$ and starting from $t_e = t^*$, all intervals $\Delta = [t_s, t_e]$ are considered (from the largest interval to the smallest).
The inner loop computes the lower bound lb (the maximum between $K'[t_e]$ and $k''$) that the order of the innermost truss of $G_\Delta$ must exceed for it to be recognized as maximal. $K'$ is a map that maintains, for every timestamp $t \in [t_s, t^*]$, the order of the innermost truss of the graph $G_{\Delta'}$, where $\Delta' = [t_s - 1, t]$ (i.e., $K'[t]$ stores what Lemma 3.2 denotes as $k'$). Similarly, $k''$ stores the order of the innermost truss of $G_{\Delta''}$, where $\Delta'' = [t_s, t_e + 1]$.
The selected truss is added to the set of maximal span-trusses only if its order is larger than lb; then the values of $k''$ and $K'[t_e]$ are updated.
Observation 4.1. The worst-case time complexity of Algorithm 2 is $O(|T|^2 \times |E|^{1.5})$, since the $k$-truss decomposition (complexity $O(|E|^{1.5})$) is repeated for every $\Delta$.
It is easy to see that the number of possible intervals $\Delta$ is $O(|T|^2)$. Note that, since the output itself is potentially quadratic in $|T|$, it is not possible to improve over the $|T|^2$ factor in the computational complexity.
We now discuss how to build the graph $(V, E_\Delta)$ efficiently in both space and time, following the approach of [7].
For a fixed timestamp $t_s \in [0, \dots, t_{max}]$, the following reasoning applies. Let $E^-(t_e) = E_{[t_s, t_e]} \setminus E_{[t_s, t_e + 1]}$ be the set of edges that are in $E_{[t_s, t_e]}$ but not in $E_{[t_s, t_e + 1]}$, for $t_e \in [t_s, \dots, t^* - 1]$. For each $t_s$, one can compute and store all edge sets $\{E^-(t_e)\}_{t_e \in [t_s, t^* - 1]}$. These operations take $O(|T| \times |E|)$ time, because every $E^-(t_e)$ can be computed incrementally from $E_{[t_s, t_e]}$ as $E^-(t_e) = \{(u, v) \in E_{[t_s, t_e]} \mid \tau(u, v, t_e + 1) = 0\}$.
For any $t_e$, $E_{[t_s, t_e]}$ can then be reconstructed as $E_{[t_s, t_e + 1]} \cup E^-(t_e)$, having previously computed $E_{[t_s, t_e + 1]}$. Storing all $E^-(t_e)$ takes only $O(|E|)$ space, since these sets are pairwise disjoint. That is why all $E^-(t_e)$ are stored and the sets $E_{[t_s, t_e]}$ are reconstructed afterwards, instead of storing the latter, which would take $O(|T| \times |E|)$ space.
We use this approach in Algorithm 2.
Observation 4.2. Since, for any $t_e$, we reconstruct $E_{[t_s, t_e]}$ as $E_{[t_s, t_e + 1]} \cup E^-(t_e)$, we only ever add new edges to the graph $G_{[t_s, t_e + 1]}$, starting from an empty graph. This means we can exploit a streaming approach to solve the problem.
It is easy to see that Algorithm 2 repeats the truss decomposition in every possible interval. This means that it also repeats the support computation, which for a single interval $\Delta$ has complexity $O(|E_\Delta|^{1.5})$ and is the most expensive operation.
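To make the contrast concrete, the following Python sketch (class and method names are our own) keeps the support of every edge up to date under edge insertions, in the spirit of the incremental update developed in the following paragraphs (Algorithm 3), and checks it against a from-scratch count. For each inserted edge $(u, v)$ it scans the neighbours of the lower-degree endpoint and tests adjacency with a hash-set lookup, which is one way to realize the $O(1)$ check of Observation 4.3. The small graph at the end is hypothetical, chosen so that, as in Figure 4, $A$ is the only common neighbour of $B$ and $C$.

```python
from collections import defaultdict


def norm(u, v):
    return (u, v) if u < v else (v, u)


class StreamingSupport:
    """Undirected graph whose edge supports are updated on every insertion."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.sup = {}

    def add_edges(self, new_edges):
        for u, v in new_edges:
            e = norm(u, v)
            if e in self.sup:
                continue  # edge already present
            # Scan the smaller neighbourhood; membership in the larger set is O(1) on average.
            a, b = (u, v) if len(self.adj[u]) <= len(self.adj[v]) else (v, u)
            common = [w for w in self.adj[a] if w in self.adj[b]]
            self.sup[e] = len(common)  # triangles closed by the new edge
            for w in common:  # each such triangle also contains (u, w) and (v, w)
                self.sup[norm(u, w)] += 1
                self.sup[norm(v, w)] += 1
            self.adj[u].add(v)
            self.adj[v].add(u)


def support_from_scratch(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {norm(u, v): len(adj[u] & adj[v]) for u, v in edges}


g = StreamingSupport()
g.add_edges([("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")])  # graph before the insertion
g.add_edges([("B", "C")])                                      # a one-edge batch E^-(t_e)
print(g.sup)
# {('A', 'B'): 1, ('A', 'C'): 2, ('A', 'D'): 1, ('C', 'D'): 1, ('B', 'C'): 1}
assert g.sup == support_from_scratch(g.sup.keys())
```

The work per inserted edge is proportional to the smaller endpoint degree, instead of recounting every triangle of $G_\Delta$ as a from-scratch decomposition does.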
Here we outline an algorithm that achieves better performance with regard to the support computation.
We can reframe the problem as a streaming problem, as stated in Observation 4.2. Suppose we have computed the support of every edge active in the interval $\Delta^* = [t_s, t_e + 1]$. In the next step, we consider the interval $\Delta = [t_s, t_e]$ and hence the graph $G_\Delta$, which is simply the graph $G_{\Delta^*}$ with a number of edges added, namely $E^-(t_e)$. We can study how the addition of these new edges changes the support of the edges of the old graph $G_{\Delta^*}$, and develop an algorithm that computes only the support of the edges in $E^-(t_e)$ and merely updates the support of the edges already in $G_{\Delta^*}$. Updating the supports, instead of always recomputing them, leads to a large speedup, as we will see in the next section.
Figure 4: In this example we show how the insertion of a new edge $(B, C)$ affects the supports of the other edges in the graph. The red vertex $A$ is the only vertex in $\mathrm{neighbours}(B) \cap \mathrm{neighbours}(C)$, so we update the supports of $(A, C)$, $(A, B)$, and of the new edge $(B, C)$. Indeed, $(B, C)$ forms a triangle with these edges, colored in green. On the right, we show the graph with the updated supports.
After the supports of the edges have been updated, we can run the truss decomposition algorithm.
Algorithm 3
Computing the support of every edge in G_Δ efficiently
Input: A graph G_[t_s, t_e+1] = (V, E_[t_s, t_e+1]) with the support computed for every edge, and a set E⁻(t_e) of edges to add to G_[t_s, t_e+1].
Output: The graph G_[t_s, t_e] = (V, E_[t_s, t_e+1] ∪ E⁻(t_e)) with updated supports.
1:  forall e ∈ E⁻(t_e) do
2:      add e to G_[t_s, t_e+1]
3:      let (u, v) = e
4:      forall w ∈ (neighbours(u) ∩ neighbours(v)) do
5:          sup(u, v) ← sup(u, v) + 1
6:          sup(v, w) ← sup(v, w) + 1
7:          sup(u, w) ← sup(u, w) + 1
Observation 4.3. If we use a map $M$ that maps a pair of vertices $(u, v)$ to 1 if the edge exists in $G_\Delta$ at observation time and to 0 otherwise, we can implement the intersection at step 4 by simply iterating over the neighbours of the lower-degree endpoint and checking in $O(1)$ whether the remaining edge needed to form the triangle exists in the graph at observation time. Hence, the running time of this approach is bounded by $\sum_{(u, v) \in E^-(t_e)} \min\{\deg(u), \deg(v)\}$.
It is worth noting that we still compute the truss decomposition in every graph $G_\Delta$. From Algorithm 2, lines 11 to 14, we observe that a $k$-truss recognized as the maximal $k$-truss of a snapshot of a temporal graph will not always be recognized as a maximal span-truss.
Observation 4.4. If the order of the innermost truss $I'$ of the graph $G_{[t_s, t_e]}$ is $k'$ and the order of the innermost truss $I''$ of the graph $G_{[t_s, t_e - 1]}$ is also $k'$, then $I''$ is not a maximal span-truss.
Observation 4.5. If the order of the innermost truss $I'$ of the graph $G_{[t_s, t_e]}$ is $k'$, and the graphs $G_{[t_s, t_e - 1]}$ and $G_{[t_s, t_e]}$ have the same number of edges with support greater than $k' - 2$, then the order of the innermost truss $I''$ of $G_{[t_s, t_e - 1]}$ is $k'$.
Together, these two simple yet effective observations provide a sufficient condition to skip the computation of the truss decomposition in a snapshot of a temporal graph: whenever the condition of Observation 4.5 holds, Observation 4.4 implies that $G_{[t_s, t_e - 1]}$ cannot contribute a maximal span-truss. This leads to improved performance on particular datasets, as we will see in the next section.
Datasets.
We use seven real-world datasets recording timestamped interactions between entities, as in [7]. For each dataset, a window size is selected to build the corresponding temporal graph. Multiple interactions occurring between two entities during the same discrete timestamp are counted as one. The characteristics of the resulting graphs are reported in Table 1.
prosperloans represents the network of loans between the users of Prosper, a marketplace of loans between private individuals. lastfm records the co-listening activity of the streaming platform Last.fm: two users are connected if they listened to songs of the same band during the same discrete timestamp. wikitalk is the communication network of the English Wikipedia. dblp is the co-authorship network of the authors of scientific papers from the DBLP computer science bibliography. stackoverflow includes the answer-to-question interactions on StackOverflow. wikipedia connects users of the Italian Wikipedia who co-edited a page within the same discrete timestamp. In the amazon dataset, vertices are users, and edges represent the rating of at least one common item within the same discrete timestamp.
Implementation.
The code for the experiments has been implemented in C++11, compiled with g++ 5.4 with -O3 optimization, and run on a machine equipped with a 2.2 GHz CPU, 94 GB of RAM, and Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-145-generic x86_64).
All datasets are made available by the KONECT Project (http://konect.cc), except for StackOverflow, which is part of the SNAP Repository (http://snap.stanford.edu).
https://github.com/FraLotito/span_trusses

Dataset         |V|    |E|    |T|    Window size (days)    Domain
prosperloans    89k    3M     307    7                     economic
lastfm          992    4M     77     21                    co-listening
wikitalk        2M     10M    192    28                    communication
dblp            1M     11M    80     366                   co-authorship
stackoverflow   2M     16M    51     56                    question answering
wikipedia       343k   18M    101    56                    co-editing
amazon          2M     22M    115    28                    co-rating
Table 1: Description of the temporal graphs used for the experiments
Table 2: Number of maximal span-trusses in each dataset
Dataset         Baseline (s)   Streaming (s)   Heuristic (s)
prosperloans    5              5               5
lastfm          1318
dblp            513            112
stackoverflow   381            91
wikipedia       2447
Table 3: Experimental results
Results.
Table 2 reports the number of maximal span-trusses present in each dataset.
Table 3 shows the computing time on each dataset for the Baseline, Streaming, and Heuristic algorithms. The table shows how computing the support of the edges in a streaming fashion improves the overall performance of the algorithm: we observe a consistent decrease in execution time, most notably on the wikitalk dataset, which takes almost ten times less than with the baseline.
The table also shows how our proposed heuristic to avoid unnecessary decompositions helps reduce the execution time on some of the datasets, again most notably on wikitalk, which takes half the time required by the Streaming algorithm. On some datasets, however, the heuristic introduces a minimal overhead; we believe that it is still worthwhile to use this version, to exploit the more significant performance gains in the other cases.
The first and most obvious dense subgraph introduced to social network analysis is the clique, a subgraph in which every vertex is adjacent to every other vertex [13]. Computing cliques has several disadvantages. First, cliques are both too rare and too common: cliques of only a few members are frequently too numerous to be helpful, while larger cliques are too difficult to find in real-world graphs. Second, no polynomial-time algorithm is known for this problem, which makes the enumeration of cliques impractical for moderate data sizes [3].
A number of generalizations and relaxations have been proposed to address the rarity and intractability of cliques [1, 15, 18].
A well-known relaxation of the clique is the $k$-core decomposition [17]. A $k$-core is a maximal subgraph in which each member is adjacent to at least $k$ other members. Unlike other clique generalizations, $k$-cores can be computed and listed in polynomial time. The disadvantage of $k$-cores is that they are too promiscuous and can be of questionable utility.
The concept of $k$-truss was introduced as a compromise between the expensive-to-find and overly numerous groupings provided by cliques, $k$-cliques, $k$-clubs, and $k$-plexes on the one hand, and the easy-to-compute, few-in-number, but overly generous $k$-cores on the other [6]. In most real-world graphs, the maximum trussness is much lower than the maximum coreness, and the highest-order truss is much denser than the highest-order core [20]. Figure 5 highlights the differences between $k$-core and $k$-truss.
Figure 5: Example of the differences between $k$-core (first picture) and $k$-truss decomposition (second picture) [12]. We highlight the coreness of every vertex in the first picture and the trussness of every edge in the second.
Recently, there has been increasing interest from the research community in generalizing cohesive structure concepts to a temporal setting. Our work is directly inspired by the work of Galimberti et al.
[7], who generalized the concept of $k$-core and introduced the concept of span-core. They also provided the corresponding algorithms to compute all the span-cores of a temporal graph and to efficiently compute only the maximal ones (span-cores that are not dominated by any other span-core on both the coreness and the span).
Other works related to ours include Semertzidis et al. [19], who introduced the problem of identifying a set of vertices that are densely connected in at least $k$ timestamps of a temporal network; Himmel et al. [8] and Viard et al. [21], who generalized the concept of clique to temporal graphs and proposed the respective listing algorithms; and Ma et al. [14], who proposed a statistics-driven approach to find dense temporal subgraphs in large temporal networks.
In this paper, we have generalized the concept of $k$-truss to a temporal setting, defining a structure called span-truss, where each truss is associated with its span. We have developed both a naïve and an efficient algorithm to extract all the maximal span-trusses of a temporal graph, along with a heuristic to improve the running time under particular conditions. Finally, we have evaluated our proposals on a number of public datasets.
In future work, we plan to explore new heuristics to avoid computing the whole truss decomposition when it is not needed; for example, Burkhardt et al. [4] summarized a number of properties and bounds that a $k$-truss must satisfy, which could help avoid computing the decomposition when it is not needed.
REFERENCES
[1] Richard D. Alba. 1973. A Graph-Theoretic Definition of a Sociometric Clique.
Journal of Mathematical Sociology.
[2] Proc. VLDB Endow. 5, 6 (2012), 574–585.
[3] Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding All Cliques of an Undirected Graph. Commun. ACM 16, 9 (Sept. 1973), 575–577.
[4] Paul Burkhardt, Vance Faber, and David G. Harris. 2018. Bounds and algorithms for $k$-truss. arXiv:math.CO/1806.05523
[5] Lijun Chang. 2018. Cohesive subgraph computation over large sparse graphs: algorithms, data structures, and programming techniques. Springer, Boston, MA.
[6] Jonathan Cohen. 2008. Trusses: Cohesive Subgraphs for Social Network Analysis.
[7] Edoardo Galimberti, Alain Barrat, Francesco Bonchi, Ciro Cattuto, and Francesco Gullo. 2018. Mining (Maximal) Span-Cores from Temporal Networks. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 107–116.
[8] A. Himmel, H. Molter, R. Niedermeier, and M. Sorge. 2016. Enumerating maximal cliques in temporal graphs. In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’16). IEEE Press, Calgary, Canada, 337–344.
[9] Petter Holme and Jari Saramäki. 2012. Temporal networks. Physics Reports.
[10] Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (Snowbird, Utah, USA) (SIGMOD ’14). Association for Computing Machinery, New York, NY, USA, 1311–1322.
[11] Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu C. Aggarwal. 2010. A Survey of Algorithms for Dense Subgraph Discovery. In Managing and Mining Graph Data. Springer, Boston, MA, 303–336.
[12] Penghang Liu and A. Erdem Sarıyüce. 2019. Analysis of Core and Truss Decompositions on Real-World Networks. In Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG).
[13] R. Duncan Luce and Albert D. Perry. 1949. A method of matrix analysis of group structure. Psychometrika 14, 2 (01 Jun 1949), 95–116.
[14] Shuai Ma, Renjun Hu, Luoshu Wang, Xuelian Lin, and Jin-Peng Huai. 2019. An Efficient Approach to Finding Dense Temporal Subgraphs. IEEE Transactions on Knowledge and Data Engineering (01 2019).
[15] Robert J. Mokken. 1979. Cliques, clubs and clans. Quality and Quantity 13, 2 (01 Apr 1979), 161–173.
[16] Thomas Schank. 2007. Algorithmic Aspects of Triangle-Based Network Analysis. Ph.D. Dissertation. Universität Karlsruhe, Karlsruhe.
[17] Stephen B. Seidman. 1983. Network structure and minimum degree. Social Networks 5, 3 (1983), 269–287.
[18] Stephen B. Seidman and Brian L. Foster. 1978. A graph-theoretic generalization of the clique concept. The Journal of Mathematical Sociology 6, 1 (1978), 139–154.
[19] Konstantinos Semertzidis, Evaggelia Pitoura, Evimaria Terzi, and Panayiotis Tsaparas. 2018. Finding lasting dense subgraphs. Data Mining and Knowledge Discovery 33, 5 (Nov. 2018), 1417–1445.
[20] Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2018. Patterns and Anomalies in K-Cores of Real-World Graphs with Applications. Knowl. Inf. Syst. 54, 3 (March 2018), 677–710.
[21] Jordan Viard, Matthieu Latapy, and Clémence Magnien. 2015. Revealing Contact Patterns among High-School Students Using Maximal Cliques in Link Streams. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (Paris, France) (ASONAM ’15). Association for Computing Machinery, New York, NY, USA, 1517–1522.
[22] Jia Wang and James Cheng. 2012. Truss Decomposition in Massive Networks.