Computing Betweenness Centrality in Link Streams
Frédéric Simard, Clémence Magnien and Matthieu Latapy
Abstract
Betweenness centrality is one of the most important concepts in graph analysis. It was recently extended to link streams, a graph generalization where links arrive over time. However, its computation raises non-trivial issues, due in particular to the fact that time is considered as continuous. We provide here the first algorithms to compute this generalized betweenness centrality, as well as several companion algorithms that have their own interest. These algorithms run in polynomial time and space; we illustrate them on typical examples and provide an implementation.
Betweenness centrality, or betweenness for short, is one of the most classical and important concepts defined over graphs and used in the field of complex networks and social network analysis [36, 30, 20, 19, 9]. Given a graph $G = (V, E)$, it measures how frequently each node $v \in V$ is involved in shortest paths: $B(v) = \sum_{u \in V, w \in V} \frac{\sigma(u,w,v)}{\sigma(u,w)}$, where $\frac{\sigma(u,w,v)}{\sigma(u,w)}$ is the fraction of all shortest paths from $u$ to $w$ that involve $v$ if there is a path from $u$ to $w$, and $0$ otherwise. Reference algorithms compute the betweenness of all nodes in a graph in time $O(n \cdot m)$, where $n$ and $m$ are the numbers of nodes and links in the graph [4].

Betweenness was extended recently to link streams [18], a family of formal objects that model sequences of interactions over time in a way similar to the modeling of relations by graphs. They are equivalent to other objects like time-varying graphs (TVG) [8, 2], relational event models (REM) [7, 25], or temporal networks [21, 14], with an emphasis on the streaming nature of link sequences. Various temporal extensions of betweenness were introduced in these contexts, see Section 7.

Betweenness in link streams has some unique features that make it quite different from other temporal extensions of betweenness in graphs. In particular, it considers continuous time and links with or without durations: nodes may be linked at specific time instants, as well as during continuous periods of time. Also, it considers paths from any node at any time instant to any node at any time instant, which induces an uncountable number of temporal nodes. This raises specific algorithmic challenges that we address in this paper, thus obtaining the first algorithm (and implementation) for computing betweenness centrality in link streams.

We first introduce key concepts and notations in Section 2. We then show that betweenness computations involve uncountable sets of paths with a finite volume, that we define and compute in Section 3.
In addition, these computations involve integrals that must be transformed into discrete sums over a finite number of time intervals. We define and compute these
Sorbonne Université, CNRS, LIP6, F-75005 Paris, France
A link stream L is a triplet ( T, V, E ) where T = [ α, ω ] is an interval of R representing time, V is a finite set of nodes, and E ⊆ T × V ⊗ V is the set of links . Then, ( t, uv ) ∈ E meansthat u and v are linked together at time t . For any u and v in V , T uv = { t, ( t, uv ) ∈ E } denotes the set of time instants at which u and v are linked together. See Figure 1 for anillustration and [18] for a full presentation of the formalism.We assume here that T uv is the union of a finite number of disjoint closed intervals(possibly singletons) of T . We denote by T the set of bounds of maximal intervals in T uv for any u and v , that we call event times. We denote by m uv the number of maximalintervals in T uv , and by m = P u,v ∈ V m uv their sum, i.e. the number of maximal intervalsin E . In the case of Figure 1, we obtain T = { , , , , , , , , , , , , , , , , , , , , , , , } , m ab = 3 , m ac = 1 , m bc = 4 , m bd = 1 , m cd = 3 , m de = 4 , andso m = 16 .Given a link stream L = ( T, V, E ) and a time t , we define the graph G t = ( V, E t ) with E t = { uv, ( t, uv ) ∈ E } . We denote by N t ( v ) the set of neighbors of v in G t . We denote by σ t ( u, v ) the (finite) number of paths from u to v in G t , and by d t ( u, v ) the distance from u to v in this graph. abcde Figure 1:
An example of link stream L = ( T, V, E ) with T = [ α, ω ] = [0 , , V = { a, b, c, d, e } , and E defined by T ab = [1 , ∪ [15 , ∪ [23 , , T ac = [8 , , T bc =[3 , ∪ { } ∪ [19 , ∪ [25 , , T bd = [12 , , T cd = [6 , ∪ [18 , ∪ [27 , , and T de =[9 , ∪ { } ∪ [23 , ∪ [30 , .In L = ( T, V, E ) , a path P from ( x, u ) ∈ T × V to ( y, v ) ∈ T × V is a sequence v , t , v , t , v , . . . t k , v k with v = u , v k = v , x ≤ t ≤ t ≤ · · · ≤ t k ≤ y , and ( t i , v i − v i ) ∈ E for all i . If such a path exists, then ( y, v ) is reachable from ( x, u ) , which we denote by We make the distinction between the set X × Y of ordered pairs of elements of X and Y , that wedenote by ( x, y ) with x ∈ X and y ∈ Y , and the set X ⊗ Y of unordered pairs of distinct elements of X and Y , that we denote by xy with x ∈ X , y ∈ Y and x = y ; ( x, y ) = ( y, x ) while xy = yx . x, u ) −→ ( y, v ) . The path P involves ( t , u ) , ( t k , v ) , and ( t, v i ) for all t ∈ [ t i , t i +1 ] and all i . It starts at t , arrives at t k , has length k and duration t k − t . A path with duration is called an instantaneous path.For instance, in the case of Figure 1, the sequences a, , b, , c, , d, , e and a, , c, ,d, , c, , d, , e are two paths from (0 , a ) to (32 , e ) with length and duration , andlength and duration , respectively.The path P is a shortest path from ( x, u ) to ( y, v ) if it has minimal length, called thedistance from ( x, u ) to ( y, v ) and denoted by d (( x, u ) , ( y, v )) . The path P is a fastest pathfrom ( x, u ) to ( y, v ) if it has minimal duration, called the latency from ( x, u ) to ( y, v ) anddenoted by ℓ (( x, u ) , ( y, v )) . The path P is a shortest fastest path from ( x, u ) to ( y, v ) if itis a path of minimum length among those of minimal duration from ( x, u ) to ( y, v ) .For instance, in the case of Figure 1, the path a, , b, , c, , d, , e is a fastest path from (0 , a ) to (32 , e ) , but a, , b, , c, , d, , e is not (it has duration ). 
The path a, , b, , c, , d, , e has length and duration , and no path from (0 , a ) to (32 , e ) with lower duration exists.It is not a shortest path since a, , c, , d, , e also is a path from (0 , a ) to (32 , e ) whichhas length and duration . This last path is a shortest path, since no path with lowerlength exists, but not a fastest one. The distance from (0 , a ) to (32 , e ) therefore is andthe latency is . Among the fastest paths from (0 , a ) to (32 , e ) , i.e. the paths of duration ,the shortest have length . Therefore, a, , b, , c, , d, , e is a shortest fastest path betweenthem, as well as a, , b, , c, , d, , e , for instance.Finally, the betweenness of a node v ∈ V at a time instant t ∈ T measures howfrequently ( t, v ) is involved in shortest fastest paths in L , see [18]: B ( t, v ) = X u ∈ V,w ∈ V Z i ∈ T,j ∈ T σ (( i, u ) , ( j, w ) , ( t, v )) σ (( i, u ) , ( j, w )) d i d j where σ (( i,u ) , ( j,w ) , ( t,v )) σ (( i,u ) , ( j,w )) is the fraction of all shortest fastest paths from u at time i to w attime j that involve v at time t if there is a path from ( i, u ) to ( j, w ) , otherwise.In this original definition, the quantity σ (( i,u ) , ( j,w ) , ( t,v )) σ (( i,u ) , ( j,w )) is only loosely defined as a fractionof shortest fastest paths; the function σ itself, as well as the ratio between its values, arenot explicitely defined. We will see in next section that this fraction involves uncountablesets of shortest fastest paths that have finite volumes with a size and a dimension. Wewill also introduce the appropriate arithmetic operators needed to deal with them, and analgorithm to compute these volumes. Let us consider a link stream L = ( T, V, E ) , and a sequence I , I , · · · , I k of intervals of T . Let us denote by b i and e i the bounds of interval I i , with b i ≤ e i . If e i = b i then I i is a singleton ( I i = { b i } = { e i } ). 
The intervals may be closed ( I i = [ b i , e i ] ), half-open( I i =] b i , e i ] or I i = [ b i , e i [ ), or open ( I i =] b i , e i [ ).3e say that the sequence I , I , · · · , I k is a sliding sequence if for all i , there existsno element in I i +1 strictly smaller than all elements of I i ( ∄ y ∈ I i +1 , ∀ x ∈ I i , y < x ), andno element of I i strictly larger than all elements of I i +1 ( ∄ x ∈ I i , ∀ y ∈ I i +1 , x > y ).In such a sequence, the intervals may overlap ( I i ∩ I j = ∅ , i = j ), may be included ineach other ( I i ⊆ I j , i = j ), or may even be equal ( I i = I j , i = j ).Given a sliding sequence I , I , · · · , I k , we denote by v , I , v , I , v , · · · I k , v k the set S of all sequences v , t , v , t , v , · · · , t k , v k such that v i ∈ V , t i ∈ I i and t i +1 ≥ t i for all i .We say that S is a sliding set . If the intervals are disjoint then S = { v } × I × { v } × I × { v } × · · · × I k × { v k } , but this is not true in general.In the case of Figure 1, for instance, [23 , , ]25 , , [27 , , { } is a sliding sequenceand a, [23 , , b, ]25 , , c, [27 , , d, { } , e is a sliding set. The elements of this set arethe the paths a, t , b, t , c, t , d, t , e with ≤ t ≤ , < t ≤ , max(27 , t ) ≤ t ≤ ,and t = 30 .More generally, all paths in any link stream are elements of sliding sets. In the case ofFigure 1, for instance, all shortest paths from (0 , a ) to (14 , e ) go from a to b between times and , from b to c between times and , from c to d between and , and finally from d to e between and . Therefore, they are elements of a, [1 , , b, [3 , , c, [6 , , d, [9 , , e .In addition, if we consider any two elements ( i, u ) and ( j, v ) of T × V , then we havethe following result. Proposition 1.
The set SP (( i, u ) , ( j, v )) of all shortest paths from ( i, u ) and ( j, v ) is thedisjoint union of a finite number of sliding sets.Proof. Let us consider all sliding sequences I , I , · · · , I k with k = d (( i, u ) , ( j, v )) and I i iseither an open interval ] t, t ′ [ such that t and t ′ are two consecutive event times, or I i is asingleton { t } such that t is an event time. There is a finite number of such sequences, andthey induce a finite number of sliding sets which are all disjoint.Any path in SP (( i, u ) , ( j, v )) is in one of these sliding sets, and then all the elementsof this sliding set are shortest paths from ( i, u ) to ( j, v ) . Therefore SP (( i, u ) , ( j, v )) is theunion of such sliding sets.For instance, let us consider the following sliding sets: A = a, [1 , , b, [3 , , c, [6 , , d, [9 , , e ; B = a, [8 , , c, { } , b, [12 , , d, { } , e ; C = a, [15 , , b, { } , c, { } , d, [23 , , e ; D = a, [23 , , b, [25 , , c, [27 , , d, [30 , , e ; E = a, [1 , , b, [12 , , d, { } , e ; F = a, [1 , , b, [12 , , d, { } , e ; G = a, [1 , , b, [12 , , d, [23 , , e ; H = a, [1 , , b, [12 , , d, [30 , , e ; I = a, [8 , , c, [18 , , d, { } , e ; J = a, [8 , , c, [18 , , d, [23 , , e . K = a, [8 , , c, [18 , , d, [30 , , e ; and L = a, [8 , , c, [27 , , d, [30 , , e .Then, consider the link stream of Figure 1. There are simple cases where each set ofshortest paths corresponds to a unique sliding set, like for instance SP ((0 , a ) , (14 , e )) = A , SP ((4 , a ) , (17 , e )) = B , SP ((12 , a ) , (26 , e )) = C , SP ((20 , a ) , (32 , e )) = D , or SP ((0 , a ) , (18 , e )) = E . In most cases, however, the set of shortest paths are disjoint unions (denoted by ⊔ )of several sliding sets, like for instance SP ((0 , a ) , (23 , e )) = E ⊔ F ⊔ I , SP ((0 , a ) , (26 , e )) = E ⊔ G ⊔ J , or SP ((0 , a ) , (32 , e )) = E ⊔ G ⊔ H ⊔ J ⊔ K ⊔ L .4 efinition 1 (volumes) . 
The volume of a sliding set S = v , I , v , I , v , · · · I k , v k , denotedby | S | , is defined by its size and dimension as follows: • If I i is a singleton for all i , then S contains only one sequence. It has size anddimension . • Otherwise, let I ′ , I ′ , · · · , I ′ l be the subsequence of I , I , · · · , I k composed of all itsintervals that are not singletons, and let b ′ i and e ′ i , b ′ i < e ′ i , denote the bounds of I ′ i , for all i . Then, size ( S ) = R e ′ t = b ′ R e ′ t = max ( t ,b ′ ) . . . R e ′ l t l = max ( t l − ,b ′ l ) t l . . . d t d t anddim ( S ) = l .In both cases, the volume of S , | S | , is defined as the pair ( size ( S ) , dim ( S )) giving its sizeand dimension. For instance, the sliding sets above have the following volumes: | A | = (4 , , | B | =(2 , , | C | = (1 , , | D | = (5 . , , | E | = (2 , , | F | = (2 , , | G | = (2 , , | H | = (2 , , | I | = (1 , , | J | = (1 , , | K | = (1 , , and | L | = (2 , . The case of D is differentfrom the others, as it involves two non-trivially overlapping intervals, namely [25 , and [27 , . Therefore, D may be written as D = a, [23 , , b, [25 , , c, [27 , , d, [30 , , e ∪ a, [23 , , b, [27 , , c, [27 , , d, [30 , , e ∪ a, [23 , , b, [27 , , c, [28 , , d, [30 , , e . Thevolume of D is then the sum of volumes of these three sliding sets. The first and last oneshave volumes (4 , and (1 , , respectively. The middle one has volume (0 . , , since it isthe set of all sequences of the form a, t , b, t , c, t , d, t with t in [23 , , both t and t in [27 , , and t in [30 , , with the constraint that t ≤ t .More generally, we have the following definitions for volume operations. Definition 2 (addition, ⊞ ) . Given two disjoint sliding sets S and S ′ of volume | S | = ( s, d ) and | S ′ | = ( s ′ , d ′ ) , the volume of their union S ⊔ S ′ is the sum of their two volumes, which wedenote by | S | ⊞ | S ′ | . 
In such a sum, volumes in lower dimensions are negligible, and the sizes of volumes with maximal dimension just add up, so we obtain $|S \sqcup S'| = |S| \boxplus |S'| = (s + s', d)$ if $d = d'$, $(s, d)$ if $d > d'$, and $(s', d')$ if $d' > d$. By extension, any disjoint union of a finite number of sliding sets $S_1, S_2, \cdots, S_k$ has dimension equal to the largest dimension of these sets, and size equal to the sum of the sizes of all these sets of maximal dimension; we denote its volume by $\boxplus_{i=1}^{k} |S_i| = |S_1| \boxplus |S_2| \boxplus \cdots \boxplus |S_k|$.

Definition 3 (product, $\boxtimes$). Consider three nodes $u$, $v$ and $w$ in $V$, and two sets $S$ and $S'$ such that all elements of $S$ are of the form $u, t_1, v_1, t_2, \cdots, t_k, v$ and the ones of $S'$ are of the form $v, t'_1, v'_1, t'_2, \cdots, t'_l, w$, with $t_k \le t'_1$. We denote by $S \cdot S'$ the set of all sequences $u, t_1, v_1, t_2, \cdots, t_k, v, t'_1, v'_1, t'_2, \cdots, t'_l, w$ such that the sequence from $u$ to $v$ is in $S$ and the one from $v$ to $w$ is in $S'$. If $S$ and $S'$ are disjoint unions of a finite number of sliding sets with $|S| = (s, d)$ and $|S'| = (s', d')$, then $S \cdot S'$ also is the disjoint union of a finite number of sliding sets, and its volume is $|S \cdot S'| = |S| \boxtimes |S'| = (s \cdot s', d + d')$.

Definition 4 (quotient and difference, $\boxslash$ and $\boxminus$). Consider $S$ and $S'$ two disjoint unions of sliding sets with $|S| = (s, d)$ and $|S'| = (s', d')$, and such that $S' \subseteq S$. Then necessarily $d' \le d$, and the fraction of elements of $S$ that are also in $S'$, which we denote by $|S'| \boxslash |S|$ or $\frac{|S'|}{|S|}$, is equal to $0$ if $d > d'$, and to $s'/s$ if $d = d'$. In addition, the set $S \setminus S'$ is a disjoint union of sliding sets, and its volume is $(s, d) \boxminus (s', d') = (s, d) \boxplus (-s', d')$.

These notations and operations make it easy to describe the set $SP((i, u), (j, v))$ and compute its volume, which is the goal of this section.
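As an illustration of Definitions 2 to 4, the volume operations can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `Volume` and the method name `ratio` are hypothetical, and sizes are assumed to be exact numbers (integers or fractions) rather than floating-point values.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Volume:
    """A volume (size, dim) of a disjoint union of sliding sets (hypothetical helper)."""
    size: Fraction
    dim: int

    def __add__(self, other):
        # Addition (Definition 2): only the volumes of maximal dimension count.
        if self.dim == other.dim:
            return Volume(self.size + other.size, self.dim)
        return self if self.dim > other.dim else other

    def __mul__(self, other):
        # Product (Definition 3): sizes multiply, dimensions add up.
        return Volume(self.size * other.size, self.dim + other.dim)

    def __sub__(self, other):
        # Difference (Definition 4): (s, d) minus (s', d') is (s, d) plus (-s', d').
        return self + Volume(-other.size, other.dim)

    def ratio(self, whole):
        # Quotient (Definition 4): fraction of `whole` covered by `self`,
        # assuming the set of `self` is included in the set of `whole`.
        if self.dim > whole.dim:
            raise ValueError("a subset cannot have a larger dimension")
        if self.dim < whole.dim:
            return Fraction(0)
        return Fraction(self.size) / Fraction(whole.size)


# Usage: lower-dimensional volumes vanish in sums and in quotients.
total = Volume(2, 2) + Volume(2, 3) + Volume(1, 3)
print(total)                       # Volume(size=3, dim=3)
print(Volume(1, 3).ratio(total))   # 1/3
print(Volume(2, 2).ratio(total))   # 0
```

Using exact rationals keeps quotients such as $s'/s$ free of rounding errors; floating-point sizes would also work with the same code, at the cost of exactness.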
In the non-trivial cases above, for instance, $|SP((0, a), (23, e))| = |E \sqcup F \sqcup I| = |E| \boxplus |F| \boxplus |I| = (2, 2) \boxplus (2, 2) \boxplus (1, 2) = (5, 2)$, $|SP((0, a), (26, e))| = |E \sqcup G \sqcup J| = |E| \boxplus |G| \boxplus |J| = (2, 2) \boxplus (2, 3) \boxplus (1, 3) = (3, 3)$, and $|SP((0, a), (32, e))| = |E \sqcup G \sqcup H \sqcup J \sqcup K \sqcup L| = (2, 2) \boxplus (2, 3) \boxplus (2, 3) \boxplus (1, 3) \boxplus (1, 3) \boxplus (2, 3) = (8, 3)$.

We will now prove two lemmas needed to compute the volume of shortest paths from a given temporal node $(i, u)$ in $T \times V$ to another one $(j, v)$ in $T \times V$. Lemma 1 shows how to compute the volume of shortest paths between two consecutive event times. Lemma 2 shows how to decompose the set of shortest paths from a temporal node to another one into a disjoint union of smaller sets of shortest paths. This will lead to Algorithm 1, which starts by computing the volume of shortest paths from $(i, u)$ to $(i, w)$ for any $w$. Then, in a temporal BFS-like manner, it uses volumes from $(i, u)$ to $(t, x)$ to compute volumes from $(i, u)$ to $(t', w)$, for increasing pairs of consecutive event times $t$ and $t'$. Indeed, as illustrated in Figure 2, the volumes at $t'$ can be derived from the ones at $t$. The temporal BFS also uses two queues, named $Q$ and $X$, to compute the distances that are also needed to compute volumes of shortest paths. It stops when it reaches time $j$.

In all the following, we consider two consecutive event times $t$ and $t'$. For all $x$ and $y$ in $]t, t'[$, the graphs $G_x$ and $G_y$ are identical. We denote by $G_t^+$ (or $G_{t'}^-$) this graph, and by $\sigma_t^+(u, v)$ and $d_t^+(u, v)$ (or $\sigma_{t'}^-(u, v)$ and $d_{t'}^-(u, v)$) the (finite) number of shortest paths and the distance from $u$ to $v$ in this graph.

Lemma 1.
Given two nodes $x$ and $w$, the volume of the set of shortest paths from $x$ to $w$ that start and arrive during $]t, t'[$ is equal to $\left( \sigma_t^+(x, w) \cdot \frac{(t' - t)^{d_t^+(x, w)}}{d_t^+(x, w)!},\ d_t^+(x, w) \right)$.
Proof.
First notice that if $v_0, t_1, v_1, t_2, v_2, \ldots, t_k, v_k$ is a shortest path from $(t, x)$ to $(t', w)$ in $L$ with $t_i \in\ ]t, t'[$ for all $i$, then necessarily $v_0, v_1, v_2, \ldots, v_k$ is a shortest path from $x$ to $w$ in $G_t^+$. Conversely, if $v_0, v_1, v_2, \ldots, v_k$ is a shortest path from $x$ to $w$ in $G_t^+$, then each sequence $v_0, t_1, v_1, t_2, v_2, \ldots, t_k, v_k$ with $t_i \in\ ]t, t'[$ and $t_i \le t_{i+1}$ for all $i$ is a shortest path from $(t, x)$ to $(t', w)$ in $L$.

Therefore, the set of shortest paths from $x$ to $w$ that start and arrive during $]t, t'[$ is the disjoint union of the sliding sets $v_0, ]t, t'[, v_1, ]t, t'[, \cdots, ]t, t'[, v_k$ for every shortest path $v_0, v_1, v_2, \ldots, v_k$ from $x$ to $w$ in $G_t^+$, where $v_0 = x$, $v_k = w$, and $k = d_t^+(x, w)$. It is easy to show by induction that the size of each such sliding set is $\int_{t_1 = t}^{t'} \int_{t_2 = t_1}^{t'} \cdots \int_{t_k = t_{k-1}}^{t'} \mathrm{d}t_k \cdots \mathrm{d}t_2\, \mathrm{d}t_1 = \frac{(t' - t)^k}{k!}$, and its dimension is $k$. The volume of the set of shortest paths from $x$ to $w$ that start and arrive during $]t, t'[$ is the sum of the volumes of all these sliding sets, and there are $\sigma_t^+(x, w)$ such sliding sets, which completes the proof.

Lemma 2. Given $(i, u)$ in $T \times V$ and $w$ in $V$, we define the two sets $X = \{x \in V,\ d((i, u), (t, x)) + d_t^+(x, w) = d((i, u), (t', w))\}$ and $Y = \{y \in N_{t'}(w),\ d((i, u), (t', y)) + 1 = d((i, u), (t', w))\}$. Then, the volume of $SP((i, u), (t', w))$ is the sum of the two following volumes: $\boxplus_{x \in X} \left( |SP((i, u), (t, x))| \boxtimes \left( \sigma_t^+(x, w) \cdot \frac{(t' - t)^{d_t^+(x, w)}}{d_t^+(x, w)!},\ d_t^+(x, w) \right) \right)$ and $\boxplus_{y \in Y} |SP((i, u), (t', y))|$.
Proof.
Let us denote by $A$ the set $A = \sqcup_{x \in X}\, SP((i, u), (t, x)) \cdot SP^+((t, x), (t', w))$, where $SP^+((t, x), (t', w))$ is the set of shortest paths from $(t, x)$ to $(t', w)$ that start and arrive during $]t, t'[$. Let us denote by $B$ the set $B = \sqcup_{y \in Y}\, SP((i, u), (t', y)) \cdot \{(y, t', w)\}$, which means that $B$ is the set obtained when one concatenates any sequence in $SP((i, u), (t', y))$ with $y \in Y$ to the sequence $y, t', w$. By definition of $X$ and $Y$, elements of $A$ and $B$ are shortest paths from $(i, u)$ to $(t', w)$, and $A$ and $B$ are disjoint.

Let us now consider a shortest path $P = v_0, t_1, v_1, t_2, v_2, \cdots, t_k, v_k$ in $SP((i, u), (t', w))$, hence $v_0 = u$ and $v_k = w$. We show that $P$ is in $A$ or $B$. Indeed, if $t_k = t'$ then $v_{k-1}$ is in $Y$, and $v_0, t_1, v_1, t_2, v_2, \cdots, v_{k-1}$ is in $SP((i, u), (t', v_{k-1}))$, which implies that $P$ is in $B$. If instead $t_k < t'$, let $l$ be the largest value such that $t_l \le t$. Then, all $t_j$ with $l < j \le k$ are in $]t, t'[$ and $v_l$ necessarily is in $X$. Therefore, $v_l, t_{l+1}, \cdots, t_k, v_k$ necessarily is a shortest path from $(t, v_l)$ to $(t', w)$ that starts and arrives in $]t, t'[$. In addition, $v_0, t_1, v_1, \cdots, t_l, v_l$ is a shortest path from $(i, u)$ to $(t, v_l)$, with $v_l \in X$. Therefore, $P$ is in $A$.

Finally, $SP((i, u), (t', w))$ is exactly $A \sqcup B$, which, together with Lemma 1 and Definitions 2 and 3 on volume operations, proves the claim.

Figure 2: If $t$ and $t'$ are two consecutive event times, then a shortest path from $(i, u)$ to $(t', w)$ is the concatenation of either (left) a blue path from $(i, u)$ to a given $(t, x)$ and a green path from this $(t, x)$ to $(t', w)$ in $G_t^+$; or (right) a blue path from $(i, u)$ to a specific $(t', y)$ and then a jump from $y$ to $w$ at time $t'$ using the link $yw \in E_{t'}$.

Theorem 3.
Given two temporal nodes $(i, u)$ and $(j, v)$ in $T \times V$, Algorithm 1 computes the volume of shortest paths from $(i, u)$ to $(j, v)$.

Algorithm 1: Volume of shortest paths between two temporal nodes.
Function VSP:
Input: a link stream L = (T, V, E), (i, u) ∈ T × V, and (j, v) ∈ T × V
Output: volume of shortest paths from (i, u) to (j, v)
    Dist ← dictionary initialized to ∞ for any key
    vol ← dictionary initialized to (0, 0) for any key
    for each w reachable from u in G_i do
        Dist[(i, w)] ← d_i(u, w) and vol[(i, w)] ← (σ_i(u, w), 0)
    for each t, t′ consecutive times in {i, j} ∪ (T ∩ [i, j]) in increasing order do
        Q ← empty queue
        set all nodes as unmarked
        X ← list of all (w, Dist[(t, w)]) in increasing order of Dist[(t, w)]
        while Q or X is not empty do
            (w, d) ← get and remove the first element of Q or X with minimal d
            if w is unmarked then
                Dist[(t′, w)] ← d and mark w
                for all unmarked node y in N_t′(w) do
                    add (y, d + 1) to Q
        for all marked node w in increasing order of Dist[(t′, w)] do
            for all marked node x such that Dist[(t, x)] + d_t^+(x, w) = Dist[(t′, w)] do
                vol[(t′, w)] ← vol[(t′, w)] ⊞ vol[(t, x)] ⊠ (σ_t^+(x, w) · (t′ − t)^{d_t^+(x, w)} / d_t^+(x, w)!, d_t^+(x, w))
            for all marked node y in N_t′(w) such that Dist[(t′, w)] = Dist[(t′, y)] + 1 do
                vol[(t′, w)] ← vol[(t′, w)] ⊞ vol[(t′, y)]
    return vol[(j, v)]
Proof.
Let us consider any time $t < j$ in $\{i, j\} \cup (T \cap [i, j])$ and let $t'$ be the next time in this set. We show below that, if Dist$[(t, w)] = d((i, u), (t, w))$ and vol$[(t, w)] = |SP((i, u), (t, w))|$ for all $w$ when one enters the main loop at line 6, then at the end of the loop we have Dist$[(t', w)] = d((i, u), (t', w))$ and vol$[(t', w)] = |SP((i, u), (t', w))|$. This is sufficient to prove that the algorithm returns $|SP((i, u), (j, v))|$, since the loop at line 4 initializes Dist and vol correctly.

Lines 7 to 13 deal with the computation of $d((i, u), (t', w))$ from the distances at time $t$, for all $w$. It is similar to a BFS on the graph $G_{t'}$, except that distances at $t'$ are bounded by the ones at $t$: $d((i, u), (t', w)) \le d((i, u), (t, w))$. The loop therefore uses two queues: a list $X$ of nodes in increasing distance at time $t$, and a queue $Q$ for the exploration of $G_{t'}$. At each round, we consider a node $w$ with minimal distance in these queues: Line 11 takes the first element of $X$ or $Q$, depending on which has the minimal second field $d$. This is its actual distance $d((i, u), (t', w))$ (line 12). Then we add its neighbors to $Q$, together with the information that their distance from $(i, u)$ cannot be larger than $d((i, u), (t', w)) + 1$ (line 13). The loop ends when both $X$ and $Q$ are empty, i.e. when the distances to all reachable nodes are found.

Then, Lines 14 to 18 deal with the computation of $|SP((i, u), (t', w))|$ from the volumes at time $t$, for all reachable $w$. They are a straightforward application of Lemma 2.

Let us consider a link stream $L = (T, V, E)$, and two nodes $u$ and $w$ in $V$. The previous section shows how to compute the volume of shortest paths from $u$ to $w$ between two given time instants $i$ and $j$. However, betweenness computations rely on volumes of shortest fastest paths from $u$ to $w$.
These paths are the shortest paths from ( s, u ) to ( a, w ) if thelatency from ( s, u ) to ( a, w ) is equal to a − s . We then say that ( s, a ) in T × T is a latencypair from u to w (in L ). This section is devoted to the computation of such latency pairs.In the case of Figure 1, for instance, (2 , is a latency pair from a to e , because thefastest paths from (2 , a ) to (9 , e ) start at and end at . Similarly, (9 , , (16 , and (24 , are the other latency pairs from a to e . Instead, (3 , is not a latency pair from a to e since there is no path from (3 , a ) to (8 , e ) , and (1 , is not a latency pair from a to e either because the fastest paths from (1 , a ) to (9 , e ) start at time .For any t in T , the pair ( t, t ) is a latency pair from u to w exactly if there is aninstantaneous path between ( t, u ) and ( t, w ) , i.e. there is a path between u and w in G t .The latency between ( t, u ) and ( t, w ) is then equal to , and we call ( t, t ) an instantaneouslatency pair . In the case of Figure 1, such latency pairs occur from b to d at all timesfrom to , at time , and at all times from to .Notice that there may exist an infinite amount of instantaneous latency pairs from anode to another one, like in this last example, but there is only a finite number of non-instantaneous latency pairs. Indeed, if ( s, a ) is a latency pair with a − s = 0 , then s and a necessarily are event times, and as said in Section 2 all link streams considered here havea finite number of event times.Notice also that if ( s, a ) is a latency pair from u to w , then there cannot be any latencypair ( s ′ , a ′ ) from u to w with [ s ′ , a ′ ] ( [ s, a ] . Indeed, this would imply that the latency from ( s, u ) to ( a, w ) is equal to s ′ − a ′ < s − a , which contradicts the fact that ( s, a ) is a latencypair. 
This also implies that, if ( s, a ) is a latency pair with s = a , then necessarily s and a are event times: otherwise, there is a pair ( s ′ , a ′ ) such that [ s ′ , a ′ ] ( [ s, a ] , with G s ′ = G s and G a ′ = G a , which would imply that ( s ′ , a ′ ) also is a latency pair, which contradicts ourprevious remark.As a consequence, latency pairs are componentwise ordered: if ( s, a ) and ( s ′ , a ′ ) are twodistinct latency pairs, then [ s ′ , a ′ ] [ s, a ] and [ s, a ] [ s ′ , a ′ ] . Therefore, either s < s ′ and a < a ′ , or s ′ < s and a ′ < a .In this section, we compute the latency list from u to w , defined as the (finite)componentwise ordered list of all latency pairs ( s, a ) such that s and a are event times. Forinstance, in the case of Figure 1, the latency list from a to e is (2 , , (9 , , (16 , , (24 , ,9nd the latency list from b to d is (5 , , (12 , , (14 , , (19 , , (27 , , (28 , .Our algorithm considers all event times in increasing order. It maintains the latencylists from a given node to all others before the current event time. It then updates theselatency lists for the current time by computing the connected components of the graph atthis time. For each of these components, it considers the latest starting time from which anode in this component can be reached, which is given by the previously computed latencylists. This time is the beginning of latency pairs for its nodes, that ends at current time,and so the algorithms updates the lists accordingly. Algorithm 2:
Computation of all latency lists from a given node.
Function Latency-lists:
Input: a link stream L = (T, V, E) and u ∈ V
Output: for each w ∈ V, the latency list from u to w
    create LL ← empty dictionary and LL[w] ← empty list for all w
    for t in T do
        append (t, t) to LL[u]
        for each connected component C of G_t do
            s ← None and X ← ∅
            for w ∈ C with non-empty LL[w] do
                (s′, a′) ← last element of LL[w]
                if s = None or s′ > s then
                    s ← s′ and X ← {w}
                else if s′ = s then
                    add w to X
            if X is non-empty then
                for w ∈ C \ X do
                    append (s, t) to LL[w]
    return LL
Theorem 4.
Given a link stream $L = (T, V, E)$ and a node $u \in V$, Algorithm 2 computes the ordered latency lists from $u$ to any node $w \in V$.
Proof. We claim that, at the end of each iteration of the main loop, for all $w$ in $V$, LL$[w]$ is the list of all latency pairs $(s, a)$ from $u$ to $w$ such that $s$ and $a$ are event times with $a \le t$.

Assume this is true for all iterations before a given event time $t$. When it reaches this event time, the loop starts by adding $(t, t)$ to LL$[u]$, which makes the claim true for $w = u$. Consider any connected component $C$ of $G_t$; the nodes $w \in C$ with non-empty LL$[w]$ are the nodes reachable from $u$ with an arrival time before $t$ or at $t$. Then, the value of $s$ computed by the loop at Line 7 is the latest starting time such that one of these nodes is reachable from $(s, u)$ before $t$ or at $t$, and $X$ is the set of these reachable nodes.

Therefore, if $X$ is non-empty, there exists a path from $(s, u)$ to $(t, w)$ for any $w \in C \setminus X$: for any $x \in X$, the path from $(s, u)$ to $(t, x)$ and then from $(t, x)$ to $(t, w)$ (which exists since $x$ and $w$ are in the same connected component $C$ of $G_t$) is such a path. As a consequence, $(s, t)$ is a latency pair for any $w \in C \setminus X$. Notice that $(s, t)$ is not a latency pair for any node $x \in X$, $x \ne u$, since they all have a latency pair $(s, t_x)$ with $t_x < t$.

Finally, if the claim is true for all event times lower than $t$, it is true for $t$ too. It is true for the first iteration, i.e. when $t$ is the first event time: it sets LL$[w]$ to $\{(t, t)\}$ for every node $w$ in the same connected component of $G_t$ as $u$, which is the correct value. Therefore, for all $w$ in $V$, the returned value of LL$[w]$ is the list of latency pairs $(s, a)$ from $u$ to $w$ such that $s$ and $a$ are event times, and it is ordered by construction.
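The latency-list computation of Algorithm 2 can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the input format (a dictionary `links` mapping a pair of nodes to its list of closed intervals) and the helper names `event_times`, `components` and `latency_lists` are assumptions made for this example, and the small link stream used at the end is a toy example, not the one of Figure 1.

```python
from collections import defaultdict


def event_times(links):
    """Sorted bounds of all maximal link intervals (the event times)."""
    return sorted({bound for ivs in links.values() for iv in ivs for bound in iv})


def components(nodes, links, t):
    """Connected components of G_t, the graph of links present at time t."""
    adj = defaultdict(set)
    for (a, b), ivs in links.items():
        if any(lo <= t <= hi for lo, hi in ivs):
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for v in nodes:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:  # depth-first exploration of the component of v
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def latency_lists(nodes, links, u):
    """Latency lists from u to every node, following the steps of Algorithm 2."""
    ll = {w: [] for w in nodes}
    for t in event_times(links):
        ll[u].append((t, t))
        for comp in components(nodes, links, t):
            s, best = None, set()  # latest start time and the nodes achieving it
            for w in comp:
                if ll[w]:
                    s2, _ = ll[w][-1]
                    if s is None or s2 > s:
                        s, best = s2, {w}
                    elif s2 == s:
                        best.add(w)
            if best:
                for w in comp - best:
                    ll[w].append((s, t))
    return ll


# Toy link stream: a-b linked during [1, 2], then b-c linked during [4, 5].
toy_links = {("a", "b"): [(1, 2)], ("b", "c"): [(4, 5)]}
print(latency_lists(["a", "b", "c"], toy_links, "a")["c"])  # [(2, 4)]
```

In this toy example, the latest departure from `a` that still reaches `c` is at time 2 (the end of the `a`-`b` link) and the earliest arrival at `c` is at time 4 (the beginning of the `b`-`c` link), hence the single latency pair (2, 4). Recomputing connected components from scratch at each event time keeps the sketch short; an implementation concerned with performance would likely maintain them incrementally.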
Throughout this section, we consider a link stream $L = (T, V, E)$ and two nodes $u$ and $w$ in $V$. In addition, we consider a temporal node $(t, v)$ in $T \times V$.

For any $i$ and $j$ in $T$, we denote by $C^{ij}_{tv}(u, w)$ the fraction $\frac{\sigma((i, u), (j, w), (t, v))}{\sigma((i, u), (j, w))}$ of shortest fastest paths from $(i, u)$ to $(j, w)$ that involve $(t, v)$, and we call it the contribution of $(i, j)$. If there is no path from $(i, u)$ to $(j, w)$, we consider that $C^{ij}_{tv}(u, w) = 0$. By extension, we call $\int_{i, j \in T} C^{ij}_{tv}(u, w)\, \mathrm{d}i\, \mathrm{d}j$ the contribution of $(u, w)$ to the betweenness of $(t, v)$, and we denote it by $C_{tv}(u, w)$. The goal of this section is to compute $C_{tv}(u, w)$.

First notice that the contribution of $(i, j)$ is derived from volumes of paths as follows. Given $x$, $y$ and $z$ in $T \times V$, we denote by $SFP(x, y)$ the set of all shortest fastest paths from $x$ to $y$, and by $SFP(x, y, z)$ the set of these paths that involve $z$. Then, we define $\sigma(x, y)$ and $\sigma(x, y, z)$ as the volumes of $SFP(x, y)$ and $SFP(x, y, z)$, respectively. It follows that $C^{ij}_{tv}(u, w)$ is equal to $\sigma((i, u), (j, w), (t, v)) \boxslash \sigma((i, u), (j, w))$ if there is a path from $(i, u)$ to $(j, w)$. Otherwise, $C^{ij}_{tv}(u, w)$ is $0$.

This gives a rigorous ground to the definition of $C^{ij}_{tv}(u, w)$, which, as discussed at the end of Section 2, was loosely defined as the fraction $\frac{\sigma((i, u), (j, w), (t, v))}{\sigma((i, u), (j, w))}$ of shortest fastest paths from $(i, u)$ to $(j, w)$ that involve $(t, v)$; it is indeed equal to the ratio between the two volumes $\sigma((i, u), (j, w), (t, v))$ and $\sigma((i, u), (j, w))$ now defined, with the volume ratio operation from Definition 4: $C^{ij}_{tv}(u, w) = \frac{\sigma((i, u), (j, w), (t, v))}{\sigma((i, u), (j, w))}$.

Consider for instance the case of Figure 1 with $u = a$ and $w = e$, and let us consider $i = 0$ and $j = 18$.
Then, the shortest fastest paths from $(i, u) = (0, a)$ to $(j, w) = (18, e)$ are the elements of the set $\mathrm{SFP}((0, a), (18, e)) = X \sqcup Y$ where $X$ and $Y$ are the sliding sets a, { }, b, [3, , c, [6, , d, { }, e and a, { }, c, { }, b, [12, , d, { }, e, respectively. If (t, v) = (7. , c) or $(t, v) = (10, b)$, for instance, then none of these paths involve $(t, v)$ and so we obtain a contribution of $0$. If (t, v) = (4. , c) or $(t, v) = (8, d)$, for instance, then all paths in $X$ involve $(t, v)$ and no path in $Y$ does, leading to $C^{ij}_{tv}(u, w) = \sigma((0,a),(18,e),(t,v)) \boxslash \sigma((0,a),(18,e)) = |X| \boxslash (|X| \boxplus |Y|) = (2, 2) \boxslash ((2, 2) \boxplus (2, 1)) = (2, 2) \boxslash (2, 2) = 1$. If $(t, v) = (10, c)$ or $(t, v) = (14, d)$, then $C^{ij}_{tv}(u, w) = |Y| \boxslash (|X| \boxplus |Y|) = 0$.

Before presenting the algorithm computing these path volumes and associated contributions, we characterize more precisely which pairs $(i, j)$ have non-zero contribution.

Lemma 5. There is at most one latency pair from $u$ to $w$ with non-zero contribution.

Proof. Consider two distinct latency pairs $(s, a)$ and $(s', a')$; we can assume $s < s'$ and $a < a'$, since, as explained in the previous section, $[s', a'] \subseteq [s, a]$ is impossible. Suppose both latency pairs have non-zero contribution: there are shortest fastest paths from $(s, u)$ to $(a, w)$ that involve $(t, v)$ and from $(s', u)$ to $(a', w)$ that also involve $(t, v)$. Therefore, there is a path from $(s', u)$ to $(t, v)$ and a path from $(t, v)$ to $(a, w)$, and so a path from $(s', u)$ to $(a, w)$. It has duration $a - s'$, which is strictly lower than both $a - s$ and $a' - s'$, thus contradicting both that $(s, a)$ and $(s', a')$ are latency pairs.

If all latency pairs from $u$ to $w$ have contribution $0$, then the contribution of $(u, w)$ itself is $0$. Otherwise, let us denote by $(s, a)$ the unique latency pair with non-zero contribution.

We now introduce two specific times, $S$ and $A$, that we will use to find all time instants with non-zero contribution. We define $]S, A[$ as the largest interval containing $]s, a[$ such that: for all other latency pairs $(s', a')$ in this interval, either $a' - s' > a - s$, or $a' - s' = a - s$ and $d((s', u), (a', w)) \ge d((s, u), (a, w))$; and the number of instantaneous paths from $(S, u)$ to $(A, w)$ of length $d((s, u), (a, w))$ is finite. We illustrate this definition in Figure 3.

Figure 3: An abstract example of link stream $L = (T, V, E)$ in which we consider a specific $(t, v)$ in $T \times V$ (in red), two nodes $u$ and $w$ in $V$ (in black, horizontal lines), as well as the latency pair $(s, a)$ containing $t$ such that shortest (necessarily fastest) paths from $(s, u)$ to $(a, w)$ have length $d$ and some of them involve $(t, v)$. We display all latency pairs from $u$ to $w$ with two green vertical lines topped by a dotted horizontal line indicating the corresponding latencies ($= a - s$, $< a - s$ or $> a - s$). In addition, we also indicate the length ($= d$, $< d$ or $> d$) of corresponding shortest paths within each latency pair, when this is useful (in grey). We indicate in blue the two specific times $S$ and $A$ defined above, as well as the time periods for $i$ and $j$ such that the contribution of $(i, j)$ may be non-zero (Lemma 6).

We then have the following result.

Lemma 6.
All pairs $(i, j)$ in $T \times T$ that have non-zero contribution are in $[S, s] \times [a, A]$.

Proof. If a given pair $(i, j)$ has non-zero contribution, then there is a latency pair $(s', a')$ with $s' \ge i$ and $a' \le j$ that has non-zero contribution. Recall that $(s, a)$ is itself such a latency pair. From Lemma 5, we then have $(s', a') = (s, a)$, and so $i \le s$ and $j \ge a$.

If $i < S$ and $j \ge a$, or if $i \le s$ and $j > A$, then by definition of $S$ and $A$ we are in one of the following situations.

There exists a latency pair $(s', a')$ in $[i, j]$ such that: either $a' - s' < a - s$, or $a' - s' = a - s$ and $d((s', u), (a', w)) < d((s, u), (a, w))$. Then, shortest fastest paths from $(s, u)$ to $(a, w)$ are not shortest fastest paths from $(i, u)$ to $(j, w)$. All shortest fastest paths from $(i, u)$ to $(j, w)$ are from $(s', u)$ to $(a', w)$ where $(s', a')$ is a latency pair as described above. Suppose such a shortest fastest path involves $(t, v)$. Then there are paths from $(s', u)$ to $(t, v)$ and from $(t, v)$ to $(a', w)$. As a consequence, $s' \notin [s, t]$, otherwise $(s, a)$ would not be a latency pair. Likewise, $a' \notin [t, a]$. Therefore, $s' < s$ and $a' > a$, but this contradicts the fact that $a' - s' \le a - s$. This means that shortest fastest paths from $(s', u)$ to $(a', w)$ cannot involve $(t, v)$, and so the contribution of $(i, j)$ is $0$.

Or there is an infinite number of instantaneous paths from $(i, u)$ to $(j, w)$ with length $d((s, u), (a, w))$. Only the $\sigma_t(u, w)$ ones starting and arriving at time $t$ involve $(t, v)$. There is a finite number of such paths, as they are paths in the graph $G_t$. Therefore, the contribution of $(i, j)$ is zero.

In conclusion, $i \le s$, $j \ge a$, $i$ cannot be smaller than $S$, and $j$ cannot be larger than $A$, which proves the claim.

This lemma says that all pairs $(i, j)$ with non-zero contribution are in $[S, s] \times [a, A]$. Notice however that some pairs $(i, j)$ in $[S, s] \times [a, A]$ may have a contribution equal to $0$. This happens whenever the volume of shortest fastest paths from $(s, u)$ to $(a, w)$ has a lower dimension than the one from $(i, u)$ to $(j, w)$.

We now define specific latency pairs that play a special role, as any shortest fastest path from $(i, u)$ to $(j, w)$ must start and arrive within one of these pairs. To do this, we introduce an ordered list $LP$ of latency pairs centered on $(s, a)$, which means that latency pairs preceding $(s, a)$ have negative indexes in the list and the others have positive indexes. It is the list $LP = ((s_{-l}, a_{-l}), (s_{-l+1}, a_{-l+1}), \dots, (s_0 = s, a_0 = a), \dots, (s_r, a_r))$ such that, for all $k$, $[s_k, a_k] \subseteq [S, A]$, $a_k - s_k = a - s$, and $d((s_k, u), (a_k, w)) = d((s, u), (a, w))$. We also define $s_{-l-1} = S$ and $a_{r+1} = A$. Notice that $s_{-l-1} = s_{-l}$ or $a_r = a_{r+1}$ are not forbidden; this happens for instance when $s_{-l} = \alpha$ or $a_r = \omega$. We show now that the latency pairs in $LP$ give precisely the shortest fastest paths from $u$ to $w$.

Lemma 7.
For any pair $(i, j)$ in $]S, s] \times [a, A[$, the set $\mathrm{SFP}((i, u), (j, w))$ is the disjoint union of all sets $\mathrm{SFP}((s_k, u), (a_k, w))$ such that $(s_k, a_k)$ is in $LP$ and $[s_k, a_k] \subseteq [i, j]$.

Proof. We first show that for any $k$ such that $[s_k, a_k] \subseteq [i, j]$, $\mathrm{SFP}((s_k, u), (a_k, w)) \subseteq \mathrm{SFP}((i, u), (j, w))$. Let us consider a path in $\mathrm{SFP}((s_k, u), (a_k, w))$. Since $[s_k, a_k] \subseteq [i, j]$, it is a path from $(i, u)$ to $(j, w)$. It has duration $a_k - s_k = a - s$ because $(s_k, a_k)$ is in $LP$. Moreover, since $i > S$ and $j < A$, there exists no latency pair $(s', a')$ such that $[s', a'] \subseteq [i, j]$ and $a' - s' < a - s$. Therefore, it is a fastest path from $(i, u)$ to $(j, w)$. Similarly, because $(s_k, a_k)$ is in $LP$, this path has length $d((s, u), (a, w))$ and therefore it is a shortest fastest path from $(i, u)$ to $(j, w)$.

Now consider any shortest fastest path from $(i, u)$ to $(j, w)$, and let us denote by $s'$ and $a'$ its starting and arrival times. Since it is a fastest path, $(s', a')$ is a latency pair, and obviously $[s', a'] \subseteq [i, j]$. In addition, $a' - s' = a - s$: if it was larger then the paths from $(s', u)$ to $(a', w)$ would not be fastest paths from $(i, u)$ to $(j, w)$; and if it was smaller, then the paths from $(s, u)$ to $(a, w)$ would not be fastest paths from $(i, u)$ to $(j, w)$. Similarly, $d((s', u), (a', w)) = d((s, u), (a, w))$: if it was larger then the paths from $(s', u)$ to $(a', w)$ would not be shortest paths from $(i, u)$ to $(j, w)$; and if it was smaller, then the paths from $(s, u)$ to $(a, w)$ would not be shortest paths from $(i, u)$ to $(j, w)$. Therefore, $(s', a')$ is in $LP$, leading to the fact that $\mathrm{SFP}((i, u), (j, w))$ is included in the union of all sets $\mathrm{SFP}((s_k, u), (a_k, w))$ such that $(s_k, a_k)$ is in $LP$ and $[s_k, a_k] \subseteq [i, j]$.

Finally, notice that the sets $\mathrm{SFP}((s_k, u), (a_k, w))$ are disjoint for different values of $k$, since all the paths they contain start at $s_k$ and arrive at $a_k$. We therefore obtain the claim.

Algorithm 3: Compute the list of starting times for latency pairs in $LP$, as well as associated volumes of sets of shortest fastest paths.

 1  Function PrevList:
      Input: a link stream $L = (T, V, E)$, $u \in V$, $w \in V$, $(s, a)$ a latency pair from $u$ to $w$, and the ordered latency list $LL$ from $u$ to $w$
      Output: the list $((s_{-1}, f_{-1}), (s_{-2}, f_{-2}), \dots, (s_{-l-1} = S, f_{-l-1}))$ with $s_k$ defined by $LP$ and with $f_k = \sigma((s_{k+1}, u), (a, w)) \boxminus \sigma((s, u), (a, w))$
 2    init Result to empty list and vol to $(0, 0)$
 3    if $s = a$ and $d^-_s(u, w) = d((s, u), (a, w))$ then
 4      return Result
 5    foreach $(s', a')$ with $s' < s$ in $LL$ backwards do
 6      if $a' - s' < a - s$ then
 7        append $(s', vol)$ to Result and return Result
 8      if $a' - s' = a - s$ then
 9        if $d((s', u), (a', w)) < d((s, u), (a, w))$ then
10          append $(s', vol)$ to Result and return Result
11        if $d((s', u), (a', w)) = d((s, u), (a, w))$ then
12          append $(s', vol)$ to Result
13          if $s' = a'$ and $d^-_{s'}(u, w) = d((s, u), (a, w))$ then
14            return Result
15          $vol \leftarrow vol \boxplus \mathrm{VSP}(L, (s', u), (a', w))$
16    append $(\alpha, vol)$ to Result and return Result
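The backward scan of Algorithm 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: volumes are modelled as `(coefficient, dimension)` pairs, and the addition `vol_add` assumes that equal dimensions add up while a higher dimension dominates a lower one (the actual operation is given by Definition 4). The callables `dist`, `inst_dist_before` and `vsp` are hypothetical stand-ins for $d((\cdot, u), (\cdot, w))$, $d^-_{\cdot}(u, w)$ and $\mathrm{VSP}$.

```python
ZERO = (0, 0)  # assumed representation of the empty volume


def vol_add(x, y):
    """Assumed volume addition: same dimension adds, higher dimension wins."""
    (cx, dx), (cy, dy) = x, y
    if dx == dy:
        return (cx + cy, dx)
    return x if dx > dy else y


def prev_list(s, a, latency_list, dist, inst_dist_before, vsp, alpha):
    """Sketch of PrevList: returns [(s_-1, f_-1), (s_-2, f_-2), ..., (S, f)].

    dist(s2, a2)          -> length of shortest paths for latency pair (s2, a2)
    inst_dist_before(t)   -> instantaneous distance just before t (hypothetical)
    vsp(s2, a2)           -> volume of shortest paths for latency pair (s2, a2)
    """
    result, vol = [], ZERO
    target = dist(s, a)
    if s == a and inst_dist_before(s) == target:
        return result                      # S = s, nothing precedes (s, a)
    for s2, a2 in reversed(latency_list):  # scan LL backwards
        if s2 >= s:
            continue                       # only pairs starting before s
        if a2 - s2 < a - s:                # strictly faster pair: S found
            result.append((s2, vol))
            return result
        if a2 - s2 == a - s:
            d2 = dist(s2, a2)
            if d2 < target:                # same latency but shorter: S found
                result.append((s2, vol))
                return result
            if d2 == target:               # (s2, a2) belongs to LP
                result.append((s2, vol))
                if s2 == a2 and inst_dist_before(s2) == target:
                    return result
                vol = vol_add(vol, vsp(s2, a2))
    result.append((alpha, vol))            # reached the beginning of T
    return result
```

Durations are compared with `==`, so this sketch is only safe with exact time values (integers or fractions); floating-point event times would need a tolerance.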
Lemma 8. The contribution of $(u, w)$ to the betweenness of $(t, v)$, i.e. the fraction of shortest fastest paths from $u$ to $w$ that involve $(t, v)$, namely $C_{tv}(u, w) = \int_{i,j \in T} \frac{\sigma((i,u),(j,w),(t,v))}{\sigma((i,u),(j,w))}\,\mathrm{d}i\,\mathrm{d}j$, can be written as a discrete sum:

$$C_{tv}(u, w) = \sum_{k=-l-1}^{-1} \sum_{k'=0}^{r} (s_{k+1} - s_k)(a_{k'+1} - a_{k'}) \, \frac{\sigma((s,u),(a,w),(t,v))}{\boxplus_{h=k+1}^{k'} \sigma((s_h,u),(a_h,w))}. \qquad (1)$$

Proof. According to Lemma 6, the contribution of time instants $(i, j)$ is equal to zero whenever $(i, j) \notin [S, s] \times [a, A]$. For $(i, j) \in [S, s] \times [a, A]$, all shortest fastest paths from $(i, u)$ to $(j, w)$ involving $(t, v)$ start at time $s$ and arrive at time $a$, and the contribution of $(i, j)$ is therefore equal to $\sigma((s,u),(a,w),(t,v)) \boxslash \sigma((i,u),(j,w))$. Therefore,

$$C_{tv}(u, w) = \int_{[S,s] \times [a,A]} \frac{\sigma((s,u),(a,w),(t,v))}{\sigma((i,u),(j,w))}\,\mathrm{d}i\,\mathrm{d}j = \int_{]S,s] \times [a,A[} \frac{\sigma((s,u),(a,w),(t,v))}{\sigma((i,u),(j,w))}\,\mathrm{d}i\,\mathrm{d}j.$$

According to Lemma 7, for any $k < 0$, any $k' \ge 0$, any $i \in\, ]s_k, s_{k+1}]$, and any $j \in [a_{k'}, a_{k'+1}[$, the value of $\sigma((i,u),(j,w))$ is constant and it is equal to $\boxplus_{h=k+1}^{k'} \sigma((s_h,u),(a_h,w))$. Therefore,

$$\int_{s_k}^{s_{k+1}} \int_{a_{k'}}^{a_{k'+1}} \frac{\sigma((i,u),(j,w),(t,v))}{\sigma((i,u),(j,w))}\,\mathrm{d}i\,\mathrm{d}j = \int_{s_k}^{s_{k+1}} \int_{a_{k'}}^{a_{k'+1}} \frac{\sigma((s,u),(a,w),(t,v))}{\sigma((i,u),(j,w))}\,\mathrm{d}i\,\mathrm{d}j$$
$$= \int_{s_k}^{s_{k+1}} \int_{a_{k'}}^{a_{k'+1}} \frac{\sigma((s,u),(a,w),(t,v))}{\boxplus_{h=k+1}^{k'} \sigma((s_h,u),(a_h,w))}\,\mathrm{d}i\,\mathrm{d}j = (s_{k+1} - s_k)(a_{k'+1} - a_{k'}) \, \frac{\sigma((s,u),(a,w),(t,v))}{\boxplus_{h=k+1}^{k'} \sigma((s_h,u),(a_h,w))}$$

and we obtain the claim.

In order to compute the sum of Lemma 8, we need to iterate over all $s_k$, $-l-1 \le k \le -1$, and all $a_{k'}$, $0 \le k' \le r$. For this purpose, we first give an algorithm computing the values of $s_k$. The algorithm also associates to each $s_k$ a volume of shortest fastest paths that will be useful for computing the denominator of the fraction in the sum.

Lemma 9.
Algorithm 3 computes the list $(s_{-1}, f_{-1}), (s_{-2}, f_{-2}), \dots, (s_{-l-1} = S, f_{-l-1})$ with $s_k$ defined by $LP$ and with $f_k = \sigma((s_{k+1}, u), (a_{-1}, w))$.

Proof. The algorithm builds and returns the (initially empty) list Result. The algorithm terminates when $S$ is found. Indeed, a return is triggered in three different cases. If the empty list is returned at Line 4, this means that there exists an $\epsilon > 0$ such that for all $t \in [s - \epsilon, s]$, there is an instantaneous path of length $d((s, u), (a, w))$ from $(t, u)$ to $(t, w)$, which implies that $S = s$. If the return happens after the last value is added to Result during the for loop, then $s' < s$, and $s'$ is the largest value such that: either $a' - s' < a - s$ (Line 7); or $a' - s' = a - s$ and $d((s', u), (a', w)) < d((s, u), (a, w))$ (Line 10); or there exists $\epsilon > 0$ such that for any $t \in [s' - \epsilon, s']$ there is an instantaneous path of length $d((s, u), (a, w))$ from $(t, u)$ to $(t, w)$ (Line 14). This corresponds exactly to the definition of $S$. Finally, if the function returns at Line 16, then this means that $S = \alpha$ because none of the above conditions is true for any $s' > \alpha$.

The elements $(s', vol)$ added to Result correspond to all latency pairs $(s', a')$ such that $s' \in [S, s[$, $a' - s' = a - s$, and $d((s', u), (a', w)) = d((s, u), (a, w))$. These are therefore the $(s_i, a_i)$ in $LP$ with $i < 0$.

Let us now show that $vol$ contains the desired value when $(s', vol)$ is added to Result. If the empty list is returned at Line 4, then this is true. Otherwise, $vol$ is initialized to $(0, 0)$ and this value is not changed before the first time the pair $(s', vol)$ is appended to Result. Therefore the first pair appended to Result is $(s_{-1}, (0, 0))$, which is correct since $f_{-1} = \sigma((s_0, u), (a_{-1}, w)) = (0, 0)$ (since there are no paths from $(s_0, u)$ to $(a_{-1}, w)$). Assume now that the correct value $(s_i, f_i)$ has been added to Result at one loop iteration, and that $s_i > S$ (otherwise, as shown above, a return is triggered just after the append and the function returns). We then have $s' = s_i$ and the value $\sigma((s', u), (a', w))$ is then added to $vol$. $vol$ is therefore now equal to $\sigma((s', u), (a', w)) \boxplus f_i = \sigma((s_i, u), (a_i, w)) \boxplus \sigma((s_{i+1}, u), (a_{-1}, w)) = f_{i-1}$. Moreover, the loop will skip latency pairs not in $LP$ and the next value of $s'$ that will be considered is $s_{i-1}$. Therefore the next value that is added to Result is $(s_{i-1}, f_{i-1})$ and finally all the correct values are added to Result, which completes the proof.

We also introduce the function NextList by replacing in PrevList of Algorithm 3: $d^-_s$ by $d^+_a$ in Line 3; $s$ by $a$ in Line 4; $s' < s$ in $LL$ backwards by $a' > a$ in $LL$ forwards in Line 5; all $(s', vol)$ appended to Result by $(a', vol)$; $d^-_{s'}$ by $d^+_{a'}$ in Line 13; and $(\alpha, vol)$ by $(\omega, vol)$ in the last line.

The obtained function computes the list $(a_1, g_1), (a_2, g_2), \dots, (a_{r+1} = A, g_{r+1})$ with $a_k$ defined by $LP$ and with $g_k = \sigma((s_1, u), (a_{k-1}, w))$.

We finally reach the objective of this section.

Theorem 10.
Given a link stream $L = (T, V, E)$, a temporal node $(t, v)$ in $T \times V$, and two nodes $u$ and $w$ in $V$, Algorithm 4 computes the contribution of $u$ and $w$ to the betweenness of $(t, v)$, i.e.

$$C_{tv}(u, w) = \int_{i \in T, j \in T} \sigma((i,u),(j,w),(t,v)) \boxslash \sigma((i,u),(j,w))\,\mathrm{d}i\,\mathrm{d}j.$$

Algorithm 4: Contribution of two given nodes to the betweenness of a given temporal node

 1  Function Contribution:
      Input: a link stream $L = (T, V, E)$, $u \in V$, $w \in V$, $(t, v) \in T \times V$, and the latency list $LL$ from $u$ to $w$
      Output: the contribution $\int_{T \times T} \sigma((i,u),(j,w),(t,v)) \boxslash \sigma((i,u),(j,w))\,\mathrm{d}i\,\mathrm{d}j$
 2    $vol\_tv \leftarrow (0, 0)$
 3    for $(x, y)$ in $LL$ do
 4      if $t \in [x, y]$ and $(x, u) \longrightarrow (t, v)$ and $(t, v) \longrightarrow (y, w)$ then
 5        if $d((x, u), (y, w)) = d((x, u), (t, v)) + d((t, v), (y, w))$ then
 6          $vol\_tv \leftarrow \mathrm{VSP}(L, (x, u), (t, v)) \boxdot \mathrm{VSP}(L, (t, v), (y, w))$
 7        set $(s, a)$ to $(x, y)$ and exit the loop
 8    if $vol\_tv = (0, 0)$ then
 9      return $0$
10    $middle \leftarrow \mathrm{VSP}(L, (s, u), (a, w))$
11    $Prev \leftarrow$ PrevList$(L, u, w, s, a, LL)$
12    $Next \leftarrow$ NextList$(L, u, w, s, a, LL)$
13    $contrib \leftarrow 0$
14    $s' \leftarrow s$
15    for $(s\_left, left)$ in $Prev$ do
16      $a' \leftarrow a$
17      for $(a\_right, right)$ in $Next$ do
18        $contrib \leftarrow contrib + (s' - s\_left)(a\_right - a') \cdot vol\_tv \boxslash (left \boxplus right \boxplus middle)$
19        $a' \leftarrow a\_right$
20      $s' \leftarrow s\_left$
21    return $contrib$

Proof. If there exists a latency pair $(s, a)$ with non-zero contribution, then, for any $i \le s$ and any $j \ge a$, we have: $\sigma((i,u),(j,w),(t,v)) = \sigma((s,u),(a,w),(t,v))$. The for loop of Line 3 computes $\sigma((s,u),(a,w),(t,v))$ and stores it in $vol\_tv$. Indeed, since $(t, v)$ is involved in a shortest fastest path from $(s, u)$ to $(a, w)$, necessarily $(s, a)$ is such that $t \in [s, a]$, $(s, u) \longrightarrow (t, v)$, $(t, v) \longrightarrow (a, w)$, and $d((s,u),(a,w)) = d((s,u),(t,v)) + d((t,v),(a,w))$. Since there is at most one such pair satisfying the first three conditions, the algorithm breaks out of the for loop if one is found. If no such latency pair is found, $vol\_tv$ is equal to $(0, 0)$ at the end of the loop and the algorithm returns $0$. Notice that, in the special case where $t$ is not an event time and $(t, u) \longrightarrow (t, w)$, the arguments above do not apply: $(t, t)$ is a latency pair that does not belong to the latency list, and the contribution of $(u, w)$ is $0$, which is the returned value.

Remember that $LP = ((s_{-l}, a_{-l}), (s_{-l+1}, a_{-l+1}), \dots, (s_0 = s, a_0 = a), \dots, (s_r, a_r))$ and that $s_{-l-1} = S$ and $a_{r+1} = A$. PrevList (Algorithm 3) then computes, according to Lemma 9, the list $Prev = ((s_{-1}, f_{-1}), (s_{-2}, f_{-2}), \dots, (s_{-l-1} = S, f_{-l-1}))$ with $s_k$ defined by $LP$ and with $f_k = \sigma((s_{k+1}, u), (a_{-1}, w))$; its dual algorithm NextList computes the list $Next = ((a_1, g_1), (a_2, g_2), \dots, (a_{r+1} = A, g_{r+1}))$ with $a_k$ defined by $LP$ and with $g_k = \sigma((s_1, u), (a_{k-1}, w))$.

According to Lemma 8,

$$C_{tv}(u, w) = \sum_{k=-l-1}^{-1} \sum_{k'=0}^{r} (s_{k+1} - s_k) \cdot (a_{k'+1} - a_{k'}) \cdot \frac{\sigma((s,u),(a,w),(t,v))}{\boxplus_{h=k+1}^{k'} \sigma((s_h,u),(a_h,w))}.$$

Lines 15 to 20 of Algorithm 4 compute this sum. First notice that $s'$ is initialized to $s_0 = s$ and $s\_left$ loops over values in $Prev$, starting with $s_{-1}$. At the end of each iteration $s'$ is set to $s\_left$ and therefore $s'$ and $s\_left$ loop over all consecutive values $s_{k+1}, s_k$ for $s_k$ in $Prev$. The value of $left$ is $f_k = \sigma((s_{k+1}, u), (a_{-1}, w)) = \boxplus_{h=k+1}^{-1} \sigma((s_h, u), (a_h, w))$, as explained in the characterization of $Prev$ above.

Similarly, in the inner for loop $a'$ and $a\_right$ loop over all values $a_{k'}, a_{k'+1}$ for $a_{k'+1}$ in $Next$, and $right = g_{k'+1} = \sigma((s_1, u), (a_{k'}, w)) = \boxplus_{h=1}^{k'} \sigma((s_h, u), (a_h, w))$.

The value of $middle$ has been set to $\sigma((s, u), (a, w))$ (Line 10). Therefore $left \boxplus right \boxplus middle = \boxplus_{h=k+1}^{k'} \sigma((s_h, u), (a_h, w))$. Finally, one iteration of the inner loop adds the value $(s_{k+1} - s_k)(a_{k'+1} - a_{k'})\, \sigma((s,u),(a,w),(t,v)) \boxslash \boxplus_{h=k+1}^{k'} \sigma((s_h,u),(a_h,w))$, which is exactly one term of the sum in Equation 1, and the loop itself ensures we obtain the whole sum.

We now have all needed building blocks for computing the betweenness of any given temporal node: we just have to sum the contribution of each node pair, see Algorithm 5.
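As an illustration of the discrete sum of Lemma 8, the double loop of Algorithm 4 (Lines 13 to 21) can be sketched in Python as below. This is a sketch under assumptions rather than the authors' code: volumes are modelled as `(coefficient, dimension)` pairs, `vol_add` assumes that the higher dimension dominates, and `vol_ratio` assumes that a numerator of lower dimension yields 0 (the actual operations are those of Definition 4). The lists `prev` and `nxt` are assumed to have the shape returned by PrevList and NextList.

```python
from fractions import Fraction


def vol_add(x, y):
    """Assumed volume addition: same dimension adds, higher dimension wins."""
    (cx, dx), (cy, dy) = x, y
    if dx == dy:
        return (cx + cy, dx)
    return x if dx > dy else y


def vol_ratio(x, y):
    """Assumed volume ratio: 0 if the numerator has a lower dimension."""
    (cx, dx), (cy, dy) = x, y
    if dx < dy:
        return Fraction(0)
    if dx > dy:
        raise ValueError("numerator volume has a higher dimension")
    return Fraction(cx) / Fraction(cy)


def contribution_sum(s, a, vol_tv, middle, prev, nxt):
    """Sum over all rectangles ]s_k, s_{k+1}] x [a_k', a_{k'+1}[ of Lemma 8.

    prev: [(s_-1, f_-1), (s_-2, f_-2), ...]  going backwards from s
    nxt:  [(a_1, g_1), (a_2, g_2), ...]      going forwards from a
    """
    contrib = Fraction(0)
    s_prev = s
    for s_left, left in prev:
        a_prev = a
        for a_right, right in nxt:
            # Volume of all shortest fastest paths usable on this rectangle.
            denom = vol_add(vol_add(left, right), middle)
            area = (s_prev - s_left) * (a_right - a_prev)
            contrib += area * vol_ratio(vol_tv, denom)
            a_prev = a_right
        s_prev = s_left
    return contrib


if __name__ == "__main__":
    prev = [(5, (0, 0)), (2, (2, 1))]
    nxt = [(14, (0, 0)), (20, (3, 1))]
    print(contribution_sum(8, 12, (1, 1), (2, 1), prev, nxt))
```

Using `Fraction` keeps the sum exact when event times are integers or rationals; with floating-point times the same loop works, but the result is then only approximate.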
Algorithm 5: Betweenness of a temporal node

    Function Betweenness:
      Input: a link stream $L = (T, V, E)$ and $(t, v) \in T \times V$
      Output: the betweenness of $(t, v)$
      $B \leftarrow 0$
      for $u \in V$ do
        $LL \leftarrow$ Latency-lists$(u)$
        for $w \in V$ do
          $B \leftarrow B +$ Contribution$(L, u, w, (t, v), LL[w])$
      return $B$

The relevant parameters of the input link stream $L = (T, V, E)$ are the number of nodes $n = |V|$ and the number of link segments $m$, i.e. the number of maximal intervals in $E$. Notice that the number of event times $|\mathcal{T}|$ is at most $2 \cdot m$, and so it is in $O(m)$. Likewise, the number of links $m_t = |\{uv, (t, uv) \in E\}|$ at time $t$, for any $t$, as well as the number of links $|\{uv, \exists t, (t, uv) \in E\}|$ in the induced graph, also are in $O(m)$. Then, the complexity of all algorithms presented in this paper is clearly polynomial in $n$ and $m$, which makes Algorithm 5 polynomial itself.

We display in Figure 4 the results obtained in the case of Figure 1, where we computed the betweenness of many temporal nodes in a few seconds. We provide the implementation at [16].

Figure 4: Results of betweenness computations on the example of Figure 1. We computed the betweenness of $(t, v)$ for all $v$ and for regularly spaced times $t$ between $\alpha$ and $\omega$. The obtained value is displayed at $(t, v)$ as a black rectangle whose width is the spacing between two consecutive times and whose height is proportional to the betweenness of $(t, v)$. Dotted lines represent betweenness values equal to $0$.

Betweenness computations are first related to path computations. Temporal paths already received much attention, in particular optimal path computations according to several criteria (like length, duration, and/or arrival time), see for instance [5, 33, 34]. However, most of these works are limited to discrete time and instantaneous links; only few consider continuous time and links with duration [31, 35, 24]. Then, the focus is on finding optimal paths or computing distances and latencies [24], not counting them as we do here. The authors of [1] notice that the number of foremost paths (temporal paths with minimal arrival time) may be exponential.
The problem that we consider here is quite different because we handle continuous time and links with duration. This leads to the concept of finite volumes of uncountable path sets, which never appeared in previous literature, to our knowledge.

The graph betweenness itself also has been studied in temporal settings. A first line of study focuses on updating betweenness values upon link arrival or departure, see for instance [3, 11]. This is quite different from our work: the considered paths are classical (static) graph paths, and the considered betweenness is the classical one, at each time instant.

Several works consider temporal betweenness extensions that rely on various kinds of optimal (fastest, shortest, foremost, etc.) paths. Most have a node-centric view: they define a value for each node, not for each temporal node, see for instance [13, 27, 21, 32, 15]. Others define a value for each temporal node, like in our case. For instance, [26] proposes coverage centrality of $(t, v)$, defined as the fraction of pairs of (non-temporal) nodes for which there exists a fastest path involving $(t, v)$. Buß et al. [6] consider instantaneous links and define betweenness centralities for various types of optimal temporal paths. The authors of [27, 12] define a betweenness value for each temporal node, based on foremost paths or other optimal paths. The algorithm in [12] starts by identifying time instants for which foremost path trees are stable, which is related to our latency pairs. In [28], the authors combine the length and duration of paths using a tunable parameter, and focus on instantaneous links.

All these works assume discrete time, which implies finite sets of shortest paths. Instead, we consider continuous time, leading to uncountable sets of paths, with finite volume. In addition, these works keep a partly node-centric point of view by considering paths between nodes; we push the integration of temporal aspects further by considering paths between temporal nodes. This makes an important difference, since the node-centric view misses locally-optimal paths: they only count paths with a given duration or length between pairs of nodes (for any starting and arrival times), whereas our approach combines a variety of locally shortest fastest paths, with different durations and lengths. This raises different algorithmic challenges, like the computation of latency lists and the selection of appropriate contributing latency pairs.

Closer to our work, [1] and [22] consider optimal paths within time slices, thus obtaining a betweenness value for each node for each time slice. Again, they only consider discrete time, and only a limited number of source and target temporal nodes.

Finally, the generalized betweenness that we consider in this paper, by dealing with continuous time, links with or without duration, as well as paths between all pairs of temporal nodes, raises original algorithmic questions that are not present in previous literature.
We presented the first algorithms to compute betweenness centrality of temporal nodes in link streams. To obtain these algorithms, we identified and addressed several original challenges, like the definition and computation of volumes of infinite sets of paths, the computation of all latency pairs from any node to all others, or the transformation of continuous-time integrals into discrete sums over finite numbers of time intervals. Each of these building blocks has its own interest, in particular the computation of shortest path volumes from a given temporal node. The complexity of the obtained algorithms is polynomial in time and space, and we provide an implementation in Python [16].

Our algorithm leaves room for complexity improvement. In particular, it seems promising to explore extensions to link streams of approaches like Brandes' for betweenness on graphs [4]. Another important direction is to design algorithms to compute the betweenness of all temporal nodes rather than just one: iterating our algorithm over many temporal nodes leads to much redundancy. However, keep in mind that there is an infinite number of temporal nodes; one may then try to infer the betweenness of any of them from the betweenness of a finite number of them, for instance each node at each event time. This seems non-trivial, though, and remains an open question.

Going further, one may try to design approximate algorithms. Indeed, the best known time complexity of betweenness computations in graphs is $O(nm)$ [4] and it cannot be lower in link streams, since graphs are special cases [18]. This is prohibitive in many practical cases, leading to much work on approximate computations, which typically compute shortest paths from some nodes only [23, 29]. Such approaches are very relevant in link streams too, where the contribution of only a few node pairs may give reasonably accurate approximations, at a much lower cost than exact computations. This remains to be explored, though.

An even more challenging direction is to embrace the streaming nature of link streams, and design on-line and/or streaming algorithms for betweenness. Such algorithms do not store the data in memory; they compute results on-the-fly and output them as soon as they are available. They would be of high theoretical and practical interest, but they raise many challenges.

Another interesting family of perspectives consists in extending or restricting the considered input. In particular, one may consider stream graphs instead of link streams: in stream graphs, nodes are not always present, leading to more subtle path, distance, and latency concepts [18]. We considered here streams with link (and node) presence times equal to unions of disjoint closed intervals (including singletons); another extension would be to consider more general cases, like for instance unions of disjoint closed or open intervals. Also, weighted and/or directed stream graphs and link streams [17] lead to more complex concepts of shortest fastest paths, and our definitions of volumes may be extended to these cases. Conversely, one may consider more specific situations, like discrete time streams, or link streams with instantaneous links only. Such cases often appear in practice, and it may be possible to design more efficient algorithms for them.

Extending our algorithms to variants of the betweenness concept itself also is an interesting perspective. One may for instance consider betweenness of links rather than nodes, or consider paths of other kinds than shortest fastest ones, e.g. foremost ones [18].

Finally, this paper opens the perspective of practical uses of betweenness in link streams, since until now only the definition was available. It is now possible to explore how betweenness is distributed in (small scale) real-world cases, and gain insight from this. It may also be used to extend important graph algorithms to link streams, like the computation of communities by iteratively removing temporal nodes of highest betweenness, in a way similar to [10] that iteratively removes links of highest betweenness.

Acknowledgements. This work is funded in part by the ANR (French National Agency of Research) under the Limass project (ANR-19-CE23-0010) and the FiT LabCom grant.
References [1] Amir Afrasiabi Rad, Paola Flocchini, and Joanne Gaudet. Computation and analysisof temporal betweenness in a knowledge mobilization network.
Computational Social etworks , 4(1), dec 2017.[2] Vladimir Batagelj and Selena Praprotnik. An algebraic approach to temporal networkanalysis based on temporal quantities. Social Netw. Analys. Mining , 6(1):28:1–28:22,2016.[3] Elisabetta Bergamini, Henning Meyerhenke, Mark Ortmann, and Arie Slobbe. Fasterbetweenness centrality updates in evolving networks. In
Proceedings of the Symposiumon Experimental Algorithms (SEA) , 2017.[4] Ulrik Brandes. A Faster Algorithm for Betweenness Centrality. In
Journal of Mathe-matical Sociology , volume 25, pages 163–177, 2001.[5] B.-M. Bui-Xuan, A. Ferreira, and A. Jarry. Computing shortest, fastest, and foremostjourneys in dynamic networks.
International Journal of Foundations of ComputerScience , 14(2):267–285, nov 2003.[6] Sebastian Buß, Hendrik Molter, Rolf Niedermeier, and Maciej Rymar. Algorithmicaspects of temporal betweenness. In
Proceedings of the 26th SIGKDD Conference onKnowledge Discovery and Data Mining (KDD) , 2020.[7] Carter T. Butts. A relational event framework for social action.
Sociological Method-ology , 38(1):155–200, 2008.[8] Arnaud Casteigts, Paola Flocchini, Walter Quattrociocchi, and Nicola Santoro. Time-varying graphs and dynamic networks.
IJPEDS , 27(5):387–408, 2012.[9] Linton C. Freeman. A set of measures of centrality based on betweenness.
Sociometry ,40(1):35–41, 1977.[10] M. Girvan and M. E. J. Newman. Community structure in social and biologicalnetworks.
Proceedings of the National Academy of Sciences , 99(12):7821–7826, June2002.[11] O. Green, R. McColl, and D. A. Bader. A fast algorithm for streaming betweennesscentrality. In , pages 11–20, 2012.[12] Venkata M.V. Gunturi, Shashi Shekhar, Kenneth Joseph, and Kathleen M. Carley.Scalable computational techniques for centrality metrics on temporally detailed socialnetwork.
Machine Learning , 106(8):1133–1169, aug 2017.[13] Habiba, Chayant Tantipathananandh, and Tanya Berger-Wolf. Betweenness Central-ity Measure in Dynamic Networks. Technical report, 2011.[14] Petter Holme and Jari Saramäki. Temporal networks.
Physics Reports , 519(3):97 –125, 2012. Temporal Networks. 2215] Hyoungshick Kim and Ross Anderson. Temporal node centrality in complex networks.
Physical Review E - Statistical, Nonlinear, and Soft Matter Physics , 85(2):026107, feb2012.[16] Matthieu Latapy, Clémence Magnien, and Frédéric Simard.Code for computing betweenness centrality in link streams. .[17] Matthieu Latapy, Clémence Magnien, and Tiphaine Viard.
Weighted, Bipartite, orDirected Stream Graphs for the Modeling of Temporal Networks , pages 49–64. SpringerInternational Publishing, Cham, 2019.[18] Matthieu Latapy, Tiphaine Viard, and Clémence Magnien. Stream graphs and linkstreams for the modeling of interactions over time.
Soc. Netw. Anal. Min. , 8(1):61:1–61:29, 2018.[19] Vito Latora, Vincenzo Nicosia, and Giovanni Russo.
Complex Networks: Principles,Methods and Applications . Cambridge University Press, 2017.[20] Naoki Masuda and Renaud Lambiotte.
A Guide to Temporal Networks , volume 4 of
Series on Complexity Science . WORLD SCIENTIFIC (EUROPE), sep 2016.[21] Vincenzo Nicosia, John Tang, Cecilia Mascolo, Mirco Musolesi, Giovanni Russo, andVito Latora. Graph Metrics for Temporal Networks. In Petter Holme Saramäki andJari, editors,
Temporal Networks , pages 15–40. jun 2013.[22] Fabiola S.F. Pereira, Sandra de Amo, and Joao Gama. Evolving Centralities in Tem-poral Graphs: A Twitter Network Analysis. In
IEEE International Conference on Mobile Data Management (MDM), pages 43–48. Institute of Electrical and Electronics Engineers (IEEE), Aug. 2016.
[23] Matteo Riondato and Evgenios M. Kornaropoulos. Fast approximation of betweenness centrality through sampling. In
Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM ’14, pages 413–422, New York, NY, USA, 2014. Association for Computing Machinery.
[24] Frédéric Simard. On computing distances and latencies in link streams. In Francesca Spezzano, Wei Chen, and Xiaokui Xiao, editors,
ASONAM ’19: International Conference on Advances in Social Networks Analysis and Mining, Vancouver, British Columbia, Canada, 27-30 August, 2019, pages 394–397. ACM, 2019.
[25] Christoph Stadtfeld and Per Block. Interactions, actors, and time: Dynamic network actor models for relational events.
Sociological Science, 2017.
[26] Taro Takaguchi, Yosuke Yano, and Yuichi Yoshida. Coverage centralities for temporal networks.
Physics of Condensed Matter, 89(2), 2015.
[27] John Tang, Mirco Musolesi, Cecilia Mascolo, Vito Latora, and Vincenzo Nicosia. Analysing information flows and key mediators through temporal centrality metrics. In
Proceedings of the 3rd Workshop on Social Network Systems, SNS ’10, pages 1–6, New York, New York, USA, 2010. ACM Press.
[28] Ioanna Tsalouchidou, Ricardo Baeza-Yates, Francesco Bonchi, Kewen Liao, and Timos Sellis. Temporal betweenness centrality in dynamic graphs.
International Journal of Data Science and Analytics, 2019. To appear. https://doi.org/10.1007/s41060-019-00189-x.
[29] Alexander van der Grinten and Henning Meyerhenke. Scaling betweenness approximation to billions of edges by MPI-based adaptive sampling. In , pages 527–535. IEEE, 2020.
[30] Stanley Wasserman and Katherine Faust.
Social network analysis: Methods and applications, volume 8. Cambridge University Press, 1994.
[31] John Whitbeck, Marcelo Dias de Amorim, Vania Conan, and Jean-Loup Guillaume. Temporal reachability graphs. In
Proceedings of the 18th annual international conference on Mobile computing and networking - Mobicom ’12, page 377. ACM Press, 2012.
[32] Matthew J. Williams and Mirco Musolesi. Spatio-temporal networks: reachability, centrality and robustness.
Royal Society Open Science, 3(6):160196, June 2016.
[33] Huanhuan Wu, James Cheng, Silu Huang, Yiping Ke, Yi Lu, and Yanyan Xu. Path problems in temporal graphs. In
Proceedings of the VLDB Endowment, pages 721–732, 2014.
[34] Huanhuan Wu, Yuzhen Huang, James Cheng, Jinfeng Li, and Yiping Ke. Reachability and time-based path queries in temporal graphs. In , pages 145–156. Institute of Electrical and Electronics Engineers Inc., June 2016.
[35] Ye Yuan, Xiang Lian, Guoren Wang, Yuliang Ma, and Yishu Wang. Constrained shortest path query in a large time-dependent graph.
Proceedings of the VLDB Endowment, 12(10):1058–1070, June 2019.
[36] Katharina A. Zweig.