Stability and Continuity of Centrality Measures in Weighted Graphs
Santiago Segarra and Alejandro Ribeiro
Abstract: This paper presents a formal definition of stability for node centrality measures in weighted graphs. It is shown that the commonly used measures of degree, closeness and eigenvector centrality are stable whereas betweenness centrality is not. An alternative definition of the latter that preserves the same centrality notion while satisfying the stability criteria is introduced. Continuity is presented as a less stringent alternative to stability. Betweenness centrality is shown to be not only unstable but discontinuous. Numerical experiments in synthetic random networks and real-world data show that, in practice, stability and continuity imply different levels of robustness in the presence of noisy data. In particular, the stable betweenness centrality is shown to exhibit resilience against noise that is absent in the discontinuous and unstable standard betweenness centrality, while preserving a similar notion of centrality.
I. INTRODUCTION
In any graph or network, the topology determines an influence structure among the nodes or agents. Peripheral nodes have limited impact on the dynamics of the network whereas central nodes have a major effect on the behavior of the whole graph. Identifying the most important nodes in a network helps in explaining the network's dynamics, e.g. the distribution of power in exchange networks [2] or migration in biological networks [3], as well as in designing optimal ways to externally influence the network, e.g. attack vulnerability of networks [4]. Node centrality measures are tools designed to identify such important agents. However, node importance is a rather vague concept and can be interpreted in various ways, giving rise to multiple coexisting centrality measures, the most common being degree [5], [6], closeness [7], [8], eigenvector [9], and betweenness [10] centrality. In degree centrality, the importance or centrality of a node is measured by the number of nodes it can immediately influence, i.e., its neighborhood. In closeness centrality, importance is measured in terms of how fast information can travel from a given node to every other node in the network. In eigenvector centrality, a refinement of degree centrality, the importance of a node is computed as a function of the importance of its neighbors. Finally, in betweenness centrality, the centrality of a node is given by the frequency with which this node belongs to the shortest path between two other nodes in the network.

The ability of a centrality measure to be robust to noise in the network data is of practical importance. In the past decade, stability has been used as a parameter to compare the performance of different centrality measures [11]–[13]. In these papers, an empirical approach was followed by comparing stability indicators measured in both random and real-world networks for different centrality measures. However, no formal theory was developed explaining the different behaviors among measures. Our first contribution is a formal definition of stability and continuity of centrality measures. We also show that all frequently used measures are stable and continuous with the exception of betweenness centrality. In addition, we propose an alternative definition of betweenness centrality which is stable. Finally, through numerical experiments in synthetic and real-world networks, we demonstrate that stability and continuity are different and important properties, and show that the alternative definition of betweenness centrality behaves better than the standard betweenness centrality while preserving a similar notion of centrality.

Stability is formally defined in Section III. In order to build such a definition, we need to rely on a metric on the space of weighted graphs with a common node and edge set. In Sections III-A to III-D, we analyze the stability of the most frequently used centrality measures and in Section IV we propose an alternative definition of betweenness centrality that guarantees stability while maintaining the same concept of node centrality. The concept of continuity as a milder requirement for robustness is introduced in Section V. In Section VI, we illustrate how our formal definitions of stability and continuity are correlated with practical robustness indicators by analyzing the behavior of all the common centrality measures as well as the proposed stable betweenness centrality in random networks and two real-world networks: the network of air traffic between airports in the United States and the network of interactions between sectors of the United States economy.

Work supported by NSF CCF-1217963. The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania, 200 South 33rd Street, Philadelphia, PA 19104. Email: {ssegarra, aribeiro}@seas.upenn.edu. Part of the results in this paper appeared in [1].

II. PRELIMINARIES
In the present paper we consider weighted and directed graphs or networks. Formally, we define a graph $G = (V, E, W)$ as a triplet formed by a finite set of $n$ nodes or vertices $V$, a set of directed edges $E \subset V \times V$ where $(x, y) \in E$ represents an edge from $x \in V$ to $y \in V$, and a set of positive weights $W : E \to \mathbb{R}_{++}$ defined on each edge. The weights can be associated to similarities between nodes, i.e. the higher the weight the more similar the nodes are, or dissimilarities, depending on the application. The graphs considered here do not contain self-loops, i.e., $(x, x) \notin E$ for all $x \in V$. For any given sets $V$ and $E$, denote by $\mathcal{G}_{(V,E)}$ the space of all graphs with $V$ as node set and $E$ as edge set. This implies that two graphs $G, H \in \mathcal{G}_{(V,E)}$ can only differ in their weights.

An alternative representation of a graph is through its adjacency matrix $A \in \mathbb{R}^{n \times n}$. If there exists an edge from node $i$ to node $j$, then $A_{ij}$ takes the value of the corresponding weight. Otherwise, $A_{ij}$ is null. Requiring graphs not to contain self-loops is equivalent to requiring the diagonal of $A$ to consist of all zeros. Observe that if two graphs $G, H \in \mathcal{G}_{(V,E)}$, then the null entries of the corresponding adjacency matrices must coincide.

In the definition of centrality measures, the concepts of path and path length are important. Given a graph $(V, E, W)$ and $x, x' \in V$, a path $P(x, x')$ is an ordered sequence of nodes in $V$,

$$P(x, x') = [x = x_0, x_1, \ldots, x_{l-1}, x_l = x'], \tag{1}$$

which starts at $x$ and finishes at $x'$ and $e_i = (x_i, x_{i+1}) \in E$ for $i = 0, \ldots, l-1$. We say that $P(x, x')$ links or connects $x$ to $x'$. The links $e_i$ of a path are the edges connecting consecutive nodes of the path in the direction given by the path. Specifically when $W$ is associated to dissimilarities, we define the length of a given path $P(x, x') = [x = x_0, \ldots, x_l = x']$ as the sum of the weights $\sum_{i=0}^{l-1} W(e_i)$ encountered when traversing its links in order. Given the graph $G = (V, E, W)$, we define the shortest path function $s_G : V \times V \to \mathbb{R}_+$ where the shortest path length $s_G(x, x')$ between nodes $x, x' \in V$ is defined as

$$s_G(x, x') := \min_{P(x, x')} \sum_{i=0}^{l-1} W(x_i, x_{i+1}). \tag{2}$$

Whenever there is no possible path linking $x$ to $x'$ in a graph $G$, we say that $s_G(x, x') = \infty$.

III. NODE CENTRALITY AND STABILITY
Node centrality is a measure of the importance of a node within a graph. This importance is based on the location of the node within the graph and not on the intrinsic nature of this node. More precisely, given a graph $(V, E, W)$, a centrality measure $C : V \to \mathbb{R}_+$ assigns a nonnegative centrality value to every node such that the higher the value the more central the node is. The centrality ranking imposed by $C$ on the node set $V$ is in general more relevant than the absolute centrality values. Very often, this centrality ranking relies on an underlying characteristic of the nodes. E.g., airports which are hubs for some airline have high centrality in an air transportation network. In this way, centrality detects fundamental roles played by nodes within the graph. Ideally, this detection should be invariant to small perturbations in the edge weights.

To formalize this notion of robustness against perturbations, we define the metric $d_{(V,E)} : \mathcal{G}_{(V,E)} \times \mathcal{G}_{(V,E)} \to \mathbb{R}_+$ on the space of graphs $\mathcal{G}_{(V,E)}$ containing $V$ as node set and $E$ as edge set, as follows

$$d_{(V,E)}(G, H) := \sum_{e \in E} |W(e) - W'(e)| = \sum_{i,j} |A_{ij} - A'_{ij}|, \tag{3}$$

where $G = (V, E, W)$ and $H = (V, E, W')$ have adjacency matrices $A$ and $A'$, respectively. To see that $d_{(V,E)}$ is a well-defined metric, notice that it computes the $\ell_1$ distance between two vectors obtained by stacking the values in $W$ and $W'$. The metric $d_{(V,E)}$ enables the formal definition of stability presented next.

Definition 1
A centrality measure $C$ is stable if, for every vertex set $V$, edge set $E$ and any two graphs $G, H \in \mathcal{G}_{(V,E)}$,

$$\left| C^G(x) - C^H(x) \right| \le K_G \, d_{(V,E)}(G, H), \tag{4}$$

for every node $x \in V$, where $K_G$ is a constant for every graph $G$, $C^G(x)$ is the centrality value of node $x$ in graph $G$ and similarly for $H$.

The above definition states that a centrality measure is stable if the difference in centrality values for a given node in two different graphs is bounded by a constant $K_G$ times the distance between these graphs. The constant $K_G$ only depends on graph $G$ and must be valid for every graph $H$ to which $G$ is being compared. Moreover, the inclusion of $K_G$ in (4) ensures that the stability of a centrality measure does not depend on the appearance of a normalization term in the definition of the measure. In particular, if graph $H$ is a perturbed version of $G$, any stable centrality measure ensures that the change in centrality due to this perturbation is bounded for every node. This generates a robust measure in the presence of noise as we illustrate through examples in Section VI. In the following sections we analyze the stability of the most frequently used centrality measures.

A. Degree centrality
Degree centrality is a local measure of the importance of a node within a graph. The degree centrality measure $C_D$ of a node $x$ in an undirected weighted graph $(V, E, W)$ is given by the sum of the weights of the edges incident to node $x$, that is,

$$C_D(x) := \sum_{x' \,|\, (x, x') \in E} W(x, x'). \tag{5}$$

For directed graphs, degree centrality is usually unfolded into two different measures: in-degree and out-degree centrality. The out-degree centrality measure $C_{OD}$ is computed as in (5), whereas the in-degree centrality $C_{ID}$ is computed as follows

$$C_{ID}(x) := \sum_{x' \,|\, (x', x) \in E} W(x', x). \tag{6}$$

The degree centrality measure is applied to graphs where the weights in $W$ represent similarities between the nodes. In this way, a high degree centrality value of a given node means that this node has a large number of neighbors and is closely connected to them. Although the degree centrality measure has a number of limitations related to its locality [14], it is stable as we state next.

Proposition 1
The degree $C_D$, out-degree $C_{OD}$ and in-degree $C_{ID}$ centrality measures in (5) and (6) are stable as defined in Definition 1 with $K_G = 1$.

Proof: Consider two arbitrary graphs in $\mathcal{G}_{(V,E)}$, $G = (V, E, W)$ and $H = (V, E, W')$. From the definition of degree centrality (5), we obtain

$$|C_D^G(x) - C_D^H(x)| = \left| \sum_{x' \,|\, (x, x') \in E} W(x, x') - \sum_{x' \,|\, (x, x') \in E} W'(x, x') \right|. \tag{7}$$

Consolidating the summations in (7) and applying the triangular inequality we obtain that

$$|C_D^G(x) - C_D^H(x)| \le \sum_{x' \,|\, (x, x') \in E} |W(x, x') - W'(x, x')|. \tag{8}$$

By summing the right hand side of (8) over all edges instead of just a subset of them we obtain

$$|C_D^G(x) - C_D^H(x)| \le \sum_{e \in E} |W(e) - W'(e)|. \tag{9}$$

The right hand side of (9) is exactly $d_{(V,E)}(G, H)$ [cf. (3)], showing inequality (4) for $K_G = 1$. When considering directed graphs, the proof can be replicated to show that the in-degree $C_{ID}$ and out-degree $C_{OD}$ centrality measures are stable. ■

A consequence of the stability property of degree centrality shown in Proposition 1 is the limited effect that a perturbation in the weights of a graph has on the centrality values. In Section VI, we illustrate this in both synthetic and real-world networks.
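As an informal check of Proposition 1, the following minimal sketch evaluates the metric in (3) and the degree measures in (5) and (6) on a small directed graph. The edge-dictionary representation, the function names, and the example weights are assumptions made for illustration; they are not taken from the paper.

```python
def graph_distance(W, W_prime):
    """Metric (3): sum of absolute weight differences over the common edge set."""
    assert W.keys() == W_prime.keys(), "both graphs must share the edge set E"
    return sum(abs(W[e] - W_prime[e]) for e in W)


def out_degree(W, x):
    """Out-degree centrality, computed as in (5): total weight of edges leaving x."""
    return sum(w for (u, _), w in W.items() if u == x)


def in_degree(W, x):
    """In-degree centrality (6): total weight of edges entering x."""
    return sum(w for (_, v), w in W.items() if v == x)


# Hypothetical graph G and a perturbed copy H on the same edge set.
G = {("a", "b"): 2.0, ("b", "c"): 1.0, ("c", "a"): 3.0, ("a", "c"): 0.5}
H = {("a", "b"): 2.3, ("b", "c"): 0.9, ("c", "a"): 3.0, ("a", "c"): 0.7}

d = graph_distance(G, H)
for x in ("a", "b", "c"):
    # Inequality (4) with K_G = 1; the small slack absorbs floating-point rounding.
    assert abs(out_degree(G, x) - out_degree(H, x)) <= d + 1e-12
    assert abs(in_degree(G, x) - in_degree(H, x)) <= d + 1e-12
```

The bound holds for any choice of weights, since each degree value only sums a subset of the edge differences that make up the distance.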
B. Closeness centrality
Closeness is a relevant centrality measure when we are interested in how fast information can spread from one node to every other node in a network. The most commonly used definition of closeness centrality is the one in [7] where the centrality $C_C(x)$ of a node $x$ in a graph $G = (V, E, W)$ is defined as the inverse of the sum of the shortest path lengths from this node to every other node in the graph, i.e.

$$C_C(x) := \left( \sum_{x' \in V} s_G(x, x') \right)^{-1}. \tag{10}$$

For (10) to make sense, the weights in $W$ must represent dissimilarities between the nodes. Moreover, in general, we consider strongly connected graphs so that every shortest path has finite length. This implies that (10) is well-defined. However, as done in [15], we will work with the decentrality version $\bar{C}_C$, where the lower the value the more central the node, defined as

$$\bar{C}_C(x) := \sum_{x' \in V} s_G(x, x'). \tag{11}$$

Since we are ultimately interested in the centrality ranking being impervious to perturbations, it is immediate that the ranking stability of $C_C$ and of $\bar{C}_C$ are equivalent since they are related by a strictly decreasing function. In the following proposition, we show stability of closeness decentrality.

Proposition 2
The closeness decentrality measure $\bar{C}_C$ in (11) is stable as defined in Definition 1 with $K_G = n$.

In proving Proposition 2, we use the following lemma which upper bounds the difference between a shortest path in two different graphs by the distance between these graphs.
Lemma 1
Given two graphs $G = (V, E, W)$ and $H = (V, E, W')$ then, for every pair of nodes $x, x' \in V$ such that $x$ is connected to $x'$,

$$|s_G(x, x') - s_H(x, x')| \le d_{(V,E)}(G, H), \tag{12}$$

where $s_G$ and $s_H$ are the shortest path lengths defined in (2).

Proof: Consider two arbitrary graphs $G = (V, E, W)$ and $H = (V, E, W')$ and two connected nodes $x, x' \in V$. Suppose that one shortest path from $x$ to $x'$ in $G$ is given by the path $P(x, x') = [x = x_0, x_1, \ldots, x_l = x']$ and one shortest path from $x$ to $x'$ in $H$ is given by $P'(x, x') = [x = x'_0, x'_1, \ldots, x'_{l'} = x']$. Then, by the definition of shortest path in (2), we have that

$$|s_G(x, x') - s_H(x, x')| = \left| \sum_{i=0}^{l-1} W(x_i, x_{i+1}) - \sum_{i=0}^{l'-1} W'(x'_i, x'_{i+1}) \right|. \tag{13}$$

Without loss of generality, assume that the graph $G$ has a larger shortest path, i.e., $s_G(x, x') \ge s_H(x, x')$. By this assumption, the difference between the shortest path lengths in (13) is nonnegative even without the absolute value. Hence, if instead of considering the shortest path $P(x, x')$ in $G$ we consider a (possibly) different path such as $P'(x, x')$, we can assure that the difference between these two lengths is not going to decrease. Then, it follows that

$$|s_G(x, x') - s_H(x, x')| \le \left| \sum_{i=0}^{l'-1} W(x'_i, x'_{i+1}) - W'(x'_i, x'_{i+1}) \right|. \tag{14}$$

A direct application of the triangular inequality yields

$$|s_G(x, x') - s_H(x, x')| \le \sum_{i=0}^{l'-1} \left| W(x'_i, x'_{i+1}) - W'(x'_i, x'_{i+1}) \right|. \tag{15}$$

Finally, if on the right hand side of (15) instead of summing over the links in $P'(x, x')$ we sum over all edges in the set $E$, we obtain another upper bound given by

$$|s_G(x, x') - s_H(x, x')| \le \sum_{e \in E} |W(e) - W'(e)|. \tag{16}$$

The right hand side of (16) is exactly $d_{(V,E)}(G, H)$ [cf. (3)], concluding the proof. ■

We can now leverage Lemma 1 to show the stability of closeness decentrality.
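Before turning to that proof, the bound in Lemma 1 and the constant $K_G = n$ claimed in Proposition 2 can be checked numerically on a toy example. The sketch below is only illustrative: it assumes a dictionary of directed edge weights, uses the Floyd-Warshall recursion to evaluate $s_G$ in (2), and the four-node graph and its perturbation are made up for this purpose.

```python
import math


def shortest_paths(nodes, W):
    """All-pairs shortest path lengths s_G of (2) via Floyd-Warshall (inf if unreachable)."""
    s = {(u, v): (0.0 if u == v else W.get((u, v), math.inf))
         for u in nodes for v in nodes}
    for k in nodes:
        for u in nodes:
            for v in nodes:
                if s[u, k] + s[k, v] < s[u, v]:
                    s[u, v] = s[u, k] + s[k, v]
    return s


def closeness_decentrality(nodes, s, x):
    """Closeness decentrality (11): sum of shortest path lengths from x."""
    return sum(s[x, v] for v in nodes)


def graph_distance(W, W_prime):
    """Metric (3) between two graphs on the same edge set."""
    return sum(abs(W[e] - W_prime[e]) for e in W)


# Hypothetical strongly connected graph G and a perturbed copy H.
nodes = ["a", "b", "c", "d"]
G = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 1.0,
     ("d", "a"): 1.0, ("a", "c"): 2.5, ("c", "a"): 1.5}
H = dict(G)
H.update({("a", "b"): 1.4, ("b", "c"): 1.3, ("a", "c"): 2.4})

s_G, s_H = shortest_paths(nodes, G), shortest_paths(nodes, H)
d = graph_distance(G, H)
n = len(nodes)

for u in nodes:
    for v in nodes:
        # Lemma 1, inequality (12); the slack absorbs floating-point rounding.
        assert abs(s_G[u, v] - s_H[u, v]) <= d + 1e-12
    # Proposition 2: inequality (4) with K_G = n.
    gap = abs(closeness_decentrality(nodes, s_G, u) - closeness_decentrality(nodes, s_H, u))
    assert gap <= n * d + 1e-12
```

Note that the shortest path from a to c switches from the two-hop route in G to the direct edge in H, yet the change in length stays below the distance between the graphs, as the lemma requires.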
Proof of Proposition 2:
Given two strongly connected graphs $G = (V, E, W)$ and $H = (V, E, W')$, from the definition of $\bar{C}_C$ in (11) we have that

$$|\bar{C}_C^G(x) - \bar{C}_C^H(x)| = \left| \sum_{x' \in V} s_G(x, x') - \sum_{x' \in V} s_H(x, x') \right|. \tag{17}$$

Consolidating the summations in (17) and applying the triangular inequality we obtain that

$$|\bar{C}_C^G(x) - \bar{C}_C^H(x)| \le \sum_{x' \in V} |s_G(x, x') - s_H(x, x')|. \tag{18}$$

Using the result in Lemma 1 we conclude that

$$|\bar{C}_C^G(x) - \bar{C}_C^H(x)| \le \sum_{x' \in V} d_{(V,E)}(G, H) = n \, d_{(V,E)}(G, H), \tag{19}$$

showing inequality (4) for $K_G = n$. ■

Some alternative definitions of closeness centrality exist [16], [17] including that in [8] where the measure in (11) is normalized by $n - 1$. However, since normalization constants can be absorbed into $K_G$, stability does not depend on the appearance of normalization terms.

Remark 1
If we adopt the convention that $\infty - \infty = 0$, then the result in Lemma 1 is true even when nodes $x$ and $x'$ are not connected. This, in turn, implies that Proposition 2 can be shown for general graphs and the requirement of strong connectivity can be dropped.

C. Betweenness centrality
Centrality can be interpreted as the possibility of a node to control the communication or the optimal flow within a graph. Betweenness centrality takes this position by giving higher centrality values to nodes that fall within the shortest path of many pairs of nodes. Formally, given a graph $G = (V, E, W)$ and three arbitrary nodes $x, x', x'' \in V$, denote by $\sigma_{x'x''}$ the number of shortest paths from $x'$ to $x''$, i.e. the number of paths $P(x', x'')$ of length $s_G(x', x'')$, and by $\sigma_{x'x''}(x)$ the number of these shortest paths that go through node $x$. For convenience, we define $\sigma_{xx} = 1$ for all $x \in V$. Notice that since $G$ might be directed, we can have that $\sigma_{x'x''} \neq \sigma_{x''x'}$ for some $x', x'' \in V$. The betweenness centrality $C_B(x)$ for any given node $x \in V$ is defined as [10]

$$C_B(x) := \sum_{\substack{x', x'' \in V \\ x' \neq x \neq x''}} \frac{\sigma_{x'x''}(x)}{\sigma_{x'x''}}. \tag{20}$$

In (20), we compute the betweenness centrality value of a node $x \in V$ by sequentially looking at the shortest paths between any two nodes distinct from $x$ and summing the proportion of shortest paths that contain node $x$. As was the case for closeness centrality, the weights in $W$ should denote dissimilarities for $C_B$ to be a reasonable measure of centrality. Sometimes (20) is normalized by the number of pairs in the network or the maximum centrality value achievable [10], [18], [19] such that $C_B(x)$ takes values in the interval $[0, 1]$. However, we are interested in comparing centrality values between different nodes within a network and these comparisons are invariant to any normalization. Moreover, the stability property does not depend on normalizing constants since these can get absorbed by $K_G$ [cf. (4)]. Hence, we omit the normalizing constant in definition (20).

Despite its extensive use in the study of both technological [20] and social [21] networks, the betweenness centrality measure is not stable as we show next.

Proposition 3
The betweenness centrality measure $C_B$ in (20) is not stable in the sense of Definition 1.

Proof: Consider the undirected graphs $G = (V, E, W)$ and $H = (V, E, W')$ depicted in Fig. 1. Since the sum in (3) is done over the set of directed edges, it is immediate that $d_{(V,E)}(G, H) = 4\epsilon$. For any $\epsilon > 0$, according to (20) we have that $C_B^G(x) = 9$ for the node $x$ under study, since this node is part of one of the two shortest paths from any node in one three-node group of Fig. 1 to any node in the other three-node group and vice versa, where the other path goes through a second intermediate node. However, for that same $\epsilon$, $C_B^H(x) = 0$ since $x$ is not an intermediate node in any shortest path in graph $H$. This implies that

$$\frac{|C_B^G(x) - C_B^H(x)|}{d_{(V,E)}(G, H)} = \frac{9}{4\epsilon}. \tag{21}$$

Note that constant $K_G$ in (4) cannot depend on $\epsilon$ since this is not a parameter of the graph $G$. Thus, for any candidate constant $K_G$ there exists a small enough $\epsilon > 0$ such that the above ratio is greater than the proposed $K_G$. Thus, such constant cannot exist and $C_B$ is not stable. ■

The instability of the betweenness centrality measure entails an undesirable behavior when applied to synthetic and real-world networks as shown in Section VI. Also, the result in Proposition 3 motivates an alternative definition of betweenness centrality presented in Section IV.
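The mechanism behind Proposition 3 can be reproduced on a graph much smaller than the one in Fig. 1. The sketch below is an illustration under stated assumptions, not the paper's example: it uses a hypothetical four-node "diamond" in which two equal-length paths join s and t, computes (20) by counting shortest paths, and relies on exact `Fraction` arithmetic so that ties between path lengths are detected without floating-point tolerances.

```python
from fractions import Fraction
from itertools import permutations
import math


def shortest_paths(nodes, W):
    """All-pairs shortest path lengths via Floyd-Warshall."""
    s = {(u, v): (0 if u == v else W.get((u, v), math.inf))
         for u in nodes for v in nodes}
    for k in nodes:
        for u in nodes:
            for v in nodes:
                if s[u, k] + s[k, v] < s[u, v]:
                    s[u, v] = s[u, k] + s[k, v]
    return s


def count_shortest_paths(nodes, W, s):
    """sigma[u, v]: number of shortest paths from u to v (positive weights assumed)."""
    sigma = {}
    for u in nodes:
        sigma[u, u] = 1
        # Visit targets by increasing distance so every predecessor is already counted.
        for v in sorted((v for v in nodes if v != u), key=lambda v: s[u, v]):
            if s[u, v] == math.inf:
                sigma[u, v] = 0
                continue
            sigma[u, v] = sum(sigma[u, w] for w in nodes
                              if (w, v) in W and s[u, w] + W[w, v] == s[u, v])
    return sigma


def betweenness(nodes, W, x):
    """Betweenness centrality (20); pairs with equal endpoints contribute zero."""
    s = shortest_paths(nodes, W)
    sigma = count_shortest_paths(nodes, W, s)
    total = Fraction(0)
    for u, v in permutations([n for n in nodes if n != x], 2):
        if sigma[u, v] and s[u, x] + s[x, v] == s[u, v]:
            total += Fraction(sigma[u, x] * sigma[x, v], sigma[u, v])
    return total


def undirected(weights):
    """Expand undirected edge weights into a symmetric set of directed edges."""
    W = {}
    for (u, v), w in weights.items():
        W[u, v] = W[v, u] = w
    return W


nodes = ["s", "x", "y", "t"]
one = Fraction(1)
G = undirected({("s", "x"): one, ("x", "t"): one, ("s", "y"): one, ("y", "t"): one})

ratios = []
for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    # H lengthens both edges incident to x by eps, which breaks the tie against x.
    H = undirected({("s", "x"): one + eps, ("x", "t"): one + eps,
                    ("s", "y"): one, ("y", "t"): one})
    d = sum(abs(G[e] - H[e]) for e in G)          # metric (3), equal to 4 * eps
    gap = abs(betweenness(nodes, G, "x") - betweenness(nodes, H, "x"))
    ratios.append(gap / d)
```

In this toy graph the centrality of x drops from 1 to 0 for every positive perturbation, so the ratio in `ratios` grows like $1/(4\epsilon)$ and no constant $K_G$ can bound it, mirroring the argument around (21).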
D. Eigenvector centrality
The eigenvector centrality $C_E$ of a node, just as the degree centrality, depends on its neighbors. However, it does not depend on the number of neighbors but rather on how important its neighbors are. The importance of its neighbors in turn depends on how important their neighbors are, and so on. In this way, a node with a few important neighbors has larger eigenvector centrality than a node with various neighbors of limited importance. Following this premise, for a given graph $G = (V, E, W)$ with adjacency matrix $A$ where weights denote similarities, we may write for every node $x$

$$C_E(x) := \frac{1}{\lambda} \sum_{(x, x') \in E} W(x, x') \, C_E(x'), \tag{22}$$

for some constant $\lambda$. In (22), the centrality value of a node is defined as a weighted average of the centrality values of its neighbors. In terms of the adjacency matrix, we have that

$$C_E(x_i) = \frac{1}{\lambda} \sum_{j} A_{ij} \, C_E(x_j). \tag{23}$$

We may rewrite (23) in matrix form to obtain [9]

$$\lambda \, \mathbf{C}_E = A \, \mathbf{C}_E, \tag{24}$$

where $\mathbf{C}_E = (C_E(x_1), \ldots, C_E(x_n))^T$. From (24) it is immediate that $\mathbf{C}_E$ is an eigenvector of the adjacency matrix $A$. In order to ensure that the components of $\mathbf{C}_E$ are real numbers, $A$ must be symmetric which corresponds to graph $G$ being undirected. Although some extensions have been proposed for directed graphs [22], the most commonly used version of eigenvector centrality requires the graph to be undirected. The solution of (24) is not uniquely determined, since every pair $(\lambda, \mathbf{C}_E)$ of eigenvalues and eigenvectors solves the equation. However, for connected graphs the Perron-Frobenius Theorem ensures that the eigenvector corresponding to the maximal eigenvalue contains all positive components; see Lemma 2 below. Thus, $\mathbf{C}_E$ in (24) is defined as the normalized dominant eigenvector of $A$ where the corresponding graph $G$ must be connected and undirected. As a consequence, $C_E(x)$ is upper bounded by 1 for every node $x$. Eigenvector centrality is a stable measure as the following proposition shows.

Fig. 1: Instability of betweenness centrality $C_B$. The distance between $G$ and $H$ vanishes with decreasing $\epsilon$, however $C_B^G(x) = 9$ and $C_B^H(x) = 0$ for every $\epsilon > 0$.

Proposition 4
The eigenvector centrality measure $C_E$ in (24) is stable as defined in Definition 1 with

$$K_G = \frac{4}{\lambda_n - \lambda_{n-1}}, \tag{25}$$

where $\lambda_n > \lambda_{n-1} \ge \ldots \ge \lambda_1$ are the eigenvalues of the adjacency matrix of graph $G$.

In proving Proposition 4, we will use as lemmas two known results from linear algebra. The first result is the Perron-Frobenius Theorem which we restate below in a form that is useful for our proof.
Lemma 2 [Perron-Frobenius Theorem]
Let $A \ge 0$ be irreducible. Then there is a unique positive real number $r$ such that:
(i) There is a real vector $v > 0$ with $A v = r v$.
(ii) The geometric and algebraic multiplicities of $r$ are one.
(iii) For each eigenvalue $s$ of $A$ we have that $|s| \le r$.

Proof: See [23, Chapter 2, Theo. 1.4]. ■
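Lemma 2 is what makes the eigenvector centrality in (24) well defined for connected undirected graphs, and it also suggests a simple way to compute it. The sketch below is a minimal illustration, not the procedure used in the paper: it runs power iteration on the shifted matrix $A + I$, a choice assumed here because the shift leaves the eigenvectors unchanged while avoiding the oscillation that plain power iteration exhibits on bipartite graphs. The function name, tolerance, and example matrices are likewise illustrative.

```python
import math


def eigenvector_centrality(A, iters=1000, tol=1e-12):
    """Dominant eigenvector of a symmetric nonnegative matrix A, scaled to unit 2-norm.

    Assumes the underlying graph is connected, so that by the Perron-Frobenius
    Theorem the dominant eigenvalue is simple and its eigenvector is positive.
    """
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n                     # positive starting vector
    for _ in range(iters):
        # One step of power iteration on A + I (the shift is an assumption of this sketch).
        w = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


# Hypothetical undirected path graph x1 - x2 - x3 with unit similarity weights.
path = [[0.0, 1.0, 0.0],
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 0.0]]
c_path = eigenvector_centrality(path)

# Hypothetical undirected triangle with unit weights: all nodes are equivalent.
triangle = [[0.0, 1.0, 1.0],
            [1.0, 0.0, 1.0],
            [1.0, 1.0, 0.0]]
c_triangle = eigenvector_centrality(triangle)
```

For the path graph the middle node obtains the largest value, and in both examples every component is positive and at most 1, consistent with the normalization discussed above.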
The second result studies the behavior of eigenvectors whena symmetric matrix is perturbed.
Lemma 3 [Eigenvector Perturbation Theorem]
Let $A$ and $A + E$ in $\mathbb{R}^{n \times n}$ be symmetric with eigenvalues $\lambda_n \ge \ldots \ge \lambda_1$ and $\mu_n \ge \ldots \ge \mu_1$. If

$$|\lambda_i - \lambda_j| \ge \beta > \|E\|_2, \quad i \neq j, \tag{26}$$

then $A$ and $A + E$ have normalized eigenvectors $u_j$ and $v_j$ corresponding to $\lambda_j$ and $\mu_j$ such that

$$\|u_j - v_j\|_2 \le \gamma (1 + \gamma^2)^{1/2}, \tag{27}$$

where $\gamma = \|E\|_2 / (\beta - \|E\|_2)$.

Proof: See [24, Theorem 3.3.7]. ■
We now use the results in Lemmas 2 and 3 to show thestability of the eigenvector centrality measure.
Proof of Proposition 4:
Consider an arbitrary undirected, connected graph $G = (V, E, W)$ with adjacency matrix $A$ and another undirected graph $H = (V, E, W')$ with the same set of edges and adjacency matrix $B$. The following result relating the distance between $G$ and $H$ and the norm of the difference of adjacency matrices will be useful for the rest of the proof.

Claim 1
The distance between the graphs upper bounds the 2-norm of the difference between the adjacency matrices, i.e.

$$d_{(V,E)}(G, H) \ge \|B - A\|_2. \tag{28}$$

Proof: From the definition of $d_{(V,E)}$ in terms of adjacency matrices (3), we may write that

$$d_{(V,E)}(G, H) = \sum_{i,j} |B_{ij} - A_{ij}| \ge \sqrt{\sum_{i,j} (B_{ij} - A_{ij})^2} = \sqrt{\mathrm{Trace}\left((B - A)^T (B - A)\right)}, \tag{29}$$

where the inequality is given by the relation between the $\ell_1$ and $\ell_2$ vector norms. Noticing that $(B - A)^T (B - A)$ is a positive semi-definite matrix and the fact that the trace equals the sum of the eigenvalues, it follows that

$$d_{(V,E)}(G, H) \ge \sqrt{\lambda_{\max}\left((B - A)^T (B - A)\right)}. \tag{30}$$

Observing that the right hand side of (30) is exactly $\|B - A\|_2$ concludes the proof. ■

Continuing with the main proof of Proposition 4, denote by $\lambda_n \ge \ldots \ge \lambda_1$ the eigenvalues of $A$ with corresponding eigenvectors $v_n, \ldots, v_1$ and, similarly for $B$, where the eigenvalues are $\mu_i$ and the associated eigenvectors $u_i$, for $i = 1, \ldots, n$. From the connectedness of graphs $G$ and $H$, it follows that matrices $A$ and $B$ are irreducible (cf. [23, Chapter 2, Theo. 2.7]). Thus, by Lemma 2, the dominant eigenvalues of both matrices $\lambda_n$ and $\mu_n$ must be simple. Denote by $\delta$ the distance from the dominant eigenvalue in $A$ to the second largest eigenvalue, i.e. $\delta := \lambda_n - \lambda_{n-1}$. For simplicity, we divide the proof of Proposition 4 into two cases. In the first case, we assume that

$$\|B - A\|_2 \ge \frac{\delta}{2}. \tag{31}$$

For this case, pick an arbitrary node $x \in V$ and, given that the eigenvector centrality is bounded between 0 and 1, we can write that

$$|C_E^G(x) - C_E^H(x)| \le \frac{4}{\delta} \frac{\delta}{2} \le \frac{4}{\delta} \|B - A\|_2 \le \frac{4}{\delta} \, d_{(V,E)}(G, H), \tag{32}$$

where we used (31) and (28). This shows inequality (4) for $K_G = 4/\delta$ in the first case studied.

Consider as a second scenario the situation where

$$\|B - A\|_2 < \frac{\delta}{2}. \tag{33}$$

Notice that this implies that (26) is satisfied for $j = n$, $\beta = \delta$ and $E = B - A$. For any given node $x \in V$, we have that

$$|C_E^G(x) - C_E^H(x)| = |v_n(x) - u_n(x)| \le \|v_n - u_n\|_2, \tag{34}$$

where the equality comes from the definition of eigenvector centrality [cf. (24)] and the inequality is a trivial fact from vector algebra. Combining the result of Lemma 3 with (34), we can write

$$|C_E^G(x) - C_E^H(x)| \le \frac{\|B - A\|_2}{\delta - \|B - A\|_2} \left( 1 + \frac{\|B - A\|_2^2}{(\delta - \|B - A\|_2)^2} \right)^{1/2} = \frac{\left( \delta^2 - 2\delta \|B - A\|_2 + 2 \|B - A\|_2^2 \right)^{1/2}}{(\delta - \|B - A\|_2)^2} \, \|B - A\|_2. \tag{35}$$

Using the assumed relation in (33), the numerator in the right hand side of (35) can be upper bounded by $\delta$ and the denominator can be lower bounded by $(\delta/2)^2$. Thus, it follows that

$$|C_E^G(x) - C_E^H(x)| \le \frac{4\delta}{\delta^2} \|B - A\|_2 \le \frac{4}{\delta} \, d_{(V,E)}(G, H), \tag{36}$$

where we used (28) in the last inequality. This shows inequality (4) for $K_G = 4/\delta$ in the second case analyzed. Since for every pair of graphs $G$ and $H$, either (31) or (33) must be satisfied, inequality (4) for $K_G = 4/\delta$ is true in general, showing that the eigenvector centrality measure is stable. ■

Notice that the constants $K_G$ found for degree centrality [cf. Proposition 1] and closeness centrality [cf. Proposition 2] only depend on the number of nodes $n$ and are independent of the weight structure of the graph $G$. However, this is not the case for eigenvector centrality, where the constant $K_G$ depends on the eigenvalues of the adjacency matrix which are a function of the weights of the graph. This difference does not impact the practical implementation of eigenvector centrality as we see in Section VI.

Among the four centrality measures studied (degree, closeness, betweenness, and eigenvector), betweenness centrality is the only measure that fails to be stable. This motivates the alternative definition for a stable betweenness centrality that we develop in the following section.

IV. STABLE BETWEENNESS CENTRALITY
A consequence of the instability of betweenness centrality shown in Proposition 3 is that perturbations in the weights of a graph have major impacts on the centrality ranking of the nodes; see Section VI. Thus, in this section we present an alternative centrality measure that preserves the centrality notion of betweenness centrality while being stable.

Given an arbitrary graph $G = (V, E, W)$ and a node $x \in V$, define a new graph $G^x = (V^x, E^x, W^x)$ with $V^x = V \setminus \{x\}$, $E^x = E \setminus \{(x', x'') \,|\, x' = x \text{ or } x'' = x\}$, and $W^x = W|_{E^x}$. I.e., the graph $G^x$ is constructed by deleting from $G$ the node $x$ and every edge directed to or from it. Define the stable betweenness centrality $C_{SB}(x)$ of any node $x \in V$ as

$$C_{SB}(x) := \sum_{\substack{x', x'' \in V \\ x' \neq x \neq x''}} s_{G^x}(x', x'') - s_G(x', x''). \tag{37}$$

Note that every term in the above summation is nonnegative since shortest paths in the graph $G^x$ cannot be shorter than the corresponding paths in $G$. Measure $C_{SB}$ quantifies the centrality of a given node $x$ by the change in the length of shortest paths once this node is removed. Intuitively, if a node is part of many shortest paths, when we remove this node the corresponding paths will increase in length and result in a high centrality value. In this sense, measure $C_{SB}$ is similar to the original betweenness centrality measure $C_B$. However, how critical a given node is in connecting the network depends on the best alternative path if this node fails. As was the case for $C_B$, definition (37) should be applied to graphs where the weights represent dissimilarities between nodes. In contrast to the traditional centrality measure, $C_{SB}$ is stable as shown after the following remark.

Remark 2
To guarantee that $C_{SB}$ in (37) will achieve finite values, we must require that the graph being studied is 2-connected or biconnected [25]. In this way, the shortest path length $s_{G^x}(x', x'')$ is finite for all triplets $x, x', x'' \in V$. An alternative is to adopt the convention that $\infty - \infty = 0$ and in such a case no assumption needs to be made about the connectivity of the graph [cf. Remark 1].

Proposition 5
The stable betweenness centrality measure $C_{SB}$ in (37) is stable as defined in Definition 1 with $K_G = 2n^2$.

In proving this proposition, we use the following lemma.
Lemma 4
Given two arbitrary graphs $G = (V, E, W)$ and $H = (V, E, W')$ we have that

$$d_{(V,E)}(G, H) \geq d_{(V^x,E^x)}(G^x, H^x), \tag{38}$$

for all $x \in V$.

Proof:
Use the definition of $d_{(V,E)}$ in (3) and separate all the terms that involve $x$ to write

$$\begin{aligned} d_{(V,E)}(G, H) &= \sum_{e \in E} |W(e) - W'(e)| \\ &= \sum_{\substack{(x', x'') \in E \\ x' \neq x \neq x''}} |W(x', x'') - W'(x', x'')| + \sum_{\substack{(x', x'') \in E \\ x = x' \text{ or } x = x''}} |W(x', x'') - W'(x', x'')|. \end{aligned} \tag{39}$$

The first term on the rightmost side of (39) is, by definition, the distance $d_{(V^x,E^x)}(G^x, H^x)$. We can then rewrite (39) as

$$d_{(V,E)}(G, H) = d_{(V^x,E^x)}(G^x, H^x) + \sum_{\substack{(x', x'') \in E \\ x = x' \text{ or } x = x''}} |W(x', x'') - W'(x', x'')|. \tag{40}$$

The result in (38) follows because the second term in (40) is nonnegative. $\blacksquare$
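As a concrete check of definition (37) and of Lemma 4, the following Python sketch may help. It is a minimal illustration rather than the authors' implementation: the four-node graphs, the helper names (`shortest_paths`, `remove_node`, `stable_betweenness`, `graph_distance`, `undirected`), and the values `eps` and `big` are assumptions made for this example, and the graphs are not those of Fig. 2. Shortest paths are computed with a plain Floyd-Warshall pass, and the convention $\infty - \infty = 0$ of Remark 2 is applied.

```python
import math
from itertools import product


def shortest_paths(nodes, weights):
    """All-pairs shortest path lengths via Floyd-Warshall.

    `weights` maps directed edges (u, v) to nonnegative dissimilarities.
    """
    dist = {(u, v): 0.0 if u == v else weights.get((u, v), math.inf)
            for u, v in product(nodes, repeat=2)}
    # product(...) varies its first coordinate slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        via_k = dist[i, k] + dist[k, j]
        if via_k < dist[i, j]:
            dist[i, j] = via_k
    return dist


def remove_node(nodes, weights, x):
    """Return (V^x, W^x): drop node x and every edge directed to or from it."""
    kept_nodes = [v for v in nodes if v != x]
    kept_weights = {e: w for e, w in weights.items() if x not in e}
    return kept_nodes, kept_weights


def stable_betweenness(nodes, weights, x):
    """C_SB(x) as in (37), using the convention inf - inf = 0 of Remark 2."""
    s_g = shortest_paths(nodes, weights)
    nodes_x, weights_x = remove_node(nodes, weights, x)
    s_gx = shortest_paths(nodes_x, weights_x)
    total = 0.0
    for pair in product(nodes_x, repeat=2):
        if math.isinf(s_g[pair]) and math.isinf(s_gx[pair]):
            continue  # inf - inf is taken to be 0
        total += s_gx[pair] - s_g[pair]
    return total


def graph_distance(weights_g, weights_h):
    """d_(V,E)(G, H): sum of absolute weight differences over the shared edges."""
    return sum(abs(w - weights_h[e]) for e, w in weights_g.items())


def undirected(edges):
    """Expand {(u, v): w} into a symmetric directed weight map."""
    return {e: w for (u, v), w in edges.items() for e in ((u, v), (v, u))}


# Hypothetical example: u and v are joined by a cheap route through x
# and by a detour through y whose cost is controlled by eps (G) or big (H).
nodes = ["u", "v", "x", "y"]
eps, big = 0.1, 5.0
G = undirected({("u", "x"): 1.0, ("x", "v"): 1.0,
                ("u", "y"): 1.0 + eps, ("y", "v"): 1.0 + eps})
H = undirected({("u", "x"): 1.0, ("x", "v"): 1.0,
                ("u", "y"): 1.0 + big, ("y", "v"): 1.0 + big})

csb_G = {z: stable_betweenness(nodes, G, z) for z in nodes}
csb_H = {z: stable_betweenness(nodes, H, z) for z in nodes}
print("C_SB in G:", csb_G)
print("C_SB in H:", csb_H)

# Lemma 4: deleting a node cannot increase the distance between the graphs.
for z in nodes:
    _, G_z = remove_node(nodes, G, z)
    _, H_z = remove_node(nodes, H, z)
    assert graph_distance(G, H) >= graph_distance(G_z, H_z)
```

In this toy example node `x` lies on the cheap route between `u` and `v`, so its stable betweenness grows from about `4 * eps` in `G` to `4 * big` in `H` as the detour through `y` becomes costlier, while the other nodes score zero in `G`.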
We now use Lemmas 1 and 4 to prove Proposition 5.
Proof of Proposition 5:
Given two biconnected graphs $G = (V, E, W)$ and $H = (V, E, W')$ we have that for an arbitrary node $x \in V$,

$$\left| C^G_{SB}(x) - C^H_{SB}(x) \right| = \Bigg| \sum_{\substack{x', x'' \in V \\ x' \neq x \neq x''}} \big( s_{G^x}(x', x'') - s_G(x', x'') \big) - \sum_{\substack{x', x'' \in V \\ x' \neq x \neq x''}} \big( s_{H^x}(x', x'') - s_H(x', x'') \big) \Bigg|. \tag{41}$$

Rearranging terms and using the triangle inequality we obtain

$$\left| C^G_{SB}(x) - C^H_{SB}(x) \right| \leq \sum_{\substack{x', x'' \in V \\ x' \neq x \neq x''}} \left| s_H(x', x'') - s_G(x', x'') \right| + \left| s_{G^x}(x', x'') - s_{H^x}(x', x'') \right|. \tag{42}$$

Applying Lemma 1 to (42) we have that

$$\left| C^G_{SB}(x) - C^H_{SB}(x) \right| \leq \sum_{\substack{x', x'' \in V \\ x' \neq x \neq x''}} d_{(V,E)}(G, H) + d_{(V^x,E^x)}(G^x, H^x) \leq n^2 \left( d_{(V,E)}(G, H) + d_{(V^x,E^x)}(G^x, H^x) \right), \tag{43}$$

where the last inequality holds because the sum has at most $n^2$ terms. Using now Lemma 4 we obtain that

$$\left| C^G_{SB}(x) - C^H_{SB}(x) \right| \leq 2 n^2 \, d_{(V,E)}(G, H), \tag{44}$$

showing inequality (4) for $K_G = 2n^2$ and concluding the proof. $\blacksquare$

To compare the stable betweenness centrality measure $C_{SB}$ with the traditional measure $C_B$, consider the graphs $G$ and $H$ in Fig. 2 where $0 < \epsilon \ll 1 \ll M$, i.e., $\epsilon$ is a small modification to the reference edge weight of 1 and $M$ is a large modification. For the traditional betweenness centrality we have $C^G_B(x) = C^H_B(x) = 18$ because $x$ is part of 18 shortest paths in both networks [cf. proof of Proposition 3]. However, intuition suggests that $x$ is more central to graph $H$ than it is to graph $G$.
A failure of this node in graph $H$ would compromise the graph dynamics deeply, since all the flows that passed through $x$ are now required to pass through the much costlier edges that run through a different node. Graph $G$, however, is more resilient to a failure of $x$ because the flows can pass through a different node instead of $x$ at similar cost. Thus, it is reasonable to expect $x$ to be less central to $G$ than it is to $H$. The stable betweenness centrality $C_{SB}$ captures this notion. If node $x$ is deleted from $G$, the 18 shortest paths of which $x$ was originally a part have their length increased by $2\epsilon$. Consequently, $C^G_{SB}(x) = 36\epsilon$. The centrality of node $x$ is limited by the existence of a comparable path through a different node. However, if node $x$ is deleted from $H$, the 18 shortest paths have their length increased by $2M$, resulting in $C^H_{SB}(x) = 36M \gg C^G_{SB}(x)$, which corresponds with our intuition. The centrality of $x$ depends on the quality of the best alternative.

Remark 3
Computing the betweenness centrality $C_B$ for every node in a graph with $n$ nodes and $m$ weighted edges requires $O(nm + n^2 \log n)$ computations [26]. For $C_{SB}$, we can use the Floyd-Warshall [27], [28] or the Johnson [29] algorithm to compute all-pairs shortest paths in a graph. The latter is more suitable for sparse graphs, with a computational complexity of $O(nm + n^2 \log n)$. In a naive computation of $C_{SB}$ for every node in the graph, we can compute the shortest paths for all pairs of nodes in the original network and in every network generated when deleting one node at a time. This requires $n + 1$ runs of Johnson's algorithm with a total complexity of $O(n^2 m + n^3 \log n)$, i.e., a factor of $n$ more than the traditional betweenness centrality. A faster algorithm could exist since, when a node is deleted from the network, only the shortest paths originally containing this node need to be recomputed. The development of such algorithms is beyond the scope of this paper.

V. CONTINUITY OF CENTRALITY MEASURES
Continuity is a subtler notion of how impervious a centrality measure is to noise. Specifically, we define a continuous centrality measure as one in which the centrality values of every node in a given graph are a continuous function of the weights in the edges of this graph, as we formally state next.
Definition 2
Let $G = (V, E, W)$ be an arbitrary graph with adjacency matrix $A$. For every matrix $B$ such that $B_{ij} = 0$ if $A_{ij} = 0$ and $B + A \geq 0$ element-wise, define the graph $H = (V, E, W')$ whose adjacency matrix is $A + B$. Then, a centrality measure $C$ is continuous if for every $x \in V$,

$$C^H(x) \to C^G(x) \quad \text{as} \quad \|B\| \to 0, \tag{45}$$

where $C^G(x)$ is the centrality of node $x$ in graph $G$ and similarly for $H$.

In the above definition, matrix $B$ can be interpreted as a perturbation defined on the edges of graph $G$. A continuous centrality measure ensures that as this perturbation vanishes, the centrality values tend to those in graph $G$. Continuity is a weaker notion than stability since the latter implies the former, as we show next.

[Fig. 2: Implementation example of betweenness $C_B$ and stable betweenness $C_{SB}$ centrality. The betweenness centrality value of $x$ is equal for both graphs, $C^G_B(x) = C^H_B(x) = 18$. However, the stable betweenness is different, $C^G_{SB}(x) = 36\epsilon$ and $C^H_{SB}(x) = 36M$. The stable betweenness centrality value of $x$ depends on the quality of the best alternative path.]

Proposition 6
If a centrality measure $C$ is stable as in Definition 1 then it is continuous as in Definition 2.

Proof:
By the equivalence of matrix norms [30], it is immediate that $d_{(V,E)}(G, H) \to 0$ as $\|B\| \to 0$, where $B$, $G$ and $H$ are defined as in Definition 2. Thus, if a given measure $C$ is stable, it must satisfy (4), which implies that

$$\left| C^G(x) - C^H(x) \right| \to 0 \quad \text{as} \quad \|B\| \to 0, \tag{46}$$

which is equivalent to the definition in (45), concluding the proof. $\blacksquare$

As stated in Section III, a centrality measure is a function of a graph that assigns a nonnegative real number to each node. This broad definition enables the existence of a wide variety of measures. In particular, there can exist centrality measures which are continuous but not stable, as we show next.
Proposition 7
If a centrality measure $C$ is continuous as in Definition 2 then it need not be stable as in Definition 1.

Proof:
For an arbitrary graph $G = (V, E, W)$, consider the degree squared centrality measure $C_{DS}$ such that for every node $x \in V$,

$$C^G_{DS}(x) := \sum_{x' \mid (x, x') \in E} \big( W(x, x') \big)^2. \tag{47}$$

In the above measure, for every node we assign a centrality value equal to the sum of the squares of the weights of incident edges instead of just summing the weights as in degree centrality. This is a valid centrality measure which is continuous but not stable. Continuity follows immediately from the fact that $C_{DS}$ is defined as the sum of quadratic (hence, continuous) functions of the weights in the graph. Thus, vanishing perturbations of the weights must have vanishing effect on $C_{DS}$.

To see that $C_{DS}$ is not stable, consider two particular undirected graphs $G = (V, E, W)$ and $H = (V, E, W')$ with two nodes, $V = \{x, x'\}$, and weights $W(x, x') = W(x', x) = 1$ and $W'(x, x') = W'(x', x) = 1 + \delta$ for $\delta > 0$. From definition (3) we have that $d_{(V,E)}(G, H) = 2\delta$ and from (47) we obtain $C^G_{DS}(x) = 1$ and $C^H_{DS}(x) = (1 + \delta)^2$. Thus, for $C_{DS}$ to be stable the following must be fulfilled [cf. (4)],

$$\begin{aligned} \left| 1 - (1 + \delta)^2 \right| &\leq 2 K_G \delta, \\ \delta^2 + 2\delta &\leq 2 K_G \delta, \\ \delta &\leq 2 K_G - 2. \end{aligned} \tag{48}$$

However, $K_G$ is a constant that does not depend on $\delta$ since this is not a parameter of graph $G$. Thus, for any candidate constant $K_G$, there exists a $\delta$ big enough such that (48) is violated, showing that $C_{DS}$ is not stable and concluding the proof. $\blacksquare$

Proposition 6 guarantees that degree, closeness, eigenvector and stable betweenness centrality are continuous centrality measures. Proposition 7 leaves open the question of whether betweenness centrality, which is not stable, is continuous or not. The result below shows that it is not.
Proposition 8
The betweenness centrality measure $C_B$ in (20) is not continuous as defined in Definition 2.

Proof:
The same counterexample used in the proof of Proposition 3 can be used to show failure of continuity. As $\epsilon \to 0$, we have that $\|B\| \to 0$. However, $\left| C^G_B(x) - C^H_B(x) \right|$ does not tend to 0, violating Definition 2. $\blacksquare$

Being not only unstable but also discontinuous further hinders the practical applicability of $C_B$. Since $C_{SB}$ captures a similar notion while being stable, and thus continuous, it is an appealing alternative. The numerical experiments in the following section further illustrate how the undesirable structural properties of betweenness centrality translate into lack of robustness when applied to synthetic and real-world data.

VI. NUMERICAL EXPERIMENTS
Stability and continuity regulate the behavior of centrality measures in the presence of noise. We empirically validate three facts: the behavior of betweenness centrality in the presence of noise is fundamentally different from the other measures (Section VI-A), continuity and stability encode different robustness properties (Section VI-B), and the stable betweenness alternative $C_{SB}$ retains the same centrality notion as the original $C_B$ (Section VI-C).

For a given node set $V$ of size $n \geq 10$, we define a random network as one where an undirected edge $(x, x')$ belongs to $E$ with probability $q = 10/n$. The weight of this edge is randomly picked from a uniform distribution in [0. , . ]. We consider these weights to be indications of dissimilarity. Notice that the centrality rankings obtained by applying a centrality measure based on dissimilarities (e.g., closeness) and one based on similarities (e.g., degree) on the same graph are not comparable. Thus, for every random graph we generate a similarity-based graph with the same nodes and edges but where the weights are computed as 2 minus the edge weights in the original dissimilarity graph. In this way, all weights in the similarity graphs are also contained in [0. , . ] and all centrality rankings can be compared. Closeness, betweenness and stable betweenness centralities will be applied to dissimilarity networks while eigenvector and degree centrality will be applied to similarity networks.

[Fig. 3: Comparison of stability indicators when type 1 noise ($p = 1$, $\delta = 0.01$) is introduced in random networks for all centrality measures: degree (green circle), closeness (purple right triangle), betweenness (orange upwards triangle), eigenvector (yellow left triangle), and stable betweenness (cyan downwards triangle). (a) Mean of the maximum change recorded when perturbing a random network as a function of network size. (b) Mean of the average node rank change recorded when perturbing a random network as a function of network size. (c) Probability that the maximum change in the ranking exceeds 3 positions as a function of the network size. (d) Probability that the maximum change in the ranking exceeds 5 positions as a function of the network size. (e) Histogram of the maximum change recorded when perturbing random networks with 150 nodes. (f) Probability that the top 5 ranking remains unchanged when perturbing a network.]

As real-world data, we use two networks: one contains information about the air traffic between the most popular airports in the United States (U.S.) [31], while the second records interactions between sectors of the U.S. economy [32]. More precisely, in the undirected airport network $G_A = (V_A, E_A, W_A)$, the node set $V_A$ is composed of 25 popular airports in the U.S., an edge $(x, x')$ exists between two airports $x, x' \in V_A$ if there is a regularly scheduled flight between them, and the weight of this edge $W_A(x, x')$ is equal to the number of passenger seats, either occupied or empty, between both destinations in a given year. The economic network $G_I = (V_I, E_I, W_I)$ contains as nodes the 61 industrial sectors of the economy as defined by the North American Industry Classification System (NAICS). There exists an edge $(x, x') \in E_I$ if part of the output of sector $x$ is used as input to sector $x'$, and the weight $W_I(x, x')$ is given by how much output of $x$, in dollars, is productive input of $x'$.
We consider both $W_A(x, x')$ and $W_I(x, x')$ as measures of similarity and use the inverses $1/W_A(x, x')$ and $1/W_I(x, x')$ as weights for the centrality measures that require dissimilarity graphs.

A. Robustness indicators
We analyze the robustness of the centrality rankings when the random networks are perturbed by random noise. Our specification of random noise has two parameters: the probability of perturbation $p$ and the amplitude of perturbation $\delta$. Given a network, we build a perturbed version of it by modifying every edge weight with probability $p$. The perturbed edge weights are multiplied by a uniform random number in $[1 - \delta, 1 + \delta]$. In our simulations, we analyze two kinds of noise: type 1 noise has parameters $p = 1$ and $\delta = 0.01$ while type 2 noise has parameters $p = 0.1$ and $\delta = 0.1$. The first noise affects every edge but modifies the weight by a maximum of 1%, whereas the second type of noise affects on average one out of every ten edges but modifies the weight by up to 10%.

For the following experiment, we generate 100 random networks of $n$ nodes, where $n$ varies from 10 to 200 in multiples of 10. We then generate two perturbed versions of each of these networks by applying both types of noise. For every network, we generate a centrality ranking of the nodes, i.e., we sort the nodes in decreasing order of centrality value, and compare it with the centrality ranking of the perturbed versions of that network. We perform this comparison for the rankings output by the four commonly used centrality measures (degree, closeness, betweenness and eigenvector) as well as the stable betweenness centrality measure introduced in Section IV.

[Fig. 4: Comparison of stability indicators when type 2 noise ($p = 0.1$, $\delta = 0.1$) is introduced in random networks for all centrality measures: degree (green circle), closeness (purple right triangle), betweenness (orange upwards triangle), eigenvector (yellow left triangle), and stable betweenness (cyan downwards triangle). (a) Mean of the maximum change recorded when perturbing a random network as a function of network size. (b) Mean of the average node rank change recorded when perturbing a random network as a function of network size. (c) Probability that the maximum change in the ranking exceeds 10 positions as a function of the network size.]

A number of stability indicators are analyzed when perturbing the networks with both types of noise; see Figs. 3 and 4. For type 1 noise, we begin by analyzing the maximum variation in ranking position experienced by a node when perturbing the network. In Fig. 3a we plot the mean of this indicator among the networks analyzed as a function of the network size. For example, for a network with 100 nodes, the type 1 perturbation generates a maximum change of 1.8 positions on average for the $C_D$ ranking, 2.6 positions on average for the $C_C$ ranking, 5.9 positions on average for the $C_B$ ranking, 2.0 positions on average for the $C_E$ ranking, and 2.7 positions for the $C_{SB}$ ranking. All measures experience an approximately linear increase of the maximum change with the size of the network, but the rate of increase is fastest for $C_B$, generating big performance differences between the measures for larger networks. Moreover, the behavior of degree and eigenvector centrality, as well as the behavior of closeness and stable betweenness centrality, are similar to each other. This is not surprising since they depend upon similar properties of the network. Both closeness and stable betweenness are defined in terms of shortest paths. Also, both degree and eigenvector centrality depend on a notion of neighborhood of each node. For the former, centrality coincides with the graph theoretic notion of neighborhood whereas for the latter, centrality depends on a neighborhood weighted by its influence. In Fig. 3b we plot the mean average change when perturbing the network with type 1 noise.
That is, this indicator is the expected rank variation of any given node in the network. The trend is very similar to the one for maximum variation in ranking. E.g., for a network containing 150 nodes, on average every node experiences a change of 1 position for betweenness centrality while the change is around 0.5 positions for closeness and stable betweenness and 0.3 for degree and eigenvector centralities. Apart from computing the mean rank variations across networks, we are interested in the distributions of these variations for the different centrality measures. Thus, we plot the probability that the maximum change in the ranking generated by a perturbation of type 1 is greater than 3 positions (Fig. 3c) and 5 positions (Fig. 3d) as a function of the network size. E.g., for networks of 60 nodes, there is a 0.5 probability that the betweenness centrality ranking undergoes a variation greater than 3 positions while this probability is less than 0.1 for all other measures. Moreover, for over 90% of the networks of 180 nodes, the betweenness centrality ranking undergoes a variation greater than 5 positions when perturbed while this percentage is smaller than 10% for the other measures. To facilitate the understanding of Figs. 3a, 3c, and 3d, in Fig. 3e we present the histogram of the maximum change found in the rankings when perturbing a network for the particular case of networks with 150 nodes for all measures. The means of these histograms correspond to the markers for networks with 150 nodes in Fig. 3a. In this way, the mean of the green histogram corresponds to the green circle, the orange histogram to the orange upwards triangle and so on. To relate the histogram with Fig. 3c, notice that the green histogram has a frequency of 7 for changes of 4 positions and zero frequency for larger changes. Since we consider 100 sample networks of each size, this translates into a 0.07 probability of observing changes greater than 3 positions for networks of size 150 nodes, which corresponds to the green circle in Fig. 3c.
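The step from the histogram in Fig. 3e to the markers in Figs. 3c and 3d can be sketched in a few lines of Python. In the hypothetical histogram below, only the count of 7 networks at a change of 4 positions and the absence of larger changes come from the text; the remaining bins, the name `hist_degree`, and the helper names are assumptions chosen so that the counts add up to 100 sampled networks.

```python
def exceedance_probability(histogram, threshold):
    """Fraction of sampled networks whose maximum rank change exceeds `threshold`.

    `histogram` maps a maximum rank change to the number of networks showing it.
    """
    total = sum(histogram.values())
    above = sum(count for change, count in histogram.items() if change > threshold)
    return above / total


def mean_change(histogram):
    """Mean of the maximum rank change, i.e. the kind of value plotted in Fig. 3a."""
    total = sum(histogram.values())
    return sum(change * count for change, count in histogram.items()) / total


# Hypothetical degree-centrality histogram for 100 networks of 150 nodes.
# Only the bin at 4 (7 networks) and the empty bins above it follow the text.
hist_degree = {1: 30, 2: 40, 3: 23, 4: 7}

print(exceedance_probability(hist_degree, 3))  # 0.07, the Fig. 3c reading
print(exceedance_probability(hist_degree, 5))  # 0.0, the Fig. 3d reading
print(mean_change(hist_degree))
```

Both probability indicators are tail sums of the same histogram, so changing the threshold from 3 to 5 positions only changes which bins are counted.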
The same is true for Fig. 3d, but considering changes greater than 5 positions in the histogram. It is immediate that only the orange histogram has a considerable portion of its weight for changes of 6 positions or more, translating into a big difference in probabilities between the orange marker and the rest in Fig. 3d. Having a longer tail, the silhouette of the orange $C_B$ histogram is essentially different from the rest. E.g., for one of the studied networks, the $C_B$ ranking presents a change of 19 positions when the perturbation is introduced whereas the largest variation for all other measures combined is 8 positions. This is an empirical example of instability as shown in Proposition 3.

[Fig. 5: Comparison of stability indicators in real-world networks for all centrality measures: degree (green circle), closeness (purple right triangle), betweenness (orange upwards triangle), eigenvector (yellow left triangle), and stable betweenness (cyan downwards triangle). (a) Probability that the maximum change in the ranking of the airport network $G_A$ exceeds 1 position as a function of the perturbation size. (b) Histogram of the maximum change recorded when perturbing the airport network $G_A$ with $\delta = 0.035$. (c) Probability that the maximum change in the ranking when perturbing the economic network $G'_I$ exceeds 3 positions as a function of the perturbation size.]

Another indicator we analyze is the position where the change in the ranking occurs. A change towards the last positions of the ranking is largely irrelevant whereas a change varying the positions of the most central nodes carries important implications. In Fig. 3f, we plot the probability that the top 5 nodes in the ranking retain their positions after perturbing the network with type 1 noise. Observe that there is no clear trend with the size of the network but probabilities oscillate around different values for different centrality measures. In this way, we can state that for around 75% to 95% of the networks there is no change in the top 5 centrality ranking computed with all measures except for betweenness centrality, where this percentage falls to around 60% on average.

The same conclusions can be extracted when perturbing the networks with type 2 noise; see Fig. 4. Even though the difference between $C_B$ and the rest of the measures is not as marked as with type 1 noise, it is immediate that betweenness centrality entails the largest maximum change for every network size (Fig. 4a) as well as the largest average change (Fig. 4b). Also, the probability of having changes in the ranking greater than 10 positions is consistently around 0.25 larger for $C_B$ than for the other centralities based on shortest paths, $C_C$ and $C_{SB}$, while this probability is negligible for $C_D$ and $C_E$ (Fig. 4c).

Similar behaviors in the presence of noise can be observed when analyzing the real-world data; see Fig. 5. Notice that for graphs $G_A$ and $G_I$, the network size is fixed. Thus, we analyze performance metrics as a function of the magnitude of the perturbation. A perturbation magnitude of $\delta$ implies that every weight in the network is multiplied by a random number in $[1 - \delta, 1 + \delta]$. For every perturbation level, we generate 100 perturbed networks. In Fig. 5a, we compute the probability of observing a change in the ranking of more than 1 position as a function of the magnitude of the perturbation. As expected, the probability of observing a change in the network increases with the perturbation magnitude.
Moreover, for a fixed magnitude of perturbation, larger probabilities of variations are observed in the rankings generated by $C_B$ compared to those generated by all other measures. E.g., for a perturbation of 0.035, 85% of the rankings generated by $C_B$ presented a change greater than 1 position whereas, among the other measures, $C_{SB}$ presented the greatest variation with only 50% of the analyzed networks. To clarify this plot, in Fig. 5b we present the histogram of maximum changes observed for a perturbation of $\delta = 0.035$. For example, all networks analyzed presented either no change or a change of only one position for $C_D$; thus, the corresponding marker for degree centrality in Fig. 5a is at null probability for $\delta = 0.035$. Similarly, only 15 networks out of the 100 analyzed presented either no change or just a 1-position change for $C_B$, resulting in the probability of 0.85 for changes greater than one position plotted in Fig. 5a. As was the case for random networks [cf. Fig. 3e], the histogram corresponding to measure $C_B$ presents a longer tail than the rest, which is empirical evidence of instability. We applied this same procedure to the second real-world network $G_I$. Notice that $G_I$ is directed; thus, in order to compare all centrality measures including eigenvector centrality, we symmetrized $G_I$ into $G'_I$ by generating undirected edges with weights equal to the mean of the weights in both directions. In Fig. 5c we plot the probability that $G'_I$ experiences a change of more than 3 positions in the ranking for varying perturbation magnitudes. As expected, this probability is consistently highest for $C_B$, and the difference with the other measures is largest for the smaller perturbation magnitudes.

B. Effects of continuity and stability
The previous experiment points towards the conclusion that, in practice, stable and continuous centrality measures output centrality rankings with variations which are less meaningful and smaller in magnitude than those obtained with noncontinuous and nonstable measures such as betweenness centrality. Given that betweenness centrality is neither continuous nor stable and the rest of the measures analyzed are both stable and continuous, it is unclear which missing property is responsible for the low robustness of betweenness centrality.

[Fig. 6: Average change in ranking, as a function of network size, for degree, degree squared and floor degree centrality measures under two types of noise. For small perturbations (blue), degree squared behaves similarly to degree centrality. For larger perturbations (red), degree squared behaves similarly to floor degree centrality.]

In order to answer this question, we compare three centrality measures: degree centrality $C_D$, which is both continuous and stable; degree squared centrality $C_{DS}$ as defined in (47), which is continuous but not stable; and floor degree centrality $C_{FD}$, which is neither continuous nor stable and which we define as follows. For every node $x \in V$ in an arbitrary graph $(V, E, W)$, we have that

$$C_{FD}(x) := \sum_{x' \mid (x, x') \in E} \mathrm{floor}\big( W(x, x') \big). \tag{49}$$

The fact that $C_{FD}$ is a noncontinuous centrality measure is immediate from the discontinuity of the floor function. In Fig. 6 we plot the average change in rankings output by the three measures when perturbing networks of different sizes. The results for small noise uniform across edges (type 1) are plotted in blue while the results for larger and sparser noise (type 2) are plotted in red.
As expected, degree centrality is the most robust of the three measures under both types of noise, followed by degree squared, with floor degree being the least robust. However, notice that for noise of small magnitude (type 1) degree squared behaves more similarly to degree centrality, showing a robust behavior in the presence of noise. For larger magnitudes of noise (type 2), degree squared centrality behaves similarly to the unstable floor degree centrality. This suggests that continuity provides robustness under small perturbations while the stronger concept of stability provides robustness for more general perturbations.
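The comparison in this subsection can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code behind Fig. 6: the weight interval `[0.5, 1.5]`, the network size of 50 nodes, the 20 repetitions, the fixed seed, and all function names are choices made for this example, and the three measures are applied directly to the generated weights rather than to a separate similarity graph. The noise model (perturb each weight with probability `p` by a uniform factor in `[1 - delta, 1 + delta]`) and the three degree variants follow the definitions in the text.

```python
import math
import random


def degree_like(weights, n, transform=lambda w: w):
    """Sum of transform(weight) over the edges incident to each node."""
    scores = [0.0] * n
    for (u, v), w in weights.items():
        scores[u] += transform(w)
        scores[v] += transform(w)
    return scores


def ranking(scores):
    """Position of each node when sorted by decreasing score (ties by node id)."""
    order = sorted(range(len(scores)), key=lambda v: (-scores[v], v))
    position = [0] * len(scores)
    for pos, v in enumerate(order):
        position[v] = pos
    return position


def average_rank_change(scores_a, scores_b):
    """Mean absolute change in ranking position across nodes."""
    rank_a, rank_b = ranking(scores_a), ranking(scores_b)
    return sum(abs(a - b) for a, b in zip(rank_a, rank_b)) / len(rank_a)


def random_network(n, rng, low=0.5, high=1.5):
    """Undirected edge (u, v), u < v, with probability q = 10/n.

    The weight interval [low, high] is an assumption of this sketch.
    """
    q = min(1.0, 10 / n)
    return {(u, v): rng.uniform(low, high)
            for u in range(n) for v in range(u + 1, n) if rng.random() < q}


def perturb(weights, p, delta, rng):
    """With probability p, scale a weight by a uniform factor in [1-delta, 1+delta]."""
    return {e: (w * rng.uniform(1 - delta, 1 + delta) if rng.random() < p else w)
            for e, w in weights.items()}


measures = {
    "degree": lambda w: w,              # continuous and stable
    "degree squared": lambda w: w * w,  # continuous, not stable, cf. (47)
    "floor degree": math.floor,         # neither continuous nor stable, cf. (49)
}
noise_types = {"type 1": (1.0, 0.01), "type 2": (0.1, 0.1)}

rng = random.Random(0)
n, repetitions = 50, 20
for noise_name, (p, delta) in noise_types.items():
    totals = dict.fromkeys(measures, 0.0)
    for _ in range(repetitions):
        g = random_network(n, rng)
        h = perturb(g, p, delta, rng)
        for name, transform in measures.items():
            totals[name] += average_rank_change(degree_like(g, n, transform),
                                                degree_like(h, n, transform))
    print(noise_name, {name: round(t / repetitions, 3) for name, t in totals.items()})
```

Because floor degree produces many tied scores, the tie-breaking rule in `ranking` affects its reported rank changes; a different rule would shift the numbers, so the printed values should be read only as a qualitative comparison.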
C. Ranking similarity across measures
In order to compare the centrality rankings across different measures, we pick 100 random networks of size 100 nodes and compute the average and maximum change for a pair of rankings output by different measures; see Table I. E.g., in these 100 samples the mean average ranking variation of nodes ranked by the degree $C_D$ and the eigenvector $C_E$ centralities is 7.3 positions. Moreover, the mean maximum variation between two given rankings output by the betweenness $C_B$ and the closeness $C_C$ centrality is 41.6 positions. Notice that the smallest variations, both in average and maximum, are achieved when comparing the rankings of the betweenness $C_B$ and the stable betweenness $C_{SB}$ centrality measures. This is empirical evidence that both measures encode a similar centrality concept, as was our objective when defining stable betweenness centrality in Section IV. Further observe that the variations between these two rankings are even smaller than the ones between degree $C_D$ and squared degree $C_{DS}$ centrality, two measures with closely related definitions [cf. (5) and (47)].

[TABLE I: Average and maximum variation of centrality ranking across different measures for networks with 100 nodes. The upper triangular part of the table reports the average variation while the lower triangular part reports the maximum variation for the corresponding pair of measures in the rows and columns. Rows and columns: $C_D$, $C_C$, $C_B$, $C_E$, $C_{SB}$, $C_{DS}$.]

To complete the analysis, we use the economic network $G_I$ to illustrate the fact that the centrality concept in the proposed measure $C_{SB}$ closely resembles the one in the traditional betweenness centrality $C_B$; see Table II. To avoid introducing artifacts through symmetrization, we consider the original network $G_I$ instead of the symmetrized version $G'_I$; hence, the eigenvector centrality is not reported. Stable betweenness centrality $C_{SB}$ provides the ranking closest to the one output by $C_B$.
Both measures share the top 3 economic sectors and 4 out of 5 sectors in the top 5. In contrast, none of the other measures (closeness, out-degree, and in-degree) contains the three sectors preferred by $C_B$ in its top 5 ranking.

VII. CONCLUSION
Stability, as a formal characterization of the robustness of node centrality measures, was introduced. The most commonly used centrality measures were shown to be stable, with the exception of betweenness centrality; thus, a stable alternative definition was proposed. A milder continuity property was introduced and betweenness centrality was shown not to be continuous. We illustrated the stability difference between betweenness centrality and the rest of the measures by studying indicators in both random and real-world networks. Moreover, by proposing alternative definitions of degree centrality, the practical differences between continuity and stability were exemplified. Finally, by comparing the centrality rankings output by different measures, it was shown that stable betweenness preserves the centrality notion encoded in traditional betweenness but has the additional practical advantage of stability.

TABLE II: Comparison of the centrality rankings for the economic network $G_I$. The ranking output by $C_{SB}$ is the closest one to the $C_B$ ranking. $C_B$ and $C_{SB}$ share the top 3 ranking while these three economic sectors are not contained in the top 5 ranking of any other measure.
Columns: Rank, $C_B$, $C_{SB}$, $C_C$, $C_{OD}$, $C_{ID}$. Entries as recovered, in reading order:
Ranks 1 and 2 (partially recovered): Food & Beverage; Oil and gas extraction; Real Estate; Real Estate
Rank 3: Professional serv.; Construction; Petroleum products; Oil and gas extraction; Petroleum products
Rank 4: Wholesale trade; Petroleum products; Administrative serv.; FR banks, credits; Chemical products
Rank 5: FR banks, credits; FR banks, credits; Real estate; Administrative services; Construction

REFERENCES
[1] S. Segarra and A. Ribeiro, “A stable betweenness centrality measure in networks,” in
Acoustics, Speech and Signal Processing (ICASSP), 2014IEEE International Conference on . Florence, Italy, May 4-9 2014, pp.3859–3863.[2] K. S. Cook, R. M. Emerson, M. R. Gillmore, and T. Yamagishi, “Thedistribution of power in exchange networks: theory and experimentalresults,”
American Journal of Sociology , vol. 89, no. 2, pp. 275–305,1983.[3] C. J. Garroway, J. Bowman, D. Carr, and P. J. Wilson, “Applications ofgraph theory to landscape genetics,”
Evolutionary Applications , vol. 1,no. 4, pp. 620–630, 2008.[4] P. Holme, B. J. Kim, C. N. Yoon, and S. K. Han, “Attack vulnerabilityof complex networks,”
Phys. Rev. E , vol. 65, p. 056109, May 2002.[5] M. E. Shaw, “Group structure and the behavior of individuals in smallgroups,”
The Journal of Psychology , vol. 38, no. 1, pp. 139–149, 1954.[6] U. Nieminen, “On the centrality in a directed graph,”
Social ScienceResearch , vol. 2, no. 4, pp. 371 – 378, 1973.[7] G. Sabidussi, “The centrality index of a graph,”
Psychometrika , vol. 31,no. 4, pp. 581–603, 1966.[8] M. A. Beauchamp, “An improved index of centrality,”
BehavioralScience , vol. 10, no. 2, pp. 161–163, 1965. [Online]. Available:http://dx.doi.org/10.1002/bs.3830100205[9] P. Bonacich, “Factoring and weighting approaches to clique identifica-tion,”
Journal of Mathematical Sociology , vol. 2, pp. 113–120, 1972.[10] L. C. Freeman, “A set of measures of centrality based on betweenness,”
Sociometry , vol. 40, no. 1, pp. 35–41, 1977.[11] E. Costenbader and T. W. Valente, “The stability of centrality measureswhen networks are sampled,”
Social Networks , vol. 25, no. 4, pp. 283– 307, 2003.[12] S. P. Borgatti, K. M. Carley, and D. Krackhardt, “On the robustnessof centrality measures under conditions of imperfect data,”
SocialNetworks
Social Networks
Social Networks
Social Networks
Sociometry
TheJournal of the Acoustical Society of America , vol. 22, no. 6, pp.725–730, 1950. [Online]. Available: http://scitation.aip.org/content/asa/journal/jasa/22/6/10.1121/1.1906679[18] D. R. White and S. P. Borgatti, “Betweenness centrality measures fordirected graphs,”
Social Networks
Social Networks , vol. 30, no. 2, pp. 136 –145, 2008.[20] J. P. Onnela, J. Saramki, J. Hyvnen, G. Szab, D. Lazer, K. Kaski,J. Kertsz, and A. L. Barabsi, “Structure and tie strengths in mobilecommunication networks,”
Proceedings of the National Academy ofSciences
Phys. Rev. E , vol. 64, p.016132, Jun 2001. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevE.64.016132[22] P. Bonacich and P. Lloyd, “Eigenvector-like measures of centrality forasymmetric relations,”
Social Networks
Nonnegative Matrices inthe Mathematical Sciences . Society for Industrial and AppliedMathematics, 1994. [Online]. Available: http://epubs.siam.org/doi/abs/10.1137/1.9781611971262[24] J. Ortega,
Numerical Analysis: A Second Course , ser. Classics in AppliedMathematics. Society for Industrial and Applied Mathematics, 1990.[25] D. B. West,
Introduction to graph theory . Prentice hall Upper SaddleRiver, 2001, vol. 2.[26] U. Brandes, “A faster algorithm for betweenness centrality,”
Journal ofMathematical Sociology , vol. 25, pp. 163–177, 2001.[27] R. W. Floyd, “Algorithm 97: shortest path,”
Commun. ACM ,vol. 5, no. 6, p. 345, Jun 1962. [Online]. Available: http://doi.acm.org/10.1145/367766.368168[28] S. Warshall, “A theorem on boolean matrices,”
J. ACM , vol. 9, no. 1,pp. 11–12, Jan 1962. [Online]. Available: http://doi.acm.org/10.1145/321105.321107[29] D. B. Johnson, “Efficient algorithms for shortest paths in sparsenetworks,”
J. ACM , vol. 24, no. 1, pp. 1–13, Jan. 1977. [Online].Available: http://doi.acm.org/10.1145/321992.321993[30] R. Horn and C. Johnson,
Matrix Analysis . Cambridge University Press,2012.[31] V. Colizza, R. Pastor-Satorras, and A. Vespignani, “Reaction–diffusionprocesses and metapopulation models in heterogeneous networks,”
Na-ture Physics , vol. 3, no. 4, pp. 276–282, 2007.[32] B. of Economic Analysis, “Input-output accounts: the use of commodi-ties by industries before redefinitions,”