Co-Betweenness: A Pairwise Notion of Centrality

Abstract

Betweenness centrality is a metric that seeks to quantify a sense of the importance of a vertex in a network graph in terms of its "control" on the distribution of information along geodesic paths throughout that network. This quantity however does not capture how different vertices participate together in such control. In order to allow for the uncovering of finer details in this regard, we introduce here an extension of betweenness centrality to pairs of vertices, which we term co-betweenness, that provides the basis for quantifying various analogous pairwise notions of importance and control. More specifically, we motivate and define a precise notion of co-betweenness, we present an efficient algorithm for its computation, extending the algorithm of Brandes in a natural manner, and we illustrate the utilization of this co-betweenness on a handful of different communication networks. From these real-world examples, we show that the co-betweenness allows one to identify certain vertices which are not the most central vertices but which, nevertheless, act as important actors in the relaying and dispatching of information in the network.


Eric D. Kolaczyk, David B. Chua, and Marc Barthélemy
Dept. of Mathematics and Statistics, Boston University, Boston, MA, USA
Boston, MA
CEA-Centre d'Etudes de Bruyères-le-Châtel, Département de Physique Théorique et Appliquée, BP12, 91680 Bruyères-Le-Châtel, France
(Dated: November 11, 2018)


I. INTRODUCTION

In social network analysis, the problem of determining the importance of actors in a network has been studied for a long time (see, for example, [2]). It is in this context that the concept of the centrality of a vertex in a network emerged. Numerous measures have been proposed to quantify centrality numerically. They differ both in the underlying notion of vertex importance that they seek to capture and in the manner in which that notion is encoded through some functional of the network graph. See [3], for example, for a recent review and categorization of centrality measures.

Paths, as the routes by which flows (e.g., of information or commodities) travel over a network, are fundamental to the functioning of many networks. Not surprisingly, therefore, a number of centrality measures quantify importance with respect to the sharing of paths in the network. One popular measure is betweenness centrality. First introduced in its modern form by [4], the betweenness centrality is essentially a measure of how many geodesic (i.e., shortest) paths run over a given vertex. In a social network, for example, the betweenness centrality measures the extent to which an actor "lies between" other individuals in the network, with respect to the network path structure. As such, it is a measure of the control that actor has over the distribution of information in the network.

The betweenness centrality, like all other centrality measures of which we are aware, is defined specifically with respect to a single given vertex. In particular, vertex centralities produce an ordering of the vertices in terms of their individual importance, but do not provide insight into the manner in which vertices act together in the spread of information across the network. Insight of this kind can be important in presenting an appropriately more nuanced view of the roles of the different vertices, beyond their individual importance. A first natural extension of the idea of centrality in this manner is to pairs of vertices.

In this paper, we introduce such an extension, which we term the co-betweenness centrality, or simply the co-betweenness. The co-betweenness of two vertices is essentially a measure of how many geodesic paths are shared by the vertices, and as such provides us with a sense of the interplay of vertices across the network. For example, the co-betweenness alone quantifies the extent to which pairs of vertices jointly control the distribution of information in the network. Alternatively, a standardized version of co-betweenness produces a well-defined measure of correlation between flows over the two vertices. Finally, an alternative normalization quantifies the extent to which one vertex controls the distribution of information to another vertex.

This paper is organized as follows. In Section II, we briefly review necessary technical background. In Section III, we provide a precise definition for the co-betweenness and related measures, and motivate each in the context of an Internet communication network. An algorithm for the efficient computation of co-betweenness, for all pairs of vertices in a network, is sketched in Section IV, and its properties are discussed. In Section V, we further illustrate our measures using two social networks whose ties are reflective of communication. Some additional discussion is provided in Section VI. Finally, a formal description of our algorithm, as well as pseudo-code, may be found in the appendix.

II. BACKGROUND

Let $G = (\mathcal{V}, \mathcal{E})$ denote an undirected, connected network graph with $n_v$ vertices in $\mathcal{V}$ and $n_e$ edges in $\mathcal{E}$. A walk on $G$, from a vertex $v_0$ to another vertex $v_\ell$, is an alternating sequence of vertices and edges, say $\{v_0, e_1, v_1, \ldots, v_{\ell-1}, e_\ell, v_\ell\}$, where the endpoints of $e_i$ are $\{v_{i-1}, v_i\}$. The length of this walk is said to be $\ell$. A trail is a walk without repeated edges, and a path is a trail without repeated vertices. A shortest path between two vertices $u, v \in \mathcal{V}$ is a path between $u$ and $v$ whose length $\ell$ is a minimum. Such a path is also called a geodesic, and its length the geodesic distance between $u$ and $v$. In the case that the graph $G$ is weighted, i.e., there is a collection of edge weights $\{w_e\}_{e \in \mathcal{E}}$, where $w_e \ge 0$, shortest paths may instead be defined as paths for which the total sum of edge weights is a minimum. In the material that follows, we will restrict our exposition primarily to the case of unweighted graphs, but extensions to weighted graphs are straightforward. For additional background of this type, see, for example, the textbook [5].

Let $\sigma_{st}$ denote the total number of shortest paths that connect vertices $s$ and $t$ (with $\sigma_{ss} \equiv 1$), and let $\sigma_{st}(v)$ denote the number of shortest paths between $s$ and $t$ that also run over vertex $v$. Then we define the betweenness centrality of a vertex $v$ as a weighted sum of the number of paths through $v$,

$$B(v) = \sum_{s,t \in \mathcal{V} \setminus \{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}}. \tag{1}$$

Note that this definition excludes the shortest paths that start or end at $v$. However, in a connected graph we will have $\sigma_{st}(v) = \sigma_{st}$ whenever $s = v$ or $t = v$, so the exclusion amounts to removing a constant term that would otherwise be present in the betweenness centrality of every vertex.

As an illustration, which we will use throughout this section and the next, consider the network in Figure 1. This is the Abilene network, an Internet network that is part of the Internet2 project [?], a research project devoted to development of the 'next generation' Internet. It serves as a so-called 'backbone' network for universities and research labs across the United States, in a manner analogous to the federal highway system of roads. We use this network for illustration because, as a technological communication network, the notions of connectivity, information, flows, and paths are all explicit and physical, and hence facilitate our initial discussion of betweenness and co-betweenness. Later, in Section V, we will illustrate further with two communication networks from the social network literature.

The information traversing this network takes the form of so-called 'packets', and the packets flow between origins and destinations on this network along paths strictly determined according to a set of underlying routing protocols. (Technically, the Abilene network is more accurately described by a directed graph. But, given the fact that routing is typically symmetric in this network, we follow the Internet2 convention of displaying Abilene using an undirected graph.) A reasonable first approximation of the routing of information in this network is with respect to a set of unique shortest paths. In this case, the betweenness $B(v)$ of any given vertex $v \in \mathcal{V}$ will be exactly equal to the number of shortest paths through $v$.

FIG. 1: Graph representation of the physical topology of the Abilene network. Nodes represent regional network aggregation points (so-called 'Points-of-Presence' or PoPs), and are labeled according to their metropolitan area, while the edges represent systems of optical transportation technologies and routing devices.
[Node labels: Seattle, Sunnyvale, Los Angeles, Denver, Kansas City, Houston, Indianapolis, Chicago, Atlanta, Wash. DC, New York]

The vertices in Figure 1 correspond to metropolitan regions, and have been laid out roughly with respect to their true geographical locations. Intuitively, and according to earlier work on centrality in spatial networks [7], one might suspect that vertices near the central portion of the network, such as Denver or Indianapolis, have larger betweenness, being likely forced to support most of the flows of communication between east and west. We will see in Section III that such is indeed the case.

Until recently, standard algorithms for computing betweenness centralities $B(v)$ for all vertices in a network had $O(n_v^3)$ running times, which was a stumbling block to their application in large-scale network analyses. Faster algorithms now exist, such as those introduced in [1], which have running time of $O(n_v n_e)$ on unweighted networks and $O(n_v n_e + n_v^2 \log n_v)$ on weighted networks, with an $O(n_v + n_e)$ space requirement. These improvements derive from exploiting a clever recursive relation for the partial sums $\sum_{t \in \mathcal{V}} \sigma_{st}(v)/\sigma_{st}$. As we will see, the need for efficient algorithms is even more important in the case of the co-betweenness, and we will make similar use of recursions in developing an efficient algorithm for computing this quantity.

III. CO-BETWEENNESS

We extend the concept of vertex betweenness centrality to pairs of vertices $u$ and $v$ by letting $\sigma_{st}(u,v)$ denote the number of shortest paths between vertices $s$ and $t$ that pass through both $u$ and $v$, and defining the vertex co-betweenness as

$$C(u,v) = \sum_{s,t \in \mathcal{V} \setminus \{u,v\}} \frac{\sigma_{st}(u,v)}{\sigma_{st}}. \tag{2}$$

Thus co-betweenness gives us a measure of the number of shortest paths that run through both vertices $u$ and $v$.

FIG. 2: Graph representation of the betweenness and co-betweenness values for the Abilene network. Vertices are drawn in proportion to their betweenness. The width of each link is drawn in proportion to the co-betweenness of the two vertices incident to it.

To gain some insight into the relation between betweenness and co-betweenness, consider the following statistical perspective. Recall the Abilene network described in the previous section, and suppose that $x_{s,t}$ is a measure of the information (i.e., Internet packets) flowing between vertices $s$ and $t$ in the network. Similarly, let $y_v$ be the total information flowing through vertex $v$. Next, define $x$ to be the $n_p \times 1$ vector of the values $x_{s,t}$, where $n_p$ is the total number of pairs of vertices exchanging information, and $y$ to be the $n_v \times 1$ vector of the values $y_v$. A common expression modeling the relation between these two quantities is simply $y = Rx$, where $R$ is an $n_v \times n_p$ matrix (i.e., the so-called 'routing matrix') of 0's and 1's, indicating through which vertices each given routed path goes.

Now if $x$ is considered as a random vector with uncorrelated, unit-variance elements, then its covariance matrix is simply equal to the $n_p \times n_p$ identity matrix. The elements of $y$, however, will be correlated, and their covariance matrix takes the form $\Omega = RR^T$, by virtue of the linear relation between $y$ and $x$. Importantly, note that the diagonal elements of $\Omega$ are the betweenness values $B(v)$. Furthermore, the off-diagonal elements are the co-betweenness values $C(u,v)$. When shortest paths are not unique, the same results hold if the matrix $R$ is expanded so that each shortest path between a pair of vertices $s$ and $t$ is afforded a separate column, and the non-zero entries of each such column have the value $\sigma_{st}^{-1/2}$, rather than 1, so that the squared entries of the $\sigma_{st}$ columns for a pair sum to one. In this case, $R$ may be interpreted as a stochastic routing matrix.

To illustrate, in Figure 2, we show a network graph representation of the matrix $\Omega$ for the Abilene network. The vertices are again placed roughly with respect to their actual geographic location, but are now drawn in proportion to their betweenness. Edges between pairs of vertices now represent non-zero co-betweenness for the pair, and are also drawn with a thickness in proportion to their value. A number of interesting features are evident from this graph. First, we see that, as surmised earlier, the more centrally located vertices tend to have the largest betweenness values. And it is these vertices that typically are involved with the larger co-betweenness values. Since the paths going through both a vertex $u$ and a vertex $v$ are a subset of the paths going through either one or the other, this tendency for large co-betweenness to associate with large betweenness should not be a surprise. Also note that the co-betweenness values tend to be smaller between vertices separated by a larger geographical distance, which again seems intuitive.

Somewhat more surprising, perhaps, is the manner in which the network becomes disconnected. The Seattle vertex is now isolated, as no paths route through that vertex, only to and from it. Additionally, the vertices Houston, Atlanta, and Washington now form a separate component in this graph, indicating that information is routed on paths running through both the first two and the last two, but not through all three, and also not through any of these and some other vertex. Overall, one gets the impression of information being routed primarily over paths along the upper portion of the network in Figure 1.
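The relation between the routing matrix and the two centralities can be checked numerically on a small graph. The sketch below is illustrative only and is not the authors' implementation. It enumerates every geodesic by brute force, gives each one a column of $R$ with entries $\sigma_{st}^{-1/2}$ at its interior vertices, and accumulates $\Omega = RR^T$ entry by entry. The function names, the six-vertex example graph, and the choices to sum over ordered pairs $(s,t)$ and to leave path endpoints out of $R$ (so that the diagonal matches Eq. (1)) are assumptions made for this example.

```python
from collections import deque
from itertools import product
from math import sqrt


def shortest_paths(adj, s, t):
    """Return all shortest s-t paths (vertex lists) in an unweighted graph."""
    # Breadth-first search from s, recording every predecessor on a geodesic.
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                preds[w].append(v)
    if t not in dist:
        return []
    # Walk the predecessor lists back from t to enumerate the geodesics.
    paths, stack = [], [[t]]
    while stack:
        partial = stack.pop()
        if partial[-1] == s:
            paths.append(partial[::-1])
        else:
            stack.extend(partial + [p] for p in preds[partial[-1]])
    return paths


def routing_omega(adj):
    """Return Omega = R R^T as a dict keyed by vertex pairs (u, v)."""
    vertices = sorted(adj)
    omega = {(u, v): 0.0 for u, v in product(vertices, repeat=2)}
    for s, t in product(vertices, repeat=2):    # ordered pairs (assumption)
        if s == t:
            continue
        paths = shortest_paths(adj, s, t)
        for path in paths:                       # one column of R per geodesic
            weight = 1.0 / sqrt(len(paths))      # entry sigma_st^(-1/2)
            interior = path[1:-1]                # endpoints are left out of R
            for u, v in product(interior, repeat=2):
                omega[u, v] += weight * weight   # adds to (R R^T)[u, v]
    return omega


# Hypothetical example: six vertices, two tied geodesics between 0 and 5.
EXAMPLE = {0: [1], 1: [0, 2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
omega = routing_omega(EXAMPLE)
print(omega[1, 1], omega[1, 4])  # betweenness of 1, co-betweenness of (1, 4)
```

On this graph the diagonal entry for vertex 1 comes out as 9 and the entry for the pair (1, 4) as 2, up to floating-point rounding, which agrees with $B(1)$ and $C(1,4)$ computed by hand from (1) and (2) under the ordered-pair convention.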
The predominance of routing along the upper portion of the network in Figure 1 has also been observed in [8], using different techniques.

While the raw co-betweenness values appear to be quite informative, one can imagine contexts in which it would be useful to compare co-betweenness values across pairs of vertices in a manner that adjusts for the unequal betweenness of the participating vertices. The value

$$C_{corr}(u,v) = \frac{C(u,v)}{\sqrt{B(u)\,B(v)}} \tag{3}$$

is a natural candidate for a standardized version of the co-betweenness in (2), being simply the corresponding entry of the correlation matrix deriving from $\Omega = RR^T$.

Figure 3 shows a network graph representation of the quantities in $C_{corr}$ for the Abilene network, with edges again drawn in proportion to the values and vertices now naturally all drawn to be the same size. Much of this network looks like that in Figure 2. The one notable exception is that the values among the three vertices in the lower subgraph component are now of a similar order of magnitude to most of the values in the other component. This fact may be interpreted as indicating that among themselves, adjusting for the lower levels of information flowing through this part of the network, these vertices are as strongly 'correlated' as many of the others.

FIG. 3: Graph representation of the standardized co-betweenness values $C_{corr}$ for the Abilene network. Vertices are all drawn with equal size. Edge width is drawn in proportion to the standardized co-betweenness of the two vertices incident to it.

The co-betweenness may also be used to define a directed notion of the strength of pairwise relationships. Let

$$C(u \mid v) = \frac{C(u,v)}{B(v)} \tag{4}$$

denote the relative proportion of shortest paths through $v$ that also go through $u$. This quantity may be interpreted as a measure of the control that vertex $u$ has over the information that passes through vertex $v$. Alternatively, under uniqueness of shortest paths, if from among the set of shortest paths through $v$ one is chosen uniformly at random, the value $C(u \mid v)$ is the probability that the chosen path will also go through $u$. We call $C(u \mid v)$ the conditional betweenness of $u$, given $v$. Note that, in general, $C(u \mid v) \neq C(v \mid u)$.

FIG. 4: Directed graph representation of the conditional betweenness values $C(u \mid v)$ (given by Eq. (4)) for the Abilene network. Edges are drawn with width in proportion to their value of $C(u \mid v)$ and indicate how one vertex (at the head) controls the flow of information through another (at the tail).

Figure 4 shows a graph representation of the values $C(u \mid v)$ for the Abilene network. Due to the asymmetry of these values in $u$ and $v$, arcs are used, rather than edges, with an arc from $v$ to $u$ corresponding to $C(u \mid v)$. The thickness of the arcs is proportional to these values, and is therefore indicative of the control exercised on the vertex at the tail by the vertex at the head. For improved visualization, we have used a simple circular layout for the vertices. Examination of this figure shows symmetry in the relationships between some pairs of vertices, but a strong asymmetry between most others. For example, vertices like Indianapolis, which were seen previously to have a large betweenness, clearly exercise a strong degree of control over almost any other vertices with which they share paths. More interestingly, note that certain vertices that are neighbors in the original Abilene network have more symmetric relationships than others. The conditional betweenness values for Atlanta and Washington, DC, are fairly similar in magnitude, while those for Los Angeles and Sunnyvale are quite dissimilar, with the latter evidently exercising a noticeably greater degree of control over the former.

IV. COMPUTATION OF CO-BETWEENNESS

We discuss here the calculation of the co-betweenness values $C(u,v)$ in (2), for all pairs $(u,v)$, from which the other quantities in (3) and (4) follow trivially. At first glance, it would appear that an algorithm of $O(n_v^{[\ldots]})$ running time is necessary, given that the number of vertex pairs grows as the square of the number of vertices. Such an implementation would render co-betweenness infeasible to compute in any but network graphs of relatively modest size. However, exploiting ideas similar to those underlying the algorithms of [1] for calculating the betweenness values $B(v)$, a decidedly more efficient implementation may be obtained, as we now describe briefly. Details may be found in the appendix.

Our algorithm for computing co-betweenness involves a three-stage procedure for each source vertex $s \in \mathcal{V}$. In the first stage, we perform a breadth-first traversal of the network graph $G$, to quickly compute intermediary quantities such as $\sigma_{sv}$, the number of shortest paths from the source $s$ to each other vertex $v$ in the network; in the process we form a directed acyclic graph that contains all shortest paths leading from vertex $s$. In the second stage, we iterate through each vertex in order of decreasing distance from $s$ and compute a score $\delta_s(v)$ for each vertex that is related to its contribution to the co-betweenness. These contributions are then aggregated in a depth-first traversal of the directed acyclic graph, which is carried out in the third and final stage.

In order to compute the number of shortest paths $\sigma_{sv}$ in the first stage, we note that the number of shortest paths from $s$ to a vertex $v$ is the sum of the numbers of shortest paths to each parent of $v$ in the directed acyclic graph rooted at $s$, namely,

$$\sigma_{sv} = \sum_{t \in p_s(v)} \sigma_{st}, \tag{5}$$

where $p_s(v)$ denotes the set of parent vertices of $v$. In the case of an unweighted graph, this can be computed in the course of a breadth-first search with a running time of $O(n_e)$.

In the second stage, we compute $\delta_s(v)$ using the recursive relation established in Theorem 6 of [1],

$$\delta_s(v) = \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \bigl(1 + \delta_s(w)\bigr), \tag{6}$$

where $c_s(v)$ denotes the set of child vertices of $v$ in the directed acyclic graph rooted at $s$.

Finally, in the third stage, we compute the co-betweenness values by interpreting the relation

$$C(u,v) = \sum_{s \in \mathcal{V} \setminus \{u,v\}} \frac{\delta_s(v)}{\sigma_{sv}}\, \sigma_{sv}(u) \tag{7}$$

as assigning a contribution of $\delta_s(v)/\sigma_{sv}$ to $C(u,v)$ for each of the $\sigma_{sv}(u)$ shortest paths to $v$ that run through $u$. We accumulate these contributions at each step of the depth-first traversal: when we visit a vertex $v$, we add $\delta_s(v)/\sigma_{sv}$ to $C(u,v)$ for every ancestor $u$ of the current vertex $v$.

Our proposed algorithms exploit recursions analogous to those of [1] to produce run-times that are in the worst case $O(n_v^{[\ldots]})$, but in empirical studies were found to vary like $O(n_v n_e + n_v^{p} \log n_v)$ in general, or $O(n_v^{p} \log n_v)$ in the case of sparse graphs. Here $p$ is related to the total number of shortest paths in the network and seems to lie comfortably between 0. […] $O(n_v n_e + n_v \log n_v)$, and $O(n_v \log n_v)$ if the network is sparse as well as 'small-world' (i.e., with diameter of size $O(\log n_v)$). See the appendix for details.

V. ADDITIONAL ILLUSTRATIONS

We provide in this section additional illustrations of the use of co-betweenness, based on two other network graphs. Both graphs originally derive from social network analyses in which one goal was to understand the flow of certain information among actors.
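To experiment with these measures on small graphs such as the two considered in this section, the three-stage procedure of Section IV can be sketched in a few lines of Python. This is a minimal, unoptimized sketch under stated assumptions rather than the authors' Matlab implementation. The function and variable names are hypothetical, the graph is assumed to be unweighted and given as an adjacency dictionary, sums run over ordered source-target pairs, and the third stage simply walks every geodesic explicitly, so its cost grows with the number of shortest paths.

```python
from collections import defaultdict, deque
from math import sqrt


def betweenness_and_co_betweenness(adj):
    """Return (B, C) for an unweighted graph given as {vertex: neighbours}.

    C maps frozenset({u, v}) to the co-betweenness of the pair.
    """
    B = defaultdict(float)
    C = defaultdict(float)
    for s in adj:
        # Stage 1: breadth-first search from s. sigma[v] counts geodesics
        # from s to v (Eq. 5); children[v] lists v's children in the DAG.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        children = defaultdict(list)
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    children[v].append(w)
        # Stage 2: dependencies delta_s(v), farthest vertices first (Eq. 6).
        delta = defaultdict(float)
        for v in reversed(order):
            for w in children[v]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if v != s:
                B[v] += delta[v]
        # Stage 3: depth-first walk along every geodesic that leaves s. Each
        # arrival at v credits delta_s(v) / sigma_sv to C(u, v) for every
        # interior ancestor u on the current path (Eq. 7).
        stack = [(s, ())]
        while stack:
            v, ancestors = stack.pop()
            for u in ancestors:
                C[frozenset((u, v))] += delta[v] / sigma[v]
            below = () if v == s else ancestors + (v,)
            stack.extend((w, below) for w in children[v])
    return dict(B), dict(C)


def standardized(B, C, u, v):
    """Standardized co-betweenness, Eq. (3); needs B[u] and B[v] nonzero."""
    return C.get(frozenset((u, v)), 0.0) / sqrt(B[u] * B[v])


def conditional(B, C, u, v):
    """Conditional betweenness C(u | v), Eq. (4); needs B[v] nonzero."""
    return C.get(frozenset((u, v)), 0.0) / B[v]
```

As a usage example, take the hypothetical six-vertex graph `{0: [1], 1: [0, 2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}`. Calling `betweenness_and_co_betweenness` on it gives `B[1] = 9`, `B[2] = 4`, and a co-betweenness of 2 for the pair (1, 2). Then `conditional(B, C, 2, 1)` is 2/9 while `conditional(B, C, 1, 2)` is 1/2, a small instance of the asymmetry of the conditional betweenness.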

A. Michael's Strike Network

Our first illustration involves the strike dataset of [9], which is also analyzed in detail in Chapter 7 of [10]. New management took over at a forest products manufacturing facility, and this management team proposed certain changes to the compensation package of the workers. The changes were not accepted by the workers, and a strike ensued, which was then followed by a halt in negotiations. At the request of management, who felt that the information about their proposed changes was not being communicated adequately, an outside consultant analyzed the communication structure among 24 relevant actors.

The social network graph in Figure 5 represents the communication structure among these actors, with an edge between two actors indicating that they communicated at some minimally sufficient level of frequency about the strike. Three subgroups are present in the network: younger, Spanish-speaking employees (black vertices), younger, English-speaking employees (gray vertices), and older, English-speaking employees (white vertices). In addition, the two union negotiators, Sam and Wendle, are indicated by asterisks next to their names. It is these last two who were responsible for explaining the details of the proposed changes to the employees. When the structure of this network was revealed, two additional actors, Bob and Norm, were approached and had the changes explained to them. They then discussed the changes with their colleagues, and within two days the employees requested that their union representatives re-open negotiations. The strike was resolved soon thereafter.

FIG. 5: Original strike-group communication network of [9]. Three subgroups are represented in this network: younger, Spanish-speaking employees (black vertices), younger, English-speaking employees (gray vertices), and older, English-speaking employees (white vertices). The two union negotiators, Sam and Wendle, are indicated by asterisks next to their names. Edges indicate that the two incident actors communicated at some minimally sufficient level of frequency about the strike.
[Node labels: Domingo, Carlos, Alejandro, Eduardo, Bob, Mike, Ike, Hal, Gill, Frank, John, Lanny, Karl, Ozzie, Norm, Paul, Quint, Russ, Ted, Vern, Sam*, Wendle*, Xavier, Ultrecht]

That such a result could follow by targeting Bob and Norm is not entirely surprising, from the perspective of the network structure. Both are cut-vertices (i.e., their removal would disconnect the network), and are incident to edges serving as bridges (i.e., edges whose removal similarly would disconnect the network) from their respective groups to at least one of the other groups.

Co-betweenness provides a useful alternative characterization, one which explicitly emphasizes the patterns of communication in the network, as shown in Figure 6. As with Figure 2, vertices (now arranged in a circular layout) are drawn in proportion to their betweenness, and edges in proportion to their co-betweenness. Bob and Norm clearly have the largest betweenness values, followed by Alejandro, who we remark is also a cut-vertex, but incident to a bridge to a smaller subnetwork than the other two (i.e., four younger Spanish-speakers, in comparison to nine younger English-speakers and 11 older English-speakers, for Bob and Norm, respectively). The importance of these three actors to the communication process is evident from the distinct triangle formed by their large co-betweenness values. Note that for the two union representatives, the co-betweenness values suggest that Sam also plays a non-trivial role in facilitating communication, but that Wendle is not well-situated in this regard. In fact, Wendle is not even connected to the main component of the graph, since his betweenness is zero (as is also true for six other actors).

FIG. 6: Co-betweenness for the strike-group communication network. Actors located apart from the network, in the corners, are isolated under this representation, as they have zero betweenness and hence no co-betweenness with any other actors. (Note: Isolated vertices are drawn to have unit diameter, and not in proportion to their (zero) betweenness.)

A plot of the standardized co-betweenness $C_{corr}$ shows similar patterns overall, and we have therefore not included it here. The conditional betweenness $C(u \mid v)$ for this network primarily shows most of the actors with large arcs pointing to Bob and Norm, and much smaller arcs pointing in the opposite direction. This pattern further confirms the influence that these two actors can have on the other actors in the communication process. However, there are also some interesting asymmetrical relationships among the actors with smaller parts. For example, consider Figure 7, which shows the conditional betweenness among the older English-speaking employees. Ultrecht, for example, clearly has potential for a large amount of control on the communication of information passing through Russ, and similarly, Karl, on that through John.

FIG. 7: Conditional betweenness for the older English-speaking actors in the strike-group communication network.

FIG. 8: Karate club network of [11]. The gray vertices represent members of one of the two smaller clubs and the white vertices represent members who went to the other club. The edges are drawn with a width proportional to the number of situations in which the two members interacted.

B. Zachary's Karate Club Network

Our second illustration uses the karate club dataset of [11]. Over the course of a couple of years in the 1970s, Zachary collected information from the members of a university karate club, including the number of situations (both inside and outside of the club) in which interactions occurred between members. During the course of this study, there was a dispute between the club's administrator and the principal karate instructor. As a result, the club eventually split into two smaller clubs of approximately equal size, one centered around the administrator and the other centered around the instructor.

Figure 8 displays the network of social interactions between club members. The gray vertices represent members of one of the two smaller clubs and the white vertices represent members who went to the other club. The edges are drawn with a width proportional to the number of situations in which the two members interacted. The graph clearly shows that the original club was already polarized into two groups centered about actors 1 and 34, who were the key players in the dispute that split the club in two.

The co-betweenness for this network is shown in Figure 9. As in Figure 8, the layout is done using an energy minimization algorithm. Again, as in our other examples, the co-betweenness entries are dominated by a handful of larger values. As might be expected, actors 1 and 34, who were at the center of the dispute, have the largest betweenness centralities and are also involved in the largest co-betweenness values. More interesting, however, is the fact that these two actors have a large co-betweenness with each other, despite not being directly connected in the original network graph. This indicates that they are nevertheless involved in connecting a large number of other pairs, probably through key intermediaries such as actors 3 and 32. These latter two actors, while certainly not cut-vertices, nevertheless seem to operate like conduits between the two groups, quite likely due to their direct ties to both actor 1 and either of actors 33 and 34, the latter of which are both central to the group of white vertices. The co-betweenness for actors 1 and 32 is in fact the largest in the entire network.

FIG. 9: Co-betweenness for the karate club network. Actors in the upper-left and lower-right corners, separated from the connected component, are isolated due to zero betweenness. The two actors in the lower right-hand corner (i.e., a5 and a11) have non-zero betweenness, but are bridges, in the sense that they only serve to connect to other vertices, and hence have zero co-betweenness. (Note: The vertices for actors with zero betweenness are drawn to have unit diameter, for purposes of visibility.)

Also of potential interest are the 14 vertices that are isolated from the network in the co-betweenness representation. Some of these vertices, such as actor 8, have strong social interactions with certain other actors (i.e., with actors 1, 2, 3 and 4), but evidently play a peripheral role in the communication patterns of the network, as evidenced by their lack of betweenness. Alternatively, there are vertices like those representing actors 5 and 11, who have some betweenness centrality but nonetheless find themselves cut off from the connected component in the co-betweenness graph. An examination of the definition of the co-betweenness tells us that such vertices must be bridge-vertices, in the sense that they only serve to connect pairs of other vertices, i.e., they only occur in the middle of paths of length two.

VI. DISCUSSION

We introduced in this paper the notion of co-betweenness as a natural and interpretable metric for quantifying the interplay between pairs of vertices in a network graph. As we discussed in different real-world examples, this quantity has several interesting features. In particular, unlike the usual betweenness centrality, which orders the vertices according to their importance in the information flow on the network, the co-betweenness gives additional information about the flow structure and the correlations between different actors. Using this quantity, we were able to identify vertices which are not the most central ones, but which nevertheless play a very important role in relaying the information and which therefore appear as crucial vertices in the control of the information flow.

In principle, of course, one could continue to define higher-order analogues, involving three or more vertices at a time. But the computational requirements associated with calculating such analogues would soon become burdensome. In the case of triplets of vertices, one can expect algorithms analogous to those presented here to scale no better than O ( n v ). Additionally, we remark that, in keeping with the statistics analogy made in Section III, it is likely that the pairwise 'correlations' picked up by co-betweenness capture to a large extent the more important elements of vertex interplay in the network, with respect to shortest paths.

Following the tendencies in the statistical physics literature on complex networks [12, 13], it can be of some interest to explore the statistical properties of co-betweenness in large-scale networks. Some work in this direction may be found in [14], where co-betweenness and functions thereof were examined in the context of standard network graph models. The most striking properties discovered were certain basic scaling relations with distance between vertices.

On a final note, we point out that, while our discussion here has been focused on co-betweenness for pairs of vertices in unweighted graphs, we have also developed the analogous quantities and algorithms for vertex co-betweenness on weighted graphs and for edge co-betweenness on unweighted and weighted graphs. Also see [8], where a result is given relating edge betweenness to the eigenvalues of the edge-betweenness 'covariance' matrix, defined in analogy to the matrix Ω in Section III.

This appendix contains details specific to the proposed algorithm for computing co-betweenness, including a derivation of key expressions and a rough analysis of algorithmic complexity. The pseudo-code can be found at [15]. Actual software implementing our algorithm, written in the Matlab software environment, is available at [16].
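As a complement to the pseudo-code and Matlab software mentioned above, the following Python sketch illustrates how the dependency recursion (A26) and the re-expression (A17) from Appendix A can be combined to compute co-betweenness on a small undirected, unweighted graph. It is a simplified illustration under stated assumptions and is not the authors' implementation. The function names and the adjacency-dictionary input are hypothetical, the sum in (A14) is taken over ordered pairs (s, t), and a plain loop over vertex pairs (cubic in the number of vertices) stands in for the depth-first third stage analysed in Appendix B.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Distances d(s, .), shortest-path counts sigma_{s.}, and BFS order."""
    dist, sigma, order = {s: 0}, {s: 1}, []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:            # w discovered for the first time
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:   # w is a child of v
                sigma[w] += sigma[v]
    return dist, sigma, order


def dependencies(adj, dist, sigma, order):
    """delta_s(v) via the recursion (A26), visiting farthest vertices first."""
    delta = {v: 0.0 for v in order}
    for w in reversed(order):
        for v in adj[w]:
            if dist.get(v) == dist[w] - 1:   # v is a parent of w
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    return delta


def co_betweenness(adj):
    """C(u, v) for all unordered pairs, accumulated source by source as in (A17)."""
    nodes = list(adj)
    bfs = {s: bfs_counts(adj, s) for s in nodes}
    C = {frozenset(pair): 0.0 for pair in combinations(nodes, 2)}
    for s in nodes:
        dist, sigma, order = bfs[s]
        delta = dependencies(adj, dist, sigma, order)
        for u in order:
            for v in order:
                # Keep s outside the pair and order it so that u is nearer (A3);
                # equal distances give sigma_sv(u) = 0 and are skipped.
                if s in (u, v) or dist[u] >= dist[v]:
                    continue
                dist_u, sigma_u, _ = bfs[u]
                if dist[v] != dist[u] + dist_u[v]:
                    continue                 # u lies on no shortest s-v path
                sigma_sv_u = sigma[u] * sigma_u[v]                        # (A1)
                C[frozenset((u, v))] += delta[v] * sigma_sv_u / sigma[v]  # (A17)
    return C


# Hypothetical example: two parallel routes between vertices 1 and 4.
diamond = {0: [1], 1: [0, 2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
C = co_betweenness(diamond)
```

Storing one breadth-first search per source keeps the sketch short at the cost of quadratic memory; that is a convenience of the example rather than a property of the algorithm described in the paper.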

APPENDIX A: DERIVATION OF KEY EXPRESSIONS

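Central to our algorithm are the expressions in (6) and (7), the derivations for which we present here. Before doing so, however, we need to introduce some definitions and relations. First note that a simple combinatorial argument will show that

$$\sigma_{st}(v) = \begin{cases} \sigma_{sv}\,\sigma_{vt} & \text{if } d(s,t) = d(s,v) + d(v,t), \\ 0 & \text{otherwise,} \end{cases} \tag{A1}$$

and

$$\sigma_{st}(u,v) = \begin{cases} \sigma_{su}\,\sigma_{uv}\,\sigma_{vt} & \text{if } d(s,t) = d(s,u) + d(u,v) + d(v,t), \\ \sigma_{sv}\,\sigma_{vu}\,\sigma_{ut} & \text{if } d(s,t) = d(s,v) + d(v,u) + d(u,t), \\ 0 & \text{otherwise.} \end{cases} \tag{A2}$$

We assume that

$$d(s,u) \le d(s,v) \tag{A3}$$

for the remainder of this discussion.

The remaining quantities we need to introduce are notions of the path-dependency of vertices. In the spirit of [1], we define the "dependency" of vertices $s$ and $t$ on the vertex pair $(u,v)$ as

$$\delta_{st}(u,v) = \frac{\sigma_{st}(u,v)}{\sigma_{st}}, \tag{A4}$$

and we define the dependency of $s$ alone on the pair of vertices $(u,v)$ as

$$\delta_s(u,v) = \sum_{t \in \mathcal{V}\setminus\{u,v\}} \delta_{st}(u,v) = \sum_{t \in \mathcal{V}\setminus\{u,v\}} \frac{\sigma_{st}(u,v)}{\sigma_{st}}. \tag{A5}$$

Similarly, we define the pair-wise dependency of $s$ and $t$ on a single vertex $v$ as

$$\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}, \tag{A6}$$

and the dependency of $s$ alone on $v$ as

$$\delta_s(v) = \sum_{t \in \mathcal{V}\setminus\{v\}} \delta_{st}(v) = \sum_{t \in \mathcal{V}\setminus\{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}}. \tag{A7}$$

Note that unlike [1], we exclude $t = v$ from the sum in (A7). Two relations that follow immediately from these definitions, combined with (A1) and (A2), are

$$\sigma_{st}(u,v) = \sigma_{su}\,\sigma_{uv}\,\sigma_{vt} = \sigma_{sv}(u)\,\sigma_{vt} = \frac{\sigma_{sv}(u)}{\sigma_{sv}}\,\sigma_{sv}\,\sigma_{vt} = \delta_{sv}(u)\,\sigma_{st}(v), \tag{A8}$$

and

$$\delta_{st}(u,v) = \frac{\sigma_{st}(u,v)}{\sigma_{st}} = \frac{\delta_{sv}(u)\,\sigma_{st}(v)}{\sigma_{st}} = \delta_{sv}(u)\,\delta_{st}(v). \tag{A9}$$

These two relations allow us to show that

$$\begin{aligned}
\delta_s(u,v) &= \sum_{t \in \mathcal{V}\setminus\{u,v\}} \delta_{st}(u,v) && \text{(A10)} \\
&= \sum_{t \in \mathcal{V}\setminus\{u,v\}} \delta_{sv}(u)\,\delta_{st}(v) \quad \text{by (A9)} && \text{(A11)} \\
&= \delta_{sv}(u)\,\delta_s(v), && \text{(A12)}
\end{aligned}$$

since $\delta_{su}(v) = 0$ by (A3). Using Eq. (A8), we obtain

$$\delta_s(u,v) = \frac{\delta_s(v)}{\sigma_{sv}}\,\sigma_{sv}(u). \tag{A13}$$

We use this result to re-express the co-betweenness defined in (2) as

$$\begin{aligned}
C(u,v) &= \sum_{s,t \in \mathcal{V}\setminus\{u,v\}} \delta_{st}(u,v) && \text{(A14)} \\
&= \sum_{s \in \mathcal{V}\setminus\{u,v\}} \Bigg( \sum_{t \in \mathcal{V}\setminus\{u,v\}} \delta_{st}(u,v) \Bigg) && \text{(A15)} \\
&= \sum_{s \in \mathcal{V}\setminus\{u,v\}} \delta_s(u,v) && \text{(A16)} \\
&= \sum_{s \in \mathcal{V}\setminus\{u,v\}} \frac{\delta_s(v)}{\sigma_{sv}}\,\sigma_{sv}(u). && \text{(A17)}
\end{aligned}$$

Lastly, to establish the recursive relation in (6), note that for a child vertex $w \in c_s(v)$ every path to $v$ gives rise to exactly one path to $w$ by following the edge $(v,w)$. This means that

$$\sigma_{sw}(v) = \sigma_{sv} \quad \text{for } w \in c_s(v), \tag{A18}$$

and that

$$\delta_{sw}(v) = \frac{\sigma_{sw}(v)}{\sigma_{sw}} = \frac{\sigma_{sv}}{\sigma_{sw}} \quad \text{for } w \in c_s(v). \tag{A19}$$

Also note that for $t = w$ we have

$$\delta_{st}(w) = 1. \tag{A20}$$

This allows us to decompose $\delta_s(v)$ in essentially the same manner as [1], namely,

$$\begin{aligned}
\delta_s(v) &= \sum_{t \in \mathcal{V}\setminus\{v\}} \delta_{st}(v) && \text{(A21)} \\
&= \sum_{t \in \mathcal{V}\setminus\{v\}} \sum_{w \in c_s(v)} \delta_{st}(v,w) && \text{(A22)} \\
&= \sum_{w \in c_s(v)} \sum_{t \in \mathcal{V}\setminus\{v\}} \delta_{st}(v,w) && \text{(A23)} \\
&= \sum_{w \in c_s(v)} \sum_{t \in \mathcal{V}\setminus\{v\}} \delta_{sw}(v)\,\delta_{st}(w) \quad \text{by (A9)} && \text{(A24)} \\
&= \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \Bigg( \delta_{sw}(w) + \sum_{t \in \mathcal{V}\setminus\{v,w\}} \delta_{st}(w) \Bigg). && \text{(A25)}
\end{aligned}$$

Here the $t = w$ term of the inner sum in (A24) has been written out separately. Using (A19) and (A20), we then obtain

$$\delta_s(v) = \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \big( 1 + \delta_s(w) \big), \tag{A26}$$

where the last equality is due to the fact that, since $w$ is a child of $v$, we have $\sigma_{sv}(w) = 0$ and thus $\delta_{sv}(w) = 0$.

APPENDIX B: ALGORITHMIC COMPLEXITY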

Standard breadth-first search results put the running time for the first stage of our algorithm at $O(n_e)$, and since we touch each edge at most twice when we compute the dependency scores $\delta_s(v)$, the running time for the second stage is also $O(n_e)$. Since we repeat each stage for each vertex in the network, the first two stages have a running time of $O(n_v n_e)$. The running time for the depth-first traversal that occurs during the third stage depends on the number and length of all shortest paths in the network. Overall, we visit every shortest path once and compute a co-betweenness contribution for each edge of every shortest path. For 'small-world' networks, i.e., networks with an $O(\log n_v)$ diameter, we must compute $O(\sigma \cdot \log n_v)$ contributions, where $\sigma = \sum_{u,v \in \mathcal{V}} \sigma_{uv}$ is the total number of shortest paths in the network. So the overall running time for the algorithm is $O(n_v n_e + \sigma \log n_v)$. Empirical evidence suggests that the upper bound for the average $\frac{1}{|\mathcal{V}|} \sum_{u \in \mathcal{V}} \sigma_{uv}$ ranges from n . v to n . v for common random graph models, and at worst has been seen to reach n . v in the case of a network of airports. (In the latter case, there were extreme fluctuations in $\frac{1}{|\mathcal{V}|} \sum_{u \in \mathcal{V}} \sigma_{uv}$, so the total number of shortest paths, $\sigma$, might be much smaller than $n_v(n_v - 1)$ times this upper bound.) This suggests a running time of $O(n_v n_e + n_v^{p} \log n_v)$, though it is an open question to show this rigorously. In the case of sparse networks, where $n_e \sim n_v$, this reduces to a running time of $O(n_v^{p} \log n_v)$.

[1] U. Brandes, Journal of Mathematical Sociology, 163 (2001).
[2] S. Wasserman and K. Faust, Social Network Analysis: Methods and applications, Cambridge University Press (1994).
[3] S.P. Borgatti, M.G. Everett, Social Networks, 466-484 (2006).
[4] L.C. Freeman, Sociometry, 35-41 (1977).
[5] J. Clark and D.A. Holton, A first look at graph theory, World Scientific (1991).
[6]
[7] A. Barrat, M. Barthélemy, A. Vespignani, J. Stat. Mech. (2005) P05003.
[8] D.B. Chua, E.D. Kolaczyk, M. Crovella, IEEE Journal on Selected Areas in Communications, Special issue on 'Sampling the Internet', 2263-2272 (2006).
[9] J.H. Michael, Forest Products Journal, 41-45 (1997).
[10] W. de Nooy, A. Mrvar, V. Batagelj, Exploratory Social Network Analysis with Pajek, Cambridge University Press (Cambridge, UK, 2005).
[11] W. Zachary, Journal of Anthropological Research, 452-473 (1977).
[12] R. Albert and A.-L. Barabási, Rev. Mod. Phys., 47 (2000).
[13] R. Pastor-Satorras and A. Vespignani, Evolution and structure of the Internet: A statistical physics approach (Cambridge University Press, Cambridge, 2003).
[14] D.B. Chua, PhD thesis (2007).
[15] http://math.bu.edu/people/kolaczyk/pubs/ChuaThesis/
[16]