Improving information centrality of a node in complex networks by adding edges ∗

Liren Shan, Yuhao Yi, Zhongzhi Zhang
Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433
arXiv [cs.SI]
Abstract
The problem of increasing the centrality of a network node arises in many practical applications. In this paper, we study the optimization problem of maximizing the information centrality $I_v$ of a given node $v$ in a network with $n$ nodes and $m$ edges, by creating $k$ new edges incident to $v$. Since $I_v$ is proportional to the reciprocal of the sum $\mathcal{R}_v$ of the resistance distances between $v$ and all nodes, we alternatively consider the problem of minimizing $\mathcal{R}_v$ by adding $k$ new edges linked to $v$. We show that the objective function is monotone and supermodular. We provide a simple greedy algorithm with an approximation factor $\left(1 - \frac{1}{e}\right)$ and $O(n^3)$ running time. To speed up the computation, we also present an algorithm that computes a $\left(1 - \frac{1}{e} - \epsilon\right)$-approximate resistance distance $\mathcal{R}_v$ after iteratively adding $k$ edges, the running time of which is $\widetilde{O}(mk\epsilon^{-2})$ for any $\epsilon > 0$, where the $\widetilde{O}(\cdot)$ notation suppresses $\mathrm{poly}(\log n)$ factors. We experimentally demonstrate the effectiveness and efficiency of our proposed algorithms.

Centrality metrics are indicators of the varying importance of nodes in complex networks [Lü et al., 2016]. They have become a powerful tool in network analysis and have found wide application in network science [Newman, 2010]. Over the past years, a great number of centrality indices and corresponding algorithms have been proposed to analyze and understand the roles of nodes in networks [White and Smyth, 2003; Boldi and Vigna, 2014]. Among the various centrality indices, betweenness centrality and closeness centrality are probably the two most frequently used, especially in social network analysis. However, both indicators consider only shortest paths and exclude the contributions of longer paths.
To overcome the drawback of these two measures, current flow closeness centrality [Brandes and Fleischer, 2005; Newman, 2005] was introduced and proved to be exactly the information centrality [Stephenson and Zelen, 1989], which counts all possible paths between nodes and has better discriminating power than betweenness centrality [Newman, 2005] and closeness centrality [Bergamini et al., 2016].

∗ This work was supported by NSF.

It is recognized that centrality measures are of great significance in complex networks. Having high centrality can have positive consequences for the node itself. In this paper, we consider the problem of adding a given number of edges incident to a designated node $v$ so as to maximize the centrality of $v$. Our main motivation for studying this problem is that it has several application scenarios, including airport networks [Ishakian et al., 2012] and recommendation systems [Parotsidis et al., 2016], among others. For example, in airport networks, a node (airport) has an incentive to improve its centrality (transportation capacity) as much as possible by adding edges (direct flights) connecting itself to other nodes (airports) [Ishakian et al., 2012]. Another example is the link recommendation problem of recommending to a user $v$ a given number of links from a set of candidate nonexistent links incident to $v$ in order to minimize the shortest distance from $v$ to other nodes [Parotsidis et al., 2016].

The problem of maximizing the centrality of a specific target node by adding edges incident to it has been widely studied. For example, some authors have studied the problem of creating $k$ edges linked to a node $v$ so that the centrality value of $v$ with respect to the centrality measure of interest is maximized, e.g., betweenness centrality [Crescenzi et al., 2015; D'Angelo et al., 2016; Crescenzi et al., 2016; Hoffmann et al., 2018] and closeness centrality [Crescenzi et al., 2015; Hoffmann et al., 2018].
Similar optimization problems for a predefined node $v$ were also addressed for other node centrality metrics, including the average shortest distance between $v$ and the remaining nodes [Meyerson and Tagiku, 2009; Parotsidis et al., 2016], the largest distance from $v$ to other nodes [Demaine and Zadimoghaddam, 2010], PageRank [Avrachenkov and Litvak, 2006; Olsen, 2010], and the number of different paths containing $v$ [Ishakian et al., 2012]. However, previous works do not consider improving the information centrality of a node by adding new edges linked to it, despite the fact that it distinguishes different nodes better than betweenness [Newman, 2005] and closeness centrality [Bergamini et al., 2016].

In this paper, we study the following problem: given a graph with $n$ nodes and $m$ edges, how to create $k$ new edges incident to a designated node $v$ so that the information centrality $I_v$ of $v$ is maximized. Since $I_v$ is proportional to the reciprocal of the sum $\mathcal{R}_v$ of the resistance distances between $v$ and all nodes, we reduce the problem to minimizing $\mathcal{R}_v$ by introducing $k$ edges connected to $v$. We demonstrate that the objective function is monotone and supermodular. To minimize the resistance distance $\mathcal{R}_v$, we present two greedy approximation algorithms that introduce the $k$ edges one by one. The former is a $\left(1 - \frac{1}{e}\right)$-approximation algorithm with $O(n^3)$ time complexity, while the latter is a $\left(1 - \frac{1}{e} - \epsilon\right)$-approximation algorithm with $\widetilde{O}(mk\epsilon^{-2})$ time complexity, where the $\widetilde{O}(\cdot)$ notation hides $\mathrm{poly}(\log n)$ factors. We test our algorithms on several model and real networks; they substantially increase the information centrality score of a given node and outperform several other edge-addition strategies.

Consider a connected undirected weighted network $G = (V, E, w)$, where $V$ is the set of nodes, $E \subseteq V \times V$ is the set of edges, and $w : E \to \mathbb{R}_+$ is the edge weight function. We use $w_{\max}$ to denote the maximum edge weight.
Let $n = |V|$ denote the number of nodes and $m = |E|$ the number of edges. For a pair of adjacent nodes $u$ and $v$, we write $u \sim v$ to denote $(u, v) \in E$. The Laplacian matrix of $G$ is the symmetric matrix $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{A}$ is the weighted adjacency matrix of the graph and $\mathbf{D}$ is the diagonal degree matrix.

Let $\mathbf{e}_i$ denote the $i$th standard basis vector, and let $\mathbf{b}_{u,v} = \mathbf{e}_u - \mathbf{e}_v$. We fix an arbitrary orientation for all edges in $G$. For each edge $e \in E$, we define $\mathbf{b}_e = \mathbf{b}_{u,v}$, where $u$ and $v$ are the head and tail of $e$, respectively. It is easy to verify that $\mathbf{L} = \sum_{e \in E} w(e)\, \mathbf{b}_e \mathbf{b}_e^\top$, where $w(e)\, \mathbf{b}_e \mathbf{b}_e^\top$ is the Laplacian of $e$. $\mathbf{L}$ is singular and positive semidefinite. Its pseudoinverse $\mathbf{L}^\dagger$ is $\left(\mathbf{L} + \frac{1}{n}\mathbf{J}\right)^{-1} - \frac{1}{n}\mathbf{J}$, where $\mathbf{J}$ is the matrix with all entries equal to one.

For a network $G = (V, E, w)$, the resistance distance [Klein and Randić, 1993] between two nodes $u, v$ is $R_{uv} = \mathbf{b}_{u,v}^\top \mathbf{L}^\dagger \mathbf{b}_{u,v}$. The resistance distance $\mathcal{R}_v$ of a node $v$ is the sum of the resistance distances between $v$ and all nodes in $V$, that is, $\mathcal{R}_v = \sum_{u \in V} R_{uv}$, which can be expressed in terms of the entries of $\mathbf{L}^\dagger$ as [Bozzo and Franceschet, 2013]

$$\mathcal{R}_v = n\, \mathbf{L}^\dagger_{vv} + \operatorname{Tr}\left(\mathbf{L}^\dagger\right). \tag{1}$$

Let $\mathbf{L}_v$ denote the submatrix of the Laplacian $\mathbf{L}$ obtained by deleting the row and column corresponding to node $v$. For a connected graph $G$, $\mathbf{L}_v$ is invertible for any node $v$, and the resistance distance $R_{uv}$ between $v$ and another node $u$ is equal to $\left(\mathbf{L}_v^{-1}\right)_{uu}$ [Izmailian et al., 2013]. Thus, we have

$$\mathcal{R}_v = \operatorname{Tr}\left(\mathbf{L}_v^{-1}\right). \tag{2}$$

The resistance distance $\mathcal{R}_v$ can be used as a measure of the efficiency of node $v$ in transmitting information to other nodes, and is closely related to the information centrality introduced by Stephenson and Zelen to measure the importance of nodes in social networks [Stephenson and Zelen, 1989]. The information $I_{uv}$ transmitted between $u$ and $v$ is defined as

$$I_{uv} = \frac{1}{\mathbf{B}^{-1}(u, u) + \mathbf{B}^{-1}(v, v) - 2\,\mathbf{B}^{-1}(u, v)},$$

where $\mathbf{B} = \mathbf{L} + \mathbf{J}$.
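As a small numerical check of Eqs. (1) and (2), the following Python sketch evaluates $\mathcal{R}_v$ both ways on a toy graph and confirms that the two expressions agree. It is an illustration only: the helper names (`laplacian`, `inverse`, `pseudoinverse`, `node_resistance_pinv`, `node_resistance_submatrix`) are hypothetical, and the use of exact rational arithmetic with dense Gauss-Jordan elimination is an assumption made for readability that is suitable only for very small graphs.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Weighted Laplacian L = D - A; edges are (u, v, weight) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L

def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # nonzero pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def pseudoinverse(L):
    """L^+ = (L + J/n)^{-1} - J/n, valid for a connected graph."""
    n = len(L)
    s = Fraction(1, n)
    inv = inverse([[L[i][j] + s for j in range(n)] for i in range(n)])
    return [[inv[i][j] - s for j in range(n)] for i in range(n)]

def node_resistance_pinv(L, v):
    """Eq. (1): R_v = n * (L^+)_vv + Tr(L^+)."""
    n = len(L)
    Lp = pseudoinverse(L)
    return n * Lp[v][v] + sum(Lp[i][i] for i in range(n))

def node_resistance_submatrix(L, v):
    """Eq. (2): R_v = Tr(L_v^{-1}), where L_v drops row and column v."""
    keep = [i for i in range(len(L)) if i != v]
    inv = inverse([[L[i][j] for j in keep] for i in keep])
    return sum(inv[i][i] for i in range(len(keep)))

# Toy example: unit-weight path 0 - 1 - 2 - 3, where R_uv is the hop distance.
L = laplacian(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
for v in range(4):
    assert node_resistance_pinv(L, v) == node_resistance_submatrix(L, v)
print(node_resistance_pinv(L, 0))  # 6, i.e. 1 + 2 + 3
```

On a tree with unit weights the resistance distance coincides with the hop distance, which is why the end node of the path has $\mathcal{R}_0 = 1 + 2 + 3$.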
The information centrality $I_v$ of node $v$ is the harmonic mean of $I_{uv}$ over all nodes $u$ [Stephenson and Zelen, 1989].

Definition 2.1 For a connected graph $G = (V, E, w)$, the information centrality $I_v$ of a node $v \in V$ is defined as

$$I_v = \frac{n}{\sum_{u \in V} 1 / I_{uv}}.$$

It was shown [Brandes and Fleischer, 2005] that

$$I_v = \frac{n}{\mathcal{R}_v}. \tag{3}$$

We next introduce some notation and tools that are convenient for describing our algorithms, including $\epsilon$-approximation and supermodular functions.

Let $a, b \ge 0$ be two nonnegative scalars. We say $a$ is an $\epsilon$-approximation [Peng and Spielman, 2014] of $b$ if $\exp(-\epsilon)\, a \le b \le \exp(\epsilon)\, a$. Hereafter, we write $a \approx_\epsilon b$ to indicate that $a$ is an $\epsilon$-approximation of $b$.

Let $X$ be a finite set, and let $2^X$ be the set of all subsets of $X$. Let $f : 2^X \to \mathbb{R}$ be a set function on $X$. We say $f(\cdot)$ is supermodular if, for any subsets $S \subset T \subset X$ and any element $a \in X \setminus T$, it satisfies $f(S) - f(S \cup \{a\}) \ge f(T) - f(T \cup \{a\})$. A function $f(\cdot)$ is submodular if $-f(\cdot)$ is supermodular. A set function $f : 2^X \to \mathbb{R}$ is called monotone decreasing if $f(S) > f(T)$ holds for any subsets $S \subset T \subset X$.

For a connected undirected weighted network $G = (V, E, w)$ and a set $S$ of weighted edges not in $E$, we use $G(S)$ to denote the network obtained by adding the edges in $S$ to $G$, i.e., $G(S) = (V, E \cup S, w')$, where $w' : E \cup S \to \mathbb{R}_+$ is the new weight function. Let $\mathbf{L}(S)$ denote the Laplacian matrix of $G(S)$. Note that the information centrality of a node depends on the graph topology. If we augment a graph by adding a set of edges $S$, the information centrality of a node will change. Moreover, adding edges incident to some node $v$ can only increase its information centrality [Doyle and Snell, 1984].

Assume that there is a set of nonexistent edges incident to a particular node $v$, each with a given weight. We denote this candidate edge set by $E_v$. Consider choosing a subset $S$ of $k$ edges from the candidate set $E_v$ to augment the network so that the information centrality of node $v$ is maximized.
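To make Definition 2.1 and the candidate edge set $E_v$ concrete, the sketch below computes $I_v$ from $\mathbf{B} = \mathbf{L} + \mathbf{J}$ on a five-node path and then evaluates every single candidate edge incident to $v$. It is a toy illustration under stated assumptions: the function names (`laplacian`, `inverse`, `info_centrality`) are hypothetical, all weights are set to one, and exact dense rational arithmetic is used purely for clarity.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Weighted Laplacian L = D - A; edges are (u, v, weight) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L

def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # nonzero pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def info_centrality(n, edges, v):
    """Definition 2.1: I_v = n / sum_u (1 / I_uv), with B = L + J."""
    L = laplacian(n, edges)
    C = inverse([[L[i][j] + 1 for j in range(n)] for i in range(n)])  # B^{-1}
    # 1 / I_uv = C_uu + C_vv - 2 C_uv (zero for u == v), i.e. the distance R_uv.
    total = sum(C[u][u] + C[v][v] - 2 * C[u][v] for u in range(n))
    return Fraction(n) / total

# Unit-weight path 0 - 1 - 2 - 3 - 4 with target node v = 0.
n, v = 5, 0
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)]

# Candidate set E_v: all nonexistent edges incident to v, here with unit weight.
neighbors = {b if a == v else a for a, b, _ in edges if v in (a, b)}
candidates = [(v, u, 1) for u in range(n) if u != v and u not in neighbors]

base = info_centrality(n, edges, v)
scores = {e: info_centrality(n, edges + [e], v) for e in candidates}
best = max(scores, key=scores.get)

assert all(s > base for s in scores.values())  # adding an edge never hurts v
print(base, best, scores[best])
```

In this example $\mathcal{R}_0 = 1 + 2 + 3 + 4 = 10$, so $I_0 = 5/10$; closing the path into a cycle with the edge $(0, 4)$ reduces $\mathcal{R}_0$ to $4$ and is the best single choice.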
Let $I_v(S)$ denote the information centrality of node $v$ in the augmented network. We define the following set function optimization problem:

$$\underset{S \subset E_v,\ |S| = k}{\text{maximize}} \quad I_v(S). \tag{4}$$

Since the information centrality $I_v$ of a node $v$ is proportional to the reciprocal of $\mathcal{R}_v$, the optimization problem (4) is equivalent to the following problem:

$$\underset{S \subset E_v,\ |S| = k}{\text{minimize}} \quad \mathcal{R}_v(S), \tag{5}$$

where $\mathcal{R}_v(S)$ is the resistance distance of $v$ in the augmented network $G(S)$. When there is no ambiguity, we write $\mathcal{R}(S)$ instead of $\mathcal{R}_v(S)$ for simplicity.

Let $2^{E_v}$ denote the set of all subsets of $E_v$. Then the resistance distance of node $v$ in the augmented network can be represented as a set function $\mathcal{R} : 2^{E_v} \to \mathbb{R}$. To provide effective algorithms for the problems defined above, we next prove that the resistance distance of $v$ is a supermodular function.

Rayleigh's monotonicity law [Doyle and Snell, 1984] shows that the resistance distance between any pair of nodes can only decrease when edges are added. Then, we have the following theorem.

Theorem 4.1 $\mathcal{R}(S)$ is a monotonically decreasing function of the set of edges $S$. That is, for any subsets $S \subset T \subset E_v$, $\mathcal{R}(T) < \mathcal{R}(S)$.

We then prove the supermodularity of the objective function $\mathcal{R}(S)$.

Theorem 4.2 $\mathcal{R}(S)$ is supermodular. For any sets $S \subset T \subset E_v$ and any edge $e \in E_v \setminus T$, $\mathcal{R}(T) - \mathcal{R}(T \cup \{e\}) \le \mathcal{R}(S) - \mathcal{R}(S \cup \{e\})$.

Proof.
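Suppose that edge $e$ connects two nodes $u$ and $v$; then $\mathbf{L}(S \cup \{e\})_v = \mathbf{L}(S)_v + w(e)\,\mathbf{E}_{uu}$, where $\mathbf{E}_{uu}$ is a square matrix whose $u$th diagonal entry is one and whose other entries are all zero. By (2), it suffices to prove that

$$\operatorname{Tr}\left(\mathbf{L}(T)_v^{-1}\right) - \operatorname{Tr}\left((\mathbf{L}(T)_v + w(e)\mathbf{E}_{uu})^{-1}\right) \le \operatorname{Tr}\left(\mathbf{L}(S)_v^{-1}\right) - \operatorname{Tr}\left((\mathbf{L}(S)_v + w(e)\mathbf{E}_{uu})^{-1}\right).$$

Since $S$ is a subset of $T$, $\mathbf{L}(T)_v = \mathbf{L}(S)_v + \mathbf{P}$, where $\mathbf{P}$ is a nonnegative diagonal matrix. For simplicity, in the following proof we use $\mathbf{M}$ to denote the matrix $\mathbf{L}(S)_v$. Then, we only need to prove

$$\operatorname{Tr}\left((\mathbf{M} + \mathbf{P})^{-1}\right) - \operatorname{Tr}\left(\mathbf{M}^{-1}\right) \le \operatorname{Tr}\left((\mathbf{M} + \mathbf{P} + w(e)\mathbf{E}_{uu})^{-1}\right) - \operatorname{Tr}\left((\mathbf{M} + w(e)\mathbf{E}_{uu})^{-1}\right).$$

Define the function $f(t)$, $t \in [0, \infty)$, as

$$f(t) = \operatorname{Tr}\left((\mathbf{M} + \mathbf{P} + t\mathbf{E}_{uu})^{-1}\right) - \operatorname{Tr}\left((\mathbf{M} + t\mathbf{E}_{uu})^{-1}\right).$$

The above inequality reads $f(0) \le f(w(e))$, so it holds if $f(t)$ takes its minimum value at $t = 0$. We next show that $f(t)$ is an increasing function by proving $\frac{df(t)}{dt} \ge 0$. Using the matrix derivative formula

$$\frac{d}{dt} \operatorname{Tr}\left(\mathbf{A}(t)^{-1}\right) = -\operatorname{Tr}\left(\mathbf{A}(t)^{-1} \frac{d\mathbf{A}(t)}{dt} \mathbf{A}(t)^{-1}\right),$$

we can differentiate $f(t)$ as

$$\begin{aligned}
\frac{df(t)}{dt} &= -\operatorname{Tr}\left((\mathbf{M} + \mathbf{P} + t\mathbf{E}_{uu})^{-1} \mathbf{E}_{uu} (\mathbf{M} + \mathbf{P} + t\mathbf{E}_{uu})^{-1}\right) + \operatorname{Tr}\left((\mathbf{M} + t\mathbf{E}_{uu})^{-1} \mathbf{E}_{uu} (\mathbf{M} + t\mathbf{E}_{uu})^{-1}\right) \\
&= -\operatorname{Tr}\left(\mathbf{E}_{uu} (\mathbf{M} + \mathbf{P} + t\mathbf{E}_{uu})^{-2}\right) + \operatorname{Tr}\left(\mathbf{E}_{uu} (\mathbf{M} + t\mathbf{E}_{uu})^{-2}\right) \\
&= -\left((\mathbf{M} + \mathbf{P} + t\mathbf{E}_{uu})^{-2}\right)_{uu} + \left((\mathbf{M} + t\mathbf{E}_{uu})^{-2}\right)_{uu}.
\end{aligned}$$

Let $\mathbf{N} = \mathbf{M} + t\mathbf{E}_{uu}$, and let $\mathbf{Q}$ be a nonnegative diagonal matrix with exactly one positive diagonal entry $\mathbf{Q}_{hh} > 0$ and all other entries equal to zero. We now prove that $\left(\mathbf{N}^{-1}\right)_{ij} \ge \left((\mathbf{N} + \mathbf{Q})^{-1}\right)_{ij}$ for $1 \le i, j \le n - 1$. Using the Sherman-Morrison formula [Meyer, 1973], we have

$$\mathbf{N}^{-1} - (\mathbf{N} + \mathbf{Q})^{-1} = \frac{\mathbf{Q}_{hh}\, \mathbf{N}^{-1} \mathbf{e}_h \mathbf{e}_h^\top \mathbf{N}^{-1}}{1 + \mathbf{Q}_{hh}\, \mathbf{e}_h^\top \mathbf{N}^{-1} \mathbf{e}_h}.$$

Since $\mathbf{N}$ is an M-matrix, every entry of $\mathbf{N}^{-1}$ is positive [Plemmons, 1977], and so is every entry of $\mathbf{N}^{-1} \mathbf{e}_h \mathbf{e}_h^\top \mathbf{N}^{-1}$.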
In addition, the denominator $1 + \mathbf{Q}_{hh}\, \mathbf{e}_h^\top \mathbf{N}^{-1} \mathbf{e}_h$ is also positive, because $\mathbf{N}$ is positive definite. Therefore, $\mathbf{N}^{-1} - (\mathbf{N} + \mathbf{Q})^{-1}$ is a positive matrix, that is, all of its entries are greater than zero.

By repeatedly applying the above process to each positive diagonal entry of $\mathbf{P}$, we conclude that $\mathbf{N}^{-1} - (\mathbf{N} + \mathbf{P})^{-1}$ is entrywise nonnegative. Because both inverses are themselves entrywise positive, the same entrywise ordering carries over to their squares. Thus,

$$\frac{df(t)}{dt} = -\left((\mathbf{N} + \mathbf{P})^{-2}\right)_{uu} + \left(\mathbf{N}^{-2}\right)_{uu} \ge 0,$$

which completes the proof. □

Theorems 4.1 and 4.2 indicate that the objective function of problem (5) is monotone and supermodular. Thus, a simple greedy algorithm is sufficient to approximate problem (5) with provable optimality bounds. In the greedy algorithm, the augmented edge set $S$ is initially empty. Then $k$ edges from the candidate set $E_v$ are iteratively added to $S$. At each iteration, an edge $e_i$ in the candidate edge set is selected to maximize $\mathcal{R}(S) - \mathcal{R}(S \cup \{e_i\})$. The algorithm terminates when $|S| = k$.

According to (1), the resistance distance $\mathcal{R}_v$ is equal to $n\, \mathbf{L}^\dagger_{vv} + \operatorname{Tr}(\mathbf{L}^\dagger)$. A naive algorithm that recomputes the pseudoinverse for every candidate edge in every round requires $O(k |E_v| n^3)$ time, which is prohibitively expensive. Below we show that the computation cost can be reduced to $O(n^3)$ by using the Sherman-Morrison formula [Meyer, 1973].

Lemma 5.1
For a connected weighted graph $G = (V, E, w)$ with weighted Laplacian matrix $\mathbf{L}$, let $e$ be a nonexistent edge with given weight $w(e)$ incident to node $v$. Then,

$$\left(\mathbf{L}(\{e\})\right)^\dagger = \left(\mathbf{L} + w(e)\, \mathbf{b}_e \mathbf{b}_e^\top\right)^\dagger = \mathbf{L}^\dagger - \frac{w(e)\, \mathbf{L}^\dagger \mathbf{b}_e \mathbf{b}_e^\top \mathbf{L}^\dagger}{1 + w(e)\, \mathbf{b}_e^\top \mathbf{L}^\dagger \mathbf{b}_e}.$$

For a candidate edge $e$ not yet added to $S$, let $\mathcal{R}_v^{\Delta}(e) = \mathcal{R}(S) - \mathcal{R}(S \cup \{e\})$. Lemma 5.1 and (1) lead to the following result.

Lemma 5.2 Let $G = (V, E, w)$ be a connected weighted graph with weighted Laplacian matrix $\mathbf{L}$. Let $e \notin E$ be a candidate edge with given weight $w(e)$ incident to node $v$. Then,

$$\mathcal{R}_v^{\Delta}(e) = \frac{w(e) \left( n \left(\mathbf{L}^\dagger \mathbf{b}_e \mathbf{b}_e^\top \mathbf{L}^\dagger\right)_{vv} + \operatorname{Tr}\left(\mathbf{L}^\dagger \mathbf{b}_e \mathbf{b}_e^\top \mathbf{L}^\dagger\right) \right)}{1 + w(e)\, \mathbf{b}_e^\top \mathbf{L}^\dagger \mathbf{b}_e}. \tag{6}$$

Lemma 5.2 yields a simple greedy algorithm ExactSM$(G, v, E_v, k)$, as outlined in Algorithm 1. The first step of this algorithm is to compute the pseudoinverse of $\mathbf{L}$, which takes $O(n^3)$ time. The algorithm then works in $k$ rounds, each involving computations and updates with time complexity $O(n^2)$. Thus, the total running time of Algorithm 1 is $O(n^3)$.

Algorithm 1: ExactSM$(G, v, E_v, k)$
Input: A connected graph $G$; a node $v \in V$; a candidate edge set $E_v$; an integer $k \le |E_v|$
Output: A subset $S \subset E_v$ with $|S| = k$
1 Initialize solution $S = \emptyset$
2 Compute $\mathbf{L}^\dagger$
3 for $i = 1$ to $k$ do
4     Compute $\mathcal{R}_v^{\Delta}(e)$ for each $e \in E_v \setminus S$
5     Select $e_i$ s.t. $e_i \leftarrow \arg\max_{e \in E_v \setminus S} \mathcal{R}_v^{\Delta}(e)$
6     Update solution $S \leftarrow S \cup \{e_i\}$
7     Update the graph $G \leftarrow G(V, E \cup \{e_i\})$
8     Update $\mathbf{L}^\dagger \leftarrow \mathbf{L}^\dagger - \dfrac{w(e_i)\, \mathbf{L}^\dagger \mathbf{b}_{e_i} \mathbf{b}_{e_i}^\top \mathbf{L}^\dagger}{1 + w(e_i)\, \mathbf{b}_{e_i}^\top \mathbf{L}^\dagger \mathbf{b}_{e_i}}$
9 return $S$

Moreover, due to the result in [Nemhauser et al., 1978], Algorithm 1 achieves a $\left(1 - \frac{1}{e}\right)$ approximation factor, as given in the following theorem.

Theorem 5.3
The set $S$ returned by Algorithm 1 satisfies

$$\mathcal{R}(\emptyset) - \mathcal{R}(S) \ge \left(1 - \frac{1}{e}\right) \left(\mathcal{R}(\emptyset) - \mathcal{R}(S^*)\right),$$

where $S^*$ is the optimal solution to (5), i.e.,

$$S^* \overset{\mathrm{def}}{=} \underset{S \subset E_v,\ |S| = k}{\arg\min}\ \mathcal{R}(S).$$

Although Algorithm 1 is faster than the naive algorithm, it is still computationally infeasible for large networks, since it involves computing the pseudoinverse of $\mathbf{L}$. In this section, in order to avoid inverting the matrix $\mathbf{L}$, we give an efficient approximation algorithm, which achieves a $\left(1 - \frac{1}{e} - \epsilon\right)$ approximation factor of the optimal solution to problem (5) in time $\widetilde{O}(km\epsilon^{-2})$.

In order to solve problem (5), one needs to compute the key quantity $\mathcal{R}_v^{\Delta}(e)$ in (6). Here, we provide an efficient algorithm to approximate $\mathcal{R}_v^{\Delta}(e)$.

We first consider the denominator in (6). Assume that the newly added edge $e$ connects nodes $u$ and $v$. Note that the term $r_e = \mathbf{b}_e^\top \mathbf{L}^\dagger \mathbf{b}_e$ in the denominator is in fact the resistance distance $R_{uv}$ between $u$ and $v$ in the network excluding $e$. It can be computed by the following approximation algorithm [Spielman and Srivastava, 2011].

Lemma 6.1
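Let $G = (V, E, w)$ be a weighted connected graph. There is an algorithm ApproxiER$(G, E_v, \epsilon)$ that returns an estimate $\hat{r}_e$ of $r_e$ for all $e \in E_v$ in $\widetilde{O}(m\epsilon^{-2})$ time. With probability at least $1 - 1/n$, $\hat{r}_e \approx_\epsilon r_e$ holds for all $e \in E_v$.

The numerator of (6) includes two terms, $\left(\mathbf{L}^\dagger \mathbf{b}_e \mathbf{b}_e^\top \mathbf{L}^\dagger\right)_{vv}$ and $\operatorname{Tr}\left(\mathbf{L}^\dagger \mathbf{b}_e \mathbf{b}_e^\top \mathbf{L}^\dagger\right)$. The first term can be calculated as $\left(\mathbf{L}^\dagger \mathbf{b}_e \mathbf{b}_e^\top \mathbf{L}^\dagger\right)_{vv} = \mathbf{e}_v^\top \mathbf{L}^\dagger \mathbf{b}_e \mathbf{b}_e^\top \mathbf{L}^\dagger \mathbf{e}_v$. The second term is the trace of an implicit matrix, which can be approximated by Hutchinson's Monte-Carlo method [Hutchinson, 1989]. By generating $M$ independent random $\pm 1$ vectors $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_M \in \mathbb{R}^n$ (i.e., vectors with independent Bernoulli entries), $\frac{1}{M} \sum_{i=1}^{M} \mathbf{x}_i^\top \mathbf{A} \mathbf{x}_i$ can be used to estimate the trace of a matrix $\mathbf{A}$. Since $\mathbb{E}\left[\mathbf{x}_i^\top \mathbf{A} \mathbf{x}_i\right] = \operatorname{Tr}(\mathbf{A})$, by the law of large numbers, $\frac{1}{M} \sum_{i=1}^{M} \mathbf{x}_i^\top \mathbf{A} \mathbf{x}_i$ should be close to $\operatorname{Tr}(\mathbf{A})$ when $M$ is large. The following lemma [Avron and Toledo, 2011] quantifies how large $M$ must be for a good estimate of $\operatorname{Tr}(\mathbf{A})$.

Lemma 6.2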
Let $\mathbf{A}$ be a positive semidefinite matrix with rank $\operatorname{rank}(\mathbf{A})$. Let $\mathbf{x}_1, \ldots, \mathbf{x}_M$ be independent random $\pm 1$ vectors. Let $\epsilon, \delta$ be scalars such that $0 < \epsilon \le 1/2$ and $0 < \delta < 1$. For any $M \ge 24 \epsilon^{-2} \ln(2 \operatorname{rank}(\mathbf{A}) / \delta)$, the following statement holds with probability at least $1 - \delta$:

$$\frac{1}{M} \sum_{i=1}^{M} \mathbf{x}_i^\top \mathbf{A} \mathbf{x}_i \approx_\epsilon \operatorname{Tr}(\mathbf{A}).$$

Thus, we have reduced the estimation of the numerator of (6) to the calculation of quadratic forms of $\mathbf{L}^\dagger \mathbf{b}_e \mathbf{b}_e^\top \mathbf{L}^\dagger$. Computing these quadratic forms directly would require first evaluating $\mathbf{L}^\dagger$, which is too expensive. To avoid inverting $\mathbf{L}$, we utilize the nearly-linear time solver for Laplacian systems from [Kyng and Sachdeva, 2016], whose performance is characterized in the following lemma.

Lemma 6.3
The algorithm $\mathbf{y} = $ LaplSolve$(\mathbf{L}, \mathbf{z}, \epsilon)$ takes a Laplacian matrix $\mathbf{L}$ of a graph $G$ with $n$ nodes and $m$ edges, a vector $\mathbf{z} \in \mathbb{R}^n$ and a scalar $\epsilon > 0$ as input, and returns a vector $\mathbf{y} \in \mathbb{R}^n$ such that with probability $1 - 1/\mathrm{poly}(n)$ the following statement holds:

$$\left\| \mathbf{y} - \mathbf{L}^\dagger \mathbf{z} \right\|_{\mathbf{L}} \le \epsilon \left\| \mathbf{L}^\dagger \mathbf{z} \right\|_{\mathbf{L}},$$

where $\|\mathbf{x}\|_{\mathbf{L}} = \sqrt{\mathbf{x}^\top \mathbf{L} \mathbf{x}}$. The algorithm runs in expected time $\widetilde{O}(m)$.

Lemmas 6.1, 6.2 and 6.3 result in the following algorithm VReffComp$(G, v, E_v, \epsilon)$ for computing $\mathcal{R}_v^{\Delta}(e)$ for all $e \in E_v$, as depicted in Algorithm 2. The algorithm has a total running time of $\widetilde{O}(m\epsilon^{-2})$, and returns a set of pairs $\{(e, \hat{\mathcal{R}}_v^{\Delta}(e)) \mid e \in E_v\}$ satisfying $\mathcal{R}_v^{\Delta}(e) \approx_\epsilon \hat{\mathcal{R}}_v^{\Delta}(e)$ for all $e \in E_v$.

Algorithm 2: VReffComp$(G, v, E_v, \epsilon)$
Input: A graph $G$; a node $v \in V$; a candidate edge set $E_v$; a real number $0 < \epsilon \le 1/2$
Output: $\{(e, \hat{\mathcal{R}}_v^{\Delta}(e)) \mid e \in E_v\}$
1 Let $\mathbf{z}_1, \ldots, \mathbf{z}_M$ be independent random $\pm 1$ vectors, where $M = \lceil c\, \epsilon^{-2} \ln(2n) \rceil$ for a fixed constant $c$
2 for $i = 1$ to $M$ do
3     $\mathbf{y}_i \leftarrow$ LaplSolve$(\mathbf{L}, \mathbf{z}_i, \delta)$, where the accuracy $\delta$ is $\epsilon$ times fixed negative powers of $n$ and $w_{\max}$
4     for each $e \in E_v$ do
5         Compute $t_i(e) \overset{\mathrm{def}}{=} \mathbf{y}_i^\top \mathbf{b}_e \mathbf{b}_e^\top \mathbf{y}_i$
6 $\mathbf{x} \leftarrow$ LaplSolve$(\mathbf{L}, \mathbf{e}_v, \delta)$
7 for each $e \in E_v$ do
8     Compute $\alpha(e) \overset{\mathrm{def}}{=} \mathbf{x}^\top \mathbf{b}_e \mathbf{b}_e^\top \mathbf{x}$
9 $\hat{r}_e \leftarrow$ ApproxiER$(G, E_v, \epsilon')$, where $\epsilon'$ is a fixed fraction of $\epsilon$
10 Compute $\hat{\mathcal{R}}_v^{\Delta}(e) = w(e)\, \dfrac{n\, \alpha(e) + \frac{1}{M} \sum_{i=1}^{M} t_i(e)}{1 + w(e)\, \hat{r}_e}$ for each $e$
11 return $\{(e, \hat{\mathcal{R}}_v^{\Delta}(e)) \mid e \in E_v\}$

By using Algorithm 2 to approximate $\mathcal{R}_v^{\Delta}(e)$, we give a fast greedy algorithm ApproxiSM$(G, v, E_v, k, \epsilon)$ for solving problem (5), as outlined in Algorithm 3.

Algorithm 3: ApproxiSM$(G, v, E_v, k, \epsilon)$
Input: A graph $G$; a node $v \in V$; a candidate edge set $E_v$; an integer $k \le |E_v|$; a real number $0 < \epsilon \le 1/2$
Output: $S$: a subset of $E_v$ with $|S| = k$
1 Initialize solution $S = \emptyset$
2 for $i = 1$ to $k$ do
3     $\{(e, \hat{\mathcal{R}}_v^{\Delta}(e)) \mid e \in E_v \setminus S\} \leftarrow$ VReffComp$(G, v, E_v \setminus S, \epsilon)$
4     Select $e_i$ s.t. $e_i \leftarrow \arg\max_{e \in E_v \setminus S} \hat{\mathcal{R}}_v^{\Delta}(e)$
5     Update solution $S \leftarrow S \cup \{e_i\}$
6     Update the graph $G \leftarrow G(V, E \cup \{e_i\})$
7 return $S$

Algorithm 3 works in $k$ rounds (Lines 2-6). In every round, the call of VReffComp and the updates take time $\widetilde{O}(m\epsilon^{-2})$. Then, the total running time of Algorithm 3 is $\widetilde{O}(km\epsilon^{-2})$.

The following theorem shows that the output $\hat{S}$ of Algorithm 3 gives a $\left(1 - \frac{1}{e} - \epsilon\right)$-approximate solution to problem (5).

Theorem 6.4 For any $0 < \epsilon \le 1/2$, the set $\hat{S}$ returned by Algorithm 3 satisfies

$$\mathcal{R}(\emptyset) - \mathcal{R}(\hat{S}) \ge \left(1 - \frac{1}{e} - \epsilon\right) \left(\mathcal{R}(\emptyset) - \mathcal{R}(S^*)\right),$$

where $S^*$ is the optimal solution to problem (5), i.e.,

$$S^* \overset{\mathrm{def}}{=} \underset{S \subset E_v,\ |S| = k}{\arg\min}\ \mathcal{R}(S).$$

We omit the proof, since it is similar to that in [Badanidiyuru and Vondrák, 2014].
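The greedy selection underlying both algorithms can be illustrated on a toy instance. The Python sketch below follows the structure of Algorithm 1 (exact gains from Eq. (6), followed by the rank-one update of Lemma 5.1) and compares the greedy result with a brute-force optimum; it does not implement the solver-based estimates of Algorithm 2. All function names (`node_resistance`, `gain`, `add_edge`, `exact_greedy`) are hypothetical, and dense exact rational arithmetic is an assumption chosen so that the identities can be checked without rounding error.

```python
import math
from fractions import Fraction
from itertools import combinations

def laplacian(n, edges):
    """Weighted Laplacian L = D - A; edges are (u, v, weight) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L

def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # nonzero pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def pseudoinverse(L):
    """L^+ = (L + J/n)^{-1} - J/n, valid for a connected graph."""
    n = len(L)
    s = Fraction(1, n)
    inv = inverse([[L[i][j] + s for j in range(n)] for i in range(n)])
    return [[inv[i][j] - s for j in range(n)] for i in range(n)]

def node_resistance(Lp, v):
    """Eq. (1): R_v = n * (L^+)_vv + Tr(L^+)."""
    n = len(Lp)
    return n * Lp[v][v] + sum(Lp[i][i] for i in range(n))

def gain(Lp, v, u, w):
    """Eq. (6): decrease of R_v caused by a new edge (v, u) of weight w."""
    n = len(Lp)
    y = [Lp[i][u] - Lp[i][v] for i in range(n)]   # y = L^+ b_e
    denom = 1 + w * (y[u] - y[v])                 # 1 + w * b_e^T L^+ b_e
    return w * (n * y[v] ** 2 + sum(c * c for c in y)) / denom

def add_edge(Lp, v, u, w):
    """Lemma 5.1: Sherman-Morrison update of L^+ for a new edge (v, u)."""
    n = len(Lp)
    y = [Lp[i][u] - Lp[i][v] for i in range(n)]
    denom = 1 + w * (y[u] - y[v])
    return [[Lp[i][j] - w * y[i] * y[j] / denom for j in range(n)]
            for i in range(n)]

def exact_greedy(n, edges, v, candidates, k):
    """Greedy loop in the spirit of Algorithm 1; candidates are (v, u, w)."""
    Lp = pseudoinverse(laplacian(n, edges))
    chosen, remaining = [], list(candidates)
    for _ in range(k):
        best = max(remaining, key=lambda e: gain(Lp, *e))
        chosen.append(best)
        remaining.remove(best)
        Lp = add_edge(Lp, *best)
    return chosen, node_resistance(Lp, v)

# Unit-weight path 0 - 1 - 2 - 3 - 4, target v = 0, budget k = 2.
n, v, k = 5, 0, 2
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)]
candidates = [(0, 2, 1), (0, 3, 1), (0, 4, 1)]
r_empty = node_resistance(pseudoinverse(laplacian(n, edges)), v)
chosen, r_greedy = exact_greedy(n, edges, v, candidates, k)
r_opt = min(node_resistance(pseudoinverse(laplacian(n, edges + list(S))), v)
            for S in combinations(candidates, k))
# Guarantee of Theorem 5.3 on this instance.
assert (r_empty - r_greedy) >= (1 - 1 / math.e) * (r_empty - r_opt)
print(chosen, r_empty, r_greedy, r_opt)
```

Because the update in `add_edge` costs $O(n^2)$ per inserted edge, the sketch recomputes no pseudoinverse inside the loop, which mirrors the saving that Lemma 5.1 provides over the naive approach. The brute-force search is included only to check the bound and is feasible only for very small candidate sets.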
In this section, we experimentally evaluate the effectiveness and efficiency of our two greedy algorithms on some model and real networks. All algorithms in our experiments are implemented in Julia. In our algorithms, we use the LaplSolve routine of [Kyng and Sachdeva, 2016], a Julia implementation of which is available at https://github.com/danspielman/Laplacians.jl. All experiments were conducted on a machine with a 4.2 GHz Intel i7-7700 CPU and 32 GB of RAM.

We run our experiments on two popular model networks, the Barabási-Albert (BA) network and the Watts–Strogatz (WS) network, and on a large collection of real-world networks from KONECT [Kunegis, 2013] and SNAP (https://snap.stanford.edu). Table 1 provides information about these networks, where the real-world networks are listed in increasing order of the number of nodes in the original networks.

Table 1: Statistics of datasets. For a network with $n$ nodes and $m$ edges, we denote the number of nodes and edges in its largest connected component by $n'$ and $m'$, respectively.

Network                n            m            n'           m'
BA network             50           94           50           94
WS network             50           100          50           100
Zachary karate club    34           78           34           78
Windsurfers            43           336          43           336
Jazz musicians         198          2,742        195          1,814
Virgili                1,133        5,451        1,133        5,451
Euroroad               1,174        1,417        1,039        1,305
Hamster full           2,426        16,631       2,000        16,098
Facebook               2,888        2,981        2,888        2,981
Powergrid              4,941        6,594        4,941        6,594
ca-GrQc                5,242        14,496       4,158        13,422
ca-HepPh               12,008       118,521      11,204       117,619
com-DBLP               317,080      1,049,866    317,080      1,049,866
roadNet-TX             1,379,917    1,921,660    1,351,137    1,879,201

To show the effectiveness of our algorithms, we compare their results with the optimum solutions on two small model networks, the BA network and the WS network, and two small real-world networks, the Zachary karate club network and the Windsurfers contact network. Since these networks are small, we are able to compute the optimal edge set.

For each network, we randomly choose 20 target nodes. For each target node $v$, the candidate edge set is composed of all nonexistent edges incident to it, each with unit weight $w = 1$. For each designated $k = 1, 2, \ldots$, we add $k$ edges linking $v$ to $k$ of its non-neighboring nodes. We then compute the average information centrality of the 20 target nodes for each $k$. We also compute the solutions for the random scheme, which adds $k$ edges to $k$ randomly selected non-neighboring nodes. The results are reported in Figure 1. We observe that there is little difference between the solutions of our greedy algorithms and the optimal solutions: their approximation ratio is always greater than 0.98, which is far better than the theoretical guarantees. Moreover, our greedy schemes outperform the random scheme on all four networks.

Figure 1: Average information centrality of target nodes (vertical axis) as a function of the number $k$ of inserted edges (horizontal axis) for ExactSM, ApproxiSM, random and the optimum solution on four networks: BA (a), WS (b), Karate club (c), and Windsurfers (d).
To further demonstrate the effectiveness of our algorithms, we compare the results of our methods with the random scheme and two other baseline schemes, Top-degree and Top-cent, on four other real-world networks. In the Top-degree scheme, the added edges are simply the $k$ edges connecting the target node $v$ to its nonadjacent nodes with the highest degree in the original network, while in the Top-cent scheme, the added edges are those $k$ edges connecting the target node $v$ to its nonadjacent nodes with the largest information centrality in the original network.

Since the results may vary depending on the initial information centrality of the target node $v$, for each of the four real networks we select 10 different target nodes at random. For each target node, we first compute its original information centrality and then increase it by adding up to $k = 20$ new edges, using our two greedy algorithms and the three baselines. We compute and record the information centrality of the target node after the insertion of every edge. Finally, we compute the average information centrality of all 10 target nodes for each $k = 1, 2, \ldots, 20$, which is plotted in Figure 2. We observe that on all four real-world networks our greedy algorithms outperform the three baselines.

Figure 2: Average information centrality of target nodes (vertical axis) as a function of the number $k$ of inserted edges (horizontal axis) for the five heuristics (ApproxiSM, ExactSM, Top-cent, Top-degree, Random) on Jazz musicians (a), Euroroad (b), Facebook (c), Powergrid (d).

Although both of our greedy algorithms are effective, we will show that their efficiency differs greatly. To this end, we compare the efficiency of the greedy algorithms on several real-world networks. For each network, we randomly choose 20 target nodes, and for each of them we create $k = 10$ new edges incident to it to maximize its information centrality according to Algorithms 1 and 3. We compute the average information centrality of the 20 target nodes for each network and record the average running times. Table 2 reports the average information centrality and the average running time of our greedy algorithms. We observe that ApproxiSM is faster than ExactSM on every network except the smallest one (Virgili), with the advantage growing on larger networks, while their final information centrality scores are close. More interestingly, ApproxiSM applies to massive networks. For example, for the com-DBLP and roadNet-TX networks, ApproxiSM finishes within half an hour, while ExactSM fails due to its high time complexity.

Table 2: The average running times and results of the ApproxiSM (ASM) and ExactSM (ESM) algorithms on several real-world networks, as well as the ratios of the times and results of ApproxiSM to those of ExactSM.

                 Time (seconds)                    Information centrality
Network          ASM          ESM         Ratio    ASM      ESM      Ratio
Virgili          1.3996       0.9172      1.5259   2.5005   2.5037   0.9987
Euroroad         0.6563       0.7593      0.8643   0.4003   0.4069   0.9838
Hamster full     3.0785       4.8528      0.6344   2.9904   2.9944   0.9987
Facebook         1.7151       12.9203     0.1327   0.7937   0.7947   0.9987
Powergrid        5.8727       58.3359     0.1006   0.4327   0.4369   0.9904
ca-GrQc          5.3023       34.0228     0.1558   1.2118   1.2136   0.9985
ca-HepPh         28.7462      620.4557    0.0463   2.2569   2.2592   0.9990
com-DBLP         697.1835     -           -        1.1327   -        -
roadNet-TX       1569.5059    -           -        0.0556   -        -
In this paper, we considered the problem of maximizing the information centrality of a designated node $v$ by adding $k$ new edges incident to it. This problem is equivalent to minimizing the resistance distance $\mathcal{R}_v$ of node $v$. We proposed two approximation algorithms for computing $\mathcal{R}_v$ when $k$ edges are repeatedly inserted in a greedy way. The first one gives a $\left(1 - \frac{1}{e}\right)$ approximation of the optimum in time $O(n^3)$, while the second one returns a $\left(1 - \frac{1}{e} - \epsilon\right)$ approximation in time $\widetilde{O}(mk\epsilon^{-2})$. Since the considered problem has never been addressed before, there are no existing algorithms to compare with; we instead compare our algorithms with plausible alternative heuristics. Extensive experimental results on model and real-world networks show that our algorithms often compute a near-optimal solution. In particular, our second algorithm achieves a good approximate solution very quickly, making it applicable to massive networks.

References

[Avrachenkov and Litvak, 2006] K. Avrachenkov and N. Litvak. The effect of new links on Google PageRank. Stochastic Models, 22(2):319–331, 2006.
[Avron and Toledo, 2011] H. Avron and S. Toledo. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. Journal of ACM, 58(2):8:1–8:34, 2011.
[Badanidiyuru and Vondrák, 2014] A. Badanidiyuru and J. Vondrák. Fast algorithms for maximizing submodular functions. In SODA, pages 1497–1514. SIAM, 2014.
[Bergamini et al., 2016] E. Bergamini, M. Wegner, D. Lukarski, and H. Meyerhenke. Estimating current-flow closeness centrality with a multigrid Laplacian solver. In CSC, pages 1–12, 2016.
[Boldi and Vigna, 2014] P. Boldi and S. Vigna. Axioms for centrality. Internet Mathematics, 10(3-4):222–262, 2014.
[Bozzo and Franceschet, 2013] E. Bozzo and M. Franceschet. Resistance distance, closeness, and betweenness. Social Networks, 35(3):460–469, 2013.
[Brandes and Fleischer, 2005] U. Brandes and D. Fleischer. Centrality measures based on current flow. In STACS, volume 3404, pages 533–544. Springer-Verlag, 2005.
[Crescenzi et al., 2015] P. Crescenzi, G. D'Angelo, L. Severini, and Y. Velaj. Greedily improving our own centrality in a network. In SEA, pages 43–55. Springer, 2015.
[Crescenzi et al., 2016] P. Crescenzi, G. D'Angelo, L. Severini, and Y. Velaj. Greedily improving our own closeness centrality in a network. ACM Transactions on Knowledge Discovery from Data, 11(1):9, 2016.
[D'Angelo et al., 2016] G. D'Angelo, L. Severini, and Y. Velaj. On the maximum betweenness improvement problem. Electronic Notes in Theoretical Computer Science, 322:153–168, 2016.
[Demaine and Zadimoghaddam, 2010] E. D. Demaine and M. Zadimoghaddam. Minimizing the diameter of a network using shortcut edges. In SWAT, pages 420–431. Springer-Verlag, 2010.
[Doyle and Snell, 1984] P. G. Doyle and J. L. Snell. Random Walks and Electric Networks. Mathematical Association of America, 1984.
[Hoffmann et al., 2018] C. Hoffmann, H. Molter, and M. Sorge. The parameterized complexity of centrality improvement in networks. In SOFSEM, pages 111–124. Springer, 2018.
[Hutchinson, 1989] M. F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 18(3):1059–1076, 1989.
[Ishakian et al., 2012] V. Ishakian, D. Erdös, E. Terzi, and A. Bestavros. A framework for the evaluation and management of network centrality. In SDM, pages 427–438. SIAM, 2012.
[Izmailian et al., 2013] N. Sh. Izmailian, R. Kenna, and F. Y. Wu. The two-point resistance of a resistor network: a new formulation and application to the cobweb network. Journal of Physics A: Mathematical and Theoretical, 47(3):035003, 2013.
[Klein and Randić, 1993] D. J. Klein and M. Randić. Resistance distance. Journal of Mathematical Chemistry, 12(1):81–95, 1993.
[Kunegis, 2013] J. Kunegis. KONECT: the Koblenz network collection. In WWW, pages 1343–1350. ACM, 2013.
[Kyng and Sachdeva, 2016] R. Kyng and S. Sachdeva. Approximate Gaussian elimination for Laplacians-fast, sparse, and simple. In FOCS, pages 573–582. IEEE, 2016.
[Lü et al., 2016] L. Lü, D. Chen, X. Ren, Q. Zhang, Y. Zhang, and T. Zhou. Vital nodes identification in complex networks. Physics Reports, 650:1–63, 2016.
[Meyer, 1973] C. D. Meyer, Jr. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3):315–323, 1973.
[Meyerson and Tagiku, 2009] A. Meyerson and B. Tagiku. Minimizing average shortest path distances via shortcut edge addition. In APPROX, pages 272–285. Springer-Verlag, 2009.
[Nemhauser et al., 1978] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.
[Newman, 2005] M. E. J. Newman. A measure of betweenness centrality based on random walks. Social Networks, 27(1):39–54, 2005.
[Newman, 2010] M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010.
[Olsen, 2010] M. Olsen. Maximizing PageRank with new backlinks. In ICAC, pages 37–48. Springer-Verlag, 2010.
[Parotsidis et al., 2016] N. Parotsidis, E. Pitoura, and P. Tsaparas. Centrality-aware link recommendations. In WSDM, pages 503–512. ACM, 2016.
[Peng and Spielman, 2014] R. Peng and D. A. Spielman. An efficient parallel solver for SDD linear systems. In STOC, pages 333–342. ACM, 2014.
[Plemmons, 1977] R. J. Plemmons. M-matrix characterizations. I nonsingular M-matrices. Linear Algebra and its Applications, 18(2):175–188, 1977.
[Spielman and Srivastava, 2011] D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM Journal of Computing, 40(6):1913–1926, 2011.
[Stephenson and Zelen, 1989] K. Stephenson and M. Zelen. Rethinking centrality: Methods and examples. Social Networks, 11(1):1–37, 1989.
[White and Smyth, 2003] S. White and P. Smyth. Algorithms for estimating relative importance in networks. In