Identifying effective multiple spreaders by coloring complex networks
Xiang-Yu Zhao, Bin Huang, Ming Tang, Hai-Feng Zhang, Duan-Bing Chen
epl draft. arXiv preprint [physics.soc-ph], Oct
Web Sciences Center, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, P. R. China
School of Applied Mathematics, Chengdu University of Information Technology, Chengdu 610225, P. R. China
School of Mathematical Science, Anhui University, Hefei 230601, P. R. China
Department of Communication Engineering, North University of China, Taiyuan, Shan’xi 030051, P. R. China
PACS – Networks and genealogical trees
PACS – Structures and organization in complex systems
PACS – Dynamics of social systems
Abstract – Identifying influential nodes in social networks is of theoretical significance, since it bears on how to prevent epidemic spreading or cascading failures, how to accelerate information diffusion, and so on. In this Letter, we attempt to find effective multiple spreaders in complex networks by generalizing the idea of the coloring problem in graph theory to complex networks. In our method, each node in a network is assigned one color, and nodes with the same color are sorted into an independent set. Then, for a given centrality index, the nodes with the highest centrality in an independent set are chosen as multiple spreaders. Comparing this approach with the traditional method, in which the nodes with the highest centrality in the entire network are chosen, we find that our method is more effective in accelerating the spreading process and maximizing the spreading coverage, both in network models and in real social networks. Meanwhile, the low computational complexity of the coloring algorithm supports the potential applications of our method.
Introduction. –
Spreading phenomena are ubiquitous in nature and describe many important activities in society [1]. Examples include the propagation of infectious diseases, the dissemination of information (e.g., ideas, rumors, opinions, behaviors), and the diffusion of new technological innovations. With the advancement of complex network theory, spreading dynamics on complex networks have been intensively studied in the past decades. Many studies have revealed that the spreading process is strongly influenced by the network topology [2, 3].

An important issue in analyzing complex networks is to identify the most influential nodes in a spreading process, which is crucial for developing efficient strategies to control epidemic spreading or accelerate information diffusion. For this reason, more and more attention has been paid to identifying the most influential nodes in networks [4–10].
Many centrality indices have been proposed, such as degree centrality (defined as the degree of a node) [11], betweenness centrality (measured by the number of shortest paths that pass through the node) [12], eigenvector centrality (defined via the dominant eigenvector of the adjacency matrix) [13], neighborhood centrality (defined as the average connectivity of all neighbors) [14], and closeness centrality (the reciprocal of the sum of the geodesic distances to every other node) [15]. Recently, Kitsak et al. proposed a k-core decomposition to identify the most influential spreaders, which was found to be better than the degree centrality index in many real networks [5]. However, most of these methods measure the influence of each node from the viewpoint of the entire network, which may be particularly suitable for the case in which a single spreader of information is considered (i.e., only one node is selected as the initial spreader) [16, 17]. In many cases, the spreading of rumors, ideas, opinions, or advertisements may initiate from several different spreaders. In this case, the traditional methods that only select the nodes at the top of a certain ranking (e.g., the ranking obtained by ordering the nodes according to degree centrality) may not be the optimal strategy, since the chosen nodes may not be dispersively distributed [18]. Thus, how to properly choose multiple initial spreaders is an important and challenging problem. To this end, apart from considering the influence of each node, we need to make the chosen spreaders sufficiently dispersive to ensure that the information can diffuse quickly.

Motivated by the above reasons, in this Letter we propose a method to detect effective multiple spreaders that can enhance the spreading process.
In this method, independent sets of mutually non-adjacent nodes are first obtained by coloring the network, and then the nodes with the highest centrality in an independent set are chosen as the initial spreaders. By performing extensive simulations on network models as well as real networks, we find that our method can effectively enhance the spreading process. More importantly, the computational complexity of our method is O(N) when the size of the network is N, which further ensures the potential applicability of this method.

The remainder of this paper is organized as follows. In Sec. II, we first describe the details of our method. Then we present the main results in Sec. III. Finally, we summarize the conclusions in Sec. IV.

Methods. –
The four-color theorem is the most famous theorem in the graph coloring problem. It states that, given any plane graph, no more than four colors are required to color the regions of the graph so that no two adjacent regions have the same color [19–21]. In other words, the number of colors needed for all vertices of a plane graph is not greater than four. The graph coloring problem has a wide range of applications, including wireless channel allocation [22], scheduling [23, 24], and so on. Here, we generalize the idea of the graph vertex coloring problem to complex networks to obtain effective multiple spreaders. The main steps are as follows. First, we color a given network $G = (V, E)$ ($V$ denotes the set of nodes and $E$ denotes the set of edges) with the Welsh-Powell algorithm [24] [see below], so that each node in $V$ is assigned one color. Second, we sort the nodes with the same color into the same subset $V_i$, $i = 1, 2, \cdots, K$ (each subset is called an independent set, and $K$ denotes the number of colors used to color the network), which ensures that $V = V_1 \cup V_2 \cup \cdots \cup V_K$ and $V_i \cap V_j = \emptyset$, $\forall i \neq j$, where $\emptyset$ is the empty set. As nodes with the same color are not directly connected, the distance between any two nodes in an independent set is not smaller than two. Lastly, we choose the nodes with the highest centrality index in an independent set as the initial spreaders. To ensure that there are sufficient nodes from which to choose, we give priority to large independent sets, and in this Letter we mainly use the largest independent set.

Though there are many graph coloring algorithms, an ideal algorithm should have two qualities: low time complexity and a small number of colors, since many real networks are huge. In view of this, we choose the Welsh-Powell algorithm, whose time complexity is O(N) [25].
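As an illustration of the selection step described above, the following minimal Python sketch groups nodes by color into independent sets and then picks the top-n nodes by centrality, either from the largest independent set or from the entire network. The function names, the toy graph, and its coloring are hypothetical choices made for this example and are not taken from the Letter; breaking centrality ties by input order is also an assumption.

```python
from collections import defaultdict


def independent_sets(colors):
    """Group nodes by color; each group is one independent set V_i."""
    groups = defaultdict(list)
    for node, color in colors.items():
        groups[color].append(node)
    return list(groups.values())


def top_in_largest_independent_set(colors, centrality, n):
    """Top-n nodes by centrality, restricted to the largest independent set."""
    largest = max(independent_sets(colors), key=len)
    return sorted(largest, key=lambda v: centrality[v], reverse=True)[:n]


def top_in_entire_network(centrality, n):
    """Top-n nodes by centrality over all nodes (the traditional ranking)."""
    return sorted(centrality, key=lambda v: centrality[v], reverse=True)[:n]


# Hypothetical toy graph with edges 0-1, 0-2, 0-3, 1-2, 3-4, 4-5.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3, 5], 5: [4]}
# A proper coloring of this graph, assumed given here (adjacent nodes differ).
colors = {0: 1, 1: 2, 2: 3, 3: 2, 4: 1, 5: 2}
# Degree centrality is simply the number of neighbors.
degree = {v: len(neighbors) for v, neighbors in adj.items()}

from_independent_set = top_in_largest_independent_set(colors, degree, 2)
from_entire_network = top_in_entire_network(degree, 2)
```

In this toy case the entire-network ranking selects nodes 0 and 1, which are adjacent, whereas the independent-set ranking selects nodes 1 and 3, which are not directly connected.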
For a given network $G = (V, E)$ with $V = \{v_1, v_2, \cdots, v_N\}$, we let the color function be $\pi$ and the color set be $C = \{1, 2, \cdots, \Delta + 1\}$, where $\Delta$ is the maximum degree of network $G$. The details of the Welsh-Powell algorithm are [24]:

Step 1: according to the degree centrality, re-rank the node set $V$ in descending order, such that $k(v_1) \geq k(v_2) \geq \cdots \geq k(v_N)$;
Step 2: let $\pi(v_1) = 1$, $i = 1$;
Step 3: if $i = N$, stop; otherwise, let $C(v_{i+1}) = \{\pi(v_j) \mid j \leq i \text{ and } v_j \text{ is connected to } v_{i+1}\}$. Let $m$ be the minimal positive integer in the subset $C \setminus C(v_{i+1})$ [where $C \setminus C(v_{i+1})$ is the complement of the subset $C(v_{i+1})$ in the set $C$]. Then $\pi(v_{i+1}) = m$;
Step 4: let $i = i + 1$, and go back to Step 3.

In the above algorithm, $k(v_i)$ denotes the degree of node $v_i$, and $\pi(v_i) = m$ denotes that node $v_i$ is colored with the color labeled $m$.

Once the multiple spreaders are selected, a spreading model is needed to check the effectiveness of the proposed method. In much of the literature, the susceptible–infected–recovered (SIR) epidemic model is used to simulate the spreading process in networks, in which each node can be in one of three states: susceptible, infected, or recovered. A susceptible node is healthy and can catch the disease from each infected neighbor with transmission rate $\beta$, whereas an infected node becomes recovered with recovery rate $\mu$ and is then immune to the disease. In the classical SIR model, each infected node can contact all of its neighbors at each time step with transmission rate $\beta$. In reality, at a given time step, one can often contact at most one neighbor, as in sexual contacts and telephone marketing. Thus, in this Letter, we use the SIR epidemic model based on a contact process to simulate the spreading process and measure the effectiveness of the proposed method [26–28]. It is worth noting that our method can also be applied to the classical SIR model.
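The coloring steps and the contact-process SIR dynamics described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy graph are hypothetical, β and µ are treated as per-step probabilities, all infected nodes act synchronously within a time step, and recovery is tested after the contacts of that step.

```python
import random


def welsh_powell(adj):
    """Greedy coloring in the order of Steps 1-4: visit nodes by decreasing
    degree and give each the smallest color unused by its colored neighbors."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)  # Step 1
    color = {}
    for v in order:  # Steps 2-4; the first node automatically gets color 1
        used = {color[u] for u in adj[v] if u in color}  # plays the role of C(v)
        m = 1
        while m in used:
            m += 1
        color[v] = m
    return color


def sir_contact_process(adj, seeds, beta, mu, rng):
    """One realization; returns the final number of recovered nodes.
    Assumption: beta and mu are per-step probabilities, with mu > 0."""
    infected = set(seeds)
    recovered = set()
    while infected:
        newly_infected = set()
        for v in infected:
            # Each infected node contacts at most one random neighbor per step.
            if adj[v] and rng.random() < beta:
                u = rng.choice(adj[v])
                if u not in infected and u not in recovered:
                    newly_infected.add(u)
        # Recovery is drawn only for nodes that were infected before this step.
        newly_recovered = {v for v in infected if rng.random() < mu}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return len(recovered)


# Hypothetical toy graph with edges 0-1, 0-2, 0-3, 1-2, 3-4, 4-5.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3, 5], 5: [4]}
colors = welsh_powell(adj)
rng = random.Random(0)  # fixed seed only to make the run repeatable
size = sir_contact_process(adj, seeds=[1, 3], beta=0.5, mu=0.1, rng=rng)
```

For the toy graph, the coloring uses three colors, and the largest color class {1, 3, 5} contains no pair of adjacent nodes. The parameter values passed to `sir_contact_process` are illustrative only.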
The epidemic spreading process ends when there is no infected node in the network. We define the effective transmission rate $\lambda = \beta/\mu$ by fixing the recovery rate µ = 0 .

Results. –
Note that our method is compared with the traditional method for one given centrality index at a time. Taking the degree centrality index as an example, in our method the nodes in the largest independent set are ranked in descending order according to the degree centrality index, and the nodes at the top of the ranking are selected as spreaders (labeled as the IS method). In the traditional method, all nodes in the entire network are ranked according to the degree centrality index, and the same number of nodes at the top of the ranking are selected as spreaders (labeled as the EN method).

To measure the effectiveness of the IS method, we define the relative difference of outbreak size $\Delta r_R$ as $\Delta r_R = (R_i - R_e)/R_e$, where $R_i$ and $R_e$ are the final numbers of recovered nodes for the IS method and the EN method, respectively. Thus, the larger the value of $\Delta r_R$, the more effective the IS method. All results are averaged over 500 independent realizations.

We first apply the Welsh-Powell algorithm to a Barabási-Albert (BA) network with size $N = 10000$ and average degree $\langle k \rangle = 12$ [29]. From the inset of Fig. 1(a), one can see that such a network can be successfully divided into different independent sets, where the number of colors is $K = 8$ and the number of nodes in the largest independent set is over 2000. Fig. 1 also compares the IS method with the EN method based on the degree centrality index. In general, $\Delta r_R$ is larger than 0 for the different numbers of initial spreaders $n$ and the different values of the transmission rate $\lambda$. This indicates that the IS method is better than the EN method in most situations. More importantly, from Fig. 1(a) one can see that the advantage of the IS method is most striking when $\lambda$ is neither too small nor too large. When $\lambda$ is very small, the information initiated from any node can only spread to a very small fraction of nodes.
The influence regions of the multiple spreaders then scarcely overlap for either method, and the outbreak size is approximately equal to the sum of the individual spreaders' spreading coverage (i.e., the number of infected nodes). Thus, the two methods can hardly be distinguished at small $\lambda$, which results in a small value of $\Delta r_R$. As the transmission rate $\lambda$ increases, a single node can induce a greater spreading coverage. The more dispersive locations of the multiple spreaders in the IS method lead to less overlap of the influence regions, and the IS method thus performs better. When $\lambda$ is very large, the information can diffuse over a very wide range even if a single node is selected as the initial spreader. In this case, the influence regions overlap too much for the IS method to make a difference. In Fig. 1(b), $\Delta r_R$ as a function of $n$ displays distinct trends for different values of $\lambda$, which stem from the combined effects of the dispersive locations of the multiple spreaders and the intricate spreading processes at different $\lambda$. These distinct trends will be verified by the relative difference of effective contacts in Fig. 2(c) later.

To maximize the spreading coverage, we want the spreaders not only to have high centrality but also to be dispersive enough that a susceptible node has only one or a few infected neighbors instead of being surrounded by many infected nodes, which reduces the overlap effect of the spreaders.

[Figure 1 appears here. Legend in (a): n = 50, 200, 500; legend in (b): 0.05, 0.15, 0.45; inset axes: number of nodes vs. independent set.]
Fig. 1: (Color online) For the degree centrality index, the EN method and the IS method are compared in the BA network. (a) The relative difference of outbreak size $\Delta r_R$ as a function of the effective transmission rate $\lambda$ for different numbers of initial spreaders $n$; (b) $\Delta r_R$ as a function of $n$ for different values of $\lambda$. The inset of subfigure (a) is the size distribution of independent sets. The error bars are given by the standard deviation.

As a result, if we can verify that the initial spreaders in the IS method have a larger average distance among them and produce less overlap of spreading influence (i.e., more effective contacts between infected nodes and their susceptible neighbors) than those in the EN method, the phenomena in Fig. 1 will be naturally explained. For this reason, we define two metrics. One is the relative difference of average distance, $\Delta r_D = (D_i - D_e)/D_e$, where $D_i$ and $D_e$ are the average distances among the initial spreaders based on the IS method and the EN method, respectively. The other is the relative difference of effective contacts, $\Delta r_C = (C_i - C_e)/C_e$, where $C_i$ and $C_e$ are the numbers of effective contacts between infected nodes and their susceptible neighbors based on the IS method and the EN method, respectively. In each time step of the transmission process, an infected node randomly chooses one of its neighbors and transmits the information to it with probability $\beta$. If the chosen neighbor is susceptible, this contact is defined as an effective contact; otherwise, it is not effective. A greater number of effective contacts indicates less overlap of the spreading regions initiated from the multiple spreaders. In Fig. 2, one can see that $\Delta r_D$ is always larger than 0 [see Fig. 2(a)]. The reason is as follows: the distance between any two nodes in an independent set is greater than or equal to 2, while the nodes with the highest centrality are more likely to be connected to each other in the BA network. In Figs.
2(b) and 2(c), $\Delta r_C$ is generally larger than 0, too. Moreover, comparing Figs. 2(b) and 2(c) with Figs. 1(a) and 1(b), respectively, one can observe that the effects of $\lambda$ and $n$ on $\Delta r_D$ and $\Delta r_C$ are similar to those on $\Delta r_R$. Thus, the reason for the advantage of the IS method is well explained. To be specific, a greater $\Delta r_D$ induces a greater $\Delta r_C$, which in turn results in a greater $\Delta r_R$.

Fig. 2: (Color online) For the degree centrality index, the relative difference of average distance $\Delta r_D$ and the relative difference of effective contacts $\Delta r_C$ are plotted to explain the phenomena in Fig. 1. (a) $\Delta r_D$ vs. $n$, (b) $\Delta r_C$ vs. $\lambda$ for different values of $n$, (c) $\Delta r_C$ vs. $n$ for different values of $\lambda$. The detailed definitions of the two metrics are given in the main text.

In Fig. 3, we further compare the IS method with the EN method based on five commonly used indices: degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and neighborhood centrality. Since the k-core decomposition cannot quantify the relative influence of nodes in the BA network, this index is not considered here [9]. As in Fig. 1, the results in Fig. 3 indicate that the IS method is more effective than the EN method in all cases. In particular, the advantage of the IS method is most remarkable for the betweenness centrality index. The positive $\Delta r_D$ and $\Delta r_C$ for the different cases shown in Fig. 4 explain why the IS method is superior to the EN method for different centrality indices. Meanwhile, Figs. 4(a) and 4(b) show that the methods based on the degree and betweenness centrality indices generate the largest values of $\Delta r_D$ and $\Delta r_C$, leading to the highest efficiency in enhancing the spreading coverage in Fig.
3. The time evolutions of the relative difference of outbreak size, $\Delta r_R(t) = [R_i(t) - R_e(t)]/R_e(t)$, for different indices are also shown in Fig. 5. As shown in Figs. 5(a) [$n = 200$ and $\lambda = 0.15$] and 5(b) [$n = 500$ and λ = 0 .], $\Delta r_R(t)$ is generally larger than 0. This means that, compared with the EN method, the IS method can not only extend the spreading coverage but also speed up the spreading process. Moreover, Fig. 5(a) illustrates that $\Delta r_R(t)$ increases monotonically with time step $t$ when the values of $n$ and $\lambda$ are small. Nonetheless, when the values of $n$ and $\lambda$ are large, $\Delta r_R(t)$ first increases with $t$ and then decreases to a stable level [see Fig. 5(b)]. In the latter case, the information begins to diffuse from the initial spreaders in the early stages, and the IS method ensures that these multiple spreaders are more dispersive, which makes the IS method more effective. As $t$ increases further, the information diffuses over a wide range of the network and the influence regions of the multiple spreaders are more likely to overlap each other, so the advantage of the IS method is weakened.

Fig. 3: (Color online) Comparison of the IS method with the EN method based on five indices (i.e., degree, betweenness, closeness, eigenvector, and neighborhood) in the BA network. (a) $\Delta r_R$ vs. $\lambda$ at $n = 200$, (b) $\Delta r_R$ vs. $n$ at λ = 1 .

Finally, we use two real networks, the Blogs network [30] and the Email network [31], to further confirm the effectiveness of our method. Some basic structural features of the two networks are given in Table 1. In Fig. 6, $\Delta r_R$ as a function of $\lambda$ and $n$ is presented for six centrality indices. Note that the k-core index is also considered, in addition to the five indices shown in Fig. 3. For both the Blogs network [Figs. 6(a) and 6(b)] and the Email network [Figs. 6(c) and 6(d)], the IS method is more effective in enhancing the spreading process than the EN method. To further verify our method, other independent sets, such as the second largest one, are also investigated. As expected, all simulations lead to the same conclusion.

Summary. –
Even though great progress has been made in identifying influential spreaders, many problems remain to be solved, among which how to find multiple effective spreaders is an important one. It is commonly believed that, to effectively speed up the spreading process, the selected multiple spreaders should be as dispersive as possible to reduce the overlap of the spreading regions initiated from them. However, effective methods designed to achieve this goal are still largely lacking. In this Letter, we have proposed an effective method that selects the nodes with the highest centrality from an independent set, rather than from the entire network, as the initial spreaders. By testing this method on the BA network and two real networks, we found that it can greatly enhance the average distance among the initial spreaders and the number of effective contacts between susceptible and infected nodes. Therefore, our method can ensure that the information diffuses much wider and faster. Meanwhile, the computational complexity of the coloring algorithm used in the paper is O(N), which further supports its possible applications. Although the efficiency of our method was studied from the perspective of the spreading process, it is directly related to many other aspects [32], including network resilience to attacks, immunization against epidemics, and commercial product promotion in markets, which implies the potential applications of our method.

Fig. 4: (Color online) $\Delta r_D$ and $\Delta r_C$ as functions of $n$ are given to explain the phenomena in Fig. 3. (a) $\Delta r_D$ vs. $n$, (b) $\Delta r_C$ vs. $n$. The parameter is chosen as λ = 1 .

Fig. 5: (Color online) For five indices, the time evolutions of $\Delta r_R(t)$ in the BA network are plotted for (a) λ = 1 . n = 200, (b) λ = 3 . n = 500.

Table 1: Basic structural parameters of the Blogs network and the Email network. $N$ is the total number of nodes, $\langle k \rangle$ denotes the average degree, and $H$ is the degree heterogeneity, defined as $H = \langle k^2 \rangle / \langle k \rangle^2$. $D$ is the average shortest distance, and $C$ and $r$ are the clustering coefficient and assortative coefficient, respectively.

Network   N      $\langle k \rangle$   H       D       C       r
Blogs     3982   3.42                  4.038   6.227   0.146   0.133
Email     1133   9.62                  1.942   3.716   0.110   0.078

Fig. 6: (Color online) For different indices, $\Delta r_R$ in the Blogs and Email networks is given as a function of $\lambda$ and $n$. $\Delta r_R$ as a function of $\lambda$ (a) and $n$ (b) in the Blogs network; $\Delta r_R$ as a function of $\lambda$ (c) and $n$ (d) in the Email network. Detailed network information is summarized in Table 1.

∗ ∗ ∗

X.-Y. Zhao would like to thank Kai Qi for stimulating discussions. This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 11105025, 61473001, 11331009, 61433014) and China Postdoctoral Science Special Foundation (Grant No. 2012T50711).
REFERENCES
[1] Pastor-Satorras R., Castellano C., Van Mieghem P. and Vespignani A., arXiv:1408.2701 (2014).
[2] Newman M. E. J., Networks - An Introduction (Oxford University Press, New York) 2010.
[3] Pastor-Satorras R. and Vespignani A., Phys. Rev. E (2001) 066117.
[4] Kempe D., Kleinberg J. and Tardos É., Maximizing the spread of influence through a social network, in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM) 2003, pp. 137–146.
[5] Kitsak M., Gallos L. K., Havlin S., Liljeros F., Muchnik L., Stanley H. E. and Makse H. A., Nat. Phys. (2010) 888.
[6] Zhang J., Xu X.-K., Li P., Zhang K. and Small M., Chaos (2011) 016107.
[7] Borge-Holthoefer J., Rivero A. and Moreno Y., Phys. Rev. E (2012) 066123.
[8] Chen D., Lü L., Shang M.-S., Zhang Y.-C. and Zhou T., Physica A (2012) 1777.
[9] Zeng A. and Zhang C.-J., Phys. Lett. A (2013) 1031.
[10] de Arruda G. F., Barbieri A. L., Rodríguez P. M., Rodrigues F. A., Moreno Y. and Costa L. d. F., Phys. Rev. E (2014) 032812.
[11] Newman M. E. J., SIAM Rev. (2003) 167.
[12] Freeman L. C., Sociometry (1977) 35.
[13] Estrada E. and Rodriguez-Velazquez J. A., Phys. Rev. E (2005) 056103.
[14] Maslov S. and Sneppen K., Science (2002) 910.
[15] Sabidussi G., Psychometrika (1966) 581.
[16] Pei S., Muchnik L., José S. Andrade J., Zheng Z. and Makse H. A., Sci. Rep. (2014) 5547.
[17] Liu Y., Tang M., Zhou T. and Do Y., arXiv:1409.5187 (2014).
[18] Hu Z.-L., Liu J.-G., Yang G.-Y. and Ren Z.-M., Europhys. Lett. (2014) 18002.
[19] Bollobás B., Modern graph theory, Vol. 184 (Springer) 1998.
[20] Appel K., Haken W. et al., Illinois J. Math. (1977) 429.
[21] Appel K., Haken W., Koch J. et al., Illinois J. Math. (1977) 491.
[22] Riihijärvi J., Petrova M. and Mähönen P., Frequency allocation for WLANs using graph colouring techniques, in Proc. of WONS, Vol. 5, 2005, pp. 216–222.
[23] Leighton F. T., Nat. Bur. Standard (1979) 489.
[24] Welsh D. J. and Powell M. B., Comput. J. (1967) 85.
[25] Klotz W., Mathematics Report (2002) 1.
[26] Castellano C. and Pastor-Satorras R., Phys. Rev. Lett. (2010) 038701.
[27] Yang R., Huang L. and Lai Y.-C., Phys. Rev. E (2008) 026111.
[28] Borge-Holthoefer J. and Moreno Y., Phys. Rev. E (2012) 026116.
[29] Barabási A.-L. and Albert R., Science (1999) 509.
[30] Xie N., Social network analysis of blogs, MSc Dissertation, University of Bristol (2006).
[31] Guimera R., Danon L., Diaz-Guilera A., Giralt F. and Arenas A., Phys. Rev. E (2003) 065103.
[32] Huang B., Zhao X.-Y., Qi K., Tang M. and Do Y., Acta Phys. Sin. (2013) 218902.