A novel method based on node correlation to evaluate the important nodes in complex networks
Pengli Lu†, Chen Dong and Yuhong Guo
1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, China
2. School of Mathematics and Statistics, Hexi University, Zhangye, 734000, Gansu, China
Abstract:
Finding the important nodes in complex networks from their topological structure is of great significance to network invulnerability. Several centrality measures have recently been proposed to evaluate the performance of nodes based on their correlation, showing that the interaction between nodes influences node importance. In this paper, a novel method based on node distribution and global influence in complex networks is proposed. Our main idea is that the importance of a node is linked not only to its relative position in the network but also to its correlations with other nodes. The nodes in a complex network are classified according to the distance matrix, and then the correlation coefficient between each pair of nodes is calculated. From the perspective of the whole network, the global similarity centrality (GSC) is proposed based on the relevance and shortest distance between any two nodes. The efficiency, accuracy and monotonicity of the proposed method are analyzed on two artificial datasets and eight real datasets of different sizes. Experimental results show that GSC outperforms current state-of-the-art algorithms.
Keywords:
Node importance, Network topology, Global similarity centrality (GSC), Distribution vector, Susceptible-Infected-Recovered (SIR) model
A complex system can be modeled or mapped as a complex network consisting of nodes and edges, where every vertex represents an entity and every edge denotes the relationship between a pair of entities. The identification of influential nodes in large and complex networks, including social networks, protein networks, transportation networks, information networks and next-generation networks, has attracted many researchers. If the influential nodes in a traffic network or protein network fail, the entire network may suffer a catastrophic failure. In social, information and communication networks, messages can be spread easily and quickly throughout the network by influential nodes [1, 2]. The variety of users' needs leads to discrepancies in information transmission efficiency, so not all information can be spread in time. Users on the periphery always receive messages relatively late, by which time the messages are meaningless to them [3–5].

In complex networks, finding the influential nodes that are willing to spread information is of great significance. News spreading starts from one or a few users, and the information diffuses to friends who are closely related to it or interested in it; these friends then transmit the news to their own friendship networks. An organization in a social network corresponds to a group of individuals with the same or similar backgrounds [6–8]. Take a gymnasium as an example. If the store's management philosophy remains the same, replacing the owner will only affect the employees of the gym, not the members. Therefore, this news will generate a strong response only among the employees rather than cause waves among the customers.

Influential users play an important role in information spreading, and ranking them according to their influence capability has received much attention in recent years. In order to find key nodes, researchers have proposed a number of centrality measures from different perspectives. The most common are degree centrality, which only considers the node's own topological structure [9]; betweenness centrality and closeness centrality, which are based on the shortest distance between nodes [10, 11]; and k-core decomposition centrality, which is based on the relative position of nodes in the network [12]. However, degree centrality lacks accuracy, betweenness centrality and closeness centrality are not applicable to large-scale networks, and k-core decomposition centrality tends to assign nodes with different spreading capability to the same k-shell index. Therefore, these existing methods have been shown not to meet current needs [13]. Local dimension centrality (LD) [14] broke with the traditional global-dimension way of thinking by combining the power-law distribution characteristic of BA scale-free networks with each node's attributes. The main idea behind the method is that the concentration of the distribution of the remaining nodes is related to the position of the initial node. However, LD centrality considers the node's range of influence but neglects the correlation between pairs of nodes.

Motivated by LD centrality, we propose our method. The nodes in the network are classified by the distance matrix, and the correlation between any two nodes is calculated with the Pearson correlation coefficient. From the global perspective of the network, the influence of the shortest distance and of the correlation between any pair of nodes on the importance of nodes is analyzed, and the global similarity centrality (GSC) is proposed. In this paper, we apply the proposed method to networks of different sizes and compare it with state-of-the-art algorithms. Experimental results show that the proposed method performs better in efficiency, accuracy and monotonicity than other popular measures.

The rest of the paper is organized as follows. Section 2 analyzes the existing methods of node importance research. In Section 3, the GSC algorithm is introduced. Experimental results and discussions are included in Section 4. Finally, the conclusion of the paper is given in Section 5.

∗ Supported by the National Natural Science Foundation of China (No.11361033) and the National Natural Science Foundation of China (No.11861045).
† Corresponding author. E-mail addresses: [email protected] (P. Lu), [email protected] (C. Dong), [email protected] (Y. Guo).
In this section, we briefly introduce the current progress in identifying important nodes in complex networks. A series of classic centrality measures have been proposed to evaluate the spreading capability of nodes. Degree centrality is a simple and straightforward way to measure the importance of a node by counting its neighbors [9]. However, this measure has a serious flaw: simply counting the neighboring nodes while ignoring the importance of the vertices themselves may result in nodes with smaller degree being more vital than nodes with larger degree. In addition, the relative position of nodes in complex networks is also significant. Compared with nodes of larger degree, nodes of smaller degree can be more likely to occupy key positions in news spreading and play an important role in the whole network. This phenomenon is also the starting point of betweenness centrality and closeness centrality [10, 11]. Based on the definition of the h-index and the degree of each node's neighbors, T. Zhou et al. proposed a more feasible measure of node importance than degree centrality [15]. Since considering the performance of neighboring nodes can improve the accuracy of identifying important nodes, Q. Liu et al. proposed the local H-index centrality to improve the reliability of the measure [16]. P.L. Lu et al. also proposed an extended H-index centrality based on local H-index centrality and the clustering coefficient [17].

Besides, measuring the importance of nodes by decomposing the network is also an important topic. Kitsak et al. proposed the k-core decomposition centrality (KS) to determine the importance of nodes based on their relative positions in the network [12]. First, the KS value of all nodes in the network is set to 1, and all nodes with degree 1 are found and removed together with their edges. Then the degrees of the remaining nodes are recalculated, and nodes with degree 1 and their edges are deleted again until no node with degree 1 remains. At this point, the KS value of the remaining nodes is set to 2, and the above operation is repeated until no node with degree 2 remains, and so on until the network is completely decomposed or only isolated nodes are left. The larger the KS value of a node, the closer it is to the center of the network. Considering the influence of neighboring nodes, J. Wang et al. proposed the neighborhood coreness centrality (cn), which reflects the relative distance between neighboring nodes and the network center [18]. In k-core decomposition, the number of nodes deleted during each step can also reflect the performance of nodes. Mixed degree decomposition (MDD) considers the variation of the network topology in each decomposition step [19]. Qi et al. applied the Laplacian matrix and the quasi-Laplacian matrix to the study of node centrality in complex networks using graph theory; the importance of a node is represented by the change in spectral energy when the node is deleted, which greatly improved the practicability of the method [20, 21]. Newton's classical mechanics was combined for the first time with the topological structure of complex networks to propose the Newton gravity centrality (G), in which the degree of a node corresponds to the mass of a planet and the shortest distance between nodes corresponds to the radius [22]. Wang et al. proposed an improved Newton gravity centrality (IGC), which replaced the degree of the node with its k-core value [23]. A. Namtirtha et al. further improved the Newton gravity centrality by combining the degree and the core of nodes to evaluate node importance [24]. A. Dutta et al. analyzed the networks to which degree centrality and k-core decomposition centrality are applicable, then combined these two measures and proposed a new method that is applicable to different networks [25].

In addition to considering the spreading capability of a single node, evaluating the importance of nodes from the global perspective of the network is also widely used. On the basis of Kirchhoff polynomials, Z. Dai et al. proposed a spanning tree centrality method to determine important nodes and extended the evaluation of node importance from simple networks to weighted networks [26]. On this basis, a near-linear time algorithm based on the Kirchhoff index was proposed to measure the edge centrality of weighted networks, which further broadens the application range of the algorithm [27]. Drawing on the basic concept of fractal dimension in physics, Silva et al. proposed local dimension centrality to explore the nature of networks. Since each node in the network has a different sphere of influence, the local dimension also changes with the central node, which affects the feasibility of the method. Therefore, Y. Deng et al. improved the local dimension centrality to make the method more practicable [14]. Our method is based on the shortest distance and the correlation between nodes, in order to identify the importance of nodes more accurately.

The distance matrix records the shortest distance between every pair of nodes in the network and reflects the relative position of nodes. Core nodes are located at the center of the network, and the shortest paths between many node pairs pass through them; therefore the shortest distance between these nodes and other nodes is relatively small. Common nodes are located at the periphery of the network, where the surrounding nodes are dispersed, so their shortest paths are relatively long.
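To make the role of the distance matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes an unweighted, connected graph stored as an adjacency dictionary and runs one breadth-first search per node, which for unit edge weights yields the same shortest distances as the Floyd-Warshall algorithm used in the paper. The toy graph and the helper names `bfs_distances` and `distance_matrix` are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit is the shortest path in BFS
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_matrix(adj):
    """All-pairs shortest distances as a dict of dicts (one BFS per node)."""
    return {u: bfs_distances(adj, u) for u in adj}

# Hypothetical toy graph (not Fig. 1): node 0 has neighbors 1, 2, 3,
# and a tail 3-4-5 hangs off node 3.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}

dist = distance_matrix(adj)
# Largest distance from each node to any other node.
ecc = {u: max(row.values()) for u, row in dist.items()}
# Diameter of the graph: the largest of those values.
diameter = max(ecc.values())
# Average shortest distance from each node to the other nodes.
avg = {u: sum(row.values()) / (len(adj) - 1) for u, row in dist.items()}

print(ecc)
print(diameter)
print(avg)
```

In this toy graph the centrally placed nodes 0 and 3 have a smaller average distance to the rest of the graph than the peripheral node 5, which is the intuition the paragraph above describes.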
Local dimension centrality (LD) combines the characteristics of the distance matrix with the power-law distribution, matching the importance of a node with the scale of its locality. A lower LD means a higher importance. In other words, the distance between a node and the core of the network also affects the importance of the node, and nodes in dense locations are often more important than nodes in sparse locations. However, local dimension centrality only considers the distribution of nodes and does not take the properties of the vertices themselves as an evaluation criterion. Therefore, an accurate algorithm that considers node properties is needed.

Let $G = (V, E)$ be an unweighted network with vertex set $V(G) = \{1, 2, 3, \ldots, N\}$ and edge set $E(G)$. We define the weighted matrix $W(G)$ of size $N \times N$ as follows:

$$W(G) = \begin{cases} 0, & \text{if } i = j \\ 1, & \text{if } i \text{ and } j \text{ are adjacent} \\ \infty, & \text{if } i \text{ and } j \text{ are not adjacent} \end{cases} \tag{2.1}$$

The distance between two nodes $i, j \in V(G)$, denoted by $d_{i,j}$, is the length of the shortest path from node $i$ to node $j$. The distance matrix of $G$, denoted by $D(G)$, is an $N \times N$ matrix whose $(i, j)$-th entry is $d_{i,j}$:

$$D(G) = \begin{pmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,N} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ d_{N,1} & d_{N,2} & \cdots & d_{N,N} \end{pmatrix} \tag{2.2}$$

The distance matrix of the network can be obtained by computing all pairwise shortest distances from $W(G)$ with the Floyd-Warshall algorithm.

The maximum distance from node $i$ to the other nodes, which represents the surrounding size of node $i$, is denoted as

$$D_i = \max(d_{i,j}), \quad j \in V \tag{2.3}$$

and the diameter $D$ of the network is

$$D = \max(D_i) \tag{2.4}$$

After calculating the relative distance between nodes, the node distribution vector and the distance vector are defined based on the location of each node.

Fig. 1: A simple graph. (Taking node 13 as the initial node, the nodes are divided into four parts by their distance from node 13, each shown in a different color.)

Definition 2.1. (Node Distribution Vector and Distance Vector) The node distribution vector $NDV_i$ and distance vector $DV_i$ for node $i$ are defined as follows, where $|V_i^k|$ represents the number of nodes in the network whose shortest distance from node $i$ is $k$.

$$NDV_i = (|V_i^1|, |V_i^2|, |V_i^3|, \ldots, |V_i^D|) \tag{2.5}$$

$$DV_i = (|V_i^1|, 2|V_i^2|, 3|V_i^3|, \ldots, D|V_i^D|) \tag{2.6}$$

As shown in Fig. 1, we take node 13 as the initial node and divide the other nodes in the network into four levels. Nodes 10, 11 and 12 are at distance 1 from node 13, nodes 5, 7 and 8 at distance 2, nodes 1, 4, 6 and 9 at distance 3, and nodes 2 and 3 at distance 4, and $D$ is 4. We can therefore represent the distribution vector of node 13 as $NDV_{13} = (3, 3, 4, 2)$ and its distance vector as $DV_{13} = (3, 6, 12, 8)$. The vectors of the other nodes, such as nodes 7 and 8, are obtained in the same way.

Two correlation coefficients between node $i$ and node $j$ are denoted $P_{i,j}$ and $D_{i,j}$, respectively. $P_{i,j}$ describes the similarity between nodes in terms of node distribution and is calculated using the traditional Pearson correlation coefficient formula, while $D_{i,j}$ adapts the Pearson correlation coefficient to the distance distribution of nodes: it calculates the correlation between the distance distribution and the average shortest distance of nodes. By counting the number of nodes at each distance and calculating the difference between each distance and the average shortest distance, the similarity of topological structure between pairs of nodes is reflected and the relative position of nodes in the network can be expressed.
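As a concrete check of Definition 2.1, the sketch below rebuilds the two vectors of node 13 from the distance levels listed in the text for Fig. 1. It is an illustration under stated assumptions rather than the authors' code: the function name `distribution_vectors` and the dictionary layout (node label mapped to its distance from node 13) are hypothetical.

```python
from collections import Counter

def distribution_vectors(dist_row, diameter):
    """Return (NDV, DV) for one node, given its distances to all other nodes.

    NDV[k-1] is the number of nodes at distance k (Eq. 2.5);
    DV[k-1] is k times that number (Eq. 2.6).
    """
    counts = Counter(d for d in dist_row.values() if d > 0)
    levels = range(1, diameter + 1)
    ndv = [counts.get(k, 0) for k in levels]
    dv = [k * n for k, n in zip(levels, ndv)]
    return ndv, dv

# Distances from node 13 to the other twelve nodes, as listed in the text.
dist_13 = {10: 1, 11: 1, 12: 1,
           5: 2, 7: 2, 8: 2,
           1: 3, 4: 3, 6: 3, 9: 3,
           2: 4, 3: 4}

ndv_13, dv_13 = distribution_vectors(dist_13, diameter=4)
print(ndv_13)  # [3, 3, 4, 2]
print(dv_13)   # [3, 6, 12, 8]

# The entries of DV sum to the total distance from node 13, so dividing by
# the number of other nodes gives its average shortest distance.
avg_dist_13 = sum(dv_13) / len(dist_13)
print(avg_dist_13)
```

Because the $k$-th entry of $DV_i$ is $k$ times the $k$-th entry of $NDV_i$, the ratio of the two entries recovers the distance $k$ itself, and the sum of $DV_i$ divided by $N - 1$ is the average shortest distance of node $i$.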
The specific formulas for $P_{i,j}$ and $D_{i,j}$ are as follows:

$$P_{i,j} = \frac{\sum_{k=1}^{D} \left( NDV_i^k - \overline{NDV_i} \right) \cdot \left( NDV_j^k - \overline{NDV_j} \right)}{\sqrt{\sum_{k=1}^{D} \left( NDV_i^k - \overline{NDV_i} \right)^2} \cdot \sqrt{\sum_{k=1}^{D} \left( NDV_j^k - \overline{NDV_j} \right)^2}} \tag{2.7}$$

$$D_{i,j} = \frac{\sum_{k=1}^{D} \left[ NDV_i^k \times \left( \frac{DV_i^k}{NDV_i^k} - \frac{\overline{DV_i} \times D}{N - 1} \right) \right] \cdot \left[ NDV_j^k \times \left( \frac{DV_j^k}{NDV_j^k} - \frac{\overline{DV_j} \times D}{N - 1} \right) \right]}{\sqrt{\sum_{k=1}^{D} \left[ NDV_i^k \times \left( \frac{DV_i^k}{NDV_i^k} - \frac{\overline{DV_i} \times D}{N - 1} \right) \right]^2} \cdot \sqrt{\sum_{k=1}^{D} \left[ NDV_j^k \times \left( \frac{DV_j^k}{NDV_j^k} - \frac{\overline{DV_j} \times D}{N - 1} \right) \right]^2}} \tag{2.8}$$

where $NDV_i^k$ denotes the $k$-th element of the vector $NDV_i$ and $\overline{NDV_i}$ is the mean value of $NDV_i$; $DV_i^k$ and $\overline{DV_i}$ are the $k$-th element and the mean value of the vector $DV_i$. The results of Eq. (2.7) and Eq. (2.8) lie between $-1$ and $1$. For example, in Fig. 1 the correlation coefficients between node 7 and node 8 are $P_{7,8} \cong 0.\ldots$ and $D_{7,8} \cong 0.\ldots$, while the correlation coefficients between node 7 and node 13 are $P_{7,13} = 0$ and $D_{7,13} \cong 0.\ldots$; thus, node 8 plays a more active role than node 13 in the news spreading of node 7 in the network.

Definition 2.2. (Global Similarity Centrality)
Algorithm: Ranking nodes on the basis of cumulative centrality
01 Input: G = (V, E)
02 Output: A ranking list of nodes' importance
03 Begin Algorithm
04   Floyd-Warshall algorithm is used to calculate the shortest distance between nodes and the diameter of the graph G
05   For i = 1 to |V|
06     Calculate NDV_i and DV_i using Eq. (2.5) and Eq. (2.6)
07   End for
08   For i = 1 to |V|
09     Set GSC_i = 0
10     For j = 1 to |V|
11       Calculate P_{i,j} and D_{i,j} using Eq. (2.7) and Eq. (2.8)
12       According to the values of P_{i,j} and D_{i,j}, use Eq. (2.9) to calculate NC_{i,j}
13       GSC_i = GSC_i + NC_{i,j}
14     End for
15   End for
16   Sort the nodes in descending order based on GSC values to obtain the ranking list
17 End Algorithm

The global similarity centrality consists of two parts, and it is defined as follows:
$$NC_{i,j} = \begin{cases} \dfrac{1 - P_{i,j}}{d_{i,j}} + \left( \dfrac{1 + D_{i,j}}{d_{i,j}} \right), & P_{i,j} > 0 \\[2ex] \dfrac{1 + P_{i,j}}{d_{i,j}} + \left( \dfrac{1 + D_{i,j}}{d_{i,j}} \right), & P_{i,j} < 0 \\[2ex] \dfrac{1 + D_{i,j}}{d_{i,j}}, & P_{i,j} = 0 \end{cases} \tag{2.9}$$

$$GSC_i = \sum_{v_j \in V} NC_{i,j} \tag{2.10}$$

The formula in this section consists of two parts: the distance clustering coefficient of nodes, and the global correlation of node $i$. When the coefficient $P_{i,j}$ is negative, node $j$ has a negative effect on the spreading ability of node $i$, which affects the propagation from node $i$ in the network. Therefore, $1 + P_{i,j}$ is used to calculate the clustering coefficient between node $i$ and node $j$ accurately. At the same time, considering the different influence capability between nodes, the coefficients $d_{i,j}$ and $D_{i,j}$ also have a positive effect on the whole algorithm, and $\frac{1 + D_{i,j}}{d_{i,j}}$ is used to control the influence of the distance between nodes on the proposed method.

The Algorithm outlines the proposed method with the specific calculation details of each step. The Floyd-Warshall algorithm is used in line 4 to calculate the distance matrix and the diameter of graph $G$; lines 5-7 use Eq. (2.5) and Eq. (2.6) to calculate the node distribution vector and the distance vector of each node; lines 8-15 use Eq. (2.7) and Eq. (2.8) to calculate the correlation coefficients between node $i$ and the other nodes in the network, and then compute the node's GSC. Finally, the nodes are sorted by their GSC values. The time complexity of the Floyd-Warshall algorithm is $O(|V|^3)$, and that of the rest of the proposed measure is $O(|V| + |E|)$.

When evaluating the importance of nodes, the proposed method first defines the distribution vector and distance vector of nodes according to the structure of the network, then calculates the degree of similarity between pairs of nodes with the Pearson correlation coefficient, and finally derives the importance of nodes from these correlations. Compared with the existing global clustering coefficient algorithm, the proposed algorithm improves the clustering method: it considers the network structure and, based on the degree of similarity between nodes, re-divides the nodes from the perspective of propagation. The measure can therefore determine a node's spreading capability more accurately, making up for the shortcoming of the global clustering coefficient algorithm, which considers only a single parameter.

In this section, to evaluate the proposed method, we compare it with a series of currently popular algorithms: K-shell decomposition centrality (KS) [12], neighborhood coreness centrality (cn) [20], H-index centrality (H) [15], local H-index centrality (LH) [16], Newton's gravity centrality (G) [24], improved Newton's gravity centrality (IGC) [25], K-shell hybrid method (Ksh) [26], weighted k-shell degree neighborhood centrality (Ksd) [27], betweenness centrality (BC) [10], closeness centrality (CC) [11], eigenvector centrality (EC) [48] and PageRank centrality (PA) [49]. These methods are applied to eight real-world datasets and two artificial datasets. The networks used in this paper are all undirected; the algorithms are not tested on directed networks. The real-world datasets are: a network of mutual relations between club employees and customers (Karate) [28], Lusseau's bottlenose dolphins social network (Dolphins) [29], a network of political books about the 2004 presidential election sold on Amazon (Polbooks) [30], the schedule network of major league soccer clubs (Football) [31], a network of collaborative relationships among jazz musicians (Jazz) [32], the American airlines flight route network (USAir) [33], the e-mail message network between teachers and students of Rovira i Virgili University (Email) [34], and a network of interrelationships between proteins (Yeast) [35]. The artificial datasets are a small-world network (WS) [36] and a Lancichinetti-Fortunato-Radicchi network (LFR-2000) [37], both generated with the software Gephi. The specific parameters of the datasets are shown in Table 1.

Table 1: Specific parameters of the datasets.

| Network | \|N\| | \|E\| | Average degree | Maximum degree | β_th | β | Assortativity |
|---|---|---|---|---|---|---|---|
| Karate | 34 | 78 | 4.588 | 17 | 0.129 | 0.13 | -0.4756 |
| Dolphins | 62 | 159 | 5.129 | 12 | 0.147 | 0.15 | -0.0436 |
| Polbooks | 105 | 441 | 8.400 | 25 | 0.0838 | 0.09 | -0.1279 |
| Football | 115 | 613 | 10.661 | 12 | 0.0932 | 0.10 | 0.1624 |
| Jazz | 198 | 2742 | 27.967 | 100 | 0.026 | 0.03 | 0.0202 |
| USair | 332 | 2126 | 12.81 | 139 | 0.0225 | 0.03 | -0.2079 |
| Email | 1133 | 5451 | 9.622 | 71 | 0.0535 | 0.06 | 0.0782 |
| WS | 2000 | 6012 | 6.021 | 11 | 0.1559 | 0.16 | -0.0563 |
| LFR-2000 | 2000 | 4997 | 9.988 | 39 | 0.0477 | 0.05 | -0.0032 |
| Yeast | 2361 | 7181 | 6.083 | 65 | 0.0600 | 0.07 | -0.0489 |

Discrimination capability
In this experiment, we study the discriminating ability of the ranking lists generated by the involved measures in terms of monotonicity and resolution [38, 39]. To evaluate how well different measures distinguish the importance of nodes, researchers use monotonicity to assess the ability of a measure to separate nodes by spreading efficiency in social networks. The formula for monotonicity is as follows:

$$M(A) = \left( 1 - \frac{\sum_{a \in A} |X|_a \times (|X|_a - 1)}{|X| \times (|X| - 1)} \right)^2 \tag{3.1}$$

where $A$ is the ranking list of one measure, $|X|$ is the total number of nodes in $A$, and $|X|_a$ is the number of nodes in level $a$. The range of monotonicity is [0, 1]. The better the measure's discrimination ability, the larger the value of monotonicity. Experimental results are shown in Table 2. The involved methods are applied to different networks for comparison. The results show that the measures which consider the performance of neighboring nodes (cn, LH) discriminate between nodes better than those which consider only a single node (KS, H), and the proposed method GSC shows the best performance, while the existing algorithms Ksd, BC and EC also perform well.

To further compare the ability of different methods to distinguish node importance, the second part of the experiment uses the cumulative distribution function (CDF) curve to represent the resolution of these methods. $A$ represents the ranking list generated by one measure, while the CDF of $A$ represents the probability that an element of $A$ is less than or equal to a given value. In other words, the slower the curve rises, the higher the resolution of the method and the better it distinguishes the importance of nodes. Fig. 2 compares the CDF curves of the ranking lists generated by different algorithms including GSC. Experimental results show that the proposed method has the best performance in distinguishing node importance.

Table 2: The M value of the ranking list generated by different measures in different networks. (Columns, in order: Network, M(KS), M(cn), M(H), M(LH), M(G), M(IGC), M(Ksh), M(Ksd), M(BC), M(CC), M(EC), M(PA), M(GSC).)
Karate 0.4958 0.8526 0.5766 0.8925 0.9334 0.9577 0.9334 0.9542 0.7754 0.8993 …
Polbooks 0.4949 0.9641 0.7067 0.9821 0.9982 0.9993 0.9993 …
Football 0.0003 0.4218 0.2349 0.9190 0.8626 0.9903 0.8626 0.9994 …
Jazz 0.7944 0.9982 0.9383 0.9982 0.9995 0.9995 …
USair 0.8114 0.9628 0.8335 0.9856 0.9942 0.9949 0.9943 …
Email 0.8089 0.9839 0.8584 0.9899 0.9996 0.9998 …
WS 0.0002 0.6085 0.2904 0.9155 0.9757 0.9982 0.9799 0.9998 …
LFR-2000 0.0385 0.9789 0.7184 0.9927 0.9997 0.9998 0.9998 …
Yeast 0.6643 0.9458 0.6873 0.9686 0.9959 0.9964 0.9963 0.9964 0.7012 0.9964 0.7210 0.9916 …

Fig. 2: The CDF curve of all measures on the Dolphins, Football, USAir and WS networks.

In this experiment, we compare the ranking lists obtained by different measures with the real spreading capability of nodes to assess accuracy. To obtain the real spreading capability of nodes, we simulate the spreading process with the traditional epidemic spreading model and then calculate the correlation between the simulation results and the ranking lists obtained by different algorithms. The Susceptible-Infected-Recovered (SIR) model has become the most popular epidemic spreading model because of its simple principle and wide range of applications, and it has been used in many studies [40–43].

In the standard SIR model, every node is in one of three states: susceptible (S), infected (I) or recovered (R). To obtain the spreading capability of each node, we set only that node to the infected state at the beginning of the experiment, while all remaining nodes are set to the susceptible state. In each time step, every infected node infects each susceptible node connected to it with probability α, and infected nodes recover with probability β. After the experiment, the number of nodes in the recovered state is defined as the real spreading capability of the initial node. The experiment is repeated 1000 times for every node of the network, and the average value is taken as the final result. The threshold of the network is defined as $\beta_{th} = \langle k \rangle / \langle k^2 \rangle$, where $\langle k \rangle$ and $\langle k^2 \rangle$ are the average degree and the average squared degree of the network. The correlation between the real spreading capability and each ranking list is measured by the Kendall $\tau$ coefficient:

$$\tau = \frac{2 (R_a - R_b)}{n (n - 1)} \tag{3.2}$$

where $R_a$ and $R_b$ are the numbers of concordant and discordant pairs, and $n(n-1)/2$ is the number of all pairs for lists of length $n$.

Table 3 shows the correlation, at a given infection rate, between the nodes' real spreading capability and the ranking lists generated by the involved algorithms. The proposed measure GSC has the best performance in 9 of the 10 experimental datasets, while the cn algorithm has the best performance in the Football network, and KS, BC and PA show the worst effect in all networks due to the limitations of these algorithms. These results reflect the superiority of the proposed method over the other state-of-the-art algorithms.

Table 3: The Kendall τ value of each method in 10 networks with a given β value. (Columns, in order: Network, KS, cn, H, LH, G, IGC, Ksh, Ksd, BC, CC, EC, PA, GSC.)
Karate 0.5799 0.6789 0.6219 0.7079 0.7580 0.7838 0.7472 0.7972 0.5433 0.6626 0.8245 0.3535 …
Dolphins 0.7363 0.8275 0.8420 0.8678 0.7499 0.8091 0.5810 0.7984 0.5900 0.6175 0.6132 0.5948 …
Polbooks 0.7196 0.8143 0.7946 0.8507 0.7505 0.7713 0.6196 0.7628 0.3646 0.3715 0.5818 0.4516 …
Football 0.1320 0.4931 0.3897 …
USair 0.7550 0.8462 0.7580 0.8478 0.7532 0.7782 0.4633 0.8232 0.5590 0.7805 0.8361 0.3710 …
Email 0.8218 0.8631 0.8401 0.8840 0.8359 0.8533 0.6854 0.8161 0.8210 0.8190 0.8517 0.5747 …
WS 0.1239 0.6701 0.5227 0.6515 0.6255 0.6384 0.4932 0.6373 0.6052 0.5872 0.6235 0.4657 …
LFR-2000 0.4049 0.7004 0.6795 0.7065 0.6614 0.6571 0.5360 0.6811 0.6843 0.7033 0.7157 0.6278 …
Yeast 0.7553 0.8231 0.7604 0.8492 0.7983 0.8108 0.5835 0.7703 0.6301 0.5653 0.7270 0.3046 …

Fig. 3: The influence of the change of infection rate on the accuracy of different methods on four datasets: Jazz, Email, LFR-2000 and Yeast.

In addition, we study the accuracy of the algorithm in the SIR model under different infection rates. Taking four networks of different sizes as examples, Fig. 3 shows the correlation curves between the ranking lists and the real spreading ability of nodes. In the experimental networks, as β increases, the proposed method becomes more accurate than the other methods; near the threshold $\beta_{th}$ in particular, the accuracy reaches its peak. The existing algorithms match the GSC measure in the comparison of discriminating ability, but they are far less accurate than GSC.

Table 4: Top-10 nodes ranked by different centrality methods in five real-world networks and the simple graph network.

| Rank | Karate KS | Karate cn | Karate H | Karate LH | Karate GSC | Dolphins KS | Dolphins cn | Dolphins H | Dolphins LH | Dolphins GSC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 34 | 1 | 34 | 34 | 1 | 60 | 15 | 52 | 15 | 38 |
| 2 | 33 | 34 | 33 | 1 | 34 | 58 | 46 | 51 | 46 | 15 |
| 3 | 31 | 3 | 14 | 3 | 3 | 55 | 38 | 46 | 38 | 46 |
| 4 | 14 | 33 | 3 | 33 | 33 | 53 | 34 | 41 | 34 | 34 |
| 5 | 9 | 2 | 1 | 2 | 9 | 52 | 21 | 38 | 21 | 51 |
| 6 | 8 | 4 | 31 | 4 | 14 | 51 | 30 | 34 | 30 | 41 |
| 7 | 4 | 32 | 24 | 14 | 32 | 48 | 41 | 30 | 52 | 22 |
| 8 | 3 | 14 | 9 | 9 | 2 | 46 | 52 | 25 | 51 | 19 |
| 9 | 2 | 9 | 8 | 32 | 4 | 44 | 58 | 22 | 41 | 30 |
| 10 | 1 | 24 | 4 | 24 | 31 | 43 | 2 | 21 | 19 | 17 |

| Rank | Polbooks KS | Polbooks cn | Polbooks H | Polbooks LH | Polbooks GSC | Football KS | Football cn | Football H | Football LH | Football GSC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 101 | 9 | 74 | 9 | 9 | 115 | 105 | 84 | 68 | 68 |
| 2 | 100 | 13 | 85 | 85 | 13 | 114 | 89 | 74 | 54 | 8 |
| 3 | 92 | 85 | 74 | 13 | 85 | 113 | 68 | 68 | 89 | 3 |
| 4 | 87 | 4 | 83 | 74 | 74 | 112 | 54 | 54 | 16 | 54 |
| 5 | 85 | 73 | 77 | 31 | 31 | 111 | 16 | 50 | 3 | 89 |
| 6 | 84 | 74 | 76 | 73 | 4 | 110 | 8 | 48 | 8 | 16 |
| 7 | 83 | 31 | 75 | 4 | 67 | 109 | 7 | 47 | 7 | 105 |
| 8 | 80 | 67 | 73 | 67 | 73 | 108 | 6 | 33 | 105 | 7 |
| 9 | 77 | 48 | 67 | 76 | 12 | 107 | 4 | 16 | 2 | 1 |
| 10 | 76 | 41 | 48 | 75 | 75 | 106 | 3 | 8 | 1 | 4 |

| Rank | Jazz KS | Jazz cn | Jazz H | Jazz LH | Jazz GSC | Simple graph (Fig. 1) KS | Simple graph cn | Simple graph H | Simple graph LH | Simple graph GSC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 172 | 100 | 100 | 100 | 100 | 13 | 7 | 13 | 7 | 7 |
| 2 | 168 | 8 | 8 | 8 | 8 | 12 | 4 | 12 | 13 | 4 |
| 3 | 158 | 4 | 4 | 4 | 4 | 11 | 13 | 11 | 11 | 12 |
| 4 | 131 | 131 | 131 | 131 | 131 | 10 | 12 | 10 | 12 | 11 |
| 5 | 130 | 80 | 129 | 80 | 80 | 8 | 11 | 7 | 10 | 10 |
| 6 | 129 | 129 | 80 | 129 | 194 | 7 | 10 | 8 | 4 | 13 |
| 7 | 106 | 5 | 53 | 5 | 129 | 6 | 5 | 6 | 8 | 8 |
| 8 | 105 | 32 | 5 | 194 | 5 | 5 | 8 | 5 | 5 | 6 |
| 9 | 104 | 194 | 194 | 53 | 53 | 4 | 6 | 4 | 6 | 1 |
| 10 | 103 | 84 | 69 | 69 | 69 | 3 | 3 | 3 | 3 | 5 |
| 11 | 102 | 69 | 130 | 32 | 162 | 2 | 2 | 2 | 2 | 3 |
| 12 | 100 | 85 | 85 | 162 | 32 | 1 | 1 | 1 | 1 | 2 |
| 13 | 98 | 53 | 84 | 77 | 59 | 9 | 9 | 9 | 9 | 9 |

Similarity
In the last experiment, different measures generate different ranking lists because they consider different aspects of the network topology, so we use the number of identical top-ranked vertices in each list to determine the similarity between the methods [47]. A larger number of shared nodes increases the credibility of the measure, while the nodes unique to the GSC list bring significant changes to the spreading process. Experimental results are shown in Table 4. In the Karate network, the KS, cn, H and LH algorithms match the GSC measure closely: the numbers of shared nodes are 9, 9, 8 and 9, respectively. Among the other small-scale networks, the numbers are 2, 6, 7 and 8 in the Dolphins network, 1, 8, 6 and 9 in the Polbooks network, 0, 9, 4 and 9 in the Football network, and 3, 10, 10 and 12 in the Jazz network. The similarity of the KS algorithm to GSC gradually weakens as the network size increases, while the other three algorithms remain similar to GSC. In the simple graph of Fig. 1, compared with the other four algorithms, the proposed measure differentiates the importance of nodes in finer detail and better reflects the performance of nodes in the network.
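The similarity count used above can be reproduced with a few lines of code. The sketch below is only an illustration of the counting step, not the authors' code: the helper name `shared_top_nodes` is hypothetical, and the lists are the Karate columns of Table 4.

```python
def shared_top_nodes(list_a, list_b):
    """Number of nodes that appear in both ranking lists, ignoring order."""
    return len(set(list_a) & set(list_b))

# Top-10 lists for the Karate network, copied from Table 4.
karate = {
    "KS":  [34, 33, 31, 14, 9, 8, 4, 3, 2, 1],
    "cn":  [1, 34, 3, 33, 2, 4, 32, 14, 9, 24],
    "H":   [34, 33, 14, 3, 1, 31, 24, 9, 8, 4],
    "LH":  [34, 1, 3, 33, 2, 4, 14, 9, 32, 24],
    "GSC": [1, 34, 3, 33, 9, 14, 32, 2, 4, 31],
}

# Compare every baseline list with the GSC list.
overlap = {
    method: shared_top_nodes(nodes, karate["GSC"])
    for method, nodes in karate.items()
    if method != "GSC"
}
print(overlap)  # {'KS': 9, 'cn': 9, 'H': 8, 'LH': 9}
```

The counts agree with the values 9, 9, 8 and 9 reported for the Karate network. Set intersection ignores rank order, so this measure only captures which nodes are shared, not where they are placed in each list.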
How to identify and select users who can spread information efficiently has become one of the most active research topics, and finding the influential nodes is a widely used approach to this goal. In this paper, a new method is proposed to evaluate the importance of nodes in complex networks: nodes are classified based on the distance matrix, the correlation between nodes is incorporated, and the global clustering coefficient of networks is then applied to the study of node importance. Through extensive experiments on both artificial and real-world networks, in which our algorithm is compared with currently popular algorithms, we demonstrate that the proposed method performs better in accuracy, similarity, discrimination capability and other aspects, which makes it valuable for further research.
References
[1] J. Heidemann, M. Klier, F. Probst, Online social networks: A survey of a global phenomenon, Comput. Netw, 56(18): 3866-3878, 2012.
[2] A. Bozorgi, H. Haghighi, M.S. Zahedi, M. Rezvani, Incim: A community-based algorithm for influence maximization problem under the linear threshold model, Inf. Process. Manage, 52(6): 1188-1199, 2016.
[3] W. Chen, C. Wang, Y. Wang, Scalable influence maximization for prevalent viral marketing in large-scale social networks, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 1029-1038, 2010.
[4] Z. Yu, C. Wang, J. Bu, X. Wang, Y. Wu, C. Chen, Friend recommendation with content spread enhancement in social networks, Inform. Sci, 309: 102-118, 2015.
[5] A. Sheikhahmadi, M.A. Nematbakhsh, A. Shokrollahi, Improving detection of influential nodes in complex networks, Physica A, 436: 833-845, 2015.
[6] A. Sheikhahmadi, M.A. Nematbakhsh, A. Zareie, Identification of influential users by neighbors in online social networks, Physica A, 486: 517-534, 2017.
[7] R.M. Bond, et al., A 61-million-person experiment in social influence and political mobilization, Nature, 489(7415): 295, 2012.
[8] M.-E.G. Rossi, F.D. Malliaros, M. Vazirgiannis, Spread it good, spread it fast: Identification of influential nodes in social networks, in: Proceedings of the 24th International Conference on World Wide Web, ACM, pp. 101-102, 2015.
[9] L.C. Freeman, Centrality in social networks conceptual clarification, Soc. Netw, 1(3): 215-239, 1978.
[10] L.C. Freeman, A set of measures of centrality based on betweenness, Sociometry, 40(1): 35-41, 1977.
[43] W.R. Knight, A computer method for calculating Kendall’s tau with ungrouped data, J. Amer. Statist. Assoc, 61(314): 436-439, 1966.
[44] M. Jalili, M. Perc, Information cascades in complex networks, J. Complex Netw, 5(5): 665-693, 2017.
[45] A. Buscarino, L. Fortuna, M. Frasca, V. Latora, Disease spreading in populations of moving agents, Europhys. Lett, 82(3): 38002, 2008.
[46] R. Pastor-Satorras, A. Vespignani, Epidemic dynamics and endemic states in complex networks, Phys. Rev. E, 63(6): 066117, 2001.
[47] J. Zhao, Y.C. Wang, Y. Deng, Identifying influential nodes in complex networks from global perspective, Chaos, Solitons and Fractals, 133, 109637, 2020.
[48] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst, 30(1-7): 107-117, 1998.
[49] X. Zhang, J. Zhu, Q. Wang, H. Zhao, Identifying influential nodes in complex networks with community structure, Knowl. Base Syst, 42: 74-84, 2013.