A study on the performance of similarity indices and its relationship with link prediction: a two-state random network case

Abstract

A similarity index measures the topological proximity of node pairs in a complex network. Numerous similarity indices have been defined and investigated, but how their performance depends on network structure has not been sufficiently investigated. In this study, we investigated the relationship between the performance of similarity indices and structural properties of a network by employing a two-state random network. Each node in a two-state network has a binary type that is given initially, and the connection probability is determined by the types of the node pair. The performance of similarity indices is affected by the number of links and the ratio of intra-connections to inter-connections. Similarity indices have different characteristics depending on their type. Local indices perform well in small-size networks and do not depend on whether the structure is intra-dominant or inter-dominant. In contrast, global indices perform better in large-size networks, and some such indices do not perform well in an inter-dominant structure. We also found that link prediction performance and the performance of similarity indices are correlated in both model networks and empirical networks. This relationship implies that link prediction performance can be used as an approximation for the performance of the similarity index when metadata for node types are unavailable, which may help to find the appropriate index for given networks.

arXiv [physics.soc-ph]

Min-Woo Ahn
Department of Physics, Pohang University of Science and Technology, Pohang, 37673, Republic of Korea

Woo-Sung Jung*
Department of Physics, Pohang University of Science and Technology, Pohang, 37673, Republic of Korea; Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang, 37673, Republic of Korea; and Asia Pacific Center for Theoretical Physics, Pohang, 37673, Republic of Korea

Keywords: Complex network, Link prediction, Similarity index, AUC value

* E-mail: [email protected]

I. INTRODUCTION

Complex network analysis has been an important framework for understanding the underlying structure of many systems. Social interactions [1–4], economic relations [5, 6], and biological and technical systems [7–10] form complex networks and are analyzed via complex network analysis. Their structure has previously been investigated using empirical data. However, the raw structure of the obtained networks is difficult to inspect directly because of their huge size. Thus, network parameters such as the clustering coefficient and average path length are employed as representative parameters for the whole network structure [11–13], and centrality measures such as betweenness centrality [14] and eigenvector centrality [15] are used as characteristics of individual nodes. A similarity index is a measure of this kind; it measures the structural proximity of a node pair.

The similarity index is employed to find similar nodes in empirical networks, such as node pairs that have a similar function in an empirical system (e.g., synonym extraction from a word network [16] or extraction of protein pairs with a similar function from a protein-protein interaction network [17]). Similarity indices are also applied to link prediction, which makes a priority list for missed connections in an empirically observed network [20, 25]. As a practical usage, they are also applied to recommendation systems [28, 29]. There are many similarity indices; therefore, we should employ a proper index that performs well in a given network. However, it is difficult to determine which similarity index performs well in an empirical system without node metadata. Therefore, the selection of a proper similarity index for a given network is a challenging problem.

There have been some attempts to verify the performance of similarity indices using model networks. Leicht et al. employed a stratified model network to investigate the performance of a similarity index [16]. They defined a new similarity measure called the LHN index and tested it in model networks where the connection probability is determined from the difference in age of a node pair, with age allocated as one of ten integer values. They observed a negative correlation between age difference and similarity values, and they measured the Pearson correlation between age difference and similarity values as an accuracy metric. However, their model study is hard to generalize because the relation between node property and similarity is not linear. Thus, other metrics may be more appropriate. Furthermore, a wider variety of conditions should be considered to investigate the performance of similarity indices in various situations.

In this study, we investigated the relationship between the performance of the similarity index and the structural properties of a model network. We employed a two-state random network, which comprises nodes with a binary type whose connection probability depends on the types of the node pair. The structure of this model network reflects node type information. We examined how well a similarity index can find nodes of the same type from structural information. In addition, we investigated a quantity related to this performance, namely link prediction performance. We observed the relationship between discrimination performance and link prediction performance for various conditions and verified this relationship using empirical networks.

II. METHODOLOGY

1. Similarity index

We considered ten similarity indices. Seven are local indices, which use structural information of common neighbors: the common-neighbor (CN) index [18], Adamic-Adar (AA) index [19], Jaccard index [20], resource allocation (RA) index [21], preferential attachment (PA) index [20], Sorensen index [22], and hub-promoted (HP) index [17]. The other three are global indices, which consider the whole network structure to calculate the similarity of a node pair: Simrank [23], the Katz index [24], and the LHN index [16]. All of the employed indices and their mathematical descriptions are listed in Table 1. Global indices usually have tunable parameters: C in Simrank, β in the Katz index, and α in the LHN index. These parameters set the weight given to longer paths and can be fixed at arbitrary values. In this study, we set C = 0.…, β = 0.01, and α = 0.…
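As an illustration of the local indices in Table 1, the following minimal Python sketch computes all seven scores for one node pair from neighbor sets. The function name `local_indices`, the dictionary-of-sets graph representation, and the convention of returning 0 when a denominator vanishes are assumptions made here for illustration; the Sorensen and HP expressions follow Table 1 as printed.

```python
import math

def local_indices(adj, x, y):
    """Return the seven local similarity scores for the node pair (x, y).

    adj maps each node to the set of its neighbours
    (undirected graph, no self-loops); this layout is an assumption.
    """
    gx, gy = adj[x], adj[y]
    common = gx & gy          # common neighbours of x and y
    union = gx | gy
    kx, ky = len(gx), len(gy)
    cn = len(common)
    return {
        "CN": cn,
        # A common neighbour z is adjacent to both x and y,
        # so its degree is at least 2 and log(|Γ(z)|) > 0.
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common),
        "RA": sum(1.0 / len(adj[z]) for z in common),
        # Zero-denominator cases return 0.0 (illustrative convention).
        "Jaccard": cn / len(union) if union else 0.0,
        "Sorensen": cn / (kx + ky) if kx + ky else 0.0,
        "HP": cn / max(kx, ky) if max(kx, ky) else 0.0,
        "PA": kx * ky,
    }

# Toy graph: nodes 1 and 3 share the neighbours 0 and 2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
scores = local_indices(adj, 1, 3)
print(scores["CN"], scores["Jaccard"], scores["PA"])  # 2 1.0 4
```

Because CN, AA, and RA all sum over the same set of common neighbors, and Jaccard, Sorensen, and HP all divide the same numerator by a degree-based term, the two groups tend to rank node pairs in a similar order.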

2. Two-state random network model

We considered a two-state random network model, which is a simplified version of the stratified model network [16] (Fig. 1). In this model, the N nodes are of two types, with $N_1$ and $N_2$ nodes of each type and $N = N_1 + N_2$; the node type is given initially. We define an intra-connection as a connection between a node pair of the same type and an inter-connection as one between a node pair of different types. We set the connection probability for intra-connections to p and the probability for inter-connections to q. We can obtain various structures by controlling p and q. When p > q, the model network shows community structure, and when p < q, the network is close to a bipartite network. We refer to the p > q case as an intra-dominant structure and the p < q case as an inter-dominant structure.

Fig. 1. [...] p for intra-connection and q for inter-connection.

Table 1. The mathematical definitions of the similarity indices. Γ(x) is the set of neighbors of node x, A is the adjacency matrix of the given network, λ is the largest eigenvalue of the adjacency matrix, m is the mean degree, and D is the degree matrix with $D_{ij} = k_i \delta_{ij}$. Global indices (Simrank, Katz, and LHN) have parameters determining the depth of the structural information used: C in Simrank, β in Katz, and α in LHN. In this study, we set C = 0.…, β = 0.01, and α = 0.…

Name of similarity index        Mathematical definition
Common-neighbor (CN)            $|\Gamma(x) \cap \Gamma(y)|$
Adamic-Adar (AA)                $\sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}$
Jaccard                         $\frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$
Resource Allocation (RA)        $\sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{|\Gamma(z)|}$
Preferential Attachment (PA)    $|\Gamma(x)| \, |\Gamma(y)|$
Sorensen                        $\frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x)| + |\Gamma(y)|}$
Hub-Promoted (HP)               $\frac{|\Gamma(x) \cap \Gamma(y)|}{\max(|\Gamma(x)|, |\Gamma(y)|)}$
Simrank                         $s(x, y) = \frac{C}{|\Gamma(x)| \, |\Gamma(y)|} \sum_{x' \in \Gamma(x),\, y' \in \Gamma(y)} s(x', y')$
Katz                            $(I - \beta A)^{-1} - I$
LHN                             $2 m \lambda \, D^{-1} \left( I - \frac{\alpha}{\lambda} A \right)^{-1} D^{-1}$

Table 2. The confusion matrix. Data have a binary type, positive or negative, in a binary classification problem. In the discrimination process, the positive class corresponds to intra-connection and the negative class to inter-connection. In the link prediction process, the positive class corresponds to a missing link and the negative class to a nonexistent link.

                          Positive               Negative
Classified to Positive    True Positive (TP)     False Positive (FP)
Classified to Negative    False Negative (FN)    True Negative (TN)

We observed the relationship between performance and three structural parameters: the number of nodes, the number of links, and the intra-inter ratio. We define the intra-inter ratio as p/q, the ratio of intra-connection probability to inter-connection probability. As p/q moves further from 1 in either direction, node types are more strongly reflected in the network structure. If p/q = 1 (p = q), the network structure does not depend on the node types.

We set the number of nodes of each type to $N_1 = N_2 = N/2$. To investigate the effect of the number of links, we set p/q = 3 and 6 (intra-dominant) and p/q = 1/3 and 1/… (inter-dominant) with L = 1000, 1400, 1800, 2200, and 2600. To investigate the size effect, we set the network size to N = 200, 400, 600, 800, and 1000 under fixed mean degrees ⟨k⟩ = 10 and 20 and fixed intra-inter ratios p/q = 4 and 1/4. For each condition, 1,000 network realizations were generated and the average accuracy was obtained.
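The model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `two_state_network`, the choice to give the first half of the node labels type 0 and the rest type 1, and the independent Bernoulli draw per node pair are all assumptions made here.

```python
import random
from itertools import combinations

def two_state_network(n, p, q, seed=None):
    """Generate one realization of a two-state random network.

    n: total number of nodes (half of each type, as in N_1 = N_2 = N/2)
    p: connection probability for same-type pairs (intra-connection)
    q: connection probability for different-type pairs (inter-connection)
    Returns (adj, node_type): neighbour sets and the type of each node.
    """
    rng = random.Random(seed)
    half = n // 2
    node_type = {v: 0 if v < half else 1 for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        prob = p if node_type[u] == node_type[v] else q
        # rng.random() lies in [0, 1), so prob = 1 always connects
        # and prob = 0 never does.
        if rng.random() < prob:
            adj[u].add(v)
            adj[v].add(u)
    return adj, node_type

# Intra-dominant example with p/q = 4.
adj, node_type = two_state_network(200, p=0.12, q=0.03, seed=1)
```

Note that this sketch fixes the probabilities p and q, so the number of links fluctuates between realizations; fixing the number of links L exactly would need a different sampling step.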

3. Performance calculation

We employed the widely used area under the ROC curve (AUC) as the accuracy metric for the binary classification problem [26, 27]. The receiver operating characteristic (ROC) curve is a curve in 2-dimensional space, where the x-axis corresponds to the false positive (FP) rate and the y-axis to the true positive (TP) rate:

$$\mathrm{TP\ rate} = \frac{TP}{TP + FN}, \qquad \mathrm{FP\ rate} = \frac{FP}{FP + TN}, \tag{1}$$

where TP, TN, FP, and FN are described in Table 2. Pairs whose similarity index is above a threshold value L are classified into the positive class. We can obtain points on the curve from various L values (Fig. 2).

Pair      Similarity   Class
Pair 2    4            Positive
Pair 5    3            Negative
Pair 1    2            Positive
Pair 4    2            Negative
Pair 3    2            Negative
(a)

Fig. 2. An example of a receiver operating characteristic (ROC) curve. (a) Sample list for the ROC curve example, with thresholds (1) to (4). Class refers to the originally allocated class (positive or negative). We sorted the list in descending order of similarity. None of the pairs are classified into the positive class when we set the threshold at (1), so the TP rate and FP rate are both 0. Thus, the starting point of the ROC curve is (0,0). We can obtain further points, written as (FP rate, TP rate), from the other thresholds: (0,1/2) from (2), (1/3,1/2) from (3), and (1,1) from (4). (b) The ROC curve from the obtained points.
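For reference, the AUC can also be computed without tracing the curve, using the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half. The sketch below is a minimal illustration; the function name `auc` and the pairwise-comparison approach are choices made here, and nothing in the source indicates how the authors computed the value. It is applied to the sample list of Fig. 2 (a).

```python
def auc(scores, labels):
    """Area under the ROC curve from scores and boolean class labels.

    Counts, over all (positive, negative) combinations, how often the
    positive item has the higher score; a tie contributes one half.
    """
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Sample list of Fig. 2 (a): similarities and classes in listed order.
scores = [4, 3, 2, 2, 2]
labels = [True, False, True, False, False]
print(round(auc(scores, labels), 4))  # 0.6667
```

The double loop costs one comparison per positive-negative combination, which is adequate for illustration; a rank-based formulation would be preferable for networks with many node pairs.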

4. Discrimination process and link prediction process

We consider the discrimination process to be a classification of the node pairs in a given network into pairs with the same type and pairs with different types using a similarity index. The similarity indices listed in Table 1 were applied as discriminators, assuming that node pairs with the same type have higher similarity than those with different types. Under this assumption, we can set the node pairs with the same type as the positive class and those with different types as the negative class (Table 2). Then, we can treat this problem as a binary classification problem. We listed all of the node pairs in a given network and sorted the list in descending order of similarity (Fig. 2 (a)).

Fig. 3. An example of the difference between discrimination and link prediction. In discrimination, nodes 3 and 4 are considered a positive pair, but they are treated as a nonexistent link (negative class) in link prediction. In addition, nodes 4 and 6 are considered a negative pair in discrimination, but they can belong to the positive class in link prediction if their connection is selected as a missing link.

Link prediction estimates probable missing links among nonadjacent pairs in the remaining network, using only the topological information of that network to build a priority list [25, 26]. Usually, there are no metadata for missed connections; therefore, we created missing links in the original network by randomly selecting links and deleting them to estimate link prediction performance [26]. To test the methodology, a network was divided into the remaining network ($E_t$, training set) and missing links ($E_p$, probe set) with $E_t \cup E_p = E$ and $E_t \cap E_p = \emptyset$.
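The division into a training set and a probe set can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `split_links` and `label_candidates`, the removed fraction being passed as a parameter, and the rule of always removing at least one link are assumptions made here.

```python
import random
from itertools import combinations

def split_links(edges, fraction, seed=None):
    """Split undirected edges into a training set E_t and a probe set E_p."""
    rng = random.Random(seed)
    # Normalise each edge to a sorted tuple so (u, v) and (v, u) coincide.
    all_edges = sorted({tuple(sorted(e)) for e in edges})
    # Assumption: remove at least one link even for very small networks.
    n_probe = max(1, round(fraction * len(all_edges)))
    probe = set(rng.sample(all_edges, n_probe))
    train = set(all_edges) - probe
    return train, probe

def label_candidates(nodes, train, probe):
    """Map every pair that is nonadjacent in the training network to
    True (missing link, in E_p) or False (not connected in the original)."""
    return {pair: pair in probe
            for pair in combinations(sorted(nodes), 2)
            if pair not in train}

# Complete graph on 5 nodes: 10 links, 20% of them moved to the probe set.
edges = list(combinations(range(5), 2))
train, probe = split_links(edges, fraction=0.2, seed=0)
labels = label_candidates(range(5), train, probe)
```

Similarity scores would then be computed on the training network only and ranked over the keys of `labels`.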
After division, nonadjacent pairs in the remaining network can be classified into two types: missing links (connected in the original network) and nonexistent links (not connected in the original network). Then, we can treat the link prediction problem in the same way as the binary classification problem, assuming that missing links have higher similarity than nonexistent links [25, 26]. Similar to the discrimination process, we listed all nonadjacent pairs in the remaining network and sorted the list in descending order of similarity. To measure link prediction performance, we considered 200 independent missing link creation steps. In each step, 5% of links were removed and considered as missing links.

The two processes are quite similar because they both employ a similarity index and make a priority list, but they differ in detail (Fig. 3). In the discrimination process, we calculated similarity from the original network structure and considered all of the node pairs. However, the link prediction process deletes some links to create missing links, and similarity is calculated on the remaining network. In addition, link prediction only considers nonadjacent pairs in the remaining network. Therefore, the same node pair can be classified into different classes. Positive pairs in the discrimination process can be negative pairs in the link prediction process and vice versa (Fig. 3).

We can employ the same similarity indices for both the discrimination process and the link prediction process. To reveal the relationship between them, we obtained both performances for each index and drew a scatter plot of the resulting performance pairs for one sample model network.

III. RESULTS AND DISCUSSION

1. Correlation between discrimination performance and structural parameters

We observed a positive correlation between link density and discrimination performance in both intra-dominant and inter-dominant structures (Fig. 4). This result is related to the sparsity problem in collaborative filtering, which arises when little structural information is available in small data sets [29]. The intra-inter ratio is also an important factor for discrimination performance. With the same number of links, discrimination performance increases as the ratio moves away from 1. Similarity indices perform better when the structural gap between intra-connection and inter-connection becomes larger, regardless of whether the structure is intra-dominant or inter-dominant. However, some global indices (Katz and LHN) show poor performance in inter-dominant structures. These indices are based on the number of paths between node pairs. Connected pairs have higher similarity than unconnected pairs because of the decay factor of these indices. Thus, in the inter-dominant condition, more node pairs of different types have high similarity, leading to poor performance.
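The advantage of connected pairs can be made explicit with the standard series expansion of the Katz index. The identity below is a textbook result rather than something taken from the source, and it holds when β is smaller than 1/λ, with λ the largest eigenvalue of A:

```latex
\[
  (I - \beta A)^{-1} - I \;=\; \sum_{l=1}^{\infty} \beta^{l} A^{l},
  \qquad \beta < \frac{1}{\lambda},
\]
\[
  \bigl[(I - \beta A)^{-1} - I\bigr]_{xy}
  \;=\; \beta A_{xy} + \beta^{2} (A^{2})_{xy} + \beta^{3} (A^{3})_{xy} + \cdots
\]
```

Since $(A^{l})_{xy}$ counts the walks of length $l$ between x and y, a directly connected pair receives a contribution of order β from the first term, whereas an unconnected pair starts only at order β². For small β, the first term therefore tends to dominate the ranking. The LHN index admits an analogous expansion in powers of α/λ, rescaled by the degrees of the two nodes.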

[Fig. 4, panels (a)-(d): AUC value versus number of links for CN, AA, RA, Jaccard, Sorensen, HP, PA, Simrank, Katz, and LHN.]

Fig. 4. Discrimination performance for (a) p/q = 3, (b) p/q = 6, (c) p/q = 1/3, and (d) p/q = 1/… with N = 200. Generally, performance increased with increased numbers of links and increased intra-inter ratio p/q, except for PA in all cases and some global indices (Katz and LHN) in inter-dominant cases. Some local indices that showed similar performance are represented by the same symbols.

All indices show a negative correlation with network size, but global indices are less dependent on the network size in intra-dominant structures. Previous studies reported that global indices, which have higher computational complexity, perform better than local indices [20, 21]. However, in the intra-dominant condition, local indices show similar performance and even exceed the performance of some global indices in dense, small-size networks. Nevertheless, global indices can be a better choice in large, sparse networks. The performance of global indices exceeds that of all the local indices at large network sizes (Fig. 5), and the difference becomes larger with lower mean degree. However, although global indices are better than local indices at large network sizes, their absolute performance is lower than at small network sizes. Lack of structural information may affect the discrimination performance.

Some different behaviors appear in inter-dominant structures. Local similarity indices perform better even in large, dense networks. In addition, the negative correlation with network size also holds for all indices except the LHN index. Connected pairs have higher LHN index values than unconnected pairs; therefore, connected pairs of different types, which dominate in the inter-dominant condition, have larger similarity. However, the similarity of unconnected pairs does not differ much between pairs of the same type and pairs of different types. The proportion of connected pairs becomes smaller for larger network sizes, which causes the positive correlation. However, this positive correlation does not mean that LHN can be a good choice in large inter-dominant networks. Its performance is close to 0.5, which is worse than that of all of the local indices.

[Fig. 5, panels (a)-(d): AUC value for network sizes from 200 to 1000 for CN, AA, RA, Jaccard, Sorensen, HP, PA, Simrank, Katz, and LHN.]

Fig. 5. Discrimination performance for (a) ⟨k⟩ = 10, p/q = 4, (b) ⟨k⟩ = 20, p/q = 4, (c) ⟨k⟩ = 10, p/q = 1/4, and (d) ⟨k⟩ = 20, p/q = 1/4. In small networks, local similarity indices performed well, but global indices showed superior performance in larger networks in the intra-dominant condition. Some local indices that showed similar performance are represented by the same symbols.

Can similarity indices catch 'similar' nodes better than a random classifier? In the intra-dominant condition, the answer is yes, except for PA, and in the inter-dominant condition, the answer is partly yes. Local indices and Simrank show better performance in the inter-dominant condition. From this result, we can deduce that some similarity indices can generally catch similar nodes in both inter-dominant and intra-dominant structures. However, some global indices show lower performance in the inter-dominant condition. The characteristics of similarity indices should be considered when we apply them.

Some indices showed similar performance: CN, AA, and RA were similar in performance, and so too were Jaccard, Sorensen, and HP. Their performances were so close that each trio appears as a single point in Fig. 4 and Fig. 5. The mathematical definitions of these clustered indices have some common features: CN, AA, and RA are defined from the properties of common neighbors, and Jaccard, Sorensen, and HP are defined from the number of common neighbors and an appropriate normalization term. This mathematical similarity may be reflected in their discrimination performance.

[Fig. 6, panels (a) and (b): link prediction performance versus discrimination performance.]

Fig. 6. Correlation between discrimination performance and link prediction performance under (a) intra-dominant and (b) inter-dominant structure. Figures are obtained from one sample of a two-state random network with N = 200 and (a) p = 0.12 and q = 0.03 and (b) p = 0.03 and q = 0.…

2. Correlation between discrimination performance and link prediction performance

We discussed the relationship between discrimination performance and network parameters in the previous section, so how do we select proper similarity indices for given networks when we cannot access node metadata? The relationship between discrimination performance and link prediction performance provides some clues about this problem. We calculated discrimination performance and link prediction performance for one sample model network for all indices. Each index can be used in both the discrimination process and the link prediction process, so we obtain a performance pair for each index. We drew a scatter plot of these pairs for a single realization of the model network with intra-dominant and inter-dominant structures, where a point in the scatter plot represents the performance pair of a similarity index.

Fig. 6 shows that discrimination performance and link prediction performance are correlated. However, the sign of the correlation depends on the structure: positive when intra-dominant and negative when inter-dominant. In the intra-dominant structure, intra-connections are mainly selected in the missing link creation process. Therefore, good discrimination performance ensures higher similarity for missing links. However, in the inter-dominant structure, inter-connections are usually selected in the missing link creation process. Thus, good discrimination performance leads to lower similarity values for missing links because intra-connections have higher similarity, even in the inter-dominant structure.

We verified the relationship between discrimination performance and link prediction performance using empirical networks with binary types. Two empirical networks are employed: the social network of a karate club in a university, where node type represents social group [31] (called Karate), and a word adjacency network from the novel David Copperfield by Charles Dickens, where nodes are words, node types are noun and adjective, and a node pair is connected when both words are adjacent in the corpus [32] (called Adjnoun). Their network properties are listed in Table 3. A positive correlation is observed in Karate, whereas Adjnoun shows a negative correlation, which is consistent with the results of the models.

Table 3. Network properties of the employed empirical networks. $L_{intra}$ is the number of intra-connections and $L_{inter}$ is the number of inter-connections. Karate shows an intra-dominant structure and Adjnoun shows an inter-dominant structure.

Network    N     N_1   N_2   L     L_intra   L_inter
Karate     34    17    17    78    67        11
Adjnoun    112   58    54    425   119       306

[Fig. 7, panels (a) and (b): link prediction performance versus discrimination performance.]

Fig. 7. Empirical investigation of the correlation between discrimination performance and link prediction performance for (a) Karate and (b) Adjnoun.

Discrimination performance cannot be calculated when metadata about node type are unavailable. However, link prediction performance requires only structural information of the network and can therefore be obtained without metadata. We expect that discrimination performance can be approximated without metadata from its linear correlation with link prediction performance.
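One way to quantify such a linear relationship across indices is the Pearson correlation coefficient of the performance pairs. The sketch below is only an illustration: the source does not state which correlation measure was used, the function name `pearson` is a choice made here, and the AUC values in the example are hypothetical numbers, not results from this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical AUC values, one entry per similarity index.
discrimination = [0.62, 0.70, 0.75, 0.81]
link_prediction = [0.58, 0.66, 0.69, 0.77]
r = pearson(discrimination, link_prediction)  # positive for these numbers
```

A coefficient near +1 would correspond to the intra-dominant pattern described above and a coefficient near -1 to the inter-dominant pattern.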

3. Discussion

We investigated the discrimination performance of similarity indices in model networks, where the structure reflects the equivalence of node types. The relationships between network parameters and performance found in the model test have also been observed in the link prediction process [30]. However, similarity indices perform well even in inter-dominant structures, except for some global indices. This observation shows that these similarity indices can detect similar nodes regardless of whether the structure is intra-dominant or inter-dominant.

The characteristics of the local indices and global indices are different. The performances of the local indices are clustered and show consistent behavior. The performance of each global index, however, shows different characteristics. Therefore, a deeper understanding of the characteristics of a given network will be required when we use global indices. Global indices may not be a good choice when the given network is close to an inter-dominant structure.

In addition, we observed a correlation between discrimination performance and link prediction performance. Link prediction performance is obtained from structural information with no meta-information. Therefore, link prediction performance is expected to be an approximation of the discrimination performance when metadata of the nodes are inaccessible. However, this relationship depends on whether the network structure is intra-dominant or inter-dominant. Therefore, a way to predict the sign of the correlation would make link prediction performance more useful as a proxy for discrimination performance.

We studied a network model with binary node type as the simplest model. However, node types are usually more varied and can be continuous variables. A wider variety of models and accuracy metrics should be investigated in a future study.

IV. CONCLUSIONS

In this study, we investigated the relationship between discrimination performance and the structural properties of a two-state random network model. As a result, we found that the number of links and the deviation of the intra-inter ratio from 1 are positively correlated with the discrimination performance of most indices. However, their characteristics differ in detail. Local indices show better performance in dense, small-size networks and perform well whether the structure is intra-dominant or inter-dominant. In contrast, global indices perform better in sparse, large-size networks, and some of them do not work well in inter-dominant structures. In addition, clustered behavior of some local similarity indices is observed, which reflects the properties of their mathematical definitions.

We investigated the correlation between discrimination performance and link prediction performance. They are positively correlated when networks have an intra-dominant structure and negatively correlated for inter-dominant structures. This correlation is also observed in empirical networks with binary types. Link prediction requires no metadata for node types; therefore, link prediction performance is expected to be an approximation of discrimination performance when metadata are inaccessible.

ACKNOWLEDGMENTS

REFERENCES

[1] J. P. Onnela et al., Proc. Natl. Acad. Sci., 7332 (2007).
[2] A. L. Barabasi et al., Phys. A: Stat. Mech. its Appl. 311, 590 (2002).
[3] M. E. J. Newman, Proc. Natl. Acad. Sci., 404 (2001).
[4] R. Guimera, L. Danon, A. Diaz-Guilera, F. Giralt and A. Arenas, Phys. Rev. E, 065103 (2003).
[5] R. N. Mantegna, Eur. Phys. J. B, 193 (1999).
[6] G. Bonanno, G. Caldarelli, F. Lillo and R. N. Mantegna, Phys. Rev. E (4), 046130 (2003).
[7] D. A. Fell and A. Wagner, Nat. biotechnology 18, 1121 (2000).
[8] I. Farkas, H. Jeong, T. Vicsek, A. L. Barabasi and Z. N. Oltvai, Phys. A: Stat. Mech. its Appl., 601 (2003).
[9] R. Albert, H. Jeong and A. L. Barabasi, Nature, 130 (1999).
[10] A. Broder et al., Comput. networks, 309 (2000).
[11] D. J. Watts and S. H. Strogatz, Nature, 440 (1998).
[12] M. E. J. Newman, SIAM review (2), 167 (2003).
[13] R. Albert and A. L. Barabasi, Rev. of Mod. Phys. (1), 47 (2002).
[14] L. C. Freeman, Sociometry, 35 (1977).
[15] S. Brin and L. Page, Computer networks and ISDN systems (1-7), 107 (1998).
[16] E. A. Leicht, P. Holme and M. E. J. Newman, Phys. Rev. E (2), 026120 (2006).
[17] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai and A.-L. Barabasi, Science, 1551 (2002).
[18] M. E. J. Newman, Phys. Rev. E (2), 025102 (2001).
[19] L. A. Adamic and E. Adar, Social networks (3), 211 (2003).
[20] D. Liben-Nowell and J. Kleinberg, J. Am. Soc. Inf. Sci. Technol. 58, 1019 (2007).
[21] T. Zhou, L. Lu and Y. C. Zhang, Eur. Phys. J. B (4), 623 (2009).
[22] T. Sorensen, Biol. Skr, 1 (1948).
[23] G. Jeh and J. Widom, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 271 (2002).
[24] L. Katz, Psychometrika, 39 (1953).
[25] L. Lu and T. Zhou, Phys. A: Stat. Mech. its Appl (6), 1150 (2011).
[26] A. Clauset, C. Moore and M. E. J. Newman, Nature, 98 (2008).
[27] T. Fawcett, Pattern Recognit. Lett. (8), 861 (2006).
[28] H. Chen, X. Li and Z. Huang, Digital Libraries, 2005. JCDL'05. Proceedings of the 5th ACM/IEEE-CS Joint Conference on. IEEE (2005).
[29] L. Lu et al., Physics Reports (1), 1 (2012).
[30] M. W. Ahn and W. S. Jung, Phys. A: Stat. Mech. its Appl., 177 (2015).
[31] W. W. Zachary, J. Anthropol. Res., 452 (1977).
[32] M. E. J. Newman, Phys. Rev. E, 036104 (2006).