Topological Centrality and Its Applications

Abstract

Recent developments in network structure analysis show that it plays an important role in characterizing complex systems in many branches of science. Unlike previous network centrality measures, this paper proposes the notion of topological centrality (TC), which reflects the topological positions of nodes and edges in general networks, and proposes an approach to calculating it. The proposed topological centrality is then used to discover communities and to build the backbone network. Experiments and applications on a research network show the significance of the proposed approach.


arXiv preprint [cs.IR], Feb.

Hai Zhuge, Senior Member, IEEE, and Junsheng Zhang


Index Terms: Network structure, Centrality, Community, e-Science

• The authors are with the China Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, and the Graduate School, Chinese Academy of Sciences, 100190, PO Box 2704-28, Beijing, China.

INTRODUCTION

The rich-get-richer phenomenon exists in many complex networks such as the World Wide Web. A node can become richer in two ways: by connecting to more nodes, or by connecting to more important nodes.

We observe that a node may gain more by connecting to one important node than by connecting to many less important nodes, and that both nodes and edges play an important role in forming network centrality.

Existing centrality measures focus on nodes and cannot explain the topological characteristics of centrality. This paper explores a new network centrality called topological centrality.

Various centrality measures are defined on a graph $G = (V, E)$, where $V$ is the vertex set, $E$ is the edge set, $|V| = n$, and $|E| = m$.

Authority and hub reflect the in-degree and out-degree characteristics of a node in the Web respectively [1]. The idea of HITS is that a good hub links to many authorities, while a good authority is linked to by many good hubs. Nodes with the highest authority or hub values in the Web graph act as authority centers and hub centers. The authority and hub of a node are calculated by:

$$a(i) = \sum_{(j,i) \in E} h(j), \qquad h(j) = \sum_{(j,i) \in E} a(i),$$

where $a(x)$ and $h(x)$ are the authority and hub of node $x \in \{i, j\}$ respectively.

Degree centrality describes the degree information of each node [2] [3]. It is based on the idea that more important nodes are more active, that is, they have more neighbors in the graph. Degree centrality can be used to find the core nodes of a community; however, it only considers the hub characteristic and ignores the authority characteristic.
The degree centrality $C_D(v)$ of a vertex $v$ is calculated as follows:

$$C_D(v) = \frac{\deg(v)}{n - 1}.$$

Calculating degree centrality for all nodes in a graph takes $O(n^2)$ time with a dense adjacency matrix representation, while for a sparse graph stored by its edges $E$ the time complexity is $O(m)$. Similar to degree centrality, an approach was proposed to improve the efficiency of information propagation in P2P networks based on the in- and out-degrees of nodes [4].

Betweenness centrality describes how frequently a node occurs on the shortest paths between two indirectly connected nodes [2] [5] [6]. It is based on the idea that the more node pairs are connected via a node, the more important that node is. Betweenness centrality can be used to find the edges between two communities in a complex network. The betweenness centrality $C_B(v)$ of a vertex $v$ is:

$$C_B(v) = \frac{1}{(n-1)(n-2)} \sum_{\substack{s \neq v \neq t \in V \\ s \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where $\sigma_{st}$ is the number of shortest (geodesic) paths from $s$ to $t$, and $\sigma_{st}(v)$ is the number of shortest paths from $s$ to $t$ that pass through vertex $v$. The shortest paths between each pair of nodes in a graph can be found by the Floyd-Warshall algorithm with time complexity $O(n^3)$ [7], so the time complexity of betweenness centrality is also $O(n^3)$. Betweenness centrality has been used to study the community structure of social and biological networks [8].

Closeness centrality describes the efficiency of information propagation from one node to the other nodes [2] [9] [10]. It is based on the idea that a node is central if it can quickly reach the others. Closeness centrality can be regarded as a measure of how long it takes information to spread from a given vertex to the other reachable vertices in the network. It is defined as the reciprocal of the mean geodesic distance (i.e., the shortest path length) between a vertex $v$ and all other vertices reachable from $v$:

$$C_C(v) = \frac{n - 1}{\sum_{t \in V \setminus v} d_G(v, t)},$$

where $n \geq 2$ is the size of the network's connected component reachable from $v$. Calculating the closeness centrality for each node in the graph has time complexity $O(n^3)$.

Eigenvector centrality describes the importance of nodes according to the adjacency matrix of a connected graph [11]. It assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more to the score of a node than connections to low-scoring nodes. PageRank is a variant of the eigenvector centrality measure [12].
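The degree and closeness formulas above can be illustrated with a minimal Python sketch. It assumes an unweighted, undirected graph stored as an adjacency dictionary; the function names (`bfs_distances`, `degree_centrality`, `closeness_centrality`) are illustrative and do not come from the paper. Geodesic distances are obtained by breadth-first search rather than Floyd-Warshall, which is a substitution that gives the same distances on unweighted graphs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """C_D(v) = deg(v) / (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """C_C(v) = (n - 1) / sum of distances, n = size of the component reachable from v."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(d for t, d in dist.items() if t != v)
        reachable = len(dist)  # includes v itself
        result[v] = (reachable - 1) / total if total > 0 else 0.0
    return result


# A path graph 0 - 1 - 2: the middle node is the most central under both measures.
path = {0: [1], 1: [0, 2], 2: [1]}
path_degree = degree_centrality(path)
path_closeness = closeness_centrality(path)
```

On this path graph the middle node has degree centrality 1.0 and closeness centrality 1.0, while each end node has degree centrality 0.5 and closeness centrality 2/3.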

Information centrality describes a node's influence on the network efficiency of information propagation [13]. The network efficiency is defined by

$$E[G] = \frac{\sum_{i \neq j \in G} \epsilon_{ij}}{n(n-1)} = \frac{1}{n(n-1)} \sum_{i \neq j \in G} \frac{1}{d_{ij}},$$

where the efficiency $\epsilon_{ij}$ of the communication between two points $i$ and $j$ is equal to the inverse of the shortest path length $d_{ij}$. The information centrality of a vertex $v$ is defined as the relative drop in the network efficiency caused by the removal from $G$ of the edges incident with $v$:

$$C_I(v) = \frac{\Delta E}{E} = \frac{E[G] - E[G'_v]}{E[G]},$$

where $G'_v$ denotes the network obtained by removing the edges incident with node $v$ from $G$. Information centrality has been used to study the structures of communities in complex networks [14].

TOPOLOGICAL CENTRALITY

In a dynamic network, the weights of nodes and the weights of edges influence each other and keep changing. One round of mutual influence between every pair of nodes is called one iteration. If the order of the nodes' weights remains unchanged after many iterations, the network has reached the steady state, and the nodes with the highest weights are called topological centers. An undirected graph may have one or more topological centers; their number is decided by the graph structure. An undirected network may have one of the following structures.

1. A network with circular structure has $n$ ($n \geq 3$) topological centers, as shown in Fig. 1a.
2. A network with symmetric structure has two topological centers, as shown in Fig. 1b.
3. Otherwise, the network has a unique topological center, as shown in Fig. 1c.

Fig. 1. Three types of topological structures: (a) circular structure; (b) symmetric structure; (c) general structure. The darker the node, the higher its topological centrality. The black nodes are the topological centers. Networks of circular structure have $n$ ($n \geq 3$) topological centers; networks of symmetric structure have two topological centers; other networks have one topological center.

In an undirected graph, the length of the shortest path between two nodes is the geodesic distance between them. In particular, if two nodes are unreachable from each other, their geodesic distance is $+\infty$. Geodesic distance can be used to find the nearest topological center of a node.

When a network is in the steady state, the topological centrality (TC) of a node is the ratio of its weight to the largest node weight. The topological centers have the largest node weight. The topological centrality of an edge is the ratio of its weight to the largest edge weight.

The TC of a node reflects the geodesic distance from the node to its nearest topological center. The TC of an edge reflects the geodesic distance from the edge to its nearest topological center. The higher the TC of a node or edge, the closer it is to the nearest topological center.
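As a small illustration of the geodesic distance to the nearest topological center, the following Python sketch runs a breadth-first search started from all centers at once. It assumes an unweighted, undirected graph given as an adjacency dictionary and a set of centers that is already known; the function name `distance_to_nearest_center` and the example graph are hypothetical.

```python
import math
from collections import deque


def distance_to_nearest_center(adj, centers):
    """Multi-source BFS: geodesic distance from each node to its nearest center.

    Nodes that cannot reach any center keep the distance +infinity.
    """
    dist = {v: math.inf for v in adj}
    queue = deque()
    for c in centers:
        dist[c] = 0
        queue.append(c)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == math.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Path 0 - 1 - 2 - 3 with node 1 as the only center, plus an isolated node 4.
example_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
example_dist = distance_to_nearest_center(example_graph, {1})
```

The isolated node keeps the distance $+\infty$, matching the convention stated above for unreachable nodes.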

Hypothesis 1. The topological centrality of a node is positively influenced by the topological centrality degrees of its neighbor nodes.

Hypothesis 1 leads to the following characteristics:

1. a node connecting to nodes with higher TC degrees gets a higher TC degree; and,
2. a node connecting to more nodes gets a higher TC degree.

Hypothesis 2. If the two nodes of an edge have higher TC degrees, then the edge has a higher TC degree; and if an edge has a higher TC degree, then its two nodes also have higher TC degrees.

Hypothesis 2 leads to the following characteristics:

1. nodes closer to the topological center have higher TC degrees; and,
2. edges closer to the topological center have higher TC degrees.

These characteristics reflect that nodes with higher TC degrees are incident with edges having higher TC degrees.

The two hypotheses can be represented by:

$$\begin{cases} \omega(n)\uparrow = \omega(n) + \sum g\big(\omega(link(n, n_i))\uparrow,\ \omega(n_i)\uparrow\big) \\ \omega(l)\uparrow = f\big(\omega(l_s)\uparrow,\ \omega(l_t)\uparrow\big) \end{cases} \qquad (1)$$

where $n$ is a node, $n_i$ are the neighbors of $n$, $\omega(link(n, n_i))$ is the weight of the link between $n$ and $n_i$; $l$ is a link, $l_s$ and $l_t$ are the source and target nodes of $l$ respectively; $f$ and $g$ are two functions; and $\uparrow$ denotes a positive correlation.

During the calculation of TC degrees, the weights of nodes and edges increase after each iteration, but the descending order of the node weights converges to the steady state. The weights of nodes can be normalized by dividing them by the largest node weight. If the normalized weights of nodes converge, the descending order of the nodes' weights remains unchanged, and the edges' weights also converge. The converged weights of nodes and edges are the TC degrees of nodes and links respectively.

Normalization of the weights of nodes has the following characteristics:

1. If the normalized weights of nodes converge, then the descending order of nodes by weight also converges. The normalization process does not change the order of the node weights; it only maps the weights of nodes onto the interval $(0, 1]$.
2. If the normalized weights of nodes converge, the weights of edges also converge. According to the definition of the TC of an edge, the weight of an edge is the sum of the weights of its two end nodes. Since the normalized weights of nodes converge, the weights of the incident edges also converge.
3. If the normalized weights of nodes converge, then the TC degrees of edges converge. This follows because the normalization of edge weights only maps the weights of edges onto the interval $(0, 1]$ and keeps their order.

We propose the following approach to calculating the TC in a connected network. Suppose a connected graph $G = (V, E)$ with $n$ ($n > 1$) nodes and $m$ ($m \geq n - 1$) edges, $V = \{v_1, v_2, \ldots, v_n\}$, $E = \{e_1, e_2, \ldots, e_m\}$, and the corresponding adjacency matrix is $A$. The element of $A$ is $a_{ij}$, and

$$a_{ij} = \begin{cases} 1 & \{i, j\} \in E \\ 0 & \{i, j\} \notin E. \end{cases}$$

The following formula implements the iterative calculation of the topological centrality of nodes and edges, where $temp\,\omega_i$ and $\omega_i$ are the weights of $v_i$ before and after normalization, $temp\,\omega_{e(i,j)}$ and $\omega_{e(i,j)}$ are the weights of edge $e(i,j)$ before and after normalization, and $t \geq 0$ is the iteration time.

$$\begin{cases} temp\,\omega_i^{(t+1)} = \omega_i^{(t)} + \sum_{j=1}^{n} a_{ij}\, \omega_{e(i,j)}^{(t)}\, \omega_j^{(t)} \\ temp\,\omega_{e(i,j)}^{(t+1)} = temp\,\omega_i^{(t+1)} + temp\,\omega_j^{(t+1)} \end{cases} \qquad (2)$$

The following formulas normalize the TC degrees of nodes and links.

$$\begin{cases} \omega_i^{(t+1)} = \dfrac{temp\,\omega_i^{(t+1)}}{\max_{i=1}^{n} temp\,\omega_i^{(t+1)}} \\[2ex] \omega_{e(i,j)}^{(t+1)} = \dfrac{temp\,\omega_{e(i,j)}^{(t+1)}}{\max_{j=1}^{m} temp\,\omega_{e_j}^{(t+1)}} \end{cases} \qquad (3)$$

The iterative calculation terminates if the following conditions are satisfied:

$$\begin{cases} \sum_{i=1}^{n} \big(\omega_i^{(t+1)} - \omega_i^{(t)}\big)^2 < \epsilon_N \\ \sum_{j=1}^{m} \big(\omega_{e_j}^{(t+1)} - \omega_{e_j}^{(t)}\big)^2 < \epsilon_M \end{cases} \qquad (4)$$

Algorithm 1 calculates the weights of nodes and links iteratively, where $MAX$, $\epsilon_N$ and $\epsilon_M$ control the number of iterations.

The time complexity of Algorithm 1 is $O(MAX(n + m))$. At the initialization stage, all node weights are set to 1. If the weights of edges are not given, all edge weights are also set to 1. In the first iteration, the new weight of a node is therefore the sum of its own weight and the weights of its neighbor nodes, and the weight of each edge is then the sum of the weights of its two end nodes. The weights of nodes become larger than their initial values. In each iteration, the weights of nodes and edges are normalized by dividing them by the maximum node weight and the maximum edge weight respectively.

Algorithm 1 has two termination conditions: one is the maximum number of iterations $MAX$; the other is given by the squared deviation threshold $\epsilon_N$ on the weight differences of nodes and the squared deviation threshold $\epsilon_M$ on the weight differences of edges. After Algorithm 1 stops, the nodes with weight 1 are the topological centers.
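The iteration in Eqs. (2)-(4) can be sketched in Python as below. This is a minimal reading of the formulas and not the authors' implementation: it assumes an undirected connected graph with at least one edge, nodes numbered from 0, initial node and edge weights of 1, and hypothetical parameter names (`max_iter`, `eps_n`, `eps_m`) standing in for $MAX$, $\epsilon_N$ and $\epsilon_M$.

```python
def topological_centrality(n, edges, max_iter=100, eps_n=1e-12, eps_m=1e-12):
    """Iterate Eqs. (2)-(3) until the squared changes satisfy Eq. (4).

    n: number of nodes (labelled 0 .. n-1); edges: list of (u, v) pairs.
    Returns (node_weights, edge_weights), both normalized to a maximum of 1.
    """
    node_w = [1.0] * n
    edge_w = [1.0] * len(edges)
    for _ in range(max_iter):
        # Eq. (2), first line: own weight plus edge-weighted neighbor weights.
        temp_node = list(node_w)
        for k, (u, v) in enumerate(edges):
            temp_node[u] += edge_w[k] * node_w[v]
            temp_node[v] += edge_w[k] * node_w[u]
        # Eq. (2), second line: an edge weight is the sum of its end nodes.
        temp_edge = [temp_node[u] + temp_node[v] for u, v in edges]
        # Eq. (3): divide by the largest node weight and the largest edge weight.
        max_node, max_edge = max(temp_node), max(temp_edge)
        new_node = [w / max_node for w in temp_node]
        new_edge = [w / max_edge for w in temp_edge]
        # Eq. (4): sums of squared differences between consecutive iterations.
        node_sum = sum((a - b) ** 2 for a, b in zip(new_node, node_w))
        edge_sum = sum((a - b) ** 2 for a, b in zip(new_edge, edge_w))
        node_w, edge_w = new_node, new_edge
        if node_sum < eps_n and edge_sum < eps_m:
            break
    return node_w, edge_w


# Star graph: node 0 is linked to three leaves and becomes the topological center.
star_nodes, star_edges = topological_centrality(4, [(0, 1), (0, 2), (0, 3)])
# Ring graph: by symmetry every node is a topological center.
ring_nodes, _ = topological_centrality(5, [(i, (i + 1) % 5) for i in range(5)])
```

For the star, the leaf weight $x$ satisfies $x = (x + 1)/(1 + 3x)$ at the fixed point, so the leaves settle near $1/\sqrt{3}$ while the center keeps weight 1. In the ring every node keeps weight 1, which agrees with the circular case described earlier.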
The weight of a node is its topological centrality: the larger the weight of a node, the closer the node is to the nearest topological center.

Table 1 compares topological centrality with other centrality measures.

TABLE 1
Comparison of different centrality measures

Centrality Measure        Time Complexity   About Node or Edge
degree centrality         O(n^2)            node
betweenness centrality    O(n^3)            node or edge
closeness centrality      O(n^3)            node
eigenvector centrality    -                 node
information centrality    O(n)              node
topological centrality    O(K(n + m))       node and edge

We carry out experiments on several types of networks to verify the convergence of the algorithm. Fig. 2 shows the results of the iterative TC calculation for nodes and links in networks of different structures and scales: (a) Watts-Strogatz small-world network with $n = 1000$ and $m = 5000$; (b) ring network with $n = 1000$ and $m = 1000$; (c) lattice network with $n = 100$ and $m = 180$; (d) full network with $n = 30$ and $m = 435$; (e) Erdős-Rényi random graph with $n = 1000$, $p = 0.02$, and $m = 10045$. The results show that the TC degrees of nodes and links converge after a number of iterations that depends on $n$, $m$, $\epsilon_N$ and $\epsilon_M$.

Algorithm 1 Calculating topological centrality degrees of nodes and edges

Require: node number n, edge number m, edges like (linknum, startNode, endNode, weight), limited iteration time MAX, deviation square limit of weight difference of nodes ε_N, deviation square limit of weight difference of links ε_M;
  nodeWeight[1..n] ← 1, count ← 0, nodeSum ← n, edgeSum ← m
  while (count < MAX) and ((nodeSum > ε_N) or (edgeSum > ε_M)) do
    oldNodeWeight[1..n] ← nodeWeight[1..n]
    oldEdgeWeight[1..m] ← edgeWeight[1..m]
    nodeWeight[1..n] ← (nodeWeight[1..n] + Σ_{incident edge} edgeWeight * nodeWeight) / max(nodeWeight)
    edgeWeight ← (Σ_{incident node} nodeWeight) / max(edgeWeight)
    nodeSum ← Σ_{i=1..n} (nodeWeight[i] − oldNodeWeight[i])^2
    edgeSum ← Σ_{i=1..m} (edgeWeight[i] − oldEdgeWeight[i])^2
    count ← count + 1
  end while
  return nodeWeight[1..n] and edgeWeight[1..m]

Different centrality measures such as degree centrality, betweenness centrality, closeness centrality and information centrality are compared in [15]. Here we add two more centrality measures: one is the PageRank of a node, as an instance of eigenvector centrality; the other is the topological centrality we propose. The comparison is based on Fig. 3, which is a tree with 16 vertices.

Table 2 shows the different centrality degrees of the vertices in Fig. 3. The experiment results show the following characteristics:

1. Degree centrality is a local centrality: it only records the degrees of nodes without any global information. Nodes 1, 2, and 3 have degree 5, nodes 7 and 12 have degree 2, and the other nodes have degree 1. Degree centrality is normalized by the number of edges, 15.
2. Closeness centrality gives results similar to information centrality. The difference lies in the relative order of nodes {1, 3} and {7, 12}: the information centrality degrees of vertices 1 and 3 are larger than those of vertices 7 and 12, because information centrality concentrates on network efficiency, and removing 1 or 3 affects the network efficiency more than removing 7 or 12.
3. The PageRank result is far from those of the other measures. Nodes 1 and 3 are the two centers in PageRank, and node 2 has a lower PageRank than nodes 1 and 3, because the authority of nodes 7 and 12 is divided into two parts, while nodes 1 and 3 each have four neighbors that contribute all of their authority values to nodes 1 and 3 respectively. Nodes 7 and 12 have higher rank values than nodes 9, 10 and 11, because they have more neighbors.
4. Betweenness centrality reflects how frequently nodes occur on the shortest paths between indirectly connected node pairs. However, betweenness centrality has the worst resolution of nodes. Node 2 has the highest betweenness centrality, nodes 1, 3, 7, and 12 come next, and all the other nodes have the same betweenness centrality, 0.
5. Topological centrality combines degree information and neighbor weight information. It has the characteristics of both degree centrality and PageRank. Node 2 is the topological center of the graph. Nodes 7 and 12 have higher TC degrees than nodes 9, 10 and 11 because they have extra neighbors. Nodes 1 and 3 follow nodes 9, 10 and 11, and then come the remaining vertices. The order of the node TC degrees is consistent with the geodesic distances between the nodes and the topological center.

Fig. 3. A simple case (a tree with 16 nodes) for the comparison of centrality measures.

Here the DBLP dataset is used to study the structure of heterogeneous networks and to discover communities in them. It contains part of the metadata of papers provided by DBLP in XML format. The number of papers is …, and the number of citation relations is …. The heterogeneous research network is based on the DBLP dataset. The resource types are papers, researchers and conferences. The semantic links are authorOf between researcher and paper, coauthor between researchers, publishedIn between paper and conference/journal, and cite between papers. The research network contains … semantic nodes and … semantic links. The iteration limits are $MAX = 40$ and $\epsilon_M = \epsilon_N = 200$. The distribution of the TC degrees of nodes is shown in Fig. 4. It shows that the lower TC degrees account for more resources than the higher TC degrees.

Fig. 2. Topological centrality convergence experiments ($MAX = 100$, $\epsilon_N = \epsilon_M$): the left column lists networks of several structures; the middle column lists the node convergence records (x-axis: iteration times; y-axis: normalized weights of nodes); and the right column lists the link convergence records (x-axis: iteration times; y-axis: normalized weights of links). (a) Watts-Strogatz small-world network with $n = 1000$ and $m = 5000$; (b) ring network with $n = 1000$ and $m = 1000$; (c) lattice network with $n = 100$ and $m = 180$; (d) full network with $n = 30$ and $m = 435$; (e) Erdős-Rényi random graph with $n = 1000$, $p = 0.02$ and $m = 10045$.

TABLE 2
Comparison between topological centrality and other centrality measures

v    C_I(v)    C_D(v)    C_C(v)    C_B(v)    PR(v)    log(C_T(v))

Fig. 4. Topological centrality distributions (x-axis: topological centrality, log scale; y-axis: frequency).

APPLICATION: DISCOVERING RESEARCH COMMUNITIES

Research communities are formed by relations among researchers, papers, projects, and research activities. Research communities differ from graph-based communities as follows.

1. Research communities are dynamically formed by research activities such as applying (e.g., for funding and positions), cooperating, publishing, and citing. Communities in general complex networks are viewed in terms of connections (nodes within a community are linked more densely than nodes across communities).
2. Research communities contain multiple types of nodes (researchers and papers can play different roles in research activities, as discussed in [15]) and relations (e.g., coauthor relation and citation relation). In graph-based communities, nodes and edges are not differentiated.

Among existing centrality measures, only PageRank considers the influences between neighbor nodes, and the authority of a node is divided among its neighbors. However, PageRank does not reflect the different influences of edges, that is, all edges carry the same weight. In a research network, collaborations between authority researchers are more important, and citations between authority papers are more important.

Topological centrality can distinguish well the roles of different nodes in a research network. (1) Nodes in a network elect the core nodes by a voting-like mechanism: a node connecting to more nodes is more likely to be a local core node. After a certain number of iterations, the local core nodes and the global topological centers are elected. The topological centers are the nodes connecting to the most core nodes with higher TC degrees. (2) Edges may play different roles in the mutual influence between the TC degrees of nodes. This confirms the phenomena of research communities: a researcher cooperating with authority researchers will be closer to the centers of a research community; a paper citing authority papers (although this may not always hold for citing) or cited by authority papers is more likely to be closer to the core papers on a research topic.

Nodes can play different roles according to their topological positions in communities: core node, margin node, bridge node and mediated node.

1. Core nodes are usually hubs or authorities in the community;
2. Margin nodes belong to one community, and they have few connections to other nodes in the community;
3. Bridge nodes connect to two or more communities, and they usually have an equal number of connections to each of them; and,
4. All nodes other than core nodes, margin nodes and bridge nodes are mediated nodes.

The proposed topological centrality can be used to distinguish the roles of nodes. For example, Fig. 5 contains three communities, $C_1$, $C_2$ and $C_3$. Each community has one core node, two nodes are bridge nodes, and each community has several margin nodes.

Fig. 5. Distinguishing roles of nodes with topological centrality degrees.

Nodes can be classified by TC degrees.

1. If the TC degree of a node is larger than that of most of its neighbors, then the node is a core node;
2. If the TC degree of a node is no larger than the TC degrees of all of its neighbors, then the node is a margin node;
3. If the number of neighbors with lower TC degrees equals the number of neighbors with higher TC degrees, then the node is a bridge node;
4. Otherwise, the node is a mediated node.

Let $\alpha = L(n) / |N(n)|$ and $\beta = H(n) / |N(n)|$, where $n$ is a node, $L(n)$ is the number of neighbor nodes of $n$ with TC degrees lower than that of $n$, $H(n)$ is the number of neighbor nodes of $n$ with TC degrees higher than that of $n$, and $N(n)$ is the set of neighbors of $n$. Then the role of $n$ is given by

$$role(n) = \begin{cases} \text{core node} & \alpha > threshold(core) \\ \text{margin node} & \alpha = 0 \\ \text{bridge node} & \alpha = \beta \\ \text{mediated node} & \text{otherwise} \end{cases}$$

where $threshold(core) \in (0.5, 1)$ controls the number of core nodes.

A node is a core node because it connects to more nodes or to more important nodes. Whether a node is a core node is decided by whether it has a larger TC degree than its neighbors. However, the topological centers of a connected network may be exceptions. In Fig. 5, node 2 is both the topological center and a core node, but the ellipse node in Fig. 6 is the topological center and is not a core node but a bridge node, although it has a higher TC degree than all of its neighbors. So it is significant to distinguish the roles of topological centers. If the neighbors of a topological center are all core nodes, then the topological center is a bridge node; otherwise the topological center is a core node.

Fig. 6. The ellipse node is a topological center, and it is not a core node but a bridge node.

Researchers and papers may play such roles as source, authority, bee, hub and novice [15]. The source, authority, and hub may be core nodes; bee nodes are often bridge nodes; and the novice may be a margin node or a bridge node.

In a research network, a research group's leader usually has more publications and cooperators. Correspondingly, leaders have more coauthor relations connecting to other researchers in the coauthor network. If each research group is regarded as a community, the research group's leaders are the core nodes. New students have few publications and cooperators, so they are the margin nodes in the coauthor network. Visiting researchers and newly employed researchers are bridge nodes, because they have cooperators in different research communities. After the core nodes, the margin nodes and the bridge nodes are distinguished, the remaining nodes are mediated nodes. Usually, mediated nodes belong to only one community.

In a citation network, core nodes are the authority or hub papers having more citations than others; the margin nodes are the novice papers or newly published papers; and the bridge nodes connect two or more paper clusters. Each paper cluster may belong to a specific research topic or discipline.

Funding decision-making and research promotion need to evaluate researchers and their papers. Topological centrality can help distinguish the roles of researchers and papers, and the roles can be used to evaluate researchers and papers. TC degrees in the coauthor network help evaluate researchers, while TC degrees in the citation network help evaluate papers.

In a research network, the roles of nodes change year by year. In the coauthor network, a novice researcher may become an authority, a hub or even a bridge. With more papers published, the TC degree of a node in a coauthor network becomes higher than those of its neighbors, and the researcher then becomes an authority or hub. By cooperating with researchers in different research groups or even different communities, a researcher becomes a bridge.

The tree in Fig. 3 can be regarded as a coauthor network or as a citation network with the directions of edges ignored. General community discovery algorithms such as the GN algorithm cannot discover its communities, because the betweenness of each edge is the same, and there is no way to choose the proper edge for deletion. However, nodes in coauthor networks and citation networks play different roles, and communities can be discovered according to the roles of nodes.

The roles of nodes can be used to discover communities. One way is to find the core nodes, and then assign non-core nodes to the proper core nodes to form communities. Algorithm 2 discovers communities by finding core nodes for each non-core node.
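The assignment of non-core nodes to their nearest core nodes can be sketched in Python as below. It is a simplified illustration rather than the full procedure: the set of core nodes is assumed to be known, "nearest" is taken to mean smallest geodesic distance, a node with several equally near core nodes joins all of their communities, and the final merging down to $k$ communities is omitted. The names `nearest_cores` and `assign_to_cores` are hypothetical.

```python
from collections import deque


def nearest_cores(adj, cores, x):
    """Return all core nodes at minimum geodesic distance from node x (BFS)."""
    dist = {x: 0}
    queue = deque([x])
    found, best = [], None
    while queue:
        u = queue.popleft()
        if best is not None and dist[u] > best:
            break  # every remaining node is farther than the nearest cores
        if u in cores:
            best = dist[u]
            found.append(u)
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return found


def assign_to_cores(adj, cores):
    """Build one community per core node from the nearest-core assignment."""
    communities = {c: {c} for c in cores}
    for x in adj:
        if x in cores:
            continue
        for c in nearest_cores(adj, cores, x):
            communities[c].add(x)
    return communities


# Hypothetical graph: cores 0 and 3; node 4 is equally near to both cores.
demo_adj = {0: [1], 1: [0, 2, 4], 2: [1, 3, 4], 3: [2], 4: [1, 2]}
demo_communities = assign_to_cores(demo_adj, {0, 3})
```

In this example node 4 lies at distance 2 from both cores, so it appears in both communities, which mirrors the way a node with several candidate core nodes is shared between communities.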

Algorithm 2

Finding k communities by core nodes

Require: a network C;
  Calculate the topological centrality degrees of nodes and links;
  Distinguish roles of nodes and add the core nodes into CoreSet;
  for node x ∈ CoreSet do
    nodes(x) ← {x}
  end for
  for each non-core node x do
    Choose the nearest core nodes into CandidateSet as the candidate nodes;
    for node y ∈ CandidateSet do
      nodes(y) ← nodes(y) ∪ {x};
    end for
  end for
  while |CoreSet| > k do
    Merge two most tightly connected communities;
  end while
  return k communities.

The time complexity of Algorithm 2 is $O(n(n + m))$. The number of core nodes can be controlled by setting the threshold on $L(n) / |N(n)|$. If a node has more than one candidate core node, it is classified into several communities; bridge nodes are often classified into several communities at the same time.

This approach discovers communities globally in a network. If there are too many communities, closely connected communities can be merged into larger communities. Closely connected communities may share many nodes and links, or have many external connections between them. Given the target number of communities $k$, Algorithm 3 merges communities.

Another way is to find the core nodes first, and then expand from a node to form local communities. According to the roles of nodes, the community expansion needs to consider the following cases.

1. Forming a local community from a core node. Algorithm 4 discovers local communities from a core node. A community may have more than one core node. If two communities share many common nodes and links, the two communities can be merged into a larger community. This approach can find the research groups in a coauthor network and the topic-specific paper clusters in a citation network.
2. Forming a local community from a non-core node. To find local communities from a non-core node, it is necessary to find the core nodes connected to the node. Before finding the communities of a non-core node, all the core nodes in the network should be found first. The local communities are then expanded from each of the nearest core nodes connected to the non-core node.

Algorithm 3

Merging communities

Require: the number of communities k;
  Step 1. If the number of communities is no more than k, then goto Step 4.
  Step 2. Calculate the Jaccard similarity of the node sets of each community pair. Suppose A and B are two communities; the Jaccard similarity of A and B is calculated by
      Jaccard(A, B) = |A ∩ B| / |A ∪ B|.
    If the Jaccard similarities of all community pairs equal 0, then goto Step 3; else, find the community pair with the largest Jaccard similarity, and merge the two communities into a larger community. Goto Step 1.
  Step 3. Count the external links between community pairs. An external link has its two end nodes in two different communities. If the numbers of external links of all community pairs equal 0, then goto Step 4; else, find the community pair with the most external links, and merge the two communities into a larger community. Goto Step 1.
  Step 4. Stop merging communities.

3. Finding the local community of a set of nodes. Given a set of nodes, the local community can be found as follows.
   a) For each node, find the core nodes connected to it until the topological center is found; all these core nodes are added to coreSet.
   b) Build the subgraph containing these nodes and the nodes in coreSet; and,
   c) Expand the local community from the nodes in coreSet.

Algorithm 4 Expanding community from a core node

Require: a core node c and a connected network G;
  nodeQueue ← {c}, nodeSet ← {c}, linkSet ← {};
  while nodeQueue ≠ {} do
    Fetch a node x from nodeQueue;
    for y is the neighbor node of x do
      Distinguish the role of y;
      if (y ∉ nodeSet) and (y is not a core node) and (nodeWeight(y) < nodeWeight(x)) then
        nodeQueue ← nodeQueue ∪ {y};
        nodeSet ← nodeSet ∪ {y};
        linkSet ← linkSet ∪ {link(x, y)};
      end if
    end for
  end while
  return linkSet.

Fig. 7 shows a segment of a network with the TC degrees of its nodes. We can find a local community from a core node, from a non-core node, and from a set of nodes as follows.

Fig. 7. A simple case for finding community: circle nodes are core nodes; square nodes are non-core nodes. (Legend: topological center, core node, non-core node. The nodes are labeled A to N and annotated with their TC degrees.)
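The expansion step of Algorithm 4 can be sketched in Python as below. The sketch follows the pseudocode as written, so a neighbor is added only if it is unvisited, is not a core node, and has a smaller weight than the node it is reached from. The set of core nodes and the node weights are assumed to be given, and the small graph with its weights is hypothetical; it is not the network of Fig. 7.

```python
from collections import deque


def expand_community(adj, weight, core_nodes, c):
    """Expand a local community from core node c (breadth-first).

    Returns (node_set, link_set); links are stored as (x, y) in discovery order.
    """
    node_queue = deque([c])
    node_set = {c}
    link_set = set()
    while node_queue:
        x = node_queue.popleft()
        for y in adj[x]:
            if y not in node_set and y not in core_nodes and weight[y] < weight[x]:
                node_queue.append(y)
                node_set.add(y)
                link_set.add((x, y))
    return node_set, link_set


# Hypothetical example: node 0 is the starting core node, node 2 is another core.
toy_graph = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1]}
toy_weight = {0: 0.9, 1: 0.5, 2: 0.8, 3: 0.95, 4: 0.3}
toy_nodes, toy_links = expand_community(toy_graph, toy_weight, {0, 2}, 0)
```

Here node 2 is skipped because it is a core node and node 3 is skipped because its weight is higher than that of node 0, so the community contains nodes 0, 1 and 4.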

1. Finding the local community of core node B. The process is shown in Table 3.

TABLE 3
Finding local community of core node B

Step  Node  nodeQueue    nodeSet                      Expanded
0     B     B            B                            C, D, E
1     C     D, E         B, C                         -
2     D     E            B, C, D                      F, G, H
3     E     F, G, H      B, C, D, E                   I, J
4     F     G, H, I, J   B, C, D, E, F                -
5     G     H, I, J      B, C, D, E, F, G             -
6     H     I, J         B, C, D, E, F, G, H          -
7     I     J            B, C, D, E, F, G, H, I       -
8     J     -            B, C, D, E, F, G, H, I, J    -

2. Finding local community of non-core node F: first find the nearest core node D, then find the local community from D. The expansion process is shown in Table 4.

TABLE 4. Finding local community of non-core node F

| Step | Node | nodeQueue | nodeSet | Expanded |
|------|------|-----------|---------|----------|
| 0 | D | D | D, F | G, H |
| 1 | G | H | D, F, G | |
| 2 | H | | D, F, G, H | |

3. Finding local community of a node set {D, I, J}. D is a core node, while I and J are two non-core nodes. If D were the core node of the community containing I and J, then {D, I, J} would form the local community. However, D is not the core node of the community containing I and J. The possible core nodes of the community containing D are {D, B, A}; the possible core nodes of the community containing I and J are the same for both nodes, that is, {E, B, A}. Then, we can construct the subgraph containing nodes D, I and J and their possible core nodes D, E, B and A, as shown in Fig. 8. From the subgraph, we know that B is the nearest core node of the community containing D, I and J. Then, we can expand from B to find the local community containing nodes D, I and J as mentioned in case (1).

Fig. 8. Subgraph containing node D, I, J and the possible core nodes D, E, B and A.

In a research network, this approach can find the research team members of a researcher in a coauthor network and the topic-related papers of a paper in a citation network. Given a set of papers, the coauthor relations form the coauthor network, and the citation relations form the citation network. After the TC degrees are calculated, the research groups can be discovered, and the papers can be clustered by citation relations. Researchers in the same community may share similar research interests, while papers in the same cluster are topic related. Topic-related papers can be recommended to researchers having similar research interests. Global communities show the research groups and research topics in the paper set, while the local community expansion helps recommend papers in a large paper set to appropriate readers.

When making a funding decision, it is necessary to evaluate the status of a research group, its cooperators, and its publications. The communities discovered in the coauthor network show the research groups of a research area, while the communities discovered in the citation network show the paper clusters in the research area. In addition, the roles of a researcher and his/her publications can be distinguished by TC degrees.

APPLICATION: DISCOVERING BACKBONE IN RESEARCH NETWORK

Researchers and the coauthor relation form the coauthor network. The coauthors of a paper form a motif [16] of the research network. A coauthor relation from A to B means that A and B are coauthors of the same paper, and A is before B in the author list.

Fig. 9 shows the structure of the coauthor network. With the directions of coauthor relations ignored, each motif describes the cooperation between the authors of a paper: a loop for a sole author, an edge between two authors, a triangle for three authors, and a complete graph for n (n > 3) authors. The coauthor network has three layers from the local view to the global view: motif layer, module layer and global layer. Node degrees in the coauthor network reflect the active degrees of researchers. The in-links reflect the hub characteristic, while the out-links reflect authority.

Fig. 9. Structure of coauthor network from local view to the global view: (1) the bottom layer contains the motifs; (2) the middle layer contains the modules combining one or more motifs; (3) the top layer contains the networks of modules.

Our first dataset collects papers of the International Semantic Web Conference (ISWC) from to . The number of researchers and papers are and respectively. The number of coauthor relations is . The number of citation relations is , and only citation relations between pairs of papers that are both in ISWC are considered. The number of authorOf relations is .

Fig. 12 shows the node TC degrees of the largest module of the coauthor networks with a circular layout. The central nodes have higher TC degrees, and the topological centers have the highest centrality. From a topological center to the margins, the TC degrees decrease step by step. If the number of nodes is very large, the TC degrees are very small; the function log() then maps the TC degrees from the interval $(0, 1]$ to $(-\infty, 0]$, and the order of the node TC degrees remains unchanged.

Fig. 10 shows the modules in the coauthor network of the ISWC dataset. It contains modules, researchers and coauthor relations. Fig. 11 shows the largest module of Fig. 10. It contains researchers and coauthor relations.

Fig. 10. Coauthor networks of ISWC data set: modules, researchers and coauthor relations.

Fig. 11. The largest module of coauthor networks of ISWC data set.

The number of coauthor relations between two researchers reflects the frequency of their cooperation. Node degrees in the coauthor network reflect the active degrees of researchers. The in-links reflect the hub characteristic, while the out-links reflect authority.

The density of a module is reflected by the frequency of cooperation between researchers. The average cooperation active degree between each pair of researchers, called cooperation density, can be used to assess the active degree of a research community. Cooperation density is the number of coauthor relations divided by the number of researchers.
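As a small worked example of the definition above, the following sketch computes the cooperation density of a module and compares it with the range $[(n-1)/n,\ n-1]$ given in Theorem 1. The edge list and the function names are hypothetical; each tuple stands for one directed coauthor relation.

```python
def cooperation_density(num_relations, num_researchers):
    """Number of coauthor relations divided by the number of researchers."""
    if num_researchers <= 0:
        raise ValueError("a module needs at least one researcher")
    return num_relations / num_researchers


def density_bounds(n):
    """Range [(n - 1) / n, n - 1] for a connected module with n researchers."""
    return (n - 1) / n, n - 1


# Hypothetical module: each pair (u, v) is one directed coauthor relation.
coauthor_relations = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "a")]
researchers = {r for edge in coauthor_relations for r in edge}

density = cooperation_density(len(coauthor_relations), len(researchers))
low, high = density_bounds(len(researchers))
print(density, low, high)
```

With five relations among four researchers, the density is 5/4 = 1.25, which lies between (4 − 1)/4 = 0.75 and 4 − 1 = 3.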

Theorem 1. A module M of the coauthor network has n researchers. The module density of M lies within the range $[(n-1)/n,\ n-1]$.

Proof. Suppose M is a connected digraph with n nodes.

The lower bound of density: the number of edges is at least $n-1$, otherwise there will be some isolated researchers. So the lower bound of the density of M is $(n-1)/n$.

The upper bound of density: if there is at most one directed edge from any node to any other node, then the number of edges in M is at most $n(n-1)$. So the upper bound of the density of M is $n-1$.

Therefore, the module density of a module M with n nodes lies within the range $[(n-1)/n,\ n-1]$. □

Fig. 12. The largest module of coauthor networks of ISWC data set with weights: the topological centrality degrees are transformed by function log().

The citation network is a directed acyclic graph (DAG). Each paper has a fixed publishing time, and papers can only cite papers already published, so there are no cycles in the citation network. Citation is direction sensitive, and it implies the time-sequential relationship between two papers. Fig. 13a shows a module of the citation network. Papers in the same module are topic related. Citation relations show the relevance between research papers, and paper communities can be discovered by citation relations. Citations within a community show the relevance between papers, while citations between paper communities show the relevance of research topics.

Fig. 13b shows the modules in the citation network of the ISWC dataset. It contains modules, and the largest module contains papers and citation relations. All the citation relations are between papers published in ISWC. The connectivity density of the citation network is less than that of the coauthor network. The citation density of a module reflects the relevance between the papers. The citation density is the number of citations divided by the number of papers.

In a network, after the roles of nodes are distinguished by the node TC degrees, the core nodes and the edges among them form a subgraph, called the backbone network.
Fig. 13. Structure and instance of citation network: (a) The structure of citation network. (b) The citation network of ISWC data set. [Labels in panel (a): latest paper, source paper, hub paper, authoritative paper, papers in other research topic; axis: publishing time, early to late.]

The end nodes of edges in the backbone network are both core nodes. The backbone network consists of core nodes. It is useful for visualization and browsing, and can play the following roles in scientific research:

1. It helps display the research network at different levels. Each community can be represented by the core nodes in the backbone network. When a core node is focused, the detailed information of its local community can be browsed.

2. It shows the important researchers in a coauthor network. When a research community or research group is mentioned, the leaders of the community or the head of the research group are well known. Fig. 14 shows the backbone network of the largest module of the coauthor networks of the ISWC dataset. The threshold of the core nodes is . , and the threshold of the margin nodes is . It contains all of the core nodes and the coauthor relations among them. Most of the core nodes are connected, and this verifies the “rich club” phenomenon [17]: richer nodes are more likely to be connected with other richer nodes. Some core nodes form connected components alone, because the bridge nodes between them are non-core nodes.

3. The backbone network of a coauthor network can be used to propagate information. The coauthor network is a kind of social network. Core nodes are important during information propagation because they have more impact in their communities. Suppose an invitation to PC members needs to be sent; the researchers in the backbone network should take priority.

4. Papers form communities via the citation relations, and papers in a community share the same or relevant research topics. Core nodes are often important papers that cite or are cited by more important papers. The backbone network of a citation network helps find the development and history of a research area or a research topic. Core nodes and their neighbors reflect the main achievements at different research stages.

5. The paper publication venue network contains conferences and journals. Other research resources such as researchers, papers and publishers connect conferences and journals into a connected network.
To find the citations among conferences and journals, the sub-network containing conferences, journals and papers can be built. If a super node represents the conference or journal containing papers, then the citation relations within the super nodes and between different super nodes can be counted. The number of external citations reflects the relevance of conferences and journals.

Fig. 14. Backbone network of the largest module of coauthor network of ISWC Dataset from to .

Similarly, the relevance of publishers’ businesses, projects, and institutions can be analyzed. The relevance of publishers is reflected by the relevance between the books and papers published by them. The relevance among projects is reflected by the cooperation between researchers taking part in the projects and the citations between papers supported by the projects. The relevance between institutions can also be reflected by the relevance between researchers and papers.
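The counting of citations within and between venue super nodes can be sketched as follows. This is only an illustration: the paper identifiers, venue names and the function `venue_citation_counts` are hypothetical, and each citation is assumed to be a (citing paper, cited paper) pair.

```python
from collections import Counter


def venue_citation_counts(citations, venue_of):
    """Count citations inside each venue and between ordered venue pairs.

    citations: iterable of (citing_paper, cited_paper) pairs
    venue_of:  dict mapping a paper to its conference or journal (super node)
    """
    internal = Counter()   # venue -> citations among its own papers
    external = Counter()   # (citing venue, cited venue) -> citation count
    for citing, cited in citations:
        src, dst = venue_of[citing], venue_of[cited]
        if src == dst:
            internal[src] += 1
        else:
            external[(src, dst)] += 1
    return internal, external


# Hypothetical data: two venues with two papers each.
venue_of = {"p1": "ConfA", "p2": "ConfA", "p3": "JournalB", "p4": "JournalB"}
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p4", "p3"), ("p2", "p3")]

internal, external = venue_citation_counts(citations, venue_of)
print(dict(internal), dict(external))
```

Keeping the venue pairs ordered preserves the direction of citation; summing both directions of a pair would give an undirected relevance count if that is preferred.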

Backbone networks can be used to study the development of scientific research. Backbone networks sorted by year reflect the evolvement of research networks. Similarly, the evolvement of backbone networks in citation networks, paper venue networks, institution networks, etc. can be studied.

Fig. 15 shows the evolution of the coauthor network of ISWC from to . The coauthor networks are accumulated year by year, that is, the coauthor network of year n (2002 ≤ n ≤ ) contains the coauthor relations from year 2002 to year n.

The evolvement of the coauthor network reflects the history of ISWC. More and more researchers have taken part in the conference, while the nodes and links in the backbone networks are also changing. The following characteristics in the evolvement of coauthor networks can be discovered:

1. New researchers in the coauthor network often cooperate with researchers that have already published papers in ISWC, because the scales of the modules in the coauthor networks become larger year by year.

2. Scientific researchers tend to cooperate with others. The evolution graph shows that the isolated nodes enter the connected components step by step.

3. Core researchers tend to cooperate with each other. The number of researchers in the largest modules of the backbone networks is becoming larger and larger. This reflects the “rich club” phenomenon [17] in scientific research.

4. Core researchers are active locally, and they have more cooperators than their neighbors. The roles of researchers in the coauthor network are also changing: new researchers may become core researchers, while core researchers may become middle nodes or margin nodes.

5. The topological centers of the largest module are changing. The topological centers emerge through a voting-like mechanism. Table 5 shows the topological centers.

TABLE 5. Topological centers of coauthor networks of ISWC from to
Year
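The year-by-year accumulation of coauthor networks and the extraction of a backbone from one snapshot can be sketched as follows. The sketch rests on several assumptions that are not taken from the paper: the edge lists, the TC degrees and the threshold value 0.5 are hypothetical, a node is treated as a core node when its TC degree is at least the threshold, and the TC degrees are given rather than recalculated for each year.

```python
def cumulative_networks(edges_by_year):
    """Map each year to the set of edges from the first year up to that year."""
    accumulated = set()
    snapshots = {}
    for year in sorted(edges_by_year):
        accumulated.update(edges_by_year[year])
        snapshots[year] = set(accumulated)   # copy, so later years do not leak in
    return snapshots


def backbone(edges, tc_degree, core_threshold):
    """Core nodes and the edges whose two end nodes are both core nodes."""
    core = {v for v, d in tc_degree.items() if d >= core_threshold}
    links = {(u, v) for u, v in edges if u in core and v in core}
    return core, links


# Hypothetical coauthor relations that appear in each year.
edges_by_year = {
    2002: [("a", "b")],
    2003: [("b", "c")],
    2004: [("a", "c"), ("c", "d")],
}
snapshots = cumulative_networks(edges_by_year)

# Hypothetical TC degrees for the 2004 snapshot and a hypothetical threshold.
tc_degree = {"a": 1.0, "b": 0.7, "c": 0.6, "d": 0.1}
core, links = backbone(snapshots[2004], tc_degree, core_threshold=0.5)
print(sorted(core), sorted(links))
```

In a full pipeline the TC degrees would be recomputed on every yearly snapshot before the backbone is extracted, so that changes of roles over time become visible.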

DISCUSSIONS

The TC degree of a node reflects the geodesic distance to the nearest topological center in the network. The value of the TC degree has no definite explanation, but it is different from the ranking results of PageRank. TC degrees are closely related to the authority of nodes: authoritative nodes have higher TC degrees than their neighbors. The authority of a node reflects the importance of the node in information propagation. The TC degrees are explainable in communities: core nodes have higher TC degrees than their neighbors. Isolated resources have less influence in the global society.

Fig. 15. Evolvement of coauthor network of ISWC from to : each row shows the coauthor network and its backbone network; the left column shows the coauthor network, while the right column shows the backbone network.

Backbone networks can help study relations between resources of different types. The backbone network of a heterogeneous research network connects the important resources of a research topic, and the important resources may be researchers, papers, conferences, journals, institutions, publishers, etc. This helps find and recommend information. Furthermore, related information can be displayed by a browser based on interactive visualization.

In general complex networks, edges have no semantics, while in semantics-rich networks edges carry semantic relations. The weights of nodes are affected by their neighbors, and different relations have different effects. So it is necessary to consider the influences of relations on the calculation of topological centrality. Relations can be assigned different weights and participate in the iterative calculation as shown in Eq. (5), where r is the relation of link e(i, j) and $\omega_r$ is the weight of r, which affects the calculation of TC in each iteration:

$$
\begin{cases}
\mathrm{temp}\,\omega_i^{(t+1)} = \omega_i^{(t)} + \sum_{j=1}^{n} a_{ij}\,\omega_r\,\omega_{e(i,j)}^{(t)}\,\omega_j^{(t)} \\
\mathrm{temp}\,\omega_{e(i,j)}^{(t+1)} = \mathrm{temp}\,\omega_i^{(t+1)} + \mathrm{temp}\,\omega_j^{(t+1)}
\end{cases}
\tag{5}
$$

An important characteristic is that the original topological centers may change when we merge two networks into one by certain links and recalculate the topological centers in the new network. For example, if we merge the coauthor network with the citation network by the authorOf semantic links, the topological centers of the new network may not be simply the sum of the topological centers in the coauthor network and those in the citation network. Recalculation of topological centers can synthesize more relations, so it can evaluate nodes more accurately. For example, authors can be evaluated by more factors (e.g., number of publications, number of co-authors, number of citations) in the new network than in the old networks. If applications require keeping the old topological centers in the new network and avoiding recalculation, we can adopt the following strategy: find the relations (e.g., authorOf) between the old topological centers and then compose the corresponding old topological centers to form new topological centers. Such integrated topological centers can provide semantically relevant information services (e.g., an authoritative author and his/her high impact papers can be obtained at the same time) for applications in large networks.

General community discovery approaches are based on the connections between vertices in a network. A fast community discovery algorithm for very large networks was proposed with approximately linear time complexity $O(n \log^2 n)$, where n is the number of nodes [18].
General methods like the GN algorithm can be used to discover communities in weighted networks by mapping them onto unweighted networks [19].

Research and learning resources form a network, and the connections are the relations among resources. Different from the communities in general complex networks, semantic communities in the relational network were discovered according to the roles of relations during reasoning on relations [20].

Many works address the collaboration networks and citation networks of scientific research. Most of them focus on the characteristics of collaboration networks. For the structure of a social science collaboration network, disciplinary cohesion from 1963 to 1999 was studied [21]. The structure of scientific collaboration networks, including the shortest paths, weighted networks, and centrality, was studied [22] [23] [24]. Coauthor relations were used to study the collaborations between researchers, especially mathematicians, and the distribution of papers in Mathematical Reviews against the number of authors was studied [25] [26]. Relations between researchers were analyzed in the Erdős collaboration graph, and the shortest path lengths between researchers were studied [27].

Evolutions of the social networks of scientific collaborations in mathematics and neuro-science were studied [28].
The result shows that the social network of scientific collaboration is scale-free, and that the node separation decreases as the number of connections increases. Social networks in academic research can be extracted from the webpages and paper metadata provided by online databases [29]; furthermore, relations among researchers are mined in academic social networks [30]. Social structure in scientific research was studied based on citations [31].

Citation relations between scientific papers and the citation distribution of papers were studied [32] [33] [34]. The results show that some papers are not cited at all, most papers are cited once, while a small part of the papers covers the references of most papers in a research area.

Resources in research networks are ranked at the object level. Research resources were ranked by the PopRank approach considering the mutual influences between relevant resources [35]. The object-based ranking approach can help search and recommend different resources such as papers, conferences, journals and researchers. Researchers and papers are often ranked in the coauthor network and the citation network respectively. A co-ranking framework of researchers and papers was proposed, in which researchers and papers were ranked in a heterogeneous network combining the coauthor network and the citation network by coauthor relations [36].

Our approach is different from the existing approaches in the following aspects:

1. We distinguish the roles of nodes by topological centrality, and then discover the communities by the roles of nodes. Global communities and local communities are discovered based on the roles of nodes. So our approach is based on roles rather than only on the connections. Although the topological centrality degrees of nodes and edges are calculated considering the connections between nodes, the topological centrality degrees of neighbor nodes influence each other at the same time.
The role-based community discovery approach fits research networks, and can discover communities in tree-like networks that are hard to discover by general community discovery approaches such as the GN algorithm.

2. We have built the backbone networks for coauthor networks and citation networks, and the evolution characteristics of backbone networks have been studied. The PageRank algorithm can also find the local core nodes, but it has no way to connect most of the core nodes into a backbone network, because it is hard to choose the connecting nodes between the core nodes by the PageRank values. In contrast, the topological centrality degrees of nodes can be used to choose the core nodes and connect them into a backbone network that is connected as far as possible, because the core nodes include the community central nodes and the important nodes connecting different communities. The backbone network construction approach is also based on the topological centrality. The approach can be applied not only to research networks with a single resource type but also to those with multiple resource types.

CONCLUSION

This paper first proposes the notion of topological centrality and a calculation approach that reflects the topological positions of nodes and edges in a network, and then studies its applications in discovering communities and building the backbone network in scientific research networks. Research communities can be discovered according to the roles of nodes distinguished by topological centrality degrees. We also propose an approach to building the backbone network by using the topological centrality. Experiments on a real research network and simulation networks show the feasibility and effectiveness of our approaches.

ACKNOWLEDGMENTS

This research work was supported by the National Basic Research Program of China (Project No. 2003CB317000).

REFERENCES

[1] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM, vol. 46, no. 5, pp. 604–632, 1999.
[2] L. Freeman, “Centrality in social networks: Conceptual clarification,” Social Networks, vol. 1, no. 3, pp. 215–239, 1979.
[3] J. Nieminen, “On the centrality in a graph,” Scandinavian Journal of Psychology, vol. 15, no. 1, pp. 332–336, 1974.
[4] H. Zhuge and X. Li, “Peer-to-Peer in Metric Space and Semantic Space,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 6, pp. 759–771, 2007.
[5] J. Anthonisse, “The Rush in a Graph,” Amsterdam: University of Amsterdam Mathematical Centre, 1971.
[6] L. Freeman, “A Set of Measures of Centrality Based on Betweenness,” Sociometry, vol. 40, no. 1, pp. 35–41, 1977.
[7] S. Warshall, “A theorem on boolean matrices,” J. ACM, vol. 9, no. 1, pp. 11–12, 1962.
[8] M. Girvan and M. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, p. 7821, 2002.
[9] G. Sabidussi, “The centrality index of a graph,” Psychometrika, vol. 31, no. 4, pp. 581–603, 1966.
[10] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[11] P. Bonacich, “Factoring and weighting approaches to status scores and clique identification,” Journal of Mathematical Sociology, vol. 2, no. 1, pp. 113–120, 1972.
[12] L. Page, S. Brin, R. Motwani et al., “The PageRank citation ranking: Bringing order to the web,” Online: http://citeseer.nj.nec.com/page98pagerank.html [04.06.2003], 1998.
[13] V. Latora and M. Marchiori, “A measure of centrality based on the network efficiency,” Arxiv preprint cond-mat/0402050, 2004.
[14] S. Fortunato, V. Latora, and M. Marchiori, “Method to find community structures based on information centrality,” Physical Review E, vol. 70, no. 5, p. 56104, 2004.
[15] H. Zhuge, “Discovery of knowledge flow in science,” Communications of the ACM, vol. 49, no. 5, pp. 101–107, 2006.
[16] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network Motifs: Simple Building Blocks of Complex Networks,” pp. 824–827, 2002.
[17] V. Colizza, A. Flammini, M. Serrano, and A. Vespignani, “Detecting rich-club ordering in complex networks,” Arxiv preprint physics/0602134, 2006.
[18] A. Clauset, M. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 66111, 2004.
[19] M. Newman, “Analysis of weighted networks,” Physical Review E, vol. 70, no. 5, 2004.
[20] H. Zhuge, “Communities and emerging semantics in semantic link network: Discovery and learning,” IEEE Transactions on Knowledge and Data Engineering, Jul. 2008, IEEE Computer Society Digital Library.
[21] J. Moody, “The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999,” American Sociological Review, vol. 69, no. 2, pp. 213–238, 2004.
[22] M. Bordons and I. Gómez, “Collaboration Networks in Science,” The Web of Knowledge: A Festschrift in Honor of Eugene Garfield, p. 197, 2000.
[23] M. Newman, “The structure of scientific collaboration networks,” Proceedings of the National Academy of Sciences, p. 21544898, 2001.
[24] M. Newman, “Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality,” Physical Review E, vol. 64, no. 1, p. 16132, 2001.
[25] J. Grossman and P. Ion, “On a Portion of the Well-Known Collaboration Graph,” Congressus Numerantium, pp. 129–132, 1995.
[26] G. Melin and O. Persson, “Studying research collaboration using co-authorships,” Scientometrics, vol. 36, no. 3, pp. 363–377, 1996.
[27] V. Batagelj and A. Mrvar, “Some analyses of Erdős collaboration graph,” Social Networks, vol. 22, no. 2, pp. 173–186, 2000.
[28] A. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek, “Evolution of the social network of scientific collaborations,” Physica A: Statistical Mechanics and its Applications, vol. 311, no. 3-4, pp. 590–614, 2002.
[29] J. Tang, D. Zhang, and L. Yao, “Social Network Extraction of Academic Researchers,” in Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, 2007, pp. 292–301.
[30] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, “Arnetminer: extraction and mining of academic social networks,” in KDD ’08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2008, pp. 990–998.
[31] H. White, B. Wellman, and N. Nazer, “Does citation reflect social structure,” Journal of the American Society for Information Science and Technology, vol. 55, no. 2, pp. 111–126, 2004.
[32] D. Price, “Networks of Scientific papers,” Nuovo Cimento, vol. 5, p. 199, 1957.
[33] P. Seglen, “The Skewness of Science,” Journal of the American Society for Information Science, vol. 43, no. 9, pp. 628–38, 1992.
[34] S. Redner, “How popular is your paper? An empirical study of the citation distribution,” The European Physical Journal B-Condensed Matter and Complex Systems, vol. 4, no. 2, pp. 131–134, 1998.
[35] Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma, “Object-level ranking: bringing order to web objects,” in Proceedings of the 14th international conference on World Wide Web (WWW 2005). New York, NY, USA: ACM Press, 2005, pp. 567–574.
[36] D. Zhou, S. Orshanskiy, H. Zha, and C. Giles, “Co-ranking Authors and Documents in a Heterogeneous Network,” in